Introduction

A dam is an artificial structure built to retain water on a large scale and to operate a water reservoir (Maimunah et al. 2019). Dams are essential because some provide hydroelectric power production and river navigation for better water management. They also provide flood mitigation, since a dam can store excess precipitation for future use (for example, during a drought season) or release it under controlled conditions to replenish the irrigation water supply downstream of the dam. A further benefit of dam reservoirs is that they hold clean water that supplies freshwater for domestic and municipal purposes. The most significant influences on water levels are hydrological: precipitation, evaporation, and groundwater. Because it behaves as a large water reserve beneath the Earth's surface (Famiglietti 2014), groundwater is a significant contributor to changes in a dam reservoir's water level; more than one-third of the water consumed worldwide comes from underground (Famiglietti 2014). However, climate change is expected to accelerate the water cycle as rising global temperatures increase evaporation rates worldwide (Trenberth 2011). The impacts of higher evaporation and precipitation rates can already be seen in many Malaysian areas, and they are expected to intensify over this century as the climate warms (Trenberth 2011). As a result, climate-change-related instability affects groundwater systems. Climate change can alter soil infiltration and deep percolation and hence groundwater recharge, while rising temperatures increase evaporative demand over land (Berg et al. 2016), limiting the amount of water available to refill groundwater (Wu et al. 2020). In addition, reservoir operation usually affects surface-flow characteristics and the relationship between surface water and groundwater; changes in the recharge-discharge relationship, particularly changes in the amount of water used by humankind, trigger changes in the water table (Çelik 2018). The direct effect of dams on groundwater is expected both upstream and downstream (Çelik 2018). Consequently, reliable prediction models and an understanding of how groundwater factors affect water levels have become critical for operating water supply management systems and maintaining water-use quality. More accurate prediction of the water level will help prevent over-exploitation of groundwater and help control water resources. On the other hand, water-level forecasting is a highly dynamic and non-linear process dependent on several complex factors (Chang and Chang 2006). Therefore, developing models that predict water levels accurately to optimize water resources management in the reservoir is essential.

Researchers have used numerous mathematical methods in recent decades to estimate water levels. Because of the multiplicity of input and structural parameters, the need for multiple calibrations, and long-term efficiency requirements, these methodologies have been developed to match the complexity of hydrological factors (Karami et al. 2018). Over the last few years, studies have increasingly used soft computing models for water prediction, especially water quality (Kaya et al. 2018) and lake level (Gong et al. 2016). Machine Learning (ML) algorithms have been widely applied to this problem because ML shifts from a knowledge-driven approach to a data-driven one, learning from large amounts of data and drawing conclusions from the results. Artificial neural networks (ANN) and support vector machines (SVM) have been popular since the 1990s and are primary tools in ML. ANN has also been used in numerous studies of hydrological factors such as groundwater forecasting (Karami et al. 2018; Kaya et al. 2018; Gong et al. 2016; Daliakopoulos et al. 2005), rainfall forecasting (Hung et al. 2009; Canchala et al. 2020; Lee et al. 2018) and streamflow forecasting (Adhikary et al. 2018; Reza et al. 2018; Wang et al. 2006). One approach to reservoir water-level prediction using ML algorithms was presented by Siegelmann and Sontag (1995), a study applying an ANN and a neuro-fuzzy system to short-term water-level prediction in which both algorithms performed well and were more accurate than linear statistical models. Chang and Chang (2006) developed two adaptive network-based fuzzy inference system (ANFIS) models with an emphasis on the input variables: typhoon and rainfall data were used, one model with human decisions as input and one without. The comparison clearly shows superior performance when the human decision is included as input, and ANFIS gave high precision and reliability for the reservoir water level over the following three hours. A time series (TS) regression model was also considered by Khai et al. (2019) alongside SVM for estimating the daily water level at the Klang Gate dam using historical inflows and water levels as input; the TS model provided better results than the SVM. In conclusion, the results differ depending on the input-output variables and the ML algorithms used, and most ML models can be applied to hydrological events.

This study expands on a previous study (Sapitang et al. 2020) that also aimed to predict the reservoir water level, but with different input variables. The previous study consisted of several scenarios and time horizons with four learning algorithms: Boosted Decision Tree Regression (BDTR), Decision Forest Regression (DFR), Bayesian Linear Regression (BLR) and Neural Network Regression (NNR); its results showed that BLR and BDTR outperformed the other two models. The reservoir water level can be predicted from various predictive variables (input data) (Phanindra et al. 2020) because it is affected by other hydrological factors. Hence, the purpose of this study is to predict the monthly reservoir water level by finding the relationship between monthly historical groundwater and water-level inputs using different machine learning algorithms, namely Linear Regression (LR), Support Vector Machines (SVM), Gaussian Processes Regression (GPR), and Neural Network (NN), to identify the model that best mimics the actual water-level values and the best input combination. The autocorrelation function (ACF) is used to decide on the best combination of lags between groundwater and water level. The model performances are compared using several statistical performance indices. The motivation for this study is the need to control the release and storage of water for better water management at the dam reservoir and for groundwater recharge. The water-level prediction offers a novel approach by comparing machine learning models. Hence, this study can serve modellers and decision-makers in addressing site-specific and real-time water-level prediction and management issues.

Methodology

Input Selection using Autocorrelation Function (ACF)

One of the main tasks in Machine Learning (ML) is to choose the input parameters that affect the output parameters. This requires focus and a thorough interpretation of the underlying physical mechanism, dependent on causal factors and on statistical analysis of potential inputs and outputs (Ahmed et al. 2019). In this research, the autocorrelation function (ACF) is used for input selection. The ACF refers to the degree of correlation of the same variable over two successive time intervals: it compares a lagged version of a time series with the original series (Berne et al. 1966). The standard ACF estimator in Eq. 1 is one of the most widely studied in the literature and is commonly used in computer programmes (Zieba and Ramza 2011).

$${r}_{k}=\frac{\sum _{i=1}^{n-k}({x}_{i}-\stackrel{-}{x})({x}_{i+k}-\stackrel{-}{x})}{\sum _{i=1}^{n}{({x}_{i}-\stackrel{-}{x})}^{2}}$$
(1)

where \({r}_{k}\) is the estimator, \({x}_{i}\) the analyzed data, \(\stackrel{-}{x}\) the mean of the data and \(n\) the sample size. The main applications of the ACF are to assess statistical correlations between observations in a single data series and to test a model's validity. A significant advantage of the ACF is that it measures the level of linear dependency between outcomes of a time series separated by a lag \(k\) (Parmar and Kinjal Mistree 2017).
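As an illustration, a minimal Python sketch of the estimator in Eq. 1 is given below; the function name and the synthetic series are ours, and library routines such as statsmodels' `acf` give comparable estimates.

```python
import numpy as np

def acf_estimator(x, max_lag):
    """ACF estimator of Eq. 1: returns r_k for k = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x_bar = x.mean()
    denom = np.sum((x - x_bar) ** 2)                    # sum_i (x_i - x_bar)^2
    return np.array([np.sum((x[:n - k] - x_bar) * (x[k:] - x_bar)) / denom
                     for k in range(1, max_lag + 1)])

# Synthetic monthly series with an annual cycle, standing in for the water-level record
months = np.arange(60)
series = 140 + 3 * np.sin(2 * np.pi * months / 12) \
         + np.random.default_rng(0).normal(0, 0.5, 60)
print(acf_estimator(series, 12))                        # dependency re-appears near lag 12
```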

Figure 1a and 1b show the ACF of the water level and groundwater level, respectively. The plots cover 60 lags, corresponding to the 60 months from January 2013 to December 2017: lags 1 to 12 correspond to 2013, lags 13 to 24 to 2014, lags 25 to 36 to 2015, lags 37 to 48 to 2016 and lags 49 to 60 to 2017. Lags 1, 13, 25, 37 and 49 correspond to January, lags 2, 14, 26, 38 and 50 to February, lags 11, 23, 35, 47 and 59 to November and lags 12, 24, 36, 48 and 60 to December. For the water level (Fig. 1a), January, February, November, and December show dependencies each year. Several years also show dependencies in the March and October values, while the other months do not; hence, in several years the water level rises from October to March. The groundwater level (Fig. 1b) shows dependencies in January, February, November, and December.

Fig. 1
figure 1

24 Lags of (a) Water Level and (b) Groundwater Level

There are four scenarios for predicting the monthly water level, listed in Eqs. 2 to 5 following the ACF results.

$${GW}_{t}+{\left(GW+WL\right)}_{t-1}= {WL}_{t}$$
(2)
$${GW}_{t}+{\left(GW+WL\right)}_{t-1}+{\left(GW+WL\right)}_{t-2}= {WL}_{t}$$
(3)
$${GW}_{t}+{\left(GW+WL\right)}_{t-1}+{\left(GW+WL\right)}_{t-2}+{\left(GW+WL\right)}_{t-11}= {WL}_{t}$$
(4)
$${GW}_{t}+{\left(GW+WL\right)}_{t-1}+{\left(GW+WL\right)}_{t-2}+{\left(GW+WL\right)}_{t-11}+{\left(GW+WL\right)}_{t-12}= {WL}_{t}$$
(5)

where \({WL}_{t}\) is the historical water level at month t and is the output for all scenarios. For the inputs, Eq. 2, \({GW}_{t}+{\left(GW+WL\right)}_{t-1}\), uses lag 1 (January) and defines scenario 1 (SC1); Eq. 3, \({GW}_{t}+{\left(GW+WL\right)}_{t-1}+{\left(GW+WL\right)}_{t-2}\), uses lags 1 and 2 (January, February) and defines scenario 2 (SC2); Eq. 4, \({GW}_{t}+{\left(GW+WL\right)}_{t-1}+{\left(GW+WL\right)}_{t-2}+{\left(GW+WL\right)}_{t-11}\), uses lags 1, 2 and 11 (January, February, November) and defines scenario 3 (SC3); and Eq. 5, \({GW}_{t}+{\left(GW+WL\right)}_{t-1}+{\left(GW+WL\right)}_{t-2}+{\left(GW+WL\right)}_{t-11}+{\left(GW+WL\right)}_{t-12}\), uses lags 1, 2, 11 and 12 (January, February, November, December) and defines scenario 4 (SC4). All scenarios use the historical groundwater level and water level. The primary statistical parameters of the input and output data are presented in Table 1. The data are secondary: 100 monthly historical records of groundwater level, averaged over 10 observation boreholes (OH), and water level from 2012 to 2019 in Terengganu, Malaysia.
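A possible way to assemble the four input sets of Eqs. 2 to 5 is sketched below in Python; the column names, placeholder values and the build_scenario helper are assumptions for illustration, not the study's actual pipeline.

```python
import numpy as np
import pandas as pd

# Placeholder monthly series standing in for the Terengganu records; the column
# names "GW" and "WL" and the random values are assumptions for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"GW": rng.normal(30, 2, 100), "WL": rng.normal(140, 3, 100)})

def build_scenario(data, lags):
    """Inputs GW_t plus (GW, WL) at the given lags; target WL_t (Eqs. 2 to 5)."""
    X = pd.DataFrame({"GW_t": data["GW"]})
    for k in lags:
        X[f"GW_t-{k}"] = data["GW"].shift(k)
        X[f"WL_t-{k}"] = data["WL"].shift(k)
    frame = pd.concat([X, data["WL"].rename("WL_t")], axis=1).dropna()  # rows lost to lagging
    return frame.drop(columns="WL_t"), frame["WL_t"]

X_sc1, y_sc1 = build_scenario(df, [1])              # Eq. 2, SC1
X_sc2, y_sc2 = build_scenario(df, [1, 2])           # Eq. 3, SC2
X_sc3, y_sc3 = build_scenario(df, [1, 2, 11])       # Eq. 4, SC3
X_sc4, y_sc4 = build_scenario(df, [1, 2, 11, 12])   # Eq. 5, SC4
```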

Table 1 Descriptive analysis for observation boreholes and water level

Machine Learning Algorithms

Various machine learning algorithms exist, as mentioned in the introduction. This study uses Linear Regression (LR), Support Vector Machines (SVM), Gaussian Processes Regression (GPR), and Neural Network (NN). LR models the relationship between a dependent (response) variable and one or more independent (predictor) variables (Caral et al. 2019). The basic principle of linear regression is depicted in Eq. 6, which relates a response variable \(y\) to predictor variables \({x}_{1},\dots ,{x}_{n}\) (Browne 1975), where \({\beta }_{0}\) is the \(y\)-intercept, \({\beta }_{1}\) to \({\beta }_{5}\) are the regression coefficients and \(\varepsilon\) is the error. Three types of linear regression were used in this study: Linear Regression (LR), Robust Linear Regression (RLR) and Stepwise Linear Regression (SLR). RLR is less susceptible to outliers than standard LR. If the distribution of errors is asymmetric or prone to outliers, the assumptions of standard LR are invalidated, and parameter estimates, confidence intervals and other derived statistics become unreliable (Hampel et al. 2011). RLR, which assigns a weight to each data point using an iteratively reweighted least squares method, is therefore less sensitive to substantial changes in small parts of the data than standard LR (Hampel et al. 2011; Ronchetti et al. 1997). SLR is a technique for systematically adding and removing terms from a multilinear model based on their statistical significance; the procedure begins with an initial model and then compares the explanatory power of progressively larger and smaller models (Zhou et al. 2012).

$$y={\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+{\beta }_{3}{x}_{1}{x}_{2}+{\beta }_{4}{x}_{1}^{2}+{\beta }_{5}{x}_{2}^{2}+\varepsilon$$
(6)
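The contrast between standard and robust fitting can be illustrated with a short sketch on synthetic data; statsmodels is used here as a stand-in for the regression tooling of the study, with RLM implementing the iteratively reweighted least-squares idea described above.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative only: ordinary vs. robust linear regression on synthetic data with outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 80)
y = 2.0 + 0.8 * x + rng.normal(0, 0.5, 80)
y[:5] += 15                                            # inject a few large outliers

X = sm.add_constant(x)                                 # adds the beta_0 intercept column
ols = sm.OLS(y, X).fit()                               # standard LR
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # RLR via iteratively reweighted LS

print("OLS slope:", ols.params[1])                     # pulled toward the outliers
print("Robust slope:", rlm.params[1])                  # closer to the true 0.8
```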

An SVM is a learning algorithm that maximizes a particular mathematical function (through a kernel) with respect to a given data collection, as shown in Fig. 2. The support vectors (dots) are used to find the hyperplane that maximizes the separation between two classes; for patterns that cannot be separated linearly, the original data are transformed and mapped into a new space (Phanindra et al. 2020). The four basic SVM concepts are the separating hyperplane, the maximum-margin hyperplane, the soft margin, and the kernel function (Meyer et al. 2003; Noble 2006). A drawback of the basic SVM algorithm is that it only handles binary classification; the usual workaround is to train multiple one-versus-all classifiers, although SVM can still solve problems quickly even with thousands of data points (Noble 2006). A further disadvantage of SVM is that it requires plenty of training data to estimate the underlying function, and its accuracy needs to be improved (Gao et al. 2018). The three types of SVM regression used in this study are Fine Gaussian SVM (FG-SVM), Medium Gaussian SVM (MG-SVM) and Coarse Gaussian SVM (CG-SVM). The difference between these three lies in the scale (variance) of the Gaussian kernel: FG-SVM makes finely detailed distinctions, MG-SVM makes fewer distinctions than FG-SVM, and CG-SVM makes coarse distinctions (Ali et al. 2019).

Fig. 2
figure 2

The architecture of SVM models
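As a rough illustration of how the fine, medium and coarse Gaussian variants differ only in kernel width, the sketch below builds three RBF-kernel support vector regressors in scikit-learn; the specific scale values and the mapping from kernel scale to gamma are assumptions for illustration, not settings reported in the study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Three Gaussian (RBF) SVR models whose kernel widths loosely mirror the
# fine / medium / coarse presets; the assumed scales shrink or grow with sqrt(P).
n_features = 9                              # e.g. GW_t plus (GW, WL) at four lags in SC4
scales = {"FG-SVM": np.sqrt(n_features) / 4,
          "MG-SVM": np.sqrt(n_features),
          "CG-SVM": np.sqrt(n_features) * 4}

svm_models = {
    name: make_pipeline(StandardScaler(),
                        SVR(kernel="rbf", gamma=1.0 / (2 * s ** 2), epsilon=0.1))
    for name, s in scales.items()
}
# Usage sketch: svm_models["MG-SVM"].fit(X_train, y_train); svm_models["MG-SVM"].predict(X_test)
```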

GPR is expressive, interpretable, avoids over-fitting, and has shown impressive predictive performance in many thorough empirical comparisons (Rasmussen 1997). GPR grew out of neural-network research, where networks with an infinite number of hidden units were shown to converge to Gaussian processes, and this result became the cornerstone of subsequent Gaussian process models (Neal 2012; Wilson et al. 2011). It is a nonparametric, kernel-based probabilistic model defined over a collection of random variables with a joint multivariate distribution (Gao et al. 2018). Three types of GPR were used in this study: Squared Exponential GPR (SE-GPR), Matern 5/2 GPR (M5/2-GPR) and Rational Quadratic GPR (RQ-GPR). SE-GPR is the function-space representation of a radial basis function (RBF) regression model with an infinite number of basis functions; a benefit of this kernel is that it is unlikely to cause substantial errors when dealing with massive data sets. The M5/2-GPR kernel uses the stationary kernel's spectral density, obtained as the Fourier transform of the RBF kernel, whereas the RQ-GPR kernel can represent data at various scales (Zhang et al. 2018). The covariance (kernel) function is a fundamental component of GPR, and the similarity among data points is vital; Eqs. 7 to 9 describe the covariance functions of the kernels used in this study (Gao et al. 2018).

$$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}exp\left[-\frac{1}{2}\frac{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}{{\sigma }_{l}^{2}}\right]$$
(7)
$$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\frac{\sqrt{5}r}{{\sigma }_{l}}+\frac{5{r}^{2}}{3{\sigma }_{l}^{2}}\right)exp\left[-\frac{\sqrt{5}r}{{\sigma }_{l}}\right]$$
(8)
$$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}{\left(1+\frac{{r}^{2}}{{2\alpha \sigma }_{l}^{2}}\right)}^{-\alpha }$$
(9)
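Reading \(r=\left|{x}_{i}-{x}_{j}\right|\), \({\sigma }_{l}\) as the characteristic length scale, \({\sigma }_{f}^{2}\) as the signal variance and \(\alpha\) as the scale-mixture parameter (the usual definitions for these kernels), the three covariance functions can be sketched with scikit-learn as below; this is an illustrative analogue, not the implementation used in the study.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, Matern, RationalQuadratic,
                                              ConstantKernel)

# One GPR per covariance function of Eqs. 7-9; ConstantKernel supplies sigma_f^2.
kernels = {
    "SE-GPR":   ConstantKernel() * RBF(length_scale=1.0),                            # Eq. 7
    "M5/2-GPR": ConstantKernel() * Matern(length_scale=1.0, nu=2.5),                 # Eq. 8
    "RQ-GPR":   ConstantKernel() * RationalQuadratic(length_scale=1.0, alpha=1.0),   # Eq. 9
}
gpr_models = {name: GaussianProcessRegressor(kernel=k, normalize_y=True)
              for name, k in kernels.items()}
# Usage sketch: gpr_models["M5/2-GPR"].fit(X_train, y_train)
#               mean, std = gpr_models["M5/2-GPR"].predict(X_test, return_std=True)
```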

NN regression is a form of artificial intelligence that imitates the functioning of the human brain and nervous system. It is widely used because of its capability to learn the mapping from a given input to the output and thereby simulate large-scale, complex non-linear problems (Chau et al. 2005; Rumelhart et al. 1994). It can be described as a chain of linear operations interleaved with non-linear activation functions (Tan et al. 2017), organised as an input layer, hidden layer and output layer (Damian 2019), as shown in Fig. 3. Even though an NN is a flexible and efficient mapping tool, incorrectly initialised weights and biases can result in convergence to local optima (Chau et al. 2005). Three types of NN regression were used in this study: Narrow NN (N-NN), Medium NN (M-NN), and Wide NN (W-NN). The ability of an NN model to capture interactions and nonlinearities can also be a disadvantage, because it may result in overfitting the training data set and poor performance on external test data sets (Tu 1996).

Fig. 3
figure 3

The architecture of NN models
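A minimal sketch of such narrow, medium and wide single-hidden-layer regressors is given below; the hidden-layer widths (10, 25, 100) are assumed presets for illustration rather than values reported in the study.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Narrow / medium / wide networks differing only in hidden-layer size (assumed values).
widths = {"N-NN": 10, "M-NN": 25, "W-NN": 100}
nn_models = {
    name: make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(w,), activation="relu",
                                     max_iter=2000, random_state=0))
    for name, w in widths.items()
}
# Usage sketch: nn_models["M-NN"].fit(X_train, y_train); nn_models["M-NN"].predict(X_test)
```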

Model Performance Indicators

Model performance indicators were used to measure how successfully a trained model scores the datasets, i.e. how closely it replicates the actual values of the output parameter.

  i)

    Mean Absolute Error, MAE (Hyndman and Koehler 2006), signifies the degree of absolute error between the actual and predicted data as in Eq. 10.

    $$MAE=\frac{1}{n}\sum _{i=1}^{n}\left|{MSL}_{p}-{MSL}_{o}\right|$$
    (10)
  ii)

    Root Mean Square Error, RMSE (Hyndman and Koehler 2006), measures the distance between the actual and predicted values for each model, i.e. how spread out the residuals are, as in Eq. 11.

    $$RMSE=\sqrt{\frac{\sum _{i=1}^{n}{({MSL}_{p}-{MSL}_{o})}^{2}}{n}}$$
    (11)
  iii)

    Coefficient of determination, R (Nagelkerke 1991), demonstrates the prediction model's performance, where zero means the model is random and 1 means a perfect fit, as in Eq. 12.

    $$R=\frac{\sum _{i=1}^{n}\left({MSL}_{o}-{\stackrel{-}{MSL}}_{o}\right)({MSL}_{p}-{\stackrel{-}{MSL}}_{p})}{\sqrt{\sum _{i=1}^{n}{({MSL}_{o}-{\stackrel{-}{MSL}}_{o})}^{2}\sum _{i=1}^{n}{({MSL}_{p}-{\stackrel{-}{MSL}}_{p})}^{2}}}$$
    (12)

In a nutshell, each model performs better when the value of R is close to one, except for RMSE and MAE, where the model performs better when the value is close to zero (Cheng et al. 2015).
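For reference, the three indices of Eqs. 10 to 12 can be computed directly, as in the short NumPy sketch below, where obs and pred stand for the observed (\({MSL}_{o}\)) and predicted (\({MSL}_{p}\)) series.

```python
import numpy as np

def mae(obs, pred):
    """Eq. 10: mean absolute error."""
    return np.mean(np.abs(np.asarray(pred) - np.asarray(obs)))

def rmse(obs, pred):
    """Eq. 11: root mean square error."""
    return np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2))

def r_corr(obs, pred):
    """Eq. 12: correlation between observed and predicted values."""
    o = np.asarray(obs) - np.mean(obs)
    p = np.asarray(pred) - np.mean(pred)
    return np.sum(o * p) / np.sqrt(np.sum(o ** 2) * np.sum(p ** 2))
```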

Uncertainty Analysis

Uncertainty analysis (UA) aims to quantify the variation in the output caused by input variability. It is performed to identify the range of potential outcomes given the input uncertainty and to investigate the effect of the model's lack of knowledge or errors. Consideration is given to the percentage of measured data bracketed by the 95% prediction uncertainty (95PPU) determined following Abbaspour et al. (2007). This band is calculated between the 2.5% (\({X}_{L}\)) and 97.5% (\({X}_{U}\)) levels of an output variable, thereby excluding the 5% worst simulations.

$$\text{Bracketed by 95PPU}=\frac{1}{k}count\left(K|{X}_{L}\le K\le {X}_{U}\right)\times 100$$
(13)

where k represents the total number of actual data points in the test phase. Based on Eq. 13, the value of "Bracketed by 95PPU" approaches its maximum (100%) when all measured data at the testing stage fall between \({X}_{L}\) and \({X}_{U}\). If the assessed data are of outstanding consistency, 80% or more should lie within the 95PPU; if data are lacking in a few areas, 50% of the data within the 95PPU will suffice (Noori et al. 2010). The d-factor is used to estimate the average width of the uncertainty band, with values below one indicating a good result (Noori et al. 2010), as presented in Eq. 14.

$$d-factor=\frac{{\stackrel{-}{d}}_{x}}{{\sigma }_{x}}$$
(14)

\({\sigma }_{x}\) represents the standard deviation of actual data x and \({\stackrel{-}{d}}_{x}\) is the average distance between the upper and lower bands (Noori et al. 2015) as in Eq. 15.

$${\stackrel{-}{d}}_{x}=\frac{1}{k}\sum _{i=1}^{k}({X}_{U}-{X}_{L})$$
(15)
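Assuming the band limits \({X}_{L}\) and \({X}_{U}\) are taken as the 2.5th and 97.5th percentiles of the model's predictive distribution (for a GPR, for instance, the predictive mean plus or minus 1.96 standard deviations), the two measures of Eqs. 13 to 15 can be sketched as follows.

```python
import numpy as np

def bracketed_by_95ppu(obs, lower, upper):
    """Eq. 13: percentage of observations falling inside the [X_L, X_U] band."""
    obs, lower, upper = (np.asarray(a, dtype=float) for a in (obs, lower, upper))
    return np.mean((obs >= lower) & (obs <= upper)) * 100

def d_factor(obs, lower, upper):
    """Eqs. 14-15: mean band width divided by the standard deviation of the observations."""
    mean_width = np.mean(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float))
    return mean_width / np.std(np.asarray(obs, dtype=float))

# Usage sketch with a fitted GPR (assumed band construction):
#   mean, std = model.predict(X_test, return_std=True)
#   lower, upper = mean - 1.96 * std, mean + 1.96 * std
```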

Figure 4 depicts the study model development flow diagram.

Fig. 4
figure 4

Methodology flow diagram using ML algorithms to predict water level

Results and discussion

This study aimed to estimate the water level at time t, mimicking the actual values as closely as possible, by utilizing various ML algorithms for all scenarios. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Coefficient of Determination (R) were the indices used to validate the performance of each model. The detailed findings are described in the subsequent sections.

Models Performance for Each Scenario

Tables 2, 3, 4 and 5 summarise the metrics of each machine learning algorithm for each scenario. The evaluation shows how well these models can learn and find the correlation between the input and output variables. Column 1 presents the scenario, column 2 the model used, columns 3 to 4 the model performance, column 5 the training time for each model, and column 6 the three best models. As summarised in Tables 2, 3, 4 and 5, performance varies from model to model and from scenario to scenario.

Table 2 Summary of Model Performance in Scenario 1
Table 3 Summary of Model Performance in Scenario 2
Table 4 Summary of Model Performance in Scenario 3
Table 5 Summary of Model Performance in Scenario 4

For SC1, the Gaussian Processes Regression models perform well in predicting the water level, with an average R of 0.70 and RMSE and MAE close to 0. The Matern 5/2 GPR outperformed the other models with an MAE of 1.0304, RMSE of 1.499, and R of 0.71, with a training time of 0.98308 s. Comparing the GPR models, it is clear that this model performs better in terms of R than the other models; the closer the outcome is to 1, the better the model's accuracy, and the closer the MAE and RMSE are to 0, the more accurate and reliable the model.

For SC2 in Table 3, R shows that the performance increases. The GPR models are still the best models for predicting the water level in SC2, since an R-value close to 1 indicates a good fit. The Matern 5/2 GPR model gives the best performance compared with the other ML algorithms: it has the best training values (MAE 0.87603, RMSE 1.3394, R 0.78) with a training time of 0.67311 s, followed by Squared Exponential GPR (0.89877, 1.3436, 0.7) with a training time of 0.73902 s and Rational Quadratic GPR (0.89908, 1.3459, 0.77) with a training time of 0.80871 s.

For SC3 in Table 4, the three model families that best mimic the actual values are the Linear Regression, Gaussian Processes Regression, and Support Vector Machines models. These families give acceptable R values, but Stepwise Linear Regression outperformed the others with the best result (0.92367, 1.2701, 0.79) and a training time of 12.166 s. By comparison, the best GPR results, from Exponential GPR (1.0291, 1.5301, 0.69) and Rational Quadratic GPR (1.049, 1.602, 0.66), are noticeably poorer than those of the Linear Regression models.

Lastly, for SC4 in Table 5, the Gaussian Processes Regression models again outperformed the other models, the best being Matern 5/2 GPR (1.0106, 1.4078, 0.73), followed by Rational Quadratic GPR (1.0166, 1.4343, 0.72) and Squared Exponential GPR (1.0169, 1.435, 0.72). It can be concluded that the Matern 5/2 Gaussian Processes Regression model is the most reliable for predicting the water level, as it gave high performance in every scenario (except SC3) with a relatively fast training time. The NN models performed the worst of the four model families, with the highest MAE and RMSE values and the lowest R values in almost all four input-combination scenarios; they mainly show negative R values, indicating that the models overfit the training data set and perform poorly on external data sets.

A response plot shows the predicted response plotted against the record number alongside the observations, with vertical lines marking the errors. Figure 5a to 5d show the response plots of the actual and predicted values: the red lines indicate the error between the actual (blue dots) and predicted (orange dots) values. The errors show a comparable pattern for each scenario (SC1, SC2, SC3, and SC4), since all have R values between 0.71 and 0.79, which is consistent with the response plots.

Fig. 5
figure 5

Response Plot of Best Model for Each Scenario (a) Scenario 1; (b) Scenario 2; (c) Scenario 3 and (d) Scenario 4

A predicted-versus-actual plot can be used to evaluate model performance: the model's predicted response is plotted against the actual, observed response. An ideal regression model would have predicted responses identical to the observed values, so all of the points would lie on the diagonal line. Since this does not occur in practice, the target is for the points to lie as close to the diagonal line as possible and to scatter roughly symmetrically around it. Figures 6a to 6d show the scatter plots of the best model for each scenario. In SC2 and SC3 the points lie closer to the diagonal line than in SC1 and SC4. In Fig. 6b, the Matern 5/2 GPR model for SC2 predicts the reservoir water level well, and a similar performance is seen for SC3 when Stepwise Linear Regression is used. These findings suggest that, using scenarios SC2 and SC3, both models can predict changes in water levels.

Fig. 6
figure 6

Actual vs Predicted Value of Best Model for Each Scenario (a) Scenario 1; (b) Scenario 2; (c) Scenario 3 and (d) Scenario 4

Figure 7a to 7d illustrate the residual plots of the four best-fitted models from Tables 2, 3, 4 and 5. The residuals of a fitted model are defined as the differences between the response data and the fit at each predicted value (residual = data - fit). If the residuals behave randomly, the model matches the data adequately; if they exhibit a systematic trend, the model does not. Figure 7 shows that the residuals behave randomly, indicating that the models describe the data well.

Fig. 7
figure 7

Residual Plot of Best Model for Each Scenario (a) Scenario 1; (b) Scenario 2; (c) Scenario 3 and (d) Scenario 4

Uncertainty Analysis of Best Models

The best models, Matern 5/2 GPR for SC1, SC2, and SC4 and Stepwise Linear Regression for SC3, were assessed using the 95PPU and the d-factor. Table 6 presents the UA results of the best model for each scenario in predicting the water level.

Table 6 Uncertainty analysis for SC1, SC2, SC3, and SC4

Table 6 shows that 96.69%, 96.42%, and 95.85% of the data are bracketed by the 95PPU for the best GPR models in SC1, SC2, and SC4, and 95.89% for the best LR model in SC3. The d-factor values are 0.026410, 0.028430, 0.036327 and 0.035352 for SC1, SC2, SC3, and SC4, respectively. According to the uncertainty analysis, the suggested models predict the water level with high precision: the 95PPU for all four scenarios was greater than 80%, and the d-factor values were highly satisfactory, falling below 1.

Taylor Diagram

A Taylor diagram is a method of plotting three statistics on a 2-D graph that illustrates how closely a pattern matches actual data in terms of correlation, root-mean-square (RMS) difference and the ratio of variances (Taylor 2001). Equation 16 gives the theoretical basis for the diagram, where the four statistics, the correlation coefficient \(R\), the centred RMS difference \({E}^{\prime}\), and the standard deviations \({\sigma }_{f}\) and \({\sigma }_{r}\) of the predicted and observed fields, are the key to constructing the diagram (Taylor 2001).

$${{E}^{\prime}}^{2}={\sigma }_{f}^{2}+{\sigma }_{r}^{2}-2{\sigma }_{f}{\sigma }_{r}R$$
(16)

These statistics, together with direct visual comparison, make it simple to distinguish how much of the overall RMS difference between patterns is due to a difference in variance and how much is due to poor pattern correlation. Figure 8 compares the simulations, where GPR denotes Gaussian Processes Regression, LR Linear Regression, SVM Support Vector Machines, NN Neural Network, and Actual the observed monthly water level. From the diagrams it can be concluded that GPR shows the closest correspondence between the modelled and observed behaviour, except for SC2, where the LR models were in relatively good agreement with the observations.
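The quantities entering Eq. 16 can be computed directly from the observed and predicted series, as in the sketch below; a plotting routine following Taylor (2001) would then place each model on the diagram.

```python
import numpy as np

def taylor_stats(obs, pred):
    """Standard deviations, correlation R and centred RMS difference E' of Eq. 16."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    sigma_r, sigma_f = obs.std(), pred.std()
    R = np.corrcoef(obs, pred)[0, 1]
    e_prime = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    # e_prime**2 equals sigma_f**2 + sigma_r**2 - 2*sigma_f*sigma_r*R
    return sigma_f, sigma_r, R, e_prime
```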

Fig. 8
figure 8

Taylor Diagram of Best Model for Each Scenarios (a) Scenario 1; (b) Scenario 2; (c) Scenario 3 and (d) Scenario 4

Conclusions

This study focused on predicting water levels from groundwater-level data with different input scenarios, utilizing ML algorithms such as Linear Regression (LR), Support Vector Machines (SVM), Gaussian Processes Regression (GPR), and Neural Network (NN) to identify the optimal prediction method. The autocorrelation function (ACF) was used to determine the input scenarios based on the two monthly historical variables (groundwater and water level) for 100 monthly data sets from 2012 to 2019. Four scenarios were created from the ACF lag behaviour, and each scenario was run through the ML algorithms. The results showed that for SC1, SC2, and SC4 the GPR models gave good results, with the highest R of 0.71 in SC1, 0.78 in SC2, and 0.73 in SC4 obtained by the Matern 5/2 GPR model; for SC3, the Stepwise LR model gave a better result with an R of 0.79. It can be concluded that the Matern 5/2 Gaussian Processes Regression model is the most reliable for predicting the water level, as it performed well in every scenario (except SC3) with a relatively fast training time. The NN models had the worst performance of the four model families, with the highest MAE and RMSE values and the lowest R values in almost all four input-combination scenarios. The 95PPU and a Taylor diagram were also used to assess the uncertainty and to better understand the best models by comparing the observed and predicted values through three statistics: the correlation coefficient, RMSE and standard deviation. The uncertainty analysis showed that the suggested models predict the water level with high precision: all 95PPU values were greater than 80%, and the d-factor values were highly satisfactory, falling below 1. The results obtained in this study serve as a benchmark for future water-level prediction using GPR and LR with the four scenarios created. Hence, this study can assist modellers and decision-makers in addressing real-time water-level prediction. Because countless methods have been proposed to predict water levels, further work can compare the performance and robustness of parametric and nonparametric methods under different scenarios and study the uncertainty of the identified models.