1 Introduction

Sustainable development relies on energy resources (Ehteram et al. 2017). Energy is an important economic factor in industrial societies, and the consumption of non-renewable energy increases greenhouse gases that can change the global climate. Clean energy is energy that does not pollute the environment (Ehteram et al. 2018a). The hydropower industry is one of the most important sources of clean energy (Hou et al. 2021) and an important renewable source due to its low cost. Climate, social, and economic factors can affect a hydropower system. The hydropower industry plays a key role in the development of electricity (Ehteram et al. 2018b), and hydropower plants play a key role in meeting energy demand. Accurate predictions of power production are therefore necessary for decision-makers to meet demand and manage energy resources. Furthermore, a number of uncertainties may affect power production. Our research contributes to the development of energy prediction models and examines the impact of uncertainties on power generation.

The optimal operation of hydropower plants is a key topic in energy engineering, and accurate hydropower generation predictions can support the management of hydropower plants.

Hydropower production predictions can prevent energy shortages during droughts. Researchers have developed different models to predict hydropower production (Dehghani et al. 2019).

Machine learning models are widely used to predict hydropower production and energy demand. These models can find relationships between complex input and output data; their advantages are fast computation, easy implementation, and high precision. Guo et al. (2018) used support vector machine (SVM) models to predict power production, and the SVM model successfully predicted power generation. Dehghani et al. (2019) developed an adaptive neuro-fuzzy inference system (ANFIS) for hydropower production prediction, with the ANFIS parameters adjusted using the grey wolf optimization (GWO) algorithm; the ANFIS-GWO model was reported to perform better than the standalone ANFIS model. Gao et al. (2019) predicted one-day-ahead power using a long short-term memory (LSTM) neural network. An LSTM is a recurrent neural network (RNN) that overcomes the vanishing gradient problem of conventional RNNs by using memory cells to store information. Meteorological data were used as the model inputs, and the LSTM model predicted power successfully.

Rahman et al. (2021) developed LSTM, convolutional neural network (CNNE), and recurrent neural network (RNN) models to predict power energy and stated that the different ANN models were successful. For electricity prediction, Zolfaghari and Golabi (2021) combined adaptive wavelet transforms (WT) with the LSTM model and found that the WT-LSTM outperformed the LSTM model. They reported R2 values of 0.951 and 0.979 for the wavelet-ANN and wavelet-LSTM models, respectively, and root mean square errors of 8.65 and 6.73.

Barzola-Monteses et al. (2022) used artificial neural networks (ANNs) to predict hydropower production, with the model parameters set using a grid search algorithm. The developed model was a reliable tool for energy management. They considered two scenarios: one-step (one-month) and multi-step (12-month) hydropower production prediction. The average execution time of the models was 1.48 and 1.37 min for the first and second scenarios, respectively. The RMSE of the MLP and LSTM models was 195.1 and 177.68 for the first scenario, and 154.1 and 173.2 for the second scenario.

Hanoon et al. (2022) developed ANN, SVM, and autoregressive integrated moving average (ARIMA) models to predict the power production of a reservoir in China and reported that the ANN and SVM models successfully predicted power production. The correlation coefficient of the MLP model varied from 0.8761 to 0.8779, and that of the radial basis function neural network model varied from 0.8480 to 0.8710.

Studies have shown that ANN models can accurately predict power production, but they have some limitations. These models are unable to automatically extract the important features from time series data (Panahi et al. 2021). Setting model parameters requires robust algorithms, and preprocessing methods are required to determine the most appropriate input scenario (Panahi et al. 2021). Also, these models cannot automatically produce prediction intervals, which are needed to quantify the uncertainty of a model. This paper aims to develop ANN models for predicting daily power production and uses new techniques to fill these research gaps. In recent years, many researchers have developed deep learning models to analyze complex data (Sharifzadeh et al. 2019). Deep learning models are becoming an increasingly popular alternative to traditional machine learning models for predicting hydropower generation because they can capture complex relationships between the input parameters and are more accurate than traditional models.

Studies have shown that hybrid ANN models outperform ANN models. Hybridizing ANNs and deep learning models can improve their performance (Sharifzadeh et al. 2019).

A convolutional neural network (CNNE) is a robust deep learning model because it can extract important information from time series data (Sinitsin et al. 2022). The CNNE model can be integrated with the ANN model to extract complex nonlinear patterns and important features (Sinitsin et al. 2022), and an ANN model can be trained more efficiently if it receives relevant features as inputs. In this study, the CNNE model was coupled with an ANN to predict the daily power production of a hydropower plant. The CNNE-ANN model can provide better results because it combines the advantages of convolutional layers and multilayer perceptron layers. The convolutional layers efficiently extract features from the input data, reducing the workload of the ANN layers, and the architecture allows the model to identify features of the input data accurately. Because the CNNE-ANN model uses pooling layers and regularization techniques, it is less prone to overfitting than an ANN, and it can handle noisy data and outliers for accurate hydropower production predictions. As the CNNE-ANN model can be scaled up or down to handle input data of different sizes and complexities, it can be used for different applications and environments.

Since a CNNE-ANN model cannot capture uncertainty values, this model can be coupled with a Bayesian approach. This study also introduces an approach for determining the most appropriate input scenarios. The main contributions of the current study are as follows:

  • The CNNE-ANN is introduced for predicting daily power production.

  • We evaluate the accuracy of the new model against several ANN models, including ANN, CNNE, LSTM, and Bidirectional LSTM (BI-LSTM) models.

  • The CNNE-ANN model is integrated with a Bayesian approach to quantify the uncertainty values.

  • A new method is introduced to determine the appropriate input scenarios.

2 Materials and Methods

2.1 An Optimization Algorithm for Adjusting Model Parameters and Feature Selection

Since selecting the best model parameters is time-consuming and difficult, this study applied binary and continuous versions of an optimization algorithm to determine optimum parameter values, train different models, and select the best input combinations. The salp swarm optimization algorithm (SSOA) is widely used for solving complex problems and is broadly applied in different fields such as feature selection (Faris et al. 2018), global optimization (Zhang et al. 2022), the discounted knapsack problem (Dang and Truong 2022), training neural network models (Panda and Majhi 2020), and training support vector machine models (Samantaray et al. 2022). High speed and accuracy are the advantages of the SSOA.

The salp chain is divided into two groups. The first group (the leader) guides the salps; the second group (the remaining salps, the followers) follows the leader. In the search space, the swarm searches for food sources. The leader location is updated based on the following equation:

$$Salp_{j}^{l}=\begin{cases}Food_{j}+\rho_{1}\left(\left(upp_{j}-low_{j}\right)\rho_{2}+low_{j}\right), & \rho_{3}\ge 0.50\\ Food_{j}-\rho_{1}\left(\left(upp_{j}-low_{j}\right)\rho_{2}+low_{j}\right), & \rho_{3}<0.50\end{cases}$$
(1)

where \(Salp_{j}^{l}\): the leader location in the jth dimension, \(Food_{j}\): the food source, \(\rho_{1}\), \(\rho_{2}\), and \(\rho_{3}\): random parameters, \(upp_{j}\): the upper bound of the decision variable, and \(low_{j}\): the lower bound of the decision variable. Equation (2) is used to update the location of the followers:

$$follower_{j}^{i}=\frac{1}{2}\left(follower_{j}^{i}+follower_{j}^{i-1}\right)$$
(2)

where \(follower_{j}^{i}\): the ith follower in the jth dimension. A population of solutions is created before the optimization process begins. In the next step, an objective function is used to evaluate the generated solutions. The food source is taken as the best solution found so far. Equations (1) and (2) are used to update the locations of the leader and the followers. When the stopping criterion is met, the process ends. The SSOA is a continuous optimization algorithm; a transfer function can convert the continuous SSOA to a binary SSOA (BSSOA):

$$T\left(salp_{j}^{i}\left(t\right)\right)=\frac{1}{1+\exp\left(-salp_{j}^{i}\left(t\right)\right)}$$
(3)

where \(T\left(salp_{j}^{i}\left(t\right)\right)\): the transfer function and \(salp_{j}^{i}\left(t\right)\): the ith salp in the jth dimension at iteration t. The final location of the salps is computed based on the following equation:

$$salp_{j}^{i}=\begin{cases}1, & \lambda <T\left(salp_{j}^{i}\right)\\ 0, & \text{otherwise}\end{cases}$$
(4)

where \(\lambda\): a random number drawn uniformly from [0, 1].
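To make the update rules concrete, the following minimal Python sketch implements Eqs. (1)-(4). It is illustrative only: the decaying schedule for \(\rho_{1}\) is an assumption borrowed from the original SSA formulation, since the paper does not specify it here.

```python
import numpy as np

def ssoa_step(salps, food, low, upp, t, max_iter, rng):
    """One SSOA iteration: Eq. (1) updates the leader (row 0) and
    Eq. (2) moves each follower toward the salp ahead of it."""
    n, d = salps.shape
    # Decaying exploration factor rho_1; this schedule is an assumption
    # taken from the original SSA paper (not stated in this article).
    rho1 = 2.0 * np.exp(-(4.0 * t / max_iter) ** 2)
    for j in range(d):
        rho2, rho3 = rng.random(), rng.random()
        step = rho1 * ((upp[j] - low[j]) * rho2 + low[j])
        salps[0, j] = food[j] + step if rho3 >= 0.5 else food[j] - step
    for i in range(1, n):                        # followers, Eq. (2)
        salps[i] = 0.5 * (salps[i] + salps[i - 1])
    return np.clip(salps, low, upp)

def to_binary(salps, rng):
    """Binary SSOA: sigmoid transfer of Eq. (3), thresholding of Eq. (4)."""
    T = 1.0 / (1.0 + np.exp(-salps))
    return (rng.random(salps.shape) < T).astype(int)
```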

2.2 Structure of Convolutional Neural Network Model

A CNNE is a fusion of feature extraction and feature classification (Zou and Ergan 2023). CNNEs consist of convolutional layers and pooling layers followed by fully connected layers. A convolutional layer contains convolutional kernels (Zhao et al. 2022), i.e., a finite number of filters (kernels) that are convolved with the input data to extract relevant features. Each convolution kernel produces a kind of feature named a feature map (Tang et al. 2021). The pooling layer has two important tasks. First, it accelerates the network operation: the next convolution layer requires fewer calculations if the feature maps are pooled. Second, it enhances the performance of the CNNE (Liu et al. 2021), which yields better results by retaining the most important features. The fully connected layer is similar to the traditional ANN models.

The output feature map is computed based on the following equation (Tang et al. 2021):

$$out_{j}^{l}=f\left(\sum_{i\in M_{j}}in_{i}^{l-1}*k_{j}^{l}+b_{j}^{l}\right)$$
(5)

where \(out_{j}^{l}\): the new feature map, \(in_{i}^{l-1}\): the input from layer l-1, \(k_{j}^{l}\): the convolution kernel, \(b_{j}^{l}\): the bias, f: the activation function, \(*\): the convolution operation, j: the feature map index, and l: the layer index. A pooling layer resizes feature maps based on the pooling operation:

$$\alpha_{j}=f\left(w_{j}^{l}\,down\left(M_{j}^{l-1}\right)+b_{j}^{l}\right)$$
(6)

where \(down\): the pooling operation, \(M_{j}^{l-1}\): the feature map of layer l-1, \(w_{j}^{l}\): the weight connection, and \(\alpha_{j}\): the new, downsized feature map. Figure 1 shows the structure of the CNNE model.

Fig. 1
figure 1

The structure of the CNNE model
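As an illustration of this structure, the following Keras sketch builds a convolution-pooling-flatten front end; the filter count and kernel size are illustrative assumptions, not the tuned values reported later in Table 4.

```python
import tensorflow as tf

def build_cnne_extractor(n_lags, n_vars, n_filters=32, kernel_size=3):
    """Convolution (Eq. 5), pooling (Eq. 6), and flattening, producing a
    feature vector for downstream fully connected (MUPE) layers."""
    inputs = tf.keras.Input(shape=(n_lags, n_vars))
    x = tf.keras.layers.Conv1D(n_filters, kernel_size,
                               activation="relu")(inputs)  # Eq. (5)
    x = tf.keras.layers.MaxPooling1D(pool_size=2)(x)       # Eq. (6)
    x = tf.keras.layers.Flatten()(x)
    return tf.keras.Model(inputs, x)
```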

2.3 Structure of BI-LSTM

The LSTM neural network is a deep learning method based on the recurrent neural network (RNN), presented by Hochreiter and Schmidhuber (1997). The LSTM overcomes the vanishing gradient problem of recurrent neural networks and has been widely used for time series prediction, achieving excellent results (Azzouni and Pujolle 2017).

LSTM networks consist of memory blocks, memory cells, and gate units. A cell state retains information, and an LSTM model uses gates to store and process the relevant information (Zha et al. 2022). The gates learn what information should be retained and what should be forgotten (Imrana et al. 2021). The input gate determines which information should be added to the cell state, the output gate provides the outputs, and the forget gate determines which information must be retained from the previous state (Jaseena and Kovoor 2021; Li et al. 2022; Jamei et al. 2022).

In this article, a BI-LSTM model based on a conventional LSTM neural network is developed to predict hydroelectric power from multivariable inputs. The BI-LSTM considers past and future states to improve prediction accuracy: while an ordinary LSTM considers only historical observations, the BI-LSTM considers both previous and future observations. The reverse LSTM uses future information and the forward LSTM uses past information, so the BI-LSTM achieves better accuracy than the LSTM (He et al. 2022). Equations (7)-(12) mathematically describe the relationship between the weighted inputs and outputs:

$${O}_{t}=\sigma \left({\varphi }_{o}{z}_{t-1}+{V}_{o}{x}_{t}+{\kappa }_{o}\right)$$
(7)
$${I}_{t}=\sigma \left({\varphi }_{i}{z}_{t-1}+{V}_{i}{x}_{t}+{\kappa }_{i}\right)$$
(8)
$${F}_{t}=\sigma \left({\varphi }_{f}{z}_{t-1}+{V}_{f}{x}_{t}+{\kappa }_{f}\right)$$
(9)
$${\widehat{S}}_{t}=\tanh\left(\varphi {z}_{t-1}+V{x}_{t}+b\right)$$
(10)
$${S}_{t}={F}_{t}\odot {S}_{t-1}+{I}_{t}\odot {\widehat{S}}_{t}$$
(11)
$${z}_{t}={O}_{t}\odot \tanh\left({S}_{t}\right)$$
(12)

where \({O}_{t}\), \({I}_{t}\), and \({F}_{t}\): the output, input, and forget gates, \({x}_{t}\): the input, \({z}_{t}\): the output state at time t, \({S}_{t}\): the memory cell state, \({\widehat{S}}_{t}\): the candidate value of the memory cell, \(\sigma\): the sigmoid activation function, \({\varphi }_{o}\), \({\varphi }_{i}\), and \({\varphi }_{f}\): the weight matrices of the hidden layer, \({V}_{o}\), \({V}_{i}\), and \({V}_{f}\): the weights corresponding to the input data, and o, i, f: subscripts corresponding to the output, input, and forget gates.
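As a concrete illustration, the following minimal NumPy sketch evaluates one LSTM time step exactly as written in Eqs. (7)-(12); the dictionary p of weight matrices and biases is a hypothetical placeholder.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, z_prev, s_prev, p):
    """One LSTM time step following Eqs. (7)-(12); p holds the weight
    matrices (phi_*, V_*) and biases (k_*, b) as NumPy arrays."""
    O = sigmoid(p["phi_o"] @ z_prev + p["V_o"] @ x_t + p["k_o"])  # Eq. (7)
    I = sigmoid(p["phi_i"] @ z_prev + p["V_i"] @ x_t + p["k_i"])  # Eq. (8)
    F = sigmoid(p["phi_f"] @ z_prev + p["V_f"] @ x_t + p["k_f"])  # Eq. (9)
    S_hat = np.tanh(p["phi"] @ z_prev + p["V"] @ x_t + p["b"])    # Eq. (10)
    S = F * s_prev + I * S_hat                                    # Eq. (11)
    z = O * np.tanh(S)                                            # Eq. (12)
    return z, S
```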

A BI-LSTM network consists of forward and backward LSTMs that process the data in both directions. In the forward LSTM layer, forward calculations are performed from time 1 to time t, while the backward LSTM layer performs the backward calculation from time t to time 1. The outputs of the forward and backward hidden states are obtained and saved, and the BI-LSTM output is calculated by connecting the two hidden states.

Figure 2 shows a schematic diagram of a simple Bidirectional LSTM that has expanded over time (Zhou et al. 2016).

Fig. 2
figure 2

Schematic diagram of a simple BI-LSTM
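A minimal Keras sketch of such a network is given below; the number of units and the optimizer are illustrative assumptions, not the tuned values reported later in Table 4.

```python
import tensorflow as tf

def build_bilstm(n_lags, n_vars, units=64):
    """BI-LSTM regressor: a forward and a backward LSTM whose hidden
    states are concatenated (Fig. 2), followed by a linear output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_lags, n_vars)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units)),
        tf.keras.layers.Dense(1),              # predicted power
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```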

2.4 Structure of the ANN Models

One of the most commonly used models in hydrological modeling, the feed-forward multi-layer perceptron (MUPE) ANN, is used in this study. The MUPE model consists of an input layer, a number of hidden layers, and an output layer. Each layer of the MUPE model has weight connections that connect it to the next layer (Panahi et al. 2021). The number of inputs determines the number of input neurons (Ehteram et al. 2023). The hidden layers receive weighted inputs from the input layer, and the hidden and output layers can use linear or nonlinear activation functions; an activation function creates a relationship between the weighted inputs and outputs. Figure 3 shows the structure of the MUPE model. The unknown parameters of the MUPE model are its biases and weights, whose values are obtained through an optimization algorithm.

Fig. 3
figure 3

Structure of the MUPE model (Heidari et al. 2020)
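A minimal Keras sketch of such a feed-forward network follows; the hidden-layer sizes and activation are illustrative assumptions rather than the tuned values of Table 4.

```python
import tensorflow as tf

def build_mupe(n_inputs, hidden=(32, 16)):
    """Feed-forward MUPE: input layer, nonlinear hidden layers,
    and a linear output neuron for the predicted power."""
    layers = [tf.keras.Input(shape=(n_inputs,))]
    layers += [tf.keras.layers.Dense(h, activation="tanh") for h in hidden]
    layers += [tf.keras.layers.Dense(1)]
    return tf.keras.Sequential(layers)
```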

2.5 Evaluation of Models’ Uncertainty Using Gaussian Process Regression

Since ANN models cannot capture uncertainty values, this study proposes a Gaussian process regression model (GPRE) for quantifying uncertainty. The GPRE model is widely used in different fields such as groundwater quality monitoring (Shadrin et al. 2021), short-term solar power forecasting (Wang et al. 2021), traffic load prediction (Wang et al. 2020a, b), wind speed forecasting (Huang et al. 2018), and short-term prediction of wind speed (Wang et al. 2020a, b). The GPRE is a nonparametric model based on the Bayesian framework. Because it can produce prediction intervals, the GPRE model can quantify uncertainty values. The GPRE model is defined based on the following equation:

$$Q=f\left({b}_{i}\right)+\varepsilon$$
(13)

where \(Q\): the observation, \(f\left({b}_{i}\right)\): an underlying function, \({b}_{i}\): the input, and \(\varepsilon\): the noise.

$$\varepsilon \sim N\left(0,{\sigma }^{2}\right)$$
(14)

where \({\sigma }^{2}\): the noise variance. The joint prior distribution of the observed data and the estimated values is computed based on the following equations (Sun et al. 2022):

$$Q\sim N\left(0,K\left(B,B\right)+{\sigma }^{2}I\right)$$
(15)
$$\left[\begin{array}{c}Q\\ q\end{array}\right]\sim N\left(0,\left[\begin{array}{cc}K\left(B,B\right)+{\sigma }^{2}I& K\left(B,{B}_{*}\right)\\ K\left({B}_{*},B\right)& K\left({B}_{*},{B}_{*}\right)\end{array}\right]\right)$$
(16)

where q: the estimated value, \(K\left(B,B\right)\): the covariance matrix of all input data, \(K\left(B,{B}_{*}\right)\): the covariance matrix between the test points and all input data, \({B}_{*}\): the test points, and \(K\left({B}_{*},{B}_{*}\right)\): the self-covariance of the test points. The posterior distribution of the estimated value is:

$$q|Q\sim N\left(\bar{q},{\sigma }_{q}^{2}\right)$$
(17)

where \(\bar{q}\): the posterior mean, and \({\sigma }_{q}^{2}\): the posterior variance. The mean and variance are computed based on the following equations:

$$\bar{q}=K\left({B}_{*},B\right){\left[K\left(B,B\right)+{\sigma }^{2}I\right]}^{-1}Q$$
(18)
$${\sigma }_{q}^{2}=K\left({B}_{*},{B}_{*}\right)-K\left({B}_{*},B\right){\left[K\left(B,B\right)+{\sigma }^{2}I\right]}^{-1}K\left(B,{B}_{*}\right)$$
(19)
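The following NumPy sketch evaluates Eqs. (18)-(19) directly. The squared-exponential kernel and its hyperparameters are assumptions made for illustration, since the paper does not state the covariance function used.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance (an assumed choice of K)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gpre_posterior(B, Q, B_star, noise_var=1e-2, length_scale=1.0):
    """Posterior mean (Eq. 18) and variance (Eq. 19) at test points B_*."""
    K = rbf_kernel(B, B, length_scale)               # K(B, B)
    K_s = rbf_kernel(B_star, B, length_scale)        # K(B_*, B)
    K_ss = rbf_kernel(B_star, B_star, length_scale)  # K(B_*, B_*)
    Kn = K + noise_var * np.eye(len(B))
    mean = K_s @ np.linalg.solve(Kn, Q)              # Eq. (18)
    cov = K_ss - K_s @ np.linalg.solve(Kn, K_s.T)    # Eq. (19)
    return mean, np.diag(cov)
```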

2.6 Hybrid Structure of CNNE-MUPE-GPRE

In this study, the CNNE-MUPE-GPRE model is used to predict hydropower production. The model is created based on the following steps:

  1. 80% of the data are used for training and 20% for testing, because this split provided the lowest error function values.

  2. A binary vector is created based on the names of the input variables and is defined as the initial population of the BSSOA.

  3. The CNNE parameters are considered as the initial population of the CSSOA.

  4. I = I + 1 (I: iteration number).

  5. At the training level, the CNNE model is run using the training data.

  6. The quality of the solutions is evaluated using an error function (the Nash-Sutcliffe efficiency, NSE).

  7. The operators of the CSSOA are applied to change the values of the model parameters.

  8. If I > the maximum number of iterations and NSE > 0.90, the model goes to the next level; otherwise, go to step 6.

  9. The testing data are used to run the CNNE model.

  10. The outputs of the CNNE model are flattened.

  11. The flattened outputs of the CNNE model are inserted into the MUPE model.

  12. The model parameters of the MUPE are defined as the initial population of the CSSOA.

  13. An error function (NSE) is used to evaluate the quality of the solutions.

  14. The operators of the CSSOA are used to update the values of the model parameters.

  15. If the stopping criterion is met, the MUPE model goes to step 16; otherwise, it goes to step 13.

  16. The testing data are used to run the MUPE model.

  17. The GPRE receives the outputs of the MUPE model.

  18. The GPRE model is run at the training and testing levels (a minimal end-to-end sketch of these steps follows below).
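The following self-contained Python sketch illustrates the pipeline above. It is a sketch under stated assumptions, not the authors' implementation: the CSSOA tuning of steps 3-8 and 12-15 is replaced by gradient descent, the CNNE front end is left untrained as a fixed feature extractor, the data are random placeholders, and all layer sizes are assumed.

```python
import numpy as np
import tensorflow as tf
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30, 2)).astype("float32")  # placeholder: 30 lags, 2 inputs
y = rng.normal(size=(400, 1)).astype("float32")      # placeholder daily power

n_tr = int(0.8 * len(X))                              # step 1: 80/20 split
X_tr, X_te, y_tr, y_te = X[:n_tr], X[n_tr:], y[:n_tr], y[n_tr:]

# CNNE front end (steps 5 and 10): convolution, pooling, flattening
cnne = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 2)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Flatten(),
])
F_tr, F_te = cnne.predict(X_tr), cnne.predict(X_te)

# MUPE stage (steps 11-16); gradient descent stands in for CSSOA tuning
mupe = tf.keras.Sequential([
    tf.keras.Input(shape=(F_tr.shape[1],)),
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(1),
])
mupe.compile(optimizer="adam", loss="mse")
mupe.fit(F_tr, y_tr, epochs=50, verbose=0)

# GPRE stage (steps 17-18): interval prediction on the MUPE outputs
gpre = GaussianProcessRegressor().fit(mupe.predict(F_tr), y_tr.ravel())
mean, std = gpre.predict(mupe.predict(F_te), return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std   # 95% interval
```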

This study benchmarks the CNNE-MUPE-GPRE against the LSTM, BI-LSTM, CNNE, GPRE, CNNE-MUPE, MUPE-GPRE, and CNNE-GPRE models to compare the performance of the models. Figure 4a illustrates the flowchart of the proposed algorithm for predicting hydropower using deep neural networks, and Fig. 4b shows the mechanism of the modelling process.

Fig. 4
figure 4

a Proposed algorithm for predicting hydropower using deep neural network, b mechanism of modelling process

2.7 Evaluation Criteria

In order to evaluate the performance of the developed models, several evaluation criteria are used in the current study.

  1. Root mean square error (RMSE)

    $$RMSE=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{\left({P}_{esi}-{P}_{obi}\right)}^{2}}$$
    (20)

  2. Index of agreement (IA)

    $$IA=1-\frac{{\sum }_{i=1}^{N}{\left({P}_{obi}-{P}_{esi}\right)}^{2}}{{\sum }_{i=1}^{N}{\left(\left|{P}_{esi}-\bar{P}\right|+\left|{P}_{obi}-\bar{P}\right|\right)}^{2}}$$
    (21)

  3. Nash–Sutcliffe efficiency (NSE)

    $$NSE=1-\frac{{\sum }_{i=1}^{N}{\left({P}_{obi}-{P}_{esi}\right)}^{2}}{{\sum }_{i=1}^{N}{\left({P}_{obi}-\bar{P}\right)}^{2}}$$
    (22)

  4. Kling–Gupta efficiency (KGE)

    $$KGE=1-\sqrt{{\left(r-1\right)}^{2}+{\left(\frac{{\bar{P}}_{es}}{{\bar{P}}_{ob}}-1\right)}^{2}+{\left(\frac{C{V}_{p}}{C{V}_{em}}-1\right)}^{2}}$$
    (23)

  5. Prediction interval coverage probability (PICP)

    $$PICP=\frac{1}{N}{\sum }_{i=1}^{N}{\mu }_{i}$$
    (24)
    $${\mu }_{i}=\begin{cases}1, & {P}_{obi}\in \left[{L}_{i},{U}_{i}\right]\\ 0, & {P}_{obi}\notin \left[{L}_{i},{U}_{i}\right]\end{cases}$$
    (25)

  6. Prediction interval normalized average width (PINW)

    $$PINW=\frac{1}{NR}{\sum }_{i=1}^{N}\left({U}_{i}-{L}_{i}\right)$$
    (26)

where \({P}_{obi}\): the observed data, \({P}_{esi}\): the predicted power, N: the number of data points, R: the range of the data points, \(\bar{P}\): the average of the observed data, \({L}_{i}\) and \({U}_{i}\): the lower and upper prediction bounds, r: the correlation coefficient, \({\bar{P}}_{es}\) and \({\bar{P}}_{ob}\): the means of the predicted and observed data, and \(CV_{p}\) and \(CV_{em}\): the coefficients of variation of the predicted and observed data, respectively. High PICP values together with low PINW values indicate the best model; PICP and PINW are used to quantify the uncertainty of the models.
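A straightforward NumPy implementation of Eqs. (20)-(26) is sketched below for replication; it assumes obs, est, lower, and upper are one-dimensional arrays of equal length, and the KGE follows the CV-based form of Eq. (23).

```python
import numpy as np

def rmse(obs, est):                         # Eq. (20)
    return np.sqrt(np.mean((est - obs) ** 2))

def index_of_agreement(obs, est):           # Eq. (21)
    pbar = obs.mean()
    denom = np.sum((np.abs(est - pbar) + np.abs(obs - pbar)) ** 2)
    return 1.0 - np.sum((obs - est) ** 2) / denom

def nse(obs, est):                          # Eq. (22)
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, est):                          # Eq. (23), CV-based form
    r = np.corrcoef(obs, est)[0, 1]
    beta = est.mean() / obs.mean()          # bias ratio
    gamma = (est.std() / est.mean()) / (obs.std() / obs.mean())  # CV ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

def picp(obs, lower, upper):                # Eqs. (24)-(25)
    return np.mean((obs >= lower) & (obs <= upper))

def pinw(obs, lower, upper):                # Eq. (26)
    return np.mean(upper - lower) / (obs.max() - obs.min())
```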

2.8 Case Study

The Karun basin (Fig. 5a) is one of the largest basins in Iran, located in the southwest of the country. Average annual precipitation in this basin ranges from 153 mm in the southern plains to more than 2000 mm in the mountainous regions. Daily temperature over the basin varies from a minimum of −30.6 °C at Koohrang station to a maximum of 52.2 °C at Ahvaz station (Fallah et al. 2020). The Karun-III dam is one of the most important dams on the Karun River. The annual mean flow of the dam is 300 m³/s, and the storage volume of the Karun-III reservoir is 2.97 billion m³. The dam is designed to generate hydropower, so the reservoir is used for power generation. In this study, the models are used to predict the 1-, 2-, and 3-day-ahead power. The water elevation in the reservoir and the inflow discharge are used to predict power generation, with lag times from (t−1) to (t−30). Table 1 shows details of the input and output data, and Fig. 5b shows the time series data, which were collected from 2005 to 2017.

Fig. 5
figure 5

a Location of the Karun III dam as case study, b Time series data

Table 1 Related information to the input data

3 Results and Discussion

3.1 Feature Selection

In the modeling process, feature selection plays an important role. With two candidate variables at 30 lag times each (60 candidate inputs), manually determining the most appropriate input combination among the \({2}^{60}-1\) possible combinations is time-consuming and difficult. Correlation values and principal component analysis can identify significant inputs and lag times, but they cannot automatically determine the optimal inputs. A binary version of the SSOA was used to determine the most suitable scenario for power generation prediction. The inputs were initialized as the initial population of salps, and each salp location encodes the names of the input variables. The binary vectors contain values of 1 and 0 that represent selected and unselected features, respectively. At each iteration, the SSOA updates the input combinations using its operators. Table 2 lists the most appropriate scenarios, which are used to predict power generation. When a modeler encounters many data points and needs to estimate targets, the binary SSOA is a useful tool.
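As a small illustration of how a binary salp maps to an input scenario, the sketch below decodes a hypothetical 60-bit vector; the mask here is random, not the optimized combination reported in Table 2.

```python
import numpy as np

# Candidate inputs: water level and inflow at lags (t-1) ... (t-30).
names = [f"{v}(t-{k})" for v in ("level", "inflow") for k in range(1, 31)]
mask = np.random.default_rng(1).integers(0, 2, size=len(names))  # Eq. (4) output
selected = [n for n, m in zip(names, mask) if m == 1]  # 1 = selected feature
```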

Table 2 Optimal input scenarios for the 1-day, 2-day, and 3-day-ahead predictions

3.2 Determination of Random Parameters and Model Parameters

The CSSOA is used to determine the model parameters and features. The CSSOA includes random parameters such as the population size (PSI) and the maximum number of iterations (MNITER), and the root mean square error (RMSE) is used to determine their optimal values. Table 3 shows the optimal values of PSI and MNITER. PSI values varied from 100 to 600, and PSI = 200 gave the lowest RMSE values; MNITER values varied from 50 to 300, and MNITER = 100 gave the lowest RMSE value. Table 4 shows the optimal values of the CNNE, MUPE, and LSTM parameters.

Table 3 Choice of random parameters of SSOA
Table 4 Optimal values of CNNE, MUPE, and LSTM parameters

3.3 Investigation of the Accuracy of Models

Figure 6 shows the IA, NSE, and RMSE values of the models for 1-day-ahead power prediction. The training IA of the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models was 0.97, 0.94, 0.90, 0.85, 0.84, 0.81, 0.80, 0.75, and 0.70, respectively; at the testing level, the IA values of these models were 0.95, 0.92, 0.86, 0.84, 0.82, 0.80, 0.76, 0.72, and 0.66. The NSE values were 0.94, 0.92, 0.87, 0.80, 0.75, 0.70, 0.68, 0.66, and 0.62 at the training level, and 0.93, 0.89, 0.85, 0.78, 0.74, 0.69, 0.67, 0.65, and 0.60 at the testing level. The RMSE values were 545, 612, 824, 855, 912, 914, 1400, 1700, and 1900 kW at the training phase, and 595, 723, 836, 897, 916, 1200, 1500, 1800, and 2000 kW at the testing phase, respectively.

Fig. 6
figure 6

Values of IA, NSE, and RMSE criteria for 1-day-ahead power production

Figure 7 shows the accuracy of the models for 2-day-ahead power production prediction. The RMSE values of the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models were 578, 615, 832, 861, 914, 934, 1436, 1712, and 1954 kW at the training level, respectively. At the testing level, the CNNE-MUPE-GPRE decreased the RMSE values of the CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models by 18%, 28%, 50%, 36%, 52%, 61%, 66%, and 70%, respectively.

Fig. 7
figure 7

Values of IA, NSE, and RMSE criteria for 2-day-ahead power production

At the training phase, the NSE values of the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models were 0.93, 0.90, 0.85, 0.78, 0.72, 0.69, 0.65, 0.60, and 0.58, respectively. The training IA values of these models were 0.96, 0.91, 0.86, 0.81, 0.76, 0.73, 0.68, 0.65, and 0.63, respectively, and the testing IA values were 0.94, 0.90, 0.82, 0.78, 0.74, 0.70, 0.67, 0.64, and 0.62, respectively.

Figure 8 shows the accuracy of the models for 3-day-ahead power production prediction. The CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models had RMSE values of 582, 621, 839, 871, 924, 945, 1456, 1724, and 1831 kW at the training level and 612, 731, 855, 923, 935, 1267, 1672, 1815, and 2045 kW at the testing level, respectively. The NSE values of those models were 0.92, 0.89, 0.84, 0.77, 0.71, 0.65, 0.62, 0.58, and 0.54 at the training level, and 0.90, 0.86, 0.81, 0.72, 0.70, 0.64, 0.60, 0.55, and 0.63 at the testing level, respectively. The IA values were 0.92, 0.90, 0.85, 0.78, 0.75, 0.71, 0.67, 0.62, and 0.58 at the training level and 0.91, 0.88, 0.81, 0.77, 0.72, 0.69, 0.65, 0.60, and 0.57 at the testing level.

Fig. 8
figure 8

Values of IA, NSE, and RMSE criteria for 3-day-ahead power production

The main findings of this section are:

  1. The CNNE-MUPE-GPRE model has the best precision because it combines the advantages of the CNNE, MUPE, and GPRE models. The CNNE model extracted the important features, the MUPE received the extracted features from the CNNE model, and the GPRE predicted the outputs based on the outputs of the MUPE model. Through this process, the models can deeply learn complex and nonlinear patterns.

  2. The CNNE-MUPE-GPRE model combines the feature extraction capabilities of CNNEs with the flexibility of the GPRE. In addition, the CNNE-MUPE-GPRE model can handle high-dimensional data that the MUPE cannot handle, and it accounts for uncertainty in its predictions.

  3. The CNNE-MUPE-GPRE decreased the RMSE of the CNNE-MUPE, CNNE-GPRE, and MUPE-GPRE models by 10%, 33%, and 36% at the 1-day prediction horizon. The CNNE-MUPE and CNNE-GPRE models outperformed the MUPE-GPRE because they took advantage of the CNNE model in the modeling process; thus, the CNNE model had a key role in the modeling process.

  4. The BI-LSTM model outperformed the LSTM model because it used both past and future data. The BI-LSTM decreased the RMSE of the LSTM by 24%, 26%, and 27% at the 1-day, 2-day, and 3-day horizons, respectively.

  5. The CNNE-MUPE-GPRE model had RMSE values of 595, 600, and 612 kW, NSE values of 0.93, 0.91, and 0.90, and IA values of 0.95, 0.94, and 0.91 at the 1-day, 2-day, and 3-day horizons, respectively. The accuracy deteriorated as the prediction horizon was extended.

  6. The hybrid models outperformed the standalone CNNE, MUPE, and GPRE models. The GPRE model had the worst performance among all models.

  7. A bidirectional long short-term memory (BI-LSTM) neural network has several advantages over a traditional LSTM neural network: it can incorporate past and future information into its decisions, maintain long-term memories, and deal with complex temporal relationships. In contrast to traditional recurrent neural networks, BI-LSTM networks avoid vanishing or exploding gradients.

Figure 9 shows box plots of the different models for 1-day, 2-day, and 3-day-ahead predictions. Based on Fig. 9a, the median values were 12000, 12700, 12900, 13000, 13400, 13600, 14000, 14100, 14200, and 14500, while the minimum values were 612, 612, 615, 623, 626, 632, 656, 676, 679, and 681 for the observed data and the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models, respectively. Figure 9b shows the box plots of the models for the two-day-ahead prediction.

Fig. 9
figure 9

Box plots of models for 1-day, 2-day, and 3-day-ahead predictions

The median values of the observed data and the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models were 12100, 12719, 12954, 13050, 13467, 13589, 14050, 14198, 14700, and 14655 kW, respectively, and the minimum values were 612, 615, 615, 623, 645, 655, 667, 682, 690, and 699, respectively. Figure 9c shows the box plots of the models for the three-day-ahead prediction.

For the 3-day-ahead prediction, the median values of the observed data and the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models were 12198, 12812, 12999, 13167, 13677, 13789, 14255, 14545, 15000, and 15050, respectively, and the minimum values were 612, 624, 635, 682, 695, 700, 712, 724, 745, and 755, respectively.

Figure 10 shows the KGE values of the different models. The KGE of the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models was 0.97, 0.93, 0.91, 0.90, 0.86, 0.83, 0.78, 0.77, and 0.76 at the 1-day prediction horizon. The KGE of the CNNE-MUPE-GPRE was 0.97, 0.96, and 0.94 at the 1-day, 2-day, and 3-day prediction horizons, respectively. At the 2-day prediction horizon, the reported KGE values of the models were 0.96, 0.92, 0.90, 0.88, 0.87, 0.85, 0.82, 0.76, 0.74, and 0.72.

Fig. 10
figure 10

KGE values for different models

3.4 Investigation of the Uncertainty of Models

To quantify the uncertainty values, the CNNE-MUPE model was coupled with the GPRE model; the CNNE-MUPE-GPRE, MUPE-GPRE, CNNE-GPRE, and GPRE models can all capture uncertainty values. Figure 11 shows the 95% confidence intervals for the 1-day-ahead prediction, drawn using the testing data points. The results show that more than 95% of the data are bracketed by the uncertainty bounds. The PICP values of the CNNE-MUPE-GPRE, CNNE-GPRE, MUPE-GPRE, and GPRE models were calculated as 0.99, 0.98, 0.96, and 0.95, respectively. A high PICP value indicates low uncertainty and instability and high model accuracy (Seifi et al. 2022). The PINW values of the CNNE-MUPE-GPRE, CNNE-GPRE, MUPE-GPRE, and GPRE models were 0.05, 0.10, 0.12, and 0.14, respectively. The PINW results showed that the variability of the GPRE predictions was higher than that of the hybrid CNNE-MUPE-GPRE, CNNE-GPRE, and MUPE-GPRE models. The GPRE model had the highest uncertainty among all models, and the CNNE-MUPE-GPRE had the lowest uncertainty because it took advantage of three models.

Fig. 11
figure 11

The 95% confidence interval for the 1-day-ahead prediction

Figure 12 shows the PICP and PINW values for the 2-day and 3-day-ahead predictions. The PICP values of the CNNE-MUPE-GPRE, CNNE-GPRE, MUPE-GPRE, and GPRE models were 0.95, 0.94, 0.92, and 0.92 at the 2-day prediction horizon, and 0.94, 0.90, 0.87, and 0.87 at the 3-day prediction horizon, respectively. As the prediction horizon increased, the uncertainty of the predictions increased; the 2-day and 3-day horizons may include more irregular and nonlinear patterns. The PINW values of the CNNE-MUPE-GPRE, CNNE-GPRE, MUPE-GPRE, and GPRE models were 0.10, 0.16, 0.21, and 0.24 at the 2-day prediction horizon, and 0.11, 0.18, 0.22, and 0.25 at the 3-day prediction horizon, respectively.

Fig. 12
figure 12

The PICP and PINW values of different models in uncertainty analysis

3.5 Main Findings of Paper

This paper used the CNNE-MUPE-GPRE for predicting hydropower production. The main findings of the paper are as follows:

  1. Continuous and binary versions of the SSOA were developed for adjusting the model parameters and selecting the inputs. Previous studies selected the best input scenarios randomly, or suggested the correlation method and principal component analysis for choosing the optimal input scenario. The binary SSOA is superior to these methods because it automatically selects the best input scenario; the current study therefore fills a research gap between input selection and predictive models.

  2. We used the CNNE-MUPE-GPRE for different prediction horizons. The model successfully predicted 1-day, 2-day, and 3-day-ahead power, so it can be used for both short-term and longer-term predictions.

  3. The CNNE-MUPE-GPRE can be used for long-term predictions because the CNNE helps extract features.

  4. Since the CNNE-MUPE-GPRE had the lowest uncertainty, it was a reliable tool. Due to the combination of three models, the CNNE-MUPE-GPRE outperformed the CNNE, MUPE, and GPRE.

  5. Future studies can combine the CNNE and BI-LSTM models, since the BI-LSTM uses past and future data to predict outcomes and employs backward and forward passes for deep learning.

  6. For predicting spatial-temporal data, the CNNE can be coupled with the LSTM model since the combined models can extract spatiotemporal patterns.

  7. Furthermore, this study contributes to the development of energy engineering systems. It supports energy monitoring systems that can be utilized in buildings, factories, and hydropower plants. Energy managers are looking for new technologies to monitor energy resources; our models can be used to create sensors for energy monitoring, and these sensors can predict energy consumption. Our models will also be useful for modifying patterns of energy consumption.

  8. Different sciences can also use our models as early warning systems, for example to monitor droughts and floods. In advanced engineering informatics, handling large data sets and quantifying uncertainty values are also important. The paper introduces a novel deep learning model and a GPRE model to achieve these aims: the CNNE-MUPE-GPRE could quantify uncertainty values and had the lowest uncertainty among all models, so it can handle large data sets and quantify uncertainty in advanced engineering informatics systems.

  9. This study found that series hybridization improved the performance of the standalone models; the hybrid structure allows the models to share information with each other.

  10. Hydropower plants can be successfully managed if their power generation capacity can be accurately predicted. When a drought reduces the inflow to a hydropower plant, a power company can produce more electricity from alternative sources. In addition, our results can contribute to grid stability: energy companies can better manage energy demand and supply on the grid when they can predict hydropower production.

  11. Many factors influence hydropower production, including location, weather patterns, and hydropower plant design. Precipitation directly affects the water level of the reservoir and can therefore affect hydropower production. When evaporation increases, the available water decreases, which reduces hydropower production; evaporation can significantly affect production in areas with high evaporation rates. Hydropower production is particularly vulnerable to drought conditions and other environmental factors that reduce water availability. The relationship between relative humidity and hydropower production is complex and depends on multiple factors such as wind speed, air temperature, and the location of the hydropower plant. Understanding the effect of meteorological variables on hydroelectric power plants is necessary to improve their efficiency.

  12. Table 5 compares the accuracy of the models based on new input combinations, in which precipitation and evaporation were added to the previous input combinations. The RMSE of the CNNE-MUPE-GPRE model was 589, 592, and 602 for the 1-, 2-, and 3-day-ahead predictions. The results revealed that the new input combination does not significantly change the accuracy of the CNNE-MUPE-GPRE model.

Table 5 The accuracy of the CNNE-MUPE-GPRE based on the new input combination

4 Conclusion

Prediction of hydropower production is essential for planning and managing water resources, and predictions of electricity production are used to make strategic decisions. This study developed a new hybrid deep-learning model, the CNNE-MUPE-GPRE, for daily hydropower production prediction. A binary SSOA was used to select the best input combination, so the CNNE-MUPE-GPRE model predicts power and selects inputs simultaneously. The results indicated that the CNNE-MUPE-GPRE was the best model. The training IA of the CNNE-MUPE-GPRE, CNNE-MUPE, CNNE-GPRE, MUPE-GPRE, BI-LSTM, LSTM, CNNE, MUPE, and GPRE models was 0.97, 0.94, 0.90, 0.85, 0.84, 0.81, 0.80, 0.75, and 0.70 at the 1-day prediction horizon. The RMSE of the CNNE-MUPE-GPRE was 595, 600, and 612 kW at the 1-day, 2-day, and 3-day prediction horizons, and the accuracy deteriorated as the prediction horizon was extended. The CNNE-MUPE-GPRE had the lowest uncertainty among all models, and the new hybrid models outperformed the MUPE, LSTM, GPRE, BI-LSTM, and CNNE models. We combined the CNNE-MUPE model with a GPRE model to reflect uncertainty; the PINW values of the CNNE-MUPE-GPRE, CNNE-GPRE, MUPE-GPRE, and GPRE models were 0.10, 0.16, 0.21, and 0.24 at the 2-day prediction horizon, and the GPRE model had the highest uncertainty among all models. Future studies can use meteorological parameters to predict hydropower production, and this study suggests combining CNNE models with LSTM and BI-LSTM models to improve accuracy.