Introduction

Energy is one of the most important factors in industrialization and modernization, and it plays a critical role in technological and economic progress. Massive population growth has exacerbated the global energy crisis, and electricity demand continues to rise, which has a negative influence on the environment [52]. As a result, several governments and regions are enacting laws to encourage the development of renewable energy [30]. Solar energy has become a crucial means of addressing environmental and energy concerns because it is clean and plentiful [43]. However, the random and intermittent nature of photovoltaic (PV) power generation makes integrating it into existing energy networks extremely difficult [48]. Accurate PV power prediction is therefore critical for ensuring the power grid’s security and for storing alternative energy sources for a reasonable amount of time [2].

Various approaches to forecasting PV power output have been presented in recent years. Based on their forecasting principles, they can be divided into four groups: physical methods [36], statistical methods [8], machine learning methods [47], and hybrid methods [32].

The physical model forecasts PV power based on geological variables and meteorological data (e.g., air pressure, humidity, solar radiation, and cloud volume) provided by meteorological stations. It builds a physical model from the PV panel parameters and then calculates the PV generation power directly. The physical model is less reliant on historical data, but it is more complex to build because it involves several unknown parameters [11].

Statistical methods predict PV power from historical time series data. The impact of meteorological conditions is not considered during forecasting; only the time factor is taken into account. A range of statistical approaches have been used to forecast PV output, for example, the Markov chain method [33, 42], gray theory [55], and autoregressive moving average (ARMA) models [6, 25]. Statistical modeling does not require prior understanding of the PV system’s complicated photoelectric conversion process; a partial understanding, realized through data analysis techniques, is sufficient. However, the statistical approach requires a large quantity of data and more computation time.

With the rapid growth of artificial intelligence (AI) in recent years, machine learning models with strong learning capacity and nonlinear mapping capability have been widely employed in PV power forecasting [18]. Machine learning models can forecast PV generation power from easily available data, eliminating the need for complicated computations and other costly expenditures. Examples include the support vector machine (SVM) [31], back propagation neural network (BPNN) [19], extreme learning machine (ELM) [3], and Elman neural network [51]. These approaches forecast the generation capacity of PV power plants based only on historical data, without requiring any knowledge of the plant itself, such as the number of panels or the panel capacity [34].

Traditional single algorithms frequently neglect the fact that output power changes with a wide range of meteorological variables, which can lead to inaccurate forecasts [12]. Hybrid methods, which combine several effective techniques, are more effective and efficient than other approaches to PV generation forecasting [7]. Examples of hybrid models used in PV power prediction include SVM with ant colony optimization (ACO) [37], a convolutional neural network (CNN) with a gated recurrent unit (GRU) [40], a bidirectional long short-term memory (LSTM) model with a genetic algorithm (GA) [58], and SDA–GA–ELM, which is based on customized similar day analysis (SDA), a genetic algorithm, and an extreme learning machine [59].

Deep learning theory, introduced by Hinton et al. [15], has recently gained a lot of traction. With the rapid advancement of artificial intelligence techniques, deep learning models offer a wider and more robust nonlinear network structure than classic machine learning models [28], and some have already produced excellent results in predicting PV power generation. The LSTM, proposed by Hochreiter and Schmidhuber [16], has been extensively used to forecast PV power. The LSTM is a recurrent neural network that extends the network’s memory and retains historical data for later use, which gives it the advantage of detecting long-term relationships in time series. Gao et al. [13] proposed an LSTM model based on meteorological data to forecast the daily power production of PV power plants using weather categorization. Chen et al. [4] proposed a new method for very-short-term PV power prediction that combines similar time period collection using the radiation classification coordinate (RCC) with LSTM. Lee et al. [27] proposed two models, LSTM and GRU, to predict PV power generation in a peak zone. The CNN is well suited to processing and analyzing high-dimensional data, and some researchers employ it for time series forecasting. For wind and solar energy forecasting, Díaz-Vico et al. [9] employed a CNN with input data from a numerical weather prediction (NWP) system; the prediction results demonstrate the CNN’s excellent feature extraction capacity. Sabri et al. [41] proposed a new hybrid deep learning model (CNN–GRU) to predict PV power output, in which the convolutional layer extracts the characteristics of the input data while the GRU retains the crucial details to improve prediction performance.

A new form of LSTM, known as the bidirectional LSTM (BLSTM), has recently been used for classification and regression problems. By combining a forward and a backward LSTM layer, the BLSTM can consider data from both the past and the future at the same time [24]. The BLSTM network has recently been utilized in electricity price forecasting [5], urban solid waste forecasting [21], and air pollution forecasting [35]. Aside from developing time series forecasting models with BLSTM, other researchers have attempted to improve prediction performance by combining the advantages of BLSTM and CNN. Lawal et al. [26] proposed short-term wind forecasting at various elevations above ground level using a hybrid of a 1D CNN and the BLSTM network. Unal et al. [46] proposed a spatiotemporal deep learning architecture to forecast energy consumption using the hybrid model CNN–BLSTM. Joseph et al. [22] proposed a novel hybrid deep learning model based on BLSTM and CNN for predicting traffic congestion in a smart city.

Weather forecast data has a restricted forecasting range, and historical PV power time series are non-periodic and non-stationary, making classical AI algorithms ineffective. To overcome these obstacles and achieve accurate PV power prediction, the method proposed in this paper rests on the following considerations: according to the current study, BLSTM has a high ability to extract bidirectional temporal characteristics, while CNN can extract spatial characteristics. It is found that combining the BLSTM and CNN models to predict PV power can achieve more accurate results.

Therefore, a new hybrid model of PV power forecasting, the BLSTM–CNN model, is suggested in this study based on the mechanistic characteristics of time series data. The main research contents of this paper are as follows:

  1. To get acceptable prediction results, the input dataset is passed through a preprocessing step in which redundant, outlier, or missing values are eliminated.

  2. A hybrid PV power prediction network is proposed that takes into account the temporal–spatial characteristics extraction order.

  3. The bidirectional temporal characteristics of the data are extracted first using the BLSTM model, and then the spatial characteristics of the data are extracted using the CNN model while considering the PV data features.

Methods and materials

Convolutional neural network

The convolution layer is a major component of a convolutional neural network [23]. A convolution layer C couples numerous filters to the input matrix, with each filter holding an \(i\) \(\times\) \(i\) weight matrix, and the convolution matrix is obtained by scanning the filters across the input matrix. The CNN layer extracts local features from its inputs and passes them on to subsequent layers, which build more sophisticated features [54]. Equation (1) gives the vector \(y^1_{ij}\) output by the first convolutional layer, where x is the input vector for power production, and n is the number of units per window. The output vector x of the previous layer is used to calculate y. w is the weight of the kernel, \(\sigma\) is the activation function, \(b^{1}_j\) represents the bias for the \(j\mathrm{th}\) feature map, and m is the index value of the filter. Equation (2) gives the vector \(y^l_{ij}\) output by the \(l\mathrm{th}\) convolutional layer.

$$\begin{aligned} y^1_{ij}= & {} \sigma \left( b^1_j+\sum _{m=1}^{M}w_{m,j}^1x^0_{i+m-1,j}\right) \end{aligned}$$
(1)
$$\begin{aligned} y^l_{ij}= & {} \sigma \left( b^l_j+\sum _{m=1}^{M}w_{m,j}^lx^{l-1}_{i+m-1,j}\right). \end{aligned}$$
(2)

The pooling layer is a crucial component of CNN, and it is used to reduce the dimension of the convolution matrix. Eq. (3) represents the max-pooling operation. T is the stride, which specifies how far the pooling window is moved across the input data at each step, and R is the pooling size, which is smaller than the size of the input y.

$$\begin{aligned} p^l_{ij} = \max _{r\in R } y^{l-1}_{i\times T+r,j} \end{aligned}$$
(3)
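As an illustration of Eqs. (1) and (3), the following minimal Python sketch applies one filter to a one-dimensional input and then max-pools the result. It is not the implementation used in this study: the function names, the ReLU activation, and the toy input are assumptions made for the example, and indices start at 0 rather than 1.

```python
def relu(v):
    """Assumed activation function sigma; the text does not fix a particular choice."""
    return v if v > 0.0 else 0.0


def conv1d(x, w, b, activation=relu):
    """Single feature map of Eq. (1): y_i = sigma(b + sum_m w_m * x_{i+m})."""
    M = len(w)
    return [
        activation(b + sum(w[m] * x[i + m] for m in range(M)))
        for i in range(len(x) - M + 1)
    ]


def max_pool1d(y, pool_size, stride):
    """Eq. (3): p_i = max over r in [0, pool_size) of y_{i * stride + r}."""
    n_out = (len(y) - pool_size) // stride + 1
    return [
        max(y[i * stride + r] for r in range(pool_size))
        for i in range(n_out)
    ]


# Toy input sequence (hypothetical values, not PV data).
x = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0]
y = conv1d(x, w=[1.0, 0.0, -1.0], b=0.0)   # [0.0, 0.0, 3.0, 1.0]
p = max_pool1d(y, pool_size=2, stride=2)   # [0.0, 3.0]
print(y, p)
```

With a kernel of length 3 the six inputs yield four convolution outputs, and pooling with size 2 and stride 2 halves that to two values, which is the dimension reduction described above.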

Bidirectional long short-term memory neural network

Fig. 1 General overview of recurrent neural networks [20]

Fig. 2 The structure of hybrid model BLSTM–CNN [57]

In 1982, Hopfield proposed the recurrent neural network (RNN) [17]. Unlike traditional neural networks, each component of the RNN maintains the hidden layer parameters, allowing the current component to retain the memory of the information produced by the preceding components. Figure 1 gives a general overview of the RNN. Nevertheless, the RNN has a clear disadvantage when the data series is too long or the time interval is too large: the repeated multiplication of gradients during backpropagation causes the vanishing gradient problem, rendering RNNs unable to train effectively. This problem was handled well by the LSTM network, which Hochreiter and Schmidhuber introduced in 1997 [16]. The LSTM uses a forget gate, an input gate, and an output gate to build a memory unit [56], which replaces the memory unit of the RNN.

The three gates specific to the LSTM operate as follows:

Forget gate This gate discards unwanted data. It reads the current moment’s input \(x_t\) and the previous moment’s output \(h_{t-1}\) and assigns a weight between 0 and 1 to each element of the cell state \(C_{t-1}\) at the previous moment, where 0 denotes “all discarded” and 1 denotes “all retained.” The LSTM network adjusts this weight through continual feedback learning to improve the model. The output \(f_t\) is:

$$\begin{aligned} f_t = \sigma (W_f\cdot [h_{t-1},x_t]+b_f). \end{aligned}$$
(4)

Input gate This gate determines which data should be stored in the cell state. The sigmoid layer decides which values are updated, and its output is denoted \(i_t\). The tanh layer generates the candidate state \(\tilde{C_{t}}\), that is, the new information ready to be added to the cell state:

$$\begin{aligned} i_t= & {} \sigma (W_i\cdot [h_{t-1},x_t]+b_i) \end{aligned}$$
(5)
$$\begin{aligned} \tilde{C_{t}}= & {} \tanh (W_c\cdot [h_{t-1},x_t]+b_c). \end{aligned}$$
(6)

Then, the cell state at the previous moment is multiplied by the forget gate output \(f_t\), the newly generated candidate information weighted by \(i_t\) is added, and the current cell state is calculated as follows:

$$\begin{aligned} C_{t} = f_t*C_{t-1}+i_t *\tilde{C_{t}}. \end{aligned}$$
(7)

Output gate This gate filters the cell state so that only the required information is output. It is also split into two layers: the sigmoid layer decides which part of the cell state is output, and its output is denoted \(o_t\); the tanh layer scales the cell state to a value between \(-1\) and 1. Their product gives the final output \(h_t\):

$$\begin{aligned} o_{t}= \sigma (W_o\cdot [h_{t-1},x_t]+b_o) \end{aligned}$$
(8)
$$\begin{aligned} h_{t}= o_{t}*\tanh {(C_{t})} \end{aligned}$$
(9)

where \(o_t\), \(i_t\), and \(f_t\) are the output values of the output gate, input gate, and forget gate, respectively. The \(b_{f,i,c,o}\) and \(W_{f,i,c,o}\) are the bias vectors and weight matrices. \(\sigma\) is a sigmoid function.
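To make the gate equations concrete, the sketch below evaluates Eqs. (4) to (9) for a single time step in plain Python. It is only an illustration under stated assumptions: the function names, the dictionary layout of the parameters, and the all-zero toy weights are hypothetical and are not taken from the model trained in this study.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, z, b):
    """Return W . z + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, z)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params holds W_f, W_i, W_c, W_o and b_f, b_i, b_c, b_o."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]       # Eq. (4)
    i_t = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]       # Eq. (5)
    c_hat = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]   # Eq. (6)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]      # Eq. (7)
    o_t = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]       # Eq. (8)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                        # Eq. (9)
    return h_t, c_t


# Toy setup (assumption): hidden size 1, input size 2, all weights and biases zero,
# so every gate outputs sigmoid(0) = 0.5 and the candidate state is tanh(0) = 0.
names = ("f", "i", "c", "o")
params = {"W_" + k: [[0.0, 0.0, 0.0]] for k in names}
params.update({"b_" + k: [0.0] for k in names})

h_t, c_t = lstm_step(x_t=[0.3, -0.2], h_prev=[0.1], c_prev=[2.0], params=params)
print(c_t)  # [1.0]: half of the previous cell state is kept, nothing new is added
print(h_t)  # [0.5 * tanh(1.0)]
```

With zero weights the forget gate keeps exactly half of the previous cell state, which shows how \(f_t\) scales \(C_{t-1}\) before the candidate information is added.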

BLSTM stands for bidirectional LSTM and is commonly employed for natural language processing. In time series forecasting, BLSTM may outperform LSTM. BLSTM is made up of two basic LSTMs [14]: a forward LSTM that uses past information and a backward LSTM that uses future information, so that information from times \(t-1\) and \(t+1\) can both be used at time t. Because both past and future information can be used, BLSTM is usually more effective than LSTM and RNN.

The output \(y_t\) is calculated as follows:

$$\begin{aligned} y_{t}= g(W_y[h_t;h^{\prime}_t ]+b_y) \end{aligned}$$
(10)
$$\begin{aligned} h_{t}= f(W_t[c_{t-1};x]+b_t) \end{aligned}$$
(11)
$$\begin{aligned} h^{\prime}_{t}= f(W^{\prime}_t[c^{\prime}_{t-1};x]+b^{\prime}_t) \end{aligned}$$
(12)

where \(h_t\) and \(h^{\prime}_{t}\) are the hidden outputs of the forward and backward LSTM cells at time t, respectively; \(W_t\) and \(W^{\prime}_t\) are the weight matrices of the forward and backward LSTM cells at time t; and \(b_t\) and \(b^{\prime}_t\) are the bias vectors of the forward and backward LSTM cells at time t.
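The bidirectional scheme can be sketched independently of the cell type. In the hypothetical Python example below, a running sum stands in for the LSTM cell so that the data flow is easy to follow; the function names, the toy cell, and the identity output function with equal weights are assumptions for illustration, not the configuration used in this paper.

```python
def run_direction(xs, step, h0):
    """Apply a recurrent step function along xs, returning the state at every step."""
    states, h = [], h0
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(xs, forward_step, backward_step, h0=0.0):
    """Pair the forward state h_t with the backward state h'_t for every time t."""
    forward = run_direction(xs, forward_step, h0)
    # The backward pass reads the sequence in reverse; its states are then
    # reversed again so that index t refers to the same time step in both lists.
    backward = run_direction(xs[::-1], backward_step, h0)[::-1]
    return list(zip(forward, backward))


def toy_step(x, h):
    """Stand-in for an LSTM cell (assumption): accumulates everything seen so far."""
    return h + x


def output(pair, w=(0.5, 0.5), b=0.0):
    """Eq. (10) with g taken as the identity: y_t = W_y [h_t; h'_t] + b_y."""
    h_fwd, h_bwd = pair
    return w[0] * h_fwd + w[1] * h_bwd + b


pairs = bidirectional([1.0, 2.0, 3.0], toy_step, toy_step)
ys = [output(p) for p in pairs]
print(pairs)  # [(1.0, 6.0), (3.0, 5.0), (6.0, 3.0)]
print(ys)     # [3.5, 4.0, 4.5]
```

At the first time step the forward state has seen only the first value, while the backward state has already summarized the whole remaining sequence, which is the sense in which both past and future information are available at time t.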

BLSTM–CNN hybrid neural networks

Fig. 3 Framework of proposed model

Hybrid models, on average, outperform single models. Keeping the respective strengths of BLSTM and CNN in mind, we leveraged the complementary capabilities of both models to construct a new temporal and spatial feature extraction model that predicts PV power generation more precisely. In this paper, a hybrid approach called BLSTM–CNN is suggested to forecast PV power generation using a series connection of BLSTM and CNN, as illustrated in Fig. 2. The suggested approach excels at extracting complex characteristics and patterns from the weather factors collected for PV power generation forecasting. The historical time series PV power data is first fed into the BLSTM model, which extracts the temporal characteristics of the data by exploiting its capability of processing time series data. The resulting temporal characteristics are then passed to the CNN input layer to extract the data’s spatial characteristics. A CNN often contains several levels of convolutional and pooling layers, with many convolution operations conducted at each level to capture significant information. The CNN applies weights to weather factors depending on their effect on PV power. Finally, a fully connected layer gathers the extracted characteristics and forecasts the PV power generation. A dropout layer is also added to the model to reduce overfitting.

Dataset description

In this study, PV data from the 1B DKASC, Alice Springs PV system was chosen as a case study [10]. Data from October 1, 2020, to January 27, 2021, with a resolution of 5 min, were used for this experiment. The input parameters are global horizontal radiation (\(W/m^2 \times sr\)), weather temperature (\(^{\circ }\text {C}\)), diffuse horizontal radiation (\(W/m^2 \times sr\)), current phase average (A), weather relative humidity (\(\%\)), and wind direction (\(^{\circ }\)), while the output is the active power (kW). To increase the effectiveness and precision of the model forecast, the data must be preprocessed and filtered before being fed into the model. Preprocessing involves eliminating abnormal data, completing missing values, and normalizing the data. The data is separated into two parts: 80% for training and 20% for testing. The BLSTM–CNN hybrid model has two primary parts. In the first, the temporal modeling tool BLSTM learns the bidirectional long-term dependencies after data preprocessing. In the second, a 1D CNN is applied to extract the data’s spatial characteristics. To evaluate the effectiveness of the proposed BLSTM–CNN, five single deep learning models (CNN, GRU, LSTM, RNN, and BLSTM) and two hybrid models (LSTM–CNN and CNN–LSTM) are also used as comparison models for predicting PV power output. The metrics used to measure model prediction efficiency and accuracy are RMSE, MSE, MAE, and R\(^2\). The experiments were run in Python 3.7 on a personal computer with a 64-bit operating system, an Intel (R) Core (TM) i7-4600 CPU @ 2.10 GHz 2.70 GHz, and 8.00 GB of RAM. The framework of PV power output forecasting is shown in Fig. 3.
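The preprocessing steps named above (completing missing values, normalizing, and the 80/20 split) could look roughly like the following Python sketch. The text does not specify the exact techniques, so forward filling, min-max scaling fitted on the training part, and a chronological split are all assumptions, as are the function names and the toy power values.

```python
def fill_missing(values):
    """Forward fill (assumed strategy): replace None with the last valid value.
    A leading None stays None because no earlier value exists."""
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled


def chronological_split(values, train_fraction=0.8):
    """Keep time order: the first 80% for training, the remaining 20% for testing."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def min_max_scale(values, lo, hi):
    """Min-max normalization (assumed) using bounds taken from the training part."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical active power readings in kW; None marks a missing sample.
power = [0.0, 1.0, None, 3.0, 4.0, 2.0, None, 1.0, 0.0, 2.0]

filled = fill_missing(power)
train, test = chronological_split(filled)
lo, hi = min(train), max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)

print(len(train), len(test))  # 8 2
print(test_scaled)            # [0.0, 0.5]
```

Fitting the scaling bounds on the training part only is a design choice of this sketch: it keeps information from the test period out of the normalization step.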

Model evaluation indexes

To compare the performance of various predictive models, we utilize the mean absolute error (MAE), root mean square error (RMSE), mean square error (MSE), and coefficient of determination (R\(^2\)) [39]. Definitions of these evaluation indexes are as follows.

$$\begin{aligned} \text {MAE}= & {} \frac{1}{N}\sum _{i=1}^{N} \left|y_{i} - \tilde{y_{i}} \right|\end{aligned}$$
(13)
$$\begin{aligned} \text {MSE}= & {} \frac{1}{N}\sum _{i=1}^{N}(y_{i} - \tilde{y_{i}})^{2} \end{aligned}$$
(14)
$$\begin{aligned} \text {RMSE}= & {} \sqrt{\frac{1}{N}\sum _{i=1}^{N}({y_{i}} - \tilde{y_{i}})^{2}} \end{aligned}$$
(15)
$$\begin{aligned} R^2= & {} 1- \frac{\sum _{i=1}^{N}(y_i-\tilde{y_{i}})^2 }{\sum _{i=1}^{N}(y_i-\bar{y_i})^2} \end{aligned}$$
(16)

where \(y_{i}\) is the real PV power generation value, \(\tilde{y_{i}}\) is the predicted value, and N is the number of \(y_{i}\). \(\bar{y_i}\) is the average of the real PV power generation in the test set.
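Equations (13) to (16) translate directly into code. The following Python sketch is a plain transcription for illustration; the function names and the four toy values are assumptions and do not correspond to results reported in this paper.

```python
import math


def mae(y, y_pred):
    """Eq. (13): mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_pred)) / len(y)


def mse(y, y_pred):
    """Eq. (14): mean square error."""
    return sum((a - p) ** 2 for a, p in zip(y, y_pred)) / len(y)


def rmse(y, y_pred):
    """Eq. (15): root mean square error, the square root of the MSE."""
    return math.sqrt(mse(y, y_pred))


def r2(y, y_pred):
    """Eq. (16): coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical): actual and predicted PV power in kW.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(mae(actual, predicted))   # 0.25
print(mse(actual, predicted))   # 0.125
print(rmse(actual, predicted))  # about 0.3536
print(r2(actual, predicted))    # about 0.9
```

Because RMSE is the square root of MSE, the two metrics always rank models identically; reporting both mainly changes the scale on which the error is read.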

Modeling results

Results and comparisons

This research proposes a hybrid model (BLSTM–CNN) for PV power prediction. The BLSTM model is used to extract bidirectional temporal features from the filtered data; it has two hidden layers with 128 and 256 units, respectively. The obtained temporal characteristics are then sent to the CNN input layer, which uses convolutional and pooling layers to extract spatial features of the dataset. The CNN uses two convolutional layers, with 128 and 256 convolution kernels, respectively, and two pooling layers. The kernel size in the convolutional layers is 3*3. A dropout layer [44] is also included in the model to avoid overfitting during training, which could reduce prediction accuracy. The batch size of the proposed model is 500. Finally, two fully connected layers with 512 and 256 neurons, respectively, output the PV power generation forecast. The parameter settings of the suggested model are shown in Table 1. To verify the effectiveness of the suggested BLSTM–CNN model, five single deep learning models (CNN, GRU, LSTM, RNN, and BLSTM) and two hybrid models (LSTM–CNN and CNN–LSTM) were used as comparison models for PV power output forecasting. The RMSE, MSE, MAE, and R\(^2\) metrics are used to evaluate model forecasting accuracy and effectiveness.

Table 1 Parameters setting of the proposed method
Table 2 The results of the forecasting model
Fig. 4 The MAE, RMSE, MSE, and R2 criterion in different models

Fig. 5 Comparison between model predictions and actual values (a). Error between the forecasted and actual values (b)

Table 3 Statistical values of the experimental PV power (KW) data
Table 4 Error evaluation results of BLSTM–CNN and other deep learning models in four datasets

Table 2 shows the forecasting results of the eight models. The proposed BLSTM–CNN has lower RMSE, MSE, and MAE values and a higher R\(^2\) value than the other seven models. To make the data in Table 2 easier to read, Figure 4 illustrates the evaluation criteria results of the various models. PV power generation data for one day in 2021 were chosen randomly as validation data to test the efficiency of the BLSTM–CNN model. Figure 5a compares the forecasted and actual values; for all times in the range, the forecasting curve shows high consistency with the actual data. The error between the forecasted and actual values is illustrated by the rose curve in Fig. 5b.

Subsequently, the performance and stability of the suggested forecasting model are tested on four months of data to verify the reliability and efficacy of BLSTM–CNN forecasting. The collected data is separated by month: October, November, December, and January. Each month’s data is divided into two parts: 80% for training and 20% for testing. Table 3 shows the partition of training and testing sets in the four months.

Fig. 6 Evaluation criteria results of BLSTM–CNN and other popular models in four datasets

Fig. 7 The forecasting results of the different model and actual PV power in October (a), November (b), December (c), January (d)

Table 4 shows the results of the various models (CNN, GRU, RNN, LSTM, BLSTM, CNN–LSTM, LSTM–CNN, and BLSTM–CNN) for PV power forecasting in the four months of October, November, December, and January. The hybrid models outperform the single models in terms of prediction accuracy. In addition, the BLSTM–CNN hybrid model predicts better than the other two hybrid models (LSTM–CNN and CNN–LSTM). The results indicate that BLSTM–CNN outperforms the other models in terms of prediction accuracy in all four months. Although various models can be used to forecast PV power generation, none of the comparison models consistently outperforms the others. This confirms the proposed model’s ability to extract temporal–spatial features, which allows it to capture the complicated relationship between input data and target PV power generation. For better visualization, the evaluation criteria results of the various models in the different months are also illustrated in Fig. 6.

Four days from each month were chosen at random for further examination. Figure 7 illustrates the prediction results for these sixteen days using the proposed model and the seven comparison models. All the models perform well in terms of prediction, but the suggested hybrid deep learning model outperforms the five single models and the two hybrid models in terms of prediction accuracy. The BLSTM–CNN curve lies very close to the actual value curve and shows better prediction performance, especially at night and during peak power.

Comparison of the proposed hybrid BLSTM–CNN model with state-of-the-art methods

Fig. 8 Comparison of the proposed hybrid BLSTM–CNN model with previous studies

The historical data in this paper comes from the DKASC in Australia. Previous studies have developed and evaluated different state-of-the-art methods on solar power plants from the DKASC. The comparison results are shown in Fig. 8.

Chen et al. [4] proposed a simple and efficient RCC-LSTM model for PV power prediction. The radiation classification coordinate (RCC) method is used to gather similar time periods, and LSTM is then used to extract characteristics from the time series PV power data. The data was gathered from Yulara in Alice Springs over two years (2017–2018) at a resolution of 5 min. The average MAE value obtained is 0.587.

Zhou et al. [29] proposed a hybrid deep learning model (WPD-LSTM) for short-term PV power forecasting. The data was gathered from DKASC, Alice Springs from June 1, 2014, to June 12, 2016. The average MAE value obtained is 0.2357. Zhou et al. [59] proposed a hybrid model (SDA–GA–ELM) based on extreme learning machine (ELM), genetic algorithm (GA), and customized similar day analysis (SDA) to forecast hourly PV power generation. The dataset was collected from Jan 14, 2017, to Oct 15, 2018, with a resolution of 1 h from DKASC. The average MAE value obtained is 0.2367.

Wang et al. [50] proposed a hybrid model (LSTM–CNN) for PV power forecasting. The data was collected from the 1B DKASC, Alice Springs PV system over half a year at a resolution of 5 min. The average MAE value obtained is 0.2210. Zhen et al. [58] proposed a hybrid model (GA-BLSTM) for ultra-short-term PV power prediction. The data was collected from 8 PV plants between 2017 and 2019 at a resolution of 5 min. The average MAE value obtained is 0.242. Abdel-Basset et al. [1] proposed a novel deep learning architecture, PV-Net, for forecasting short-term PV energy production; to enable efficient extraction of positional and temporal features in PV power sequences, the gates of the GRU are modified using convolutional layers (named Conv-GRU). The data was collected from the 1B DKASC, Alice Springs PV system over five years (2015–2019) at a resolution of 5 min. The average MAE value obtained is 0.398.

When the results of the above studies are compared, the suggested BLSTM–CNN has the lowest MAE value. The suggested model thus outperforms prior studies and provides higher PV generation forecasting performance.

However, the suggested model has a few flaws that must be investigated further. For example, in this research, the structure and training hyper-parameters of the model were found by experimentation, which is time-consuming. As a result, automated settings estimation approaches, like heuristic optimization algorithms, will be utilized in our future research to pick and improve the parameters of the neural network more effectively.

Conclusion

Accurate PV power forecasting plays an important role in the maintenance, control, management, and operation of PV power generation systems. In this research, a novel hybrid deep learning model, BLSTM–CNN, was suggested to increase the accuracy and reliability of PV power generation forecasting. More specifically, BLSTM automatically extracts bidirectional temporal correlation characteristics of PV data, and CNN extracts spatial correlation characteristics of PV data to produce the final PV power forecasting results. Four evaluation measures, five single deep learning models (CNN, GRU, RNN, LSTM, BLSTM), and two hybrid models (LSTM–CNN, CNN–LSTM) were employed in the experimental study to validate the proposed model’s forecasting performance. The BLSTM–CNN model, applied here in a novel way to PV power forecasting, achieved the highest R\(^2\) value of 0.9993 and the lowest RMSE, MAE, and MSE values of 0.0944, 0.0531, and 0.0089, respectively. In terms of forecasting accuracy, the results indicate that the proposed model outperforms the other classical models. In future work, the hybrid model will be combined with more sophisticated deep learning models to extract temporal and spatial features separately, resulting in more precise PV power forecasts. Moreover, the proposed model can also be enhanced and used in other domains, such as wind speed forecasting and residential load forecasting.