1 Introduction

Recently, the use of renewable energy sources (RESs) has increased markedly worldwide. This increase is driven by the environmental and technical benefits of RESs as well as the massive growth in load demand [15, 38]. Recent studies have proposed achieving 100% renewable energy penetration in smart power systems [12, 39]. RES units can be large-scale stations integrated into transmission systems, small-scale distributed generation (DG) integrated into medium-voltage distribution systems, or even rooftop-mounted units integrated into low-voltage distribution systems. One of the most notable RES types is the photovoltaic (PV) system. Owing to technological development and the exponential rise in demand for PV systems, their costs are continuously decreasing [16, 19]. The global contribution of PV systems is expected to increase rapidly in several countries. For instance, in Egypt, there is great interest in developing PV projects at small, medium, and large scales, a trend strongly motivated by the high level of solar radiation and the sunny weather throughout the year in all parts of the country [3, 44].

PV systems convert sunlight directly into electric power. The main characteristic of PV systems is that their output power is intermittent and unpredictable. The output PV power depends on fluctuating environmental conditions, e.g., sun conditions. For example, the PV output power reaches its maximum on a clear day, while moving and transient clouds greatly reduce the amount of generated power. The authors of [13, 33] demonstrated that high PV penetration can cause many technical problems in power systems, such as reverse power flow and voltage regulation problems. Undesired fluctuations in voltages and power are also a common problem in distribution systems with high PV penetration. These fluctuations can lead to instability of small micro-grid systems with limited storage capacity [52]. Several other technical aspects are also linked to intermittent PV systems, including power quality, generation control, and protection [49]. These technical aspects constrain the allowed PV penetration level in order to maintain secure and optimal system operation.

To guarantee safe operation and economic integration of electrical power systems with high PV penetration, accurate forecasting of PV power is essential. At the planning stage, optimal allocation of PV systems is required, which in turn needs accurate forecasting of environmental conditions at the candidate sites [37, 48]. Forecasting the output power of PV systems enables system operators to monitor their performance, perform control actions, optimally dispatch various DG types, and manage voltage control devices.

Accurate forecasting of PV power can be a complex task due to the fluctuating nature of the weather (e.g., cloud movement and temperature changes). Recently, recurrent neural networks have been used in various applications, such as optimal demand response in smart grids [54], control of single-phase converters [17], manipulator control [27, 28, 30, 31], modeling crack growth of aluminum alloy [56], distributed task allocation of multiple robots [26], text recognition [43], and localization of wireless sensor networks [29]. Recurrent neural networks achieve good results in different applications because they can model the dynamics of the data. This paper proposes the use of the long short-term memory (LSTM) recurrent neural network (LSTM-RNN) for forecasting PV output power. LSTM-RNNs can model the temporal changes in the data thanks to their recurrent architecture and memory units. Unlike traditional recurrent neural networks, LSTMs were designed to avoid the long-term dependency problem. Indeed, LSTM can capture abstract concepts in the PV power sequences. To the best of our knowledge, this is the first paper that uses LSTM-RNN to forecast PV power and considers the temporal changes in PV data when constructing the forecasting models. The main contributions of this paper can be summarized as follows:

  1. We propose a novel PV power forecasting method based on deep LSTM recurrent neural networks. The proposed method considers the temporal changes in PV power when constructing the forecasting models.

  2. We assess the performance of five LSTM models with different architectures in the forecasting of PV power.

  3. To demonstrate the effectiveness of the proposed method, we compare it with three widely used PV power forecasting methods.

The rest of this paper is organized as follows. Section 2 presents the related work. Section 3 explains the proposed method. Section 4 presents and discusses the experimental results. The conclusions and some lines of future work are given in Sect. 5.

2 Related work

Horizons of PV power forecasting vary from seconds to months depending on their usage, and they can be classified into three categories: (1) short-term forecasting, (2) medium-term forecasting, and (3) long-term forecasting. Efficient methods are required to improve the accuracy of PV forecasting models and thus reduce the negative impacts of system uncertainty. In the literature, several forecasting methods have been developed for predicting PV power. These methods can be classified into four categories: (1) statistical methods, (2) artificial intelligence methods, (3) physical methods, and (4) hybrid methods [1, 51]. To achieve accurate results, a forecasting method suited to the PV data and the required horizon length should be used. In the subsections below, we briefly review the four categories.

2.1 The statistical methods

Statistical methods depend on the given historic environmental data at the PV sites to generate their forecasting models. Persistence models belong to the statistical methods category and are simple tools for PV power forecasting [7, 11, 34]. These models were developed for stationary time series, and thus they are not suitable for forecasting PV power because the solar radiation profile is non-stationary. Other examples of statistical methods are the auto-regressive moving average (ARMA) [22], the auto-regressive integrated moving average (ARIMA) [45], and the auto-regressive moving average model with exogenous inputs (ARMAX) [32]. In [5], a probabilistic forecasting model of PV systems for 6 h ahead was proposed for smart grid applications. These statistical methods are preferable for short-term and medium-term forecasting.

2.2 The artificial intelligence methods

Artificial intelligence techniques are widely used in several fields, including forecasting. In [42], an artificial neural network model was introduced to predict solar irradiation using physical and environmental data. An improved forecasting model that considers aerosol index data instead of using the traditional environmental data was proposed in [35]. In [41], different artificial neural network models were constructed according to sun condition (i.e., sunny, partly cloudy, and overcast) for short-term forecasting of PV production.

In [53], a Bayesian neural network model was proposed to predict solar irradiation. The efficiency of the model was demonstrated through comparisons with traditional neural network models. To improve the forecasting accuracy, the authors of [8] combined wavelet analysis with artificial neural networks. The long-term forecasting of PV output power was performed using historical data, fuzzy theory, and neural networks in [55]. In [24], an artificial neural network model was used to forecast PV power and determine the sufficient time horizon for accurate representation of PV data. Wavelet recurrent neural networks were used in [9] to predict solar radiation for two days ahead, where they considered the correlation between solar radiation, wind speed, air humidity, and temperature.

2.3 The physical methods

Unlike the aforementioned methods, the physical methods require detailed models of the PV plant and local measurements. Satellites, with their ability to monitor cloud movement over wide areas, have been employed for forecasting solar radiation [21, 36, 46]. In [18], an advanced model was proposed to estimate solar radiation by introducing new sensors that greatly improve the forecasting accuracy. In [10], a ground-based sky imager was employed for cloud and solar radiation forecasting, where sky images are recorded every half minute.

Satellites are an effective tool for short-term forecasting (up to 5 h). For long-term forecasting of solar radiation, numerical weather prediction models have been demonstrated to be more efficient than satellites [36]. The authors of [47] tested several numerical weather prediction models at several sites in the USA, Europe, and Canada. A comprehensive study to validate and test the accuracy of several numerical weather prediction models and forecasting systems in the USA was performed in [40].

2.4 The hybrid methods

Efficient and accurate hybrid methods can be formulated by combining different forecasting methods. In [4], ARMA and nonlinear auto-regressive models were combined in order to achieve accurate forecasting results. The authors of [25] demonstrated that the combination of ARMA and a time delay neural network produces an efficient hybrid method for solar radiation prediction. The authors of [6] combined two traditional methods, the auto-regressive integrated moving average and support vector machines, to forecast PV power. Bacher et al. [2] proposed a two-stage method that incorporates an auto-regressive model and an auto-regressive model with exogenous inputs for short-term forecasting of PV power. To forecast solar radiation, both an exponential smoothing state space model and artificial neural networks were proposed in [14].

The above-mentioned methods do not consider the temporal changes in the PV historical data when constructing the forecasting models, and thus they discard key information about the dynamics of the data. In this paper, we propose the use of LSTM-RNN to construct an accurate forecasting model of PV output power. LSTM-RNN considers the temporal changes in the PV power, thereby producing more reliable models.

3 Proposed method

We use the LSTM-RNN to predict the hour-ahead PV output power. LSTM can model the temporal changes in the data and thus improves the forecasting results. In the subsections below, we briefly describe the LSTM unit, which is the basic building block of our PV forecasting method, and then we explain the proposed PV forecasting models.

3.1 Basic LSTM unit

In the learning phase, traditional neural networks cannot use the information learned at previous time steps when modeling the data at the current step. This is a major shortcoming of traditional neural networks. RNNs address this problem by using loops that pass information from one step of the network to the next, allowing information to persist. In other words, RNNs connect previous information to the present task. Indeed, using previous sequence samples may help in understanding the present sample.

LSTMs are a special kind of RNN that can learn short-term as well as long-term dependencies [23]. Unlike plain RNNs, LSTMs were designed to avoid the long-term dependency problem. An LSTM network is trained using backpropagation through time and overcomes the vanishing gradient problem. Whereas traditional neural networks have neurons, LSTM networks have memory blocks that are connected through successive layers. Each block contains gates that handle the state of the block and its output. In the LSTM unit, there are three types of gates: forget, input, and output. The task of each gate can be summarized as follows:

  • Forget gate sets what information to throw away from the block based on certain conditions.

  • Input gate sets which values from the input to update the memory state based on certain conditions.

  • Output gate sets what to output based on the input and the memory of the block.

Fig. 1 LSTM unit [23]

As shown in Fig. 1, an LSTM block receives an input sequence, and each gate uses activation units to decide whether it is triggered or not. This makes the change of the state and the addition of information flowing through the block conditional. The gates have weights that are learned during the training phase. Indeed, the gates make the LSTM blocks smarter than classical neurons and enable them to memorize recent sequences.

Each LSTM unit contains a cell with a state \(c_t\) at time t. This cell can be considered as a memory unit. Reading and modifying this cell are controlled through the input gate \(i_t\), the forget gate \(f_t\), and the output gate \(o_t\), all of which are sigmoidal gates. At each time step, the LSTM unit receives inputs from two external sources at each of its four terminals (i.e., the three gates and the input). The two external sources are:

  • The current sample \(x_t\).

  • The previous hidden states of all LSTM units in the same layer \(h_{t-1}\).

Each gate also has an internal source, the state \(c_{t-1}\) of its cell block. The LSTM sums the inputs coming from the different sources with a bias. The gates are activated by passing their total input through the logistic function. The total input at the input terminal is passed through a \(\tanh\) nonlinearity. The LSTM multiplies the resulting activation by the activation of the input gate and then adds the result of this multiplication to the cell state, after the cell state has been multiplied by the activation of the forget gate \(f_t\). The LSTM passes the updated cell state through a \(\tanh\) nonlinearity and then multiplies it by the activation of the output gate \(o_t\) to determine the final output of the LSTM unit \(h_t\). These steps and the updates of the LSTM unit can be formulated as follows:

$$\begin{aligned} i_t=\sigma (W_{xi}x_t+W_{hi}h_{t-1}+W_{ci}c_{t-1}+b_i) \end{aligned}$$
(1)
$$\begin{aligned} f_t=\sigma (W_{xf}x_t+W_{hf}h_{t-1}+W_{cf}c_{t-1}+b_f) \end{aligned}$$
(2)
$$\begin{aligned} c_t= f_{t}c_{t-1}+i_{t}\tanh (W_{xc}x_t+W_{hc}h_{t-1}+b_c) \end{aligned}$$
(3)
$$\begin{aligned} o_t=\sigma (W_{xo}x_t+W_{ho}h_{t-1}+W_{co}c_{t}+b_o) \end{aligned}$$
(4)
$$\begin{aligned} h_t=o_t\tanh (c_t) \end{aligned}$$
(5)

The main advantage of using the LSTM unit, unlike the traditional neurons used in RNN, is that its cell state accumulates activities over time. Since derivatives distribute over sums, the derivatives of the error do not vanish quickly as they are sent back into time. In this way, LSTM can carry out tasks over long sequences and discover long-range features.
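
To make the update rules concrete, the following is a minimal sketch of a single LSTM forward step implementing Eqs. (1)–(5) in NumPy. The container names (W, b) and the function name lstm_step are hypothetical, and the peephole weights \(W_{ci}\), \(W_{cf}\), \(W_{co}\) are applied as full matrices simply to mirror the equations as written.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of an LSTM unit, following Eqs. (1)-(5).

    W is a dict of weight matrices (W['xi'], W['hi'], W['ci'], ...) and
    b is a dict of bias vectors (b['i'], b['f'], b['c'], b['o']).
    """
    i_t = sigmoid(W['xi'] @ x_t + W['hi'] @ h_prev + W['ci'] @ c_prev + b['i'])    # Eq. (1)
    f_t = sigmoid(W['xf'] @ x_t + W['hf'] @ h_prev + W['cf'] @ c_prev + b['f'])    # Eq. (2)
    c_t = f_t * c_prev + i_t * np.tanh(W['xc'] @ x_t + W['hc'] @ h_prev + b['c'])  # Eq. (3)
    o_t = sigmoid(W['xo'] @ x_t + W['ho'] @ h_prev + W['co'] @ c_t + b['o'])       # Eq. (4)
    h_t = o_t * np.tanh(c_t)                                                       # Eq. (5)
    return h_t, c_t
```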

3.2 PV power forecasting using different LSTM architectures

To forecast PV output power, we construct five LSTM models with different architectures. We use different architectures in order to identify the model that gives the most accurate results with each PV dataset. Below, we briefly explain each model.

3.2.1 Model1: basic LSTM network for regression

In this architecture, we phrase the PV power forecasting as a regression problem. Given the PV power in this hour, we aim at predicting the output PV power in the next hour. We design the LSTM network for this problem as follows. The network has a visible layer with one input, a hidden layer with four LSTM blocks (neurons), and an output layer that gives the predicted power. We used the default sigmoid activation function for the LSTM blocks. We trained the network for 20, 50, and 100 epochs with a batch size of 1.
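
For illustration, the following is a minimal sketch of Model1, assuming a Keras 2-style Sequential API (Keras is the library used in Sect. 4.2). The variable names train_x and train_y are placeholders for the scaled PV power inputs and targets, with train_x shaped as [samples, time steps = 1, features = 1].

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(4, input_shape=(1, 1)))   # hidden layer with four LSTM blocks
model.add(Dense(1))                      # output layer: next-hour PV power
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_x, train_y, epochs=100, batch_size=1, verbose=2)
```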

3.2.2 Model2: LSTM for regression using the window technique

In this architecture, we use multiple recent time steps to predict the PV output power at the next time step (the window technique). In this technique, we can tune the size of the window for the PV power forecasting problem. For instance, given the current time t, we aim at predicting the PV power at the next time in the sequence, \(t+1\). To do so, we use the PV power at the current time t and at the two prior times (\(t-1\) and \(t-2\)) as input variables to the LSTM unit. In this case, the input variables of the LSTM unit are the PV power at \(t-2\), \(t-1\), and t, while the output variable is the PV power at \(t+1\).
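
The window construction can be sketched as follows; the helper name create_window_dataset and the variable pv_train are hypothetical. With the window technique, the lagged values are treated as separate input features of a single time step.

```python
import numpy as np

def create_window_dataset(series, window=3):
    """Pairs of (PV power at t-2, t-1, t) -> PV power at t+1."""
    x, y = [], []
    for i in range(len(series) - window):
        x.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(x), np.array(y)

train_x, train_y = create_window_dataset(pv_train, window=3)
# window values as separate input features of a single time step
train_x = train_x.reshape((train_x.shape[0], 1, 3))
```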

3.2.3 Model3: LSTM for regression with time steps

Indeed, time steps provide another way to phrase the PV output power forecasting problem. As in the previous model (Sect. 3.2.2), we take prior time steps in the PV power time series as inputs to predict the output power at the next time step. In this model, however, instead of using the past observations as separate input features, we use them as time steps of a single input feature, which is a more accurate framing of the PV power forecasting problem. For instance, if the time step equals 3, the LSTM unit outputs the PV power at t after it processes the PV power at \(t-3\), \(t-2\), and \(t-1\).
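
A minimal sketch of this framing is given below, reusing the hypothetical create_window_dataset helper and pv_train placeholder from Sect. 3.2.2; the only change from Model2 is that the lagged values are reshaped as time steps of one feature rather than as separate features.

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM

train_x, train_y = create_window_dataset(pv_train, window=3)
# the same lagged values, now as three time steps of one input feature
train_x = train_x.reshape((train_x.shape[0], 3, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(3, 1)))   # 3 time steps, 1 feature
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
```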

3.2.4 Model4: LSTM with memory between batches

The LSTM network has a memory that enables it to remember across long sequences. When fitting the model in the usual configuration, the internal state of the network is reset after each training batch. By making the LSTM layer stateful, we gain finer control over when this internal state is cleared. In other words, the LSTM can build up state over the entire training sequence and even maintain that state when predicting the PV output power, if needed. This requires that the training data not be shuffled when fitting the LSTM network.
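
The sketch below illustrates this stateful configuration in Keras (variable names are placeholders, and look_back denotes the assumed number of input time steps): the state is carried across batches within an epoch, the training data are not shuffled, and the state is reset manually only between epochs.

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM

batch_size, look_back = 1, 3   # look_back: number of input time steps (assumed)
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

for _ in range(100):                                    # one pass per epoch
    model.fit(train_x, train_y, epochs=1, batch_size=batch_size,
              shuffle=False, verbose=2)                 # keep the sample order
    model.reset_states()                                # clear state between epochs only
```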

3.2.5 Model5: stacked LSTMs with memory between batches

Stacked LSTMs add capacity by stacking LSTM layers on top of each other [20, 50]. LSTM layers can be stacked in the same way as other layer types (e.g., the layers of feed-forward neural networks). Figure 2 demonstrates how LSTM layers can be stacked. The blue blocks belong to layer 1, while the red blocks belong to layer 2. The inputs to layer 1 are the PV power values \(x_t, x_{t+1}, \ldots , x_{N}\), while the inputs to layer 2 are the hidden states \(h_t, h_{t+1}, \ldots , h_{N}\). The intuition is that higher LSTM layers can capture more abstract concepts in the sequences, which can improve the PV power forecasting results.
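
A minimal sketch of the stacked, stateful configuration is shown below (again with placeholder variable names); the first LSTM layer must return its full output sequence so that the second layer receives one hidden vector per time step, as illustrated in Fig. 2.

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM

batch_size, look_back = 1, 3
model = Sequential()
# layer 1 returns its full output sequence so layer 2 sees one vector per time step
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1),
               stateful=True, return_sequences=True))
model.add(LSTM(4, stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
```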

Fig. 2 Stacked LSTM

4 Results and discussion

4.1 Datasets

We used two PV datasets for locations in the cities of Aswan (Dataset1) and Cairo (Dataset2), Egypt. Figure 3 shows the distribution of PV power in Dataset1 over hours, days, weeks, and months. As shown in Fig. 3a, the maximum PV power is generated at approximately 12.00 h (Egypt time zone: GMT + 2). In Aswan, the PV plant operates for a long period (from 7.00 to 18.00 h) throughout the whole year (Fig. 3d). This is because Aswan has a subtropical desert (low-latitude, arid, hot) climate, and the summer runs from March to November with temperatures reaching upwards of \(40\,^{\circ }\hbox {C}\) during June, July, and August. The PV power per day shows small fluctuations (Fig. 3b), while the PV power per week is almost constant (Fig. 3c).

Fig. 3 The distribution of PV power in Dataset1 with a hours, b days, c weeks, and d months

4.2 Results

We divide each dataset into training and testing sets. A total of 70% of the samples are used to train the PV power forecasting model, while the remaining samples are used for testing the model. We used the root-mean-square error (RMSE) to evaluate the performance of the forecasting models. RMSE is defined as follows:

$$\begin{aligned} \hbox {RMSE}=\sqrt{\frac{1}{N}\sum _{i=1}^{N} (\hat{X_i}-X_i)^{2}} \end{aligned}$$
(6)

In this equation, \(\hat{X_i}\) and \(X_i\) are the ith forecasted and actual values, respectively, and N is the size of the testing dataset.
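
For reference, Eq. (6) and the 70/30 split can be sketched as follows; the variable name pv_series is a placeholder for the hourly PV power series.

```python
import numpy as np

def rmse(predicted, actual):
    """Root-mean-square error over the testing set, Eq. (6)."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# 70% of the samples for training, the remaining 30% for testing
split = int(0.7 * len(pv_series))
train, test = pv_series[:split], pv_series[split:]
```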

The loss function of the LSTM was the mean-squared error, and the optimizer was 'adam'. The models were implemented using the Keras library (Theano backend). Model1 has a visible layer with 1 input, a hidden layer with 4 LSTM blocks, and an output layer that makes a single-value prediction. We evaluated the performance of the five models with 20, 50, and 100 epochs.

Table 1 shows the training errors of the five models with Dataset1. In the training phase, model4 gave the smallest training error with 100 epochs, while model1 gave the highest RMSE with 50 epochs. Table 2 shows the testing errors of the five models with Dataset1. With 50 epochs, model3 obtained the smallest RMSE value, while model1 gave the highest one.

Table 1 Training errors of the five models with Dataset1
Table 2 Testing errors of the five models with Dataset1

Figures 4, 5, 6, 7, and 8 present the predicted PV power for Dataset1 using the five LSTM models with 20, 50, and 100 epochs. As we can see, model3 with 50 epochs predicts the PV power accurately compared to the other models. In contrast, we notice large errors in the case of model1 with 50 epochs.

Fig. 4 Predicting PV power of Dataset1 using model1 with a 20, b 50, and c 100 epochs

Fig. 5 Predicting PV power of Dataset1 using model2 with a 20, b 50, and c 100 epochs

Fig. 6 Predicting PV power of Dataset1 using model3 with a 20, b 50, and c 100 epochs

Fig. 7 Predicting PV power of Dataset1 using model4 with a 20, b 50, and c 100 epochs

Fig. 8 Predicting PV power of Dataset1 using model5 with a 20, b 50, and c 100 epochs

Table 3 presents the training errors of the five models with Dataset2. In the training phase, model2 gave the smallest RMSE value, while model1 gave the highest error with 50 epochs. Table 4 presents the testing errors of the five models with Dataset2. Model2 and model3 gave the smallest RMSE values, while model1 gave the highest one with 50 epochs.

Table 3 Training errors of the five models with Dataset2
Table 4 Testing errors of the five models with Dataset2

We can conclude that model3 gives the best results compared to model1, model2, model4, and model5. Thus, we recommend using it for forecasting the PV power. The main reason for achieving good results (small RMSE) with LSTM is its ability to model the temporal changes in the PV power, whereas the traditional PV forecasting methods do not utilize this temporal information. In other words, LSTM can capture abstract concepts in the PV power sequences and thus improves the forecasting results.

4.3 Comparison with related methods

In this section, we compare the performance of the proposed method (model3) with three PV forecasting methods: multiple linear regression (MLR), bagged regression trees (BRT), and neural networks (NN). As shown in Table 5, MLR and BRT give high RMSE values. These methods were developed for stationary time series forecasting; therefore, they are not suitable for forecasting PV power because the solar radiation profile is non-stationary. With NN, we tried different configurations, such as using 1 and 2 layers while varying the number of neurons from 1 to 50. For both Dataset1 and Dataset2, the NN model gives its best results with 2 layers and 7 neurons.
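
As a hedged illustration only, the three baselines could be set up as follows using scikit-learn; the paper does not specify the implementation library, so the estimator choices and the placeholder variables (train_x, train_y, test_x, test_y, and the rmse helper from Sect. 4.2) are assumptions.

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

# lagged PV power as a flat feature matrix
X_train = train_x.reshape(len(train_x), -1)
X_test = test_x.reshape(len(test_x), -1)

baselines = {
    'MLR': LinearRegression(),
    'BRT': BaggingRegressor(DecisionTreeRegressor()),               # bagged regression trees
    'NN':  MLPRegressor(hidden_layer_sizes=(7, 7), max_iter=2000),  # 2 layers, 7 neurons
}
for name, est in baselines.items():
    est.fit(X_train, train_y)
    print(name, rmse(est.predict(X_test), test_y))
```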

Table 5 Comparison with related methods

Unlike LSTM-RNN, the MLR, BRT, and NN methods do not contain memory units, and so they cannot model the temporal changes in PV output power. The NN method has an architecture similar to that of the LSTM-RNN, but it has neither memory units nor a recurrent architecture. In contrast, LSTM-RNN uses the information learned in previous time steps in the prediction of the current value, yielding robust and accurate forecasting results. As shown in Table 5, the proposed method (model3) gives very small forecasting errors with Dataset1 and Dataset2 compared to the other methods.

4.4 Applications of the proposed method

The proposed method can be used in several applications of smart grids, such as:

  • Optimal planning of PV units in transmission/distribution systems, i.e., determining the optimal locations and sizes of PV plants while considering their intermittent nature.

  • Optimal control of existing PV plants while avoiding their operational problems, such as voltage rise and reverse power flow.

  • Optimal scheduling of other generators (e.g., fuel-based generators) while considering the predicted values of PV power to minimize the operational costs of the grid.

  • Optimal charging/discharging of storage devices (e.g., batteries) for profit maximization.

4.5 Limitations of the proposed method

As shown in Sects. 4.2 and 4.3, the proposed method outperforms the compared methods. However, the current study has some limitations, such as:

  • The effect of outliers in PV power sequences has not been studied in this paper.

  • We did not incorporate environmental parameters, such as wind speed, air temperature, and humidity, in the forecasting of PV power.

In future work, we will address the aforementioned limitations. Furthermore, we will apply the proposed method in the applications mentioned in Sect. 4.4.

5 Conclusion and future work

In this paper, we have proposed a new method for forecasting PV output power using deep LSTM networks. Unlike the traditional PV power forecasting methods, our LSTM-based method can capture abstract concepts in the PV power sequences, since LSTM networks model the temporal changes in PV output power thanks to their recurrent architecture and memory units. We evaluated the performance of five LSTM models with different architectures in the forecasting of PV power. The proposed model3 (LSTM with time steps) gives the best results compared to the other models; therefore, we recommend employing it for forecasting the PV power. We also compared the proposed method (model3) with three PV forecasting methods based on MLR, BRT, and NN. The proposed method gave very small forecasting errors compared to the other methods. Future work will focus on utilizing different RNN architectures and loss functions to further improve the forecasting accuracy. Furthermore, we will use the proposed method to control and plan the operation of multiple renewable energy sources (e.g., PV, wind, and biomass) in smart grids.