1 Introduction

With the global shift toward eco-friendly energy, renewable resources such as photovoltaic (PV) systems have become a large share of the energy mix. However, their uncertain and variable output requires abundant energy reserves and resources to maintain power system reliability, which in turn increases operating costs (Mohandes et al. 2019). An effective way to handle this uncertainty proactively is to predict future PV output accurately. Based on the prediction, power system operators can prepare economical, dispatchable resources to maintain future power balance. For this reason, many researchers have studied forecast methodologies that rely on big data to increase forecast accuracy.

To improve PV forecast accuracy and practical usability, it is necessary to utilize satellite image prediction because the images contain essential weather information, especially cloud shapes, locations, and trajectories. Furthermore, predicted satellite images can track the movement of clouds, which significantly affects PV power forecasting. For example, a moving cloud can directly cover a PV panel and temporarily lower its power output. However, the computational complexity of handling satellite images is so high that image prediction is quite challenging. Specifically, high-resolution images require more memory for calculation and more parameters for feature extraction. Because computing resources are finite, excessive computation can overload the system and take a long time to produce results. In this context, this study first employs a hybrid deep learning model combining LSTM and GAN to predict satellite images and then quantifies the cloud cover shown in the predicted images. The quantified values are finally applied to a PV generation forecast model based on multivariate-LSTM. A GAN is a generative model that can efficiently produce a realistic-looking fake image from a random latent vector, and an LSTM is a learning model with feedback structures that can extract time-series features from image sequences to predict future latent vectors. In this way, the hybrid LSTM-GAN model predicts future satellite images efficiently. The major contributions of this paper are twofold:

  • By applying the hybrid model of the time-series model and generative model in the PV prediction domain, it was confirmed that LSTM-GAN, GRU-GAN, and BILSTM-GAN predict future satellite images with cloud cover information.

  • By comparing the combinations with and without the satellite image prediction, it was confirmed that the combination of satellite image prediction and weather information represented the highest performance.

The rest of this paper is organized as follows. Section 2 reviews related studies on PV forecasting models. Section 3 explains the deep learning algorithms employed by the proposed method, followed by a detailed description of the proposed hybrid forecast model in Sect. 4. Experimental results are presented and discussed in Sect. 5, and the paper is concluded in Sect. 6.

2 Related work

Since PV generation considerably depends on weather conditions, several PV forecast models have used the past weather data recorded or meteorological information from numerical weather prediction (Yang et al. 2014; Du 2019). Among the information, solar irradiance and cloud coverage have been commonly-used variables for PV forecast (Lorenz et al. 2009), and the authors of Kanwal et al. (2018) formulated a power conversion curve from solar irradiance. In Eom et al. (2020), Kim et al. (2019), correlations among variables were analyzed, and then only highly correlated variables were used for regression models. Note that the accuracy of these forecast models using only meteorological data is affected by weather information prediction. Nevertheless, most weather information sites do not accurately forecast solar irradiation or cloud coverage and amount as numerical values.

In contrast, statistical models can forecast PV generation from the past power generation trends without weather information. The well-known statistical forecast models include the persistence, autoregressive moving average (ARMA) (Atique et al. 2019; Colak et al. 2015), artificial neural network (ANN) (Methaprayoon et al. 2007), support vector machine (SVM) (Shi et al. 2012), and long short-term memory (LSTM) (Zhou et al. 2019) models. The persistence and ARMA models are traditional linear models, while the ANN and SVM models can learn nonlinear characteristics from data. The prediction performance of linear models may substantially degrade when data suddenly fluctuate, whereas nonlinear models are robust to the fluctuation. The LSTM model is a time-series model based on deep learning (Hochreiter and Schmidhuber 1997), which has been widely used in voice recognition, stock price prediction, and power generation forecasting. Statistical models have outstanding performance in the short-term forecast but yield continuous errors as prediction periods increase because they only follow power generation trends. Even worse, the models are highly vulnerable to external disturbances that cause power generation fluctuation. For example, if a cloud casts a shadow on PV panels and then moves away, unspecific fluctuation may occur in power generation. This fluctuation is not reflected in generation trends and often results in significant errors. In the end, hybrid models that reflect both external factors and trends have been studied (Mosaico and Saviozzi 2019). For example, an autoregressive integrated moving average exogenous (ARIMAX) model fundamentally uses the previous time-series data and simultaneously involves external inputs in consideration of variability.

Despite the methodological improvement of prediction models, PV forecast performance still needs to depend on the accurate prediction of irradiation and cloud amount. In Ryu et al. (2019), sky images, captured by a camera on a rooftop, and the past global horizontal irradiance (GHI) values were used to predict short-term irradiance through a convolutional neural network (CNN). The authors of Wood-Bradley et al. (2012) predicted cloud movement using a cloud and optical flow tracking algorithm based on an individual ground-based sky camera. However, these methods based on sky cameras on the ground afford only ultra-short-term predictions because sky conditions can only be captured in the surrounding region.

Unlike sky images captured on the ground, satellite images can cover geographically large areas and provide irradiance information (Catalina et al. 2020). In Marchesoni-Acland et al. (2019), an ARMA-based GHI forecasting model using the regional short-term local variability index and satellite images as exogenous inputs was proposed, and the use of satellite images improved prediction performance. However, these methods were based only on past satellite images and did not fully account for the fact that cloud movement patterns keep changing over time. Ultimately, PV forecasting requires predicting the future movement of clouds.

The prediction of future satellite images has not been widely attempted because it requires significant computation. To overcome this, the authors of Hong et al. (2017) reduced image dimensions via a convolutional sequence-to-sequence auto-encoder model. In Cheng et al. (2022), the authors designed an auto-encoder framework to process multiple satellite images and suggested shifting receptive attention to dynamic regions of interest to avoid complex computations. The authors of Si et al. (2021) extracted cloud cover factors from visible satellite images using a modified CNN and combined meteorological data with the cloud cover factors to develop a GHI forecasting model. Si et al. (2021) utilized the convolutional LSTM model to extract features of satellite images and applied them to PV power forecasting. Schulte et al. (2021) suggested an ANN-based forecast model for training cloud motion vectors and predicting future cloud positions. Furthermore, Xu et al. (2019) proposed a satellite image forecast model that can efficiently learn the characteristics of satellite images using a generative adversarial network (GAN)-LSTM, which can generate more precise images than auto-encoder models. However, Xu et al. (2019) did not fully utilize the image prediction model for PV generation forecast. The authors of Moskolaï et al. (2021) investigated many proposed DL-based methods for satellite image prediction based on CNN, MLP, and LSTM. In addition, Moskolaï et al. (2021) reviewed hybrid models, such as GAN-LSTM, ConvLSTM-AE, and CNN-LSTM, in various prediction domains. However, LSTM-GAN has not yet been proposed as a PV forecasting model and can be considered a state-of-the-art method. In other words, the accuracy of image prediction might not guarantee that of the PV generation forecast without further investigation.

3 Deep learning algorithm

This section briefly describes deep learning algorithms that the proposed forecast model uses: deep convolutional GAN (DCGAN), convolutional LSTM, and multivariate-LSTM.

3.1 DCGAN

The GAN is a deep learning model that can learn a specific data distribution and create new data from the same distribution. Figure 1 depicts the GAN training process and shows that the GAN consists of two neural networks: the generator and the discriminator. The generator aims to create (or imitate) data with the same structure as the training data from a random vector, while the role of the discriminator is to correctly determine whether input data are real or generated (i.e., imitated) (Radford et al. 2016). The DCGAN is an extension of the GAN that uses convolutional and deconvolutional layers, thereby improving the stability of GAN learning. Furthermore, this structure can reduce the data complexity of image processing by using vector arithmetic operations. In addition, because unsupervised learning does not require labeling, DCGAN is easy to link with other models. In DCGAN, changes in the input are also reflected in the corresponding output. For example, when an input vector changes slightly and linearly, the created image changes in a corresponding pattern.

Fig. 1 Scheme of the DCGAN training process

In Fig. 1, x is the image in training data, and \(z \in R^{|z|}\) is the latent vector. \(G(\cdot )\) is the generator function, which maps z to the two-dimensional space, and G(z) indicates the generated image. \(D(\cdot )\) is the discriminator function, which judges whether the argument image is real or generated (i.e., fake). For example, D(x) outputs the probability that x is from training data, while D(G(z)) represents the probability that the generated image is real. In general, the GAN objective function can be written as follows:

$$\begin{aligned} \min _{G} \max _{D} V(D,G) = E_{x \sim p_{data}(x)} \left[ \log D_{conv}(x) \right] \nonumber \\ + E_{z \sim p_z(z)} \left[ \log \left\{ 1 - D_{conv} \left( G_{deconv} (z) \right) \right\} \right] , \end{aligned}$$
(1)

where \(E[\cdot ]\) refers to the expected value, and \(p_{data}(x)\) and \(p_z(z)\) are the probability density functions of x and z, respectively. The subscripts conv and deconv indicate convolution and deconvolution, respectively, as described in Fig. 2.
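
As a rough illustration of how the value function in (1) is evaluated in practice, the sketch below estimates V(D, G) from batches of discriminator outputs by replacing the expectations with sample means. It assumes the standard sign convention, in which the discriminator maximizes E[log D(x)] + E[log{1 - D(G(z))}] and the generator minimizes it. The function name `gan_value` and the probabilities are illustrative assumptions, not values from the experiments.

```python
import math


def gan_value(d_real, d_fake):
    """Sample-mean estimate of V(D, G) (hypothetical helper).

    d_real: discriminator outputs D(x) for real images, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) for generated images, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Illustrative probabilities only (assumed, not measured).
# A discriminator that cannot separate real from fake outputs 0.5 everywhere.
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])
# A confident, correct discriminator pushes V towards 0 from below.
v_sharp = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator update raises this value, while the generator update lowers it by making D(G(z)) larger.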

Fig. 2 Structures of the generator and discriminator networks. TransConv indicates transposed convolution

In convolution, the pixel at the ith row and jth column in the filtered image, \(G_{ij}\), can be processed by (2).

$$\begin{aligned} G_{ij}&= (F * X)(i,j) \nonumber \\&= \sum _{x=0}^{F_H-1} \sum _{y=0}^{F_W-1} F(x,y) \cdot X(i-x,j-y) , \end{aligned}$$
(2)

where F and X indicate the filter and original image, respectively, and the subscripts H and W denote the height and width, respectively. Then, the output size can be obtained as follows:

$$\begin{aligned}&(O_H, O_W) = \nonumber \\&\left( \frac{X_H + 2P - F_H}{S} + 1, \frac{X_W + 2P - F_W}{S} + 1 \right) , \end{aligned}$$
(3)

where the stride S determines the movement size of the filter, and the padding P adjusts the spatial size of the output.
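
The following sketch shows how (2) and (3) might be evaluated for a small example. It is a minimal illustration under stated assumptions: the function names are hypothetical, floor division is assumed when the stride does not divide evenly (the source does not specify this), and the convolution is restricted to positions where the filter fully overlaps the image.

```python
def conv_output_size(x_h, x_w, f_h, f_w, stride=1, padding=0):
    """Output height and width from Eq. (3); floor division is an assumption."""
    o_h = (x_h + 2 * padding - f_h) // stride + 1
    o_w = (x_w + 2 * padding - f_w) // stride + 1
    return o_h, o_w


def conv2d_valid(image, kernel):
    """Eq. (2) with stride 1 and no padding, on nested lists (hypothetical helper)."""
    f_h, f_w = len(kernel), len(kernel[0])
    o_h, o_w = conv_output_size(len(image), len(image[0]), f_h, f_w)
    out = [[0.0] * o_w for _ in range(o_h)]
    for i in range(o_h):
        for j in range(o_w):
            for x in range(f_h):
                for y in range(f_w):
                    # True convolution flips the filter: the index X(i - x, j - y)
                    # is shifted by (f_h - 1, f_w - 1) to stay inside the image.
                    out[i][j] += kernel[x][y] * image[i + f_h - 1 - x][j + f_w - 1 - y]
    return out


# Assumed layer settings for illustration: 4x4 filter, stride 2, padding 1
# halves a 128-by-128 image.
halved = conv_output_size(128, 128, 4, 4, stride=2, padding=1)

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
shifted = conv2d_valid(image, [[1.0, 0.0], [0.0, 0.0]])
```

With these assumed settings, each strided layer halves the spatial size, which is how a discriminator can reduce an image to a small feature map in a few layers.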

3.2 Convolutional LSTM

Figure 3 shows the structure of a general LSTM cell, whose operations are expressed as follows:

$$\begin{aligned} f_t&= \sigma _g (W_f x_t + U_f h_{t-1} + b_f), \end{aligned}$$
(4)
$$\begin{aligned} i_t&= \sigma _g (W_i x_t + U_i h_{t-1} + b_i), \end{aligned}$$
(5)
$$\begin{aligned} o_t&= \sigma _g (W_o x_t + U_o h_{t-1} + b_o), \end{aligned}$$
(6)
$$\begin{aligned} c_t'&= \tanh (W_c x_t + U_c h_{t-1} + b_c), \end{aligned}$$
(7)
$$\begin{aligned} c_t&= f_t \circ c_{t-1} + i_t \circ c_t', \end{aligned}$$
(8)
$$\begin{aligned} h_t&= o_t \circ \tanh (c_t) , \end{aligned}$$
(9)

where \(x_t\) is the input vector; \(f_t\), \(i_t\), and \(o_t\) are the activation vectors of the forget, input, and output gates, respectively; \(h_t\) is the output vector; \(c_t'\) is the cell input activation vector; \(c_t\) is the cell state vector; the subscript t is the time step; W and U are the input and recurrent weight matrices, respectively; b is the bias; the subscripts f, i, o, and c denote the forget, input, output, and cell, respectively; \(\sigma _g(\cdot )\) is the sigmoid function; and the operator \(\circ \) indicates the element-wise product.

Fig. 3 Structure of an LSTM cell
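
To make the gate operations in (4) to (9) concrete, the sketch below performs a single LSTM time step on plain Python lists. It assumes separate input and recurrent weight matrices (named W and U here), as in the standard LSTM formulation. The function names and the all-zero parameters are illustrative assumptions and are unrelated to the trained models in this paper.

```python
import math


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def affine(gate, x, h):
    """Compute W x + U h + b for one gate; gate = (W, U, b) as nested lists."""
    W, U, b = gate
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'o', 'c' to (W, U, b) (hypothetical layout)."""
    f_t = sigmoid(affine(params["f"], x_t, h_prev))                   # forget gate, Eq. (4)
    i_t = sigmoid(affine(params["i"], x_t, h_prev))                   # input gate, Eq. (5)
    o_t = sigmoid(affine(params["o"], x_t, h_prev))                   # output gate, Eq. (6)
    c_in = [math.tanh(v) for v in affine(params["c"], x_t, h_prev)]   # Eq. (7)
    c_t = [f * c + i * ci for f, c, i, ci in zip(f_t, c_prev, i_t, c_in)]  # Eq. (8)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                # Eq. (9)
    return h_t, c_t


# Toy check with assumed all-zero parameters: every gate equals 0.5 and the
# cell input is 0, so the cell state is simply halved.
zero_gate = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero_gate for name in "fioc"}
h_t, c_t = lstm_step([1.0, -1.0], [0.0], [1.0], params)
```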

In a general LSTM model, a slight increase of the input image size excessively increases the number of parameters, which might hinder proper learning because of computational complexity and limited resources. This problem can be solved by a convolutional LSTM model that learns a temporal correlation of successive images (Marchesoni-Acland et al. 2019). The convolutional LSTM is a model that combines CNN and LSTM. While CNN reduces data dimension and finds data features, LSTM learns time-series characteristics of data. As shown in Fig. 4, feature vectors with a preferred size can be extracted from images through convolution operations. Finally, from successive time-series satellite images, a single feature vector \(v_t\) can be obtained as follows:

$$\begin{aligned} v_t = f_{c-lstm} (\tilde{x}_{t-n+1}, \tilde{x}_{t-n+2}, \cdots , \tilde{x}_{t}) , \end{aligned}$$
(10)

where \(f_{c-lstm}(\cdot )\) is the convolutional LSTM model, \(\tilde{x}\) are the input images, and n is the number of input images.

Fig. 4 Structure of the convolutional LSTM

3.3 Multivariate-LSTM

Since a univariate-LSTM-based forecast model that uses only historical data is vulnerable to external disturbances, this work employs a multivariate-LSTM that reflects trends of external variables for PV generation forecast. Before training, min-max normalization of multiple inputs is necessary for preventing biased learning due to relative volume differences among inputs. Figure  5 describes the structure of the multivariate-LSTM used in this work, which finally outputs a single forecasted value as follows:

$$\begin{aligned} \hat{y}_{t+1} = f_{m-lstm} (u_{1,t}, u_{2,t}, \cdots , u_{k,t}) , \end{aligned}$$
(11)

where \(\hat{y}_{t+1}\) is the output, \(f_{m-lstm}(\cdot )\) is the multivariate-LSTM model, u are the inputs, and k is the number of inputs.
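
The min-max normalization mentioned above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the helper name and the sample values are assumptions, and a constant series is mapped to zeros to avoid division by zero.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; a constant series maps to zeros (assumed convention)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


# Illustrative values only: temperature in degrees Celsius and power in kW
# differ in scale by orders of magnitude before normalization.
temperature = [2.0, 6.0, 10.0]
power_kw = [0.0, 3650.0, 7300.0]

# Each time step becomes one multivariate input (u_1, u_2) on a common scale.
inputs = list(zip(min_max_normalize(temperature), min_max_normalize(power_kw)))
```

In a train/test setting, the minimum and maximum would usually be taken from the training split only and reused for the test split.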

Fig. 5 Structure of the multivariate-LSTM

4 Proposed hybrid forecast model

From time-series satellite images, one can extract cloud amount and movement, which are useful information for PV forecast. To that end, the paper proposes a hybrid forecast model that can imitate cloud shapes and patterns and predict cloud movement. The model consists of three parts as follows:

  • a DCGAN model that creates cloud images

  • an LSTM-GAN model that forecasts cloud images

  • a multivariate-LSTM model that forecasts PV generations

Figure 6 illustrates the overview and process of the proposed hybrid forecast model. As presented in the figure, the first step is to generate cloud images using the DCGAN, which is specialized for image learning through convolution layers. The LSTM–GAN model in Step 2 predicts future satellite images by combining time-series and image generation components. Finally, the multivariate-LSTM model in Step 3 converts the predicted cloud images to numerical values of cloud cover, which are then used to forecast PV generation along with weather information and historical PV generation data. The following subsections describe the details of each step.

Fig. 6 Structure of the proposed hybrid model

4.1 Cloud image creation model based on DCGAN

Image patterns are difficult to identify owing to large data size and complexity, so learning them for precise image imitation has been challenging, and existing image-related techniques have created broken or blurred images. In contrast, DCGAN can express a large image with a small latent vector, like a single point inside a specific space, thereby significantly reducing the computational and memory resources used for learning. In this study, the dimension of the original satellite images was reduced through bicubic interpolation to reduce computational complexity and improve experimental efficiency. The resized images are stored in satellite image archives, which provide minibatches of images in sequence to the discriminator as real data inputs, as described in Fig. 6. After many rounds of generating and discriminating images, the performance of the generator and discriminator improves, and DCGAN can eventually create cloud images that look like those in real satellite images. The detailed DCGAN training procedure is described in Algorithm 1.

figure a

4.2 Cloud image forecast model based on LSTM–GAN

The previous step yields a trained generator that imitates real satellite images from random latent vectors, and this step uses the generator for future cloud image prediction. To that end, one needs to identify how a latent vector is correlated with cloud image patterns and to predict future latent vectors from the previous time-series ones. However, if the latent vector dimension is large, it is practically impossible for humans to characterize the correlation between the vector and image patterns. Hence, as described in Fig. 6, this study combines the previously trained DCGAN generator with the convolutional LSTM model, whose role is to predict future latent vectors from time-series image inputs by inherently learning to produce a latent vector corresponding to specific features of cloud images. Then, the DCGAN generator creates a future satellite image from the predicted latent vector. As a result, the combined LSTM–GAN model works as a black-box model that can produce a future image from time-series input images.

Algorithm 2 presents the detailed training process of the LSTM–GAN model. During the process, the loss function that evaluates the similarity between a predicted image and a real one is the mean squared error (MSE) between them, which is used to update the LSTM–GAN model parameters. After sufficient iterations, the LSTM–GAN model can identify patterns of cloud shape and movement and predict future cloud images, which are then used for an accurate PV forecast in the next step.

figure b

4.3 PV forecast model based on multivariate-LSTM

Finally, predicted cloud images should be converted to numerical values, which are then used as inputs to the PV generation forecast model. This numerical conversion first requires finding the PV generation location (specifically, a pixel) in an image and then obtaining the corresponding pixel value. Note that pixels are represented by normalized gray-scale values ranging from zero to one. Because clouds appear white and land appears black in infrared satellite images, a higher pixel value indicates more clouds. If the image resolution is so high that a PV location spans several pixels, a single representative value should be computed through a suitable method such as simple averaging, weight assignment, or filter convolution. In this study, a single image pixel contains the PV site; therefore, the normalized gray-scale value of that pixel is selected.
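
The pixel-to-number conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the toy 3-by-3 image, and the square averaging window are hypothetical. A radius of zero corresponds to the single-pixel selection used in this study, while a larger radius illustrates the simple-averaging option for a PV site that spans several pixels.

```python
def cloud_cover_at(image, row, col, radius=0):
    """Mean normalized gray-scale value around (row, col), clipped at the image border.

    image: nested list of values in [0, 1], where higher means more cloud.
    radius=0 returns the single pixel; radius>0 averages a square window (assumed option).
    """
    rows = range(max(0, row - radius), min(len(image), row + radius + 1))
    cols = range(max(0, col - radius), min(len(image[0]), col + radius + 1))
    values = [image[r][c] for r in rows for c in cols]
    return sum(values) / len(values)


# Toy image with assumed values; the PV site is taken to be the centre pixel.
image = [
    [0.0, 0.2, 0.4],
    [0.6, 0.8, 1.0],
    [0.1, 0.3, 0.5],
]
single_pixel = cloud_cover_at(image, 1, 1)
window_mean = cloud_cover_at(image, 1, 1, radius=1)
```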

As shown in Fig. 6, PV generation prediction is implemented by the multivariate-LSTM model, which is advantageous in time-series data analysis with long-period dependence and provides better accuracy with more data. The model can learn correlations between meteorological data and steadily respond to climate changes by solving a vanishing gradient problem in a recurrent neural network. More importantly, various external variables, including weather data, historical PV generation data, and numerical cloud data, are employed to improve prediction accuracy and robustness to external disturbances. Algorithm 3 describes the multivariate-LSTM training process of the proposed PV generation forecast method.

figure c

5 Simulation result

To test the performance of the proposed PV generation forecast, Haenam, the southernmost area of the Korean Peninsula, was selected. The satellite cloud images of the East Asia region, meteorological data in Haenam, and power generation data from Haenam PV stations were used for the test. The total capacity of the stations was 7.3 MW, and the forecast horizons of image prediction and PV forecast were three hours. The experiment was conducted using MATLAB R2020b on a Windows 10-based workstation with a 3.9 GHz Intel Xeon CPU, 64 GB of memory, and two Nvidia Quadro RTX 5000 graphics cards. The hyperparameters of DCGAN, LSTM–GAN, and multivariate-LSTM are presented in the Appendix.

5.1 Dataset description

The dataset can be divided into two sets: infrared satellite images for cloud image prediction, and historical weather and PV generation data for PV forecast.

5.1.1 Infrared satellite images

Infrared images are suitable for the proposed image prediction method since cloud shapes are visible even at night in those images. A total of 30,507 infrared images taken every 15 min during 2018 by the Communication, Ocean, and Meteorological Satellite 1 of the National Meteorological Satellite Center of Korea were collected. Among them, 80% and 20% were used for training and testing, respectively. The original image size was 1300-by-1500, but using images of this size as inputs can cause system overload and errors due to insufficient RAM capacity. Although the minibatch size was decreased to four to make the system work without any problems, the resulting images showed low quality and diversity. Therefore, the image size was reduced to 128-by-128 for training efficiency.

5.1.2 Historical meteorological and PV generation data

The Korea Meteorological Administration (KMA) provides various meteorological observation data, of which temperature and humidity data were used for the proposed PV generation forecast model because of their high correlation with solar irradiance. Hourly data collected from October 21, 2018, to December 31, 2018, were used, divided into 60% training data and 40% testing data.

5.2 Prediction scheme

The prediction scheme follows multiple multi-step forecasting, which consists of separate forecast models for different forecast horizons. This scheme is advantageous for maintaining a constant prediction accuracy because the models are independent, and errors are not accumulated. For example, Fig. 7 illustrates three forecast models for consecutive future values, which are expressed as follows:

$$\begin{aligned} \hat{f}_{t+1}&= \arg \max _{f_{t+1}} p(f_{t+1} \vert \bar{d}_{t-n+1}, \bar{d}_{t-n+2}, \cdots , \bar{d}_t), \end{aligned}$$
(12)
$$\begin{aligned} \hat{f}_{t+2}&= \arg \max _{f_{t+2}} p(f_{t+2} \vert \bar{d}_{t-n+1}, \bar{d}_{t-n+2}, \cdots , \bar{d}_t), \end{aligned}$$
(13)
$$\begin{aligned} \hat{f}_{t+3}&= \arg \max _{f_{t+3}} p(f_{t+3} \vert \bar{d}_{t-n+1}, \bar{d}_{t-n+2}, \cdots , \bar{d}_t), \end{aligned}$$
(14)

where \(\hat{f}\) is the predicted image, f is the future image, \(\bar{d}\) is the set of input images, and n is the input window size.
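
The idea behind (12) to (14), one independent model per horizon, can be sketched with a deliberately simple stand-in. The example below is an assumption-laden toy: it replaces the image models with a scalar least-squares slope per horizon and uses only the last observed value instead of a window of n images. The function names and the series are hypothetical.

```python
def fit_direct_models(series, horizons=(1, 2, 3)):
    """Fit one independent model per horizon h: y[t + h] is approximated by a_h * y[t].

    Each slope a_h is a one-parameter least-squares fit (toy stand-in for a
    separate forecast model per horizon).
    """
    models = {}
    for h in horizons:
        xs = series[:-h]   # inputs observed at time t
        ys = series[h:]    # targets observed at time t + h
        models[h] = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return models


def forecast(models, last_value):
    """Each horizon is predicted directly from the same input, so no prediction
    is fed back into another model and errors do not accumulate."""
    return {h: a * last_value for h, a in models.items()}


# Assumed toy series that doubles at every step.
series = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
models = fit_direct_models(series)
predictions = forecast(models, series[-1])
```

A recursive scheme would instead reuse a single one-step model three times, feeding each prediction back as input, which is where accumulated error arises.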

Fig. 7 Scheme of the multiple multi-step forecast method

5.3 Performance evaluation of satellite image generation

The DCGAN performance can be evaluated by comparing the generated and real images. During training, DCGAN compares an image created by the generator with a randomly selected real image and then updates the internal model parameters. Figure 8 shows real images chosen randomly from the test data alongside images created by the DCGAN model at different epochs, illustrating how the model learns to create realistic cloud images as training proceeds. After the first epoch, the created images looked like noise, and the cloud shapes became clearer as the number of epochs increased. At around 30 epochs, DCGAN simulated the cloud form well. At 50 epochs, the created clouds were visually closer to real ones. Therefore, the model at epoch 50 was chosen as the generating model. Figure 9 shows results obtained under the same conditions except for the minibatch size. At a minibatch size of 16, most images looked similar and appeared cracked. Various minibatch sizes were tested to improve the generating model, and a minibatch size of 128 showed the best performance. When the minibatch size exceeded 128, the system sometimes suffered memory errors and long computing times.

Fig. 8 Comparison among results at different epochs: 1, 3, 5, 15, 30, and 50

Fig. 9 Comparison between results with different minibatch sizes

Precise evaluation of GAN performance is somewhat difficult since there is no general and strict criterion for the evaluation. In this context, a human subjectively assesses whether created images are acceptable or not or verifies whether image patterns change smoothly and linearly when a latent vector is linearly changed. For example, the image in the red circled area varied smoothly according to the linear change of the latent vector shown in Fig. 10. Evaluating the performance of the latent vectors generated by LSTM is difficult because LSTM-GAN is a black-box model that produces information without revealing any information about its internal workings. However, it is possible to indirectly evaluate the performance of LSTM by reviewing the distribution of the latent vectors. Figure  11 presents the distribution of 64-dimensional latent vectors generated by LSTM, and LSTM performs sufficiently well in creating latent vectors with various values of each dimension. Therefore, a 128-minibatch DCGAN model at 50 epochs is used for the LSTM–GAN model in the end.
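
The smoothness check described above, in which one latent dimension is changed linearly while the others are held fixed, can be sketched as follows. This is only an illustration of how such a sweep might be constructed: the function name, the sweep range, and the zero base vector are assumptions, and the trained generator that would turn each vector into an image is not included.

```python
def latent_sweep(z, dim, start, stop, steps):
    """Return copies of z in which element `dim` moves linearly from start to stop.

    Requires steps >= 2. All other elements are left unchanged.
    """
    sweep = []
    for k in range(steps):
        z_k = list(z)
        z_k[dim] = start + (stop - start) * k / (steps - 1)
        sweep.append(z_k)
    return sweep


# Assumed setup: a 64-dimensional latent vector whose 32nd element
# (index 31) is swept over an illustrative range of [-1, 1].
base = [0.0] * 64
sweep = latent_sweep(base, 31, -1.0, 1.0, 5)
swept_values = [z_k[31] for z_k in sweep]
```

Feeding each vector in `sweep` to the generator and viewing the outputs side by side would show whether the image changes gradually.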

Fig. 10 Smooth variation of the image according to the gradual change of the 32nd space of the latent vector Z

Fig. 11 Latent vectors with various values output from the LSTM

5.4 Performance evaluation of satellite image prediction

Figure 12 presents three examples comparing the three-hour-ahead predicted image with the real image at the same time. The figure shows that the predicted and real images are not identical. Nevertheless, the cloud shapes, movement patterns, appearance, and disappearance are similar, and thus this image prediction model can reasonably be used for PV generation forecast with the multivariate-LSTM model.

Fig. 12 Comparison between three-hour-ahead predicted images and real ones

5.5 Performance evaluation of PV generation forecast

This section evaluates the performance of the 3-h-ahead PV generation forecast model for a week and a month. As a performance metric to measure the forecast accuracy, the root MSE (RMSE) was used as follows:

$$\begin{aligned} RMSE = \sqrt{\frac{1}{n_s} \sum _{t=1}^{n_s} \left( y_{t,predict} - y_{t,target} \right) ^2} , \end{aligned}$$
(15)

where \(y_{t,predict}\) and \(y_{t,target}\) denote the forecasted and actual PV generations, respectively, and \(n_s\) is the total number of steps tested.
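
For reference, (15) can be computed as in the short sketch below. The function name and the sample values are illustrative assumptions and do not correspond to the reported results.

```python
import math


def rmse(predicted, target):
    """Root mean squared error over n_s paired steps, as in Eq. (15)."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: a constant error of 2 gives an RMSE of 2.
example = rmse([2.0, 2.0], [0.0, 0.0])
```

Because the errors are squared before averaging, a few large misses (for example, when a cloud passes over the panels) raise the RMSE more than many small ones.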

Table 1 presents the RMSEs for four combinations of input variables over one month. Note that weather data indicate temperature and humidity. To compare the accuracy and performance of solar power generation prediction, CNN-ANN, CNN-LSTM, LSTM-GAN, GRU-GAN, and BILSTM-GAN were tested. Figures 13 and 14 show the three-day simulation results of the five models. Figure 13 compares the measurements and forecasted values of PV power, and Fig. 14 shows the absolute errors between the measurements and forecasted values. These methods extract features and predict solar radiation using satellite images and weather data. In this work, LSTM–GAN is proposed as a new prediction methodology, and GRU–GAN and BILSTM–GAN are added to diversify the comparison group. Note that neither GRU–GAN nor BILSTM–GAN has yet been applied to cloud or solar radiation forecasting. Although the GAN-based models differ little in performance, LSTM-GAN performs best. Moreover, the accuracy of LSTM–GAN is much higher than that of the CNN-based models, CNN–ANN and CNN–LSTM.

Table 1 Error Comparison in Various Combinations of Input Variables (one month)

Fig. 13 3h-ahead forecast comparison (three days). Input data are hour, temperature, humidity, satellite image, and power

Fig. 14 Absolute errors of 3h-ahead forecast (three days)

For all the five models, the forecast accuracy is highest when historical power data, hour information, and satellite images, which are converted to numerical values, are used as inputs. Moreover, the input combinations with satellite images yield more accurate results than those without the images. Figures  15, 16, 17 and 18 represent the weekly simulation results with four different combination inputs, showing forecast and observed values and their absolute errors.

Fig. 15 Forecast results during a week when the input combinations are hour information and historical power data

Fig. 16 Forecast results during a week when the input combinations are hour information, weather data, and historical power data

Fig. 17 Forecast results during a week when the input combinations are hour information, satellite image information, and historical power data

Fig. 18 Forecast results during a week when the input combinations are hour information, weather data, satellite image information, and historical power data

6 Conclusion

This study designs an innovative hybrid model that provides a new type of data for PV generation prediction. The model combines two deep learning models, DCGAN and convolutional LSTM, to predict future satellite images, which are converted to numerical values and then used for PV generation forecast based on multivariate-LSTM. The proposed forecast method achieves better accuracy than existing forecast models that do not use satellite images.

The model can be used in various ways in practice. First, the primary purpose of PV forecasting is to provide precise information to power system operators and utilities. A warning system based on the industrial internet of things (IIoT) can check for abnormal status of the PV plant from the difference between the forecasted PV generation and the actual PV output. Second, installing IIoT devices in a plant can increase the accuracy of the forecasting model because the devices can provide data that are highly correlated with the plant. In addition, using federated learning to update the parameters of the forecasting model may help decrease system overload and improve the performance of the forecasting model, even though the total amount of data increases. For example, dividing the learning across devices can reduce the load on a single computing device and allow more diverse cloud images to be learned. Cloud images provided by cell phones or individual measurements on the ground will also be available as new input data. This federated learning approach for handling a large amount of data will be studied further. Furthermore, the current version of the method cannot yet predict exact cloud patterns or locations with high resolution. This limitation can be addressed in further research by using a longer period of data, higher image resolution, and an ensemble of other types of satellite images such as visible light images.