1 Introduction

Energy is a crucial factor for a country's economy and for the survival of all living beings. It is a primary driver of sustainable economic growth and is essential for a country's development. Energy demand has recently been increasing with industrialization, technological development, rapid population growth, and competition among countries to reduce their external dependence. To meet this demand, fossil fuels have been widely used all over the world for decades. However, they have lost popularity because of global warming and the critical depletion of fossil fuel reserves. Renewable energy offers an effective solution to these issues. Renewable sources such as sun, wind, and wave energy are an important alternative because they are natural and will not be exhausted in the foreseeable future. Hence, the use of alternative energy sources alongside existing ones is of great importance for countries and people in order to obtain energy with maximum efficiency. Among renewable resources, solar energy is the most important in terms of available energy potential in most parts of the world. It is one of the most effective and simplest ways to produce clean, sustainable, cheap and safe energy. PV panels generate electric power from solar energy by using solar cells that capture sunlight and convert it directly into electricity. They provide clean, cheap energy with no harmful greenhouse gas emissions during electricity generation. The increasing popularity of this source has led to new studies in the solar energy area. PV is used in grid-connected or stand-alone systems to meet the energy demands of buildings, industry, and agriculture. One of the main areas of study is how to efficiently provide increased reliability and regular output power in PV panels.
This is because solar PV panels require direct access to sunlight, but clouds and environmental factors such as dust, humidity, snow, and rain prevent constant power production and limit the maximum output power of PV panels. The output power of a PV panel depends strongly on solar radiation, which is not constant over time. Although solar cells absorb 80% of solar radiation, they convert only a small portion of it to electricity. The remaining portion overheats the photovoltaic cells, which decreases the electrical efficiency of the panel. Each one-degree increase in temperature reduces electrical efficiency by 0.4–0.5% [1]. Therefore, accurate modeling and estimation of solar radiation and temperature are necessary to improve control and reduce the negative impacts of PV power plants. Developing more reliable algorithms requires modeling, optimization and forecasting of hourly solar radiance and ambient temperature. Forecasting of the power generated by a PV power plant is classified into four groups according to its forecasting time scale, as shown in Fig. 1.

Fig. 1 Classification of solar PV generated power forecasting types based on forecasting time scale [2]

PV power forecasting is used by grid operators to predict and balance energy generation and consumption; it is therefore important for reducing penalties, maintaining grid stability, reliability and security of supply, and lowering maintenance. Forecasting time intervals are classified into four groups.

Long-term forecasting ranges between one month and one year. It is generally used for operational security and energy planning.

Medium-term forecasting ranges from one week to one month. It supports the scheduling of monthly maintenance of PV plants by forecasting the availability of the generated electrical power.

Short-term forecasting covers timescales of one hour, one day or seven days. It is notably helpful for accurate planning of electrical transmission, distribution, and generation. Moreover, it plays a critical role in enhancing system stability, reliability and operation.

Very short-term forecasting covers timescales of a few seconds, one minute or several minutes (less than 1 hour). It uses small-scale data to obtain optimum forecasting, but it is not preferred because it is unnecessary and makes the estimation process in a PV power plant challenging.

Accurate forecasting of the power from a PV power plant mitigates the effect of PV panel output power uncertainty and improves system reliability. It is also becoming more important and interesting because of the increasing penetration of PV power in many areas. Instantaneous fluctuations in temperature and solar radiance occur, mainly because of clouds. Consequently, using real short-term solar parameters such as solar radiation and temperature makes it possible to obtain more accurate results, so that the stability, reliability and security of the PV system are increased. In this study, real data from a 1 MW PV power plant in Turkey are used for forecasting, and the short-term generated power of the plant is estimated with an Artificial Neural Network (ANN). The primary contribution of this paper is to provide useful information for academics and professionals who are interested in modeling and planning of PV power plants. Following a literature survey, Table 1 presents the forecasting and optimization methods used for solar parameters such as PV generated power, solar radiation, and temperature. Different methods are used to forecast solar energy parameters over different time scales; these methods are classified in Table 1 and discussed in detail. Finally, ANN, ANN-PSO and ANN-FA are selected for performance analysis, discussed in detail, and compared through simulation results.

Fig. 2 Classification of solar forecasting methods and models based on historical data [54]

Table 1 Various solar forecasting methods

2 Solar Forecasting Methods

Researchers have worked on solar power forecasting with different methods in order to obtain higher accuracy. The methods used to forecast PV parameters or meteorological data, shown in Fig. 2, can be categorized into three groups based on historical data and meteorological variables, and are presented in detail below.

2.1 Physical Methods

Numerical Weather Prediction (NWP) is a method of weather forecasting. It determines a future state of the atmosphere by mathematically solving the equations that express changes in the variables describing that state (temperature, wind, humidity, and pressure) [53].

The Sky Imagery Forecasting Method (SIFM) is used to detect clouds and to estimate their behavior and attitude. This method generates high-resolution images of the sky from horizon to horizon. It is generally used for short-term estimation of the power output of a PV plant. Satellite imaging is a relatively similar method [53].

Satellite Image Methods (SIM) allow cloud motion to be traced in order to forecast future cloud movement. This forecasting method presents an effective way to forecast very short-term radiance. However, it delivers less effective performance when clouds are rapidly forming or dissipating [53].

2.2 Statistical Methods

Statistical methods are time series analyses that work with historical data recorded at regular time periods or intervals. They are a useful data-driven approach that is able to forecast the future behavior of a power plant. This section gives brief information on widely used methods [7].

The Support Vector Machine (SVM) method is used for forecasting, classification, regression analysis and anomaly detection. It is founded on statistical and mathematical theory to achieve accurate forecasting. It also handles nonlinear problems and solves complex computational problems.

The Wavelet Analysis (WA) method is a useful way to suppress noise in real-time input datasets before the forecasting method is applied, and thus improves the reliability of estimation. It is effective for analyzing frequency- and time-dependent data because of its capability of eliminating non-periodic and transient signals.

Fuzzy logic is a data-driven method based on a human-like way of thinking. It can be more useful than other methods when there is a large number of input variables. It is used for forecasting solar radiation and temperature, or as an optimal clustering process. It categorizes several sets of temperature, cloud conditions and meteorological data.

The ANN method is one of the major tools used in machine learning. It is a brain-inspired system based on a learning/training method. An ANN consists of an input layer, hidden layer(s), an output layer, neurons and connections. Every layer includes neurons, and each neuron is linked to neurons in the next layer. ANN is widely used to solve classification and forecasting problems because of the non-linearity of meteorological data, and it is particularly suitable compared with other statistical methods when the data are non-linear.

The AutoRegressive Integrated Moving Average (ARIMA) model can be thought of as an improved form of traditional linear regression. It is a popular forecasting model that uses historical information to make predictions. This type of model is a basic forecasting technique that can be used as a foundation for more complex models.

Kalman filtering is an estimator: a method for estimating the state of systems in many different fields. It mathematically estimates the states of linear systems (systems described by first-order linear equations). This method uses real-time statistical data and provides real-time forecasting of power generation.

Support Vector Regression (SVR) differs from SVM classification in that its output is a real-valued quantity rather than a class label, which makes the prediction more difficult. The main idea is to minimize error and provide accurate forecasting.

The grey forecasting method is another method for predicting the behavior of non-linear time series. It is especially effective when data are insufficient. Its predictions may sometimes be inaccurate, in which case other statistical methods can be more useful.

Hidden Markov models (HMMs) are a common tool for modeling time series data. An HMM seeks to recover the sequence of states from the observed data. HMMs have been used for forecasting applications in recent years because of their flexibility and computational efficiency.

2.3 Hybrid Methods

These methods combine two or more forecasting methods, such as Fuzzy-ANN, ANFIS, ARIMA and SVR, ARIMA and GARCH, NARX, and SVM and GA. The main aim is to improve forecasting accuracy and reduce forecasting error. Hybrid methods can provide better forecasting performance than each individual method [55].

3 Methodology for Solar Power Forecasting

In this study, a Multi-Layer Feed Forward (MLFF) neural network structure, which is a type of Artificial Neural Network (ANN), was used. The MLFF network structure consists of an input layer, a hidden layer, and an output layer. The number of neurons in the input and output layers depends on the problem structure. The number of neurons in the hidden layer is determined by trial and error. Network training is the process of identifying the weight values of the neuron connections in the ANN. Initially, these values are set randomly. The network parameters are then updated in order to get the best performance from the network. In this study, the Firefly Algorithm (FA) and Particle Swarm Optimization (PSO) are applied to train the network coefficients.
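The structure described above can be illustrated with a minimal forward pass. This is only a sketch: the layer sizes in the usage lines, the `tanh` hidden activation, the linear output neuron, and the names `init_network` and `forward` are assumptions made for illustration, since the text does not specify them at this point.

```python
import math
import random


def init_network(n_in, n_hidden, seed=0):
    """Randomly initialise an n_in -> n_hidden -> 1 feed-forward network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_in)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, x):
    """Forward pass: tanh hidden layer (assumed) and one linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


# Illustrative sizes only: 3 inputs and 4 hidden neurons.
net = init_network(n_in=3, n_hidden=4)
y = forward(net, [0.2, 0.5, 0.3])
```

Training then amounts to replacing the random initial values in `net` with values that reduce the prediction error, which is the role played by PSO and FA in this study.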

3.1 Particle Swarm Optimization

The Particle Swarm Optimization (PSO) algorithm is a widely used swarm-based heuristic algorithm. The first studies on PSO were carried out by Kennedy and Eberhart in 1995. The method simulates the food-search behavior of flocks of birds and shoals of fish [56]. Each possible solution in the PSO algorithm is called a particle. The algorithm starts with particles randomly distributed in the solution space. Each particle has a speed and a location. The distance of each particle to the solution (food) is expressed by a function determined by the current position and speed of the particle. After each iteration, the particles update their speeds based on their own experience and that of the swarm, and they move accordingly. When the best solution found by the ith particle up to that point is denoted by \(p_{ibest} \), and the best solution found by all particles up to that point is denoted by \(g_{best} \), the expression used to update the speed values (\(v_{i} \)) of the particles at each iteration is:

$$\begin{aligned} v_{i} (t+1)=w \times v_{i} (t)+c_{1} \times r_{1} \times \left( p_{ibest} -x_{i} (t)\right) +c_{2} \times r_{2} \times \left( g_{best} -x_{i} (t)\right) \end{aligned}$$
(1)

where t is the iteration number, i is the index of the corresponding particle, \(x_{i} \) is the location of the particle, \(r_{1} \) and \(r_{2} \) are random numbers generated in the interval [0, 1], and \(c_{1}\) and \(c_{2}\) are acceleration coefficients, which are generally chosen in the interval [0, 2].

The \(c_{1}\) coefficient pulls the particles toward their local best, whereas the \(c_{2} \) coefficient pulls them toward the global best. The term w is the inertia weight, which provides the balance between the local and global best. It is calculated as follows:

$$\begin{aligned} w_{k} =\left( w_{\min } -w_{\max } \right) \frac{\left( K-k\right) }{K} +w_{\max } \end{aligned}$$
(2)

where \(w_{\max } \) and \(w_{\min } \) are the maximum and minimum values initially determined for the inertia weight, k is the iteration number, and K is the maximum number of iterations. At each iteration step, the positions of the particles are updated as follows:

$$\begin{aligned} x_{i} (t+1)=x_{i} (t)+v_{i} (t+1) \end{aligned}$$
(3)

The flowchart of the PSO algorithm is presented in Fig. 3. In addition, pseudo-code for the PSO algorithm with ANN, expressed in 5 steps, is shown in Fig. 4.

Fig. 3 Flowchart of the conventional PSO algorithm [57]

Fig. 4 Pseudo-code for the particle swarm optimization algorithm with ANN
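The update rules in Eqs. (1) to (3) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the inertia weight from Eq. (2) is applied to the previous velocity (the usual convention), and the values of `w_min`, `w_max`, `c1`, `c2`, the search bounds, and the `sphere` test objective are hypothetical choices. In the ANN setting, `objective` would instead return the training error of a network built from the particle's position.

```python
import random


def inertia_weight(k, K, w_min=0.4, w_max=0.9):
    """Eq. (2) as written: equals w_min at k = 0 and w_max at k = K."""
    return (w_min - w_max) * (K - k) / K + w_max


def pso(objective, dim, n_particles=20, n_iter=300, c1=1.5, c2=1.5,
        bounds=(-1.0, 1.0), seed=0):
    """Minimise `objective` with a basic PSO; returns (best position, best cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]              # personal best positions
    p_cost = [objective(xi) for xi in x]      # personal best costs
    g_idx = min(range(n_particles), key=p_cost.__getitem__)
    g_best, g_cost = p_best[g_idx][:], p_cost[g_idx]

    for k in range(1, n_iter + 1):
        w = inertia_weight(k, n_iter)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (1): inertia + pull to personal best + pull to global best
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                # Eq. (3): position update
                x[i][d] += v[i][d]
            cost = objective(x[i])
            if cost < p_cost[i]:
                p_best[i], p_cost[i] = x[i][:], cost
                if cost < g_cost:
                    g_best, g_cost = x[i][:], cost
    return g_best, g_cost


def sphere(p):
    """Hypothetical test objective: sum of squares, minimum at the origin."""
    return sum(c * c for c in p)


best, cost = pso(sphere, dim=3, n_iter=50)
```

The sketch omits velocity and position clamping, which practical implementations often add to keep particles inside the search bounds.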

3.2 Firefly Algorithm

The Firefly Algorithm (FA) is a swarm-based method inspired by the natural behavior of fireflies and their communication through brightness. Fireflies use their brightness to protect themselves from predators and to attract their prey. FA is selected to optimize the ANN parameters in this study because it has few parameters, is easily adapted to the problem at hand, and does not have a complex structure. In the FA method, there are two important criteria: the change of light intensity and the attractiveness of the firefly (\(\beta \)). The attractiveness of a firefly changes according to its distance to other fireflies. Let \(x_{i} \) and \(x_{j} \) be the positions of the ith and jth fireflies. The distance between these two fireflies (\(r_{ij} \)) is calculated as follows:

$$\begin{aligned} r_{ij} =\left\| x_{i} -x_{j} \right\| \end{aligned}$$
(4)

The attractiveness of the firefly is proportional to the light it emits and it decreases accordingly as the intensity of light decreases. The attractiveness of the firefly (\(\beta \)) is calculated as follows:

$$\begin{aligned} \beta (r)=\beta _{0} e^{-\gamma r^{2} } \end{aligned}$$
(5)

where \(\beta _{0} \) is the maximum attractiveness at \(r=0\) and \(\gamma \) is the light absorption coefficient. The ith firefly moves toward the jth firefly when the latter is more attractive than itself. This position change is as follows:

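$$\begin{aligned} x_{i} =x_{i} +\beta _{0} e^{-\gamma r_{ij}^{2} } \left( x_{j} -x_{i} \right) +\alpha \left( \mathrm{rand}-\frac{1}{2} \right) \end{aligned}$$
(6)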

where rand is a random real number in the interval \([0,\mathrm{\; }1]\) and \(\alpha \) is a randomization parameter. This equation consists of three components: the first represents the current position of the firefly, the second reflects the brightness (attractiveness) of the firefly, and the third represents the random movement of the firefly when there is no brighter firefly around it. The convergence speed of the algorithm and its ability to find the local/global best solution depend on the \(\alpha \), \(\beta \) and \(\gamma \) parameters used in the working steps of the algorithm. Values of \(\beta _{0} =1\) and \(\alpha \in [0,\mathrm{\; }1]\) were used. The \(\gamma \) parameter, which has a large impact on the speed of the algorithm, takes values in \([0.1,\mathrm{\; }10]\) in practical applications [58]. The flowchart is presented in Fig. 5. In addition, pseudo-code for the FA algorithm with ANN, expressed in 5 steps, is shown in Fig. 6.

Fig. 5 Flowchart of FA algorithm

Fig. 6 Pseudo-code for the Firefly algorithm with ANN
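The firefly movement described in this section can be sketched as below. It is a hedged illustration rather than the study's code: the position update uses the commonly cited form \(x_i \leftarrow x_i + \beta(r_{ij})(x_j - x_i) + \alpha(\mathrm{rand} - 1/2)\), which is assumed here to correspond to Eq. (6), and the values of `alpha`, `gamma`, the bounds, and the `sphere` objective are hypothetical. Lower cost is treated as higher brightness.

```python
import math
import random


def firefly(objective, dim, n_fireflies=20, n_iter=300,
            alpha=0.2, beta0=1.0, gamma=1.0, bounds=(-1.0, 1.0), seed=0):
    """Minimise `objective` with a basic firefly algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(xi) for xi in x]  # lower cost means a brighter firefly

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    # Squared distance r_ij^2 from Eq. (4)
                    r2 = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
                    # Attractiveness from Eq. (5)
                    beta = beta0 * math.exp(-gamma * r2)
                    # Assumed form of the position update (Eq. (6))
                    x[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                            for a, b in zip(x[i], x[j])]
                    cost[i] = objective(x[i])
        # In this sketch the brightest firefly stays where it is.
    best = min(range(n_fireflies), key=cost.__getitem__)
    return x[best], cost[best]


def sphere(p):
    """Hypothetical test objective: sum of squares, minimum at the origin."""
    return sum(c * c for c in p)


best, cost = firefly(sphere, dim=2, n_iter=30)
```

Because the brightest firefly is never moved here, the best cost cannot get worse from one iteration to the next; variants that let it perform a random walk trade this property for extra exploration.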

4 Data Representation and Pre-processing

The daily and seasonal behavior of solar radiation can be easily interpreted, where values lie in the range [0, 1000] and higher values indicate stronger radiation. In winter, the dawn-to-dusk period is shorter than in summer, while in summer, radiation at noon is the strongest of the whole year for Turkey. Such a 1D representation gives significant insight into the solar radiation pattern as a function of time, as shown in Fig. 7.

Fig. 7 Series of PV panels in a solar power plant

4.1 Correlation Analysis

The embedded dimension of the input for the prediction model, i.e., the number of previous data samples used as the input, is determined by the auto-correlation coefficients of the samples:

$$\begin{aligned} r_{k} =\frac{1}{(N-k)s^{2} } \sum _{i=k}^{N}(x_{i} -\mu )(x_{i-k} - \mu ) \end{aligned}$$
(7)

where \(\mu \) and \(s^{2} \) are the mean and variance of the samples, respectively, \(r_{k} \) is the sample autocorrelation coefficient, k is the delay (lag), x is the data set and N is the number of samples in the series. Figure 8 shows a 1D view of the autocorrelation coefficients of the solar radiation in 2016 and 2017.
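A minimal sketch of Eq. (7) is given below to make the computation concrete. It assumes zero-based indexing, so the sum runs over the N - k available pairs, and it uses the population variance for \(s^{2}\); the function name `autocorrelation` and the synthetic sine series are illustrative only.

```python
import math


def autocorrelation(x, k):
    """Sample autocorrelation of the sequence x at lag k, following Eq. (7)."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(x)")
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n  # s^2 (population variance, assumed)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    acc = sum((x[i] - mu) * (x[i - k] - mu) for i in range(k, n))
    return acc / ((n - k) * var)


# Synthetic hourly series with a 24-sample (daily) period, for illustration.
series = [math.sin(2 * math.pi * i / 24) for i in range(240)]
r_day = autocorrelation(series, 24)       # same hour on the next day
r_half_day = autocorrelation(series, 12)  # half a day apart
```

For this synthetic series the coefficient is close to 1 at a lag of one full day and close to -1 at half a day, which is the kind of daily structure the correlation analysis looks for in the measured data.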

An important observation in Figs. 8 and 9 is that there are strong correlations in the solar radiation, not only between consecutive hours but also between the same hours of consecutive days.

The correlation between two consecutive days for the same hour is stronger than that between the current hour and two hours ahead in the same day. Therefore, when constructing a prediction model, the data from the previous day at the time of prediction must be given higher priority than the data from the previous two hours. In this study, the solar radiation data from the previous two days at the time of prediction, together with the data at the current time, are used as the input to the prediction model.
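The input construction described above can be sketched as follows. The sketch assumes hourly sampling (24 samples per day) and a one-step-ahead prediction; both are illustrative assumptions, as are the function name `build_lagged_samples` and its parameters.

```python
def build_lagged_samples(series, samples_per_day=24, n_days=2, horizon=1):
    """Build (inputs, targets) pairs from a single time series.

    Each input holds the values at the prediction time on the previous
    n_days days (oldest first), followed by the most recent value available
    `horizon` steps before the prediction time.
    """
    inputs, targets = [], []
    max_lag = samples_per_day * n_days
    for t_pred in range(max(max_lag, horizon), len(series)):
        day_lags = [series[t_pred - d * samples_per_day]
                    for d in range(n_days, 0, -1)]
        current = series[t_pred - horizon]
        inputs.append(day_lags + [current])
        targets.append(series[t_pred])
    return inputs, targets


# Toy series 0, 1, 2, ... so that each value equals its own time index.
X, y = build_lagged_samples(list(range(100)))
```

With the toy series, the first sample predicts index 48 from indices 0, 24 and 47, which makes the lag structure easy to check by eye.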

Fig. 8 Correlation between ambient temperature and time (Daily hours)

Fig. 9 Correlation between PV panel temperature and time

5 Simulation Results

The PV plant consists of one on-grid inverter and 38 series-connected PV panels with identical characteristics, installed to meet the energy demands of an industrial facility. Table 2 shows the parameters of the PV panels and the inverter. The selected PV panels are shown in Fig. 10. In this study, real-time data were obtained from one of the inverters in the PV power plant, and simulation studies were carried out by processing the data with three different methods: ANN, ANN-FA, and ANN-PSO. The mean absolute error (MAE), mean absolute percentage error (MAPE), the coefficient of determination (\(R^{2} \)), and correlation coefficient (\(\rho \)) are used to evaluate the performance of the Solar Power Prediction (SPP) models. The MAPE and regression criteria in particular are analyzed for the ANN, ANN-PSO and ANN-FA methods in the simulation results below.
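For reference, the four evaluation criteria named above can be computed as in the following sketch, which uses their standard textbook definitions. The function names and sample values are illustrative, and skipping samples whose actual value is zero in the MAPE (for example night-time power) is an assumption made here to avoid division by zero; the text does not say how such samples were handled.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent, skipping zero actual values."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    norm_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (norm_a * norm_p)


# Made-up values for illustration only (not data from the plant).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = (mae(actual, predicted), mape(actual, predicted),
          r_squared(actual, predicted), correlation(actual, predicted))
```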

Table 2 Inverter and PV panel datasheet
Fig. 10 Series of PV panels in a solar power plant

In the study, two different swarm-based methods were used for network training. Swarm-based methods start with a population randomly distributed in the search space. The success of each particle is then calculated using an objective function, which assigns a fitness value to each particle in the search space. In the next step, the particles are updated according to the relevant equations of the algorithm, and a new generation is created. These steps are repeated until the termination criterion is reached. Once training is complete, the ANN is built using the optimum values from the best particle and passed to the test phase. The data used for the training and testing stages of the artificial neural network were recorded in 2015 and 2016 in Turkey. Using the ANN, the data for 2017 were estimated, and the results were compared with the real data. The output layer of the neural network is the instantaneous PV plant power. The input layer of the network is fed with the following:

  • Ambient temperature [\(\mathrm {{}^\circ }\)C]

  • Solar radiation [W/m\(^2\)]

  • PV Panel temperature [\(\mathrm {{}^\circ }\)C]

For each method used, 8 neurons are used in the hidden layer of the network structure. In addition, 3 neurons are used in the input layer and 1 neuron is used in the output layer. The coefficients trained during network training are as follows:

  • Weights for the 24 interconnections between the input and the hidden layer

  • Bias values for the 8 neurons in the hidden layer

  • Weights for the 8 interconnections between the hidden and the output layer

  • The bias value for the single output neuron

During network training, 41 parameters are trained. The algorithms are run for 300 iterations, and 20 individuals are used for each optimization method. A total of 100479 data sets are fed to the network. After the training phase is finished, the test phase is carried out.
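The count of 41 follows from the coefficient list above: 24 + 8 + 8 + 1. The sketch below checks this arithmetic and shows one way a swarm individual, stored as a flat vector of 41 numbers, could be split back into network coefficients. The ordering of the coefficients inside the vector and the names `count_parameters` and `unpack` are assumptions for illustration.

```python
N_IN, N_HIDDEN, N_OUT = 3, 8, 1  # three inputs, eight hidden neurons, one output


def count_parameters(n_in=N_IN, n_hidden=N_HIDDEN, n_out=N_OUT):
    """Weights and biases of a network with one hidden layer."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def unpack(theta):
    """Split a flat 41-element vector into the four coefficient groups."""
    if len(theta) != count_parameters():
        raise ValueError("expected %d parameters" % count_parameters())
    pos = 0
    # 24 input-to-hidden weights, one row of 3 per hidden neuron
    w_hidden = [list(theta[pos + h * N_IN: pos + (h + 1) * N_IN])
                for h in range(N_HIDDEN)]
    pos += N_IN * N_HIDDEN
    b_hidden = list(theta[pos: pos + N_HIDDEN])  # 8 hidden biases
    pos += N_HIDDEN
    w_out = list(theta[pos: pos + N_HIDDEN])     # 8 hidden-to-output weights
    pos += N_HIDDEN
    b_out = theta[pos]                           # 1 output bias
    return w_hidden, b_hidden, w_out, b_out


total = count_parameters()
w_hidden, b_hidden, w_out, b_out = unpack(list(range(41)))
```

Under this layout, each particle or firefly position is one such vector, and its fitness is the prediction error of the network obtained by unpacking it.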

In the test phase, each method was evaluated with the neural network built from the best network parameters obtained from 25 iterations. The estimation results obtained at the end of the optimization studies are analyzed according to the MAPE criterion and are given for the three methods in Table 3.

Table 3 Results of the test phase

The best result is obtained with ANN-FA, as shown in Table 3. Real-time results and test results obtained with ANN-FA are shown graphically in Figs. 11, 12, 13 and 14 in 3-month periods:

  • January–February–March results are shown in Fig. 11

  • April–May–June results are shown in Fig. 12

  • July–August–September results are shown in Fig. 13

  • October–November–December results are shown in Fig. 14.

Fig. 11 Solar-generated PV power forecasting result (January–February–March)

Fig. 12 Solar-generated PV power forecasting result (April–May–June)

Fig. 13 Solar-generated PV power forecasting result (July–August–September)

Fig. 14 Solar-generated PV power forecasting result (October–November–December)

Fig. 15 The prediction error values for the proposed methods

Fig. 16 Regression graphs: (a) ANN-FA, (b) ANN-PSO, (c) ANN

Figure 15 shows the error graphs obtained from the results of the three methods. As clearly shown in this figure, the least error is obtained with the ANN-FA method.

The regression graphs obtained at the end of the test phase are shown in Fig. 16. It can be seen that the best data distribution, and hence the least error at the end of the test phase, was achieved with ANN-FA. Compared against the numerical results given in Table 3, the ANN-FA method is confirmed as the best method for PV power forecasting.

6 Conclusion

The use of solar energy to meet energy demand has increased significantly in recent years, and many studies have been carried out in order to make maximum use of this resource. Although solar energy is abundant in nature, it cannot be exploited at certain times of the day. Many factors, such as cloudiness, dust, solar radiation and changes in sunrise time, affect solar power prediction, so solar power does not behave in a stable manner. This study was carried out because power estimation is important for energy planning. Data from a 1 MW PV power plant in Turkey were used to estimate output power by real-time data mining for short-term prediction. Solar radiation, ambient temperature, and panel temperature were used as input parameters. First, estimation was performed with a traditional ANN, but the expected performance was not reached: the three input parameters proved insufficient for estimation with the traditional ANN alone. It was therefore decided to combine the ANN with an optimization method to increase performance. Optimization was then applied using different methods, and short-term prediction was performed with the optimized ANN. The ANN, ANN-PSO and ANN-FA results were compared in detail. The performance analysis showed that the artificial neural network alone was not able to predict short-term output power from only a few inputs. Particle Swarm Optimization (PSO), one of the most widely used optimization methods in the literature, was applied to obtain the optimum prediction, but the desired performance could not be reached because the PSO method repeatedly became trapped in local minima. The Firefly Algorithm (FA), a newer optimization method, proved to be the most efficient algorithm for short-term solar estimation because it avoids the problem of becoming trapped in local minima.