
1 Introduction

The use of fossil fuels in conventional power generation has created serious ecological problems, chief among them the emission of huge quantities of carbon dioxide. Photovoltaic (PV) energy is a clean, renewable energy that can replace fossil fuels in power generation. Nevertheless, integrating a renewable resource is challenging, because its reliability as a major share of national power generation remains marginal. Even so, irrespective of production cost, PV capacity has grown rapidly over the past two decades. PV power generation is stochastic owing to unexpected weather conditions, which affect the steadiness of the source. For power system supervision, short-horizon PV power prediction is therefore needed to assure grid quality and energy management [1,2,3,4,5]. Depending on the forecasting horizon, solar power prediction is generally divided into short-term (less than 48 h) and long-term forecasting. Among the many forecasting techniques, numerical approaches are the most appropriate for short-term forecasting. Such an approach statistically relates linked past data or numerical weather prediction (NWP) data to the near future and is widely used for short-duration PV power forecasting; it includes time series methods, artificial neural networks (ANN) and many other techniques [6,7,8]. The autoregressive integrated moving average (ARIMA) forecasting method, which accounts for the effect of weather changes on PV power generation, is discussed in the literature [9, 10]. A partial functional linear regression approach suitable for one-day-ahead PV power forecasting is implemented in [11]. Historical time series forecasts are precise under steady weather conditions, but their accuracy degrades when the weather changes. The artificial neural network is the most suitable machine learning technique for short-term PV power prediction because it captures the complex nonlinear relationship between input and output. A hybrid data set composed of local geographical data and the capacity of adjacent photovoltaic cells is used as the input vector of a nonlinear autoregressive exogenous (NARX) neural network in [12]. A modified and improved back-propagation neural network based on a genetic algorithm, applied to photovoltaic power prediction on the IEEE 30-bus system, is proposed in [12,13,14]. In the recent past, most studies on PV power prediction have focused on forecasting horizons of 1 to 24 h. In [15, 16], the authors forecast cloud movement by capturing sky images; such techniques use an irradiance sensor array in the region of the solar plant, but they need extra equipment and have a higher threshold. In all of the above works, the weights of the forecasting models are selected randomly, which affects the accuracy of the model [17]. The same holds for the extreme learning machine (ELM): randomly selected weights limit the capability of the ELM model, so proper weight selection matters. Thus, in [18, 19], PSO and the sine–cosine algorithm are used for weight optimization in the ELM technique, and in [20] a genetic algorithm is used to obtain optimized ELM weights. Even so, these optimization techniques do not guarantee the ultimate (globally optimal) values, and the selection of proper weights remains a challenging task.

In this paper, the widely used extreme learning machine (ELM) forecasting model is employed to predict photovoltaic power generation over short horizons of 15 min, 30 min and 1 h. The weights of the ELM are optimized by the runner root algorithm (RRA), and the results are compared with PSO-ELM and CPSO-ELM. Random selection of the input-layer weights and hidden-layer biases seriously affects the performance of ELM, and challenges such as output uncertainty and overfitting still need to be addressed. Therefore, the input weights are first optimized by a modified PSO, then the hidden-layer weights; the system is simulated separately in each case, and the results are compared.

Regarding the arrangement of this paper: the characteristics of solar forecasting, data selection, the mathematical analysis of the photovoltaic model and the operating process are explained in Sect. 2. Empirical mode decomposition is discussed in Sect. 3, and ELM and its variants in Sect. 4. Optimization techniques such as PSO, craziness PSO and the runner root algorithm are explained in Sect. 5. The simulation results and the performance of the proposed model are presented in Sect. 6. Finally, the conclusion of the entire work is given in Sect. 7.

2 Characteristics of Solar Forecasting and Data Selection

The output of PV power generation fluctuates and is difficult to control because it depends on many factors, such as the intensity of solar radiation, weather conditions, ambient temperature, time of day, cloud cover and geographical location. The PV output thus depends strongly on environmental conditions [11].

Because of these meteorological uncertainties, PV generation has strong cyclical characteristics, including a daily cycle and a yearly cycle. The situation becomes extremely challenging when PV generation is integrated with the conventional grid at high solar power penetration. Solar energy is harvested roughly from 8.00 AM to 5.00 PM. The operating temperature of the PV plant also affects the conversion efficiency. In general, the PV plant temperature is higher than the ambient temperature, but for prediction purposes the two temperatures are assumed to be equal. The conversion efficiency of the PV cell is expressed as follows:

$$\eta = \eta_{0} \left[ {1 - \mu \left( {T_{p} - T_{\mu } } \right)} \right]$$
(1)

where \(T_{p}\) is the temperature corresponding to operating point P, \(T_{\mu }\) is the reference temperature (298 K), and \(\eta_{0}\) is the conversion efficiency at the reference temperature. Considering the above factors over the prediction horizon is strongly recommended for PV power forecasting and energy management. Among the various forecasting methods, the extreme learning machine technique is used here for the prediction of solar power.

2.1 PV Power Generation

The highest PV power yield is modeled by

$$W_{v} = \eta AI\left[ {1 - 0.05(t - 25)} \right]$$
(2)

where η is the conversion efficiency, A is the area of the solar panel (m²), I is the solar radiation (kW/m²), and t is the outside air temperature (°C) [18].

The module parameters are \(P_{{\max} } = 0.106\,{\text{W}}\), \(V_{{\max} } = 18.453\,{\text{V}}\), \(I_{{\max} } = 5.76\,{\text{A}}\), \(I_{\text{sc}} = 6.11\,{\text{A}}\), \(R_{\text{sh}} = 1000\,\Omega\), \(R_{\text{s}} = 0.0001\,\Omega\), 21.6 V, \(N_{\text{s}} = 36\) cells in series and \(N_{\text{p}} = 1\) string in parallel.

2.2 PV Power Forecasting

The performance of the forecasting model is influenced by factors such as the input parameters and the time horizon. The important input variables of the prediction model are (1) past data of PV power production and (2) previous explanatory variables, such as meteorological factors, which include horizontal irradiance, temperature, cloud cover, moisture content, wind speed and so on. Different prediction horizons, short term or long term, are selected according to the needs of decision making and grid energy management.

These time horizons are suitable and helpful in PV generating stations for accuracy, proper grid management and the smooth functioning of the power system; the main tasks there are unit commitment, battery control, energy trading, etc. Most recent research work has therefore focused on the development of PV systems and forecasting models.

2.3 Performance Estimation

In this work, numerical performance measures, namely RMSE, MAE and MAPE, are calculated for each soft computing method; their mathematical expressions are given below.

$${\text{MAE}} = \frac{1}{N}\sum\limits_{j = 1}^{N} {\left| {P_{j} - t_{j} } \right|}$$
(3)
$${\text{MAPE}} = \frac{1}{N}\sum\limits_{j = 1}^{N} {\frac{{\left| {P_{j} - t_{j} } \right|}}{{P_{j} }}} \times 100\%$$
(4)
$${\text{RMSE}} = \sqrt {\frac{1}{N}\sum\limits_{j = 1}^{N} {\left( {P_{j} - t_{j} } \right)^{2} } }$$
(5)

where Pj and tj represent the actual (measured) and predicted PV powers, respectively, and N is the number of samples.
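As a concrete illustration, the three indices of Eqs. (3)–(5) can be computed directly from the measured and predicted power series; the following is a minimal Python/NumPy sketch whose function and variable names are ours, not from the original work.

```python
import numpy as np

def forecast_errors(p_actual, t_pred):
    """MAE, MAPE (%) and RMSE of a PV power forecast, Eqs. (3)-(5)."""
    p_actual = np.asarray(p_actual, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    err = p_actual - t_pred
    mae = np.mean(np.abs(err))
    # MAPE assumes the measured power is non-zero (e.g., daylight samples only).
    mape = np.mean(np.abs(err) / np.abs(p_actual)) * 100.0
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mape, rmse
```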

2.4 Data Processing

The input and output data are normalized using the following equation. The input weights and biases are initially selected in the range [0, 1], so that training speed and calculation overflow need not be a concern.

$$x^{*} = \left\{ {\frac{{x_{j} - x_{{\min} } }}{{x_{{\max} } - x_{{\min} } }}} \right\};\quad 0 \le x^{*} \le 1$$
(6)

Here, xmax and xmin are the upper and lower bounds of the data.
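A short sketch of Eq. (6) follows, assuming the data arrive as a NumPy vector; the inverse mapping is included because forecasts made on normalized data must be rescaled before the error indices are computed.

```python
import numpy as np

def normalize(x):
    """Scale a data vector into [0, 1] as in Eq. (6); return the bounds for later rescaling."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def denormalize(x_star, x_min, x_max):
    """Map normalized values back to the original physical scale."""
    return x_star * (x_max - x_min) + x_min
```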

3 Empirical Mode Decomposition

Empirical mode decomposition (EMD) is a signal processing technique used to split a signal into multiple component signals called intrinsic mode functions (IMFs). The quality of the EMD also affects the stability and efficiency of the forecasting model: decomposing the signal improves the performance of the model and raises the prediction accuracy considerably. EMD asserts that any segment of a raw signal can be described by the following expression.

$$A(t) = \sum\limits_{i = 1}^{n} {\beta i(t) + {\text{rem}}_{n} (t)}$$
(7)

where \(\beta_{i}(t)\) and remn(t) are the IMFs and the remainder (residue) of the signal, respectively.

The decomposition is an iterative computational procedure (sifting), and the resulting component signals span a range of amplitudes and frequencies.

An IMF must satisfy two conditions: over the entire data set, the number of extrema (maxima and minima) and the number of zero crossings must either be equal or differ by at most one; and, at every point, the mean of the upper and lower envelopes must be zero.

The first candidate component is obtained by subtracting the mean of the upper and lower envelopes, \(\gamma_{1}(t)\), from the signal:

$$i_{1} (t) = A(t) - \gamma_{1} (t)$$
(8)

If \(i_{1}(t)\) satisfies the IMF conditions, it is taken as the first IMF. If not, it is treated as the signal itself and the sifting step is repeated; after k sifting iterations, the result \(i_{1k}(t)\) is accepted as the first IMF:

$$\beta_{1} (t) = i_{1k} (t)$$
(9)

This IMF (β1) is then subtracted from A(t):

$$A(t) - \beta_{1} (t) = \lambda_{1} (t)$$
(10)

Here, \(\lambda_{1}(t)\) is the first residue of the signal, and \(\beta_{1} ,\beta_{2} , \ldots ,\beta_{n}\) are the successive IMFs.

Then

$$\lambda_{1} (t) - \beta_{2} (t) = \lambda_{2} (t)$$
(11)
$$\lambda_{n - 1} (t) - \beta_{n} (t) = \lambda_{n} (t)$$
(12)

Equations (11) and (12) repeat the extraction step of Eq. (10) for the successive residues.

The stopping criterion for this sifting procedure was recommended by Huang et al.: a normalized squared difference between two consecutive sifting results is computed as

$$SD_{k} = \sum\limits_{t = 0}^{T} {\frac{{\left| {i_{{n\left( {k - 1} \right)}} \left( t \right) - i_{nk} \left( t \right)} \right|^{2} }}{{i_{{n\left( {k - 1} \right)}}^{2} \left( t \right)}}}$$
(13)

The iterative process is terminated by a pre-specified criterion. Adding all the IMFs and the final residue, the signal A(t) can be restated as

$$A(t) = \sum\limits_{i = 1}^{n} {\beta_{i} (t) + \lambda_{n} (t)}$$
(14)
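For reference, the decomposition of Eqs. (7)–(14) can be reproduced with an off-the-shelf EMD implementation; the sketch below assumes the open-source PyEMD package and a synthetic PV power series, so it only illustrates the interface, not the actual plant data.

```python
import numpy as np
from PyEMD import EMD  # assumes the PyEMD package is installed (pip install EMD-signal)

# Synthetic stand-in for one day of PV power sampled every 15 min (96 samples).
t = np.linspace(0.0, 1.0, 96)
pv_power = 50.0 * np.clip(np.sin(np.pi * t), 0.0, None) + 2.0 * np.random.randn(96)

emd = EMD()
imfs = emd.emd(pv_power)        # extracted components (last row behaves as the residue/trend)
print(imfs.shape)               # (number of components, 96)
# Per Eq. (14), the sum of all extracted components approximates the original signal.
reconstruction = imfs.sum(axis=0)
```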

4 Extreme Learning Machine (ELM)

ELM is a machine learning procedure for training a single hidden layer feed-forward network (SLFN) with arbitrary (randomly assigned) input weights. Its computational speed can be much higher than that of conventional feed-forward learning algorithms such as back-propagation (BP). It is a high-performance machine learning technique that can also be applied to systems with small input data sets. Owing to advantages such as easy implementation, low training error, small weight norms and extremely fast training compared with other conventional neural networks, the algorithm is called the extreme learning machine. In this algorithm, the input weights and hidden biases are selected randomly; the system then becomes linear in the output weights, which are computed analytically through the Moore–Penrose generalized inverse of the hidden layer output matrix. The structure of the ELM is shown in Fig. 1, and the block diagram of the PV forecasting system using the ELM algorithm is shown in Fig. 2.

Fig. 1 Structure of the SLFN model (ELM)

Fig. 2 Block diagram representation of the proposed PV model using ELM

For N distinct training samples, with input and target vectors

$$x_{j} = \left[ {x_{j1} ,x_{j2} , \ldots x_{jn} } \right]^{T} \in R^{n}$$
(15)

and

$$y_{i} = \left[ {y_{i1} ,y_{i2} , \ldots, y_{im} } \right]^{T} \in R^{m} ,$$
(16)

a standard SLFN with K hidden neurons and activation function g is expressed as

$$\sum\limits_{j = 1}^{K} {\alpha_{j} } \cdot g\left( {a_{j} x_{i} + b_{j} } \right) = t_{i} ,\;i = 1, \ldots ,N,$$
(17)

where aj is the weight vector connecting the input nodes to the jth hidden node, αj is the weight vector connecting the jth hidden node to the output nodes, bj is the threshold (bias) of the jth hidden node, and ti is the ELM output for the ith sample.

This network can approximate these N data samples with zero error, i.e.,

$$\sum\limits_{i = 1}^{N} {\left\| {P_{i} - t_{i} } \right\|} = 0.$$
(18)

That is, bj, aj and αj exist such that

$$\sum\limits_{j = 1}^{K} {\alpha_{j} } \cdot g\left( {a_{j} x_{i} + b_{j} } \right) = y_{i} ,i = 1, \ldots ,N,$$
(19)

The above N equations can be expressed in compact form:

$$G\beta = T$$
(20)

where \(G = G\left( {a_{1} , \ldots ,a_{K} ,b_{1} , \ldots ,b_{K} ,x_{1} , \ldots ,x_{N} } \right)\)

$$= \left[ {\begin{array}{*{20}c} {g(a_{1} x_{1} + b_{1} )} & . & . & {g(a_{K} x_{1} + b_{K} )} \\ . & . & . & . \\ . & . & . & . \\ {g(a_{1} x_{N} + b_{1} )} & . & . & {g(a_{K} x_{N} + b_{K} )} \\ \end{array} } \right]_{N \times K}$$
(21)
$$\beta = \left[ {\begin{array}{*{20}c} {\alpha_{1}^{T} } \\ . \\ . \\ . \\ {\alpha_{K}^{T} } \\ \end{array} } \right]_{K \times m} \quad {\text{and}}\quad T = \left[ {\begin{array}{*{20}c} {t_{1}^{T} } \\ . \\ . \\ . \\ {t_{N}^{T} } \\ \end{array} } \right]_{N \times m}$$
(22)

In ELM, \(\left( {a_{j} ,b_{j} } \right)\) remain unchanged once randomly generated. Training the network is then simply equivalent to finding a least squares solution \(\hat{\beta }\) of the linear system in Eq. (20), that is,

$$\left\| {G\hat{\beta } - T} \right\| = \mathop {{\min} }\limits_{\beta } \;\left\| {G\beta - T} \right\|$$
(23)

The smallest norm least squares solution of Eq. (20) is

$$\hat{\beta } = G^{ + } T$$
(24)

where \(G^{ + }\) is the Moore–Penrose generalized inverse of G, which can be computed through orthogonal projection as

$$G{}^{ + } = (G^{\rm T} G)^{ - 1} G^{T}$$
(25)

In general, gradient-based learning algorithms try to attain the minimum training error but often cannot, because of local minima or the impracticality of an infinite number of training iterations.

The minimum norm least squares solution of \(G\beta = T\) is unique and is given by

$$\hat{\beta } = G^{ + } T.$$
(26)
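To make the training procedure of Eqs. (17)–(26) concrete, the following is a minimal NumPy sketch of a basic ELM with a sigmoid activation; the weight ranges, activation choice and function names are our illustrative assumptions rather than the exact settings of the proposed model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden=5, seed=None):
    """Basic ELM training: random (a, b) kept fixed, output weights solved analytically."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, 1.0, size=(X.shape[1], n_hidden))  # input weights, random and fixed
    b = rng.uniform(0.0, 1.0, size=n_hidden)                # hidden biases, random and fixed
    G = sigmoid(X @ a + b)                                   # hidden-layer output matrix, Eq. (21)
    beta = np.linalg.pinv(G) @ T                             # Moore-Penrose solution, Eqs. (24)-(25)
    return a, b, beta

def elm_predict(X, a, b, beta):
    """Forecast with a trained ELM."""
    return sigmoid(X @ a + b) @ beta
```

With five hidden nodes, as used in the simulations, a call such as `elm_train(X_train, T_train, n_hidden=5)` returns the random input parameters together with the analytically computed output weights.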

4.1 Modified Extreme Learning Machine

In general, the solution of a linear system is written as

$$AZ = P$$
(27)

where

$$A \in R^{m \times n}$$
(28)
$$Z \in R^{n} ,\quad P \in R^{m}$$
(29)

In a Euclidean space, an SLFN with arbitrary weights may be treated as a linear system. For ‘n’ samples \((x_{i} ,p_{i} )\),

$${\text{where}}\quad x_{i} = [x_{i1} ,x_{i2} , \ldots ,x_{in} ]^{T} \in R^{n}$$
(30)
$${\text{and}}\quad p_{i} = [p_{i1} ,p_{i2} , \ldots ,p_{im} ]^{T} \in R^{m} ,$$
(31)
$${\text{i}}.{\text{e}}.,\quad X = [x_{ij} ]$$
(32)
$${\text{and}}\quad P = [p_{ij} ]$$
(33)

The ELM can be modeled as

$$\sum\limits_{i = 1}^{k} {\beta_{i} f(wt_{i} \,.\,x_{j} + c_{i} ) = y_{j} ,\,j = 1,2, \ldots ,n}$$
(34)

where \(wt_{i} = [wt_{i1} ,wt_{i2} , \ldots ,wt_{in} ]^{T}\) is the input weight vector, randomly selected and connected to the ‘ith’ hidden neuron,

\(\beta_{i}\) is the output weight vector, and

\(c_{i}\) is the bias (threshold) assigned to the ‘ith’ hidden neuron.

Equation (34) can be expressed in the matrix form of Eq. (37), where

$$G_{n} = \left[ {\begin{array}{*{20}c} {g\left( {wt_{1} \cdot x_{1} + c_{1} } \right)} & \ldots & {g\left( {wt_{k} \cdot x_{1} + c_{k} } \right)} \\ \vdots & \ddots & \vdots \\ {g\left( {wt_{1} \cdot x_{n} + c_{1} } \right)} & \ldots & {g\left( {wt_{k} \cdot x_{n} + c_{k} } \right)} \\ \end{array} } \right]_{n \times k}$$
(35)
$$\beta_{n} = [\beta_{1} ,\beta_{2} , \ldots ,\beta_{k} ]$$
(36)

The learning parameters \((wt_{i} ,c_{i} )\) are chosen randomly and then kept unchanged, so that \(G_{n}\) remains fixed.

The ELM can then be trained by finding the least squares solution \(\hat{\beta}_{n}\) of the linear system

$$G_{n} \beta_{n} = P_{n}$$
(37)
$$\left\| {G_{n} (wt_{1} , \ldots ,wt_{k} ,c_{1} , \ldots ,c_{k} )\hat{\beta}_{n} - P_{n} } \right\| = \mathop {{\min} }\limits_{{\beta_{n} }} \left\| {G_{n} (wt_{1} , \ldots ,wt_{k} ,c_{1} , \ldots ,c_{k} )\beta_{n} - P_{n} } \right\|$$
(38)

Using the Moore–Penrose generalized inverse, the output weight matrix \(\hat{\beta}_{n}\) is obtained as

$$\hat{\beta}_{n} = (G_{n}^{\rm T} G_{n} )^{ - 1} G_{n}^{T} P_{n}$$
(39)

Throughout the simulation work, the number of hidden nodes is taken as 5.

4.2 Optimized Extreme Learning Machine (OELM)

When the number of training samples equals the number of hidden neurons, the network can approximate the samples with zero error. For a large data set with a large number of hidden neurons, however, the computation becomes extremely demanding. It is therefore necessary to select appropriate parameters so that the outputs come close to the desired solution with the least error. Accordingly, an SLFN with fixed input weights \(a_{j}\) and biases \(b_{j}\) and a single hidden layer is trained, and the least squares solution is calculated, which helps to reduce the error.

The optimized ELM improves the accuracy by applying an optimization algorithm such as PSO, CPSO or the runner root algorithm to the input weights and biases. Figure 3 shows the flowchart of the optimized ELM algorithm.

Fig. 3 Flowchart of PSO optimizing ELM
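The loop in Fig. 3 repeatedly scores candidate input weights and biases by the error of the resulting ELM. A minimal sketch of such a fitness function is given below; the candidate vector layout, the use of validation RMSE as the score and all names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def elm_fitness(candidate, X_tr, T_tr, X_val, T_val, n_hidden=5):
    """Validation RMSE of an ELM whose input weights/biases come from an optimizer particle.

    `candidate` is a flat vector [a_11 ... a_nK, b_1 ... b_K]; a lower value means a fitter particle.
    """
    n_in = X_tr.shape[1]
    a = candidate[: n_in * n_hidden].reshape(n_in, n_hidden)
    b = candidate[n_in * n_hidden:]
    g = lambda Z: 1.0 / (1.0 + np.exp(-Z))          # sigmoid activation (assumed)
    beta = np.linalg.pinv(g(X_tr @ a + b)) @ T_tr   # analytic output weights, Eq. (39)
    pred = g(X_val @ a + b) @ beta
    return float(np.sqrt(np.mean((pred - T_val) ** 2)))
```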

5 Optimization Techniques

5.1 Particle Swarm Optimization

PSO is a well-recognized heuristic optimization method with highly adaptive features; it is easy to apply and converges quickly to a suitable solution. PSO is also well able to handle large search spaces and non-differentiable objective functions.

The PSO algorithm was developed by simulating the movement of birds in a flock or fish in a school. The velocity and position of each particle are updated as per Eqs. (40) and (42), respectively.

$$\begin{aligned} {\text{vel}}_{i}^{k + 1} & = w*{\text{vel}}_{i}^{k} + \eta_{1} *{\text{rand}}_{1} *\left\{ {{\text{pbest}}_{i} (k) - s_{i} (k)} \right\} \\ & \quad + \eta_{2} *{\text{rand}}_{2} *\left\{ {{\text{gbest}}_{i} (k) - s_{i} (k)} \right\} \\ \end{aligned}$$
(40)

where

$$\begin{aligned} {\text{vel}}_{i} & = {\text{vel}}_{{\max} } \quad {\text{where}}\;{\text{vel}}_{i} > {\text{vel}}_{{\max} } \\ & = {\text{vel}}_{{\min} } \quad {\text{where}}\;{\text{vel}}_{i} < {\text{vel}}_{{\min} } \\ \end{aligned}$$
(41)

and

$$s_{i}^{k + 1} = s_{i}^{k} + v_{i}^{k + 1}$$
(42)

The step-by-step procedure of the PSO algorithm is presented as follows:

  • Step 1: The entire data set is normalized within the range [0, 1].

  • Step 2: The initial positions and velocities of all particles are randomly selected after specifying the swarm size, the maximum number of iterations and the velocity limits.

  • Step 3: The fitness value of each particle is assigned according to the performance of the ELM, and the best positions of the particles are set according to the swarm’s maximum fitness.

  • Step 4: Both the velocity and position of each particle are updated as per Eqs. (40) and (42).

  • Step 5: The termination conditions are verified, and if the utmost iterations are not reached, then go back to step 3, or else move to the following step.

  • Step 6: The outcome is the best arrangement of (a, b) of ELM corresponding to the highest fitness value.
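The step list above maps directly onto a short implementation. The sketch below is a plain PSO minimizer over normalized positions in [0, 1]; the parameter values (swarm size, inertia weight, acceleration coefficients, velocity limit) are illustrative assumptions, and the `fitness` argument can be the ELM fitness function sketched in Sect. 4.2.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=30, n_iter=100,
                 w=0.7, eta1=2.0, eta2=2.0, v_max=0.2, seed=None):
    """Minimize `fitness` with classical PSO, following Eqs. (40)-(42)."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, 1.0, (n_particles, dim))            # Steps 1-2: normalized initial positions
    vel = np.zeros((n_particles, dim))
    pbest = s.copy()
    pbest_val = np.array([fitness(p) for p in s])             # Step 3: fitness of each particle
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):                                    # Step 5: iterate up to the limit
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = (w * vel + eta1 * r1 * (pbest - s)
               + eta2 * r2 * (gbest - s))                      # Eq. (40)
        vel = np.clip(vel, -v_max, v_max)                      # Eq. (41): velocity limits
        s = np.clip(s + vel, 0.0, 1.0)                         # Eq. (42), kept inside [0, 1]
        vals = np.array([fitness(p) for p in s])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = s[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())                       # Step 6: best configuration found
```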

5.2 Craziness PSO

The classical PSO has certain shortfalls that can be overcome by modifying it. Craziness PSO (CPSO) is a modification of the classical PSO, particularly in the velocity expression. In a real flock, birds or fish may change direction suddenly; this is taken care of by a craziness factor in the velocity equation of CPSO. The velocity expression in CPSO is given by

$$\begin{aligned} {\text{vel}}_{i}^{k + 1} & = {\text{ran}}_{2} * {\text{sign}}({\text{ran}}_{3} ) * {\text{vel}}_{i}^{k} \\ & \quad + (1 - {\text{ran}}_{2} ) * \eta_{1} * {\text{ran}}_{1} * \{ {\text{pbest}}_{i}^{k} - s_{i}^{k} \} \\ & \quad + \left( {1 - {\text{ran}}_{2} } \right) * \eta_{2} * (1 - {\text{ran}}_{1} ) * \{ {\text{gbest}}_{i}^{k} - s_{i}^{k} \} \\ \end{aligned}$$
(43)

where \({\text{ran}}_{1}\), \({\text{ran}}_{2}\) and \({\text{ran}}_{3}\) are random parameters whose values lie in [0, 1], and \({\text{sign}}({\text{ran}}_{3} )\) is defined as

$$\begin{aligned} {\text{sign}}({\text{ran}}_{3} ) & = - 1\quad \quad {\text{ran}}_{3} \le 0.05 \\ & = 1\quad \quad \;\;{\text{ran}}_{3} > 0.05 \\ \end{aligned}$$
(44)

\({\text{ran}}_{1}\) and \({\text{ran}}_{2}\) are two random parameters chosen independently. If both values are large, the cognitive and social components dominate and the particle may wander away from the neighboring optimum. Small values of \({\text{ran}}_{1}\) and \({\text{ran}}_{2}\) give slow convergence, whereas a large \({\text{ran}}_{1}\) (and hence a small \(1 - {\text{ran}}_{1}\)) makes the algorithm converge faster. The balance between global and local search is achieved by the selection of the second random number \({\text{ran}}_{2}\). In some exceptional cases, a bird changing position while searching for food may be unable to fly because of inertia, but it may instead fly to a promising region in the opposite direction; this is taken care of by \({\text{sign}}({\text{ran}}_{3})\).

The distinctive feature of CPSO is that, prior to the position update, the velocity of the particle is "crazed" according to

$${\text{vel}}_{i}^{k + 1} = {\text{vel}}_{i}^{k + 1} + p({\text{ran}}{}_{4}){\text{sign}}({\text{ran}}_{4} )*{\text{vel}}_{i}^{\text{craziness}}$$
(45)

where ran4 is a random variable in [0, 1], \({\text{vel}}_{i}^{\text{craziness}} \in \left[ {{\text{vel}}_{i}^{{\min} } ,{\text{vel}}_{i}^{{\max} } } \right]\) is the craziness velocity, \({\text{sign}}(\cdot)\) is the signum function, and the probability \(p({\text{ran}}_{4})\) is defined as

$$\begin{array}{*{20}r} \hfill {p({\text{ran}}_{4} ) = 1} & \hfill {{\text{ran}}_{4} \le p_{\text{cr}} } \\ \hfill { = 0} & \hfill {{\text{ran}}_{4} > p_{\text{cr}} } \\ \end{array}$$
(46)

where \(p_{\text{cr}}\) is the craziness probability, and

$$\begin{array}{*{20}r} \hfill {{\text{sign}}({\text{ran}}_{4} ) = 1} & \hfill {{\text{ran}}_{4} \ge 0.5} \\ \hfill { = - 1} & \hfill {{\text{ran}}_{4} < 0.5} \\ \end{array}$$
(47)

Reverse flight of the birds does not occur frequently, so a very small threshold is used (\({\text{ran}}_{3} \le 0.05\)), for which \({\text{sign}}({\text{ran}}_{3} ) = - 1\) reverses the direction of flight.

Similarly, \(p_{\text{cr}} \le 0.3\) is chosen so that \({\text{ran}}_{4}\) usually exceeds \(p_{\text{cr}}\); hence, \(p({\text{ran}}_{4})\) is zero in the majority of cases. Otherwise, unexpected oscillations would appear in the convergence curve. \({\text{vel}}^{\text{craziness}}\) is assigned a very small value (= 0.0001).
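For clarity, the craziness modification of Eqs. (43)–(47) can be written as a single velocity-update routine; the acceleration coefficients and the default craziness constants below are illustrative assumptions consistent with the values suggested in the text.

```python
import numpy as np

def cpso_velocity(vel, s, pbest, gbest, eta1=2.0, eta2=2.0,
                  p_cr=0.3, v_craziness=1e-4, rng=None):
    """One CPSO velocity update for a single particle, Eqs. (43)-(47)."""
    rng = np.random.default_rng(rng)
    ran1, ran2, ran3, ran4 = rng.random(4)
    sign3 = -1.0 if ran3 <= 0.05 else 1.0                       # Eq. (44): rare direction reversal
    vel = (ran2 * sign3 * vel
           + (1.0 - ran2) * eta1 * ran1 * (pbest - s)
           + (1.0 - ran2) * eta2 * (1.0 - ran1) * (gbest - s))  # Eq. (43)
    p4 = 1.0 if ran4 <= p_cr else 0.0                           # Eq. (46): craziness probability
    sign4 = 1.0 if ran4 >= 0.5 else -1.0                        # Eq. (47)
    return vel + p4 * sign4 * v_craziness                       # Eq. (45): crazed velocity
```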

5.3 Runner and Root Algorithm

Plants such as the strawberry and the spider plant propagate from one place to another with the help of runners (stolons). The plant generates a new daughter plant at the end of each runner; after some time this daughter plant propagates further and becomes a new mother plant, and the series of growth events continues without hindrance. In the algorithm, the activities of the runners and roots are modeled as the global and local search, respectively. Plants grow runners and spread roots and root hairs in order to find nutrients and water, which is equivalent to the search operations of an optimization algorithm. Hence, if a daughter plant lands near a local optimum, it grows new runners and roots and becomes a capable parent.

This search can be formulated as the optimization problem

$${\min} \;\eta (y),\quad y_{l} \le y \le y_{u}$$
(48)

where η: \(R^{m} \to R\) is the cost function of m variables,

\(y^{*} = \arg {\min} \eta (y) \in R^{m}\) is the optimal solution, and

\(y_{l} ,y_{u} \in R^{m}\) are the lower and upper bounds.

In the RRA, each mother plant iteratively produces daughter plants. The expression for the offspring is given as

$$y_{\text{daughter}}^{k} (i) = \left\{ {\begin{array}{*{20}l} {y_{\text{mother}}^{1} (i)} & {k = 1} \\ {y_{\text{mother}}^{k} (i) + d_{\text{runner}} \times r_{k} } & {k = 2,3, \ldots ,P} \\ \end{array} } \right\}$$
(49)

where \(r_{k} \in R^{m}\) is a random vector within [−0.5, 0.5] and \(d_{\text{runner}}\) is the runner length, i.e., the distance between offspring and parent.

The cost function is evaluated at each new daughter plant. If at least one of these plants improves the value of the cost function appreciably compared with the best of the preceding iteration, i.e., if

$$\left| {\mathop {{\min} }\limits_{k = 1, \ldots ,P} \eta \left( {y_{\text{daughter}}^{k} (i)} \right) - \mathop {{\min} }\limits_{k = 1, \ldots ,P} \eta \left( {y_{\text{daughter}}^{k} (i - 1)} \right)} \right| \ge {\text{tol}}$$
(50)

holds, then the algorithm does not perform the local search, i.e., the global search continues. In this process, the best new solution is chosen as the first mother plant of the next generation, i.e.,

$$y_{\text{mother}}^{1} (i + 1) \leftarrow y_{\text{daughter best}} (i)$$
(51)

Pseudocode:

  • Initialize the runner and root distances \(d_{\text{runner}}\), \(d_{\text{root}}\), the population size \(n_{\text{pop}}\), \({\text{stall\_max}}\), \({\text{tol}}\) and a.

  • Generate the initial mother plants: \(y_{\text{mother}}^{k} (1) \leftarrow y_{l} + {\text{rand}} \times \left( {y_{u} - y_{l} } \right)\) for k = 1, …, \(n_{\text{pop}}\).

  • \({\text{stall\_count}} \leftarrow 0,\;i \leftarrow 1\).

  • Repeat until the termination condition is fulfilled:

  •   Generate the daughter plants (global search by runners):
    $$y_{\text{daughter}}^{k} (i) = \left\{ {\begin{array}{*{20}l} {y_{\text{mother}}^{1} (i)} & {k = 1} \\ {y_{\text{mother}}^{k} (i) + d_{\text{runner}} \times r_{k} } & {k = 2, \ldots ,n_{\text{pop}} } \\ \end{array} } \right.$$
    (52)

  •   Select the best daughter (uses \(n_{\text{pop}}\) function evaluations):
    $$y_{\text{daughter,best}} (i) \leftarrow \arg \mathop {{\min} }\limits_{{k = 1, \ldots ,n_{\text{pop}} }} f\left( {y_{\text{daughter}}^{k} (i)} \right)$$
    (53)

  •   If i > 1 and
    $$\left| {\frac{{\mathop {{\min} }\limits_{{k = 1, \ldots ,n_{\text{pop}} }} f\left( {y_{\text{daughter}}^{k} (i)} \right) - \mathop {{\min} }\limits_{{k = 1, \ldots ,n_{\text{pop}} }} f\left( {y_{\text{daughter}}^{k} (i - 1)} \right)}}{{\mathop {{\min} }\limits_{{k = 1, \ldots ,n_{\text{pop}} }} f\left( {y_{\text{daughter}}^{k} (i - 1)} \right)}}} \right| < {\text{tol}}$$
    (54)
    then perform the local search:

  •     for k = 1 to \(n_{\text{pop}}\) do // search locally with large steps (runners)
    $$y_{{{\text{perturbed}},k}} \leftarrow {\text{diag}}\left\{ {1, \ldots ,1,1 + d_{\text{runner}} r_{k} ,1, \ldots ,1} \right\} \times y_{\text{daughter,best}} (i)$$
    (55)
    if \(f\left( {y_{{{\text{perturbed}},k}} } \right) < f\left( {y_{\text{daughter,best}} (i)} \right)\) then \(y_{\text{daughter,best}} (i) \leftarrow y_{{{\text{perturbed}},k}}\) // uses one function evaluation
    end (k-loop)

  •     for k = 1 to \(n_{\text{pop}}\) do // search locally with small steps (roots)
    $$y_{{{\text{perturbed}},k}} \leftarrow {\text{diag}}\left\{ {1, \ldots ,1,1 + d_{\text{root}} r_{k} ,1, \ldots ,1} \right\} \times y_{\text{daughter,best}} (i)$$
    (56)
    if \(f\left( {y_{{{\text{perturbed}},k}} } \right) < f\left( {y_{\text{daughter,best}} (i)} \right)\) then \(y_{\text{daughter,best}} (i) \leftarrow y_{{{\text{perturbed}},k}}\)
    end (k-loop)

  •   end (if)

  •   \(y_{\text{mother}}^{1} (i + 1) \leftarrow y_{\text{daughter,best}} (i)\)

  •   Compute the fitness of the kth daughter plant and the probability of selecting it:
    $${\text{fit}}\left( {y_{\text{daughter}}^{k} (i)} \right) \leftarrow \frac{1}{{a + f\left( {y_{\text{daughter}}^{k} (i)} \right) - f\left( {y_{\text{daughter,best}} (i)} \right)}}$$
    (57)
    $$p_{k} = \frac{{{\text{fit}}\left( {y_{\text{daughter}}^{k} (i)} \right)}}{{\sum\nolimits_{j = 1}^{{n_{\text{pop}} }} {{\text{fit}}\left( {y_{\text{daughter}}^{j} (i)} \right)} }},\quad k = 1, \ldots ,n_{\text{pop}}$$
    (58)

  •   for k = 2 to \(n_{\text{pop}}\) do // grow the mother plants of the next generation
    $$y_{\text{mother}}^{k} \left( {i + 1} \right) \leftarrow y_{\text{daughter}}^{\text{ind}} (i)$$
    (59)
    where ‘ind’ is the index of the daughter plant chosen from the present iteration through roulette wheel selection.
    end (k-loop)

  •   if
    $$\left| {\frac{{f\left( {y_{\text{daughter,best}} (i)} \right) - f\left( {y_{\text{daughter,best}} (i - 1)} \right)}}{{f\left( {y_{\text{daughter,best}} (i - 1)} \right)}}} \right| < {\text{tol}}$$
    (60)
    then \({\text{stall\_count}} \leftarrow {\text{stall\_count}} + 1\), else \({\text{stall\_count}} \leftarrow 0\).

  •   if \({\text{stall\_count}} > {\text{stall\_max}}\) then restart the search (re-initialize the population).

  •   \(i \leftarrow i + 1\); end (repeat).
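A compact, self-contained Python sketch of the pseudocode above is given below. It follows the same structure (runner-based global search, stagnation-triggered local search with large and small steps, roulette-wheel selection and restart on stall); all parameter defaults are illustrative assumptions, and the objective `f` could be the ELM fitness function of Sect. 4.2.

```python
import numpy as np

def rra_minimize(f, y_l, y_u, n_pop=20, n_iter=100, d_runner=2.0, d_root=0.01,
                 tol=1e-3, stall_max=10, a=1e-3, seed=None):
    """Runner root algorithm (sketch) for minimizing f over the box [y_l, y_u]."""
    rng = np.random.default_rng(seed)
    y_l, y_u = np.asarray(y_l, float), np.asarray(y_u, float)
    dim = y_l.size
    mothers = y_l + rng.random((n_pop, dim)) * (y_u - y_l)     # initial mother plants
    best_prev, stall = np.inf, 0
    best, best_cost = mothers[0].copy(), np.inf
    for _ in range(n_iter):
        # Global search: daughters spread by runners, Eqs. (49)/(52).
        daughters = mothers.copy()
        daughters[1:] += d_runner * rng.uniform(-0.5, 0.5, (n_pop - 1, dim))
        daughters = np.clip(daughters, y_l, y_u)
        costs = np.array([f(y) for y in daughters])
        best, best_cost = daughters[np.argmin(costs)].copy(), float(costs.min())
        # Local search with large and small steps when improvement stagnates, Eqs. (54)-(56).
        if np.isfinite(best_prev) and abs(best_cost - best_prev) < tol * abs(best_prev):
            for step in (d_runner, d_root):
                for k in range(dim):
                    y_pert = best.copy()
                    y_pert[k] *= 1.0 + step * rng.uniform(-0.5, 0.5)
                    y_pert = np.clip(y_pert, y_l, y_u)
                    c = f(y_pert)
                    if c < best_cost:
                        best, best_cost = y_pert, c
        # Roulette-wheel selection of the next generation's mothers, Eqs. (57)-(59).
        fit = 1.0 / (a + costs - best_cost)
        idx = rng.choice(n_pop, size=n_pop - 1, p=fit / fit.sum())
        mothers = np.vstack([best, daughters[idx]])
        # Stall counting and restart, Eq. (60).
        stalled = np.isfinite(best_prev) and abs(best_cost - best_prev) < tol * abs(best_prev)
        stall = stall + 1 if stalled else 0
        if stall > stall_max:
            mothers = y_l + rng.random((n_pop, dim)) * (y_u - y_l)
            stall = 0
        best_prev = best_cost
    return best, best_cost
```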

6 Simulation Results

In this work, the ELM technique is first used to forecast the PV power generation of a real-time grid-connected solar power plant (located on the roof top of an academic block in Bhubaneswar, Odisha, India), whose specification is given in Table 1. ELM is applied to the plant data (80% testing and 20% training), and its performance is investigated. To further improve the prediction accuracy, the signal is decomposed by empirical mode decomposition (EMD), and the different optimization techniques (PSO, CPSO and RRA) are then applied; the results are tabulated for short-term forecasting horizons of 15 min, 30 min and 1 h. The IMFs generated by EMD are applied to the input layer nodes sample-wise, i.e., one set of IMFs per node, however small their amplitudes may be; the sum of all the node-input IMF sets represents the signal, analogous to a Fourier series decomposition. Figure 4 shows a part of this decomposition.

Table 1 Real-time data requirement
Fig. 4 IMFs of the data with EMD at 30 min intervals

If an IMF is very weak, a value of zero is automatically assigned to it. Since the signal is random, so are its IMFs. The optimization techniques mentioned above are then applied to train the weights of the network branches. The simulation results show that the RRA-based EMD-ELM performs better than the other ELM techniques. The P–V characteristics of the photovoltaic cell, simulated with the above specifications and an incremental-conductance MPPT, are shown in Fig. 5.

Fig. 5 P–V characteristics for irradiance from 400 to 1000 W/m² at 25 °C

Case-1: Prediction of PV power in 15 min Horizon

In this case, the PV power data generated between January 2010 and December 2010 are used, arranged at 15 min intervals. The data set is subjected to empirical mode decomposition (EMD) and then trained by the ELM techniques. The different optimization techniques (PSO, CPSO and the runner root algorithm) are applied to enhance the stability and forecasting capability of the model. The prediction curves are shown in Fig. 6 and the errors in Table 2. The graphs and the forecasting indices show that the RRA-based ELM performs better than the other models.

Fig. 6 15 min prediction with various algorithms

Table 2 Forecasting errors in 15 min horizon

Case-2: Prediction of PV power in 30 min Horizon

In this study, the PV power data of the same plant between January 2010 and December 2010 were arranged at 30 min intervals. The data set is decomposed into different IMFs by empirical mode decomposition (EMD) to enhance the strength of the ELM model. The different optimization techniques (PSO, CPSO and the runner root algorithm) are then applied to enhance the stability and forecasting capability of the model. The prediction curves are shown in Fig. 7 and the errors in Table 3. The graphs and the forecasting indices show that the RRA-based ELM performs better than the other models.

Fig. 7 30 min prediction with various algorithms

Table 3 Forecasting errors in 30 min horizon

Case-3: Prediction of PV power in 60 min horizon

In this case, the total data set is arranged at 60 min intervals with the same ratio of 80% testing and 20% training.

The data set is decomposed into different IMFs by empirical mode decomposition (EMD), and ELM models based on the different optimization techniques (PSO, CPSO and RRA) are evaluated for the 60 min interval. The prediction error curves are shown in Fig. 8 and the errors in Table 4. The graphs and the forecasting indices show that the RRA-based ELM performs better than the other models.

Fig. 8 60 min prediction with various algorithms

Table 4 Forecasting errors in 60 min horizon

7 Conclusion

In this work, a historical data set from a real-time PV plant is prepared with time resolutions of 15, 30 and 60 min and simulated with a modified extreme learning machine technique. To further increase the forecasting accuracy, the arbitrarily selected weights are optimized with PSO, CPSO and RRA and the models are simulated again. The results show that RRA-ELM performs better than the other models. The optimization algorithms thus provide appropriate weights and node biases, so that the ELM model can deliver an estimate of the PV power with high accuracy and low error. In particular, the runner root algorithm is a rigorous optimization technique that optimizes the random input weights of the ELM model; hence, RRA-ELM is superior to the other models, as verified in the error tables.