1 Introduction

The wide availability and environmentally friendly nature of solar energy are driving its increasing penetration into the electrical grid. Despite these advantages, the inherent variability of solar irradiation makes the energy generated by a solar photovoltaic (PV) system intermittent, which poses challenges for power system operation and for grid operators. The main task of the power system is to ensure a reliable supply-on-demand system, and this reliability can be achieved only if the amount of energy deliverable from a photovoltaic system to the grid is known. This can be supported by predictive models that estimate the energy generated by a solar power plant over a given time horizon. The performance ratio or plant capacity factor of a solar photovoltaic power plant is always evaluated over a long-term horizon. Likewise, performance comparisons among solar PV plants are always based on the annual long-term (monthly average daily) monitored or evaluated response. Hence, long-term prediction over an annual horizon is required in practice for moving towards a smarter grid and making reliability a reality.

The preceding discussion makes clear the necessity of solar resource assessment for predicting the energy generated by a typical solar photovoltaic distribution system. This necessity creates a dependence on theoretical modelling of global solar irradiance and energy generation. Three modelling approaches currently exist for the prediction of global irradiance and energy: physical or mathematical models, statistical models and empirical models. Physical or parameterized models for the prediction of irradiance and energy have been reported by researchers, as seen in [1–8]. The physics-based models for irradiance rely on the physics of interaction between the extraterrestrial irradiance and the constituents of the atmosphere. The disadvantages of physics-based models for irradiance and energy include their complex structure and their dependence on a large number of input parameters such as station pressure, temperature, Rayleigh scattering, ozone reduced path, precipitable water, aerosol scattering albedo, aerosol optical thickness, temperature coefficient of modules, solar irradiance at the plane of array and PV characteristic parameters. Hence, models based on a statistical approach have also been reported for the prediction of irradiance. Statistical models employing time series-based strategies such as moving average, support vector regression and autoregressive integrated moving average (ARIMA) [9, 10] are commonly employed for minute-wise or hourly forecasting. These techniques are too complex to be employed for monthly average-based prediction. This led to the empirical approach for predicting the solar potential at a location of interest. Empirical models correlate the desired response (global horizontal irradiance and energy) with the available and accessible inputs, which reduces the dependence of the model on a large number of input parameters (as in the case of physical models). Furthermore, the empirical constants are derived by a simple regression-based methodology, limiting the complexity experienced with statistical models.

Although empirical models possess advantages over the physical and statistical approaches, certain research gaps remain among the prevailing empirical structures. These include the opportunity to improve accuracy by incorporating significant input factors that are absent in the existing models, the need for proper validation, and the need to limit dependence on real-time measured input parameters. This motivates the formulation of an empirical model that addresses the challenges stated above. Before formulating an improved empirical model, the existing empirical models for the prediction of irradiance and energy have to be reviewed. Empirical models for irradiance fall into two basic classes according to the input parameters they incorporate: single parametric models and multi-parametric or hybrid models. In contrast, only a few polynomial regression-based empirical models have been reported for energy prediction [11–13]. Hence, this chapter contributes to the formulation of an empirical model for the prediction of global irradiance and energy generation for a solar photovoltaic system. The location of interest for the formulation of the proposed empirical model was selected based on the accessibility of the testing data set required for its validation. Moreover, the proposed model can be used for other locations by altering the empirical constants embedded in the model. A schematic showing the classification of empirical models for irradiance and energy prediction is presented in Fig. 1.

Fig. 1 Classification of empirical irradiance and energy prediction models

1.1 Single Parametric Models for Prediction of Solar Irradiance

The single parametric models include only a single significant factor affecting irradiance, one that can be measured in practice. The commonly used single parametric models are the sunshine-based and the ambient temperature-based models, which are described in the sections below.

1.1.1 Single Parametric Sunshine-Based Model

The sunshine-based model was first introduced by Angstrom in 1940 [14], who suggested a linear relationship between the clearness index (ratio of global horizontal irradiance to the extraterrestrial global irradiance) and the relative sunshine hour (ratio of sunshine hour to maximum possible bright sunshine hour). This was further modified by Prescott [15] to give the Angstrom-Prescott model, which added an empirical constant to the Angstrom model. The form of the equation reported by Prescott [15] is given by

$$\frac{H}{{H_{0} }} = a + b\left( {\frac{S}{{S_{0} }}} \right)$$
(1)

where H represents the monthly average daily global irradiance, \(H_{0}\) represents the extraterrestrial global irradiance, S represents the monthly average daily sunshine hour and \(S_{0}\) represents the maximum possible bright sunshine hour.

With the Angstrom-Prescott model as the basis, several other researchers developed linear sunshine-based models with different empirical constants ‘a’ and ‘b’ for certain locations in the US, Zimbabwe and India [16–20]. In 1984, Benson et al. [21] proposed a season-specific linear model for 46 stations, which showed better prediction accuracy than the yearly based models described in [16, 17].

Ogelmann in 1984 [22] proposed a yearly based monthly average daily quadratic model for Adana and Ankara in Turkey with a training data set of 3 years. The prediction agreement was found to be better than that of the linear sunshine-based model, owing to the addition of a quadratic term to the linear one. The regression coefficient, representing the closeness between the response and the input, is higher for the quadratic than for the linear model. The basic form of the quadratic model [22] is given by

$$\frac{H}{{H_{0} }} = a + b\left( {\frac{S}{{S_{0} }}} \right) + c\left( {\frac{S}{{S_{0} }}} \right)^{2}$$
(2)

Furthermore, Samuel in 1991 [23] proposed a cubic model for a location at latitude 5.55°N. The mean percentage error (MPE), reflecting accuracy, was reported to be 2.6 %. Thus a cubic empirical model would deliver better agreement between the predicted and actual values of irradiance than a quadratic or linear one. The basic form of the cubic model [23] is given by

$$\frac{H}{{H_{0} }} = a + b\left( {\frac{S}{{S_{0} }}} \right) + c\left( {\frac{S}{{S_{0} }}} \right)^{2} + d\left( {\frac{S}{{S_{0} }}} \right)^{3}$$
(3)
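As an illustration of how the constants in Eqs. (1)–(3) might be obtained, the following minimal Python sketch fits linear, quadratic and cubic sunshine models by ordinary least squares. The function names and the sample values are hypothetical and are not taken from the cited studies; NumPy is assumed to be available.

```python
import numpy as np

def fit_sunshine_model(rel_sunshine, clearness, order):
    """Fit H/H0 = a + b*x + c*x**2 + ... (x = S/S0) by least squares.

    Returns the constants in ascending order: (a, b, c, ...).
    """
    x = np.asarray(rel_sunshine, dtype=float)
    y = np.asarray(clearness, dtype=float)
    # np.polyfit returns the highest power first, so reverse the result.
    return np.polyfit(x, y, order)[::-1]

def predict_clearness(constants, rel_sunshine):
    """Evaluate the fitted polynomial at the given S/S0 values."""
    x = np.asarray(rel_sunshine, dtype=float)
    # polyval from numpy.polynomial expects ascending coefficients.
    return np.polynomial.polynomial.polyval(x, constants)

# Hypothetical monthly values of S/S0 and H/H0 (illustration only).
s_ratio = np.array([0.45, 0.52, 0.60, 0.66, 0.71, 0.75, 0.80, 0.84])
k_t = 0.25 + 0.50 * s_ratio  # synthetic data lying exactly on a line

for order, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    constants = fit_sunshine_model(s_ratio, k_t, order)
    residual = k_t - predict_clearness(constants, s_ratio)
    print(name, np.round(constants, 4), float(np.abs(residual).max()))
```

Because the synthetic data lie exactly on a line, all three orders reproduce them; with measured data the higher orders would be expected to reduce the training residual, which is the behaviour the cited studies report.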

Ampratwum et al. in 1999 [24] compared linear, quadratic and logarithmic sunshine-based models and recommended the quadratic and logarithmic forms for the prediction of global irradiance. Further, a month-specific quadratic model for the prediction of global irradiance in Sudan was proposed in [25]. A lowest MPE of 0.36 % was observed on training, which results from the reported empirical constants being month-specific. Haydar et al. in 2006 [26] also found the cubic model to be more accurate than the linear and quadratic sunshine-based models they developed for certain provinces in Anatolia such as Afyon, Cankiri and Corum. A non-linear curve fitting model for the prediction of monthly average daily global irradiance for Jeddah was reported in [27]. Its acceptable prediction performance was attributed to deriving the empirical constants through a curve fitting methodology rather than a regression-based method [27]. A similar curve fitting-based methodology for improving the prediction accuracy of an empirical model was followed by Wanxiang et al. [28].

In summary, for single parametric sunshine-based models, improved accuracy is obtained by employing month-specific cubic models or a curve fitting-based methodology for obtaining the empirical constants.

1.1.2 Single Parametric-Based Temperature Model

As already noted, ambient temperature is also a significant single parametric factor affecting global irradiance. This section reviews the reported temperature-based global irradiance models and compares their performance with that of sunshine-based models.

Similar to [15], which describes the basic form of the sunshine model, an attempt was made in 1982 by Hargreaves and Samani [29] to report a basic form of the temperature-based model. The model reported by Hargreaves and Samani [29] is given by Eq. (4) as follows:

$$\frac{H}{{H_{0} }} = a + b\left( {T_{\hbox{max} } - T_{\hbox{min} } } \right)^{0.5}$$
(4)
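Since Eq. (4) is linear in the square root of the daily temperature range, its two constants can be estimated by simple linear regression. The sketch below uses only the Python standard library; the function names and the temperature and clearness values are hypothetical and serve only to show the mechanics.

```python
import math

def fit_temperature_model(t_max, t_min, clearness):
    """Fit H/H0 = a + b*sqrt(Tmax - Tmin) by simple linear regression."""
    x = [math.sqrt(hi - lo) for hi, lo in zip(t_max, t_min)]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(clearness) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, clearness))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b

def clearness_from_temperature(a, b, t_max, t_min):
    """Evaluate Eq. (4) for one pair of maximum and minimum temperatures."""
    return a + b * math.sqrt(t_max - t_min)

# Hypothetical monthly temperatures (deg C) and clearness indices.
t_max = [30.0, 32.0, 35.0, 37.0]
t_min = [21.0, 23.0, 19.0, 21.0]
k_t = [0.55, 0.55, 0.70, 0.70]

a, b = fit_temperature_model(t_max, t_min, k_t)
print(round(a, 4), round(b, 4))
```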

Further modifications to the basic form of the temperature model reported in [29] were described by a few researchers, as seen in the literature [30–36]. Although temperature-based models exist for the prediction of global irradiance, they are less accurate than sunshine-based models [37, 38]. Nevertheless, temperature, which is proven to affect solar radiation, cannot be neglected. Hence, a hybrid model encompassing factors that implicitly and explicitly affect global horizontal irradiance would form a better solution.

1.2 Multi-Parametric or Hybrid Models for Prediction of Solar Irradiance

The hybrid parametric model evolved as a means of achieving higher accuracy than the reported sunshine-based and temperature-based models. The available input factors that are proven to affect the intensity of solar radiation at a desired location are considered for its prediction. Models of this kind are seen in the literature [38–43]. The reported models include meteorological input factors such as ambient temperature (\(T_{\text{a}}\)), soil temperature (\(T_{\text{so}}\)), relative humidity (RH), sine of the declination angle (δ), mean sea level, water vapour pressure (\(P_{\text{v}}\)) and mean cloud cover (\(C_{\text{m}}\)). The basic forms of the existing hybrid models reported by the cited researchers include

$$\frac{H}{{H_{0} }} = a\left( {\left( {\Delta T} \right)^{b} \left( {1 + cP + dP^{2} } \right)} \right)\quad [39]$$
(5)
$$\begin{aligned}H & = 4.591 - 0.1135H_{0} + 2.522\left( {\frac{S}{{S_{0} }}} \right) + 6.1589(\sin \delta ) \\ & \quad - 0.0124({\text{RH}}) + 0.0187(T_{\text{so}} ) - 0.052(T_{\text{a}} )\quad [40] \\ \end{aligned}$$
(6)
$$\frac{H}{{H_{0} }} = a + b\left( {\frac{S}{{S_{0} }}} \right) + cT + dV + e{\text{RH}} + fP_{\text{v}} \quad [41]$$
(7)
$$\frac{H}{{H_{0} }} = a\ln (T_{{{\text{a}}(\hbox{max} )}} - T_{{{\text{a}}(\hbox{min} )}} ) + b\left( {\frac{S}{{S_{0} }}} \right)^{c} + d\quad [42]$$
(8)

The regression coefficient, marking the closeness between the desired irradiance and the input factors, increases with the addition of input factors affecting irradiance [38–43]. Furthermore, an exhaustive review of empirical models for solar radiation prediction [44] reported the non-linear model to be the best predictor in comparison with linear, ANN and fuzzy (complex) methods, with the lowest MPE of 0.11 %, RMSE of 0.0181 % and MBE of 0.0001 %.

In summary, the prediction accuracy of hybrid models is shown to increase with the addition of significant available factors affecting irradiance. A hybrid model should therefore incorporate the proven significant factors of sunshine and ambient temperature.

The hybrid models perform acceptably well in comparison with the single parametric models. The prevailing challenge with the present multi-parametric models, however, is their dependence on a large number of real-time monitored input factors. These may be unavailable for stations other than those used for formulation or training; in other words, the reliability of the model varies with location. Furthermore, the cost incurred in measuring the input parameters used to predict global irradiance (the response) should be less than that of measuring the response directly. Hence, a model for the prediction of irradiance whose input factors limit the dependence on real-time measurement is required. The same holds for the prediction of energy generation for a photovoltaic system.

1.3 Multi-parametric Model for Prediction of Energy Generation

Multi-parametric models for the prediction of energy delivered by a typical solar photovoltaic system are reported in [11–13] and [45]. These include one or more of the input parameters global irradiance, ambient temperature, module temperature and wind speed. The basic forms of the reported models are given as follows:

$$P_{\text{ac}} = aH + bH^{2} + c\,\ln (H)\quad [11]$$
(9)
$$P_{\text{ac}} = aH + b\left( {\frac{H}{{T_{\hbox{max} } }}} \right)^{2} + cT_{\hbox{max} } \quad [12]$$
(10)
$$P_{\text{ac}} = H\left( {a + bH + cT_{\text{a}} + d(WS)} \right)\quad [13]$$
(11)
$$P_{\text{ac}} = a + bHT_{\text{m}} + cH + dH^{2} \quad [45]$$
(12)
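Each of Eqs. (9)–(12) is linear in its constants, so the constants can be estimated by linear least squares once the regressors are formed. The following sketch does this for Eq. (11), rewritten as \(P_{\text{ac}} = aH + bH^{2} + cHT_{\text{a}} + dH(WS)\). The function names, the input values and the "true" constants used to generate the synthetic response are all hypothetical; NumPy is assumed to be available.

```python
import numpy as np

def fit_energy_model(h, t_a, ws, p_ac):
    """Least-squares estimate of (a, b, c, d) in Eq. (11)."""
    h, t_a, ws, p_ac = (np.asarray(v, dtype=float) for v in (h, t_a, ws, p_ac))
    # Expanding Eq. (11) gives four regressors: H, H^2, H*Ta and H*WS.
    design = np.column_stack([h, h**2, h * t_a, h * ws])
    constants, *_ = np.linalg.lstsq(design, p_ac, rcond=None)
    return constants

def predict_energy(constants, h, t_a, ws):
    """Evaluate Eq. (11) for given irradiance, temperature and wind speed."""
    a, b, c, d = constants
    h, t_a, ws = (np.asarray(v, dtype=float) for v in (h, t_a, ws))
    return h * (a + b * h + c * t_a + d * ws)

# Hypothetical inputs (illustration only; units are left unspecified).
h = [4.5, 5.0, 5.6, 6.1, 6.4, 5.8, 5.2, 4.8]
t_a = [25.0, 31.0, 27.0, 33.0, 29.0, 26.0, 32.0, 28.0]
ws = [2.1, 2.8, 3.0, 3.5, 2.6, 3.2, 2.4, 2.0]
true_constants = (0.9, -0.02, -0.004, 0.006)  # assumed, for synthetic data
p_ac = predict_energy(true_constants, h, t_a, ws)

fitted = fit_energy_model(h, t_a, ws, p_ac)
print(np.round(fitted, 4))
```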

The performance of the prevailing models of Eqs. (9)–(12), expressed as the absolute mean relative error, varied from a minimum of 2 % to a maximum of 17 % when applied to a typical case study [13]. As already noted, the prediction accuracy of a model depends on the significant factors incorporated in its formulation. Heat loss during power generation from a photovoltaic cell is well evident from the nature of the photovoltaic effect [46] and from reported research investigations [47, 48], and it significantly affects the yield or energy generation. Hence, the empirical model to be formulated for energy generation should account for the heat dissipated during power generation. This chapter further contributes to the evaluation of an improved model for the prediction of energy generated by a typical PV system.

2 Formulation of Multi-parametric Global Irradiance Prediction Model

The formulation of an empirical multi-parametric model for the prediction of global irradiance and energy follows the sequence of steps represented in Fig. 2. As seen, the formulation relies on the measured response (either irradiance or energy) during the training period. The significant atmospheric factors affecting the spectral properties of solar irradiance or energy are related through justified or proven facts (requiring prior knowledge). The coefficients of the proposed model are then evaluated by a simple regression methodology for a desired location, making the model applicable in practice.

Fig. 2 Methodology for empirical model formulation

The measured global irradiance (considering solar irradiance as the first response) used for formulating the prediction model is taken from the solar radiation database provided by RETScreen Plus [49]. RETScreen Plus provides compiled monthly average daily global irradiation data from NASA and WRDC. As the accuracy of the proposed model depends on the accuracy of the training data, a compiled input data set is preferred. The training period is 1961–1990, for which the monthly average daily global irradiance data set is available. The desired locations were selected based on the availability or accessibility of the validation data set. Furthermore, only basic sunshine-based models exist for certain locations in India such as Mumbai, Kolkata, Jodhpur, Kodaikanal and Chennai, where improved prediction accuracy is important. A multi-parametric model, which is expected to exhibit improved accuracy, remains untested for these locations. In particular, about 35 % of the installed capacity of the state of Tamil Nadu comes from renewable sources of energy [50]. The state also experiences around 300 sunny days per year, which allows it to rely on solar power to meet demand. As solar installations increase, the intermittency of energy generation increases, ultimately creating a need for reliability, which predictive technologies can help achieve. Thus, certain locations in Tamil Nadu were considered in the present chapter for training and validation.

2.1 Input Factors Considered Affecting Global Solar Irradiance

The atmospheric input factors affecting the clearness index (ratio of measured global irradiance to extraterrestrial global irradiance) include the relative sunshine hour, the temperature ratio and the air mass at solar noon, which are briefly described as follows.

2.1.1 Relative Sunshine Hour

The sunshine duration is defined as the length of time during which the ground is irradiated with direct solar irradiance [51]. This duration, or the amount of daylight, implicitly indicates the intensity of global irradiance received by the ground. Several researchers who reported sunshine-based empirical models for the prediction of global irradiance recommend obtaining this duration from sunshine recorder measurements. Instead of relying on real-time measurement, the real-time dependency can be limited by a theoretical assessment of the sunshine hour. Thus, the equation reported in [52] is suggested for calculating the sunshine duration, which addresses one of the prevailing challenges of empirical models. Equation (13) gives the theoretical estimate of the sunshine hour [52]

$$S = \frac{h}{360}{\text{arc}}\,\cos \left( {\tan (L)\tan (23.5)\cos \left( {\frac{{360D_{n} }}{365.25}} \right)} \right)$$
(13)

where h represents the hours per day, L is the latitude of the monitored site and \(D_{n}\) is the day number of the year. The daily sunshine hour is averaged over a month to obtain the monthly average daily sunshine hour (S). The maximum possible sunshine hour can be calculated by Eq. (14) as

$$S_{0} = \frac{2}{15}\omega_{\text{s}}$$
(14)

where \(\omega_{\text{s}}\) represents the sunset hour angle in degrees. The hour angle is defined as the angular displacement of the sun east or west of the local meridian due to the rotation of the earth on its axis at 15°/h. It is derived from the declination and latitude angles as in Eq. (15).

$$\omega_{\text{s}} = \cos^{ - 1} ( - \tan \,\delta \,\tan \,L)$$
(15)

The declination angle, represented by δ, is defined as the angular position of the sun with respect to the equatorial plane. It varies between −23.45° and +23.45°. The declination angle can be found from the approximate equation given by Cooper [53]

$$\delta = 23.45 \sin \left( {360\frac{{(284 + D_{n} )}}{365.25}} \right)$$
(16)
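Equations (14)–(16) can be chained to obtain the maximum possible sunshine duration from the latitude and the day number alone. The sketch below is one way to do so with the Python standard library; the function names are hypothetical, the 365.25-day form of Eq. (16) is taken as written above, and the latitude used in the example is an assumed value near 13°N.

```python
import math

def declination_deg(day_of_year):
    """Eq. (16): declination angle in degrees."""
    angle = 360.0 * (284 + day_of_year) / 365.25
    return 23.45 * math.sin(math.radians(angle))

def sunset_hour_angle_deg(latitude_deg, declination):
    """Eq. (15): sunset hour angle in degrees."""
    x = -math.tan(math.radians(declination)) * math.tan(math.radians(latitude_deg))
    # Clamp so that acos stays defined at high latitudes (polar day or night).
    return math.degrees(math.acos(max(-1.0, min(1.0, x))))

def max_sunshine_hours(latitude_deg, day_of_year):
    """Eq. (14): maximum possible sunshine duration S0 in hours."""
    delta = declination_deg(day_of_year)
    return (2.0 / 15.0) * sunset_hour_angle_deg(latitude_deg, delta)

# Example for an assumed latitude of 13.08 deg N on day 172 of the year.
print(round(max_sunshine_hours(13.08, 172), 2))
```

Dividing a sunshine duration S, whether measured or estimated, by the value returned by `max_sunshine_hours` gives the relative sunshine hour used as a model input.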

The higher the sunshine duration, the greater the intensity of global irradiance received by a horizontal surface. The direct dependence of global irradiance on sunshine hour, or of clearness index on relative sunshine hour, is further justified by Rahman [54]. Figure 3 shows the annual average values of clearness index for various locations such as Buenos Aires, Penang, New Delhi, Ibadan, Venezia, El Fasher, Port Sudan, Bhavnagar, Alicante, Lucknow, Abu Namma and certain other regions.

Fig. 3 Variation of clearness index with respect to relative sunshine hour as reported in [54]

Figure 3 represents the closeness, or significance, of the relationship between the clearness index and the relative sunshine hour, reflected in the value of the regression coefficient between them. The higher the value of the regression coefficient, the higher the significance of the input factor with respect to the desired response. As the percentage contribution of relative sunshine hour towards clearness index is as high as 82.9 %, it is termed significant.

2.1.2 Temperature Ratio

Sunshine duration implicitly alters the temperature of the ambient air, which is exposed to radiation from the sun. This phenomenon occurs naturally and is self-evident [55]. Thus, the inclusion of ambient temperature in a model of global horizontal irradiance is justified. Furthermore, the latitude of the location influences the amount of solar radiation, and the pattern of temperature distribution across the globe is also latitudinal. Thus, incorporating ambient temperature indirectly accounts for latitudinal variation across locations, making the proposed model more significant.

Besides ambient temperature, the physical source from which it originates should also be incorporated. This source is none other than the sun, which behaves as a black body. Hence, considering the sun’s temperature in addition to the ambient temperature makes an empirical model physically significant. The physical significance of the sun’s temperature for the intensity of radiant flux is justified by the radiation laws of Stefan-Boltzmann [56] and Planck [57].

As the ratio of measured global irradiance (H) to the maximum extraterrestrial irradiance (\(H_{0}\)) is found to vary linearly with the ratio of sunshine hour to maximum possible sunshine hour, the same ratio (\(H/H_{0}\)) is considered to vary with the ratio of the minimum temperature (ambient temperature) to the maximum temperature (sun’s temperature).

2.1.3 Air Mass at Solar Noon

Solar irradiance passes through the column of air surrounding the earth, the length of which varies with the apparent position of the sun in the sky [58]. The path length through the air column is minimum when the sun is exactly overhead (at the zenith position) or at solar noon. At instants other than solar noon, the rays have to pass through a longer atmospheric air column, termed the optical air mass. Hence, the atmospheric path traversed by the radiation is shortest at solar noon, which increases the magnitude of solar radiation received at the ground. Air mass is often approximated for a constant density atmosphere and is given by

$${\text{AM}}\,({\text{at solar noon}}) = \frac{1}{\cos Z}$$
(17)

where Z is the zenith angle at solar noon, \(Z = 90^\circ - \alpha\), and \(\alpha = 90^\circ + \delta - \phi\) for the Northern Hemisphere, in which India lies. Here δ, ϕ and α are the declination angle, the latitude angle and the altitude angle of the site, respectively.
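The air mass at solar noon of Eq. (17) follows directly from the latitude and the declination. A minimal sketch using the Python standard library is given below; the function name and the example angles are hypothetical.

```python
import math

def air_mass_at_solar_noon(latitude_deg, declination_deg):
    """Eq. (17): AM = 1 / cos(Z), with Z = 90 - alpha at solar noon."""
    altitude = 90.0 + declination_deg - latitude_deg  # alpha, Northern Hemisphere
    zenith = 90.0 - altitude                          # equals latitude - declination
    return 1.0 / math.cos(math.radians(zenith))

# Example for an assumed latitude of 13 deg N at two extreme declinations.
print(round(air_mass_at_solar_noon(13.0, -23.45), 4))
print(round(air_mass_at_solar_noon(13.0, 23.45), 4))
```

When the declination exceeds the latitude, Z becomes negative; because the cosine is an even function, the computed air mass is unaffected by the sign.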

Hence, the proposed model for the prediction of monthly average daily global irradiance includes the input parameters relative sunshine hour, temperature ratio and air mass at solar noon. The significance of each incorporated input factor is justified by the value of the regression coefficient (\(R^{2}\)) between it and the desired global horizontal irradiance. Figure 4a–c shows the significance of sunshine hour, temperature ratio and air mass, respectively, towards global irradiance. The value of \(R^{2}\) varied from 0.71 to 0.89, marking a significant contribution of these input factors to the prediction of global irradiance.

Fig. 4 a–c Significance of the considered input factors sunshine hour, temperature ratio and air mass towards clearness index (response)

Thus, this section has briefly described the factors considered for modelling global horizontal irradiance, together with their justification. The next section relates the input parameters to the response, leading to the formulation of a modified multi-parametric model for global irradiance.

3 Modified Multi-parametric Empirical Model

The next step in formulating an empirical model is to relate the considered input parameters to the desired response. Summarizing the observed relationships between the global horizontal irradiance and the input parameters, a modified multi-parametric model is formulated. The intensity of global horizontal irradiance is observed to increase with sunshine hour, ambient temperature and air mass at solar noon. Hence, the form proposed in Eq. (18) is employed for the prediction of monthly average daily global irradiance.

$$\frac{H}{{H_{0} }} = a + b\left( {\frac{S}{{S_{0} }}} \right) + c\left( {\frac{S}{{S_{0} }}} \right)^{2} + d\left( {\frac{S}{{S_{0} }}} \right)^{3} + e\left( {\frac{{T_{\text{a}} }}{{T_{\text{s}} }}} \right) + f({\text{AM}})$$
(18)

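where a, b, c, d, e and f represent the empirical constants for the location of interest for which the model is formulated, \(T_{\text{a}}\) is the ambient temperature, \(T_{\text{s}}\) is the sun’s temperature and AM is the air mass at solar noon.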
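Equation (18) is linear in its six constants, so they can be estimated by linear least squares from a training set of monthly values. The sketch below illustrates this under stated assumptions: the function names, the twelve monthly input values and the constants used to generate the synthetic response are all hypothetical and are not the values tabulated for Madurai/Sivagangai or Chennai. NumPy is assumed to be available.

```python
import numpy as np

def design_matrix(s_ratio, t_ratio, air_mass):
    """Columns 1, x, x^2, x^3, Ta/Ts and AM of Eq. (18), with x = S/S0."""
    x = np.asarray(s_ratio, dtype=float)
    t = np.asarray(t_ratio, dtype=float)
    am = np.asarray(air_mass, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2, x**3, t, am])

def fit_constants(s_ratio, t_ratio, air_mass, clearness):
    """Least-squares estimate of (a, b, c, d, e, f)."""
    A = design_matrix(s_ratio, t_ratio, air_mass)
    y = np.asarray(clearness, dtype=float)
    constants, *_ = np.linalg.lstsq(A, y, rcond=None)
    return constants

def predict_irradiance(constants, s_ratio, t_ratio, air_mass, h0):
    """Predicted H, obtained as H0 times the clearness index of Eq. (18)."""
    k_t = design_matrix(s_ratio, t_ratio, air_mass) @ constants
    return np.asarray(h0, dtype=float) * k_t

# Hypothetical monthly inputs (illustration only).
s_ratio = [0.62, 0.68, 0.71, 0.70, 0.66, 0.52, 0.48, 0.50, 0.55, 0.53, 0.51, 0.56]
t_ratio = [0.0517, 0.0521, 0.0526, 0.0530, 0.0532, 0.0529,
           0.0526, 0.0525, 0.0524, 0.0521, 0.0518, 0.0516]
air_mass = [1.17, 1.09, 1.03, 1.00, 1.01, 1.03, 1.02, 1.00, 1.02, 1.07, 1.15, 1.20]
assumed = np.array([0.10, 0.9, -0.8, 0.4, 2.0, -0.05])  # synthetic constants
clearness = design_matrix(s_ratio, t_ratio, air_mass) @ assumed

constants = fit_constants(s_ratio, t_ratio, air_mass, clearness)
print(np.round(design_matrix(s_ratio, t_ratio, air_mass) @ constants, 4))
```

With only twelve monthly points and a temperature ratio that is nearly constant over the year, the individual constants may be poorly determined even when the fitted model reproduces the training data closely, so agreement on a separate validation set is the more meaningful check.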

The proposed model explicitly incorporates the effects of sunshine, ambient temperature and air mass at solar noon. These factors implicitly account for the variation in latitude of the location, declination angle, altitude angle and hour angle. Thus, the incorporation of a larger number of input parameters (multi-parametric model), either implicitly or explicitly, adds to the strength of the model. The addition of significant factors also improves the prediction accuracy of the model.

3.1 Case Studies for the Prediction of Global Horizontal Irradiance

The case studies for the proposed irradiance model were chosen where a validation data set containing the measured global irradiance was accessible. Hence, the locations of Madurai/Sivagangai and Chennai were selected as case studies for testing the prediction accuracy of the modified multi-parametric model. A suitable validation data set was not available for Madurai, for which the model was formulated or trained. Hence, the nearest monitoring station, Sivagangai, was considered for testing the model, as its validation data set was available for the period 2011–2013. The validation data set for Chennai was obtained from [59], which reported a basic sunshine-based model for Chennai, and covers the period 1980–2009. The training data set for the region of Madurai is tabulated in Table 1.

Table 1 Training data set of proposed model parameters (comprising the measured and evaluated input parameters) for Madurai/Sivagangai during (1961–1990)

The empirical constants were evaluated from the training data set of model parameters, covering monthly average daily data for the period 1961–1990. The least squares regression methodology [60] was adopted for evaluating the empirical constants. The empirical constants of the proposed model for Madurai/Sivagangai are tabulated in Table 2.

Table 2 Empirical constants for the proposed model for Madurai/Sivagangai

Similarly, the training data set for Chennai was employed in determining the empirical constants of the modified multi-parametric model. The training data set for Chennai is tabulated in Table 3 and the associated empirical constants are given in Table 4.

Table 3 Training data set of proposed model parameters (comprising the measured and evaluated input parameters) for Chennai during (1961–1990)
Table 4 Empirical constants for the proposed model for Chennai

The basic Angstrom-type sunshine models (linear, quadratic and cubic) were also formulated for Madurai/Sivagangai and Chennai in order to compare their accuracy with that of the proposed multi-parametric model. The constants of the linear, quadratic and cubic models for Madurai/Sivagangai and Chennai are tabulated in Tables 5 and 6, respectively.

Table 5 Sunshine-based empirical constants for Madurai/Sivagangai
Table 6 Sunshine-based empirical constants for Chennai

The proposed models are compared in order to identify the most acceptable model for the prediction of monthly average daily global irradiance at Madurai/Sivagangai and Chennai.

3.2 Performance Study of Irradiance Prediction Models

Several performance indicators are used to express the accuracy of prediction models. These include the mean bias error (MBE), root mean square error (RMSE), mean percentage error (MPE), mean absolute bias error (MABE) and mean absolute percentage error (MAPE). The mean bias error gives information on the long-term performance of the model and allows a term-by-term comparison of the actual deviation between the predicted and actual response [25]. A low value of MBE is always desired for better accuracy of the proposed model. A positive value of MBE indicates an overestimate by the model, while a negative value indicates an underestimate. The RMSE gives information on the short-term performance of the proposed model [61] and is always non-negative. The following equations define the statistical performance indicators for a prediction model.

$${\text{MBE}} = \left( {\frac{ 1}{N}} \right)\sum {(H_{\text{pred}} - H_{\text{meas}} )}$$
(19)
$${\text{RMSE}} = \left( {\left( {\frac{ 1}{N}} \right)\sum {(H_{\text{pred}} - H_{\text{meas}} )^{ 2} } } \right)^{0.5}$$
(20)
$${\text{MPE}} = \, \left( {\frac{ 1}{N}} \right)\sum {\left( { (H_{\text{pred}} - H_{\text{meas}} )/H_{\text{meas}} } \right)} \times 100$$
(21)

In practice, acceptable prediction models have low values of MBE, RMSE and MPE. The deviation between the actual and predicted response values (mean percentage error) should lie within ±10 % for a model to be considered predictive. If the mean absolute percentage error (MAPE) is ≤10 %, the model has high prediction accuracy; 10 % ≤ MAPE ≤ 20 % indicates good prediction, and MAPE ≥ 20 % indicates inaccurate prediction [62].
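The indicators of Eqs. (19)–(21) can be computed as in the following sketch, which uses only the Python standard library. The MAPE is not written out as an equation in the text, so its usual definition (the mean of the absolute percentage deviations) is assumed here; the function names and sample values are hypothetical.

```python
import math

def mbe(pred, meas):
    """Eq. (19): mean bias error."""
    return sum(p - m for p, m in zip(pred, meas)) / len(meas)

def rmse(pred, meas):
    """Eq. (20): root mean square error."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(meas))

def mpe(pred, meas):
    """Eq. (21): mean percentage error, in percent."""
    return 100.0 * sum((p - m) / m for p, m in zip(pred, meas)) / len(meas)

def mape(pred, meas):
    """Mean absolute percentage error, in percent (assumed standard form)."""
    return 100.0 * sum(abs((p - m) / m) for p, m in zip(pred, meas)) / len(meas)

# Hypothetical predicted and measured irradiance values.
h_pred = [5.0, 6.0, 4.0]
h_meas = [4.0, 6.0, 5.0]
print(mbe(h_pred, h_meas), rmse(h_pred, h_meas))
print(mpe(h_pred, h_meas), mape(h_pred, h_meas))
```

In this example the positive and negative deviations cancel in the MBE and partly cancel in the MPE, while the RMSE and MAPE do not allow such cancellation, which is why the signed and unsigned indicators are reported together.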

The statistical indicators are evaluated during validation and compared for the suggested multi-parametric and sunshine-based models. The performance comparison for Madurai/Sivagangai during validation (2011–2013) is tabulated in Table 7.

Table 7 Performance comparison for the proposed models during validation for Sivagangai

The modified multi-parametric model, encompassing significant factors, proves to be more accurate and acceptable than the basic sunshine-based models for the prediction of monthly average daily global horizontal irradiance for Madurai/Sivagangai. This is evident from Table 7, where the lowest MAPE of 2.29 % occurs for the modified multi-parametric model. A similar comparison of percentage error is made between the modified multi-parametric model and the existing multi-parametric models for Sivagangai during training or model formulation. Only those multi-parametric models whose input parameters were available were considered; these were reported in the literature [63–65]. The deviation between the actual and predicted values of global irradiance obtained through the existing and the proposed multi-parametric models for Sivagangai is tabulated in Table 8.

Table 8 Performance comparison among the existing and the reported multi-parametric models for Sivagangai

The proposed model of the form in Eq. (18) lies close to the measured values of global irradiance during training, as reflected in the lowest value of percentage error in Table 8. Similarly, the modified multi-parametric model is applied to Chennai with the evaluated empirical constants and the testing data set. A comparison of MAPE is made with the existing prediction models for Chennai [18, 19, 66, 67] during validation, for the period from 1980 to 2009. This comparison is given in Table 9.

Table 9 Performance comparison among the existing and the reported models for Chennai during validation

The modified multi-parametric model works well for the prediction of global irradiance at Chennai. This is evident from Table 9, where the multi-parametric model of Eq. (18) shows the lowest MAPE, 0.07 %, among the reported models. The proposed multi-parametric model possesses better prediction accuracy because it encompasses the significant input factors affecting global irradiance. Hence the selection of a suitable model for prediction of global irradiance depends on the availability of the model inputs and on the addition of significant factors justified through established physical laws.
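The MAPE values compared in Tables 7 and 9 can be illustrated with a minimal sketch. It assumes the common definition of MAPE (mean of the absolute relative errors, in percent), which may differ in detail from the definition used earlier in the chapter; the function name `mape` and the sample values are hypothetical and are not taken from the tables.

```python
def mape(actual, predicted):
    """Mean absolute percentage error (%) between measured and predicted values.

    Assumes the common definition: 100/n * sum(|actual - predicted| / |actual|).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical monthly average daily irradiance values (kWh/m2/day),
# chosen only to exercise the function; they are not data from the chapter.
measured = [4.0, 5.0, 6.0]
predicted = [4.1, 4.9, 6.0]
print(f"MAPE = {mape(measured, predicted):.2f} %")
```

A lower MAPE indicates predictions that lie closer, on average, to the measured values, which is the sense in which the models are ranked in the tables.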

4 Energy Prediction Model Emphasized Through Performance and Exergy Analysis

Prediction of the energy delivered by a typical photovoltaic system is a major step towards achieving reliability, which is one of the greatest challenges in power system operation. This section formulates an energy prediction model for long-term (monthly average daily) AC energy generation.

The existing Sandia inverter empirical model employs four equations which are functions of the DC power input and the electric self-consumption [68]. The theoretical estimation of DC power output in turn depends on models such as the Sandia photovoltaic array model and the California Energy Commission model [5]. These models depend on a large number of input parameters, such as direct and diffuse radiation, module characteristics, array layout, diode current, reverse saturation current, and series and shunt resistance, which increases complexity. Thus, to reduce the complexity and to make the model applicable to pre-sizing and installation studies, an empirical model independent of module and system parameters is highly recommended. This forms the objective of the present section.

The empirical model for energy prediction is developed with the same methodology as that adopted for the global irradiance prediction model. The performance analysis (electrical study) and the exergy analysis (thermal study) form the preliminary study that identifies the factors to be added in the empirical model formulation.

4.1 Performance Analysis of Solar PV Distribution System (Grid Connected PV System)

The performance analysis of a grid-connected PV system deals with the evaluation of performance indicators such as energy generation, yield, performance ratio and efficiency for the monitored duration. In commercial practice, plants are most commonly compared on a monthly average daily basis over an annual period, so monthly average prediction is the appropriate choice. Furthermore, the most fundamental performance indicator is the AC energy generation, from which the key performance indices such as final yield, performance ratio and capacity factor are derived. Thus the prediction of AC energy generation for a solar photovoltaic system is important.

Examining how the key performance indices vary with the monitored inputs identifies the input factors that affect them. The significant factor affecting the AC energy generation is established through baseline regression analysis in RETScreen Plus.

A typical case study of a 5 MWp PV system, whose energy generation is to be predicted, is considered. The plant is operational at Sivagangai. The measured AC energy generation for the monitored duration is shown in Fig. 5. The AC energy generation varied from a minimum of 19413.1 kWh/day (December) to a maximum of 27482.8 kWh/day (September). Similarly, the monthly average daily global irradiance varied from a minimum of 4.388 kWh/m²/day in December to a maximum of 5.986 kWh/m²/day in September. Hence the AC energy generation and the global irradiance vary hand in hand: an increase in global irradiance increases the AC energy generation.

Fig. 5 Monthly average daily variation of AC energy generation and global irradiance for 5 MWp PV

The calculated monthly average daily variation of final yield for the monitored duration for the 5 MWp PV plant is shown in Fig. 6. The nature of variation in final yield is similar to the annual variation of AC energy generation, which ultimately depends on the global irradiance.

Fig. 6 Monthly average daily variation of final yield for 5 MWp PV plant
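As an illustration of how the key performance indices follow from the AC energy generation, the sketch below evaluates final yield, performance ratio and capacity factor for the December and September values quoted above. It assumes the conventional IEC 61724-style definitions, which may differ from the definitions used elsewhere in the chapter. As a simplification, it uses the global horizontal irradiance in place of the plane-of-array irradiance when forming the reference yield. The function names are hypothetical.

```python
G_STC = 1.0  # reference irradiance at standard test conditions, kW/m2 (assumed)
RATED_KWP = 5000.0  # 5 MWp plant at Sivagangai


def final_yield(e_ac_kwh_per_day, rated_kwp):
    """Final yield in kWh/kWp/day: AC energy per unit of rated capacity."""
    return e_ac_kwh_per_day / rated_kwp


def performance_ratio(e_ac_kwh_per_day, rated_kwp, h_kwh_m2_day):
    """Ratio of final yield to reference yield (H / G_STC).

    Assumption: H is the horizontal irradiance quoted in the text, used here
    as a stand-in for plane-of-array irradiance.
    """
    reference_yield = h_kwh_m2_day / G_STC
    return final_yield(e_ac_kwh_per_day, rated_kwp) / reference_yield


def capacity_factor(e_ac_kwh_per_day, rated_kwp):
    """Fraction of the day the plant would need to run at rated power."""
    return final_yield(e_ac_kwh_per_day, rated_kwp) / 24.0


# Monthly average daily values quoted in the text for the 5 MWp plant.
months = {"December": (19413.1, 4.388), "September": (27482.8, 5.986)}
for name, (e_ac, h) in months.items():
    print(
        f"{name}: Yf = {final_yield(e_ac, RATED_KWP):.2f} kWh/kWp/day, "
        f"PR = {performance_ratio(e_ac, RATED_KWP, h):.3f}, "
        f"CF = {capacity_factor(e_ac, RATED_KWP):.3f}"
    )
```

Because all three indices are simple rescalings of the AC energy generation, a model that predicts the AC energy also makes these indices available.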

Thus, the significant effect of global irradiance on AC energy generation is emphasized by Figs. 5 and 6. This is further justified by baseline regression analysis in RETScreen Plus. Each input factor, namely global irradiance and ambient temperature, is regressed against the response to be predicted, the AC energy generation. The regression coefficient (R²) indicates how closely the input and the response vary together: the higher its value, the more significant the input parameter is towards energy generation. Table 10 presents the effect of variation in global irradiance (H) and ambient temperature (T_a) on the key performance indices, energy generation and efficiency, as obtained from RETScreen Plus.

Table 10 Effect of variation of H and T_a towards energy generation and efficiency

As inferred from Table 10, the input parameter T_a is less significant towards energy generation and efficiency. Its R² lies between 0.40 and 0.49, which indicates that T_a is less significant than global irradiance. Hence the input factor emphasized through performance analysis is the global horizontal irradiance. It is taken as one of the input factors for formulation of the empirical model for energy generation.
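The screening of input factors by R² can be sketched as follows. This is not the RETScreen Plus procedure itself; it is a minimal stand-in that computes R² for a simple linear regression of the response on a single input (equal to the squared Pearson correlation). The function name and the sample series are hypothetical and are not data from Table 10.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("a constant series has no defined R^2")
    return s_xy ** 2 / (s_xx * s_yy)


# Hypothetical monthly series, for illustration only.
h = [4.4, 4.9, 5.3, 5.6, 6.0]                 # global irradiance, kWh/m2/day
t_a = [27.0, 31.0, 29.5, 28.0, 30.5]          # ambient temperature, deg C
e_ac = [19500.0, 21800.0, 23900.0, 25200.0, 27400.0]  # AC energy, kWh/day

print(f"R^2 (H vs E_ac)   = {r_squared(h, e_ac):.3f}")
print(f"R^2 (T_a vs E_ac) = {r_squared(t_a, e_ac):.3f}")
```

An input whose R² against the response is clearly higher than that of the alternatives is the one retained as a significant factor.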

Multi-parametric, system-independent energy prediction models are preferred over single-parameter energy models because multi-parametric empirical models achieve better prediction accuracy. Hence, a multi-parametric formulation is adopted for energy prediction as well.

4.2 Exergy Analysis of Solar PV System

The photovoltaic effect is the generation of electrical power when a PV material is exposed to sunlight. Power generation is accompanied by simultaneous dissipation of heat (thermal energy). The amount of heat dissipated varies with the size of the PV system. This heat loss plays a significant role in the performance, and hence the energy generation, of the PV system. Exergy analysis, which relates the thermal exergy loss to efficiency and thereby to energy generation, is therefore essential for establishing the significance of this loss. The term exergy and its concept were first put forward by Gibbs in 1873 [69] and were further developed by Rant in 1956 [70]. Exergy analysis is derived from the second law of thermodynamics. Exergy is thus a more discerning measure than energy, as it accounts for all the irreversibilities present in the on-site operation of the plant and yields more meaningful efficiencies relative to the ideal.

Exergy analysis plays a decisive role in the analysis, improvement, design, assessment and optimization of energy systems [71]. Its key features are that it provides a true measure of actual plant performance and identifies the types, causes and locations of thermodynamic losses in the system. The objective of exergy analysis in the present study is to emphasize the significance of thermal exergy loss and module temperature (which results from the dissipation of thermal loss) for energy generation. Although the exergy analysis is carried out on the PV side, it remains unchanged when the PV array is integrated with the grid. Hence accounting for thermal loss in energy modelling remains important.

4.2.1 Assessment of Thermal Exergy Loss

Exergy balance of solar photovoltaic as seen in [72] can be written as

$$\begin{aligned} {\text{Exergy}}\;{\text{input}} - {\text{Exergy}}\;{\text{output}} - {\text{Exergy}}\;{\text{consumption}} & = {\text{Exergy}}\;{\text{accumulation}}. \\ ({\text{or}})\;{\text{Exergy}}\;{\text{input}} - ({\text{Electrical}}\;{\text{exergy}} - {\text{Thermal}}\;{\text{exergy}}) & = {\text{Exergy}}\;{\text{destruction}} \\ \end{aligned}$$

The thermal exergy loss can be theoretically evaluated [73] as given in Eq. (22)

$${\text{Thermal}}\;{\text{exergy}} = {\text{UA}}(T_{\text{m}} - T_{\text{a}} )\left( {1 - \frac{{T_{\text{a}} }}{{T_{\text{m}} }}} \right)$$
(22)

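Here U is the overall heat loss coefficient (W/m² °C), A is the PV module surface area, T_m is the module temperature and T_a is the ambient temperature. The convective heat transfer coefficient h_c, as a function of the wind speed υ_s (m/s), is given by Boyle (2004) [74] as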

$$h_{\text{c}} = 5.7 + 3.8\upsilon_{\text{s}}$$
(23)

The radiative heat transfer coefficient is small and is hence neglected, so that the overall heat loss coefficient U is approximated by h_c.
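Equations (22) and (23) can be combined in a short numerical sketch. It assumes that U is taken equal to the convective coefficient of Eq. (23) (radiative part neglected) and that the temperature ratio in Eq. (22) is evaluated with absolute temperatures (kelvin). The function names and the sample inputs are hypothetical and are not measurements from the 5 MWp plant.

```python
def convective_coefficient(wind_speed_m_s):
    """Convective heat transfer coefficient of Eq. (23), in W/m2 degC."""
    return 5.7 + 3.8 * wind_speed_m_s


def thermal_exergy_loss(u, area_m2, t_module_c, t_ambient_c):
    """Thermal exergy of Eq. (22), in W.

    Temperatures are given in deg C; the ratio T_a / T_m is formed in kelvin
    (an assumption of this sketch), while the difference is scale-independent.
    """
    t_m = t_module_c + 273.15
    t_a = t_ambient_c + 273.15
    return u * area_m2 * (t_m - t_a) * (1.0 - t_a / t_m)


# Hypothetical operating point: 2 m/s wind, 1 m2 of module area,
# module at 45 degC and ambient at 30 degC.
u = convective_coefficient(2.0)
loss = thermal_exergy_loss(u, 1.0, 45.0, 30.0)
print(f"U = {u:.1f} W/m2 degC, thermal exergy loss = {loss:.2f} W")
```

The loss grows with the temperature difference between module and ambient on two counts, through the heat flow term and through the Carnot-like factor, which is why module temperature appears as a significant factor.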

The thermal exergy loss is assessed for the 5 MWp PV plant to justify its inclusion in energy prediction. The evaluated monthly average daily thermal exergy loss over the monitored duration of the 5 MWp PV plant is shown in Fig. 7. The thermal exergy loss is found to increase with ambient temperature. An increase in ambient temperature raises the module temperature, and as the module temperature increases, the thermal exergy loss increases in turn. Hence the module temperature also acts as a significant factor affecting thermal exergy loss and thereby energy generation.

Fig. 7 Monthly average thermal exergy loss generated by 5 MWp PV system and the monitored temperature difference

The variation of thermal exergy loss with respect to AC energy generation for the 5 MWp PV plant is shown in Fig. 8.

Fig. 8 Variation of thermal exergy loss over AC energy generated for a 5 MWp PV system

The value of R² indicating the relationship between thermal exergy loss and AC energy generation (E_ac) is found to be 0.771. This justifies the inclusion of thermal exergy loss in modelling energy generation. Similarly, the dependence of energy generation on thermal exergy loss for a 160 kWp PV plant [75] is shown in Fig. 9. The variation of E_dc with thermal exergy loss (Ex_th) is taken to represent the variation of E_ac with Ex_th, since the factors influencing the DC energy generation are considered to influence the AC energy generation as well. The value of R² is also high, at 0.846, which emphasizes the relationship between energy generation and thermal exergy loss.

Fig. 9 Variation of thermal exergy loss over AC energy generated for a 160 kWp PV system

Thus, thermal loss significantly affects the AC energy generation. Module temperature also influences AC energy generation, as the following case studies show. For a 1.72 kWp rooftop PV plant [76], the regression coefficient between module temperature and E_ac is 0.734. For a 67.84 kWp PV system [77], the R² value is 0.767, as shown in Fig. 10.

Fig. 10 T_m versus E_ac for 67.84 kWp PV system [77]

The closer the value of R² is to its ideal value of 1, the more strongly the response depends on the input. Thus, the inclusion of T_m in the formulation of the long-term energy prediction model is well supported by long-term studies of real PV plants.

Ultimately, the factors influencing the DC energy generation of a PV system also influence the AC energy generated by the system. The AC energy is related to the DC energy through a constant of proportionality, which in most cases is taken as the inverter efficiency.

Thus, the AC energy generated can have its dependence as

$$E_{\text{ac}} = {\text{function}}\,(E_{\text{dc}} ,\eta_{\text{inv}} ) \to E_{\text{ac}} = {\text{function}}\,(T_{\text{m}} ,{\text{Ex}}_{\text{th}} );\; [ {\text{from}}\;{\text{exergy}}\;{\text{analysis]}}$$
(24)

Thus, as inferred from Eq. (24), the AC energy generated by the system is influenced by significant factors such as module temperature and thermal exergy loss as concluded from exergy analysis.

4.2.2 Formulation of Empirical Model for Energy Prediction

The input factors identified as significant for the AC energy generation are the global horizontal irradiance (H), the module temperature (T_m) and the thermal exergy loss (Ex_th). These are the individual input factors contributing to the formulation of the empirical model for energy prediction.

The candidate terms built from these input factors are H, T_m, Ex_th, H·T_m, H·Ex_th and H². Of these, H·T_m and H·Ex_th are the interactions of the main effects. Thus the proposed non-linear model is of the form

$$E_{\text{ac}} = a + bH + cT_{\text{m}} + d{\text{Ex}}_{\text{th}} + eHT_{\text{m}} + fH{\text{Ex}}_{\text{th}} + gH^{2}$$
(25)

The empirical coefficients a, b, c, d, e, f and g in Eq. (25) can be calculated for a solar PV system installed at a particular location using the least-squares criterion. The proposed model can thus be made applicable to a location with the help of an input data set for that location, called the training data set.
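The least-squares estimation of the coefficients in Eq. (25) can be sketched as below. This is a minimal illustration using NumPy's general least-squares solver, not necessarily the procedure or software used by the authors. The helper names are hypothetical, and the training data are synthetic: they are generated from an arbitrary set of coefficients over assumed input ranges purely so that the fit can be checked.

```python
import numpy as np


def design_matrix(h, t_m, ex_th):
    """Columns match the terms of Eq. (25): 1, H, T_m, Ex_th, H*T_m, H*Ex_th, H^2."""
    h, t_m, ex_th = (np.asarray(v, dtype=float) for v in (h, t_m, ex_th))
    return np.column_stack(
        [np.ones_like(h), h, t_m, ex_th, h * t_m, h * ex_th, h ** 2]
    )


def fit_energy_model(h, t_m, ex_th, e_ac):
    """Return the least-squares estimate of (a, b, c, d, e, f, g)."""
    x = design_matrix(h, t_m, ex_th)
    coeffs, *_ = np.linalg.lstsq(x, np.asarray(e_ac, dtype=float), rcond=None)
    return coeffs


def predict_energy(coeffs, h, t_m, ex_th):
    """Evaluate Eq. (25) with the given coefficients."""
    return design_matrix(h, t_m, ex_th) @ coeffs


# Synthetic training set (assumed ranges and units, not plant measurements).
rng = np.random.default_rng(0)
h = rng.uniform(4.3, 6.0, 24)           # kWh/m2/day
t_m = rng.uniform(35.0, 55.0, 24)       # module temperature
ex_th = rng.uniform(1000.0, 3000.0, 24)  # thermal exergy loss
assumed_coeffs = np.array([100.0, 2000.0, -50.0, 1.5, 20.0, -0.1, 300.0])
e_ac = design_matrix(h, t_m, ex_th) @ assumed_coeffs

fitted = fit_energy_model(h, t_m, ex_th, e_ac)
print(np.allclose(predict_energy(fitted, h, t_m, ex_th), e_ac, rtol=1e-6, atol=1e-3))
```

At least seven independent observations are needed to determine the seven coefficients; with monthly data, additional months beyond that minimum make the estimate less sensitive to noise in individual observations.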

Applying Eq. (25) to the measured and evaluated training data set, with the irradiance predicted from Eq. (18), the proposed equation for prediction of the AC energy generated by the 5 MWp PV plant at Sivagangai is

$$\begin{aligned} E_{\text{ac}} &= -22550 - 11585H + 3590.4T_{\text{m}} + 5.0718{\text{Ex}}_{\text{th}} - 723.72HT_{\text{m}} \\ & \quad - 0.302H{\text{Ex}}_{\text{th}} + 3936.9 H^{2} \\ \end{aligned}$$
(26)
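For reference, Eq. (26) can be evaluated directly as in the sketch below. The coefficients are those of Eq. (26); the function name and the sample inputs are hypothetical. This excerpt does not state the units of T_m and Ex_th used in training, so the inputs must be supplied in whatever units the training data set used. Because the fit is empirical, it should only be trusted within the range of the training data.

```python
def e_ac_sivagangai(h, t_m, ex_th):
    """Evaluate Eq. (26) for the 5 MWp plant at Sivagangai.

    h: predicted monthly average daily global horizontal irradiance
    t_m: module temperature; ex_th: thermal exergy loss
    (units as in the training data set, which this excerpt does not state).
    """
    return (
        -22550.0
        - 11585.0 * h
        + 3590.4 * t_m
        + 5.0718 * ex_th
        - 723.72 * h * t_m
        - 0.302 * h * ex_th
        + 3936.9 * h ** 2
    )


# Hypothetical inputs chosen only to show the call; not measured plant data.
print(f"E_ac = {e_ac_sivagangai(5.0, 40.0, 2000.0):.1f}")
```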

The proposed model is compared with the existing models cited in [11, 13, 45]. The absolute mean percentage error ranged from 1.13 to 7.37 % for the proposed model, against 0.3 to 24.79 %, 0.5 to 8.4 % and 0.5 to 9.47 % for the models proposed by Krebs and Gianolli-Rossi [11], Mayer et al. [13] and the International Energy Agency [45], respectively. This is depicted in Fig. 11. The proposed model has the lowest maximum error of the four, which makes it highly acceptable for prediction of the monthly average daily energy generated by a PV distribution system.

Fig. 11 Comparison of MPE for the existing with the proposed model for 5 MWp PV plant at Sivagangai during training (2011–2012)

The modified energy prediction model and the existing models are also applied to a reported case study of a 1.72 kWp PV plant [76]. The MPE of the models is compared in Fig. 12.

Fig. 12 MPE of the energy prediction models for a 1.72 kWp PV plant at Durban

As seen from Fig. 12, the mean percentage error for the individual observations is lower for the proposed energy prediction model than for the reported energy prediction models. Its application to the PV plant at Durban also indicates that the suggested energy prediction model adapts to plants of different peak power capacity.

The advantage of the modified empirical energy prediction model lies in the incorporation of system-independent, meteorological factors for energy prediction. Furthermore, the model is limited to input parameters monitored in real time, such as ambient temperature and wind speed. In addition, the improved accuracy of the proposed model results from accounting for the factors emphasized through performance (electrical) and exergy (thermal) analysis.

5 Summary

To obtain improved prediction accuracy, a multi-parametric model is preferred over a single-parameter model. Incorporating the significant input factors affecting energy generation also plays a vital role in improving prediction accuracy. An improved empirical model for prediction of monthly average daily global horizontal irradiance, tested for the locations of Madurai/Sivagangai and Chennai, is proposed. Furthermore, an improved energy prediction model using the predicted global irradiance is formulated for a 5 MWp PV system, whose AC energy generation is predicted over a long-term horizon (monthly average daily). An advantage of the proposed models is that they are limited to input parameters measured in real time, which is not the case for the existing empirical models. Moreover, the cost of measuring the independent model parameters should be less than that of directly measuring the dependent parameter, that is, the desired response (global irradiance and energy generation). This is a necessary condition for an empirical model to be useful, and the proposed models satisfy it.