INTRODUCTION

The Algerian Big South covers the bulk of the national territory and carries a promise of development for Algeria. It is also expected to become the country's source of renewable energy, especially solar energy. Algeria is one of the Sun Belt countries [1], and the solar radiation potential of the Algerian Sahara is highly significant: the median annual radiation is 2650 kWh/m2/year and the insolation time reaches 3500 h/yr [2, 3].

With the exception of rural areas, global solar radiation (GSR) measuring instruments are widely available in Algeria. For these rural areas, empirical modeling and estimation of the GSR is an economically feasible option. Previous publications show that appropriately constructed empirical GSR models are accurate enough for most solar energy applications. In this context, several empirical models have been developed to estimate the GSR from meteorological parameters such as air temperature, relative humidity, cloud cover, and sunshine duration. The most widely used model worldwide is the linear regression proposed by Angström–Prescott [4], which correlates the GSR with the relative sunshine duration. Several modifications have been proposed since it was developed, and many researchers have established correlations based on this model or explored other regression forms.

Several recent studies carried out in different parts of the world are reviewed here. Abdo and EL-Shimy [5], Almorox and Hontoria [6], Bakirci [7] and Namrata et al. [8] estimated the monthly mean daily global solar radiation using only sunshine duration measurements. Duzen and Aydin [9] correlated the global solar radiation with the relative sunshine duration in simple linear, quadratic, cubic, logarithmic and exponential regression forms for the Lake Van region in Turkey. Other studies were based on empirical relationships using other commonly measured meteorological data. Khalil and Shaffie [10] employed sunshine hours, ambient temperature, cloud cover, relative humidity, and maximum and minimum ambient temperatures to determine the direct, diffuse and global solar radiation for Cairo, Egypt. Almorox et al. [11] concluded that daily global solar radiation can be estimated properly with temperature-based models for locations where measured values are not available; the authors presented a new linear model that considers the saturation vapor pressure and the temperature range. Several evaluation studies have also been carried out for different locations in Algeria. For example, Mecibah et al. [12] estimated the monthly GSR on a horizontal surface at six Algerian sites using only sunshine duration data. Chegaar and Chibani [13] employed sunshine hours, air temperature, relative humidity, number of rainy days, altitude, latitude, total precipitation, evaporation and cloudiness in a model to determine the monthly average daily GSR. Nia et al. [14] evaluated eight regression models to estimate the monthly mean daily solar radiation on a horizontal surface at various places in Algeria using sunshine data. Salmi et al. [15] evaluated three sunshine duration models to estimate the monthly mean global solar radiation on a horizontal surface in several Algerian cities (Algiers, Constantine, Oran and Tamanrasset). Aoun and Bouchouicha [16] presented ten simple empirical models for estimating the monthly mean daily global radiation on a horizontal surface and compared them with three existing models proposed in [14, 15].

As this literature review attests, the developed models have been evaluated in only a few cities of the Saharan region of Algeria, and monthly average daily global solar radiation data are still needed for other cities of the Saharan Medium region. The objective of this work is to construct a systematic framework for building accurate empirical models that estimate the monthly average daily GSR on a horizontal surface. The long-term GSR measurements available for the Algerian site of Adrar are used in the model construction and evaluation phases. The site is located in the south-west of the country at 27°53′ N latitude and 0°17′ W longitude. The GSR measurements from 2009 to 2013 are used in the modeling phase, while the measurements from 2014 to 2016 are used in the model evaluation phase. Several statistical indicators are used to assess the accuracy of the models, and the best model is then selected.

DATA SOURCE

The data of the solar radiation used in this study are obtained from The Research Unit for Renewable Energies in the Saharan Medium (URERMS), affiliated to the Renewable Energy Development Center. The GSR measurements are performed by the Kipp and Zonen CM11 pyranometer. The collected dataset is divided into two datasets. The first dataset contains the mean values of the GSR in the period 2009–2013 and it is used for model construction. The mean GSR values of the second dataset cover the period 2014–2016 and they are used for model evaluation.

METHOD FOR ESTIMATION

In the absence of solar radiation measurements, sufficiently accurate estimation methods must be developed. There are numerous methods for estimating the solar radiation from other available meteorological measurements. These methods usually take as input astronomical, geographical, geometrical, physical, and meteorological factors. Empirical models of the GSR are classified into three main categories:

(a) Models based on the sunshine duration data,

(b) Models based on the temperature data, and

(c) Detailed meteorological data-based models.

The most popular meteorological based model is the temperature-based model. This is due to the availability of the temperature datasets in the vast majority of meteorological stations. In this model, the datasets of the minimum and maximum temperatures are used for estimating the GSR. If the empirical parameters of the temperature model are well calibrated for the considered site, the accuracy of the temperature model can reach high levels.

The Angström formula (Eq. (1)) is a simple model for estimating the monthly average daily GSR on a horizontal surface. This formula is the modified form of the Angström–Prescott equation [17], and it provides good results if the sunshine fraction (\({{S}_{f}}\)) is available. The sunshine fraction is the ratio of the actual sunshine hours (\(S\)) to the maximum possible sunshine hours (\({{S}_{o}}\)).

$$H_g/H_o = a + bS_f,$$
(1)

where \(H_g\) is the monthly average daily GSR on a horizontal surface, and a and b are regression constants determined empirically. The monthly average daily extraterrestrial solar radiation on a horizontal surface, \(H_o\), in MJ/m2 day can be calculated by using [18]:

$$\begin{gathered} H_o = \left( 1/\pi \right) I_{sc} f \\ \times \left[ \cos\delta \cos\lambda \sin\omega_s + \left( \frac{\pi \omega_s}{180} \right) \sin\delta \sin\lambda \right], \\ \end{gathered}$$
(2)

where Isc is the solar constant expressed as a daily total (118.108 MJ/m2 day); f is the eccentricity correction factor; λ is the latitude in degrees; δ is the solar declination in degrees; and ωs is the sunset hour angle in degrees. These parameters can be computed by the following equations [19]:

$$\begin{gathered} f = 1.00011 + 34.221 \times 10^{-3}\cos\gamma \\ + 1.28 \times 10^{-3}\sin\gamma + 7.19 \times 10^{-4}\cos 2\gamma \\ + 7.70 \times 10^{-5}\sin 2\gamma ; \\ \end{gathered}$$
(3)
$$\begin{gathered} \delta = \left( 180/\pi \right)\left( 6.918 \times 10^{-3} - 39.991 \times 10^{-2}\cos\gamma \right. \\ + 70.257 \times 10^{-3}\sin\gamma - 67.58 \times 10^{-4}\cos 2\gamma \\ + 9.07 \times 10^{-4}\sin 2\gamma - 26.97 \times 10^{-4}\cos 3\gamma \\ + \left. 1.48 \times 10^{-3}\sin 3\gamma \right), \\ \end{gathered}$$
(4)
$$\gamma = 2\pi \frac{dn - 1}{365};$$
(5)
$$\omega_s = \cos^{-1}\left( -\tan\lambda \tan\delta \right).$$
(6)

The maximum possible sunshine hours is approximated by [20]:

$$S_o = \left( 2/15 \right)\omega_s,$$
(7)

where γ is the day angle in radians and \(dn\) is the day number of the year (1 on January 1 and 32 on February 1). The value of \({{H}_{o}}\) for a given month is taken as the daily extraterrestrial solar radiation on the 15th day of that month.
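The chain of Eqs. (2)–(7) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the latitude λ is used in both terms of Eq. (2), and the clamping of the arccosine argument is an added assumption to keep the function defined at high latitudes.

```python
import math

I_SC = 118.108  # solar constant as a daily total, MJ/m2 day (value quoted in the text)

def day_angle(dn):
    """Day angle gamma in radians, Eq. (5); dn = 1 on January 1."""
    return 2.0 * math.pi * (dn - 1) / 365.0

def eccentricity(gamma):
    """Eccentricity correction factor f, Eq. (3)."""
    return (1.00011 + 34.221e-3 * math.cos(gamma) + 1.28e-3 * math.sin(gamma)
            + 7.19e-4 * math.cos(2 * gamma) + 7.70e-5 * math.sin(2 * gamma))

def declination_deg(gamma):
    """Solar declination delta in degrees, Eq. (4)."""
    rad = (6.918e-3 - 39.991e-2 * math.cos(gamma) + 70.257e-3 * math.sin(gamma)
           - 67.58e-4 * math.cos(2 * gamma) + 9.07e-4 * math.sin(2 * gamma)
           - 26.97e-4 * math.cos(3 * gamma) + 1.48e-3 * math.sin(3 * gamma))
    return math.degrees(rad)

def sunset_hour_angle_deg(lat_deg, decl_deg):
    """Sunset hour angle omega_s in degrees, Eq. (6)."""
    x = -math.tan(math.radians(lat_deg)) * math.tan(math.radians(decl_deg))
    x = max(-1.0, min(1.0, x))  # assumption: clamp so acos stays defined
    return math.degrees(math.acos(x))

def extraterrestrial(dn, lat_deg):
    """Return (H_o in MJ/m2 day, S_o in hours) for day number dn, Eqs. (2) and (7)."""
    gamma = day_angle(dn)
    f = eccentricity(gamma)
    decl_deg = declination_deg(gamma)
    ws_deg = sunset_hour_angle_deg(lat_deg, decl_deg)
    decl, lat, ws = (math.radians(v) for v in (decl_deg, lat_deg, ws_deg))
    bracket = (math.cos(decl) * math.cos(lat) * math.sin(ws)
               + (math.pi * ws_deg / 180.0) * math.sin(decl) * math.sin(lat))
    h_o = (1.0 / math.pi) * I_SC * f * bracket
    s_o = (2.0 / 15.0) * ws_deg
    return h_o, s_o

ADRAR_LAT = 27.0 + 53.0 / 60.0  # 27 deg 53 min N, from the text
h_o, s_o = extraterrestrial(15, ADRAR_LAT)  # 15 January represents the month
print(f"H_o = {h_o:.2f} MJ/m2 day, S_o = {s_o:.2f} h")
```

Calling `extraterrestrial` with the day number of the 15th of each month gives the twelve monthly values of \(H_o\) and \(S_o\) needed by the empirical models.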

Sunshine Based Empirical Models

In the sunshine-based models (Table 1), the monthly mean global fraction of the solar radiation is formulated as a function of the sunshine fraction (\({{S}_{f}}\)). Table 1 lists four such empirical models that are popular in the literature: quadratic, cubic, logarithmic and exponential.

Table 1. Summary of selected empirical models considering the sunshine duration and temperature
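As an illustration of how the coefficients of sunshine-based models might be calibrated, the sketch below fits the linear form of Eq. (1) and two transformed forms by ordinary least squares. The logarithmic and exponential expressions used here are commonly quoted forms and are assumptions, since the exact expressions of Table 1 are not reproduced in this text; the input values are hypothetical. Quadratic and cubic forms would need a polynomial least-squares fit, which is not shown.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_angstrom(sf, kt):
    """Linear form of Eq. (1): H_g/H_o = a + b*S_f."""
    return fit_line(sf, kt)

def fit_logarithmic(sf, kt):
    """Assumed form: H_g/H_o = a + b*ln(S_f)."""
    return fit_line([math.log(s) for s in sf], kt)

def fit_exponential(sf, kt):
    """Assumed form: H_g/H_o = a*exp(b*S_f), fitted on ln(H_g/H_o)."""
    ln_a, b = fit_line(sf, [math.log(k) for k in kt])
    return math.exp(ln_a), b

# Hypothetical monthly values for illustration only (not measured data).
sf = [0.70, 0.74, 0.78, 0.82, 0.86, 0.90]   # sunshine fraction S_f
kt = [0.25 + 0.50 * s for s in sf]          # H_g/H_o generated from a known line
a, b = fit_angstrom(sf, kt)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Fitting the exponential form on the logarithm of \(H_g/H_o\) minimizes the error in log space, which is a design shortcut; a nonlinear fit on the original scale can give slightly different coefficients.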

Temperature Based Empirical Models

Researchers have also developed numerous empirical equations for connecting the GSR with the daily air temperature data. Models based on the temperature assume that the difference between the maximum and minimum temperature is directly linked to the monthly mean global fraction of solar radiations at the ground level [20]. Models of solar radiation that utilize the maximum and minimum air temperatures are shown in Table 1.
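To make the idea concrete, the sketch below calibrates one well-known temperature-based form, the Hargreaves-type relation \(H_g/H_o = a\sqrt{T_{max} - T_{min}}\). Using this particular form is an assumption for illustration; it is not claimed to be one of the models of Table 1, and the temperatures and coefficient are hypothetical.

```python
import math

def fit_hargreaves(t_max, t_min, kt):
    """Least-squares coefficient a for H_g/H_o = a*sqrt(Tmax - Tmin), no intercept."""
    x = [math.sqrt(hi - lo) for hi, lo in zip(t_max, t_min)]
    return sum(xi * ki for xi, ki in zip(x, kt)) / sum(xi * xi for xi in x)

def estimate_gsr(a, t_max, t_min, h_o):
    """Estimated GSR in the units of h_o (e.g. MJ/m2 day)."""
    return a * math.sqrt(t_max - t_min) * h_o

# Hypothetical monthly mean temperatures in deg C (illustration only).
t_max = [20.0, 28.0, 36.0, 44.0]
t_min = [6.0, 13.0, 20.0, 27.0]
kt = [0.16 * math.sqrt(hi - lo) for hi, lo in zip(t_max, t_min)]  # synthetic H_g/H_o

a = fit_hargreaves(t_max, t_min, kt)
print(f"a = {a:.3f}")
print(f"H_g = {estimate_gsr(a, 36.0, 20.0, 30.0):.2f}")
```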

Empirical Models Based on Detailed Meteorological Variables

The temperature-based models use the maximum and minimum air temperatures to estimate the atmospheric transmissivity. These models assume that the maximum temperature decreases with reduced transmissivity, while the minimum temperature increases due to cloud emissivity. Conversely, clear skies increase the maximum temperature due to higher short-wave radiation and decrease the minimum temperature due to higher transmissivity. Several studies have demonstrated that adding other meteorological data to the temperature-based model enhances its accuracy. These models include one or more of the following meteorological variables: precipitation, dew point temperature (\(T\)), relative humidity (\(RH\)), cloud cover (C) and averaged saturation deficit. These correlations are shown in Table 2.

Table 2. Summary of empirical models considering detailed meteorological data
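Models with several meteorological predictors can be calibrated by multiple linear regression. The sketch below assumes, purely for illustration, a form \(H_g/H_o = c_0 + c_1 S_f + c_2 C\) (sunshine fraction plus cloud cover); this is a hypothetical example and not necessarily one of the correlations of Table 2. The normal equations are solved with a small Gaussian elimination so that no external library is needed, and the data are synthetic.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multilinear(predictors, y):
    """Least squares for y = c0 + c1*x1 + c2*x2 + ...; predictors is a list of columns."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic monthly values for illustration only (not measured data).
sf = [0.70, 0.78, 0.85, 0.90, 0.82, 0.74]     # sunshine fraction
cloud = [0.30, 0.20, 0.12, 0.10, 0.25, 0.22]  # cloud cover fraction
kt = [0.30 + 0.45 * s - 0.10 * c for s, c in zip(sf, cloud)]

coeffs = fit_multilinear([sf, cloud], kt)
print([round(c, 3) for c in coeffs])
```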

VALIDATE MODELS

As stated in Section 2, the GSR and other meteorological data of Adrar are divided into two subsets. The first subset is used for determining the coefficients of the empirical models, while the second subset is used for testing the accuracy of each model in estimating the monthly average daily GSR. The 15 models shown in Tables 1 and 2, as well as the model of Eq. (1), are considered. The validity of each model is assessed through six statistical indicators [33]: the mean bias error (MBE) in MJ/m2 day, the mean percentage error (MPE), the root mean square error (RMSE) in MJ/m2 day, the root mean square difference (RMSD), the correlation coefficient (R), and the Nash–Sutcliffe efficiency (NSE). The best model for the considered site is selected based on the superiority of its statistical indices in comparison with the other models. The equations of these statistical indicators are [34–36]:

$$\text{MBE} = \frac{1}{n}\sum\limits_{i = 1}^{n} \left( H_{i,m} - H_{i,c} \right);$$
(8)
$$\text{MPE} = \frac{1}{n}\sum\limits_{i = 1}^{n} \frac{H_{i,m} - H_{i,c}}{H_{i,m}} \times 100;$$
(9)
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} \left( H_{i,m} - H_{i,c} \right)^{2}};$$
(10)
$$\text{RMSD} = \frac{100}{\bar{H}_{m}}\sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} \left( H_{i,m} - H_{i,c} \right)^{2}};$$
(11)
$$R = \frac{\sum\limits_{i = 1}^{n} \left( H_{i,c} - \bar{H}_{c} \right)\left( H_{i,m} - \bar{H}_{m} \right)}{\sqrt{\sum\limits_{i = 1}^{n} \left( H_{i,c} - \bar{H}_{c} \right)^{2} \sum\limits_{i = 1}^{n} \left( H_{i,m} - \bar{H}_{m} \right)^{2}}};$$
(12)
$$\text{NSE} = 1 - \frac{\sum\limits_{i = 1}^{n} \left( H_{i,m} - H_{i,c} \right)^{2}}{\sum\limits_{i = 1}^{n} \left( H_{i,m} - \bar{H}_{m} \right)^{2}},$$
(13)

where n is the total number of available data points, \(H_{i,c}\) and \(H_{i,m}\) are the i-th calculated and measured values, respectively, and \(\bar{H}_{c}\) and \(\bar{H}_{m}\) are the means of the calculated and measured values.
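The six indicators of Eqs. (8)–(13) can be computed together, as in the following sketch. The function name and the sample values are hypothetical. The sign convention follows the equations (measured minus calculated), R is computed as the usual Pearson coefficient (sum of products of the deviations), and the means are taken over the whole measured and calculated series.

```python
import math

def indicators(measured, calculated):
    """Return MBE, MPE, RMSE, RMSD, R and NSE for paired measured/calculated values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_c = sum(calculated) / n
    diff = [m - c for m, c in zip(measured, calculated)]  # measured minus calculated
    sse = sum(d * d for d in diff)
    rmse = math.sqrt(sse / n)
    cov = sum((c - mean_c) * (m - mean_m) for m, c in zip(measured, calculated))
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return {
        "MBE": sum(diff) / n,                                         # Eq. (8)
        "MPE": 100.0 * sum(d / m for d, m in zip(diff, measured)) / n,  # Eq. (9), percent
        "RMSE": rmse,                                                 # Eq. (10)
        "RMSD": 100.0 * rmse / mean_m,                                # Eq. (11), percent
        "R": cov / math.sqrt(var_c * var_m),                          # Eq. (12)
        "NSE": 1.0 - sse / var_m,                                     # Eq. (13)
    }

# Hypothetical values in MJ/m2 day, for illustration only.
stats = indicators([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With this sign convention a negative MBE means that the model overestimates on average.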

RESULTS AND DISCUSSION

The long-term average annual variations of various meteorological parameters are illustrated in Fig. 1 for the investigated location. The data shown include air temperature, relative humidity, sunshine fraction, and cloud cover. The sunshine fraction is used for a rough description of fair weather based on the relative sunshine hour classification scheme reported in [37]. Through the application of statistical regression techniques, the first dataset is used for estimating the coefficients of each of the considered models (see Eq. (1), Tables 1 and 2). The results are shown in Table 3.

Fig. 1. Long-term average annual variation of various meteorological parameters for the candidate location.

Table 3. Regression analysis and statistical tests of various models for estimating the GSR on a horizontal surface at the Adrar site. The shaded row marks the best model found. Italic numbers mark the worst indicators, while bold numbers mark the best indicators

The correlation coefficient and the RMSD have been given high attention in [5, 38] for selecting the best empirical models for GSR estimation. This is because these indicators respectively describe the strength of the linear dependence between the measured and estimated values and the deviation between them. As previously stated, six statistical indicators are used in this paper for assessing the accuracy of the empirical models in estimating the GSR. The best model is then selected based on its overall statistical performance in comparison with the other models. Using several statistical indicators enhances the confidence in the results. In addition, the statistical behavior of each model can be easily read from Table 3. The results show that all the models perform well, with a minimum correlation coefficient of 0.971 (model M6) and a maximum RMSD of 6.083% (model M4). The worst values of the MBE, MPE, RMSE, and NSE are 0.463 (model M4), 1.390 (model M13), 1.345 (model M4), and 0.914 (model M4), respectively.

The best values of the statistical indicators are shown in Table 3 in bold, while the worst values are shown in italics. Table 3 shows that the best model is M11, while the worst model is M4. A comparison between the statistical performances of these two models is shown in Fig. 2, which confirms the superiority of model M11 over model M4. Therefore, model M11 is selected for estimating the GSR over the Adrar site. The accuracy of the selected model is further evaluated using the evaluation subset of measured data, as explained in Section 2.

Fig. 2. Comparison between the statistical performances of models M4 and M11.

Figure 3 compares the groups of models, i.e. the sunshine duration, temperature, and detailed meteorological data groups. To keep the figure readable, only the correlation coefficient and the RMSD are displayed. Figure 3 shows that the detailed meteorological group provides the highest correlation coefficient and the lowest RMSD. Therefore, the detailed meteorological data-based models perform better than the sunshine duration and temperature-based models. The maximum correlation of the sunshine duration group equals the highest correlation of the temperature group; however, the minimum RMSD of the sunshine duration group is lower than that of the temperature group. Therefore, the sunshine duration group performs better than the temperature group.

Fig. 3. Comparison of the statistical performance of various model groups. Correlation coefficient (a), RMSD (b).

As described in Section 2, the GSR data for 2014–2016 are used for evaluating the accuracy of the selected models in estimating the GSR on a horizontal surface over Adrar. Based on the results of Table 3 and the previous analysis, model M11 provides the best statistical performance in fitting the dataset of 2009–2013. Careful inspection of the other considered models shows that the performances of models M2, M9, M10, and M12 are very close to that of model M11. This is illustrated in Fig. 4. Figure 4 shows that the selected cloud cover model (model M9) and the detailed meteorological model (model M12) are highly comparable in statistical performance. In addition, these models are closer to the best model (i.e. model M11) than the selected temperature-based model (model 11). Therefore, adding the cloud cover to the traditional Angström formula (Eq. (1)) improves its statistical performance. On the other hand, models that exclude the maximum and minimum temperatures suffer from degraded statistical performance.

Fig. 4. Statistical indicators of the best selected models for GSR estimation over Adrar.

The GSR for the years 2009–2016 is predicted using the best selected models and compared with the measured GSR (\({{H}_{m}}\)) in the same period. The results show that model M11 provides the best match with the measured data in comparison with the other selected models. Figure 5 shows that model M11 tends to slightly overestimate the GSR. As shown in Fig. 5, the estimated and the measured GSR values can be approximated by a linear relation of the form

$$H_{11} \cong 0.938 H_m + 1.119.$$
(29)

Fig. 5. GSR predicted by the best model in comparison with the observed GSR values over the Adrar region for the period 2009–2016.
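
As a rough reading of Eq. (29), taken at face value and assuming that both quantities are expressed in MJ/m2 day (the unit used for the error indicators), the fitted line meets the 1:1 line where the estimate equals the measurement:

$$H_{11} = H_m \;\Rightarrow\; H_m = \frac{1.119}{1 - 0.938} = \frac{1.119}{0.062} \approx 18.05.$$

Under this linear approximation alone, the estimate lies above the measured value for \(H_m\) below roughly 18 MJ/m2 day and below it for larger values; this follows only from the fitted coefficients.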

CONCLUSIONS

This paper considers the problem of GSR estimation through empirical modeling. Various categories of empirical models are considered and analyzed through several statistical indicators. Based on their input parameters, popular as well as proposed empirical models are classified into three groups: sunshine fraction, temperature, and detailed meteorological data models.

Two long-term averaged radiometric and meteorological datasets are used in the presented analysis. The data were collected at the Adrar observing station, situated in the Algerian Big South region. The first dataset, which covers the period from 2009 to 2013, is used for estimating the coefficients of the 16 considered empirical models (4 sunshine fraction models, 4 temperature models, 7 detailed meteorological data-based models and the traditional Angström formula).

The first dataset is also used in performing statistical tests on each empirical model, in order to determine the statistical performance of the models and to select the best candidates for GSR estimation over Adrar. Six statistical indicators are considered to enhance the confidence in the results and to give a deeper understanding of the statistical performance of the empirical models. A detailed inspection of the statistical performance of the model categories can help in selecting appropriate GSR models for the selected site.

The results obtained in this study indicate that all the models provide good statistical performance. The selected sunshine fraction and detailed meteorological models show comparable statistical performance; however, model M11 shows the best statistical performance. It is important to note that the best model found belongs to the category of detailed meteorological data-based models. In addition, the selected temperature-based models perform worse than both the sunshine fraction and the detailed meteorological data-based models. The selected detailed meteorological models show the best statistical performance of all the selected models. This is attributed to accounting for the impact of the cloud cover in the Angström formula when estimating the GSR.

The best selected model is evaluated using the second dataset, which covers measurements from 2014 to 2016. The results show that the selected model can accurately estimate the GSR over Adrar; however, this model slightly overestimates the GSR values. The model can be applied to estimate the monthly average global solar radiation on a horizontal surface for any site with similar climatic conditions where radiometric measurements are absent but measurements of the meteorological parameters are available.