1 Introduction

Renewable energy (RE) is increasingly recognized as one of the most promising resources for changing a country’s power profile. RE is produced from several sources, such as solar, wind, hydraulic and thermal energy. Many countries generate electricity from RE to reduce environmental damage significantly; in fact, the greenhouse effect has already encouraged the search for new alternative energy systems [1]. Since the strong growth of these resources is expected to influence government planning, it is also important to identify the opportunities, difficulties and prospects of RE development. In Morocco, the government has implemented various RE projects to generate 42% of electricity by 2020, focusing primarily on solar and wind energy technologies [2]. The main difficulty in the development of these technologies is to employ solar energy as an electricity source in photovoltaic generators (PVG), thermal solar energy (TSE) and solar concentration technologies (CPV). This difficulty has encouraged scientists and academics to identify effective mechanisms for forecasting solar radiation, since the power production of a solar photovoltaic system depends directly on global solar radiation [3]. The precision of solar radiation models is closely linked to the quality of modelling the power production of installed solar systems, and it influences the management and scheduling of future sustainable energy installations [4]. With an appropriate model for forecasting solar radiation, it is feasible to control the power provided by the photovoltaic system. The assessment, evaluation and forecasting of global solar radiation are therefore necessary, given their importance for the performance of PVG, TSE and CPV in electrical energy production and for their integration into the local electrical grid.
Forecasting also helps to ensure that the power generated from RE sources is injected into the electrical grid without perturbation. In this regard, the forecasting of global solar radiation will have an important impact on the development and management of future energy systems, and it plays a significant role in controlling the performance of the electricity grid [5]. Different methods to forecast global solar radiation have been developed [6]. They depend on both the available data and the specific forecasting horizon. The forecasting categories are summarized in Table 1 [7]. Several scientists recommend three categories of forecast horizon: short-term, medium-term and long-term [8]. Others have proposed a fourth category [9], based on the criteria of the decision-making phase for smart grids or microgrids [10, 11] and referred to as the very short-term or ultra-short-term prediction horizon. Nevertheless, no commonly accepted standard has been set.

Table 1 Time scale category of forecasting

Many techniques for forecasting daily global solar radiation (DGSR) have been documented in the scientific literature. Forecasting methodologies are commonly organized into three major categories, namely traditional models [15], machine learning methods and statistical regression methods [16]. Combined methodologies have been suggested to improve on the performance of each single model. The traditional models, also called statistical or mathematical models, can be classified into dynamic [17] and empirical [18] models. Dynamic models have been used to forecast global solar radiation over long-term horizons. Among empirical models, those that use accessible weather data as inputs are usually preferred, thanks to their low computational cost and modest data requirements [15]. They are based on the association between global solar irradiation and weather and/or climatic parameters, such as sunshine duration and air temperature. Hargreaves et al. [19] presented the first temperature-based empirical model by assuming that global solar radiation is related to the temperature difference ΔT. Bristow et al. [20] then proposed an exponential relationship between global solar radiation and ΔT (the Bristow-Campbell model), which could describe 70-90% of global solar variability in America. Later, Hassan et al. [21] compared 3 newly designed models and 17 existing DGSR models in Egypt, and reported that the temperature-based model is the most reliable, with greater forecasting precision than traditional sunshine-based models. The models were tested and evaluated on an observed dataset spanning 20 years. The results show that the new model is particularly relevant for weather forecasting techniques. Fan et al. [15] reviewed 14 existing temperature-based models and introduced 6 new ones in China.
The newly developed polynomial model gives reliable DGSR forecasts and can be implemented in environments in which only air temperatures are available. This model also characterizes the mathematical relationship between global solar radiation and the associated environmental parameters. In general, the above methodology does not require historical data; instead, it depends heavily on detailed descriptions of the station placement and climate conditions. The input measurements are used to identify the dependence of DGSR on meteorological conditions. These models can be simple, if based on sunshine duration only, or more complex, if additional parameters such as temperature, wind speed, dust and relative humidity are included. Therefore, the traditional approaches are not suitable for forecasting global solar radiation at every temporal and geographical scale, particularly in the short term.

Although empirical models are suitable for forecasting global solar radiation under different conditions, their results have been less accurate than those of machine learning models [22]. Recently, a wide variety of machine learning models have been constructed to forecast global solar radiation. Like all artificial intelligence (AI) strategies, machine learning (ML) does not require any prior knowledge of the system, and it can deal with problems that cannot be described by explicit algorithms [22]; it is nowadays among the most effective methods for time series forecasting. The most widely adopted ML algorithm in DGSR forecasting is the artificial neural network (ANN) [22]. Notton et al. [23] implemented ANNs to forecast both the global horizontal radiation (GHR) and the direct normal radiation (DNR) over horizons from 1 h (h + 1) to 6 h (h + 6). From an in-depth review of ANN applications in the solar irradiation field [6], the suitability of the methodology clearly emerges. The conclusions of that assessment underline the ability of the hybrid ANN method.

The third category is based on statistical regression or probabilistic techniques focused on follow-up measurements or determining data, and is generally used for short-term and very short-term forecasting [24].

These methods rely more heavily on historical data, and they are used to infer the underlying rule of the series by analysing past information. They are based on historical records associated with meteorological information, on the assumption that patterns in past data will recur in the future [25]. The principal drawback of statistical methods is that they rest on the hypothesis of linear stationary structures, which is not suitable for nonlinear solar radiation. Consequently, the forecast performance of a statistical model depends on the time span and reliability of the historical data. These methods are also known as black-box methods. Some of the most widely adopted time series models are the autoregressive (AR), moving average (MA), autoregressive moving average (ARMA) [26], autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), autoregressive moving average with exogenous inputs (ARMAX) and autoregressive integrated moving average with exogenous inputs (ARIMAX) models [27]. Bacher et al. [28] presented an AR model with exogenous inputs to forecast the hourly value of solar photovoltaic power. They noticed that the AR model performs well when the prediction horizon is up to two hours. Benmouiza et al. [26] used a combination of ARMA and nonlinear autoregressive (NAR) models to forecast global horizontal solar radiation several hours ahead. The purpose of the combination is to improve on the efficiency of the single methods. In fact, ARMA requires a stationary time series, whereas most real time series are not stationary. In contrast, the ARIMA approach does not take the behaviour of the underlying mechanism into account and incorporates non-stationary elements of the time series [29]. The ARMAX concept does not depend on the forecast of solar radiance, but it is considered as the traditional model of ARIMA [29]. ARIMA models can explain much of the complexity of the data in a given application.
The effectiveness of the ARIMA model is attributed to its computational characteristics as well as the well-known Box-Jenkins methods in the model construction process [30]. In fact, a collection of analytical smoothing methods can be related to ARIMA models [31]. While ARIMA models are very robust because they can reflect many different patterns of time series, their key weakness is the pre-assumed linear structure of the systems. Forecasting performance may be increased if several independent models are adapted to the same data rather than using a single model. Several hybrid methods have also been suggested in the research, incorporating the benefits of two or more different models.

Numerous researchers have combined different single methods to enhance the performance of the forecasts and to overcome problems such as the nonlinearity and complexity of weather data. The hybrid technique has now become the most widely used strategy in forecasting because of its ability to handle complex and nonlinear input parameters. The ARIMA–ANN combination was introduced in [32] to obtain more accurate forecasts than individual models. The same method was used by Babazadeh et al. [33] to forecast gasoline consumption. This technique has been applied to several fields, such as water quality prediction [34], electricity price [35], stock index returns [36] and global solar radiation [37].

Against this background, this paper aims to forecast the global solar radiation for three different regions in Morocco using the hybrid ARIMA–ANN model. The DGSR experimental data are collected from three different cities, namely Er-rachidia, Ifrane and Tanger. The three cities belong to different climate zones, from the hot desert climate of Er-rachidia, through the cold climate of Ifrane, to the Mediterranean climate of Tanger. The ARIMA–ANN model is chosen for its strong capacity to model nonlinear and complex structures through time series analysis. The results obtained show a close match with the experimental data, with low error. The results are also compared with those of the single ARIMA and ANN models and with the experimental data. To evaluate the results, specific error parameters are determined and examined: the mean absolute percentage error (MAPE), mean bias error (MBE) and percentage MBE (PMBE), root mean square error (RMSE) and percentage RMSE (PRMSE), standard deviation (SD) and percentage SD (PSd), normalized root mean square error (NRMSE) and T-statistic (TS). The overall performance is evaluated by the coefficient of determination (R2). A further analysis is based on the linear regression coefficients.

This paper is structured as follows: Sect. 2 describes the materials and methods, namely the experimental data, the ARIMA, ANN and hybrid ARIMA–ANN models, and the evaluation of forecast performance. Section 3 presents the empirical results and discussion. Concluding remarks are reported in Sect. 4.

2 Materials and methods

As part of its electricity strategy, Morocco gives priority to clean energies and sustainable development. Morocco has a very high solar potential, more than 2600 kWh/m2/year, and is connected to the Spanish network via 400 kV, 700 MW power lines. The Moroccan government has installed various renewable energy projects to reach the target of 42% of electricity from sustainable energy by 2020 and 52% by 2030 [2]. The most important programme aims to generate 2 gigawatts from five major projects installed in Ouarzazate, Ain Bni Methar, Boujdour, Foum Al Oued and Sebkhat Tah, using photovoltaic and concentrated solar power. Three case-study sites are chosen: Er-rachidia, Ifrane and Tanger. The selected locations were evaluated according to the climate zoning of Morocco, which a government agency specializing in sustainable energy and efficiency established from a variety of research analyses of the Moroccan environment [38].

2.1 Experimental data

The three selected cities, Er-rachidia, Ifrane and Tanger, are characterized by different climate conditions. The first, Er-rachidia (latitude = 31.930°, longitude = − 4.424°, altitude = 1080 m), is characterised by a hot desert climate, dry and mostly clear throughout the year. Temperatures generally vary from 3 °C to 38 °C. The summers in Ifrane (latitude = 33.500°, longitude = − 5.167°, altitude = 1665 m) are short, warm and arid, and the sky is mostly clear throughout the year. The temperature typically varies from − 3 °C to 28 °C. Finally, in the third city, Tanger (latitude = 35.580°, longitude = − 5.900°, altitude = 21 m), the summers are warm, humid and arid, the sky is mostly clear, and the temperature typically varies from 9 °C to 29 °C.

The data were measured every 10 min at the three locations by Kipp and Zonen CM11 pyranometers (Fig. 1) [39]. This pyranometer is characterized by excellent linearity, fast response time and low tilt error.

Fig. 1 Kipp and Zonen CM11 pyranometer [39]

The historical measured data covered the period from January 2013 to December 2015. An example of the DGSR on a horizontal surface is illustrated in Fig. 2.

Fig. 2 Example of DGSR at the three selected locations (Er-rachidia, Ifrane and Tanger)

Figure 2 shows a regular annual fluctuation in DGSR. From this figure, it emerges that the solar radiation time series is non-stationary, being strongly affected by an annual phenomenon, which can be clearly distinguished. In fact, while the average DGSR is generally the same from year to year, the peak radiation varies, except between consecutive days.

2.2 ARIMA model

The ARIMA model is widely used in several fields, such as econometrics and engineering [40]. It expresses the value of the time series as a linear combination of its historical measurements [36]. The common form ARIMA (p, d, q) combines three components: p is the order of the autoregressive (AR) part; d is the degree of differencing needed to make the data stationary (I); and q is the order of the moving average (MA) part. The general form of the ARIMA model is presented in Eq. 1 [30].

$$ \left\{ \begin{aligned} &Y_{t} = \left( {1 - B} \right)^{d} \left( {1 - B^{s} } \right)^{D} X_{t} - \mu \\ &\phi \left( B \right)\varPhi \left( {B^{s} } \right)Y_{t} = \theta \left( B \right)\varTheta \left( {B^{s} } \right)Z_{t} ,\quad Z_{t} \sim N\left( {0,\sigma^{2} } \right) \end{aligned} \right. $$
(1)

with \( \phi \left( z \right) = 1 - \sum\nolimits_{i = 1}^{p} {\phi_{i} z^{i} } \), \( \varPhi \left( z \right) = 1 - \sum\nolimits_{i = 1}^{P} {\varPhi_{i} z^{i} } \), \( \theta \left( z \right) = 1 + \sum\nolimits_{i = 1}^{q} {\theta_{i} z^{i} } \) and \( \varTheta \left( z \right) = 1 + \sum\nolimits_{i = 1}^{Q} {\varTheta_{i} z^{i} } \), where \( \phi ,\theta ,\varPhi ,\varTheta \) denote the polynomial coefficients, B is the backshift operator, D is the order of seasonal differencing, s is the seasonal period, and P and Q are the orders of the seasonal autoregressive and seasonal moving average parts of the model.
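The differencing operators \( (1 - B)^{d} \) and \( (1 - B^{s})^{D} \) that appear in Eq. 1 can be illustrated with a short sketch. The function name `difference` and its arguments are illustrative assumptions, not part of the original study; the code only shows how repeated lagged differences are formed.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order to a sequence of numbers.

    lag=1 gives ordinary differencing (1 - B)^d with d = order;
    lag=s gives seasonal differencing (1 - B^s)^D with D = order.
    """
    out = list(series)
    for _ in range(order):
        # Each pass subtracts the value observed `lag` steps earlier.
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# A quadratic trend becomes constant after two ordinary differences.
print(difference([1, 4, 9, 16, 25], lag=1, order=2))
```

Each pass shortens the series by `lag` values, so long seasonal periods need correspondingly long records.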

Time series forecasting with ARIMA models can be carried out in four steps: identification, estimation, diagnostic checking and forecasting [41, 42]. In this paper, the four steps are applied in the following order. First, identification selects the best-fitting model from the autocorrelation function (ACF) and partial autocorrelation function (PACF), which reveal possible persistence patterns in the DGSR data. The behaviour of the ACF and PACF allows identifying the ARIMA model that explains the corresponding stationary time series. The second step estimates the model parameters using one of the available estimation methods. The third step is diagnostic: it examines the residuals of the chosen model to check that the model is adequate. The last step is predictive: it produces forecasts and calculates the random error. In addition, the ARIMA model is checked with the Ljung-Box Q test [43], whose null hypothesis is that there is no residual autocorrelation up to lag h. The test statistic Q is defined as:

$$ Q = n\left( {n + 2} \right)\sum\limits_{k = 1}^{h} {\frac{{\rho_{k}^{2} }}{n - k}} $$
(2)

where n is the sample size, \( \rho_{k} \) is the sample autocorrelation at lag k, and h is the number of lags being tested.
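As an illustration of Eq. 2, the following sketch computes the sample autocorrelation and the Q statistic with the standard library only. The function names and the toy residual sequence are assumptions made for this example; they are not taken from the study.

```python
def sample_autocorrelation(x, k):
    """Sample autocorrelation of x at lag k (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        sample_autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, h + 1)
    )


# Toy residuals with strong alternation, hence large lag-1 autocorrelation.
q = ljung_box_q([1, -1, 1, -1, 1, -1, 1, -1], h=1)
print(q)  # 8.75
```

Under the null hypothesis, Q is compared with a chi-square quantile; for this toy sequence Q exceeds the 5% critical value for one degree of freedom (about 3.84), so the residuals would not be treated as white noise.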

2.3 Artificial neural network

The ANN is well recognized as an effective AI computing tool that has been widely used in many disciplines, such as telecommunications, materials, medicine and neurology [44, 45]. An ANN essentially consists of an input layer, one or more hidden layers that learn from the data, and an output layer. ANNs have been widely used to forecast photovoltaic system power, both alone [46] and combined with statistical regression [47], with better results than other techniques [22]. In this context, a feed-forward network (FFN) based on the back-propagation learning (BPL) technique was selected to forecast the DGSR at the case-study cities, as illustrated in Fig. 3.

Fig. 3 A comprehensive form of the ANN model

The response variable Yk is represented as

$$ Y_{k} = \phi_{0} \left( {\sum\limits_{j = 1}^{n} {w_{kj} x_{j} } + \theta_{k} } \right) $$
(3)

where \( \phi_{0} \) is the activation function of the hidden layer, \( Y_{k} \) is the output of the kth hidden neuron, \( \theta_{k} \) is the bias of that neuron, and \( w_{kj} \) is the synaptic weight from input \( x_{j} \) to hidden neuron k.
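A minimal sketch of the neuron response in Eq. 3 is given below, reading the equation as an activation function applied to the weighted sum of the inputs plus a bias. The hyperbolic tangent activation, the linear output neuron and all function names are assumptions made for illustration; the activation actually used in the study is not stated here.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Y_k = phi_0(sum_j w_kj * x_j + theta_k) for one hidden neuron."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def feed_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer followed by a single linear output neuron (assumed)."""
    hidden = [
        neuron_output(inputs, w, b)
        for w, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# A 2 x 2 x 1 network: two inputs, two hidden neurons, one output.
y = feed_forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.25], [0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, 1.0],
    output_bias=0.3,
)
print(y)
```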

To evaluate the forecast performance of the implemented models, the data are split into two samples: the first is a learning dataset used specifically for model creation, containing all inputs and the corresponding outputs; the second is a testing dataset used to validate the model.

2.4 Proposed hybrid ARIMA–ANN model

ANN can be applied to nonlinear systems, ARIMA to linear ones. Combining the two models can overcome several problems of nonlinearity and of stationary or non-stationary data [48]. The ARIMA–ANN model combines a linear component, represented by ARIMA, and a nonlinear component, represented by ANN. The ARIMA model is fitted to the time series, and its error sequence is assumed to follow a nonlinear function that is modelled by the ANN. The forecasts provided by the ARIMA and ANN models are then combined to obtain the final estimate. In terms of efficiency and performance, this hybrid model is better than the individual ARIMA and ANN models, as illustrated in [32]. According to this paradigm, a time series is presumed to be the sum of a linear and a nonlinear component:

$$ Y_{t} = L_{t} + N_{t} $$
(4)

where Lt represents the linear component and Nt represents the nonlinear component.

The residuals are necessary for identifying the adequacy of linear models and are modelled by ANNs and given as:

$$ e_{t} = f\left( {e_{t - 1} ,e_{t - 2} , \cdots \cdots e_{t - n} } \right) + \varepsilon_{t} $$
(5)

where \( f \) is the nonlinear function determined by ANN and \( \varepsilon_{t} \) is the random error.

The combined forecast can then be written as:

$$ \hat{y}_{t} = \hat{L}_{t} + \hat{N}_{t} $$
(6)

In summary, the hybrid method consists of two steps. In the first step, the ARIMA model is used to capture the linear part of the problem. In the second step, an ANN is built to model the residuals of the ARIMA model. Since the ARIMA model cannot capture the nonlinear nature of the dataset, the residuals of the linear model carry the information on nonlinearity. The outputs of the ANN can be used as forecasts of the error terms of the ARIMA model. The combined model thus draws on the specific strengths of both the ARIMA and the ANN model to capture different patterns. It can therefore be useful to model the linear and nonlinear components separately with different models and then combine the forecasts to increase the overall modelling and forecasting accuracy. The steps of the proposed hybrid ARIMA–ANN model are presented in Fig. 4.

Fig. 4 Flowchart of the hybrid ARIMA–ANN model
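The two-step procedure can be sketched as follows. This is a deliberately simplified illustration, not the implementation used in the study: a least-squares AR(1) fit stands in for the full ARIMA model, and a network with a single tanh hidden neuron trained by plain gradient descent stands in for the multilayer perceptron. All function names, the learning rate and the number of epochs are assumptions.

```python
import math


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} (stand-in for ARIMA)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def train_residual_net(errors, lags=1, epochs=200, lr=0.01):
    """Fit e_t = f(e_{t-1}, ..., e_{t-lags}) with one tanh hidden neuron."""
    w, b, v, c = [0.1] * lags, 0.0, 0.1, 0.0
    samples = [(errors[t - lags:t], errors[t]) for t in range(lags, len(errors))]
    for _ in range(epochs):
        for x, target in samples:
            h = math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = (v * h + c) - target
            grad_h = err * v * (1.0 - h * h)
            # Gradient-descent updates of output and hidden parameters.
            v -= lr * err * h
            c -= lr * err
            w = [wi - lr * grad_h * xi for wi, xi in zip(w, x)]
            b -= lr * grad_h

    def predict(x):
        return v * math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b) + c

    return predict


def hybrid_forecast(series, lags=1):
    """One-step-ahead forecast: linear part plus modelled residual."""
    c, phi = fit_ar1(series)
    fitted = [c + phi * y for y in series[:-1]]
    errors = [y - f for y, f in zip(series[1:], fitted)]  # residuals e_t
    net = train_residual_net(errors, lags)
    linear_part = c + phi * series[-1]      # estimate of L_t
    nonlinear_part = net(errors[-lags:])    # estimate of N_t
    return linear_part, nonlinear_part, linear_part + nonlinear_part


series = [math.sin(0.3 * t) + 0.1 * t for t in range(40)]
print(hybrid_forecast(series))
```

The structure mirrors Eqs. 4 to 6: the residuals of the linear model feed the nonlinear model, and the final forecast is the sum of the two partial forecasts.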

Statistical metrics are used to evaluate the performance of the ARIMA–ANN model; these metrics are applied in a variety of disciplines to determine the quality of forecast models [46, 47, 49, 50] (“Methodology Appendix”). In general, forecast models are assessed through statistical metrics that check their accuracy, performance and efficiency. Lower values of MAPE, MBE, PMBE, RMSE, PRMSE, Sd, PSd and NRMSE indicate more accurate forecasts. Lower values of TS indicate better model performance. The Sd represents the ratio between measured and computed values: Sd = 0 means the absence of a linear relationship, while Sd = 1 shows the ideal linear relationship between measured and computed values. Finally, the coefficient of determination R2 should be as close to 1 as possible.
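For illustration, several of these indicators can be computed as in the sketch below. The formulas are the commonly used textbook definitions and are assumptions here, since the definitions of the appendix are not reproduced in this section: in particular the sign convention of MBE (forecast minus measured), NRMSE as RMSE divided by the mean of the measurements, TS as sqrt((n − 1) MBE² / (RMSE² − MBE²)) and R2 as 1 − SSE/SST.

```python
import math


def forecast_metrics(measured, forecast):
    """Common error indicators between measured and forecasted values."""
    n = len(measured)
    errors = [f - m for m, f in zip(measured, forecast)]
    mean_measured = sum(measured) / n
    mbe = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE requires non-zero measured values.
    mape = 100.0 / n * sum(abs(e / m) for e, m in zip(errors, measured))
    spread = rmse ** 2 - mbe ** 2
    ts = math.sqrt((n - 1) * mbe ** 2 / spread) if spread > 0 else float("inf")
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return {
        "MBE": mbe,
        "PMBE": 100.0 * mbe / mean_measured,
        "RMSE": rmse,
        "PRMSE": 100.0 * rmse / mean_measured,
        "NRMSE": rmse / mean_measured,
        "MAPE": mape,
        "TS": ts,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Toy values in Wh/m2, chosen only to make the arithmetic easy to follow.
print(forecast_metrics([100, 200, 300, 400], [110, 190, 310, 390]))
```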

3 Results and discussion

In this section, the ARIMA–ANN forecasting method is applied to the three selected cities over a 3-year interval, and the results are analysed and discussed. Applying the ARIMA–ANN model requires a stationary time series. As is well known, global solar radiation presents annual and daily variations and/or oscillations. These periodicities make the time series non-stationary. Many authors [51,52,53] use an index, obtained by dividing the measured radiation by a reference radiation, to make the global radiation time series stationary. In this paper, we use a variant of this index that considers only the radiation outside the atmosphere (top of atmosphere, TOA), which gives the clearness index (\( K_{t} \)). It is defined as the ratio of the global solar radiation at the Earth’s surface to the corresponding extraterrestrial solar radiation (TOA), as described in the following equations [54]:

$$ K_{t} = \frac{{\text{DGSR}}}{{\text{TOA}}} $$
(7)
$$ {\text{TOA}} = \int_{\text{day}} {I_{0} E_{0} } \sin \left( h \right)\,{\text{d}}t $$
(8)

where I0 is the solar constant, h is the solar elevation and E0 is the Earth–Sun distance correction.
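Equations 7 and 8 can be illustrated with the closed form of the daily integral for a horizontal surface. The sketch below is an assumption-laden example rather than the procedure of the study: it takes I0 = 1367 W/m2, the approximation E0 = 1 + 0.033 cos(2πn/365) and Cooper's formula for the solar declination, none of which are specified in this section. Function names are illustrative.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m2, assumed value of I0


def daily_toa(latitude_deg, day_of_year):
    """Daily extraterrestrial irradiation on a horizontal plane, in Wh/m2."""
    phi = math.radians(latitude_deg)
    # Assumed Earth-Sun distance correction E0.
    e0 = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    # Assumed solar declination (Cooper's approximation), in radians.
    delta = math.radians(23.45) * math.sin(
        2.0 * math.pi * (284 + day_of_year) / 365.0
    )
    # Sunset hour angle, clamped so that polar day or night does not break acos.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(cos_ws)
    # Closed form of the integral of I0 * E0 * sin(h) from sunrise to sunset.
    return (24.0 / math.pi) * SOLAR_CONSTANT * e0 * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta)
    )


def clearness_index(dgsr_wh_m2, latitude_deg, day_of_year):
    """K_t = DGSR / TOA, both expressed in Wh/m2 per day."""
    return dgsr_wh_m2 / daily_toa(latitude_deg, day_of_year)


# Hypothetical DGSR value at the latitude of Er-rachidia on day 172.
print(clearness_index(5000.0, 31.930, 172))
```

The DGSR value used in the call is hypothetical and serves only to show the units; with measured data, the function would be applied to each day of the series.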

This technique does not make the global solar radiation completely stationary. To tackle this problem, we complemented our method with the coefficient of variation \( Cv_{x} \):

$$ Cv_{x} = \frac{\sigma }{\mu } $$
(9)

where \( \sigma \) is standard deviation and \( \mu \) the average.
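A short sketch of Eq. 9 follows. Using the population standard deviation (rather than the sample one) is an assumption, and the function name is illustrative.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Cv = sigma / mu, with sigma taken as the population standard deviation."""
    return pstdev(values) / mean(values)


# Toy series with mean 5 and standard deviation 2.
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # 0.4
```

A lower value after a transformation indicates that the series fluctuates less around its mean, which is how the coefficient is used as a rough stationarity indicator here.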

Figure 5 presents the results of our overall stationarization methodology and its impact on the time series. Before any treatment (step 1), the coefficient of variation of the time series is high (Cvx ~ 0.57), while after steps 2 and 3 it is reduced to about 0.34 (Cvx ~ 0.34). This coefficient and the shape of the curves suggest that the series is closer to stationary at the end of steps 2 and 3.

Fig. 5 Example of the daily behaviour of the clearness index at the Tanger site

The corresponding ACF and PACF are shown in Fig. 6 for Tanger (panels a-b), Er-rachidia (panels c-d) and Ifrane (panels e–f), respectively.

Fig. 6 ACF and PACF of the clearness index time series: a, b Tanger, c, d Er-rachidia and e, f Ifrane

After a few lags, the ACF falls within the 95% confidence limits, suggesting a relatively stationary time series. For all the analysed cities, the PACF has a major spike at lag 1, suggesting that an AR (2) or a higher-order autoregressive model may be sufficient.

The Akaike information criterion (AIC) is the most commonly used goodness-of-fit criterion based on information theory. It is given in Eq. (10):

$$ {\text{AIC}} = - 2\ln (L) + 2(p + q + K + 1) $$
(10)

where p is the number of autoregressive parameters, q is the number of moving average parameters, L is the likelihood and K is the number of model parameters. The computed AIC values are shown in Fig. 7.

Fig. 7 AIC criterion for Tanger, Er-rachidia and Ifrane
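The selection rule based on Eq. (10) can be sketched as follows. The log-likelihood values in the example are hypothetical placeholders, not results of this study, and the function names are assumptions; in practice the log-likelihoods come from fitting each candidate model.

```python
def aic(log_likelihood, p, q, k):
    """AIC = -2 ln(L) + 2 (p + q + K + 1), following Eq. (10)."""
    return -2.0 * log_likelihood + 2.0 * (p + q + k + 1)


def select_order(candidates, k=1):
    """Return the (p, d, q) order with the lowest AIC.

    `candidates` maps each (p, d, q) tuple to its maximized log-likelihood.
    """
    return min(
        candidates,
        key=lambda order: aic(candidates[order], order[0], order[2], k),
    )


# Hypothetical log-likelihoods for two candidate models.
candidates = {(1, 1, 1): -105.0, (2, 1, 1): -100.0}
print(select_order(candidates))  # (2, 1, 1)
```

The penalty term grows with the number of parameters, so a higher-order model is retained only if its likelihood gain outweighs the added complexity.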

The AIC values (Fig. 7) for Tanger reach their minimum when the order is equal to two. Thus, the appropriate model for the Tanger site is ARIMA (2,1,1); similarly, ARIMA (2,1,1) is the best model for the Ifrane site. The AIC values for the Er-rachidia site reach their minimum when the order is equal to one, so the appropriate model for Er-rachidia is ARIMA (1,1,1). The estimated parameters are reported in Table 2.

Table 2 ARIMA (2,1,1), ARIMA (1,1,1), ARIMA (2,1,1) models parameters

Once the model has been identified and its parameters estimated, a diagnostic assessment of the residuals is applied to check whether the model fits the data series well. In this examination, we investigate, from the ACF and PACF graphs, whether the model residuals are IID (independent and identically distributed). Figure 8 shows the ACF and PACF of the residuals of the ARIMA (2,1,1), ARIMA (1,1,1) and ARIMA (2,1,1) models. Most of the spikes lie within the 95% confidence interval (CI).

Fig. 8 ACF and PACF of the ARIMA (2,1,1), ARIMA (1,1,1) and ARIMA (2,1,1) residuals for a, b Tanger, c, d Er-rachidia and e, f Ifrane

In addition, the residuals were evaluated with the Ljung-Box test, and the results are listed in Table 3. All critical chi-square values \( \chi^{2} \) are larger than the Q statistics, and all p values for the tested lags are greater than 0.05. It follows that the residuals are uncorrelated and represent white noise.

Table 3 Ljung-Box test of ARIMA (2, 1, 1), ARIMA (1, 1, 1) and ARIMA (2, 1, 1) models

The correlation between experimental and forecasted data using the ARIMA models is presented in Fig. 9. The circles show the experimental data points, and the line indicates the best fit to the training data derived from the forecasted DGSR. As shown in Fig. 9, the coefficient of determination is close to 1: it is 0.954, 0.949 and 0.950 for Tanger, Er-rachidia and Ifrane, respectively.

Fig. 9 Regression plot between experimental and forecasted DGSR obtained by ARIMA models for Tanger (a), Er-rachidia (b) and Ifrane (c)

Previous results show that the chosen hidden layers scheme in the ANN can handle all data if the right number of neurones is selected [55,56,57]. In this study, a three-layer neural network is chosen for forecasting the clearness index. Based on Eq. (23), the current parameters of the ANN model are given as follows:

  • The input neurons correspond to the number of lagged observations;

  • The number of output neurons is one. After several trials, we found that the optimum neural network for Er-rachidia has one input, one hidden layer with one neuron and one output (1 × 1 × 1); for Tanger and Ifrane, the optimum network has two inputs, one hidden layer with two neurons and one output (2 × 2 × 1).

Figure 10 shows the regression analysis of the values forecasted by the ANN models. In terms of accuracy, the ANNs improve on the performance of the ARIMA model. The correlation coefficients reach 0.969, 0.958 and 0.959 for Tanger, Er-rachidia and Ifrane, respectively.

Fig. 10 Regression plot between experimental and forecasted DGSR obtained by ANN models for Tanger (a), Er-rachidia (b) and Ifrane (c)

The residuals of the ARIMA model, which contain the nonlinear part, are used as input to the multilayer perceptron of the ARIMA–ANN model, which is trained with the Levenberg–Marquardt (LM) algorithm. The outputs have been normalized inside [1]. The proposed hybrid model uses the strengths of both the ARIMA and ANN models to capture different patterns. As shown in Fig. 11, the ARIMA–ANN model performs better than the single ARIMA and ANN models. The coefficient of determination for Tanger, Er-rachidia and Ifrane is 0.986, 0.988 and 0.984, respectively. According to the linear regression analysis, the forecasted values obtained by merging the ARIMA and ANN models match the measured data better than those of the single ARIMA and ANN models. The most suitable forecasting model of DGSR is selected by comparing the results obtained by the ARIMA, ANN and hybrid ARIMA–ANN models, based on several criteria such as statistical error measures.

Fig. 11 Regression plot between experimental and forecasted DGSR obtained by hybrid ARIMA–ANN models for Tanger (a), Er-rachidia (b) and Ifrane (c)

Table 4 shows several statistical indices that were computed to check the previous results: MBE, RMSE, NRMSE, MAPE, TS, Sd, PSd and the linear regression coefficients (R2, a, b) (see “Methodological Appendix” for more details). This evaluation also makes it possible to determine whether the model values are statistically significant at a given confidence level.

Table 4 Statistical analysis results for the optimum and suitable ARIMA, ANN and hybrid ARIMA–ANN models

With ARIMA modelling, the MBE (PMBE), RMSE (PRMSE), NRMSE, MAPE, TS, Sd (PSd) and linear regression coefficients (a, b) for the Tanger site are − 3.430 Wh/m2 (− 0.064%), 713.365 Wh/m2 (13.275%), 0.133, 44.102, 0.499, 713.610 Wh/m2 (13.280%) and (1.169, − 328.159), respectively. For Er-rachidia, the same indicators are − 1.110 Wh/m2 (0.019%), 662.724 Wh/m2 (11.363%), 0.114, 44.483, 0.554, 663.026 Wh/m2 (11.368%) and (1.194, − 497.753), respectively. For the Ifrane site, the MBE (PMBE) is − 2.560 (− 0.047%), the RMSE (PRMSE) and NRMSE are 1475.166 Wh/m2 (27.215%) and 0.272, and the MAPE, TS, Sd (PSd) and linear regression coefficients are 105.960, 1.073, 1475.883 Wh/m2 (27.228%) and (1.517, − 408.896), respectively. With the ANN model, the obtained values of R2 are very close to 1, indicating a very good correlation between the forecasted and measured values. The PMBE, PRMSE and PSd range, respectively, from − 0.301 to − 0.065, from 8.368 to 25.030 and from 8.372 to 25.040 for the optimum configurations of Tanger, Er-rachidia and Ifrane. The MBE, RMSE, NRMSE, MAPE, TS and Sd range, respectively, from − 16.338 Wh/m2 to − 3.531 Wh/m2, from 449.670 Wh/m2 to 1356 Wh/m2, from 0.084 to 0.250, from 29.793 to 97.227, from 0.246 Wh/m2 to 574 Wh/m2 and from 449.542 Wh/m2 to 1357.228 Wh/m2 for the three sites. The values of MBE and NRMSE are very close to 0, indicating good agreement between the estimated and measured DGSR values. The constant ‘b’ is very close to 0, indicating a near-perfect linear relationship between the forecasted and measured values. Other results are given in the same table. For the hybrid model, the statistical indicators for the optimum configuration of Tanger are − 10.765 Wh/m2 for MBE, 446.352 Wh/m2 for RMSE, 0.083 for NRMSE, 25.544 for MAPE, 0.252 Wh/m2 for TS and 446.862 Wh/m2 for Sd. For Er-rachidia, the PMBE, PRMSE and PSd are − 0.084, 7.391 and 7.394, respectively.
For the Ifrane site, the MBE, RMSE, NRMSE, MAPE and linear regression coefficients are − 18.899 Wh/m2, 582.882 Wh/m2, 0.108, 42.936 and (0.921, − 3211.117), respectively. The comparison of the models in terms of precision and accuracy shows that the hybrid ARIMA–ANN model has lower error values than the single ARIMA and ANN approaches.

The results of the current study were compared with several recent works in the literature that use different models (Table 5). The highest R2 value, obtained in the current study, suggests that the hybrid ARIMA–ANN model performs better than the other existing models.

Table 5 Comparative study between optimum hybrid model and many existing models in the literature

Figures 12, 13 and 14 compare the daily global solar radiation forecasted by the hybrid model with the forecasts of the single ANN and ARIMA models and with the experimental data. For the three selected locations, the hybrid ARIMA–ANN model gives higher accuracy and performs better than the single ARIMA and ANN models, and it follows the experimental values more closely.

Fig. 12 Experimental and forecasted DGSR values for ARIMA (2,1,1), ANN and hybrid ARIMA–ANN at the Tanger site (a); b, c are enlargements of a

Fig. 13 Experimental and forecasted DGSR values for ARIMA (1,1,1), ANN and hybrid ARIMA–ANN at the Er-rachidia site (a); b, c are enlargements of a

Fig. 14 Experimental and forecasted DGSR values for ARIMA (2,1,1), ANN and hybrid ARIMA–ANN at the Ifrane site (a); b, c are enlargements of a

4 Conclusion

In this paper, we have proposed a hybrid model to forecast the daily global solar radiation in three different regions of Morocco. The experimental data are taken from three different stations for the full years 2013, 2014 and 2015. The proposed approach exploits the ability of ARIMA and AI models to capture different patterns. Before applying the modelling approaches, the DGSR was transformed into the clearness index \( K_{t} \) to make the data stationary. Based on the stationary data, the optimum ARIMA and ANN models were built. The ACF, PACF and AIC criteria led to the selection of ARIMA (2,1,1) for Tanger and Ifrane and ARIMA (1,1,1) for Er-rachidia as adequate models.

The accuracy and performance of the proposed ARIMA–ANN model have been evaluated and checked, using various statistical measurement errors. Results obtained by hybrid ARIMA–ANN show a suitable matching between the observed and forecasted values, suggesting the ARIMA–ANN suitability to reproduce experimental data with satisfactory precision.