Introduction

The consumption of electrical energy has recently increased worldwide at a very high rate due to rapid urbanization, improving quality of life, and a growing population. Meanwhile, to cope with the depletion of fossil fuels and to reduce the massive amount of greenhouse gas emissions, most countries are moving toward renewable energies as an alternative means of generating power and of ensuring pollution-free, sustainable global electricity (Das et al. 2018; Jumin et al. 2021; Singla et al. 2022a). The low price and the universal availability of solar energy make it the most important resource among the available renewables. Solar energy reaches the earth's surface in the form of solar irradiation, which has two components: beam normal irradiance (BNI) and diffuse horizontal irradiation (DHI). These components are basic information for many solar system applications, including site analysis, site selection, technology installation, optimal system design, and plant operation. Moreover, they are indispensable for integrating a significant amount of solar energy production technologies into buildings or electrical grids (Castangia et al. 2021a).

Forecasting solar power production is critical in power plant management. It greatly helps grid operators to plan for anticipated risks, schedule maintenance, and balance generation and demand at every moment (Diagne et al. 2012; Notton et al. 2019), over horizons ranging from very short-term through short-term and medium-term to long-term. However, the accuracy of solar power forecasts depends heavily on the correct modeling of the solar irradiation given as input (Blaga et al. 2019a, b; Huang et al. 2021; Voyant et al. 2017), since the uncertainty and variability of solar irradiation, caused by weather conditions, climate type, time of day, and seasonal variability (Kumari and Toshniwal 2021b; Lan et al. 2019), lead to fluctuations in power generation.

However, obtaining accurate data on global solar irradiation and its components presents several challenges, because measurement instruments such as pyranometers and pyrheliometers require regular updates and maintenance and are relatively complex and expensive (Ibrahim and Khatib 2017; Kleissl 2013; Zhang et al. 2017). Moreover, solar irradiation is rarely measured at most meteorological stations around the world. Hence, an alternative way to obtain these data is to develop prediction models for locations where only other, more accessible parameters are available (Sun et al. 2015).

The estimation and forecasting of solar irradiation is a challenging task, as it depends heavily on the geographical position and weather conditions of the studied sites. In the literature, the prediction of solar irradiation can be performed using various techniques. For forecasting, these techniques can be mainly categorized into six classes: persistence, cloud motion tracking, numerical weather prediction (NWP), classical statistical, machine learning (ML), and hybrid methods (Blaga et al. 2019a, b). For estimation, four classes can be considered: empirical, physical, statistical, and machine learning models (Zhou et al. 2021).

Physical models aim to explore the physical state of solar irradiation and other meteorological conditions using mathematical equations (Ramadhan et al. 2021). Their complex structure needs a huge amount of data to calibrate the complex dynamics of the atmosphere.

The statistical methods aim to quantify the relationship between historical values of solar irradiation and weather parameters by statistically analyzing the different input parameters and then predicting solar irradiance. Generally, these methods are adopted for short-term forecasting of solar irradiation (Voyant et al. 2017). They are of limited use for more complex prediction problems with longer forecasting horizons (Diagne et al. 2013).

The empirical models aim to develop a linear or nonlinear regression equation (Jiang 2009). They are commonly used to correlate solar irradiation with various measured meteorological and geographical parameters. Empirical models are simple and easy to use; nevertheless, their accuracy is usually limited due to uncertain variables (Gürel et al. 2020).

The persistence model assumes that the best forecast of global irradiance one hour ahead or one day ahead is its value at the previous hour or the previous day, respectively (Diagne et al. 2013). This model acts as a reference in the solar irradiation forecasting community, especially for short-term horizons (Kumar et al. 2020), where it is used as a baseline to evaluate the quality of proposed approaches (Blaga et al. 2019a, b).
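The persistence rule can be written in a few lines. The sketch below is only an illustration: the function name `persistence_forecast` and the sample values are hypothetical, and the series is assumed to be sampled at a regular step (for hourly data, `lag=1` gives the hour-ahead forecast and `lag=24` the day-ahead forecast).

```python
def persistence_forecast(series, lag=1):
    """Forecast each sample as the observation made `lag` steps earlier.

    Positions that have no earlier observation are returned as None.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


# Hypothetical hourly irradiation values, for illustration only.
hourly = [0.0, 120.0, 340.0, 510.0]
hour_ahead = persistence_forecast(hourly, lag=1)
```

Because the rule has no trainable parameters, any proposed model would be expected to reach a lower error than this baseline on the same test period.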

Cloud motion tracking-based models consist of two processes: cloud detection and solar irradiance forecasting. These approaches are based on sky images obtained from either ground-based or satellite cameras. Because solar irradiation is strongly influenced by cloud patterns, detecting cloud motion makes it possible to predict cloud positions, which in turn allows solar irradiance to be forecast (Kamadinata et al. 2019).

Numerical weather prediction (NWP) models simulate the physical state of the atmosphere using mathematical models in order to predict future weather from current conditions (Blaga et al. 2019a, b; Huang et al. 2021).

Recently, machine-learning methods such as Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Artificial Neural Networks (ANN) have been shown to handle linear, non-linear, and non-stationary data forms. They can learn and solve the complex nonlinear relationships between inputs and outputs (Zhou et al. 2021).

Finally, hybrid methods combine two or more techniques to design a forecasting model. These models have become popular in recent years because, by combining the strengths of each component model, they can outperform a standalone model on different prediction problems.

The most popular machine learning methods used in solar irradiation estimation and forecasting are ANN models, owing to their ability to solve complex, uncertain, and non-linear problems and because they require fewer experimental parameters to generate the input/output relationships (Yadav and Chandel 2014; Blaga et al. 2019a, b). Furthermore, ANN models are capable of dealing with many input parameters, which makes them more accurate and reliable (Qazi et al. 2015).

Lately, many interesting research studies have summarized and studied the literature related to the prediction of solar irradiation based on artificial neural network models. In fact, authors in (Yadav and Chandel 2014) presented a survey on ANN models used to estimate global and daily solar irradiation. Furthermore, this study also reviewed papers dedicated to forecasting solar irradiation in the short-term. This work presented a summary of suitable methods available in the literature in order to identify research gaps.

The work in (Qazi et al. 2015) presents a systematic review of artificial neural networks for solar systems design and solar irradiation prediction. The results of this study show that the performance of solar irradiation prediction models depends on the input parameters and the ANN architecture. In (Rajagukguk et al. 2020), a review of deep learning models to predict solar irradiation and photovoltaic power using time-series data is presented. This study reviewed three single models, namely long short-term memory (LSTM), recurrent neural network (RNN), and gated recurrent unit (GRU), and one hybrid model, the convolutional neural network-LSTM (CNN–LSTM). The results show that LSTM provides the best performance among the single models, whereas the hybrid CNN–LSTM outperforms all three single models. In (Kumari and Toshniwal 2021a), the authors presented a detailed study of deep learning models for solar irradiation forecasting. This review showed that deep learning improves prediction performance over other machine learning models and that deep hybrid models such as CNN–LSTM outperform standalone models such as LSTM, CNN, and GRU.

Finally, the review in (Guermoui et al. 2020) provides a holistic summary of six hybrid modeling techniques recently used in solar irradiation prediction. The purpose of this study was to compare the different hybrid models proposed in the literature in order to identify promising approaches for solar radiation prediction and to open opportunities for future researchers in this area.

From the studied literature and to the authors' knowledge, no review paper covering all ANN model types (including deep learning and hybrid ANN models) has been published to date. Most published reviews in the field of solar irradiation prediction focus on a specific approach, such as ANN or deep learning models, or on a general topic such as hybrid modeling approaches. The purpose of the present paper is not only to make a comprehensive study of the literature but also to conduct, synthetically and clearly, an exhaustive analysis of both estimation and forecasting problems based on ANN models (including deep learning and hybrid ANNs).

This systematic review will help researchers to obtain an overview of the various ANN approaches developed over the last 6 years. Our work aims to cover all the important aspects of ANN model development for prediction problems in solar irradiation, so that interested researchers gain a clear view of this particular field. In this context, the main contributions and novelty of our work can be summarized as follows:

  • Unlike some important review papers dealing with multiple machine learning techniques (Zhou et al. 2021), (Voyant et al. 2017), our review focuses on the most used machine learning model, namely ANNs. Our objective is to update the state of the art up to 2022,

  • Presenting a clear distinction between estimation and forecasting terminologies. In fact, regarding the existing papers in the literature, several authors often confuse the meaning of “estimation”, “prediction” and “forecasting”. Indeed, most available works in the literature do not indicate any difference between them,

  • Providing an overview of different kinds of data types, data horizons, data pre-processing and feature selection,

  • Gathering definitions from the literature of the four forecasting horizons: very short-term, short-term, medium-term, long-term,

  • Classifying retained papers into single and hybrid models,

  • Establishing the highly detailed state of the art with different columns, including ANN architecture, training/testing data percentage, size of the input matrix, forecasting horizon and results of comparison methods,

  • Besides single ANN models, hybrid ANNs models are also reviewed and summarized,

  • Evaluating the existing challenges that remain to be solved for solar irradiation estimation and forecasting, proposing important recommendations, drawn from the analysis of the established state of the art, that should be considered in future studies, and identifying recent trends.

Accordingly, the remainder of this review is organized as follows. Section 2 describes the review methodology. Section 3 includes two subsections: the first briefly discusses data, including data types, data horizon, and data pre-processing; the second provides an overview of feature selection methods and their application in solar irradiation prediction. Section 4 gives a detailed analysis of forecasting horizons and a summary of the ANN models investigated in our paper. Section 5 summarizes the different models proposed in the literature for solar irradiation estimation and forecasting based on ANN models and hybrid ANN models.

A detailed discussion of factors influencing the performance of ANN models with recommendations, outlooks, and trends is given in Sect. 6. Finally, the main conclusions are provided in Sect. 7. Figure 1 shows a diagrammatic representation of our review structure.

Fig. 1 Diagrammatic representation of our paper structure

Review methodology

The purpose of our review is to study and analyze high-quality papers in the field of solar irradiation prediction in order to produce a relevant and useful synthesis of previous studies and to identify the existing gaps in this particular field. This synthesis will be a valuable starting point for future researchers working on solar irradiation estimation and forecasting, particularly when ANN approaches are used as a prediction technique. As shown in Fig. 2, the flowchart of our review consists of the following primary steps.

  • Specification of the time period, language, libraries, and keywords for our review: initially, the reviewed papers were collected by searching online in well-reputed digital libraries such as IEEE Xplore, ScienceDirect, ResearchGate, Elsevier, and the Google Scholar search engine for relevant papers published from 2015 to 2022. This primary search was performed using the keywords given in the last columns of Table 1.

  • Limit the database for our search: the obtained papers database was limited by searching for keywords in the title of the article, abstract, and investigated keywords. We also included studies from lists of references from found papers.

  • Analyze and select relevant papers: this step consists of checking the quality of the papers obtained in the previous step. The papers are selected based on journal quality, inclusion criteria, and exclusion criteria (see Table 1).

  • Classify papers into estimation and forecasting problems: in this step, the selected papers are classified into two categories: papers dealing with solar irradiation estimation and papers dealing with solar irradiation forecasting.

  • Results and discussion: present the actual state of the art for the two aforementioned categories with a detailed discussion on available approaches in estimation and forecast of solar irradiation.

  • Recommendations, trends and outlooks: the last step offers useful recommendations and outlooks for future research in the field of solar irradiation estimation and forecasting using ANN techniques.

Fig. 2 Flowchart of our review

Table 1 Inclusion, exclusion criteria, and search terms

Data pre-processing and feature selection for solar irradiation estimation and forecasting

In our review, we present an overview of the whole process required in estimation and forecasting models, including the type of input parameters, data pre-processing, feature selection, type of used models, forecasting horizon, and performance indicators.

Data types and pre-processing phase for forecasting and estimation model

Data types

The forecast or estimation of solar irradiation requires data from the studied site. The most used are ground-measured data and sky image data. The ground-measured data available at weather stations are generally geographical and meteorological parameters, solar characteristics, and physical parameters. Sky image data, on the other hand, capture cloud motion and provide information about the cover, type, speed, and level of clouds over a specific spot (Marquez and Coimbra 2013), (Barbieri et al. 2017). There are generally two types of sky images: ground-based sky images (Ferreira et al. 2012) and satellite-based sky images (Blaga et al. 2019a, b). Figure 3 summarizes the two types of input parameters used to estimate and forecast solar irradiation.

Fig. 3 Types of input parameters used for estimation and forecasting solar irradiation

In prediction problems, these variables can be divided into three types: endogenous, exogenous, and hybrid parameters. Endogenous parameters consider only the historical values of solar irradiation to predict solar irradiation. Exogenous inputs include all the aforementioned parameters (see Fig. 3) other than solar irradiation, and hybrid inputs combine both endogenous and exogenous parameters.

Data horizon

The measurement of solar irradiation at ground meteorological stations depends on several factors, such as the specifications, calibration method, and maintenance of the measuring instruments, the data acquisition method and its accuracy, the location, and the environmental conditions (Sengupta et al. 2021). Measurement instruments record solar irradiation values at a certain time step or resolution. The data recorded at this interval can then be summed to obtain the desired horizon: sub-hourly, hourly, daily, or monthly solar irradiation, according to their use in solar systems (see Fig. 4). Sub-hourly solar irradiation is the value recorded over one minute or several minutes. The hourly value is the sum of the values recorded over a period of one hour. Daily solar irradiation is the total over a day, while monthly values are the average or the sum of daily irradiation over a month. Generally, measurements at short time intervals, such as sub-hourly or hourly, are more accurate than those at long intervals, such as daily and monthly, because they record the changes in solar irradiation in detail over a smaller time interval (Zhang et al. 2017).

Fig. 4 Different horizons of solar irradiation data

Furthermore, the measuring instruments can be configured to produce output values of solar irradiation or other data at any desired time period (Sengupta et al. 2021). For instance, a value of solar irradiation can be a mean value of more frequent measurement records for a given time period.
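As an illustration of how records at a fine resolution can be rolled up to a coarser one, the following minimal sketch groups consecutive records into fixed-size blocks and either sums or averages them. The function name `aggregate`, the choice to drop an incomplete trailing block, and the sample values are assumptions made for this example, not a procedure taken from the reviewed papers.

```python
def aggregate(values, group_size, how="sum"):
    """Collapse consecutive records into blocks of `group_size` records.

    how="sum"  -> total per block (e.g. six 10-min irradiation records -> hourly total)
    how="mean" -> average per block (e.g. daily totals -> monthly average)
    A trailing block with fewer than `group_size` records is dropped (assumption).
    """
    if group_size < 1:
        raise ValueError("group_size must be a positive integer")
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    blocks = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    blocks = [b for b in blocks if len(b) == group_size]
    if how == "sum":
        return [sum(b) for b in blocks]
    return [sum(b) / group_size for b in blocks]


# Hypothetical 10-minute irradiation records covering two hours.
ten_minute = [10, 20, 30, 40, 50, 60, 5, 5, 5, 5, 5, 5]
hourly_totals = aggregate(ten_minute, group_size=6, how="sum")
```

The same helper with `how="mean"` mirrors the case in which a reported value is the mean of more frequent measurement records.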

Data pre-processing

The quality of measured data depends on the quality of the calibration and on the regular maintenance of the measuring instruments. Furthermore, these instruments are strongly affected by time, location, and environmental conditions. In addition, the obtained data are generally in raw format and are not directly suitable for achieving good accuracy. Thus, a pre-processing phase is required to prepare the dataset fed to estimation or forecasting models and to ensure its quality, which plays a vital role in the accuracy of predictive models. Data should therefore be quality controlled and well organized before further use. Data pre-processing aims to solve these issues and improve data quality by handling missing values, removing meaningless values, and eliminating outliers caused by abnormal measurements (Lai et al. 2020). Moreover, data pre-processing formats the input data according to the specifications and transforms the raw data into a simpler representation that is easier to use in later processing steps.

A number of techniques have been applied in the literature to pre-process the input parameters of forecasting models, such as Markov models, linear regression, the Kalman filter, the wavelet transform, self-organizing maps, quality control, empirical mode decomposition, Principal Component Analysis (PCA), and normalization. For instance, Wang et al. in (2018) used the discrete wavelet transform for certain weather types (cloudy, rainy, and heavy rainy) to decompose the raw solar irradiance sequence into several subsequences. Their results showed that this decomposition enhances the forecasting performance of the proposed model. Castangia et al. in (2021b) normalized the global horizontal irradiance (GHI) data using the clear sky index transformation, in order to introduce stationarity in the solar irradiation time series and to scale each input parameter to the range between 0 and 1. Moreover, Husein and Chung in (2019a, b) used a linear regression fit to replace the missing values in their database. Lan et al. in (2019) used PCA to reduce the size of the original database and to identify the essential frequency features given to the Elman-based neural networks.
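Two of the normalization steps mentioned above can be sketched briefly. The clear sky index is commonly defined as the ratio of measured GHI to the GHI expected under a cloudless sky, and min-max scaling maps a variable to the range between 0 and 1. The code below is a minimal sketch under stated assumptions: the clear-sky values are taken as given (the clear-sky model that produces them is not specified here), night-time samples with zero clear-sky GHI are set to 0, and both function names are hypothetical.

```python
def clear_sky_index(ghi, ghi_clear, eps=1e-6):
    """Ratio of measured GHI to clear-sky GHI, sample by sample.

    Samples whose clear-sky value is (near) zero, e.g. at night,
    are set to 0.0; this handling is an assumption of the sketch.
    """
    return [g / c if c > eps else 0.0 for g, c in zip(ghi, ghi_clear)]


def min_max_scale(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measured and clear-sky GHI values.
k = clear_sky_index([0.0, 400.0, 800.0], [0.0, 800.0, 800.0])
scaled_temperature = min_max_scale([2.0, 4.0, 6.0])
```

In practice the minimum and maximum used for scaling would be computed on the training set only and reused for the test set, so that no information leaks from the test period.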

Feature selection

The procedure of feature selection is a crucial requirement for improving model accuracy. In this section, we will provide an overview of the most used feature selection techniques and study their usefulness and ability to deal with solar irradiation prediction problems.

An overview of feature selection techniques

Feature selection is an important phase in exploring, processing, and analyzing the data for a given problem. Feature selection techniques aim to select the most important and relevant subset of features from a large dataset (Garcia et al. 2015; Yahya et al. 2011; Zebari et al. 2020).

Feature selection has various objectives, the most important of which are to improve model performance, reduce the size of the data, reduce model complexity and computational cost, avoid overfitting, remove redundant, irrelevant, and noisy data, improve data quality, and reduce data storage (Saeys et al. 2007). For instance, some machine learning models do not perform well on high-dimensional data. Reducing the size of the data makes it easier to visualize the data clearly and reduces model complexity.

Feature selection techniques are generally performed in two major phases. First, a subset generation phase searches for the optimal variables among the available data. It is followed by a subset evaluation phase, which assesses whether the generated subset is optimal for the given problem according to a suitable stopping criterion; this criterion speeds up the selection and decides which features may be added to or removed from the generated subset. Moreover, the stopping criterion must be defined according to the purpose of feature selection (Khaire and Dhanalakshmi 2019), (Karagiannopoulos et al. 2007) (see Fig. 5).

Fig. 5 Major phases of feature selection technique

There are several feature selection methods available in the literature. The most well-known and used are Filter, Wrapper, and Embedded methods.

  i. Filter methods

The filter method is a category of statistical methods; it assesses the relevance of parameters by studying the correlation of each feature with the model’s output to identify the most relevant parameters (Castangia et al. 2021a).

These methods are used either to produce a feature ranking according to a relevance measure, using statistical criteria such as the Chi-squared test, mutual information, and the Pearson correlation coefficient, or in combination with algorithms such as forward or backward search to obtain subsets of features (Zhang and Wen 2019).

The filter technique consists of two stages. The first ranks or selects features using measures such as information, distance, dependence, or consistency. The second is the learning and testing process with the subsets of relevant features.

The filter approach has several advantages, such as dealing with high-dimensional data, generalizing better, being faster than other feature selection techniques, and scaling to large datasets. However, filter methods can fail to capture complex and profound relationships between the features and the output. In addition, they ignore the dependency between features (Zhang and Wen 2019).
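A filter-style ranking based on the Pearson correlation coefficient can be sketched as follows. This is a minimal illustration rather than the procedure of any cited study: the function names, the feature names, and the sample values are hypothetical, and features are scored one at a time by the absolute value of their correlation with the target.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / (spread_x * spread_y)


def rank_features(features, target):
    """Return (name, |r|) pairs sorted from most to least correlated."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate inputs and measured irradiation.
candidates = {
    "temperature": [1.0, 2.0, 3.0, 4.0],
    "noise": [1.0, -1.0, 1.0, -1.0],
}
ranking = rank_features(candidates, target=[2.0, 4.0, 6.0, 8.0])
```

Because each feature is scored independently of the others, this sketch also shows the limitation noted above: two redundant features would both receive high scores.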

  ii. Wrapper methods

Wrapper methods use a learning model with different subsets of features to select relevant feature subsets. The purpose is to produce a ranking based on the accuracy obtained with each subset. Various subsets of features are generated and evaluated according to defined stopping criteria. A specific subset is evaluated by training and testing the related model.

Wrapper methods can be classified into two main groups: sequential selection algorithms and meta-heuristic algorithms. Sequential selection algorithms start with an empty or a full set and add or remove features, respectively, according to chosen criteria until they obtain the subset of features that achieves the best performance. Meta-heuristic search algorithms, in contrast, generate and evaluate different subsets to find the one with the best accuracy (Chandrashekar and Sahin 2014). These search methods can be divided into two classes: deterministic and randomized search algorithms.

The advantages of wrapper techniques are their ability to include the interaction between feature subset search and model selection, and their capacity to take into account the dependency between features. However, they are highly dependent on the machine learning model, which can lead to overfitting and computational problems with high-dimensional datasets (Garcia-Hinde et al. 2016). In addition, wrapper methods have high time complexity because each candidate subset can be evaluated only after the model has been trained and used for prediction.
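The sequential (forward) variant described above can be sketched with a generic scoring callback. In a real wrapper, `score` would train and validate a model such as an ANN on the candidate subset; here it is replaced by a hypothetical toy function so that the example stays self-contained. The names `forward_selection`, `toy_score`, and `min_gain`, as well as the feature names and gains, are assumptions of this sketch.

```python
def forward_selection(candidates, score, min_gain=1e-9):
    """Greedy sequential forward selection.

    Starts from an empty set and, at each round, adds the feature whose
    inclusion gives the highest score. Stops when the best improvement
    does not exceed `min_gain` (the stopping criterion of this sketch).
    """
    selected = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = [(score(selected + [feature]), feature) for feature in remaining]
        top_score, top_feature = max(trials, key=lambda trial: trial[0])
        if top_score - best <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def toy_score(subset):
    """Stand-in for 'train a model on this subset and return its accuracy'."""
    gains = {"sunshine": 0.6, "temperature": 0.3, "wind": 0.0}
    return sum(gains[feature] for feature in subset)


chosen, chosen_score = forward_selection(["wind", "temperature", "sunshine"], toy_score)
```

Each round calls `score` once per remaining feature, which is where the computational cost of wrapper methods comes from when every call involves training a model.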

  iii. Embedded methods

Embedded methods are built into machine learning models and treat feature selection as part of the training process. Several machine learning techniques with embedded feature selection are used in estimation, such as decision trees, convolutional neural network (CNN) models (Zhang and Wen 2019), and Multi-Objective Evolutionary Algorithms (Ferreira and Ruano 2011).

The embedded method combines the qualities of both filter and wrapper methods: it has lower computational complexity than wrapper methods and higher accuracy than filter methods. Furthermore, embedded methods can provide a faster solution by avoiding the need to re-evaluate each subset of features and re-train a predictive model (Garcia et al. 2015).

Feature selection for solar irradiation prediction using ANN models

The available solar irradiation data, together with meteorological and geographical parameters, result in a large number of variables that can include unimportant and irrelevant parameters, which leads to a complex, high-dimensional database. In fact, solar irradiation prediction models often suffer from a large amount of irrelevant information due to the high degree of uncertainty associated with inadequate maintenance of sensors and the absence of data quality control (Cebecauer and Suri 2015).

As mentioned above, there is a large number of parameters that are related to solar irradiation. Thus, feature selection is carried out to select the parameters that have a strong correlation with solar irradiation. Note that feature selection algorithms aim to keep the best set of variables in a prediction problem.

According to the literature, several studies have applied feature selection techniques to identify the most relevant inputs for solar irradiation estimation or forecasting models. In the case of endogenous inputs, there is no feature selection issue. However, for exogenous and hybrid inputs, choosing the relevant inputs that give the best accuracy is an important and open problem. For instance, Dahmani et al. in (2016), Castangia et al. in (2021a), Pang et al. in (2020), Ahmad et al. in (2015), Almaraashi in (2018), and Meenal and Selvakumar in (2018) investigated the use of filter methods. Wrapper methods were used in (Bouzgou and Gueymard 2017; Jadidi et al. 2018; Marzouq et al. 2019; El Mghouchi et al. 2019a, b; Rao et al. 2018), and embedded methods were used in (Ghimire et al. 2019), (Zang et al. 2020).

Forecasting horizon and models type

Forecasting horizon

In solar system applications, each stage of the development of a solar energy system requires solar irradiation data at different time scales. Inaccurate measured data at specific locations lead to high financial costs for any solar system project (Sengupta et al. 2021). Furthermore, the operation and management of an intermittent source of energy such as a solar system are very difficult when it is connected to an electrical grid, because of the difficulty of balancing generation and demand at different time scales. In fact, the output of a solar system can be strongly affected by weather conditions (Voyant and Notton 2018), (Kumari and Toshniwal 2021a). Thus, the accurate prediction of solar power requires accurate solar irradiation data (Rajagukguk et al. 2020). Therefore, there is a crucial need to accurately predict solar irradiation at different forecasting horizons.

The forecast horizon is the time interval for which a model can predict future values. This period can range from a few seconds to hours, days, weeks, months, or even years (Gupta et al. 2021). In the case of solar irradiation forecasting, different forecasting horizons can be found in the literature, such as very short-term, short-term, medium-term, and long-term. However, there is no universal definition of these forecasting horizons.

In fact, in (Yang et al. 2015), very short-term is defined as a sub-5 min interval. Caldas and Alonso-Suárez in (2019) define very short-term as a time interval from 5 min to some hours ahead. In (Kumari and Toshniwal 2021b), it is from a few minutes up to a few hours, while in (Pedregal and Trapero 2021) it is from a few seconds to 1 h.

For the short-term horizon, the forecasting time interval ranges from a few hours up to several hours or days. For instance, in (Diagne et al. 2012), the short-term horizon is defined as forecasts from 1 h up to 5 h. Other researchers, such as Voyant and Notton in (2018) and Marzouq et al. in (2020), considered short-term forecasts to range from 1 to 6 h. In addition, the study of Voyant et al. in (2017) considered that short-term can be divided into two sub-classes: now-casting (from 0 to 3 h ahead), which represents the very short time domain, and short-term forecasting (from 3 to 6 h ahead).

For medium-term, there are few studies carrying out forecasting of solar irradiation. In (Sharma and Kakkar 2018) and (Kumari and Toshniwal 2021b) medium-term is for one week ahead and in (Pedregal and Trapero 2021) it is longer than 48 h ahead.

For Ozoegwu, long term is a period of a few months to years (Ozoegwu 2019). In (Aslam et al. 2019), it is up to one year ahead. However, in (Sharma and Kakkar 2018) and in (Kumari and Toshniwal 2021b), long-term means forecasting for months or years ahead.

The requirement for the forecasting horizon changes with applications. A synthesis of the studied papers of the literature can be summarized in Fig. 6, which shows the utility of each horizon from the very short to the very long term.

Fig. 6 Applications of different forecasting horizons

ANN estimation and forecasting models

The estimation of solar irradiation is defined as the process in which solar irradiation is predicted from the knowledge of other parameters of a different nature. The data collected up to and including time t are given as input to an estimation model, which predicts the value of solar irradiation at the same time t as output. The forecasting process consists of predicting future values of solar irradiation based on present and historical data. The data given to a forecasting model as inputs can be time series data that consider only current and past values of solar irradiation, structural data that represent other variables of a different nature, or hybrid data that combine time series data and structural data.

There are numerous types of ANN models used in the estimation and forecasting of solar irradiation; they can be broadly classified into single and hybrid ANN models. Single ANN models can be feedforward neural networks, such as the Multi-Layer Perceptron (MLP), Radial Basis Function Neural Network (RBFNN), Generalized Regression Neural Network (GRNN), and Extreme Learning Machine (ELM), or recurrent neural networks, such as the simple Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). Single ANN models can also be used in different configurations, for instance the Non-linear Autoregressive Neural Network with eXogenous inputs (NARX), Non-linear Autoregressive Neural Network (NAR), Time-Delay Neural Network (TDNN), and Wavelet Neural Network (WNN).

On the other hand, the hybrid approach consists of combining two or more models. In our review, we have only focused on hybrid models that combine ANN and other models such as statistical, empirical, or machine learning models. Even though these hybrid models are complex, they have been widely used because of their ability to combine the advantages of different models, thus achieving higher accuracy for estimation and forecasting.

In Fig. 7, we synthesize existing ANN single models and some of the most used hybrid ANN models found in the literature for the prediction of solar irradiation.

Fig. 7 Single and hybrid ANN models found in the literature

State of the art of solar irradiation estimation and forecasting based on ANN models

This section presents notable findings on estimation and forecasting models of solar irradiation, drawn from an in-depth analysis of published articles based on all types of ANN models (including hybrid ANNs and deep learning). In addition, the performance indicators used in the studied literature are briefly introduced.

Performance indicators of accuracy evaluation

Since the estimation and forecasting accuracy is a critical factor in selecting any prediction model, many well-known indicators were used in the literature. The most used are Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), normalized Mean Absolute Error (nMAE), and normalized Root Mean Square Error (nRMSE) (see Table 2).

Table 2 Performance indicators used in our review study

Where \(N\) is the number of samples in the database, \({Y}_{pred}\) is the \({i}^{th}\) predicted value and \({Y}_{meas}\) is the \({i}^{th}\) measured value.
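As an illustration of how the indicators listed above are typically computed, the following minimal Python sketch evaluates them on a small set of made-up values. Two points are assumptions of this sketch rather than definitions taken from Table 2: the normalized indicators are divided by the mean measured value (other works normalize by the range or the maximum), and samples with a zero measured value are skipped in MAPE. The function name error_metrics and the sample data are hypothetical.

```python
import math


def error_metrics(y_meas, y_pred):
    """Return MAE, RMSE, MAPE, nMAE and nRMSE for paired measured/predicted values."""
    if not y_meas or len(y_meas) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_meas)
    errors = [p - m for m, p in zip(y_meas, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Assumption: samples with a zero measured value (e.g. night-time) are
    # skipped in MAPE, because the percentage error is undefined there.
    ratios = [abs(e / m) for m, e in zip(y_meas, errors) if m != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")

    # Assumption: normalization by the mean measured value, in percent.
    mean_meas = sum(y_meas) / n
    if mean_meas == 0:
        raise ValueError("mean measured value is zero; cannot normalize")
    return {
        "MAE": mae,
        "RMSE": rmse,
        "MAPE": mape,
        "nMAE": 100.0 * mae / mean_meas,
        "nRMSE": 100.0 * rmse / mean_meas,
    }


# Made-up irradiation values, for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 380.0]
metrics = error_metrics(measured, predicted)
print(metrics)
```

Because the normalization convention differs between papers, nMAE and nRMSE values should only be compared across studies that state the same convention.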

State of the art of solar irradiation estimation

Our analysis of the retained papers dealing with the estimation of solar irradiation using ANN-based models is classified into single and hybrid models, summarized in Tables 3 and 4, respectively. These tables are listed in descending chronological order of publication year and specify the estimated component, the input parameters, the type of estimation model, the ANN architecture, the data period and training/testing percentage, the number of studied sites and their location, the corresponding performance indicators, and finally the results of the compared methods.

Table 3 State of the art of solar irradiation estimation based on single ANN models
Table 4 State of the art of solar irradiation estimation based on hybrid ANN models

Estimation of solar irradiation based on single ANN models

In the literature, several ANN modeling approaches were used to estimate solar irradiation. Table 3 presents our state of the art of single ANN models for solar irradiation estimation. From this table, we can see that even though there is a large choice of neural network approaches, only two models were widely used in the literature: MLP with different configurations and ELM, which demonstrates their effectiveness in this field.

Estimation of solar irradiation based on hybrid ANN models

The hybrid ANN approach combines an extension of the ANN model with another approach in order to overcome the disadvantages of a single ANN model. These models have been developed and broadly used in the estimation of solar irradiation, especially in recent years. We can see from Fig. 7 that a hybrid model can be a combination of two or more models or an optimization of a single ANN model. Table 4 lists our state of the art of solar irradiation estimation using hybrid ANN models.

Solar irradiation forecasting based on ANN models

The papers published during the last seven years that use ANN techniques for solar irradiation forecasting were analyzed and summarized in Tables 5 and 6. These tables are listed in descending chronological order of publication year and specify the forecasting component and horizon, the input parameters, the forecasting model type, the input matrix dimension, the data period and training/testing percentage, the number of studied sites, the location, the performance evaluation indicators, and the results of the compared methods if available.

Table 5 State of the art of works using single ANN models to forecast solar irradiation
Table 6 State of the art of works using hybrid ANN models to forecast solar irradiation

Forecasting using ANN-based single model

Solar irradiation forecasting is a challenging task, and several algorithms have been proposed to maximize prediction accuracy. Among ANN models, MLP with different architectures and LSTM are the most used for this purpose (additional details will be given in Sect. 6). Table 5 provides an in-depth analysis of the published works.

Forecasting using ANN-based hybrid models

From the literature, we can note that most studies published in recent years for solar irradiation forecasting are based on hybrid models. These methods adopt either combination or optimization techniques to increase the accuracy of the ANN-based single model. For instance, in (Kumari and Toshniwal 2021c), the authors combine CNN and LSTM, while in (Marzouq et al. 2020), the authors used an optimization method in the form of an evolutionary algorithm to increase the ANN model accuracy. More discussion will be given in the next section. A detailed analysis of the state of the art is presented in Table 6.

Discussion, trends and recommendations

Based on the states of the art presented above, this section gives a detailed discussion of the research directions and open problems extracted from the related papers. Finally, our review concludes with a compilation of the main findings and promising outlooks in solar irradiation prediction based on ANN models.

Discussion of factors influencing the performance of ANN models

During our review, we noticed that forecasting or estimating solar irradiation with a good accuracy is a difficult task. In fact, as we can see from Tables 3, 4, 5, and 6, the performances of ANN models are highly dependent on several influential factors including forecasting horizon, climate type, used ANN model, input data, optimization algorithms, input matrix dimension, and forecast component (see Fig. 8).

Fig. 8 Factors influencing solar irradiation prediction accuracy

In the next subsections, we will present a detailed discussion about all pre-mentioned factors.

Impact of ANN model type

Despite the fact that single ANN models provide high accuracy in predicting solar irradiation, many researchers are still working to improve it by combining ANN models with other algorithms in order to form a more accurate hybrid model.

As we can see from Tables 3, 4, 5, and 6, there are various approaches of ANN models to estimate and forecast solar irradiation. In our review, we classified these methods into two main classes. The first class is dedicated to single ANN models such as MLP, RBFNN, LSTM, and CNN. The second class is devoted to hybrid ANN models, which combine ANN with other statistical or machine learning approaches, for instance, evolutionary artificial neural networks (EANN), LSTM-CNN, and ANFIS.

The aim of hybrid models is to maximize the model accuracy by overcoming the weaknesses of a single ANN approach (Benmouiza and Cheknane 2016), (Pazikadin et al. 2020). Indeed, each extension of ANN models has strengths and limitations when forecasting and estimating solar irradiation. Several studies showed that an ANN used as a single model has difficulty dealing with the intermittency and fluctuations of solar irradiance (Huang et al. 2021). To overcome the shortcomings of a single model, authors focused on testing hybrid models to improve forecasting accuracy and model reliability, especially in bad weather conditions (Azimi et al. 2016), (Ali-Ou-Salah et al. 2021), (Benmouiza and Cheknane 2016). For example, in (Gairaa et al. 2016a), the authors forecast solar irradiation by combining Auto Regressive Moving Average (ARMA) with ANN: ARMA was used to capture the linear information and the ANN captured the nonlinear characteristics of the data.

From the studied literature, we deduced that models such as MLPNN and RBFNN are limited when handling high-dimensional datasets and sky images, which degrades prediction accuracy through overfitting and poor extrapolation. Therefore, further studies focusing on other extensions of ANNs such as RNN, LSTM, CNN, and their combinations (Huynh et al. 2020), (Liebermann et al. 2021), (Husein and Chung 2019a, b), (Pang et al. 2020) were proposed to overcome this limitation. These models have become popular in recent years because they present several advantages, such as the ability to capture short- and long-term dependencies within solar irradiation data series, which supports accurate prediction at different time horizons, and the ability to perform more general feature extraction. In addition, these models perform well in different weather conditions. As demonstrated in the study of (Gbémou et al. 2021), LSTM performs well on cloudy days. Moreover, hybrid models that combine LSTM and CNN are more robust under diverse climatic, seasonal, and sky conditions (Liebermann et al. 2021), (Zang et al. 2020), (Kumari and Toshniwal 2021c). The performance of these models exceeds that of other extensions of ANN models; however, they have some limitations. In fact, these models require high-dimensional datasets and complex input parameters such as images. In addition, research in this area is very recent, and further studies are expected to explore the performance of future models in more depth. Figures 9 and 10 show a synthesis of the studied papers. We can note that the number of papers dealing with forecasting is slightly higher than the number devoted to estimation. We can also see that, among single ANN models, FFNN is the most used algorithm for irradiation estimation and LSTM for forecasting. For hybrid models, optimized ANNs and machine learning combined with ANNs are the most used for estimation and forecasting, respectively.

Fig. 9 Most used approaches in solar irradiation forecasting

Fig. 10 Most used approaches in solar irradiation estimation

Optimization algorithms of ANN architecture

In all presented models, the selection of appropriate parameters is crucial for improving the model accuracy. In ANN models, the selected parameters are usually the number of hidden layers, the number of neurons in each hidden layer, the number of epochs, and the initial weights. These parameters are associated with the data patterns, the input parameters, the training and testing datasets, and the training algorithms.

During our inspection of the literature, we observed that various studies determined these parameters experimentally, using trial and error to select the values that enhance the accuracy of estimation and forecasting models. For instance, Citakoglu (2015) and Quej et al. (2017) tested the ANN model under several architectures to choose the best ANN structure. However, this method is computationally expensive.

On the other hand, a few studies use automatic methods to optimize these parameters. In this context, optimization algorithms have been adopted to identify and select the appropriate model parameters. Various optimization techniques have been used in the literature, such as Genetic Algorithms (GA), Simulated Annealing (SAN), and Particle Swarm Optimization (PSO). For instance, in (Xue 2017), GA and PSO were used to improve the efficiency of a backpropagation neural network model for estimating daily diffuse solar radiation. Results show that the model optimized by PSO performs better than the one optimized by GA. Marzouq et al. proposed evolutionary artificial neural networks for estimating and forecasting solar irradiation, where the authors automatically selected the input parameters in (Marzouq et al. 2019) and the ANN architecture (size of input matrix, number of hidden layers, and hidden neurons in each layer) in (Marzouq et al. 2020).
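To make the idea of automatic parameter selection more concrete, the following is a minimal, generic PSO sketch in Python. It is not the implementation used in (Xue 2017) or in the works of Marzouq et al. The function pso_minimize, its default coefficients, and the toy objective toy_validation_error are assumptions introduced for illustration. In a real study, the objective would train an ANN with the candidate parameters and return its validation error.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=200, seed=0,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5):
    """Minimize objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_error(params):
    """Hypothetical stand-in for 'train an ANN and return validation error'.

    params = (number of hidden neurons, log10 of the learning rate).
    The minimum is placed arbitrarily at 12 neurons and a rate of 1e-3.
    """
    n_hidden, log_lr = params
    return (n_hidden - 12.0) ** 2 / 100.0 + (log_lr + 3.0) ** 2


best_params, best_error = pso_minimize(toy_validation_error,
                                       bounds=[(1.0, 50.0), (-5.0, -1.0)])
print(best_params, best_error)
```

In practice, integer parameters such as the number of neurons would be rounded before training, and each objective evaluation would involve a full training run, which is the main cost of this kind of search.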

Impact of forecasting horizon

According to published literature, we noticed that forecasting horizon could be divided into five time intervals: intra-hour, intra-day, day ahead, two days ahead, and three days ahead. Every time scale is relevant according to its application in solar systems. Figure 11 shows the number of papers devoted to forecasting in each time interval from the studied literature.

Fig. 11 Number of papers in each time interval for solar irradiation forecasting

As Tables 5 and 6 show, forecasting performance depends on the prediction horizon. In fact, forecasting errors rise as the time horizon increases, as shown in the work of Marzouq et al. (2020), where the values of nRMSE significantly increased from one to six hours ahead. This can also be noticed in the study of Ghofrani et al. in [132], where the forecast was performed from one hour to two days ahead.

Climate conditions

The availability of solar irradiation is heavily influenced by cloud motion and weather conditions. In fact, cloudiness affects the amount of solar irradiation received at the earth's surface (Besharat et al. 2013). As demonstrated in the literature, the forecasting accuracy can change drastically depending on the site climate (Liebermann et al. 2021), (Hong et al. 2020). Most proposed models perform well on sunny days (Mohammadi et al. 2016a), while on cloudy days the forecasting accuracy decreases significantly. In addition, ANN models are limited when forecasting in extremely bad weather conditions (Benmouiza and Cheknane 2016). As we can see in the study of (Pedro and Coimbra 2015), the forecast error doubles between Merced (a location with low GHI variability) and Ewa Beach (a location with high GHI variability). These results indicate that low forecast errors are more difficult to obtain for locations where clouds are formed locally. Another study (Kumari and Toshniwal 2021c) proposed a hybrid LSTM–CNN model and showed that it achieves high prediction accuracy in different sky conditions (overcast, cloudy, mixed, and sunny) for different climatic locations.

Accordingly, some researchers suggest classifying the weather before prediction. This procedure helps to increase the forecasting performance and the robustness of the models used. As shown by McCandless et al. (2016), identifying the different cloud regimes by applying K-means clustering to surface weather and irradiance data can significantly improve forecasting accuracy. Also, Wang et al. (2018) forecast solar irradiation in four different weather types (sunny days, cloudy days, rainy days, and heavy days). For sunny days, the hybrid CNN-LSTM model performs well, while for cloudy and rainy days, they used the Discrete Wavelet Transform as a pre-processing step to decompose the data given to the CNN-LSTM model.
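The regime-classification step can be sketched with a small K-means routine in plain Python. This is an illustrative sketch, not the procedure of McCandless et al. (2016). The two daily features (mean clearness index and its standard deviation), the sample values, and the explicit initial centroids are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, init_centroids, n_iter=50):
    """Basic Lloyd's K-means; returns (labels, centroids)."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical daily features: (mean clearness index, std of clearness index).
days = [
    (0.75, 0.03), (0.72, 0.05), (0.70, 0.04),   # steady, clear-looking days
    (0.30, 0.20), (0.25, 0.25), (0.35, 0.22),   # variable, cloudy-looking days
]
# Assumption: one clear-looking and one cloudy-looking day seed the two regimes.
labels, centroids = kmeans(days, init_centroids=[days[0], days[3]])

# Group day indices by regime so that a separate forecaster can be fitted per regime.
regimes = {}
for index, label in enumerate(labels):
    regimes.setdefault(label, []).append(index)
print(labels, regimes)
```

With real data, the number of regimes and the features would need to be chosen and validated for the site in question, and the features would normally be scaled before clustering.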

On the other hand, the season has a major impact on forecasting performances. In the literature, several authors tested ANN models under each season. In (Yu et al. 2019), the authors evaluated their models under four seasons: spring, summer, fall, and winter. They found that the forecast error decreases in winter and summer and increases in spring and fall. Another study in (Kumari and Toshniwal 2021c) proved that the hybrid LSTM–CNN model outperforms other models such as SP, SVM, MLP, CNN, and LSTM in all seasons for three different locations. Furthermore, this study revealed that the seasonal errors of the proposed model are highly influenced by the site climate.

Impact of feature selection, data nature and size

ANN models for estimation and forecasting require accurate datasets from the studied sites. From our investigation, we noticed that some studies do not specify properties of the dataset used, such as the total number of samples and the data recording interval. This information should be reported in future works so that prediction models can be evaluated properly. Furthermore, the combination of input parameters is important for enhancing the model's performance. Therefore, a pre-processing phase followed by a feature selection procedure is necessary to identify the best subset of data fed to the ANN model. In fact, research shows that the performance of ANN models increases when the input parameters are highly correlated with the solar irradiation.
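One simple way to screen candidate inputs by their correlation with solar irradiation is to rank them by the absolute Pearson coefficient. The sketch below is a hedged illustration of that idea only. The feature names, the sample values, and the 0.5 threshold are assumptions, and Pearson correlation captures linear dependence only, so a low score does not prove that an input is useless to a nonlinear model.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (sd_x * sd_y)


def rank_features(features, target, threshold=0.5):
    """Return (name, r) pairs with |r| >= threshold, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, r) for name, r in ranked if abs(r) >= threshold]


# Made-up daily values, for illustration only.
irradiation = [2.0, 4.0, 6.0, 8.0, 10.0]
candidates = {
    "sunshine_duration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "relative_humidity": [90.0, 70.0, 60.0, 40.0, 20.0],
    "wind_speed": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = rank_features(candidates, irradiation)
print(selected)
```

In this toy example, the weakly correlated wind speed series falls below the threshold and is dropped, while the two strongly correlated inputs are kept.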

Regarding input nature, the estimation approaches proposed in the literature are mostly based on exogenous inputs. However, in the forecasting process, we found that authors adopt endogenous, exogenous and hybrid inputs for their models.

The choice of input parameters in estimation models plays an important role in increasing the prediction performance. Thus, many works tested different combinations of parameters to choose the input variables that give the best prediction accuracy. Based on the findings of (Jahani and Mohammadi 2019), (Kaba et al. 2018) and (Marzo et al. 2017), sunshine duration was the most relevant and most significant parameter in the estimation process compared to the other parameters. However, in (Citakoglu 2015), the author showed that the ANN model with four input parameters, namely month number (M), extraterrestrial radiation, average air temperature (Tm), and average relative humidity (RHm), gives the best result. In addition, this study showed that the month number is the most significant of these variables. On the other hand, (Guijo-Rubio et al. 2020) proposed an approach to estimate solar irradiation based only on satellite data in order to avoid the use of ground data obtained with expensive measuring instruments. This approach can be used to predict solar irradiation in locations with similar data.

Regarding the forecasting procedure, Husein and Chung (2019a, b) forecast solar irradiation using only exogenous inputs. They showed that the prediction accuracy of the proposed algorithm depends on the combination of weather predictor variables and training algorithms. On the other hand, the studies in (Notton et al. 2019), (Huynh et al. 2020), (Bou-Rabee et al. 2017), (Sharma et al. 2016), (Benmouiza and Cheknane 2016), (Azimi et al. 2016), (Pedro and Coimbra 2015) and (Gairaa et al. 2016b) used only endogenous parameters to forecast solar irradiation. Additionally, various authors used mixed parameters (endogenous and exogenous). Indeed, Castangia et al. (2021a) and Jadidi et al. (2018) showed that adding exogenous inputs to solar irradiation parameters can significantly improve the forecasting performance for prediction horizons greater than 15 min.

Furthermore, in the study of Ahmad et al. in (2015), several combinations of endogenous and exogenous inputs to forecast solar irradiation for 24 h in advance have been used. In addition, Wojtkiewicz et al. in (2019) proved that adding cloud cover as input improves the prediction performance. However, the main challenge of using both endogenous and exogenous inputs is to develop a model with the simplest architecture and the smallest possible number of inputs that can achieve a better accuracy of forecasting (Diez et al. 2020).

The above studies also show that the number of historical input values can have a significant impact on forecasting accuracy. Therefore, authors need to choose the optimal input matrix dimension. Wojtkiewicz et al. (2019) tested from one to three historical values as the input matrix and found that two previous days of input data give the best performance (using LSTM and GRU) when forecasting solar irradiation one hour ahead. However, only a few studies carry out an automatic selection of the previous values used as inputs to their models. For instance, Marzouq et al. (2020) used an evolutionary approach to determine the optimal amount of historical data needed. In (Notton et al. 2019), (Fouilloy et al. 2018), (Benali et al. 2019), the authors used the auto mutual information method for this purpose.
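The role of the input matrix dimension can be illustrated with a short Python sketch that turns an irradiation series into input rows of past values and the corresponding future targets. The function name make_lagged_dataset and the sample series are hypothetical; the number of lags (n_lags) is the quantity that the studies above tune by hand or select automatically.

```python
def make_lagged_dataset(series, n_lags, horizon=1):
    """Build (inputs, targets) for forecasting `horizon` steps ahead.

    Each input row holds the n_lags most recent values up to and including
    time t; the matching target is the value at time t + horizon.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be at least 1")
    inputs, targets = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        inputs.append(series[t - n_lags + 1:t + 1])
        targets.append(series[t + horizon])
    return inputs, targets


# Made-up hourly irradiation values, for illustration only.
hourly = [10, 20, 30, 40, 50, 60]

x_one, y_one = make_lagged_dataset(hourly, n_lags=3, horizon=1)
x_two, y_two = make_lagged_dataset(hourly, n_lags=3, horizon=2)
print(x_one, y_one)
print(x_two, y_two)
```

Increasing n_lags widens each input row but leaves fewer training samples for a series of fixed length, which is one reason the choice is usually validated rather than fixed in advance.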

Estimation and forecasting of solar irradiation using ANN versus other Models

To show the ability of their proposed approaches to forecast and estimate solar irradiation, authors have compared their results with other machine learning, statistical, and empirical models. For example, in the studies of Martinez-Castillo et al. (2021) and Gürel et al. (2020), ANNs achieve better accuracy than RF, SVM, MLR, Holt-Winters, RSM, and empirical models. Most studies show that the performance of ANN models exceeds that of the other methods (see Tables 3, 4, 5, and 6).

On the other hand, deep learning models such as LSTM, RNN, and GRU outperform other types of ANN models such as MLP and BPNN. As shown in (Castangia et al. 2021a), (Aslam et al. 2019), (Qing and Niu 2018), the LSTM model outperforms BPNN and FFNN. In addition, deep learning models are found to accurately forecast solar irradiation in different climatic conditions (Jallal et al. 2020). Yu et al. (2019) compared LSTM with ARIMA, SVR, and ANN models and found that LSTM has a strong competitive advantage when forecasting on cloudy days and mixed days in different seasons.

Recommendations, trends and outlooks for future research

From our review, we noticed that the performances of estimation and forecasting models depend on several factors (see Fig. 8). In this subsection, we will provide concluding remarks, recommendations, and trends deduced from our investigation of the studied papers.

Figure 12 shows the cumulative number of publications per year using single or hybrid ANN models from 2015 to 2022.

  • As we can see from Fig. 12, the use of hybrid models increased in comparison with single ANNs, which suggests that the predictions generated by single ANN models tend to be limited in terms of performance. The key findings of the studied works highlight the capacity of hybrid ANN models to significantly improve solar irradiation prediction for different climates and seasons.

  • To demonstrate the generalization ability of proposed models, researchers should test their models under different weather conditions, seasons, and locations. Nevertheless, only a few works in the literature consider data collected from different climates, and few compare model performance under different weather conditions and seasons. These gaps should be taken into consideration in future studies.

Fig. 12 Cumulative number of publications that used ANN models for solar irradiation prediction with single and hybrid ANN models

Figure 13 shows the cumulative number of works using deep learning models in the studied literature for both estimation and forecasting models.

Fig. 13 Number of studies using Deep Learning models for solar irradiation prediction from the studied literature

A rising trend in the use of deep learning models in the solar irradiation forecasting field can be clearly noticed in Fig. 13, where the number of publications using deep learning models increased significantly from 2015 to 2022. This growth is consistent with the reported ability of deep learning models to perform well under bad weather conditions. Thus, the research community is encouraged to direct more attention toward deep learning models for forecasting and estimating solar irradiation and its components.

Another important finding of the literature study is that recent works use cloud cover, temperature, and humidity as inputs to their neural network predictor (Wojtkiewicz et al. 2019), which may indicate their effectiveness and pertinence as input features. These variables should be taken into account in the forecasting process, especially at sites with high solar irradiation variability. In fact, many authors found that including these exogenous inputs improves the forecasting performance (Castangia et al. 2021a), (Jadidi et al. 2018), which explains the trend toward using endogenous and exogenous inputs simultaneously. This trend is clearly shown in Fig. 14, which details the number of publications versus the types of inputs used. However, the correlation between such parameters and the forecasting output changes from one site to another. Thus, more comparative studies should be carried out on the effect of including meteorological parameters on model performance before a conclusion can be drawn on this research topic.

  • An effective approach that we have identified from our literature study includes investigations on the clouds effect during model development, starting by classifying the sky conditions before following up by the forecasting task (McCandless et al. 2016). This decomposition can be performed using data-preprocessing techniques such as K-means clustering and Discrete Wavelet transform.

  • From the literature review, we noticed that all proposed works focused only on a specific issue, which is either estimation or forecasting. In (Amiri et al. 2021), Amiri et al. presented for the first time a new approach that could both estimate and forecast solar irradiation. This study is based on a hybrid model with two outputs: one estimates the irradiation at the current instant and another predicts it for the next hour. Despite the complexity of the addressed problem, obtained results are good compared to the literature as claimed by the authors. This approach can be useful for monitoring a solar system in real-time and forecasting future energy production. Further studies simultaneously estimating and forecasting solar irradiation should be addressed to evaluate this proposed approach. This can be a challenging direction in future research.

  • In solar irradiation prediction problems, ANN models need to be trained with an accurate and preferably large database. Moreover, how to set the model architecture, the number of input features, and the input matrix size to enhance the prediction accuracy remains an open problem, especially with different types of data and different time horizons. In summary, using proper data pre-processing methods, optimization algorithms, and feature selection techniques has a remarkable impact on model performance and should be considered a key success factor for future models.

Fig. 14 Used data types in studied literature for solar irradiation forecasting

Conclusion

The understanding of solar irradiation predictive methods is of great interest to control and operate solar power generation. In this paper, we provided a comprehensive and in-depth review of the recent studies on estimation and forecasting solar irradiation using ANN models in order to reveal the existing gaps and future suggestions in this field.

Findings show that the performance of predicting models depends on several aspects:

  • pre-processing phase and feature selection procedures used in the first steps of the prediction process,

  • selection of the ANN architecture and type of model employed,

  • desired forecasting horizon, with very short horizons generally yielding higher accuracy,

  • climate type of the location of interest.

In most cases, the model prediction accuracy is influenced by the forecast horizon and the climatic conditions. In fact, ANN models performed well under clear sky, while the accuracy decreases significantly under rapidly varying weather conditions. Moreover, it is worth noting that the solar irradiation phenomenon is extremely complex and is not necessarily well modeled with simple approaches, especially when high performance is required. Thus, combining several approaches might be necessary to achieve greater accuracy. Further, in the last few years, several studies moved toward the use of deep neural networks such as LSTM and RNN to overcome some limitations of conventional ANNs.

The state of the art of solar irradiation prediction presented in this paper can help planners and forecasting professionals identify promising and suitable directions for developing an appropriate ANN predictor. Additionally, the directions offered here may inspire future research that improves the accuracy and efficiency of prediction models.