
1 Introduction

Solar power is one of the most promising renewable energy sources, as its generation does not emit pollutants or greenhouse gases (Kim et al. 2017). Global warming and the energy crisis of the past few decades have motivated the development and use of alternative, sustainable, and clean energy sources (Sabzehgar et al. 2020). Solar power is a nonconventional, renewable source of energy whose intensity depends on the weather and on the sun's position relative to the panels. Electricity demand needs to be predicted so that its utilization can be planned better. Solar energy generation, in turn, varies frequently with the weather and the relative solar position (Verma et al. 2016).

Solar power forecasting is complex because generation depends mainly on weather conditions, which fluctuate over time. Machine learning (ML)–based methods have therefore been used for effective solar power generation forecasting, and newer, more intelligent methods are being developed to improve accuracy (AlKandari and Ahmad 2020).

Renewable, nonconventional generation resources such as solar and wind behave stochastically because weather parameters change frequently. Transmission and distribution losses also affect the performance of these resources. The scheduling, optimization, and management of smart grids and microgrids with a high share of renewable energy resources are therefore the main challenges of such grids. One of the most promising practices for scheduling smart grids is to forecast the energy production of these resources, which makes grid operation more energy- and cost-efficient than the current process (Sabzehgar et al. 2020). Models based on artificial intelligence (AI), such as support vector machines (SVM), regression, and neural networks, are used for optimization and control because of their learning capabilities.

AI techniques play an important role in predictive modeling, performance analysis, and control of renewable energy generation processes. More broadly, AI is used to solve complex problems across engineering and technology, with applications in pattern recognition, manufacturing, optimization, control, robotics, forecasting, power systems, signal processing, medicine, and the social sciences (Kumar and Kalavathi 2018). Its use in renewable energy generation forecasting, however, is a newer application that is becoming increasingly popular.

2 Use of AI in Solar Generation Prediction

One of the simplest AI-based techniques for generation prediction is regression analysis, which is used for solar generation prediction in the present work. Regression determines a functional relationship between predictor (independent) parameters and a response (dependent) parameter. Regression analysis is an iterative process: its outputs are used to verify, critique, analyze, and modify the inputs (Nalina et al. 2014). Linear regression identifies the relationship between parameters by fitting a linear equation to the data. Figure 1 plots solar generation against ambient temperature; data distributed in this way can be analyzed conveniently with regression.
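As a minimal sketch of the linear case, the following Python function fits y = b0 + b1·x by ordinary least squares for a single predictor such as ambient temperature. The function name `fit_linear` and the sample values are hypothetical and for illustration only; they are not NERLDC data, and the authors' own code is not given in the text.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x for one predictor.

    Returns (b0, b1): intercept and slope.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/(n-1) factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance; slope is undefined")
    b1 = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical illustration values (not measured data):
temps_c = [29.5, 29.75, 30.0, 30.25, 30.5]  # ambient temperature, deg C
gen_mu = [0.16, 0.18, 0.20, 0.22, 0.24]     # solar generation, net MU
b0, b1 = fit_linear(temps_c, gen_mu)
print(f"generation = {b0:.3f} + {b1:.3f} * temperature")
```

These made-up values lie exactly on a line, so the fit recovers it exactly; real generation data leave residuals, and a curved pattern in those residuals is what motivates a polynomial model.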

Fig. 1 The plot of solar generation versus ambient temperature (scatterplot; horizontal axis: ambient temperature in °C; vertical axis: solar generation in net MU)

The methodology for modeling the solar generation prediction system is shown in Fig. 2, which outlines the stepwise procedure of the AI model developed to obtain forecast data. One input is the weather condition, such as sunny, scattered cloud, partly sunny, drizzle over, overcast, and thunderstorm. The other input is the daily solar generation in net million units (MU) for the state of Assam in 2021, obtained from the North Eastern Regional Load Dispatch Centre (NERLDC).
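Weather conditions such as these are categorical labels, so they must be converted to numbers before they can enter a regression model. The text does not state how this was done. One common option is one-hot encoding, sketched below under that assumption. The category list and the names `CONDITIONS`, `one_hot`, and `build_features` are hypothetical.

```python
# Hypothetical category list based on conditions named in the text;
# the full set of labels used in the study is not given.
CONDITIONS = ("sunny", "scattered cloud", "partly sunny", "overcast", "thunderstorm")


def one_hot(condition, categories=CONDITIONS):
    """Return a 0/1 indicator list with a single 1 at the matching category."""
    label = condition.strip().lower()
    if label not in categories:
        raise ValueError(f"unknown weather condition: {condition!r}")
    return [1 if label == c else 0 for c in categories]


def build_features(ambient_temp_c, condition):
    """Hypothetical feature row: ambient temperature followed by the one-hot condition."""
    return [ambient_temp_c] + one_hot(condition)


print(build_features(30.1, "Overcast"))
```

In a least-squares model with an intercept, one category is usually dropped to avoid perfectly collinear columns. scikit-learn's `OneHotEncoder(drop="first")` does this automatically.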

Fig. 2 Block diagram of the system model (input training data set, processing techniques, machine learning algorithms, and a unified forecasting model)

Figure 3 shows the step-by-step methodology for training, testing, and validating the regression-based AI model designed to predict solar generation. The weather parameters are gathered from several reliable weather forecast websites. First, solar generation is estimated from these weather parameters using two approaches: linear regression and polynomial regression. These estimates are then used to forecast the power generated in the grid. The estimation models are obtained from averaged data gathered in 2021. Obtaining a forecast model requires two data sets: a training data set, used to derive the prediction model, and a test data set, used to evaluate it.
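The split between training and test data can be sketched as follows. Because the data form a daily time series, this sketch keeps the records in date order. That way, the model is always tested on days after those it was trained on. The 80/20 proportion and the function name `chronological_split` are assumptions, not values from the study.

```python
def chronological_split(records, train_fraction=0.8):
    """Split time-ordered records into (train, test) without shuffling.

    `records` must already be sorted by date, oldest first.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(records) * train_fraction)
    if cut == 0 or cut == len(records):
        raise ValueError("too few records for a non-empty train and test set")
    return records[:cut], records[cut:]


# Hypothetical daily records: (day index, ambient temperature deg C, generation net MU)
days = [(d, 29.5 + 0.1 * d, 0.15 + 0.01 * d) for d in range(10)]
train, test = chronological_split(days)
print(len(train), len(test))
```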

Fig. 3 Process flowchart of the AI model (solar generation and weather condition data, training, testing with weather forecast data, and comparison with the actual solar generation)

Additional weather and generation data were therefore gathered to test the accuracy of the model. All input data are sorted before use. Entries with null values are eliminated, and the data are normalized to improve the accuracy of the forecast models. After these models are used to forecast solar power generation, the error of each model is measured by comparing the original target data with the forecast data. The analysis and predictions are performed in Python, which is used to program and train the regression algorithms.
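The two preprocessing steps named here, removing null entries and normalizing, might look like the sketch below. The text does not specify the normalization method. This sketch assumes min-max scaling to [0, 1], and the helper names are hypothetical. The scaling bounds are taken from the training data only and then reused for the test data, which keeps information from the test period out of the model.

```python
def drop_null_rows(rows):
    """Remove every record that contains a missing (None) value."""
    return [row for row in rows if all(value is not None for value in row)]


def min_max_scale(values, lo, hi):
    """Map values to [0, 1] using bounds (lo, hi) taken from the training data.

    Test values outside the training range fall outside [0, 1]; that is expected.
    """
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical rows: (ambient temperature deg C, generation net MU); None marks a null entry.
rows = [(29.5, 0.16), (None, 0.18), (30.5, None), (30.0, 0.20), (29.75, 0.17)]
clean = drop_null_rows(rows)
train_temps = [t for t, _ in clean]
lo, hi = min(train_temps), max(train_temps)
print(min_max_scale(train_temps, lo, hi))
```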

As mentioned earlier, the solar energy generation forecast in this work is carried out in phases. Phase one is data preprocessing for the forecast models, which in the current work includes eliminating all null entries. In phase two, a model for estimating and forecasting solar generation from the weather parameters is derived. Plots of solar generation against the weather parameters, such as Fig. 1, show that some weather parameters are linearly related to solar generation while others are not. Hence, polynomial regression is found to estimate solar generation more accurately than linear regression.
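The comparison between linear and polynomial regression can be illustrated with a minimal least-squares polynomial fit written in plain Python. The text does not state which polynomial degree the authors used, so this sketch assumes degree 2. The names `fit_polynomial`, `predict`, and `sse` are hypothetical. The sketch also assumes the input is normalized (for example to [0, 1]), which keeps the normal equations well conditioned. The sample data are synthetic.

```python
def fit_polynomial(x, y, degree):
    """Least-squares fit of y ~ c0 + c1*x + ... + cd*x**d.

    Solves the normal equations (X^T X) c = X^T y by Gaussian elimination
    with partial pivoting and returns [c0, c1, ..., cd].
    """
    n_coef = degree + 1
    if len(x) != len(y) or len(x) < n_coef:
        raise ValueError("need at least degree + 1 paired observations")
    # X^T X holds sums of x**(i + j); X^T y holds sums of y * x**i.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n_coef)] for i in range(n_coef)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n_coef)]
    # Forward elimination with partial pivoting.
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; use more distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n_coef):
            factor = a[row][col] / a[col][col]
            for k in range(col, n_coef):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * n_coef
    for i in reversed(range(n_coef)):
        residual = b[i] - sum(a[i][k] * coef[k] for k in range(i + 1, n_coef))
        coef[i] = residual / a[i][i]
    return coef


def predict(coef, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** p for p, c in enumerate(coef))


def sse(coef, x, y):
    """Sum of squared errors of the fit on (x, y)."""
    return sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(x, y))


# Synthetic, normalized temperatures and a curved (nonlinear) generation response.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [0.1 + 0.5 * xi - 0.4 * xi ** 2 for xi in x]
linear = fit_polynomial(x, y, degree=1)
quadratic = fit_polynomial(x, y, degree=2)
print("linear SSE:", sse(linear, x, y))
print("quadratic SSE:", sse(quadratic, x, y))
```

A higher degree always fits the training data at least as well, so on real data the degree is better chosen by comparing errors on the test set. NumPy's `numpy.polyfit(x, y, deg)` performs the same kind of least-squares polynomial fit. Note that it returns the coefficients highest power first.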

3 Results and Discussion

Figure 4 shows the best-fit polynomial curve obtained by applying the regression technique to estimate solar generation in August 2021. The horizontal axis shows the ambient temperature, and the vertical axis shows the solar generation data obtained from NERLDC for the state of Assam.

Fig. 4 Polynomial fit of solar generation vs. ambient temperature (daily solar generation in Assam, India, in net MU against ambient temperature in °C, with the fitted polynomial curve)

Table 1 shows the estimated solar generation when 1 month of data is used to train the regression model and predict solar generation on the following days, for the different seasons of 2021.

Table 1 Estimated solar generation using polynomial regression with month-long training data for different seasons in 2021

Table 1 also gives the accuracy of the estimated solar generation for different days of each season in 2021. For training and validation with polynomial regression, the actual solar generation (net MU) from the NERLDC website is compared with the solar generation (net MU) predicted by the developed technique, and the relative error is computed. Figure 5 shows the error percentage between the actual values and those predicted by the forecasting model using polynomial regression. The error percentage lies between −5% and +20% for the period considered. Table 2 shows the estimated solar generation when data for 15 days of 2021 are used for training. The methodology is the same as for training with 1 month of data, and the accuracy is again calculated for the seasonal estimations. For different months of 2021, the actual solar generation (net MU) from the NERLDC website is compared with the solar generation (net MU) predicted by the trained regression model, and the error is obtained.
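The relative error reported for each training and testing period can be computed as shown below. The text does not print the exact formula. This sketch assumes error % = (actual − predicted) / actual × 100, so positive values mean the model under-predicted. The helper names and sample values are hypothetical.

```python
def error_percentage(actual, predicted):
    """Relative error in percent: (actual - predicted) / actual * 100.

    Positive values mean the model under-predicted the generation.
    """
    if actual == 0:
        raise ValueError("actual generation must be non-zero")
    return (actual - predicted) / actual * 100.0


def error_range(actuals, predictions):
    """Return the (minimum, maximum) error percentage over all periods."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need equally long, non-empty sequences")
    errors = [error_percentage(a, p) for a, p in zip(actuals, predictions)]
    return min(errors), max(errors)


# Hypothetical actual and predicted daily generation values (net MU):
actual_mu = [0.24, 0.21, 0.18]
predicted_mu = [0.22, 0.22, 0.17]
low, high = error_range(actual_mu, predicted_mu)
print(f"error range: {low:.1f}% to {high:.1f}%")
```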

Fig. 5 Error percentage curve between the actual and the predicted solar generation in net MU (bars: actual and predicted generation; line: error percentage; horizontal axis: training and testing period)

Table 2 Estimated solar generation using polynomial regression with 15 days of training data for different seasons in 2021

Figure 6 shows the error curve between the actual values and those predicted by the forecasting model using polynomial regression for the period intervals listed in Table 2. The error percentage lies between −23% and +17%.

Fig. 6 Error percentage curve between the actual and the predicted solar generation in net MU using a 15-day data period interval (bars: actual and predicted generation; line: error percentage; horizontal axis: training and testing period)

Table 3 shows the estimated solar generation when weekly data (7 days each) are used to train the prediction system for different seasons of 2021. The table gives the accuracy of the estimation technique when weekly data (weather conditions, temperature, solar generation, etc.) are used as training input for each season. A comparison of the actual and predicted solar generation shows that training the system with weekly data gives better accuracy. With month-long training data, the error is slightly higher than with week-long data. With 15 days of training data, the error in estimating solar generation is the highest. Training the regression model with fewer than 7 days of data also leads to greater error. Hence, the optimum training period of the regression model for accurate prediction of solar generation is observed to be 7 days (1 week). Figure 7 shows the error curve between the actual values and those predicted by the forecasting model using polynomial regression for the period intervals listed in Table 3. The error percentage lies between −25.5% and +15%.
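The comparison of training periods can be reproduced in outline with a walk-forward evaluation. For each day, a model is fitted on the preceding N days only, that day is predicted, and the error is recorded. This is then repeated for each candidate N (7, 15, or about 30 days). For brevity, this sketch swaps in a single-predictor linear fit on ambient temperature in place of the authors' polynomial model. The function names, the mean-absolute-error summary, and the synthetic series are assumptions.

```python
def fit_linear(x, y):
    """Least-squares line y = b0 + b1 * x; falls back to the mean if x is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return mean_y, 0.0
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b1 * mean_x, b1


def walk_forward_errors(temps, gen, window):
    """Train on the previous `window` days, predict the next day, and repeat.

    Returns one error percentage, (actual - predicted) / actual * 100,
    for every day that has a full training window before it.
    """
    errors = []
    for t in range(window, len(gen)):
        b0, b1 = fit_linear(temps[t - window:t], gen[t - window:t])
        predicted = b0 + b1 * temps[t]
        errors.append((gen[t] - predicted) / gen[t] * 100.0)
    return errors


def mean_abs_error(errors):
    """Average magnitude of the error percentages."""
    if not errors:
        raise ValueError("no test days: the series is shorter than the window")
    return sum(abs(e) for e in errors) / len(errors)


def compare_windows(temps, gen, windows=(7, 15, 30)):
    """Map each training-window length (days) to its mean absolute error percentage."""
    return {w: mean_abs_error(walk_forward_errors(temps, gen, w)) for w in windows}


# Synthetic 60-day series for illustration only (not NERLDC data).
temps = [29.0 + 0.05 * (d % 20) for d in range(60)]
gen = [0.30 - 0.005 * t + 0.01 * ((d % 7) - 3) / 3 for d, t in enumerate(temps)]
print(compare_windows(temps, gen))
```

With a year of daily ambient temperature and net MU values in place of the synthetic series, the window with the smallest summary error would be the preferred training period for that data.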

Table 3 Estimated solar generation using polynomial regression with 7 days of training data for different seasons in 2021

Fig. 7 Error percentage curve between the actual and the predicted solar generation in net MU using a 7-day data period interval (bars: actual and predicted generation; line: error percentage; horizontal axis: training and testing period)

4 Conclusion

In this work, an AI-based solar generation prediction model is developed from the actual values of different weather parameters for the city of Guwahati, Assam, in 2021. The regression-based AI model achieves errors in the range of 0.75–25.5%. Its accuracy can be improved further by incorporating more inputs, such as near-real-time remotely sensed data. Moreover, because the relationships between the inputs and solar generation are highly nonlinear, using artificial neural networks for training is expected to estimate solar generation in the region more accurately. The technique also needs to be tested and validated in other regions of the world to establish the general validity of the model.