Introduction

Agriculture plays a strategic role in Brazilian economic development. Beyond its economic weight, it has also contributed to the reduction of poverty and inequality in Brazil (Berchin et al. 2019). The Brazilian agricultural sector is characterized by modernity and dynamism (Garcia and Vieira Filho 2014; Meyer and Silva 2019; Meyer and Braga 2019).

The state of Minas Gerais has a wide climatic diversity (Antunes 1986; Dubreuil et al. 2019), with four main climate types according to the Köppen climate classification: humid tropical savanna (Aw), dry climate with summer rains (BSw), rainy temperate (Cwa) and subtropical highland (Cwb) (Souza et al. 2006).

Minas Gerais is of significant importance to Brazil’s agricultural economy. In 2017, agribusiness accounted for 33.54% of the state’s GDP and had a 13.59% share of Brazil’s GDP. The predominant crop in Minas Gerais is coffee: the state is the largest coffee producer in Brazil, responsible for 54.27% of the coffee produced in the country (CEPEA 2019; FAEMG 2019).

Several factors, such as climate, relief and hydrographic basins, shape the varied biodiversity of the state of Minas Gerais (Oliveira et al. 2017). The state’s vegetation can be grouped into three main biomes: Atlantic Forest, Cerrado and Caatinga (IEF 2018).

The predominant biome is the Cerrado, which covers about 50% of the state, mainly in the basins of the São Francisco and Jequitinhonha rivers (Callisto et al. 2016). In the Cerrado, the dry and rainy seasons are well defined (Scherrer et al. 2016). The vegetation consists of grasses, shrubs and trees. The second largest biome in Minas Gerais is the Atlantic Forest, with dense vegetation and permanently green forest, owing to long periods of rainfall (Szabó et al. 2018). The trees have large, smooth leaves. The Campo de Altitude, or rock biome, is characterized by a lower proportion of vegetation cover with a wide variety of species, with herbaceous vegetation predominating, where shrubs are scarce and trees are rare and isolated (Silveira et al. 2016). It is found at the highest points of the Mantiqueira, Espinhaço and Canastra mountains (Silva et al. 2018). Mata Seca (Dry Forest) is present in the north of the state, in the São Francisco river valley (Rodriguez et al. 2017). The plant formations of this biome are characterized by spiny plants, dry branches and few leaves in the dry season. In the rainy season, the forest flourishes intensely, producing abundant foliage (IEF 2018).

Solar radiation is all electromagnetic radiation emitted by the Sun that reaches the planet (Querino et al. 2011). It is the driving force for many physical, chemical and biological processes that take place in the Earth-Atmosphere system (Brusseau et al. 2019). It is considered an important meteorological variable, used in the analysis of the water requirements of irrigated crops, the modeling of plant growth and production, and climate change studies, among others (Borges et al. 2010; Jahani et al. 2017).

The difficulty of measuring solar radiation, mainly due to the cost of sensors, their maintenance and the technical difficulty of installing them in remote locations (Das et al. 2015; Yang et al. 2006), creates the need for models to estimate it. Several researchers have therefore developed models to determine radiation, some of them based on artificial intelligence (Mohammadi et al. 2015; Shamshirband et al. 2016). Determining solar radiation is extremely important, since it is the source of energy for plants.

However, there are locations where solar radiation data are not collected. In these cases, estimated values can be obtained by means of mathematical models, which differ from each other in their degree of complexity and in their input variables (Borges et al. 2010). The first published model to estimate solar radiation was proposed by Ångström (1924). This model uses insolation (hours of sunshine) to estimate the incident solar radiation (Borges et al. 2010; Buriol et al. 2012).
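As an illustration, sunshine-based models of this kind are commonly written in the Ångström-Prescott form shown below. The symbols are assumptions introduced here and do not come from the present text: R_s is the incident global solar radiation, R_a the extraterrestrial radiation, n the measured sunshine duration, N the maximum possible sunshine duration, and a and b are empirical coefficients that must be calibrated for each site.

$$ \frac{R_{s}}{R_{a}} = a + b\,\frac{n}{N} $$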

For Tanaka et al. (2016), the most popular and employed temperature-based models are the models by Hargreaves (1981) and Bristow and Campbell (1984), since they require few meteorological variables to estimate solar radiation, thus being characterized by simplicity.

An Automatic Meteorological Station (EMA) collects, every minute, meteorological data (temperature, humidity, atmospheric pressure, precipitation, wind direction and speed, and solar radiation) that represent the place where it is located. Every hour, these data are aggregated and transmitted, via satellite or cell phone, to the headquarters of the National Institute of Meteorology (INMET) in Brasília (capital of Brazil). All data received are validated through quality control and stored in a database (INMET 2018).

Given the relevance of agribusiness in the state of Minas Gerais, knowledge of the conditions that can influence agricultural production is extremely important. Because several regions of the state lack solar radiation data, and in order to identify the simplified model with the best performance, the present work aimed to calibrate fifteen simplified models for estimating solar radiation and, subsequently, to evaluate their statistical performance for 51 cities in the state of Minas Gerais.

Materials and methods

Study area

Figure 1 shows the location of the 51 cities in the state of Minas Gerais whose automatic stations were used in the study. All regions of the state are represented.

Fig. 1 Location of the cities studied in the state of Minas Gerais. Source: INPE (2019)

Data acquisition and information about EMAs

The data used in this work were obtained from the network of Automatic Meteorological Stations (EMAs) of the National Institute of Meteorology (INMET), located in 51 cities in the state of Minas Gerais, Brazil (Table 1). This network is composed of 68 EMAs across the state. However, some EMAs had gaps in their data, caused by equipment failures or maintenance periods, or had been installed recently and so had little data. Therefore, 17 EMAs were disregarded in the analyses.

Table 1 Automatic meteorological stations in the state of Minas Gerais

The data period differed from city to city, since the EMAs started operating at different times and, owing to technical problems, there were periods when no data were collected. As a result, the amount of data differs between the cities studied. Table 1 shows the amount of data used in the study, the period of data collection, the amount of data collected and the percentage of null data. The data actually used in the analyses are fewer than the totals collected, because of losses caused by collection system failures, instrument failures and data capture problems, among others.

The weather stations at which the data were collected are standardized: each is free from natural and built obstructions and occupies a fenced, grassed area of at least 14 × 18 m. The grass within this area is always kept at around 5 cm, and the fence prevents the entry of animals. Each EMA is composed of a data collection subsystem, with sensors that measure environmental variables; a control and local storage subsystem in a data logger; a power subsystem; a communications subsystem; a database subsystem; and a subsystem for disseminating data to users, openly and free of charge over the internet. The variables measured at minute-by-minute intervals, and aggregated over each hour for transmission, are: instantaneous, maximum and minimum air temperature; instantaneous, maximum and minimum relative air humidity; instantaneous, maximum and minimum dew point temperature; instantaneous, maximum and minimum atmospheric pressure; instantaneous wind speed; wind direction; wind gust intensity; solar radiation; and precipitation accumulated in the period.

Models

The equations addressed in the present research are based on air temperature and precipitation, since these data were collected at all stations used in this study (Table 1). In addition, such variables can be measured with low-cost equipment. The mathematical models studied for estimating global solar radiation, the coefficients that require calibration and their respective references are presented in Table 2.

Table 2 Equations for estimating solar radiation, parameters and references

The models under evaluation were ordered according to the name of the author(s). Most of them derive from the proposals of Hargreaves (1981) and Bristow and Campbell (1984), and differ in the parameterized coefficients that require calibration (Tanaka et al. 2016).
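For reference, the two parent formulations can be sketched in Python. The forms below are the ones commonly cited in the literature for Hargreaves-type and Bristow and Campbell-type models. They are assumptions here, since Table 2 is not reproduced in this text. The function names, the coefficient names a, b and c, and the FAO-56 expression used for the extraterrestrial radiation Ra are illustrative choices, not the authors' implementation.

```python
import math

GSC = 0.0820  # solar constant in MJ m^-2 min^-1 (FAO-56 value; an assumption here)


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra in MJ m^-2 day^-1 (FAO-56 form).

    Valid where the sun rises and sets every day, which holds for Minas Gerais.
    """
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)    # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def hargreaves(a, ra, tmax, tmin):
    """Hargreaves-type estimate: Rs = a * Ra * sqrt(Tmax - Tmin)."""
    return a * ra * math.sqrt(tmax - tmin)


def bristow_campbell(a, b, c, ra, tmax, tmin):
    """Bristow and Campbell-type estimate: Rs = a * Ra * (1 - exp(-b * dT**c))."""
    return a * ra * (1.0 - math.exp(-b * (tmax - tmin) ** c))
```

For latitude 20° S on day 246 of the year, `extraterrestrial_radiation(-20.0, 246)` gives about 32.2 MJ m−2 day−1, in line with the worked example in FAO-56. The coefficients a, b and c are the quantities that a calibration would adjust for each station.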

First, the coefficients of each model were calculated, to ascertain which model would have the least error for the cities studied.

The parameters of all models were obtained in Matlab with the lsqcurvefit function, which solves nonlinear curve-fitting (data-fitting) problems in the least-squares sense (Matlab 2019).

The function seeks the coefficients x that solve the following least-squares problem:

$$ \min_{x} \left\| F(x, \mathrm{xdata}) - \mathrm{ydata} \right\|_{2}^{2} = \min_{x} \sum_{i} \left( F(x, \mathrm{xdata}_{i}) - \mathrm{ydata}_{i} \right)^{2}, $$

where xdata is the input data provided and ydata is the observed output. Both xdata and ydata are matrices or vectors, and F(x, xdata) is a matrix-valued or vector-valued function of the same size as ydata.

The function lsqcurvefit requires the user-defined function to return the vector-valued function:

$$ F(x, \mathrm{xdata}) = \begin{bmatrix} F(x, \mathrm{xdata}(1)) \\ F(x, \mathrm{xdata}(2)) \\ \vdots \\ F(x, \mathrm{xdata}(k)) \end{bmatrix}. $$

The function syntax is:

$$ x = \mathrm{lsqcurvefit}(\mathrm{fun}, \mathrm{x0}, \mathrm{xdata}, \mathrm{ydata}) $$

As an example of applying the function to a simple exponential model, assume that the observation times are xdata and the observed responses are ydata. The objective is to find the parameters x(1) and x(2) that fit the model:

$$ \mathrm{ydata} = x(1) \exp(x(2)\, \mathrm{xdata}) $$

Thus, for the vectors:

$$ \mathrm{xdata} = [1\ 2\ 3\ 4\ 5\ 6\ 7\ 8\ 9\ 10]; $$
$$ \mathrm{ydata} = [455\ 428\ 124\ 67\ 43\ 28\ 13\ {-4}\ {-1}\ {-5}]; $$

The associated simple exponential decay model will be:

$$ \mathrm{fun} = @(x, \mathrm{xdata})\; x(1) * \exp(x(2) * \mathrm{xdata}) $$

Fitting the model from the starting point x0 = [100, −1], we have:

$$ \mathrm{x0} = [100, -1]; $$
$$ x = \mathrm{lsqcurvefit}(\mathrm{fun}, \mathrm{x0}, \mathrm{xdata}, \mathrm{ydata}) $$
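The same least-squares problem can be sketched in plain Python for readers without Matlab. This is only an illustration under stated assumptions. It does not reproduce the algorithm inside lsqcurvefit. Instead, it uses the fact that the amplitude x(1) has a closed-form least-squares solution for any fixed rate x(2), and then runs a golden-section search over the rate. The function names, the search interval and the tolerance are hypothetical choices, and the example vectors are entered as they can best be read from the text.

```python
import math


def sse(fun, x, xdata, ydata):
    """Sum of squared residuals, the objective being minimized."""
    return sum((fun(x, t) - y) ** 2 for t, y in zip(xdata, ydata))


def exp_model(x, t):
    """Exponential model x(1) * exp(x(2) * t), with zero-based indexing."""
    return x[0] * math.exp(x[1] * t)


def best_amplitude(rate, xdata, ydata):
    """Closed-form least-squares amplitude for a fixed rate."""
    basis = [math.exp(rate * t) for t in xdata]
    return sum(b * y for b, y in zip(basis, ydata)) / sum(b * b for b in basis)


def fit_exponential(xdata, ydata, lo=-5.0, hi=1.0, tol=1e-10):
    """Golden-section search over the rate; amplitude solved in closed form.

    The bracket [lo, hi] is an assumed search interval for the rate.
    """
    def profile(rate):
        amp = best_amplitude(rate, xdata, ydata)
        return sse(exp_model, (amp, rate), xdata, ydata)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = profile(c), profile(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = profile(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = profile(d)
    rate = 0.5 * (lo + hi)
    return best_amplitude(rate, xdata, ydata), rate


# Example vectors as they can best be read from the text.
xdata = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ydata = [455, 428, 124, 67, 43, 28, 13, -4, -1, -5]
amplitude, rate = fit_exponential(xdata, ydata)
print(amplitude, rate, sse(exp_model, (amplitude, rate), xdata, ydata))
```

Reducing the problem to a one-dimensional search works only because this model is linear in one of its two coefficients. Models with more nonlinear coefficients need a general solver such as lsqcurvefit.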

The statistical performance indexes used in this work to ascertain the accuracy of the models were the coefficient of determination of the linear regression (R²), the Root-Mean-Square Error (RMSE) and the Mean Relative Error (MRE). Whether a model performs well is assessed from the R² and RMSE values. The closer R² is to 1, the closer the estimated values are to the measured values. For RMSE, the lower the value, the better the performance of the model (Jacovides and Kontoyiannis 1995; Tanaka et al. 2016).
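These three indexes can be sketched as follows. The exact formulas are not given in this section, so the definitions below are assumptions based on common usage: R² is taken as the squared Pearson correlation between measured and estimated values, RMSE as the root of the mean squared difference, and MRE as the mean signed difference relative to the measured value. The function names are illustrative.

```python
import math


def rmse(observed, estimated):
    """Root-mean-square error, in the units of the data."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)


def mre(observed, estimated):
    """Mean relative error (signed); positive values indicate overestimation.

    Assumes every observed value is non-zero.
    """
    n = len(observed)
    return sum((e - o) / o for o, e in zip(observed, estimated)) / n


def r_squared(observed, estimated):
    """Coefficient of determination of the linear regression between the series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov * cov / (var_o * var_e)
```

Because the MRE is signed, an overestimate on one day can offset an underestimate on another, which is why it is read together with the RMSE.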

To identify the best model, the position values (Vp) of the statistical indicators were used to rank the methods for estimating global radiation. To obtain the Vp value, scores from 1 to n were assigned for each statistical indicator, where n is the number of models tested (n = 15): a score of 1 was assigned to the best model and a score of n to the worst. The scores were then summed, and the best model is the one with the lowest sum of assigned scores, that is, the lowest accumulated Vp value.
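The ranking procedure can be sketched as below. This is a minimal illustration with assumptions that the text does not state: R² is ranked from highest to lowest, RMSE from lowest to highest, and MRE by its absolute value. Ties are not handled. The model names and indicator values in the usage note are made up for illustration and are not results from the study.

```python
def position_values(scores):
    """Sum of per-indicator ranks (Vp) for each model.

    scores maps a model name to a dict with keys "r2", "rmse" and "mre".
    Rank 1 goes to the best model for each indicator.
    """
    # Sort keys chosen so that ascending order always means best first.
    sort_keys = {
        "r2": lambda value: -value,       # higher R2 is better (assumption)
        "rmse": lambda value: value,      # lower RMSE is better
        "mre": lambda value: abs(value),  # MRE closer to zero is better (assumption)
    }
    totals = {model: 0 for model in scores}
    for indicator, sort_key in sort_keys.items():
        ordered = sorted(scores, key=lambda m: sort_key(scores[m][indicator]))
        for position, model in enumerate(ordered, start=1):
            totals[model] += position
    return totals


def best_model(scores):
    """Model with the lowest accumulated Vp."""
    totals = position_values(scores)
    return min(totals, key=totals.get)
```

With three hypothetical models, `position_values({"M1": {"r2": 0.60, "rmse": 2.5, "mre": 0.05}, "M2": {"r2": 0.55, "rmse": 2.8, "mre": -0.02}, "M3": {"r2": 0.40, "rmse": 3.1, "mre": 0.10}})` returns `{"M1": 4, "M2": 5, "M3": 9}`, so M1 would be selected.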

To verify the accuracy of the models studied, the coefficient of determination of the linear regression (R²) was observed, since it is one of the first indicators of good model performance (Yorukoglu and Celik 2006). However, in addition to R², it was necessary to analyze other evaluation parameters, such as the degree of dispersion of the estimated values, the overestimation and underestimation of the model, and its degree of precision (Jacovides and Kontoyiannis 1995).

Results and discussion

Tables 3, 4 and 5 present the parameters of the global solar radiation estimation equations calibrated for all the cities studied. The data show a large variation in the values of the same model between cities. The coefficient of determination (R²) also varied between cities, revealing that a model can perform well for certain cities and yet not be recommended for others.

Table 3 Calibrated parameters of the solar radiation estimation equations for cities in the state of Minas Gerais
Table 4 Calibrated parameters of the solar radiation estimation equations for cities in the state of Minas Gerais (Continuation)
Table 5 Calibrated parameters of the solar radiation estimation equations for cities in the state of Minas Gerais (Continuation)

Analyzing the results, the R² values ranged from 6.04 to 59.58%. The city with the lowest R² value was Passa Quatro (6.04%), and the city with the highest was Capelinha (59.58%). The average R² was 36.97%, slightly below the values found by Tanaka et al. (2016) for cities in the state of Mato Grosso, which ranged from 40 to 70%, and by Borges et al. (2010) for the city of Cruz das Almas in the state of Bahia, which ranged from 68 to 72%. This variation is expected, given the climatic characteristics of each region and the large number of stations analyzed in this work (51 cities) compared with similar studies.

Figure 2 shows the variation of the MRE values of the models. Most of the models used in this study tended to overestimate. Tanaka et al. (2016) also found a tendency towards overestimation for the state of Mato Grosso, whereas Almorox et al. (2011) found a tendency towards underestimation in Spain. A major disadvantage of analyzing the MRE in isolation is that the underestimation of one observation can cancel out the overestimation of another (Stone 1993).

Fig. 2 Mean Relative Error (MRE) of the global radiation estimates for models with calibrated coefficients for different weather stations in the state of Minas Gerais

Figure 3 shows the RMSE values of the models calibrated for cities in the state of Minas Gerais. The dispersion between the measured and estimated values is, on average, 2.80 MJ m⁻² day⁻¹ across all models studied. These results corroborate the work of Almorox et al. (2011) in Spain. They are lower than the values found by Goodin et al. (1999) in the United States, where the RMSE was between 3.62 and 5.81 MJ m⁻² day⁻¹, and by Tanaka et al. (2016) in the state of Mato Grosso.

Fig. 3 Root-Mean-Square Error (RMSE) of the estimates of global radiation for models with calibrated coefficients for different weather stations in the state of Minas Gerais

The agreement index d of Willmott et al. (1985) expresses the degree of agreement between the measured and estimated values and is shown in Fig. 4. All models obtained almost perfect agreement, with index values between 0.92 and 1.0, above those found by Tanaka et al. (2016) and Silva et al. (2012).

Fig. 4 Willmott agreement index (d) for global radiation estimation models calibrated for different meteorological stations in the State of Minas Gerais
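The agreement index is not defined in this text. A minimal sketch, assuming the widely used form d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where O are the measured values, P the estimated values and Ō the mean of the measured values, could look like this (the function name is illustrative):

```python
def willmott_d(observed, estimated):
    """Willmott index of agreement; 1 means perfect agreement.

    Assumes the observed values are not all identical.
    """
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, estimated))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, estimated)
    )
    return 1.0 - numerator / denominator
```

The index is dimensionless, which makes it convenient for comparing stations with different radiation levels.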

Figure 5 represents the correlation between the observed solar radiation values and those estimated for the city of Varginha-MG. The models that showed the best performance for Varginha were Bristow and Campbell, Hunt 1 and Donatelli and Campbell. It is observed that most models showed a tendency towards overestimation. Silva et al. (2012) and Tanaka et al. (2016) also observed overestimations of the studied models, for the northwest region of Minas Gerais and for the state of Mato Grosso, respectively.

Fig. 5 Correlations between the measured global solar radiation and the global radiation estimated by different models calibrated by the Varginha meteorological station

There is a greater dispersion of data from the estimates made by ABS and ASW, which means that the estimated values were not very accurate.

To obtain the best model for each city, the models were ranked according to the statistical indexes evaluated in this study. Table 6 gives the sum of these rankings by city and for the state. Reading along each row, the lowest value indicates the best model and the highest value indicates the worst model for the city in question.

Table 6 Best models according to the Vp (value position) for the cities of Minas Gerais ordered by river basin regions

Based on a statistical analysis, it can be seen that the DOC model performed better in 33% of the cities studied in the state of Minas Gerais, followed by BRC (23%) and HU1 (18%).

In addition to the best model for each city under study, Table 6 shows the best model for each hydrographic region of Minas Gerais. The models with the best performance in each hydrographic region are: the DOC for the Rio Doce basin; the BRC for the Rio Grande; the DOC and HU1 for the Jequitinhonha River basin; the DOC for the Paranaíba river; and the DOC and HU1 for the São Francisco river. Therefore, as in most cities, the model with the best performance in most of the state’s hydrographic basins was the DOC.

Due to the lack of automatic meteorological stations installed in the regions of Rio Mucuri, Rio Paraíba do Sul and Rio Pardo, only two cities were studied and, for the São Mateus river basin, only one city was studied.

Conclusion

The results show that the performance of the methods for estimating global solar radiation differed between the cities analyzed.

Based on the statistical analysis and the ordering of the indexes, that is, the use of the position values (Vp) of the statistical indicators to classify and define the best method for the estimation of the global radiation, it can be observed that the DOC model obtained better performance in 33% of the cities studied in the state of Minas Gerais, Brazil, followed by the BRC and HU1 models. These three best models add up to 74% of the studied cities. The DOC model also achieved the best performance for most of the state’s river basins.