1 Introduction

Global food demand is increasing as a result of population growth, higher per capita consumption, and dietary changes, with implications for food security. Under this scenario, soybean plays an important role as a grain rich in protein and oil, used for both human consumption and animal feed (EMBRAPA 2015). The increasing demand for soybean can be met by expanding the cultivated area or by raising crop yield (Sentelhas et al. 2015; Davis et al. 2016). Area expansion is limited in many countries, where yield improvement is the only way to produce more food. In Brazil, about 65 Mha of land is available for agricultural expansion, but mainly in regions with degraded pastures and restrictive climates (Monteiro and Sentelhas 2017).

For increasing crop yield, it is essential to understand the best management strategies to improve crop resilience (Battisti et al. 2017a; Do Rio et al. 2015), mainly in the zones of higher climatic risk (Heinemann et al. 2016; Zanon et al. 2016). These analyses can be done by using crop simulation models, estimating yields and yield gaps based on simulated potential (Yp) and attainable (Ya) yields, together with actual yield (Yr) (Van Ittersum et al. 2013; Sentelhas et al. 2015) and integrating soil, weather, and cultivar effects on crop growth and development (Nendel et al. 2014; Battisti et al. 2017b). For these analyses, it is crucial to have a consistent historical weather data series with at least 30 years (WMO 1989) and with suitable spatial distribution to represent consistently the temporal and spatial climatic variability (Grassini et al. 2015; Ruane et al. 2015).

In Brazil, weather records of suitable length and an adequate density of weather stations are limited, mainly in the Midwest and North regions of the country, into which soybean is advancing. In these regions, there are few weather stations, with series no longer than 15 years (Xavier et al. 2015) and with gaps (Battisti et al. 2017a), which makes their use very restricted for agrometeorological studies. One way to solve these problems is to use gridded weather data (GWD), which are based on the interpolation of observed ground-based data and merged satellite products (Xavier et al. 2015; Ruane et al. 2015).

GWD have high potential for agricultural applications, mainly for planning purposes, through crop simulation models. In this context, Bai et al. (2010) showed that a combination of NASA satellite solar radiation with ground-station temperature is an excellent option for filling weather data gaps, allowing the simulation of maize potential yield in China. Bandaru et al. (2017) used four gridded weather databases to simulate the biomass of a short-rotation woody crop in the Columbia Plateau, highlighting the importance of accounting for the uncertainties that GWD introduce in biomass estimates. Mourtzinis et al. (2017), on the other hand, observed that, for a region with a high density of weather stations, weather data interpolated from existing meteorological stations performed better than two different gridded weather databases (Daymet and PRISM) for simulating maize yield in the U.S. Corn Belt.

Considering the lack of historical weather data series in the Brazilian soybean regions and the availability of a large number of daily gridded weather databases around the world (Xavier et al. 2015; Ruane et al. 2015), the main objective of this study was to assess the suitability of two gridded weather databases, in comparison with observed weather data, for simulating soybean development and yield in Brazil: AgMERRA, obtained from a reanalysis of satellite and observed weather data (Ruane et al. 2015), and XAVIER, obtained from the interpolation of several ground-based station data available for Brazil (Xavier et al. 2015).

2 Material and methods

2.1 Locations and weather databases

The weather databases employed in this study were obtained for 24 locations in the main soybean regions across Brazil, which represent the areas with the highest soybean production intensity (Fig. 1). These regions encompass different climates, from humid subtropical in the south to tropical with an intense dry season during the winter in the north. Details about the climate characteristics of these locations can be found in Alvares et al. (2013). The daily weather data comprise solar radiation, maximum and minimum air temperature, rainfall, mean relative humidity, and wind speed. The period used for soybean growth simulations was 1980 to 2010, the same for the three databases, totaling 30 growing seasons.

Fig. 1

Distribution of weather stations used for soybean yield simulations and the distribution of this crop during the 2014/2015 growing season in Brazil. The values in the legend represent the decimal part of a pixel cultivated with soybean. Adapted from Battisti and Sentelhas (2017) and based on IBGE (2016) data

Measured weather data (MWD) were obtained from the Brazilian Meteorological Service (INMET) (21 locations) and the Agronomic Institute of Paraná (IAPAR) (three locations). The homogeneity test of MWD followed the approach used by Xavier et al. (2015). In these weather databases, 20% of the data were missing. For rainfall, missing data were replaced by measured data from the closest rainfall station of the Brazilian Water Agency (ANA), which has a dense rainfall station network covering the whole Brazilian territory. Missing air temperature and relative humidity data were replaced by estimates from linear relationships with nearby weather stations, choosing those with the same climate classification (Alvares et al. 2013). Solar radiation was estimated using the Angstrom equation (Angstrom 1924), which takes sunshine hours as input, or by the method proposed by Hargreaves and Samani (1982), which uses maximum and minimum air temperature as inputs (Allen et al. 1998; Pereira et al. 2002). Missing wind speed data were filled with the historical daily average for each day.
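The two solar radiation estimators mentioned above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the extraterrestrial radiation formula and the default coefficients (a = 0.25, b = 0.50 for the Angstrom-type equation; 0.16 for the Hargreaves-Samani adjustment coefficient) are the general values suggested in FAO 56 (Allen et al. 1998) and are assumptions here, as are all function and variable names. Locally calibrated coefficients would replace these defaults.

```python
import math


def extraterrestrial_radiation(lat_deg, doy):
    """Return (Ra, N): extraterrestrial radiation (MJ m-2 day-1) and
    maximum daylight hours, following the FAO 56 formulation.
    lat_deg is negative in the southern hemisphere; doy is day of year."""
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)      # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)  # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))          # sunset hour angle (rad)
    ra = (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    daylight_hours = 24 / math.pi * ws
    return ra, daylight_hours


def rs_angstrom(sunshine_hours, lat_deg, doy, a=0.25, b=0.50):
    """Solar radiation from sunshine hours: Rs = (a + b * n / N) * Ra.
    a and b are assumed FAO 56 defaults, not the study's coefficients."""
    ra, daylight_hours = extraterrestrial_radiation(lat_deg, doy)
    return (a + b * sunshine_hours / daylight_hours) * ra


def rs_hargreaves_samani(tmax, tmin, lat_deg, doy, krs=0.16):
    """Solar radiation from the daily temperature range:
    Rs = krs * sqrt(Tmax - Tmin) * Ra. krs = 0.16 is an assumed
    default for interior locations."""
    ra, _ = extraterrestrial_radiation(lat_deg, doy)
    return krs * math.sqrt(max(tmax - tmin, 0.0)) * ra


# Hypothetical day: 20 degrees S, 15 December (doy 349), 8 h of sunshine,
# Tmax 31 C and Tmin 20 C. Values are illustrative only.
print(round(rs_angstrom(8.0, -20.0, 349), 2))
print(round(rs_hargreaves_samani(31.0, 20.0, -20.0, 349), 2))
```

Using the sunshine-based estimate where sunshine hours exist and the temperature-based one otherwise keeps the series complete, at the cost of mixing two error structures in the same record.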

The daily GWD were obtained from two publicly available weather databases. The first database (XAVIER), obtained from Xavier et al. (2015), was developed by interpolating data from ground-based stations available for Brazil, with the aim of gridding precipitation and reference evapotranspiration. XAVIER used 2890 rain gauges and 735 weather stations (260 conventional and 475 automatic) for the interpolations, testing different interpolation methods and using cross-validation to evaluate the efficiency of the GWD at a resolution of 0.25° in latitude and longitude.

The second gridded weather database was AgMERRA (Ruane et al. 2015), which is one of the global weather databases used by AgMIP (The Agricultural Model Intercomparison and Improvement Project) (Rosenzweig et al. 2013). AgMERRA uses a reanalysis of climate forcing datasets (provided by MERRA, MERRA-Land, CRU, WM, GPCC, TRMM, CMORPH, PERSIANN, NASA/GEWEX SRB), with a higher spatial resolution for agricultural applications. The satellite datasets were adjusted based on measured monthly data from 737 weather stations in agricultural regions around the world, using smart interpolation, which considers elevation, to create a daily gridded weather database with a spatial resolution of 0.25°.

2.2 Crop simulation model

The soybean development and yield simulations were performed using the CSM-CROPGRO-Soybean model (Boote et al. 2003), included in the Decision Support System for Agrotechnology Transfer (DSSAT) software (Jones et al. 2003). This model was calibrated by Battisti et al. (2017b) for Brazilian cultivars, with a root mean square error below 550 kg ha−1 for the calibration and validation phases. The CSM-CROPGRO-Soybean model considers the following approaches: reference evapotranspiration estimated by the Penman-Monteith FAO 56 method (Allen et al. 1998); infiltration of water into the soil defined by the soil curve number (Soil Conservation Service 1972); soil water balance determined by the Ritchie tipping-bucket method (Ritchie 1998); soil evaporation estimated by the Suleiman-Ritchie method (Suleiman and Ritchie 2003); and leaf-level photosynthesis response from Boote and Pickering (1994).

The common soybean management practices were considered in the simulations, such as sowing date on 15 Nov with a plant density of 30 plants m−2 for all locations, considering a rainfed crop with nitrogen in the soil coming from biological fixation. Across locations, three cultivar maturity groups were used based on the latitude. Maturity group 5.8 was used for latitudes higher than 23° S; maturity group 6.8 between 15° and 22.9° S; and maturity group 7.8 for latitudes lower than 14.9° S. The cultivar coefficients for each of the three maturity groups were obtained from Battisti (2016). The use of different maturity groups resulted in a crop cycle between 110 and 130 days from sowing to maturity, representing the main cultivars used in Brazil. In these regions, the soil type was classified as clay, having a permanent wilting point of 0.296 cm3 cm−3, field capacity of 0.458 cm3 cm−3, and saturation point of 0.578 cm3 cm−3, which resulted in a soil water holding capacity of 1.6 mm cm−1 of soil depth. For simulating the root system, an intermediate value for root growth factor was used, which resulted in a maximum root depth of 120 cm (Battisti and Sentelhas 2017).
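The latitude rule for maturity groups and the soil water holding capacity quoted above can be checked with a short sketch. The function names are hypothetical, and two points are assumptions rather than statements from the text: latitudes falling in the small gaps between the stated bands (14.9° to 15° S and 22.9° to 23° S) are assigned to the adjacent band, and the total available water is computed for a uniform profile over the full 120 cm root depth.

```python
def maturity_group(latitude_south_deg):
    """Maturity group from latitude given in degrees South (positive).
    Boundary handling between the stated bands is an assumption."""
    if latitude_south_deg >= 23.0:
        return 5.8
    if latitude_south_deg >= 15.0:
        return 6.8
    return 7.8


def water_holding_capacity(field_capacity, wilting_point):
    """Available water (mm per cm of soil) from volumetric water
    contents in cm3 cm-3; 1 cm3 cm-3 over 1 cm of soil equals 10 mm."""
    return (field_capacity - wilting_point) * 10.0


# Soil values quoted in the text for the clay soil.
whc = water_holding_capacity(0.458, 0.296)
print(round(whc, 2))          # mm of water per cm of soil depth

# Assumption: uniform soil down to the 120 cm maximum root depth.
print(round(whc * 120, 1))    # total available water in mm

# Hypothetical latitudes, for illustration only.
for lat in (28.6, 17.8, 12.5):
    print(lat, maturity_group(lat))
```

The computed capacity, about 1.62 mm cm−1, agrees with the rounded 1.6 mm cm−1 reported in the text.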

2.3 Statistical analysis

The following variables were used to compare the performance of XAVIER and AgMERRA GWD against MWD: potential and attainable grain yield as defined by Sentelhas et al. (2015); maximum and actual crop evapotranspiration accumulated between sowing and maturity; water deficit represented by the difference between potential and actual crop evapotranspiration; days for vegetative (from sowing to beginning of flowering) and reproductive (from beginning of flowering to maturity) periods; accumulated rainfall from sowing to maturity; and mean solar radiation and minimum and maximum air temperatures from sowing to maturity. The analysis was done considering 24 locations and 30 growing seasons as replications. For attainable grain yield, an individual analysis by location was also considered. These variables were all outputs from the crop model used to simulate the soybean growth, considering MWD and GWD as the input in the simulation.

The statistical indexes used to assess the performance of each database were as follows: Pearson’s correlation coefficient (r), mean error (ME), and root mean square error (RMSE) (Wallach et al. 2006):

$$ r=\frac{\sum_{i=1}^N\left[\left( Mi-M\right)\left( Si-S\right)\right]}{\sqrt{\sum_{i=1}^N\left[{\left( Mi-M\right)}^2\right]{\sum}_{i=1}^N\left[{\left( Si-S\right)}^2\right]}} $$
$$ ME=\frac{1}{N}\sum \limits_{i=1}^N\left( Si- Mi\right) $$
$$ RMSE=\sqrt{\frac{1}{N}\sum \limits_{i=1}^N{\left( Si- Mi\right)}^2} $$

where Mi and M are respectively the variable of each i replication and the mean value from MWD or estimated based on it, Si and S are respectively the variable of each i replication and the mean value from GWD or estimated based on it, and N is the number of replications.
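The three indexes defined above can be computed with a few lines of standard-library Python. This is a minimal sketch: the function names and the sample yields are hypothetical and are not taken from the study.

```python
import math


def pearson_r(measured, simulated):
    """Pearson's correlation coefficient between MWD-based and GWD-based values."""
    n = len(measured)
    m_bar = sum(measured) / n
    s_bar = sum(simulated) / n
    cov = sum((m - m_bar) * (s - s_bar) for m, s in zip(measured, simulated))
    ss_m = sum((m - m_bar) ** 2 for m in measured)
    ss_s = sum((s - s_bar) ** 2 for s in simulated)
    return cov / math.sqrt(ss_m * ss_s)


def mean_error(measured, simulated):
    """Mean error (bias): positive when the GWD-based values are higher."""
    return sum(s - m for m, s in zip(measured, simulated)) / len(measured)


def rmse(measured, simulated):
    """Root mean square error of GWD-based values against MWD-based values."""
    squared = sum((s - m) ** 2 for m, s in zip(measured, simulated))
    return math.sqrt(squared / len(measured))


# Hypothetical attainable yields (kg ha-1) for four seasons, illustration only.
mwd_yield = [3000.0, 3500.0, 2800.0, 4000.0]
gwd_yield = [3200.0, 3600.0, 3100.0, 3900.0]

print(round(pearson_r(mwd_yield, gwd_yield), 3))
print(mean_error(mwd_yield, gwd_yield))
print(round(rmse(mwd_yield, gwd_yield), 1))
```

ME keeps the sign of the difference, so it reveals systematic over- or underestimation, while RMSE also penalizes scatter. Reporting both, together with r, separates bias from dispersion.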

3 Results and discussions

3.1 Air temperature, solar radiation, and rainfall

GWD showed a good agreement with MWD for minimum and maximum air temperature with r above 0.79 for AgMERRA (Fig. 2a) and above 0.87 for XAVIER (Fig. 2b), considering the period when soybean is cultivated. The AgMERRA showed a mean error (ME) of − 0.32 and − 0.30 °C, respectively, for minimum and maximum air temperatures (Fig. 2a), while XAVIER had a ME of 0.18 and − 0.06 °C, respectively, for minimum and maximum air temperatures (Fig. 2b). XAVIER presented lower RMSE than AgMERRA for minimum and maximum air temperatures. These errors were around 1.2 °C for AgMERRA and between 0.7 and 1.1 °C for XAVIER. These errors in air temperature can affect several crop development and growth processes in the crop model, such as crop cycle duration, photosynthesis, and evapotranspiration (Martre et al. 2015). Bai et al. (2010) observed that GWD systematic errors of − 2.8 and − 1.4 °C, respectively, for minimum and maximum air temperatures, limited the potential yield estimation for maize in China.

Fig. 2

Relationship between measured (MWD) and gridded (GWD) weather data for the following variables: minimum and maximum air temperature (a and b), solar radiation (c and d), and accumulated rainfall (e and f), during soybean cycle, considering two sources of GWD, AgMERRA (a, c, and e) and XAVIER (b, d, and f). Pearson correlation coefficient (r), mean error (ME), and root mean square error (RMSE) are presented in the graphs. The red line is the 95% confidence interval for predicted individual values, the cyan line is the trend line, and the black dashed line is the 1:1 line

The agreement between GWD and MWD solar radiation data, based on their average during the soybean cycle, was poor for both databases (Fig. 2c and d), with r of 0.15 and 0.19, respectively, for AgMERRA and XAVIER. AgMERRA underestimated solar radiation (ME = − 0.28 MJ m−2 day−1), while XAVIER overestimated it (ME = 0.30 MJ m−2 day−1). The poor correlations resulted in high RMSE, around 2.50 MJ m−2 day−1 for both databases. The poorer performance of the GWD for solar radiation was partly expected, since MWD are derived from conventional weather stations, which report effective sunshine hours rather than solar radiation (Xavier et al. 2015). Based on sunshine hours from MWD, solar radiation was estimated by the Angstrom-Prescott method (Angstrom 1924; Prescott 1940) for the locations where calibrated coefficients (a and b) were available, or with the Glover and McCulloch (1958) coefficients where a and b were not calibrated. Xavier et al. (2015) used the Glover and McCulloch (1958) coefficients to estimate solar radiation through the Angstrom-Prescott model for conventional weather stations, and measured daily solar radiation for automatic stations, which probably caused the differences.

The accumulated rainfall during the soybean cycle showed good agreement for both databases (Fig. 2e and f). AgMERRA underestimated rainfall by 9.14 mm cycle−1, while XAVIER underestimated it by 21.17 mm cycle−1. For rainfall above 1200 mm cycle−1, both sources underestimated the observed values. Such performance occurs mainly as a function of the spatial distribution of rainfall, since intense rainfall that occurs at a specific point is normally underestimated by the GWD, which report a mean value for the grid cell (Ruane et al. 2015; Xavier et al. 2015).

3.2 Evapotranspiration and water deficit

The maximum and actual crop evapotranspiration estimated with both databases were compared (Fig. 3). For crop evapotranspiration, XAVIER data showed better performance than AgMERRA, with similar performance for maximum (r = 0.82, ME = −11 mm cycle−1, and RMSE = 63 mm cycle−1) and actual (r = 0.80, ME = 4 mm cycle−1, and RMSE = 49 mm cycle−1) crop evapotranspiration (Fig. 3). Maximum crop evapotranspiration showed higher dispersion than actual crop evapotranspiration for both databases, which is mainly associated with the errors in solar radiation, since the Penman-Monteith ETo method (Allen et al. 1998) uses net radiation as an input.

Fig. 3

Relationship between evapotranspiration estimated with measured and gridded data, using AgMERRA (a and c) and XAVIER (b and d) database, for maximum (a and b) and actual (ET) (c and d) crop evapotranspiration accumulated during the soybean cycle. Pearson correlation coefficient (r), mean error (ME), and root mean square error (RMSE) are presented in the graphs. The red line is the 95% confidence interval for predicted individual values, the cyan line is the trend line, and the black dashed line is the 1:1 line

The total water deficit during the soybean cycle, obtained as the difference between the maximum and actual crop evapotranspiration, is presented in Fig. 4. Water deficit penalizes the final yield and is the main cause of the soybean yield gap in Brazil, with different levels across the country (Sentelhas et al. 2015). When determined with MWD, the water deficit was mostly below 100 mm cycle−1, but in some cases it reached nearly 400 mm cycle−1. Both databases underestimated the water deficit in comparison to the values estimated with MWD, with ME of − 22 and − 15 mm cycle−1, respectively, for AgMERRA and XAVIER (Fig. 4a and b). The water deficit estimated with the XAVIER database presented a higher correlation (r = 0.81) and a lower RMSE (46 mm cycle−1) against the MWD estimates than AgMERRA (r = 0.69; RMSE = 58 mm cycle−1). Higher water deficits tended to have higher errors, which was a consequence of the errors in rainfall amount and distribution (Xavier et al. 2015; Ruane et al. 2015).
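Because the mean error is linear in the simulated and measured values, the ME of the water deficit should equal the ME of maximum evapotranspiration minus the ME of actual evapotranspiration when all three are computed over the same replications. The check below is a sketch under the assumption that the evapotranspiration statistics quoted in this section (ME of −11 and 4 mm cycle−1) refer to XAVIER. The symbols ETm, ETa, and WD are introduced here only for illustration, with superscripts S and M denoting GWD-based and MWD-based values:

$$ ME_{WD}=\frac{1}{N}\sum \limits_{i=1}^N\left[\left( ETm_i^S- ETa_i^S\right)-\left( ETm_i^M- ETa_i^M\right)\right]= ME_{ETm}- ME_{ETa}=-11-4=-15\ \mathrm{mm}\ \mathrm{cycle}^{-1} $$

This agrees with the ME of − 15 mm cycle−1 reported for XAVIER, so the underestimation of water deficit follows directly from the opposite biases in maximum and actual evapotranspiration.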

Fig. 4

Relationship between water deficits estimated with measured and gridded data, using AgMERRA (a) and XAVIER (b) database, accumulated during the soybean cycle. Pearson correlation coefficient (r), mean error (ME), and root mean square error (RMSE) are presented in the graphs. The red line is the 95% confidence interval for predicted individual values, the cyan line is the trend line, and the black dashed line is the 1:1 line

3.3 Crop development

The vegetative and reproductive periods were well estimated by both GWD, with similar performance for AgMERRA and XAVIER when compared with MWD (Fig. 5). The correlation coefficient was higher than 0.90, with ME ranging from − 0.36 to 0.68 days and RMSE below 2.46 days. These results are similar to those observed by Mourtzinis et al. (2017) for maize in the U.S. Corn Belt for different GWD (RMSE from 3 to 7 days). As the duration of the vegetative and reproductive periods is defined by air temperature, photoperiod, and water deficit (Boote et al. 2003), the GWD errors observed for these variables were not large enough to cause errors of higher magnitude in the estimation of crop phenology. The correct estimation of crop phases and cycle is the first step toward good crop growth and yield estimates by crop simulation models, since they also affect photosynthesis rate and crop sensitivity to water deficit (Battisti and Sentelhas 2015).

Fig. 5

Relationship between vegetative and reproductive periods estimated with measured and gridded data, using AgMERRA (a) and XAVIER (b) database. Pearson correlation coefficient (r), mean error (ME), and root mean square error (RMSE) are presented in the graphs. The red line is the 95% confidence interval for predicted individual values, the cyan line is the trend line, and the black dashed line is the 1:1 line

3.4 Potential and attainable soybean grain yield

Potential yield is determined by solar radiation, air temperature, and crop cycle duration (Sentelhas et al. 2015). AgMERRA and XAVIER showed good agreement with MWD for potential yield, with r of 0.88 and 0.81, ME of 35 and 69 kg ha−1, and RMSE of 405 and 399 kg ha−1, respectively (Fig. 6a and b). As presented in Fig. 6a and b, most of the data are close to the 1:1 line, showing the high agreement between potential yield estimated with MWD and GWD. As temperature was well estimated by both gridded databases, most of the dispersion observed for potential yield is related to the errors in solar radiation (Fig. 2c and d). The results indicate that both GWD were able to estimate soybean potential yield efficiently across Brazil, which differs from the study of Bai et al. (2010), who observed that potential yield can be estimated efficiently using solar radiation from satellite GWD, but that air temperature needs to come from ground stations to improve maize yield simulations in China.

Fig. 6

Relationship between soybean yield estimated with measured and gridded data, using AgMERRA (a and c) and XAVIER (b and d) database, for potential (a and b) and attainable (c and d) yield. Pearson correlation coefficient (r), mean error (ME), and root mean square error (RMSE) are presented in the graphs. The red line is the 95% confidence interval for predicted individual values, the cyan line is the trend line, and the black dashed line is the 1:1 line

For attainable yield, estimated by considering the effect of water deficit on potential yield (Sentelhas et al. 2015), the estimates with XAVIER GWD presented better performance than those with AgMERRA, when compared with MWD. The correlation coefficients were 0.78 and 0.67, ME 179 and 238 kg ha−1, and RMSE 700 and 864 kg ha−1, respectively, for the estimates with XAVIER and AgMERRA (Fig. 6c and d). In both cases, the comparison between the attainable yield estimated with GWD and MWD presented higher dispersion than the same analysis for potential yield, which is mainly caused by the errors observed for water deficit (Fig. 6c and d). According to Bandaru et al. (2017), it is important to account for the uncertainties in biomass estimation by crop simulation models when using GWD.

Attainable yield is normally the main result of a crop simulation model, since it represents the maximum yield that the crop can achieve in a rainfed system, supporting agricultural planning with regard to the main crop management options (Battisti and Sentelhas 2014; Do Rio et al. 2015). Considering the attainable yield estimated for the 24 locations assessed in this study (Table 1), considerable variability was observed in the efficiency of the estimates obtained with GWD. This variability was associated with the local climate conditions (Alvares et al. 2013) as well as with the density of weather stations in the area (Xavier et al. 2015; Ruane et al. 2015; Mourtzinis et al. 2017). One of the best performances was obtained in Cruz Alta, with r > 0.86, ME > − 110 kg ha−1, and RMSE < 774 kg ha−1 (Table 1). On the other hand, one of the worst performances was observed in Ponta Grossa, with r of − 0.14 and 0.22, ME of 713 and 553 kg ha−1, and RMSE of 1003 and 870 kg ha−1, respectively, for AgMERRA and XAVIER GWD (Table 1).

Table 1 Mean attainable soybean yield estimated by the CSM-CROPGRO-Soybean model, using measured (MWD) and gridded weather data (GWD) from the AgMERRA and XAVIER databases, totaling 30 crop seasons, for 24 locations in Brazil

In addition to the overall performance of the estimates obtained with GWD, these databases were able to represent the variability of attainable yield across crop seasons, as shown for two sites in Fig. 7. The attainable yields estimated with both GWD resulted in similar yield trends across the crop seasons at the locations with one of the best and one of the worst performances (Fig. 7). In Cruz Alta, RS (Fig. 7a), AgMERRA and XAVIER followed the same yield trend as the estimates obtained with MWD. In Ponta Grossa, PR (Fig. 7b), where one of the worst performances was obtained (Table 1), a systematic error was observed, with the GWD simulating higher yields than MWD (Fig. 7b), mainly with AgMERRA. When the errors are systematic, as in Fig. 7b, it is still possible to capture the effects of climate on yield variability (Pirttioja et al. 2015).

Fig. 7

Attainable soybean yield simulated using measured (MWD) and gridded weather data (GWD) from the AgMERRA and XAVIER databases along 30 crop seasons for the locations with one of the best (Cruz Alta, a) and one of the worst (Ponta Grossa, b) performances, as presented in Table 1

The systematic error observed in Ponta Grossa (Fig. 7b) was associated with uncertainties in solar radiation between MWD and GWD. For MWD, solar radiation was estimated from sunshine hours, while for the GWD it was based on both sunshine hours and measured daily solar radiation, especially after the 2000s with the increase in automatic weather stations in Brazil (Xavier et al. 2015). This type of trend was not clear for the other sites analyzed, which is associated with the number of regional weather stations, which varied for both GWD over the simulation period.

4 Conclusions

The AgMERRA and XAVIER gridded weather databases showed good agreement with measured data for minimum and maximum air temperature, accumulated rainfall, potential and actual evapotranspiration, and water deficit during the soybean growing seasons. Solar radiation was identified as a limiting variable for both GWD in comparison to MWD, but these errors did not affect potential yield simulations by the crop model. AgMERRA and XAVIER GWD were efficient in simulating soybean vegetative and reproductive periods and potential and attainable yields, representing well their variability along the crop seasons. XAVIER GWD presented a better performance in all variables than AgMERRA, which makes it the preferred option for simulating soybean growth and development in Brazil.