1 Introduction

Crop production is directly affected by the weather and climatic conditions (Capa-Morocho et al. 2016a). In the Mediterranean basin, natural vegetation in general and crop production in particular has always been affected by large natural climate variability (Grasso and Feola 2012; Páscoa et al. 2017; Gouveia et al. 2017) and is expected to continue to be affected in the future (Nguyen et al. 2016). Particularly, seasonal changes in precipitation and temperature and their seasonal variability affect crop production, especially in regions where crops are highly dependent on precipitation (Ruiz-Ramos and Mínguez 2010). As a consequence of long-term influence of precipitation and temperature on crop production, drought is a major cause of unexpected crop failure (Wilhelmi and Wilhite 2002; Wu and Wilhite 2004; Li et al. 2009; Di Falco et al. 2014; Lesk et al. 2016). In a climate change context, one of the key aims in the agricultural sector for the next few decades will be the mitigation of the risk associated with drought-related crop losses (Li et al. 2009; Ferrise et al. 2011; Capa-Morocho et al. 2016b).

A significant part of the economy and landscape of the Iberian Peninsula (IP) countries is linked to agriculture. In 2014, the IP had more than 26 million hectares of harvested area, and about 2% of each IP country’s gross domestic product (GDP) came from the agriculture sector (FAO 2015). Among agricultural crops, winter cereals such as wheat and barley are two of the world’s major crops (FAO 2014) and are particularly significant in the Mediterranean regions, and the growing of these cereals under rainfed conditions is dominant in the IP countries (Austin et al. 1998; Vicente-Serrano et al. 2006).

Several authors have pointed out the increasing frequency of drought events in the IP (Vicente-Serrano et al. 2014; Páscoa et al. 2017) and the close relationship between cereal yield and drought conditions in the Iberian territory (Vicente-Serrano et al. 2006; Iglesias and Quiroga 2007; Páscoa et al. 2017). Austin et al. (1998) showed a strong dependence of wheat and barley on seasonal rainfall in Spain, and the response of winter cereals in the IP to the widely used precipitation-based Standardized Precipitation Index (SPI) has also been demonstrated in several works (Vicente-Serrano et al. 2006; Iglesias and Quiroga 2007; Hernández-Barrera and Rodríguez-Puebla 2017). Moreover, and aside from rainfall variability, drought severity in southwestern Europe is being reinforced by enhanced evaporative demand due to rising temperatures (Trigo et al. 2013; Vicente-Serrano et al. 2014). Hernández-Barrera and Rodríguez-Puebla (2017) found wheat yields to be declining in Spain due to warming climate conditions, and according to Ferrero et al. (2014), rainfed maize yield in Spain may be at risk as heat waves increase in intensity, frequency and duration. Consequently, under the scope of climate change, a sustainable agricultural management of rainfed crops requires reliable estimations of the drought impacts using diverse drought indicators at various spatial and temporal scales.

To include the effect of evapotranspiration on drought monitoring, the Standardized Precipitation Evapotranspiration Index (SPEI) was proposed (Vicente-Serrano et al. 2010) and is now widely used (Vicente-Serrano et al. 2014; Gouveia et al. 2017; Zampieri et al. 2017). In the IP, rainfed cereal yields have shown significant correlations with SPEI that vary with several factors, such as the month and the time scale of the dry episode (Páscoa et al. 2017). Atmospheric patterns, such as the North Atlantic Oscillation (NAO), have also shown significant relationships with wheat yield in the IP (Gouveia and Trigo 2008; Capa-Morocho et al. 2016a).

In addition to the hydro-meteorological influence, the recent advances of remote sensing have strongly contributed to the agricultural sector (Rojas et al. 2011; Kogan et al. 2015a; Van Hoolst et al. 2016). The widely used Normalized Difference Vegetation Index (NDVI) was reported to be strongly correlated to the winter wheat yield over the southern part of Portugal (Alentejo) (Gouveia and Trigo 2008) and north of Spain (Ebro valley) (Vicente-Serrano et al. 2006). Moreover, remote sensing indices based on NDVI and Brightness Temperature (BT) have also been successfully considered by several authors for modelling agricultural productivity (Dalezios et al. 2014; Kogan et al. 2015a, b; Bokusheva et al. 2016), including the Vegetation Health Index (VHI) (Kogan 1995), the Vegetation Condition Index (VCI) (Kogan 1990) and the Temperature Condition Index (TCI) (Kogan 1995).

An important step towards developing strategies to mitigate agricultural drought risk is the establishment of models for estimating crop yield under drought influence (Vicente-Serrano et al. 2006; Mishra et al. 2015; Kogan et al. 2015a). In mechanistic modelling, crop yield is estimated by equations describing the relationships between complex biophysical variables and crop growth, requiring a large amount of input data (Paredes et al. 2014; Giménez et al. 2016; Paredes et al. 2016). On the other hand, empirical modelling makes use of statistical relationships between yield data and predictor variables, representing rather well the larger scale impacts of drought conditions (Vicente-Serrano et al. 2006; Matsumura et al. 2015; Kogan et al. 2015a). Despite lacking a detailed representation of the crop’s biophysical interactions, empirical modelling is simpler to implement and has lower computational costs than mechanistic modelling, and the results are considered good (Ferrise et al. 2011; Estes et al. 2013). Results found by Ferrise et al. (2011) suggested a high level of correspondence between a mechanistic model of durum wheat in the Mediterranean and the results of an empirical model. The authors successfully used artificial neural network (ANN) models to reproduce the output of a wheat yield mechanistic model by using mean spring temperature and precipitation (Ferrise et al. 2011).

The applications of ANN have been increasing in the recent past for modelling and prediction on environmental studies (Morid et al. 2007; Russo et al. 2013; Le et al. 2017) and have proved to add significant improvements to traditional statistical modelling, such as Multiple Linear Regression (MLR) models, namely in the case of crop yield modelling (Jiang et al. 2004; Matsumura et al. 2015).

The purpose of the current work is to model, through the application of MLR and ANN techniques, the influence of drought conditions on rainfed winter cereal yields (wheat and barley) over the major agricultural areas in the IP, examining the potential of combining remote sensing indices (VCI, TCI and VHI) with a multiscalar drought indicator (SPEI). The results presented in this paper constitute a first step towards the development of an agricultural drought risk model for the IP and may provide final users and insurance companies with some guidance in the decision-making process.

2 Data and methods

2.1 Rainfed cereal yields and land cover in Iberia

Agricultural drought especially affects the growing of crops under rainfed conditions (Páscoa et al. 2017), making data on agricultural land use and harvested yields key factors in agricultural drought risk reduction. Hence, maps of land cover information and data on two major rainfed crops in the IP (wheat and barley) were analysed over the Iberian territory. In the IP, the precipitation regime is marked by a strong variability (Martin-Vide and Lopez-Bustins 2006; Muñoz-Díaz and Rodrigo 2006; Martins et al. 2012); hence, there is a high probability of occurrence of droughts and the agricultural activities are particularly prone to their effects. The highly variable precipitation regime in space and time over the IP is strongly associated with the geographic diversity of the peninsula, such as its orography, and the influence of diverse circulation weather patterns (Cortesi et al. 2014). The spatial patterns of rainfall in the IP exhibit strong gradients, with higher values in the northwestern sector and lower values in the southeastern sector, and most of the precipitation is concentrated between October and May (Belo-Pereira et al. 2011). In addition to the lack of rain, dry summer conditions in the IP are enhanced by high temperatures (Vicente-Serrano et al. 2014). The spatial heterogeneity of vegetation dynamics in the IP is pronounced, with predominance of the vegetation classes with the maximum of vegetation greenness in spring (Gouveia et al. 2017). According to the classification by Gouveia et al. (2017), the spatial distribution of vegetation clusters exhibits a northwestern-southeastern gradient: Temperate Oceanic–Mediterranean Oceanic–Mediterranean dry. The vegetation behaviour in the IP ecosystems is mainly driven by the precipitation regimes (Gouveia et al. 2008), which is expressed in the vegetative cycle of the winter crops: sowing usually occurs between October and November and the harvest occurs during June and July of the following year (Gouveia and Trigo 2008; Capa-Morocho et al. 2016a).

Annual production (tons, t) and total area (ha) of barley and wheat crops were obtained from the Portuguese National Statistics Institute (INE) and the Spanish Agriculture, Food and Environment Ministry, for the regions of Portugal and the provinces of Spain, respectively. Annual crop yield time-series were calculated as the ratio between each crop’s production and harvested area during the period 1986–2012 (Páscoa et al. 2017). The year 1986 is taken as the beginning of the analysis because it is the year in which the crop yield time-series in Portugal started to be aggregated at the regional level (until 1985 they were aggregated only at the district level), as they are available in Spain (Páscoa et al. 2017). The crop yield anomalies were computed by removing the linear trend from the crop yield time series, in order to exclude non-climatic factors (Gouveia and Trigo 2008; Páscoa et al. 2017).
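The yield and anomaly computation described above can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares linear trend fitted against the year; the function names and the numbers are hypothetical and are not data from the study.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope and intercept of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def yield_anomalies(years, production_t, area_ha):
    """Yield (t/ha) as production over area, minus its fitted linear trend."""
    yields = [p / a for p, a in zip(production_t, area_ha)]
    slope, intercept = linear_trend(years, yields)
    return [y - (intercept + slope * x) for x, y in zip(years, yields)]


# Made-up numbers for illustration only (not data from the study).
years = [1986, 1987, 1988, 1989, 1990]
production_t = [200.0, 260.0, 230.0, 310.0, 300.0]
area_ha = [100.0, 100.0, 100.0, 100.0, 100.0]
anomalies = yield_anomalies(years, production_t, area_ha)
```

By construction, the anomalies of a least-squares detrended series sum to zero, so negative values mark years below the long-term trend.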

The pixels corresponding to rainfed cereal crop areas were identified using the non-irrigated arable land class of the most recent CORINE Land Cover map (CLC 2012), which is a standard procedure (e.g. Vicente-Serrano et al. 2006; Gouveia et al. 2011; Atzberger et al. 2014; Blauhut et al. 2016; Gouveia et al. 2016). As not all provinces are strongly dominated by agricultural practices, a selection of the major rainfed agricultural areas in the IP is required. The provincial clusters selected for the present analysis were determined according to three criteria: (1) the provincial land use is dominated by agricultural practices, i.e. more than half of the pixels in each province correspond to agricultural areas; (2) the agricultural areas are dominated by rainfed crops, i.e. more than half of the agricultural areas correspond to non-irrigated arable land; (3) the provinces are contiguous and non-isolated. Selecting provincial clusters has the advantage that a small number of models needs to be estimated for a larger number of provinces. In this way, we intend to estimate the best model for each cereal over each cluster, applicable to more than one province.

2.2 Remote sensing and multiscalar indices

With the aim of evaluating the response of the rainfed winter cereal yields (wheat and barley) to the regional drought conditions, drought hazard was evaluated based on the multiscalar drought index SPEI and the remote sensing indices VCI, TCI, and VHI. The potential of modelling cereal crop in the IP based on these drought indicators, considering different combinations of the possible predictors (as will be described later), is one of the goals of the present study.

The above mentioned remote sensing indices are based on NDVI and BT, given that green vegetation reflects visible and emits thermal solar radiation. The VCI and TCI are mathematically expressed by weekly NDVI and BT values, respectively, relative to their minimum and maximum limits and further normalised relative to their amplitude interval (Eqs. 1 and 2). Mathematical expressions of VCI and TCI were first introduced by Kogan (1990 and 1995), respectively, where a detailed description of the indices calculation was provided. The VCI and TCI characterise the moisture and thermal conditions of vegetation, respectively, and the VHI (Eq. 3) is assumed as an average of the two in order to consider their combined effect on vegetation health (Kogan 1997).

$$ VCI=100\times \frac{NDVI-{NDVI}_{min}}{NDVI_{max}-{NDVI}_{min}} $$
(1)
$$ TCI=100\times \frac{BT_{max}- BT}{BT_{max}-{BT}_{min}} $$
(2)
$$ VHI=\left(\frac{VCI+ TCI}{2}\right) $$
(3)

The values of VCI, TCI, and VHI vary from 0 to 100, and index values below 40 are indicative of drought conditions (Kogan 2001). The reason for employing these remote sensing indices in the present study, instead of the popular NDVI, is the inclusion of the thermal component (BT) and their ability to consider ecosystem changes in terms of fluctuations between the maximum and minimum values of NDVI and BT. According to their definition (Kogan 1997), low values of VCI indicate vegetation stress due to lack of water and low TCI values correspond to vegetation stress due to high temperatures.
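Equations 1 to 3 and the drought threshold of 40 translate directly into code. The sketch below is illustrative only: the function names and the NDVI and BT values are hypothetical, and the minimum and maximum limits are simply passed in as arguments.

```python
DROUGHT_THRESHOLD = 40.0  # index values below 40 indicate drought (Kogan 2001)


def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation Condition Index (Eq. 1): NDVI scaled to its min-max range."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


def tci(bt, bt_min, bt_max):
    """Temperature Condition Index (Eq. 2): high BT gives a low TCI."""
    return 100.0 * (bt_max - bt) / (bt_max - bt_min)


def vhi(vci_value, tci_value):
    """Vegetation Health Index (Eq. 3): average of VCI and TCI."""
    return (vci_value + tci_value) / 2.0


# Hypothetical weekly values for one pixel (not taken from the NOAA product).
example_vci = vci(0.30, 0.20, 0.60)
example_tci = tci(300.0, 280.0, 310.0)
example_vhi = vhi(example_vci, example_tci)
is_drought = example_vhi < DROUGHT_THRESHOLD
```

In this made-up case both the moisture and the thermal term are low, so the combined index falls below the drought threshold.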

The weekly global maps of VCI, TCI, and VHI were retrieved at 4 km spatial resolution from NOAA’s ftp server (ftp://ftp.star.nesdis.noaa.gov/pub/corp/scsb/wguo/data/VHP_4km/geo_TIFF/) for the period 1985–2012. Weekly data for 1985 are included because the life cycle of the cereals harvested in 1986 starts in the autumn/winter of the previous year, in this case 1985. Missing weekly values were replaced by the climatological value of the corresponding week, and the analysis was performed between week 35 (approximately the beginning of September of year n – 1) and week 25 (approximately the end of June of year n), comprising the major moments of the crop life cycle: pre-sowing and sowing (autumn/winter), vegetative phase (winter/early spring), reproductive phase (middle of spring), grain formation and maturation (end of spring), and beginning of crop harvest (early summer). The spatial averages of VCI, TCI, and VHI were computed for each provincial cluster and used for further cereal yield modelling.
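The weekly window and the gap filling can be illustrated with a short sketch. It assumes 52 numbered weeks per year and takes the climatological value of a week to be the mean of that week over all years with valid data; both are assumptions, and the function names are hypothetical.

```python
def season_weeks(start_week=35, end_week=25, weeks_per_year=52):
    """Week numbers from start_week of year n-1 to end_week of year n."""
    previous_year = list(range(start_week, weeks_per_year + 1))
    harvest_year = list(range(1, end_week + 1))
    return previous_year + harvest_year


def fill_with_climatology(series_by_year):
    """Replace None entries by the multi-year mean of the same week.

    series_by_year maps a year to a list of weekly values of equal length.
    Assumes every week has at least one valid value across the years.
    """
    n_weeks = len(next(iter(series_by_year.values())))
    climatology = []
    for w in range(n_weeks):
        valid = [vals[w] for vals in series_by_year.values() if vals[w] is not None]
        climatology.append(sum(valid) / len(valid))
    return {
        year: [climatology[w] if v is None else v for w, v in enumerate(vals)]
        for year, vals in series_by_year.items()
    }


weeks = season_weeks()
# Made-up two-week series with one gap in 1986.
filled = fill_with_climatology(
    {1986: [10.0, None], 1987: [20.0, 40.0], 1988: [30.0, 60.0]}
)
```

Under the 52-week assumption the window from week 35 to week 25 contains 43 weeks, which matches the number of weekly intervals per remote sensing index used later as candidate predictors.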

One of the aims of the present study is to discuss the utility of the remote sensing indices for cereal yield modelling, assessing the relative contribution of the moisture and thermal terms and their further combination with the additional information of the drought index SPEI. Thus, gridded monthly values of the SPEI, with a spatial resolution of 0.5°, were computed based on precipitation and temperature values from the Climatic Research Unit (CRU TS3.21). The SPEI computation uses the monthly difference between precipitation (P) and potential evapotranspiration (PET) as shown in Eq. 4, where D provides a simple measure of the water deficit for the analysed month at different time-scales.

$$ D=\mathrm{P}-\mathrm{PET} $$
(4)

A log-logistic distribution was used, as suggested by Vicente-Serrano et al. (2010), and the Hargreaves method was considered for the estimation of the reference evapotranspiration (Beguería et al. 2014). A discussion of several computing options for the use of SPEI is provided by Beguería et al. (2014). The spatial averages of SPEI at the time-scales of 1–12 months were computed for each provincial cluster from January to June. The use of a variety of time-scales (1–12 months) incorporates the memory of the respective past months, which does not happen with remote sensing indices, and for this reason the SPEI data considered for the analysis cover approximately the period from the vegetative growth phase to the harvest (January to June). In other words, the SPEI period under analysis does not include the typical months of pre-sowing and sowing because their drought conditions are intrinsically considered in the medium and longer time-scales of the SPEI (4 to 12 months).
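The memory carried by the longer time-scales comes from accumulating the water balance D of Eq. 4 over the preceding months. The sketch below shows only that accumulation step; it does not perform the log-logistic fit and standardisation that turn the accumulated series into SPEI values. The function names and the numbers are hypothetical.

```python
def water_balance(precip_mm, pet_mm):
    """Monthly climatic water balance D = P - PET (Eq. 4)."""
    return [p - e for p, e in zip(precip_mm, pet_mm)]


def accumulate(d, scale):
    """Sum of D over the current month and the (scale - 1) preceding months.

    Returns None for months that do not yet have enough history.
    """
    out = []
    for i in range(len(d)):
        if i + 1 < scale:
            out.append(None)
        else:
            out.append(sum(d[i - scale + 1 : i + 1]))
    return out


# Made-up monthly precipitation and PET in mm (not CRU data).
precip_mm = [50, 40, 30, 20, 10, 5]
pet_mm = [20, 25, 35, 50, 70, 90]
d = water_balance(precip_mm, pet_mm)
d_3month = accumulate(d, 3)
```

With a 12-month scale, the January value aggregates D from the preceding February onwards, which is why the autumn sowing months are already represented in the January to June SPEI values at the longer time-scales.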

2.3 Linear correlation analysis

Having identified the clusters of provinces more exposed to agricultural drought, a correlation analysis is conducted to assess the linear relationships between the winter cereal yields and the drought indicators (remote sensing and multiscalar indices) in terms of the Pearson correlation coefficient (R) (Wilks 2006). Statistical significance is assessed at the 95% confidence level.

The moments of the vegetative cycle at which the crops are most sensitive to moisture and thermal conditions are assessed in terms of VCI and TCI, respectively. The relationships between the cereal yield and the VHI indicate the impacts of the combined effect of water and heat stress during the crop growth cycle. In addition, the winter cereal yield response to each time scale of drought occurrence is assessed based on the multiscalar drought index SPEI during the development stages of the cereals.
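As an illustration of this step, the sketch below computes the Pearson coefficient and the smallest absolute correlation that would be significant for a 27-year series. It assumes that significance is judged with a two-sided Student t-test on independent years, which the text does not state explicitly; the critical t value is taken from standard tables, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, t_crit):
    """Smallest |r| that is significant for n pairs.

    t_crit is the two-sided critical value of Student's t with n - 2
    degrees of freedom (assumed test, see the lead-in).
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# 1986-2012 gives 27 years, i.e. 25 degrees of freedom.
T_CRIT_DF25 = 2.060  # two-sided 5% critical t for 25 df (standard tables)
r_threshold = critical_r(27, T_CRIT_DF25)
```

Under these assumptions, a weekly or monthly index would need an absolute correlation of roughly 0.38 with the yield anomalies to be flagged as significant.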

2.4 Selection of significant predictors and their possible combinations

The range of predictors encompasses three remote sensing indices (43 weekly intervals for each) and one multiscalar drought index (6 months (January to June) by 12 time-scales = 72 SPEI intervals) for each of the provincial clusters. The SPEI time scales and months, together with the weeks of VHI, VCI, and TCI, best related with wheat and barley yield were chosen based on a stepwise regression (95% confidence level). The stepwise regression algorithm carries out an iterative search and generates a subset of predictors which together have the largest contribution to the variability of each cereal yield in each provincial cluster (predictands). For each provincial cluster and each winter cereal (wheat and barley), stepwise regression models based on the moisture and thermal components (VCI and TCI) are built separately from models based on the VHI, to avoid collinearity, since VHI is a combination of both VCI and TCI. Subsequently, stepwise regression models combining SPEI with the remote sensing indices (VCI + TCI + SPEI and VHI + SPEI) are built to evaluate the relative contribution of the remote sensing indices and of their further combination with the multiscalar index for the simulation of the variability of winter cereal yield.
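To make the size of the search space concrete, the sketch below enumerates the candidate predictors offered to the stepwise regression in each of the four configurations. The predictor labels are hypothetical naming conventions, and the week list assumes 52 numbered weeks per year; the selection step itself is not reproduced here.

```python
# Weeks 35-52 of year n-1 followed by weeks 1-25 of year n (43 weeks).
WEEKS = list(range(35, 53)) + list(range(1, 26))
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
SCALES = range(1, 13)  # SPEI time-scales of 1 to 12 months


def remote_sensing_candidates(index_name):
    """One candidate per week for a remote sensing index (hypothetical labels)."""
    return [f"{index_name}_w{week:02d}" for week in WEEKS]


spei_candidates = [f"SPEI{scale:02d}_{month}" for month in MONTHS for scale in SCALES]

# The four candidate pools described in the text.
pools = {
    "VCI+TCI": remote_sensing_candidates("VCI") + remote_sensing_candidates("TCI"),
    "VHI": remote_sensing_candidates("VHI"),
    "VCI+TCI+SPEI": remote_sensing_candidates("VCI")
    + remote_sensing_candidates("TCI")
    + spei_candidates,
    "VHI+SPEI": remote_sensing_candidates("VHI") + spei_candidates,
}
pool_sizes = {name: len(candidates) for name, candidates in pools.items()}
```

With only 27 yearly observations per cluster, each pool contains far more candidates than observations, which is why a selection step is needed before fitting any model.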

2.5 Cereal yield estimation models

After the selection of the significant predictors, the standardisation of both dependent and independent variables is performed by computing the z scores for further statistical modelling (Wilks 2006). Multiple linear regression (MLR) and artificial neural network (ANN) techniques are applied for modelling the wheat and barley yields at the provincial clusters. The reason for the application of a non-linear methodology in addition to the classical MLR models is to discuss the use of alternative promising tools, such as the ANN (Morid et al. 2007; Russo et al. 2015), to simulate the complexity of the non-linear character of the agricultural systems under drought conditions (Jiang et al. 2004; Matsumura et al. 2015).
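The standardisation step can be sketched as below, assuming the z scores use the sample standard deviation (n − 1 denominator); the text does not specify which estimator was used, and the function name is hypothetical.

```python
import statistics


def z_scores(values):
    """Standardise a series to zero mean and unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]


# Made-up yield anomalies in t/ha.
standardised = z_scores([2.0, 4.0, 6.0])
```

Standardising both predictand and predictors puts the weekly indices (0 to 100) and the SPEI (roughly −3 to 3) on a common scale, so the fitted coefficients become directly comparable.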

In MLR, the functional relationship between the predictand (cereal yield) and the predictors (previously statistically selected) can be described by means of the intercept and the slope of the regression line, usually called regression coefficients. The regression coefficients are estimated by minimising the sum of the squared differences between the observations of cereal yield and the regression line (Wilks 2006).
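A minimal least-squares fit can be written with the normal equations, as sketched below. This is an illustrative implementation in plain Python, not the software used in the study; the function names and the data are hypothetical, and the predictors are assumed not to be collinear.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(predictors, y):
    """Least-squares coefficients [intercept, b1, ..., bp]."""
    rows = [[1.0] + list(p) for p in predictors]  # add intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coefs, p):
    """Predicted value for one row of predictors."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], p))


# Made-up data generated from y = 1 + 2*x1 - 3*x2.
predictors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coefs = fit_mlr(predictors, y)
```

Because the example data lie exactly on a plane, the fit recovers the generating intercept and slopes; with real yield anomalies the residuals would be non-zero.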

ANN are mathematical models inspired by the behaviour of the human nervous system, composed of several layers of neurons. In this study, a simple three-layer structure was adopted with one input layer, one hidden layer and one output layer. The input variables, corresponding to the statistically significant predictors, are transformed by the weights and biases of the neurons, which pass the combined information to the next layer until the output value of the simulated cereal yield (target) is reached. The ANN training, which updates the weights and biases on each cycle, was performed here with the Levenberg-Marquardt backpropagation method, considering the same statistically significant predictors (input variables) as the MLR models. Different architectures were examined, with the number of neurons in the hidden layer ranging between 1 and 4. In order to compare the different architectures, a fixed seed was used for the initial random weights. The use of a second hidden layer was tested but was found to be redundant. The number of neurons in the input layer corresponds to the number of selected predictors for each target (wheat and barley yield at clusters 1 and 2). A linear transfer function was used for the single node in the output layer, and a log-sigmoid function was used for each hidden neuron to account for the non-linear behaviour.
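The forward pass of such a three-layer network can be sketched as follows. The example covers only the architecture (log-sigmoid hidden neurons, one linear output node, seeded initial weights); the Levenberg-Marquardt training is not reproduced, and the uniform initialisation range and all names are assumptions made for illustration.

```python
import math
import random


def logsig(x):
    """Log-sigmoid transfer function used by the hidden neurons."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden, seed=0):
    """Random initial weights and biases drawn with a fixed seed.

    The uniform(-1, 1) range is an assumption, not the study's setting.
    """
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, inputs):
    """Hidden layer with log-sigmoid units, then a single linear output."""
    hidden = [
        logsig(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


# Three standardised predictors, two hidden neurons (hypothetical sizes).
net = init_network(n_inputs=3, n_hidden=2, seed=1)
simulated_yield = forward(net, [0.5, -1.2, 0.3])
```

Fixing the seed makes the initial weights identical across runs, so differences between architectures with 1 to 4 hidden neurons are not confounded by different random starting points.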

The performance of the MLR and ANN models is assessed by leave-one-out cross-validation, which provides unbiased estimates by avoiding the overfitting that occurs when the same data are used for the fit and for the performance assessment. In general, cross-validation assesses how well the model performs by successively using a small set of observations from the original sample for validation, and the remaining observations as the training data. In the present work, one observation is successively removed from the total sample, the model is fitted to the remaining observations (training data), and the left-out observation is used for validation (validation data). This procedure ensures that every observation is used for both training and validation, while the model’s performance is always assessed on independent data not considered in the fit. This approach is commonly used and is appropriate for cases with a low number of samples (Wilks 2006), as in the present study. To support the robustness of the leave-one-out cross-validation scheme, the explained variance in terms of the adjusted coefficient of determination (R2adj) is analysed with and without cross-validation (R2adj_no_cv). The R2adj is an R2 corrected for the finite sample size and the number of predictors used as input for the MLR and ANN models. Other widely used accuracy measures are also considered to evaluate the performance of the linear and non-linear methods, such as the root mean squared error (RMSE) and the skill score based on the RMSE (SSRMSE, Eq. 5). The overall deviation of simulated values from observed values is assessed in terms of the RMSE, and the SSRMSE (Eq. 5) is used in this paper considering persistence (the previous year yield value) as a reference model.

$$ {SS}_{RMSE}=\left(1-\frac{\frac{1}{N}{\sum}_{i=1}^N{\left({\hat{y}}_i-{y}_i\right)}^2}{\frac{1}{N}{\sum}_{i=1}^N{\left({y}_{i+1}-{y}_i\right)}^2}\right)\times 100 $$
(5)
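The validation scheme and the accuracy measures can be sketched together as below. The example uses a one-predictor linear model as a stand-in for the MLR and ANN models, uses the common form of the adjusted R2, and reads the skill score as the percentage reduction of the mean squared error relative to persistence, computed over the years that have a previous-year value. These choices and all names are assumptions made for illustration.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_predictions(x, y):
    """Predict each observation from a model fitted without it."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    return preds


def mse(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    return math.sqrt(mse(pred, obs))


def adjusted_r2(r2, n, p):
    """R2 corrected for sample size n and number of predictors p."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def skill_score(pred, obs):
    """Percentage MSE reduction relative to persistence (previous-year value)."""
    model_mse = mse(pred[1:], obs[1:])
    persistence_mse = mse(obs[:-1], obs[1:])
    return (1.0 - model_mse / persistence_mse) * 100.0


# Made-up standardised predictor and yield values.
x = [-1.5, -0.5, 0.0, 0.4, 1.6]
y = [-1.2, -0.7, 0.3, 0.2, 1.4]
cv_pred = loocv_predictions(x, y)
cv_rmse = rmse(cv_pred, y)
cv_skill = skill_score(cv_pred, y)
```

A positive skill score means the cross-validated model has a smaller squared error than simply repeating last year's yield; a value of 100 would correspond to perfect predictions.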

Having statistically modelled the standardised anomalies of wheat and barley yields at the regional level by MLR and ANN techniques, the potential of the modelled cereal data for prediction of crop losses is assessed. Here, crop yield loss is defined as values of standardised yield anomaly below zero, indicating the years when harvested cereal crops are below the mean value. The MLR and ANN model’s performance regarding the loss of crop yield (yield anomaly < 0) is assessed in terms of contingency tables and the associated categorical scores (Wilks 2006): frequency bias (FB), success ratio (SR), hit rate (HR), and false alarm rate (FAR). The score FB describes the ratio of the estimated and observed events and measures the ability of the models to underestimate (FB < 1) or overestimate (FB > 1) the occurrences of crop-loss. For example, models with FB > 1 indicate that occurrences of crop-loss were modelled more often than they occur and FB = 1 indicates that the model is unbiased. The score SR describes the ratio between the hits and the estimated events and gives information about the likelihood of a crop loss, given that it was predicted by the model. The HR and FAR scores correspond respectively to the rate of correct forecast of crop loss (proportion of occurrences which are hits) and the rate of wrong forecast of crop loss (proportion of non-occurrences which are false alarms).
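The four categorical scores follow directly from the 2 × 2 contingency table of observed and modelled crop-loss years, as sketched below. The function name and the anomaly values are hypothetical, and the sketch assumes that the sample contains at least one observed loss, one modelled loss and one observed non-loss so that no denominator is zero.

```python
def contingency_scores(observed_anomaly, modelled_anomaly):
    """FB, SR, HR and FAR for the event 'standardised yield anomaly < 0'."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, mod in zip(observed_anomaly, modelled_anomaly):
        obs_loss, mod_loss = obs < 0, mod < 0
        if obs_loss and mod_loss:
            hits += 1
        elif obs_loss:
            misses += 1
        elif mod_loss:
            false_alarms += 1
        else:
            correct_negatives += 1
    return {
        # modelled events over observed events
        "FB": (hits + false_alarms) / (hits + misses),
        # hits over modelled events
        "SR": hits / (hits + false_alarms),
        # hits over observed events
        "HR": hits / (hits + misses),
        # false alarms over observed non-events
        "FAR": false_alarms / (false_alarms + correct_negatives),
    }


# Made-up standardised anomalies for six years.
observed = [-1.2, 0.5, -0.3, 0.8, 1.1, -0.7]
modelled = [-0.9, -0.2, 0.1, 0.6, 0.9, -0.4]
scores = contingency_scores(observed, modelled)
```

In this made-up sample there are two hits, one miss, one false alarm and two correct negatives, so the model is unbiased (FB = 1) even though it misplaces one loss year.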

3 Results

3.1 Cereals and drought indicators during low yield years

Two clusters of provinces dominated by rainfed agricultural practices are identified (Fig. 1), according to the criteria described in Section 2.1. Both clusters are in Spain, approximately in the regions of Castilla-León and Castilla-La Mancha. The northern provincial cluster (Cluster 1) includes 5 provinces (Zamora, Valladolid, Palencia, Segovia, and Burgos) and the southern provincial cluster (Cluster 2) includes 4 provinces (Toledo, Cuenca, Ciudad Real, and Albacete). Figure 2 shows the spatial averages of wheat and barley yield from 1986 to 2012, computed for each provincial cluster. The corresponding trends and detrended time-series (yield anomalies) are also illustrated. The temporal evolution of the yield anomalies shows low values (i.e. below the 25th percentile) during drought episodes over the IP, particularly during the events of 1992, 1995, and 2005 (years associated with low yield in both clusters and for both cereals). Figure 2 shows that the more recent events of 1995 and 2005 produced more negative yield anomalies in the southern sector of the IP (cluster 2, Fig. 2 bottom panel), while 1992 produced more negative yield anomalies in the northern sector (cluster 1, Fig. 2 top panel). Overall, the temporal evolution of wheat and barley yield anomalies is similar in both provincial clusters, although the respective productions and total crop areas are quite distinct at the province level (not shown).

Fig. 1 Selected clusters of provinces corresponding to the agricultural drought-prone areas. Cluster 1 provinces: Burgos (1), Palencia (2), Segovia (3), Valladolid (4), and Zamora (5). Cluster 2 provinces: Albacete (6), Ciudad Real (7), Cuenca (8), and Toledo (9)

Fig. 2 Wheat and barley yields (grey line), trends (dashed line and respective equation), anomalies (black line) and 25th percentile of yield anomalies (dotted line) during the period 1986–2012 over the two selected provincial clusters. The common years associated with low yield anomalies (i.e. below the 25th percentile) are denoted by a circle: 1992, 1995 and 2005

Figures 3 and 4 show drought severity during the individual low yield years of 1992, 1995, and 2005, based on the values, spatially averaged over each cluster, of the remote sensing indices (VCI, TCI and VHI, Fig. 3) and the SPEI (Fig. 4). Figure 3 shows the weekly values of VCI, TCI, and VHI from week 35 (year n − 1) to week 25 (year n), corresponding approximately to the period between the sowing and harvesting of the winter cereals, i.e. from September of the previous year to June of the harvest year. According to the remote sensing indices (Fig. 3), there is little or no evidence of drought conditions (values below 40) during the first growth stages (until January) for both clusters during the 3 considered years; in particular, 1992 featured cold autumn/winter weeks, as indicated by the high values of TCI (low values of BT). Similar values of TCI are found in 2005 during intermediate growth stages of the cereal life cycle (week 11), indicating a spring with low temperatures. On the other hand, in 1995 during the intermediate and final growth stages (more evident in cluster 1), the VCI values are indicative of favourable moisture conditions (NDVI increase), in contrast with the hotter conditions indicated by low values of TCI (BT increase). Nevertheless, there is almost no evidence of drought based on VHI during 1995 and 2005 in cluster 1 (except in June). The highest number of drought weeks recorded by the VHI (also coincident with low values of VCI and TCI) is found in cluster 1 during 1992, and in cluster 2 during 1995 and 2005. While the onset of drought conditions in 1992 (cluster 1) occurs during the vegetative growth stages (winter), 1995 and 2005 (cluster 2) show less favourable conditions slightly later. This feature is consistent with the spatial pattern of the 2005 drought, which was more intense in southern Iberia (Gouveia et al. 2012).

Fig. 3 Weekly values of spatial averages of VCI (Vegetation Condition Index), TCI (Temperature Condition Index) and VHI (Vegetation Health Index) between week 35 (beginning of September of the year n − 1) and week 25 (end of June of the year n), during the low yield years of 1992 (top panel), 1995 (middle panel) and 2005 (bottom panel) at cluster 1 (left) and cluster 2 (right). Values below 40 indicate drought conditions

Fig. 4 Monthly values of spatial averages of SPEI at time scales from 1 to 12 months (y axis) between January and June during 1992 (top panel), 1995 (middle panel) and 2005 (bottom panel) at the cluster 1 (left) and cluster 2 (right). Values between 1 and − 1 correspond to near normal conditions, and values below − 1 and above 1 indicate dryness and wetness, respectively

Figure 4 shows the monthly values of the SPEI at the different time-scales (1–12 months) between January and June, corresponding approximately to the period between the vegetative growth stage and harvesting. In 1992, the overall pattern shows values of SPEI indicating drought or near normal conditions, namely for the first months of the year and for longer time scales. On the other hand, spring months do not present a marked pattern, showing a tendency to wet conditions, in particular in June for cluster 2. This feature may be associated with the non-drought conditions based on TCI and VHI (low temperatures and favourable vegetation conditions), in contrast with the drought conditions displayed by the VCI (moisture stress) (see Fig. 3 top panels). This finding suggests that, despite the favourable conditions indicated by SPEI, TCI, and VHI, vegetation greenness was low (low values of VCI) during the final growth stages. Moreover, cluster 1 does not show a clear pattern of drought conditions in 1995 and 2005. In fact, during these years the drought, as obtained by SPEI, is evident only in April (June) during 1995 (2005). On the other hand, extreme drought conditions, according to SPEI, were observed in 1995 and 2005 over cluster 2, being stronger in May (June) during 1995 (2005). These results are also in accordance with the ones obtained using the vegetation indices (Fig. 3).

In general, a good agreement is found between the most negative yield anomalies (Fig. 2) and the drought-affected weeks according to the remote sensing indices and the drought-affected months according to the SPEI (Figs. 3 and 4). In 1992, detrended yield time-series display stronger negative anomalies in cluster 1 than in cluster 2 (Fig. 2), in accordance with the drier conditions suggested by the remote sensing indices and SPEI in cluster 1 (Figs. 3 and 4). Similarly, 1995 and 2005 display more pronounced negative anomalies of yield (Fig. 2) and drier conditions (Figs. 3 and 4) in cluster 2. In conclusion, negative yield anomalies accompanied by dry conditions in 1992 were more pronounced in the northern sector of the IP (cluster 1), while the same conditions in 1995 and 2005 were more pronounced in the southern sector (cluster 2).

3.2 Relationships between cereal yield and drought indicators

To investigate the strength of the relationship between the winter cereal yields and the remote sensing indices, and to identify the moments of the vegetative cycle at which the crops are most sensitive to moisture (VCI) and thermal conditions (TCI), a correlation analysis was performed (Fig. 5). Figure 5 shows the correlation coefficients between the winter cereal yields (barley and wheat) and the remote sensing indices over the two agricultural provincial clusters from week 35 to week 25. Generally, VCI, TCI, and VHI display low correlations during the first growth stages of both rainfed cereals (during autumn and the beginning of winter) and a sharp increase from the intermediate growth stages to the harvest time over both provincial clusters. This feature is consistent with Fig. 3, which shows no evidence of drought conditions until later growth stages during the low yield years (1992, 1995, and 2005) according to the VCI, TCI, and VHI. In the same way, correlation values suggest that the greatest influence of the remote sensing indices is observed during the spring and summer months, corresponding to the moments in which the vegetation is photosynthetically more active.

Fig. 5

Correlations between the weekly values of VCI (Vegetation Condition Index), TCI (Temperature Condition Index), VHI (Vegetation Health Index), and the wheat yield (full line) and the barley yield (dashed line) in cluster 1 (left) and cluster 2 (right), between 1986 and 2012. The significant correlations at 95% level of confidence are marked with a dot over the line

Moreover, between the late winter and the early summer, VHI and VCI correlation values are statistically significant, whereas TCI significant correlations are found between early spring and early summer. This aspect points out that while water stress (VCI) on vegetation exhibits stronger correlations during early-spring (late February and early March approximately), heat stress (TCI) shows stronger correlations slightly afterwards during the latter growth stages (from the 14th week (April) onwards). In other words, Fig. 5 suggests that crop yield decline is associated with moisture depletion on vegetation (low VCI) during early-spring and with high temperatures (low TCI) close to the harvest time. The correlations obtained with the VHI are generally stronger than with the VCI and TCI and exhibit a peak during late-spring with potential impacts on the maturation of both wheat and barley grains. The temporal evolution of the correlation values is very similar in both cereals throughout the crop life cycle, with some barley correlation values slightly stronger in cluster 2.
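The week-by-week screening described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes a Pearson correlation tested with a two-tailed Student-t criterion, and the function names, week labels, and numbers are hypothetical. The critical t value has to be supplied by the user (about 2.060 for the 27 years between 1986 and 2012, i.e. 25 degrees of freedom; 2.776 for the 6-value toy series used here).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r(n, t_crit):
    """Smallest |r| that is significant for n samples, given the two-tailed
    Student-t critical value for n - 2 degrees of freedom."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

def weekly_correlations(index_by_week, yield_anomaly, t_crit):
    """Map each week label to (r, is_significant)."""
    r_min = critical_r(len(yield_anomaly), t_crit)
    result = {}
    for week, series in index_by_week.items():
        r = pearson_r(series, yield_anomaly)
        result[week] = (r, abs(r) >= r_min)
    return result

# Illustrative numbers only (not data from the study).
toy_yield = [-1.2, 0.4, 0.9, -0.3, 1.1, -0.9]   # standardised yield anomalies
toy_vci = {
    "w14": [20, 55, 70, 40, 80, 25],            # spring week, tracks the yield
    "w40": [50, 48, 52, 47, 51, 53],            # autumn week, nearly unrelated
}
toy_result = weekly_correlations(toy_vci, toy_yield, t_crit=2.776)
```

With the toy series, only the spring week passes the significance threshold, which mirrors the qualitative pattern reported for Fig. 5.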

The crop response to each SPEI time-scale (1 to 12 months) was evaluated for each cereal in each cluster, approximately from the vegetative phase to harvest (January to June). The results are illustrated in Fig. 6 for the 1-, 3-, 6-, 9-, and 12-month time-scales, representative of the shorter (1 and 3), medium (6), and longer (9 and 12) time-scales. As in Fig. 5, SPEI displays lower correlations during the vegetative growth stages of both rainfed cereals over the two provincial clusters (Fig. 6). The SPEI exerts its major influence during the spring months (April to June), particularly at the shorter time-scales (1 and 3 months), corresponding to the intermediate and final growth stages. May exhibits the strongest correlations in all cases. At the time-scales of 6, 9, and 12 months, the winter correlation values are more pronounced than at the shorter time-scales (1 and 3 months), and the differences between seasons are less evident. In cluster 2, the correlations with the 6- and 9-month SPEI are statistically significant throughout the whole growth cycle, while in the other cases most of the statistically significant correlations are found during the intermediate growth stages (spring). At all time-scales, the number of statistically significant correlations is higher in cluster 2 (southern sector) than in cluster 1 (northern sector). Moreover, at all time-scales, the impact of SPEI on cereal yield is registered earlier in cluster 2 than in cluster 1.

Fig. 6

Correlations between average SPEI and wheat yield (full line) and barley yield (dashed line) in cluster 1 (left) and cluster 2 (right), between January and June of 1986–2012. The results are illustrated for 1, 3, 6, 9 and 12-month time-scales, representative of the shorter, medium and longer time-scales. The significant correlations at 95% level of confidence are marked with a circle

In general, the spring correlations in cluster 1 reach stronger values in Fig. 5 than in Fig. 6, suggesting that in the northern sector cereal yield is more strongly related to the remote sensing indices than to SPEI. On the other hand, cluster 2 exhibits more months with statistically significant correlations in Fig. 6 (particularly at the 6- and 9-month time-scales) than in Fig. 5, suggesting stronger relationships between SPEI and cereal yield in the southern sector (cluster 2).

3.3 Statistically significant predictors/inputs

The correlation analysis between yield and the drought indicators revealed significant temporal differences in drought impact and identified the moments of the vegetative cycle when the crops are most vulnerable to drought conditions (Figs. 5 and 6). Redundant information should therefore be removed in order to find the time-scales and months of SPEI, together with the weeks of VHI, VCI, and TCI, that are most suitable for accurately estimating cereal yield. The statistically significant predictors were chosen by stepwise regression (95% confidence level). Table 1 shows the predictors selected for each of the 4 combinations of predictors, resulting in 11 different subsets of input variables for the MLR and ANN models.
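The selection step can be illustrated with a simplified sketch. It is not the procedure used in the study: it performs forward selection only (a full stepwise regression also re-tests and removes predictors), it uses a fixed F-to-enter threshold of 4.0 as a stand-in for the 95% level (the exact critical value depends on the residual degrees of freedom), and all names and numbers are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on the columns plus an intercept."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def forward_select(candidates, y, f_to_enter=4.0):
    """Greedy forward selection: repeatedly add the candidate with the largest
    partial F statistic, stopping when no candidate exceeds f_to_enter."""
    selected, current = [], rss([], y)
    while True:
        best = None
        for name, col in candidates.items():
            if name in selected:
                continue
            cols = [candidates[s] for s in selected] + [col]
            df = len(y) - len(cols) - 1          # residual degrees of freedom
            if df <= 0:
                continue
            new = rss(cols, y)
            f = float("inf") if new <= 1e-12 else (current - new) / (new / df)
            if best is None or f > best[1]:
                best = (name, f, new)
        if best is None or best[1] < f_to_enter:
            return selected
        selected.append(best[0])
        current = best[2]

# Illustrative numbers only: the toy yield depends on x1 and not on x2.
toy_x1 = [1, 2, 3, 4, 5, 6, 7, 8]
toy_x2 = [3, 1, 4, 1, 5, 9, 2, 6]
toy_noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05]
toy_y = [2 * a + e for a, e in zip(toy_x1, toy_noise)]
toy_selected = forward_select({"x1": toy_x1, "x2": toy_x2}, toy_y)
```

In the toy case only `x1` enters the model, because adding `x2` does not reduce the residual sum of squares enough to pass the threshold. Strongly collinear candidates would make the normal equations singular, which this sketch does not guard against.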

Table 1 Results of the stepwise regression at the 95% confidence level for the selection of the statistically significant predictors of wheat and barley yields (p indicates the number of predictors selected for each model). The numbers of the remote sensing indices correspond to the respective weeks selected to predict the cereal yield, and the numbers of SPEI correspond to the respective month and time-scale selected by each model

Each resulting model nomenclature (Table 1) refers to the target cereal species (letter “W” for wheat and letter “B” for barley), the respective provincial cluster (clusters 1 and 2), and the possible combination of predictors (VCI and TCI—“a”; VHI—“b”; VCI, TCI, and SPEI—“c”; VHI and SPEI—“d”). For example, the model W1a refers to the wheat yield at cluster 1, based on the statistically significant weeks of VCI and TCI, and model W2c refers to the wheat yield at cluster 2, based on the statistically significant time scales and months of SPEI in addition to the better related weeks of VCI and TCI.
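The naming convention can be expressed as a small lookup, shown here as an illustrative sketch. The function and dictionary names are hypothetical; only the letter and number codes come from the text above.

```python
CEREALS = {"W": "wheat", "B": "barley"}
PREDICTOR_SETS = {
    "a": ("VCI", "TCI"),
    "b": ("VHI",),
    "c": ("VCI", "TCI", "SPEI"),
    "d": ("VHI", "SPEI"),
}

def parse_model_name(name):
    """Decode a model label such as 'W2c' into cereal, cluster, and predictor pool."""
    if len(name) != 3:
        raise ValueError(f"unexpected model label: {name!r}")
    cereal, cluster, combo = name[0], int(name[1]), name[2]
    if cereal not in CEREALS or cluster not in (1, 2) or combo not in PREDICTOR_SETS:
        raise ValueError(f"unexpected model label: {name!r}")
    return {
        "cereal": CEREALS[cereal],
        "cluster": cluster,
        "predictors": PREDICTOR_SETS[combo],
    }
```

For example, `parse_model_name("W1a")` describes the wheat model in cluster 1 built from the VCI and TCI pool. The tuple lists the candidate pool, not the predictors that the stepwise regression finally retains.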

In accordance with the correlation analysis in Fig. 6, the results of the stepwise regression (Table 1) indicate that including the drought index SPEI in the pool of possible predictors is only significant in cluster 2, for both cereals. In cluster 1, the predictor selection chooses the same variables for the pair of models “a” and “c,” and likewise for “b” and “d.” In other words, the SPEI information is redundant in cluster 1. Consequently, only the models based on VCI and TCI together (W1a and B1a, based on late-spring weeks 18, 20, 21, 23) and on VHI (W1b and B1b, based on mid-winter and late-spring weeks 50, 1 and 22) are built for cluster 1. When SPEI is included in the predictors’ pools, the SPEI of February, April, May, and June displays a significant influence in cluster 2. In fact, in models W2c and B2d the stepwise regression removes the remote sensing index weeks, leaving only SPEI information to estimate the cereal yields.

The selected weeks of the remote sensing indices suggest a predictive power based on the autumn/early-winter period and the mid-spring/early-summer weeks (Table 1). Between week 18 (~mid-April) and week 25 (~mid-June), 10 predictors (remote sensing indices) are selected, and between week 35 (~early-September) and week 1 (early-January), 9 predictors are selected. Between January and mid-spring, only the SPEI of February at the 5-month time-scale is selected as a predictor. In comparison with cluster 1, the predictor selection in cluster 2 picks a larger number of winter and late-autumn variables, particularly in the case of barley (Table 1). The models B2a, B2b, and B2c select the earlier week values of the three predictors (vegetation indices in late autumn, winter and spring), and model B2d selects the SPEI of February, April, and June, similarly to models W2c and B2c. Only the barley model B2c selects VCI and SPEI together as statistically significant predictors.

Finally, it is important to stress that most of the models select 2 or 3 predictors, whereas model B2c has the highest number of predictors (p = 5). On the other hand, only one model (W2b) selects a single predictor (VHI).

3.4 MLR and ANN models

The overall performance of the MLR leave-one-out cross-validation is shown in Table 2 in terms of the statistical measures described in the Data and Methods section, for the 11 possible models. The results indicate that model B2c presents the highest performance, explaining 85% of the variance of barley yield in cluster 2 based on the VCI, TCI and SPEI. In all models, the explained variance without cross-validation (R2adj_no_cv) is only slightly higher (less conservative) than the explained variance obtained by cross-validation (R2adj), supporting the robustness and reliability of the models. Models W2a and B2a explain 11% less variance with cross-validation than without it, while the remaining models lose less than 10%, down to a minimum of 5% for models W1b and B1b.
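A leave-one-out evaluation of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from the study: a single predictor is used to keep the code short (the selected models use 1 to 5 predictors), the persistence reference is the previous year's observation, and the skill score is taken as 1 - RMSE_model / RMSE_persistence. The exact definitions are those of the Data and Methods section, which is not reproduced here, and all function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def loo_predictions(x, y):
    """Predict each year from a model fitted on all the other years."""
    preds = []
    for i in range(len(y)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    return preds

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def adjusted_r2(obs, pred, n_predictors):
    """Explained variance penalised by the number of predictors."""
    n = len(obs)
    mean = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def skill_vs_persistence(obs, pred):
    """Assumed form: 1 - RMSE_model / RMSE_persistence, where persistence
    forecasts year t with the observation of year t - 1 (first year skipped)."""
    return 1.0 - rmse(obs[1:], pred[1:]) / rmse(obs[1:], obs[:-1])
```

Comparing `adjusted_r2` computed on in-sample fits with the value computed on `loo_predictions` gives the kind of contrast reported between R2adj_no_cv and R2adj.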

Table 2 Summary of the regression equations and the overall performance of the MLR leave-one-out cross-validation models of wheat and barley at cluster 1 and 2 (p indicates the number of predictors selected for each model). For each cereal and cluster, the models with highest performance are denoted by a *

Concerning wheat in cluster 1 (Table 2), the models W1a and W1b display the same R2adj (73%) and the same RMSE, while W1a displays a marginally higher SSRMSE, i.e. skill against persistence. Barley in cluster 1 shows higher explained variance (76%) and lower RMSE when VHI is used as predictor (B1b). The rainfed cereals in cluster 1 therefore exhibit the strongest linear relationship with the late-spring weeks of VCI and TCI as predictors in the case of wheat (W1a), and with the mid-winter and late-spring weeks of VHI in the case of barley (B1b).

Regarding cluster 2, the use of SPEI in the predictors’ pool shows an added value in combination with VCI and TCI (W2c and B2c). Models W2c and B2c display the highest values of explained variance (71 and 85%, respectively) and a skill against persistence of 69 and 78%, respectively. While barley in cluster 2 displays the strongest linear relationship based on a remote sensing index (VCI) and SPEI together, the model W2c consists only of SPEI values (VCI and TCI are not significant predictors). In comparison with the cluster 2 models without the multiscalar drought index, the inclusion of SPEI reduces the importance of the VCI and VHI values (TCI is not a significant predictor in any model in cluster 2). In the case of model B2c, the VCI of week 25 (close to the harvest) used in model B2a is “replaced” by the SPEI predictors.

Table 3 shows the performance of the ANN models in terms of the same statistical measures as those used for the MLR models. For the sake of simplicity, the ANN results are presented only for the most suitable ANN architectures according to the skill against persistence (SSRMSE). In general, good performance is obtained with between 1 and 5 hidden neurons. The model B2c again presents the highest performance, but explains slightly less variance (84%) than the B2c MLR model (85%). As in Table 2, the ANN statistics obtained with cross-validation support the robustness of the models (Table 3). However, the explained variance of models W2a and W2b decreases significantly with cross-validation, while in the remaining models the difference with and without cross-validation is similar to that observed for the MLR models.
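The architecture search can be illustrated with a deliberately small network. This is a sketch under stated assumptions rather than the configuration used in the study: one hidden layer with tanh units, a linear output, plain stochastic gradient descent, standardised inputs and targets, and hypothetical function names and hyperparameters. Candidate hidden-layer sizes are ranked by leave-one-out RMSE, which gives the same ordering as a skill score of the form 1 - RMSE / RMSE_reference as long as the reference is fixed.

```python
import math
import random

def init_net(n_in, n_hidden, rng):
    """Small random weights; the last entry of each weight list is the bias."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w1, w2

def forward(net, x):
    """Return (hidden activations, network output) for one input vector."""
    w1, w2 = net
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    return h, sum(w * v for w, v in zip(w2, h)) + w2[-1]

def train(X, y, n_hidden, epochs=500, lr=0.05, seed=0):
    """Fit the network by stochastic gradient descent on the squared error."""
    net = init_net(len(X[0]), n_hidden, random.Random(seed))
    w1, w2 = net
    for _ in range(epochs):
        for x, target in zip(X, y):
            h, out = forward(net, x)
            err = out - target
            for j, hj in enumerate(h):
                # Back-propagate through tanh before updating w2[j].
                grad_pre = err * w2[j] * (1.0 - hj * hj)
                w2[j] -= lr * err * hj
                for k, xk in enumerate(x):
                    w1[j][k] -= lr * grad_pre * xk
                w1[j][-1] -= lr * grad_pre
            w2[-1] -= lr * err
    return net

def loo_rmse(X, y, n_hidden, **kwargs):
    """Leave-one-out RMSE for a given number of hidden neurons."""
    sq = 0.0
    for i in range(len(y)):
        net = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_hidden, **kwargs)
        sq += (forward(net, X[i])[1] - y[i]) ** 2
    return math.sqrt(sq / len(y))

def pick_hidden_size(X, y, sizes=(1, 2, 3, 4, 5), **kwargs):
    """Return the hidden-layer size with the lowest leave-one-out RMSE."""
    scores = {h: loo_rmse(X, y, h, **kwargs) for h in sizes}
    return min(scores, key=scores.get), scores
```

A call such as `pick_hidden_size(X, y, sizes=(1, 2, 3), epochs=300)` returns the preferred size together with the score of every candidate. With only 27 years of data, small hidden layers are the natural range to search, which is in line with the 1 to 5 neurons reported above.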

Table 3 Summary of the ANN architectures according to the maximum SSRMSE value and the overall performance of the respective ANN leave-one-out cross-validation (p indicates the number of predictors selected for each model). The architectures indicate the number of neurons in input, hidden and the output layers respectively. For each cereal and cluster, the models with highest performance are denoted by a *. The models whose performance of the ANN techniques improves the MLR results (Table 2) are denoted by a ‘

The statistics presented in Table 3 indicate that 5 ANN models (W1a, B1a, B1b, W2c and B2b, denoted by a ‘) improve on the MLR results (Table 2). As with the linear regression statistics, the models W1a, B1b, W2c, and B2c display the strongest relationships, explaining 85, 83, 73, and 84% of the variance in the case of the ANN models, against 73, 76, 71, and 85% in the case of the MLR models, respectively (denoted by a * in Tables 2 and 3). Hereafter, results are presented only for the models W1a, B1b, W2c, and B2c, since they present the best performance for each cereal in each cluster considering both MLR and ANN techniques (Tables 2 and 3). Except for B2c, the best-performing models are slightly improved by the ANN techniques. The overall good performance of the models is illustrated in Fig. 7, which shows the time-series of the cereal observations in each cluster, together with the respective estimations using the ANN and MLR methods. The considerable similarity between the two techniques for model B2c is clear in the bottom panel of Fig. 7, while some differences are observed in the other models.

Fig. 7

Wheat and barley time-series of observations (full line) from 1986 to 2012 in clusters 1 (top two panels) and 2 (bottom two panels) and respective statistical estimations using MLR (dotted line) and ANN (dashed line) methods with the strongest statistical relationships (W1a, B1b, W2c, and B2c)

A summary of the contingency table results for the occurrence of crop yield losses (standardised yield anomaly < 0) is presented in Fig. 8, comparing the performance of the MLR and ANN techniques for the models W1a, B1b, W2c, and B2c. The results show that the models W1a, B1b, and B2c based on ANN, and B1b based on MLR, slightly overestimate the yield losses, while the remaining models are almost unbiased (FB~1). Generally, all models correctly estimate a large number of crop-loss occurrences (true positives, or hits), with HR values higher than 69%. Except for B2c, the ANN models display higher HR values than the MLR models, estimating more occurrences of crop loss. The SR values indicate that in the case of B1b and W2c, the likelihood of crop loss occurring, given that it was estimated by the model, is higher with ANN than with MLR techniques. In comparison with wheat, the barley models display slightly higher values of SR and HR. The cereal models in cluster 2 display lower values of SR and HR and higher values of FAR than those in cluster 1.
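The four scores can be derived from a 2x2 contingency table, as in the sketch below. The formulas are the usual forecast-verification definitions and are an assumption here, since the study's own definitions are given in its Data and Methods section. In particular, FAR is taken as the false alarm rate, false alarms / (false alarms + correct negatives). The function name and the numbers are hypothetical.

```python
def contingency_scores(observed, estimated, threshold=0.0):
    """Scores for the event 'crop loss' (standardised anomaly below threshold).

    Raises ZeroDivisionError if a denominator is empty, for instance when
    no loss was observed at all.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for obs, est in zip(observed, estimated):
        obs_loss, est_loss = obs < threshold, est < threshold
        if obs_loss and est_loss:
            hits += 1
        elif obs_loss:
            misses += 1
        elif est_loss:
            false_alarms += 1
        else:
            correct_negatives += 1
    return {
        "FB": (hits + false_alarms) / (hits + misses),   # > 1 means losses are overestimated
        "HR": hits / (hits + misses),                    # fraction of observed losses captured
        "SR": hits / (hits + false_alarms),              # fraction of estimated losses that occurred
        "FAR": false_alarms / (false_alarms + correct_negatives),
    }

# Illustrative anomalies only (not data from the study).
toy_obs = [-1.0, 0.5, -0.3, 0.8, -0.6, 0.2, 1.1, -0.2]
toy_est = [-0.8, 0.3, 0.1, -0.1, -0.4, 0.4, 0.9, -0.5]
toy_scores = contingency_scores(toy_obs, toy_est)
```

In the toy series there are 3 hits, 1 miss, 1 false alarm and 3 correct negatives, so the estimate is unbiased (FB = 1) even though one loss is missed and one is falsely flagged.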

Fig. 8

Summary of the contingency table results of the occurrence of crop-loss (standardised yield anomaly < 0) in terms of frequency bias (FB), success ratio (SR), hit rate (HR), and false alarm rate (FAR), based on MLR (white bars) and ANN (black bars) methods of the models W1a, B1b, W2c, and B2c

4 Discussion and conclusions

This work aimed to assess the influence of drought conditions on agricultural yields over the IP, considering remote sensing (VCI, TCI, and VHI) and multiscalar (SPEI) drought indices as predictors of rainfed cereal yields. The exposure analysis performed in this work allowed for the identification of distinct geographical areas in the IP exposed to agricultural drought, according to the use of dryland for agriculture. Using different criteria from those applied in the present work, Hernández-Barrera and Rodríguez-Puebla (2017) also identified two different regions in the IP by applying a cluster analysis based on wheat yield variability. Another approach, followed by Iglesias and Quiroga (2007), selected 5 sites representing the major rainfed and irrigated agricultural regions of Spain. In this work, the analysis of exposure to agricultural drought in terms of dryland allowed for the study of more than one cereal grown under rainfed conditions, and proved to be suitable for wheat and barley. Moreover, aggregating provinces with a similar percentage of arable land allowed a small number of models to be estimated that are suitable for a larger number of provinces.

We found that the spatial averages of wheat and barley yield computed for each cluster exhibited low yield anomalies during 1992, 1995, and 2005 (Fig. 2), coinciding with the main drought events that affected the IP (García-Herrera et al. 2007; Andrade and Belo-Pereira 2015). The drought conditions identified with the remote sensing and multiscalar indices (Figs. 3 and 4) coincide with the low yield anomalies: drier conditions were found in the northern cluster in 1992 (anomalies more negative in cluster 1 according to Fig. 1), while drier conditions were found in the southern cluster in 1995 and 2005 (anomalies more negative in cluster 2 according to Fig. 1). The temporal evolution of the drought hazard during the individual low yield years of 1992, 1995 and 2005 (Figs. 3 and 4) and the correlation analysis (Figs. 5 and 6) also suggested a minor influence of drought conditions during the initial growth stages (autumn/winter) and the greatest influence during the intermediate and final growth stages (spring/summer), when the vegetation is photosynthetically more active. Stronger relationships with NDVI were also found by Vicente-Serrano et al. (2006) during the flowering stages of wheat and barley in north-east Spain (Middle Ebro valley).

Given the importance of assessing the crops’ vulnerability to dry conditions at different stages of plant development, we also looked for the moments of the vegetative cycle when the crops are most sensitive to moisture (VCI) and thermal (TCI) conditions (Fig. 5). The correlation values between crop yield and the remote sensing indices suggested that crop yield decreases with moisture depletion (low values of VCI) during early spring (and increases with higher water content), and decreases with excessively high temperatures (low values of TCI) close to harvest time (and improves with lower temperatures). This highlights the importance of both water content and air temperature for cereal productivity, and the advantage of combining the contributions of moisture and thermal conditions through the remote sensing indices. The effects of water stress and high temperatures during the middle growth stages of the crop life cycle are in accordance with previous studies (García del Moral et al. 2003; Iglesias and Quiroga 2007; Ferrise et al. 2011).

The use of remote sensing and multiscalar indices (Figs. 2 and 3) made it possible to analyse the vegetation response to drought conditions over large regions and at different time-scales of drought occurrence. The dominant time-scales at which drought influences crop yield are the longer ones (6 to 12 months) throughout January to June, and a pronounced impact is found during the spring months (April to June) at the shorter time-scales (1 to 6 months) (Figs. 5 and 6). These results are in accordance with previous work by the authors, which stresses the stronger impact of longer time-scales and identifies spring as the dominant season of winter cereal yield dependence on drought conditions based on SPEI, particularly in Spain (Páscoa et al. 2017).

Spatial differences were also revealed by the correlation analysis and the statistical modelling, suggesting that, in comparison with cluster 1 (northern sector), cluster 2 (southern sector) is impacted by dry conditions earlier (Figs. 5 and 6), in accordance with the geographical location and respective climate variability of the provinces. According to Rodriguez-Puebla et al. (1998), the spatial patterns of the precipitation regime in Spain exhibit strong gradients, with higher values in the northwestern sector and lower values in the southeastern sector. In addition, the southern sector of Iberia has been exceptionally affected by severe drought events, particularly during the recent episode of 2004/2005 (García-Herrera et al. 2007; Gouveia et al. 2009, 2012), which was also a year with larger negative yield anomalies in cluster 2 (Figs. 2, 3, and 4).

Significant regional differences were also found regarding the potential of combining the multiscalar and remote sensing approaches. The correlation analysis results in Figs. 5 and 6 suggested stronger relationships between remote sensing indices and cereal yield in the northern sector (cluster 1), and stronger relationships between SPEI and cereal yield in the southern sector (cluster 2). In agreement, the results of the stepwise regression for the selection of significant predictors (Table 1) suggested that including the drought index SPEI in the pools of possible predictors is only significant in cluster 2, for both cereals. These findings suggest that crop yield in the southern sector is better inferred from the drought index SPEI than from the remote sensing indices, and that it is sensitive to the balance between precipitation and evapotranspiration. In contrast, the cluster 1 models suggest a strong dependence on the health of the vegetation, and only predictors based on VCI, TCI, and VHI are selected (Table 1). Kogan et al. (2015a) and Kogan et al. (2004) have also made accurate predictions of crop yield based on VCI, TCI, and VHI for Russia and China, respectively, and have suggested the potential of using the remote sensing of vegetation health to assess weather-related crop losses.

The combined use of remote sensing data (NDVI) and multiscalar drought indices (SPI) has already been considered by Vicente-Serrano et al. (2006) to model wheat and barley yields in Spain. Vicente-Serrano et al. (2006) found that including NDVI in a linear regression model based on SPI (February at the 1-month time-scale) increases the model’s performance. Moreover, they showed the potential of the combined use of NDVI and SPI to predict cereal production four months prior to harvest. Similarly, we addressed the ability to estimate crop yield during growth stages early enough before harvesting. Table 1 indicates that the models based on remote sensing indices depend largely on the weekly values of mid-winter (December and January) and mid-spring to early-summer (late April to June), suggesting that satellite-based data have predictive power for crop yield. The selection of the SPEI of February, April, May, and June in cluster 2 also suggests the predictive power of a range of drought time-scales for crop-loss estimation.

The MLR and ANN results suggest that the models displaying the strongest relationships are the same for both statistical techniques, and that the strength of the statistical relationships found by the linear and non-linear methods is quite similar (Tables 2 and 3). However, regarding the 4 models with the strongest relationships for the two cereals in the two clusters (W1a, B1b, W2c and B2c), the ANN techniques improve on the MLR models except in the case of barley in cluster 2 (Table 3 and Fig. 7). The explained variance of model W1a increases by 12 percentage points when ANN is used instead of MLR (from 73 to 85%), while that of models B1b and W2c increases by 7 and 2 percentage points, respectively (from 76 to 83% and from 71 to 73%).

Although ANN slightly outperforms MLR in 3 of the 4 best models (W1a, B1b, and W2c, the exception being B2c), its ability to estimate yield losses is overstated: HR and SR are higher with ANN than with MLR, but the FB values obtained with ANN generally indicate overestimation. The cereal models in cluster 2 display lower values of SR and HR and higher values of FAR than those in cluster 1, suggesting that, despite the ability of SPEI to represent the average yield variability in the southern sector, it estimates crop loss less well than the remote sensing indices do in cluster 1. However, most of the crop-loss events are estimated (high values of HR) by all models, suggesting the potential of the proposed methodology for modelling wheat and barley losses in the IP.

A substantial number of studies have already suggested that ANN performs better than MLR in cereal yield modelling (Jiang et al. 2004; Matsumura et al. 2015). In the Mediterranean region in particular, Incerti et al. (2007) proposed an ANN-based drought risk analysis for South Italy using precipitation, temperature, evapotranspiration, NDVI and land cover. The impact of climate change on durum wheat over the Mediterranean basin has also been addressed with ANN by Ferrise et al. (2011). Using an alternative statistical technique, partial least squares regression, Hernandez-Barrera et al. (2017) analysed the climate change impacts on wheat yield over Spain. Results by Ferrise et al. (2011) suggested that the projected warmer and drier climate will increase the risk of yield loss in the Mediterranean, and Hernandez-Barrera et al. (2017) suggested that climatic warming will lead to a decrease of about 32% in Spanish wheat production in the twenty-first century. Hence, an improved assessment of the impacts of current drought conditions on agricultural crop yield is becoming crucial in a climate change context. The establishment of novel statistical techniques for crop modelling, such as ANN, constitutes an important step towards developing strategies to mitigate agricultural drought risk.

Besides a slight overestimation of yield losses, a limitation of the presented results is that future yield losses of wheat and barley are not forecast. Nevertheless, the present study indicates that the harvestable yield of the current year can be estimated from mid-winter and mid-spring drought indicators. In addition, it would be interesting to compute the drought index SPEI from climate projections of precipitation and temperature, apply the statistical relationships found in the present study, and compare the results with recent works. Another potential use of this study in future research is to evaluate the suitability of the regional-scale crop yield models for each province of the IP individually. Future work should also cover other agricultural areas of the IP and look towards the development of crop-specific agricultural drought risk models (e.g. using a probabilistic approach) based on the established models.

In summary, the statistical methodology used in this analysis relied on yield information at the province scale, and the results have shown the potential of crop yield modelling based on multiscalar (SPEI) and remote sensing (VCI, TCI, and VHI) indices, using two empirical techniques (MLR and ANN), to provide estimations of drought impacts over large areas. In contrast, numerous modelling tools that integrate the complex biophysical interactions of crop growth (mechanistic crop simulation models) have been used by several authors (Paredes et al. 2014; Giménez et al. 2016; Paredes et al. 2016); these generally require careful calibration and several in-situ measurements, and are usually limited to local/field scales. The model outcomes obtained with the presented methodology are suitable for broader scales, and highlight the usefulness of such analysis in the framework of developing an agricultural drought risk model for cereal yields in the IP. From an operational point of view, the results aim to contribute to an improved understanding of crop yield management under dry conditions, particularly regarding rainfed winter crops. Moreover, the present study can provide some guidance for users’ decision-making in agricultural practices in the IP, for example by assisting farmers in deciding whether to purchase crop insurance.