1 Introduction

Assessment of agricultural impacts of climate change at regional or local level requires accurate and high resolution climate projections (Mearns et al. 2003), as even small biases in the climate variables can have significant consequences when physical and/or biological thresholds are critical for crop growth and development. For instance, some climate models overestimate the occurrence of freezing temperatures in southern Europe (Kjellström et al. 2010; Domínguez et al. 2013), which can lead to unrealistic estimates of freezing damage to crops. Also, overestimation in maximum temperatures (in particular the number of days above 35 °C, Ruiz-Ramos et al. 2011) can lead to overestimation of yield loss due to heat stress during flowering and grain filling for summer crops in the Iberian Peninsula (IP). Besides, crop models simulate crop development by accumulating daily mean temperature above a base temperature. For these reasons, when using the results of climate models as an input for impact assessment, the biases should be carefully evaluated and where necessary reduced (Wood et al. 2004; Baigorria et al. 2007; Teutschbein and Seibert 2010).

Global Climate Models (GCMs) generate the variables needed for impact assessment at a spatial resolution generally considered too coarse for most impact studies. One of the downscaling approaches consists of a Regional Climate Model (RCM) forced by boundary and initial conditions generated by a GCM (Giorgi 1990; Wang et al. 2004). Compared with GCMs, RCMs improve the representation of spatial variability and the simulation of extreme events (Sánchez et al. 2004, 2011; Domínguez et al. 2013). RCMs are generally considered to improve the applicability of simulated climate for impact assessment, especially in regions of complex orography such as the IP (Mínguez et al. 2007).

However, climate output from RCMs still presents biases, i.e. systematic deviations of simulated values from the observed values (Christensen et al. 2008), and may offer only a slight improvement for fine-scale geographic features, insufficient to compensate for the GCM biases (Glotter et al. 2014). Some of these biases are inherited from the driving GCM, while others are intrinsic to the RCM (Kjellström et al. 2010; Nikulin et al. 2011). Comparisons with observations may also be hampered by uncertainties in the observations themselves. The variables presenting large biases vary regionally; for instance, for the RCM ensemble produced in the framework of the ENSEMBLES EU Framework Program Project (van der Linden and Mitchell 2009), a warm bias was reported for the IP, where summer bias can be related to a combination of incomplete representation of cloud cover and soil moisture (Maraun 2012), while an underestimation of precipitation was reported for some RCMs (Christensen et al. 2008; Domínguez et al. 2013). Bias reduction also has consequences for climate projections (e.g. Dosio et al. 2012; Bosshard et al. 2013). For instance, when a warm bias in summer in the Mediterranean was reduced, projected future warming was found to decrease by up to one degree in the ensemble mean (i.e., by up to 10–20 % of the unadjusted projected change) (Boberg and Christensen 2012).

Biases can be reduced by several techniques, e.g. the delta change method, which imposes the climate change signal from GCMs or RCMs on observations without changing the higher moments of the distribution. By using a transfer function (TF), Piani et al. (2010a, b) corrected precipitation and temperature biases from an RCM and a GCM, showing good performance not only for means but also for time-dependent statistical properties. Their method was adapted by Dosio and Paruolo (2011) and Dosio et al. (2012) to reduce the bias in the ENSEMBLES RCM ensemble, using the observational dataset E-OBS (Haylock et al. 2008) as the reference. This bias correction approach has also been applied for improving crop yield prediction (Ines and Hansen 2006; Oettli et al. 2011), although there is an ongoing debate on the consequences and convenience of bias correcting (e.g. Liu et al. 2014). Bias can also be reduced by using a weather generator (Jones et al. 2011). The above-mentioned studies apply a single post-processing technique; few studies have inter-compared the effect of different bias reduction options on impact projections (see the review by Teutschbein and Seibert 2012, and Ruffault et al. 2014, for hydrological impacts).

In this study we compared several post-processing methods to correct the biases of the ENSEMBLES RCM simulations for the IP. The bias-corrected results were first evaluated over the present climate (1971–2000). The corrected datasets were also evaluated in terms of their applicability in crop impact studies for the near (2021–2050) and far future (2071–2100).

2 Data and methods

2.1 Crop modelling

CERES-maize (Jones and Kiniry 1986) is a crop model that includes ecophysiological relationships driving crop growth and development, and simulates the effects of temperature and CO2 changes on crop photosynthesis and transpiration rates. CERES uses a daily time step and daily data of maximum and minimum temperature (Tmax, Tmin), precipitation and radiation. Crop development is computed using the sum of mean daily temperature (growing degree days, GDD) above a base temperature. This model has been extensively applied to climate impact assessment (Mínguez et al. 2007 for the IP; Bassu et al. 2014).
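The thermal-time accumulation described above can be illustrated with a minimal sketch. It assumes that daily mean temperature is the average of Tmax and Tmin and uses the 8 °C base temperature for maize quoted in section 3.2; the function names are hypothetical, and the routine implemented in CERES-maize is more detailed than this simplification.

```python
def growing_degree_days(tmax, tmin, t_base=8.0):
    """Return the running sum of daily thermal time (deg C day) above t_base.

    Assumption: daily mean temperature is (Tmax + Tmin) / 2, and days with a
    mean below the base temperature contribute nothing.
    """
    total = 0.0
    cumulative = []
    for hi, lo in zip(tmax, tmin):
        t_mean = (hi + lo) / 2.0
        total += max(t_mean - t_base, 0.0)
        cumulative.append(total)
    return cumulative


def day_reaching(cumulative, target):
    """Index of the first day whose accumulated GDD meets the target, else None."""
    for i, value in enumerate(cumulative):
        if value >= target:
            return i
    return None


# Three illustrative days (not observed data).
gdd = growing_degree_days([20.0, 10.0, 30.0], [10.0, 2.0, 20.0])
stage_day = day_reaching(gdd, 10.0)
```

Because a phenological stage is reached when the accumulated sum crosses a cultivar-specific target, a systematic temperature bias shifts every simulated stage date in the same direction.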

Maize was chosen because it provides a reference for summer crops in the IP, as it comprises 11 % of the Spanish irrigated cropping area (which in turn is 22 % of the cropping land in Spain) and more than a third of the irrigated cereal area (MAGRAMA 2014). Two locations were selected, Aranjuez and Albacete (location map in supplemental material Fig. S-1), because of the availability of referenced field data and their different temperature regimes and orographic conditions. Calibration and validation were based on previous field experiments using cultivars, management, and specific soil information for each location (see supplemental material, text and Table S-1).

2.2 Observed and simulated climate datasets

Observed data from the Spanish Meteorological Agency (AEMET) stations span 1961–2010 in Aranjuez and 1971–2010 in Albacete. From both stations, daily Tmax, Tmin, precipitation and radiation were used to evaluate the performance of the simulated datasets described below under present climate.

In this study, the original set of 17 high resolution climate change projections generated in the framework of the EU FP6 project ENSEMBLES was used (hereafter referred to as ENS; see the specific RCM runs in supplemental material Table S-2) (van der Linden and Mitchell 2009). Simulations forced with the SRES A1B climate change scenario (Nakicenovic and Swart 2000) spanned the period 1961–2050 (or 1960–2100 in some cases), at a resolution of around 25 km.

The second dataset, hereafter referred to as ENS-EOBS, was produced by bias correcting a subset of the ENS dataset (12 RCMs) using the E-OBS version 3.0 observational dataset as a reference (Haylock et al. 2008; see description in supplemental material) for the 1961–1990 climate (Dosio and Paruolo 2011; Dosio et al. 2012), adapting the Piani et al. (2010a, b) technique.

Spain02 (Herrera et al. 2012) is an observational dataset for Spain with a higher density of underlying stations than E-OBS. Using Spain02 rather than E-OBS as reference could therefore improve bias reduction in Spain. Thus, we also adapted the bias correction technique used by Dosio and Paruolo (2011) to correct the ENS with respect to the Spain02 reference, generating the dataset hereafter referred to as ENS-SPAIN02.

The third dataset was generated by perturbing the CRU weather generator (WG) (Kilsby et al. 2007) with monthly change factors calculated from present and future projections of every RCM of the ENS dataset, generating the hereafter-named ENS-WG dataset. This method was also applied to the ENS-EOBS and ENS-SPAIN02 datasets, obtaining two additional datasets that combine bias correction with use of the WG, ENS-EOBS-WG and ENS-SPAIN02-WG respectively.

The last dataset consists of scenarios generated by the simple delta change method, one of the most commonly used techniques (e.g. Rötter et al. 2013). This is referred to hereafter as the DELTA dataset and was obtained by applying monthly change factors projected by individual ENS RCMs to AEMET data.

Correspondence between gridded datasets and AEMET stations was done by the nearest neighbour method. Crop simulations were replicated with all datasets (see summary of RCM-based datasets in supplemental material Table S-3) for the period 1971–2000 and for the near (2021–2050) and far future (2071–2100).
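A nearest-neighbour match between a station and the cell centres of a gridded dataset can be sketched as follows. The great-circle (haversine) distance, the function names and the coordinates are assumptions made for illustration only; the distance measure actually used is not specified in the text, and RCM output on a rotated grid may require a different treatment.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_cell(station, cells):
    """Return the (lat, lon) cell centre closest to the station."""
    return min(cells, key=lambda c: haversine_km(station[0], station[1], c[0], c[1]))


# Hypothetical station and grid-cell centres, not the actual AEMET or RCM coordinates.
station = (40.0, -3.6)
cells = [(40.25, -3.5), (40.0, -3.85), (40.05, -3.6)]
closest = nearest_cell(station, cells)
```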

2.3 Techniques of bias correction and reduction

The bias correction technique used in this study has been extensively described in Piani et al. (2010a, b), Dosio and Paruolo (2011) and Dosio et al. (2012). Briefly, it is based on the calculation of a parametric transfer function (TF) which, when applied to model output, delivers corrected output with a marginal cumulative distribution function (CDF) that matches that of the observations. The TF depends on the variable to be corrected. For temperature, the TF proposed by Piani et al. (2010b) was a linear equation with two parameters. For precipitation, the TF was a set of three equations (linear, logarithmic and exponential) with four parameters (adaptation of this method to our case is described in the supplemental material).
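The two-parameter linear TF for temperature can be illustrated with a minimal sketch. It assumes equal-length simulated and observed samples for a common calibration period and fits the line by ordinary least squares to the sorted (quantile-matched) pairs; the function names are hypothetical, and the fitting procedure in the cited papers and in the supplemental material may differ in detail.

```python
def fit_linear_tf(sim, obs):
    """Fit obs = a + b * sim on sorted pairs, so that quantiles are matched.

    Assumption: both samples have the same length and cover the same period.
    """
    xs, ys = sorted(sim), sorted(obs)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def apply_tf(values, a, b):
    """Apply the fitted transfer function to model output."""
    return [a + b * v for v in values]


# Illustrative values only: the model is too warm and too variable.
a, b = fit_linear_tf([12.0, 10.0, 14.0, 16.0], [9.0, 8.0, 11.0, 10.0])
corrected = apply_tf([20.0], a, b)
```

A slope different from 1 rescales the spread of the simulated distribution, which is how a TF of this kind can adjust the tails as well as the mean.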

Bias can also be reduced by the use of a weather generator (WG); in our case the CRU WG (Kilsby et al. 2007). The WG is calibrated on observed station data, and projection output is produced by perturbing the WG parameters with monthly change factors calculated from RCM present and future runs. The system produces daily series using two stochastic models in sequence, RainSim and the CRU WG (Kilsby et al. 2007): rainfall is generated first, and the other variables are then generated conditional on rainfall (or, for humidity and similar variables, on rainfall and temperature; details are in the supplemental material). For projection of future climate, the procedure includes applying the change factors.

A delta change-based ensemble of future projections was generated by applying mean monthly change factors from individual RCM present and future projections to observed station data (AEMET). This method only reflects changes in mean conditions and does not change the future variability. The WG used in this study can be considered as a more sophisticated delta change approach as higher-order statistics are adjusted using RCM-derived change factors.
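As an illustration, the delta change step can be sketched as below. The sketch assumes additive change factors for temperature and multiplicative (ratio) factors for precipitation, a common convention that the text does not confirm; the function name and all numbers are hypothetical.

```python
def delta_change(obs_months, obs_values, rcm_present_mean, rcm_future_mean,
                 multiplicative=False):
    """Perturb an observed daily series with monthly RCM change factors.

    obs_months: month number (1-12) of each observed value.
    rcm_present_mean / rcm_future_mean: dicts mapping month to the RCM
    monthly mean for the present and future periods.
    Assumption: additive factors by default, ratio factors if multiplicative
    (the present monthly mean must then be non-zero).
    """
    out = []
    for month, value in zip(obs_months, obs_values):
        if multiplicative:
            out.append(value * rcm_future_mean[month] / rcm_present_mean[month])
        else:
            out.append(value + rcm_future_mean[month] - rcm_present_mean[month])
    return out


months = [1, 1, 7]
future_tmax = delta_change(months, [10.0, 12.0, 30.0],
                           {1: 8.0, 7: 28.0}, {1: 9.5, 7: 31.0})
future_prec = delta_change(months, [20.0, 0.0, 5.0],
                           {1: 50.0, 7: 10.0}, {1: 40.0, 7: 5.0},
                           multiplicative=True)
```

With additive factors every day of a given month is shifted by the same amount, so the within-month variability of the observed series is unchanged; ratio factors rescale it proportionally and leave dry days dry.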

Comparing the three methods, bias correction has the advantage of correcting not only means but also distribution tails. The WG method ensures consistency among the variables, while the delta method is very simple and easy to implement.

The Nash–Sutcliffe coefficient of model efficiency (E, described in supplemental material, Nash and Sutcliffe 1970) was calculated for comparing: 1) AEMET vs. every dataset; and 2) AEMET-derived crop simulation vs. every dataset-derived crop simulation.
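For reference, the standard Nash–Sutcliffe definition is E = 1 − Σ(O − S)² / Σ(O − Ō)², where O are the reference values, S the simulated values and Ō the mean of the reference values. A minimal sketch follows; the function name is hypothetical, and the exact pairing of values (e.g. monthly means or annual yields) is given in the supplemental material rather than here.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency of sim against obs (paired, equal length).

    E = 1 is a perfect match, E = 0 means sim is no better than the mean of
    obs, and E < 0 means it is worse. Undefined if obs has zero variance.
    """
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


# Illustrative values only.
e_perfect = nash_sutcliffe([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
e_mean = nash_sutcliffe([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
e_poor = nash_sutcliffe([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```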

3 Results

3.1 Bias analysis of climate variables for the period 1971–2000

3.1.1 Uncorrected biases: ENS

The monthly Tmax from the ENS ensemble mean presented biases with respect to observations (AEMET) that ranged from 0.5 to 2 °C in Aranjuez, being higher in winter. The bias was close to 0 °C in summer (Fig. 1a). For Albacete, the biases in mean monthly Tmax were small all the year (Fig. 1d). For both locations, the amplitude of the annual cycle of variance was smaller than observed (Fig. 1 g, j), leading to an underestimation of the variance in autumn and to an overestimation in summer.

Fig. 1

Comparison of the performance of uncorrected and corrected ensembles in simulating Tmax at Aranjuez and Albacete for the period 1971–2000: monthly means (top plots) and variance (bottom plots) of Tmax. All plots include data from the observational datasets AEMET, E-OBS and SPAIN02, and from the uncorrected ensemble ENS and its spread, as references. Each column also includes data from one corrected ensemble. Left column: ENS-EOBS, where the shaded area shows the ENS-EOBS spread; central column: ENS-SPAIN02, where the shaded area shows the ENS-SPAIN02 spread; right column: ENS-WG

Biases of monthly Tmin were close to 0 °C in Aranjuez (Fig. 2a), and the variance was well simulated except in summer, when it was overestimated (Fig. 2 g). In Albacete (Fig. 2d), monthly Tmin was overestimated in winter and summer, while its variance (Fig. 2j), was underestimated throughout the year.

Fig. 2

As Fig. 1, but for Tmin

Monthly precipitation in Aranjuez (Fig. 3a) showed an overestimation in winter (up to ca. 20 mm per month), and an underestimation in May, with the same pattern for variance (Fig. 3 g). A similar pattern was found in Albacete (Fig. 3d, j).

Fig. 3

As Fig. 1, but for precipitation

3.1.2 Datasets of RCM projections with reduced bias

Temperatures from the corrected ensemble means presented biases with respect to observations (AEMET) close to 0 °C in both locations (Figs. 1 and 2). The only exception was winter and summer Tmin from ENS-EOBS, for which biases of ca. 1 °C remained, especially in Albacete (Fig. 2a, d).

However, the bias reduction did not improve the simulation of Tmax variance, although ENS-SPAIN02 was the dataset that best matched the observed variance (Fig. 1h, k). This may be explained by the fact that SPAIN02 matches AEMET variance better than E-OBS does, especially in Albacete. In the case of ENS-WG, the WG shows an annual cycle of variance parallel to that of the observations (AEMET), but with lower values. All datasets underestimated spring and autumn Tmax variance for both locations (Fig. 1, lower two rows). The simulation of Tmin variance improved for some seasons and worsened for others (Fig. 2, lower two rows).

Biases in monthly precipitation were reduced by both ENS-EOBS and ENS-SPAIN02 in Aranjuez throughout the year, and for late autumn and winter also by ENS-WG (Fig. 3, first row). The variance simulation was similar to that of ENS (Fig. 3, third row). In Albacete, the three adjusted datasets simulated better the annual cycle for both mean and variances, but precipitation was slightly underestimated throughout the year, except for ENS-WG which overestimated both mean and variance in summer (Fig. 3, third row for mean and last row for variance).

The inter-model variability for each dataset showed similar results for both locations (Figs. 1, 2, and 3): ENS-EOBS and ENS-SPAIN02 showed a smaller spread (ca. half) than ENS for Tmax, Tmin and precipitation. The spread of the mean was smaller than that of the variance for both Tmax and Tmin and all datasets. The spreads were higher for late winter and spring corrected temperatures than for other seasons. Corrected precipitation showed higher spread than temperatures, especially in autumn and spring. Peaks of spread appeared for some months, corrected variables and locations; some of these were due to a single model, for instance the high precipitation variance at Albacete in October (Fig. 3k).

In summary, biases of mean Tmax and Tmin were close to 0 °C for the bias reduced datasets and precipitation bias was decreased. Temperature variances were not improved and precipitation variance was only improved for one location. The inter-model spreads of the bias reduced datasets were smaller than that of the uncorrected one. ENS-SPAIN02 showed a slightly better performance with respect to AEMET than the other datasets.

Two additional analyses, the calculation of the efficiency coefficient E and the comparison of probability distribution functions (PDFs) for the variables and seasons that are more limiting to crop production in the IP, confirmed these results (see supplementary material text and Table S-4, Figs. S-2, S-3).

3.2 Comparison of datasets’ performance for crop impact assessment

The differences between the crop simulation outputs obtained with the datasets described in section 2.2 compared with maize simulations run with AEMET for the period 1971–2000 are referred to hereafter as biases in crop phenology and in yield (Table 1).

Table 1 Evaluation of the modelling chain climate-crop for Aranjuez and Albacete in present climate (1971–2000): comparison of crop phenology. Sowing date (SD), anthesis (AD) and maturity (MD) dates (Julian days, DOY), and grain filling duration (GF, days), yield (YI, kg ha−1) with its interannual variability (coefficient of variation YCT, %) and ensemble spread (coefficient of variation YCS, %), simulated with observed climate (AEMET), with the uncorrected ensemble (ENS) and with the five bias-reduced datasets (ENS-EOBS, ENS-SPAIN02, ENS-WG, ENS-EOBS-WG, ENS-SPAIN02-WG). Yield bias (differences in projected yields regarding simulation conducted with AEMET data, YB). Phenological bias (differences in projected phenological dates regarding simulation conducted with AEMET data): anthesis date bias (ADB, days), maturity date bias (MDB, days), and grain filling duration bias (GFB, days)

The projected dates for the relevant crop phenological stages (Table 1) may help to highlight the differences between the climate datasets’ results, as well as to understand the consequences of their biases. This is because these dates are computed by the crop model using the sum of projected temperatures over a base temperature (8 °C for maize). All post-processing methods improved the simulation of anthesis, maturity dates and grain filling duration (which is relevant for yield formation) in present-day climate, in both locations (Table 1) (these improvements can be partially quantified by comparing the E coefficients, see supplemental material Table S-4).

Bias correction resulted in a different yield response at the two locations: yield simulation improved in Albacete but biases increased in Aranjuez. This result may be related to the remaining biases in temperatures at Aranjuez, small for the mean but large for the variance. In turn, these remaining biases may be related to deviations from AEMET of the observational datasets that were used as reference to reduce the ENS biases, as for instance for SPAIN02 Tmin at Aranjuez in winter (Fig. 2). Nevertheless, absolute yield biases from ENS-EOBS were larger than from ENS-SPAIN02 at both locations (Table 1). Yield simulated with any dataset in combination with the WG showed a very small bias, as expected, because these data are largely constrained to reproduce the observed mean climate values.

ENS presented higher inter-annual variability than ENS-EOBS and ENS-SPAIN02 at both locations (measured by the coefficient of variation YCT, Table 1). When considering WG derived datasets, both locations presented contrasting results. ENS presented higher spread (the coefficient of variation YCS, Table 1) than ENS-EOBS and ENS-SPAIN02 for both locations, in agreement with the results found for the climate variables (Figs. 1, 2, and 3).
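The two coefficients of variation can be illustrated with a minimal sketch. It assumes that YCT is computed from the ensemble-mean yield of each year and YCS from the period-mean yield of each ensemble member, using the sample standard deviation; the text does not spell out these details, so the aggregation order, the function names and the yields below are all hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def yield_variability(yields):
    """Return (YCT, YCS) for yields[member][year], in kg ha-1.

    Assumption: YCT is the CV across years of the ensemble-mean yield, and
    YCS is the CV across members of each member's period-mean yield.
    """
    n_years = len(yields[0])
    ensemble_mean_by_year = [mean(member[y] for member in yields)
                             for y in range(n_years)]
    period_mean_by_member = [mean(member) for member in yields]
    return cv_percent(ensemble_mean_by_year), cv_percent(period_mean_by_member)


# Two hypothetical ensemble members and two years.
yct, ycs = yield_variability([[9000.0, 11000.0], [10000.0, 12000.0]])
```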

3.3 Future projections of climate change impacts

Phenological and yield projections for the near (2021–2050) and far (2071–2100) future (NF and FF, respectively, Table 2) obtained with the seven datasets described in section 2.2 were compared to evaluate the consequences for climate change impact assessment of using different post-processing techniques. All projected changes are considered with respect to 1971–2000.

Table 2 Crop projections for Aranjuez and Albacete: sowing date (SD), anthesis date (AD) and maturity date (MD) in Julian days (DOY), grain filling duration (GF, days) and maize yield (kg ha−1) with its interannual variability (coefficient of variation YCT, %) and ensemble spread (coefficient of variation YCS, %), simulated with the uncorrected ensemble (ENS) and with the five bias-reduced datasets (ENS-EOBS, ENS-SPAIN02, ENS-WG, ENS-EOBS-WG, ENS-SPAIN02-WG), for the near future (NF, 2021–2050) and for the far future (FF, 2071–2100), under the A1B scenario. Mean changes of the anthesis date (ADC), maturity date (MDC), grain filling duration (GFC) and yield (YC) are calculated with respect to the corresponding 1971–2000 (present climate) projections, as differences (A1B-present, days) for phenology and as percentages for yield ((A1B-present)*100/present)

Phenological projections indicated earlier anthesis and maturity dates and shorter grain filling than those of the 1971–2000 period for both locations, as expected (Ruiz-Ramos et al. 2011). The crop simulations driven by ENS presented later phenological dates than those driven by the bias reduced datasets, especially in Aranjuez. The spread of projections across all datasets was reduced when ENS was excluded, so bias reduction implied a convergence of results. The ensemble spread across datasets also diminished in FF compared to NF, especially in Aranjuez.

The projections indicated similar yield decrease in both locations, ranging from 9 to 17 % for NF and from 24 to 33 % for FF (Table 2). The delta change method projected maximum changes in yield at Aranjuez in both periods and at Albacete for FF. Projections were similar whatever the applied post-processing technique for both locations, with differences among methods equal to or lower than 8 and 10 % in NF and FF (Table 2), respectively. Differences among yield projections obtained with ENS-EOBS, ENS-SPAIN02 and ENS-WG were even smaller. In general, post-processing methods increased the projected yield changes.

4 Discussion

Our results show that post-processing techniques can help to reduce uncertainty due to a poor representation of present-day local climate in some locations, providing more realistic results in terms of means of simulated crop phenology and yield, in agreement with Oettli et al. (2011) and Michelangeli et al. (2009). However, the spread found among observational datasets reveals an uncertainty not attributable to RCMs, and highlights the sensitivity to the observational dataset chosen as reference for bias reduction.

The comparison of the different post-processing techniques revealed that an overestimation of grain filling duration resulted in an overestimation of crop yield. The differences among simulated yields were temperature-driven because maize was irrigated. The bias reduction of the grain filling length simulation, through correction of temperatures, was enough to improve ENS yield bias in Albacete where the initial bias was small. However, this was not enough to reduce yield bias in Aranjuez, where the ENS bias in grain filling duration was still ca. 2 weeks, and the remaining biases in temperatures of the bias-reduced datasets were larger than in Albacete. Besides, other factors also affect yield such as the diurnal temperature range (Tmax-Tmin) in specific periods. For this reason, similar phenological dates may result in different crop biomass and yield. This may help to explain the different response in Aranjuez in spite of the improvement in the simulation of phenology.

These contrasting effects on phenological and yield biases and ensemble spread at the two locations suggest that not only an initial evaluation of the local climate data is needed, but also an evaluation of the effects of these techniques on the impact results. In this way, the impact model becomes a tool for evaluating climate models (Stéfanon et al. 2015). Our results indicated that about 10 % of variation in yield projections was due to the latter effects. In the case of rainfed crops, this value is expected to increase, since much higher uncertainty has been reported for precipitation-related variables (e.g. Ruffault et al. (2014) report 45 % uncertainty in drought intensity anomalies linked to bias correction). This variation should be added to the estimated uncertainty of the modelling chain, i.e. the climate modelling-post-processing-impact modelling chain, along which uncertainty is accumulated. These findings are in agreement with Liu et al. (2014), who reported different effects of bias correction depending on the location and impact variable.

We conclude that there is no single “best” post-processing technique, in agreement with Räisänen and Räty (2013) and Räty et al. (2014). When a decision has to be made about choosing a technique, if an initial climate-impact evaluation can be done as we recommend here, bias correction offers an opportunity for improvement for those locations with small initial biases. For locations where remaining biases after correction are still large, the use of a weather generator, alone or in combination with bias correction, may be particularly useful, probably because WGs, in contrast to the bias-correction techniques used here, do not correct temperature independently of precipitation, and other variables such as radiation are also adjusted in a consistent way. Such a combined approach may be particularly useful for rainfed simulations in sites where the monthly precipitation bias is still large after bias reduction. Another possible approach for these cases would be to consider several post-processing methods in parallel (in agreement with Räty et al. 2014). The main limitation is the large number of simulations to be run. However, there are sampling methods that can be used to reduce the number of simulations needed (Asseng et al. 2013). Also, bias correction could be applied to the GCM, driving an RCM with bias-corrected GCM output (Glotter et al. 2014). An alternative approach to bias reduction would be the selection of the RCM most consistent with an impact model (Stéfanon et al. 2015), or a reduced ensemble of climate and crop model combinations meeting this criterion.

The simulation of interannual variability remains challenging. Oettli et al. (2011) and Michelangeli et al. (2009) report that this difficulty is transmitted to the simulation of yield variability. On the other hand, the simulation of some relevant extremes was improved here, as in the case of high Tmax in Albacete, which is hazardous for maize flowering at that location. In our study, a small improvement in the simulation of the annual cycle of precipitation was accomplished by reducing RCM bias using SPAIN02. Also, biases in the mean can differ from biases in the high quantiles (Maraun 2012; Christensen et al. 2008); our results show that post-processing methods, and especially bias correction with regard to a high resolution observational dataset (SPAIN02), improved the distributions of the simulated climate variables, in terms of both peak and tails.

The mentioned limitations stress that bias reduction remains a temporary solution while model improvement is undertaken. In the meanwhile, further steps may include the correction of other variables such as radiation and multivariate correction (Hoffmann and Rath 2012; Piani and Haerter 2012). For this purpose, reliable reference datasets including radiation data would be needed (E-OBS and Spain02 do not include it).

5 Conclusions

The objective of this study was to evaluate the potential of post-processing techniques (in particular, bias correction, a WG combined with RCM-derived change factors, and a very simple delta change method) for improving the quality of crop impact projections.

The use of the different post-processing techniques resulted in a difference among crop projections of 10 % or less. The added value of these techniques becomes evident in 1) the improvement of crop phenology which is valuable for improving crop simulations and also for cultivar and species suitability studies, 2) the improvement of yield projections, and 3) the reduction of uncertainty because the inter-model (ensemble) spread of the climate models used is reduced. For these improvements, the density of observation stations used to create each reference observational dataset affected the correction performance.

The improvement was not the same for both locations and all techniques studied, and no single technique proved to be the best one. We recommend undertaking an initial evaluation of the observed and simulated climate data, their post-processing and implications for impact modelling, as an assessment of climate and crop projection biases may help to select the most robust techniques to build a tailored ensemble, locally designed for a specific crop and its management. Rainfed crop simulations in particular could benefit from this approach. Although this kind of procedure complicates the modelling chain, it is desirable when the objective is to go a step further in the reliability of impact projections.