1 Introduction

Rice is the most important crop in Asia, which is home to three-fifths of humanity. In India, more than half of the annual rice crop continues to be grown during the summer monsoon season (kharif), despite increased dry-season harvests made possible by expanded irrigation. Recent research indicates that the monsoon has changed in two significant ways during the past half-century: it has weakened (less total rainfall during June–September; Ramanathan et al. 2005; Dash et al. 2007; Ramesh and Goswami 2007), and the distribution of rainfall within the monsoon season has become more extreme (Goswami et al. 2006; Dash et al. 2009). Here, we use a combination of statistical and simulation methods to analyze the impacts of these changes on rice yield (= harvest per unit area) since the 1960s. To our knowledge, such an analysis has not been previously conducted, perhaps because the daily gridded rainfall data needed for measuring extreme rainfall in India became available only recently (Ramesh and Goswami 2007).

The all-India mean of total June–September rainfall during 1961–98 was about 5% below the mean for the previous 30-year period (Ramanathan et al. 2005). This reduction is more than double the overall reduction since the late 1800s (Dash et al. 2007), thus suggesting that the weakening of the monsoon has accelerated. During 1951–2003, the area of India with monsoon rainfall one standard deviation below the mean for that period expanded by nearly 50% (Ramesh and Goswami 2007). During roughly the same time period (1951–2000), the frequency of heavy and very heavy rain events in central India increased by nearly 50% and more than 100%, respectively, while the frequency of moderate events decreased by about 10% (Goswami et al. 2006). For the country as a whole, the frequency of days with low or moderate rainfall decreased significantly during 1951–2004, while the frequency of long rainy spells decreased and the frequency of short rainy spells, dry spells, and prolonged dry spells all increased (Dash et al. 2009).

These changes raise concerns about food security. Numerous studies have demonstrated that the kharif harvest is lower when total June–September rainfall is lower (Webster et al. 1998; Selvaraju 2003; Krishna Kumar et al. 2004). A drought during the summer of 2009 was one of the most severe in decades, with rice harvest declining by 14% (Commission for Agricultural Costs and Prices 2010). Flooding associated with heavy rain events can also damage crops (Goswami et al. 2006).

Given that the monsoon has evidently already weakened and become more extreme, one should be able to detect any resulting impacts on rice yield by analyzing historical data. To do this, however, one must control for changes in agricultural technology. Planting of high-yielding varieties (HYV) and application of agro-chemicals expanded greatly in India after the advent of the “Green Revolution” in the mid-1960s. The positive impact of such innovations on yield could mask the negative impacts of changes in the monsoon. One must also control for other meteorological changes. Surface warming accelerated in India at the end of the 20th Century, with minimum (nighttime) temperature (T min) rising 0.025°C/yr during 1981–1990 and 0.056°C/yr during 1991–2000 (Padma Kumari et al. 2007). India’s land surface also became dimmer, with surface solar radiation falling by about 5% during 1981–2004 (Ramanathan et al. 2005; Padma Kumari et al. 2007). These changes, and not just changes in the monsoon, could contribute to any residual reductions in yield that are detected after controlling for changes in technology. Rice yield tends to be reduced by higher minimum temperature (Yoshida and Parao 1976; Seshu and Cady 1984; Peng et al. 2004; Wassmann et al. 2009) and lower solar radiation (De Datta and Zarata 1970; Yoshida and Parao 1976; Evans and De Datta 1979; Seshu and Cady 1984; Stanhill and Cohen 2001; Peng et al. 2004; Praba et al. 2004), especially during the latter part of the growing season.

In the statistical part of the study, multiple regression was used to determine the sensitivity of yield to monsoon characteristics (extreme rainfall and drought, in addition to total June–September rainfall) while controlling for potentially confounding technological and meteorological factors. The sample consisted of 1966–2002 data for major agricultural states in India where farms are predominantly rainfed. In the simulation analysis, we combined these regression-based estimates of the climate sensitivity of yield with historical climate data to predict the impacts of changes in monsoon characteristics on yield and to compare them to the impacts of changes in other climate characteristics. We describe the statistical analysis in the next section and the simulation analysis in the section after that. We discuss implications and limitations of our findings in the final section.

2 Statistical analysis

2.1 Regression methods and data

The statistical analysis extended previous research on the impacts of weather on kharif rice production by quantifying the impacts of monsoon characteristics other than just total June–September rainfall. It used the following modified version of an existing statistical model of state-level rice production in India (Auffhammer et al. 2006):

$$ \ln \left( {{y_{{it}}}} \right) = {c_i} + {\theta_t} + {\varphi_i}t + {{\mathbf{X}}_{{it}}}{\mathbf{\beta }} + {{\mathbf{Z}}_{{it}}}{\mathbf{\gamma }} + {\varepsilon_{{it}}}. $$

\( y_{it} \) is yield in state i in year t; \( c_i \) and \( \theta_t \) are fixed effects for states and years (i.e., state- and year-specific regression constants); \( \varphi_i \) is a state-specific parameter on the annual time trend t; \( \mathbf{X}_{it} \) denotes farm inputs; \( \mathbf{Z}_{it} \) denotes weather variables; \( \mathbf{\beta} \) and \( \mathbf{\gamma} \) are parameters on farm inputs and weather variables, respectively; and \( \varepsilon_{it} \) is the error term. Boldface indicates vectors. With one exception (discussed below), all variables in \( \mathbf{X}_{it} \) and \( \mathbf{Z}_{it} \) were expressed as natural logarithms. Nine states with predominantly rainfed rice production were included in the analysis: Assam, Bihar, Karnataka, Kerala, Madhya Pradesh, Maharashtra, Orissa, Uttar Pradesh, and West Bengal. Nineteen sixty-six was chosen as the first year in the regression sample because it was the year when HYV were introduced in India, and 2002 was chosen as the final year due to data availability. The resulting estimation period was more than a third longer than in Auffhammer et al.
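A model of this form can be estimated by least squares with dummy variables. The following is a minimal sketch on synthetic data, not the authors' code or data: the function name `fit_panel`, the single input and single weather regressor, and all numerical values are assumptions made for illustration.

```python
import numpy as np

def fit_panel(ln_y, state, year, X, Z):
    """Least-squares fit with state effects, year effects, and state-specific trends."""
    states, years = np.unique(state), np.unique(year)
    state_d = (state[:, None] == states).astype(float)      # c_i: one dummy per state
    # Two year dummies are dropped: one is collinear with the state dummies,
    # and one with the sum of the state-specific trends.
    year_d = (year[:, None] == years[2:]).astype(float)     # theta_t
    trend = state_d * (year - years[0])[:, None]            # varphi_i * t
    design = np.hstack([state_d, year_d, trend, X, Z])
    coef, *_ = np.linalg.lstsq(design, ln_y, rcond=None)
    k = X.shape[1] + Z.shape[1]
    return coef[-k:]                                        # beta first, then gamma

# Synthetic panel: 9 states, 1966-2002 (all values hypothetical).
rng = np.random.default_rng(0)
n_states, years = 9, np.arange(1966, 2003)
state = np.repeat(np.arange(n_states), years.size)
year = np.tile(years, n_states)
X = rng.normal(size=(state.size, 1))                        # log farm input
Z = rng.normal(size=(state.size, 1))                        # log weather variable
c = rng.normal(size=n_states)[state]
theta = rng.normal(scale=0.1, size=years.size)[year - years[0]]
phi_t = rng.normal(scale=0.01, size=n_states)[state] * (year - years[0])
noise = rng.normal(scale=0.05, size=state.size)
ln_y = c + theta + phi_t + 0.5 * X[:, 0] + 0.2 * Z[:, 0] + noise

beta_hat, gamma_hat = fit_panel(ln_y, state, year, X, Z)
```

The sketch recovers the parameters on the input and weather regressors only. It does not reproduce the treatment of standard errors described later in this section.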

The original model in Auffhammer et al. had the logarithm of harvest instead of the logarithm of yield (= harvest per hectare) as the dependent variable, and it included the logarithm of area harvested as an additional explanatory variable. When we estimated that version of the model, we found that the parameter on the logarithm of area harvested was not significantly different from one, which implies that the logarithm of yield can be used as the dependent variable. In addition to being simpler, the latter specification avoids a statistical problem (endogeneity; see Mundlak 2001) that could be caused by the simultaneous determination of area harvested with quantity of rice harvested.
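The step from the harvest specification to the yield specification can be written out explicitly. The symbols \( h_{it} \) (harvest), \( a_{it} \) (area harvested), and \( \alpha \) (the parameter on the logarithm of area) are introduced here for illustration only. The harvest version of the model is

$$ \ln \left( h_{it} \right) = c_i + \theta_t + \varphi_i t + \alpha \ln \left( a_{it} \right) + \mathbf{X}_{it}\mathbf{\beta} + \mathbf{Z}_{it}\mathbf{\gamma} + \varepsilon_{it}. $$

If \( \alpha = 1 \), subtracting \( \ln \left( a_{it} \right) \) from both sides gives

$$ \ln \left( h_{it} \right) - \ln \left( a_{it} \right) = \ln \left( \frac{h_{it}}{a_{it}} \right) = \ln \left( y_{it} \right) = c_i + \theta_t + \varphi_i t + \mathbf{X}_{it}\mathbf{\beta} + \mathbf{Z}_{it}\mathbf{\gamma} + \varepsilon_{it}, $$

which is the yield specification with area removed from the right-hand side.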

We included eight weather variables in \( \mathbf{Z}_{it} \). Six of them were total rainfall, mean T min, and mean surface radiation, each measured separately during two periods, June–September and October–November. The earlier period is the standard definition of the monsoon period, and it covers approximately the vegetative and reproductive growth phases of the rice plant. The later period covers approximately the ripening phase, at the end of which harvesting occurs. The correspondence between time periods and rice growth phases is only approximate, because crop-establishment and harvest dates vary across states and years due to variation in weather conditions.

The remaining two weather variables measured two additional characteristics of monsoon rainfall, extreme rainfall and drought. Extreme rainfall was defined as the sum of June–September rainfall that occurred on days with rainfall that equaled or exceeded a state’s 95th percentile daily threshold. Drought was a binary 0–1 variable that indicated years when total June–September rainfall was at least 15% below a state’s mean for that variable. The 15% threshold was determined statistically, as the value that had the most statistically significant impact on yield. The drought variable was not expressed in logarithmic form, given that it could have values of zero. The extreme rainfall variable had values of zero in a few cases, so the number one was added before the variable was converted to logarithms. Table 1 provides detail on data sources and construction of the weather variables.
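The two definitions above can be sketched in a few lines. This is an illustrative sketch, not the authors' procedure: the function names are hypothetical, the 95th percentile threshold is taken as a given input because its exact computation is described in Table 1 rather than here, and the sample numbers are invented.

```python
import numpy as np

def extreme_rainfall(daily_mm, threshold_mm):
    """Sum of June-September rainfall on days at or above the state's threshold."""
    daily_mm = np.asarray(daily_mm, dtype=float)
    return daily_mm[daily_mm >= threshold_mm].sum()

def drought_indicator(season_total_mm, state_mean_mm, shortfall=0.15):
    """Return 1 if the seasonal total is at least `shortfall` below the state mean, else 0."""
    return int(season_total_mm <= (1.0 - shortfall) * state_mean_mm)

def log_extreme(extreme_mm):
    """Log transform with one added first, so that zero values remain defined."""
    return np.log(1.0 + extreme_mm)

# Hypothetical daily totals (mm) and a hypothetical threshold of 80 mm.
example_extreme = extreme_rainfall([0, 5, 12, 80, 3, 95], threshold_mm=80)
dry_year = drought_indicator(840.0, 1000.0)      # 16% below the mean
normal_year = drought_indicator(900.0, 1000.0)   # 10% below the mean
```

The drought indicator is left in levels, matching the text, while the extreme rainfall variable passes through `log_extreme` before entering the regression.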

Table 1 Weather variables in regression model
Table 2 Non-weather variables in regression model

The original model in Auffhammer et al. did not include the extreme rainfall and drought variables. Other differences were that it included total rainfall before the monsoon (March–May), which we found to have an insignificant impact on yield, and mean surface radiation during December, which is after the kharif harvest.

The modified model included the same four non-weather variables in \( \mathbf{X}_{it} \) as in Auffhammer et al.: area irrigated (although our analysis focuses on the kharif crop in predominantly rainfed states, irrigation works can improve water management even during the monsoon season), area in HYV, fertilizer use, and number of farm workers. Table 2 provides detail on these variables. Although most inputs of fertilizer and labor occur months before harvest, one might argue that these variables are nevertheless endogenously determined with yield. For example, an unobserved early-season shock that ultimately affects yield might also influence use of fertilizer or labor. Failure to control for such endogeneity could potentially bias the parameter estimates (Mundlak 2001). However, a Hausman test did not reject the null hypothesis that fertilizer and labor were uncorrelated with the error term even at the 10% level, and so we concluded that endogeneity was not a significant concern. Along similar lines, one might argue that inclusion of the non-weather variables causes the model to underestimate the impacts of weather on rice yield, as the levels of these variables might be adjusted in response to weather conditions. To check this, we also estimated the model with all the variables in \( \mathbf{X}_{it} \) excluded, i.e., a specification that included only the weather variables.

Fig. 1 Historical weather observations for a representative kharif rice-growing region in predominantly rainfed areas of India. Observations were constructed by weighting state-level weather variables by 1966–2002 mean areas harvested. Dotted lines show trends. a Total June–Sep. rainfall (blue, left axis); total Oct.–Nov. rainfall (red, right axis); June–Sep. extreme rainfall (green, right axis). b Oct.–Nov. T min

The fixed effects for states (\( c_i \)) and the state-specific annual trends (\( \varphi_i t \)) implicitly demeaned and detrended all the variables at the state level, thus controlling for unobserved sources of variation in mean yield between states and, within each state, over time that might potentially be correlated with the weather variables and thus could bias the parameter estimates on them. The fixed effects for years (\( \theta_t \)) controlled for additional unobserved sources of variation that were common across states but varied nonlinearly over time. In calculating the standard errors of the parameter estimates, we allowed the error term \( \varepsilon_{it} \) to have different variances across states and nonzero covariances between states (unobserved shocks could be spatially correlated). We also allowed for first-order serial correlation, but we found that the serial correlation coefficient had a negligible value of 0.03 and so ignored it in calculating the standard errors.

2.2 Regression results

Table 3 shows complete regression results. Results in the first two columns (parameter estimates and P-values) were the source of parameters used in the simulation analysis. All three monsoon characteristics were found to have significant impacts on yield at P < 0.1, but only total June–September rainfall and the drought indicator were significant at P < 0.05. Extreme rainfall was thus less significant than the other two characteristics. The magnitude of its impact was also less. Because the dependent variable (yield) and all explanatory variables except drought were in logarithmic form, parameter estimates \( \left( {\widehat{\gamma }} \right) \) for variables other than drought equal elasticities: a 1% change in a weather variable has a \( \widehat{\gamma } \)% impact on yield. In the case of the drought variable, the parameter estimate indicates the additional \( \hat{\gamma } \)% reduction in yield that occurred when total June–September rainfall dropped 15% below its mean value for a given state. Using these relationships, the regression results indicate that yield decreased 0.20% per 1% decrease in total June–September rainfall but only 0.022% per 1% increase in extreme rainfall, and 12% when a drought occurred.
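The elasticity reading is an approximation that holds for small changes. The short sketch below shows the exact percentage change implied by a log-log parameter. The function name is hypothetical, the parameter values 0.20 and −0.022 are read from the text above, and the 10% case is an invented illustration.

```python
def yield_change_pct(gamma, pct_change):
    """Exact percentage change in yield implied by a log-log parameter gamma."""
    ratio = (1.0 + pct_change / 100.0) ** gamma
    return 100.0 * (ratio - 1.0)

rain_1pct = yield_change_pct(0.20, -1.0)       # 1% less June-September rainfall
rain_10pct = yield_change_pct(0.20, -10.0)     # 10% less June-September rainfall
extreme_1pct = yield_change_pct(-0.022, 1.0)   # 1% more extreme rainfall
```

For a 1% change the exact figure is very close to the parameter itself. For larger changes the two diverge slightly, which is why the simulation analysis works with ratios raised to the power of the parameter rather than with simple multiplication.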

Table 3 Parameter estimates on variables in regression model

These results indicate that the monsoon’s impact was nonlinear: the negative impact of reduced rainfall was amplified when rainfall was very low (drought), and the positive impact of higher rainfall reversed sign and became negative when the increase occurred as extreme rainfall. The nonlinearity related to drought was, however, much more important than the one related to extreme rainfall. The middle two columns of Table 3 show results for a model that excluded the extreme rainfall and drought variables. The parameter on total June–September rainfall was much higher in this model (0.31 instead of 0.20). Ignoring nonlinear characteristics of the monsoon thus caused the estimate of this parameter to be biased upward, which is consistent with the omitted impact of drought being much more influential than the omitted impact of extreme rainfall.

Returning to the results in the first two columns, several of the controls for other weather variables were also significant. As expected, October–November T min had a highly significant (P < 0.001), negative impact on yield, and the impact was large: a 1% increase was associated with a 0.95% decrease in yield. Total October–November rainfall had a significant (P = 0.015), positive impact, but it was small compared to the impact of total June–September rainfall: a 1% increase was associated with just a 0.031% increase in yield, not much more than for extreme rainfall. Neither of the surface radiation variables was significant, perhaps because these variables were more aggregate (and thus measured less accurately) than the rainfall and temperature variables (as explained in Table 1, they referred to northern and southern groups of states, not individual states).

The parameters on several non-weather variables were also significant and had plausible signs. Parameters on area irrigated, area planted with HYV, and fertilizer were positive and significant (P < 0.05). The parameter on labor was significant but negative, which perhaps indicates that yield tended to be lower in states with smaller average farm size (i.e., more labor per unit area, and thus a loss of scale economies). The last two columns of Table 3 show results for a model that excluded the non-weather variables. The exclusion of these variables had very little impact on the parameters on the weather variables, which implies that the levels of the non-weather variables were not very sensitive to weather conditions.

3 Simulation analysis

3.1 Simulation methods and data

The impact of monsoon changes on rice yield depends on not only the sensitivity of yield to monsoon characteristics but also the magnitude of changes in these characteristics. The simulation analysis integrated these two sets of factors. Specifically, it addressed the question, “How would rice yield have differed during 1966–2002 if climate had remained the same as before 1960?” Nineteen-sixty was chosen as the historical dividing line due to the evidence of a reduction in monsoon rainfall since then (Ramanathan et al. 2005).

Using estimated parameters from the statistical model, rice yield was predicted for a representative kharif rice-growing region in the predominantly rainfed areas of India under two climate scenarios: with climate change and without climate change. Denote the value of weather variable j in year t under the former scenario by \( z_{jt} \) and its value under the latter scenario by \( {\tilde{z}_{{jt}}} \), where t refers to a year during 1966–2002. Holding the non-weather variables at their historical values, and expressing the weather variables in original units (not logarithms), the ratio of predicted yield without climate change (\( {\tilde{y}_t} \)) to predicted yield with climate change (\( {\hat{y}_t} \)) is given by

$$ \frac{{{{\tilde{y}}_t}}}{{{{\hat{y}}_t}}} = {\prod\limits_j {\left( {\frac{{{{\tilde{z}}_{{jt}}}}}{{{z_{{jt}}}}}} \right)}^{{{{\hat{\gamma }}_j}}}}, $$

where the exponent \( {\hat{\gamma }_j} \) is the parameter estimate on weather variable j from the regression model. For expositional simplicity, this expression ignores the drought dummy, which enters as an exponential term; this variable was however included in the simulation.
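Written out with the drought term included, the ratio would presumably take the following form, obtained by exponentiating the difference between the two predicted log yields. The symbols \( D_t \) and \( \tilde{D}_t \) (the 0–1 drought indicator with and without climate change) and \( \hat{\gamma}_D \) (its parameter estimate) are introduced for this sketch and are not notation used elsewhere in the paper:

$$ \frac{\tilde{y}_t}{\hat{y}_t} = \exp \left[ \hat{\gamma}_D \left( \tilde{D}_t - D_t \right) \right] \prod\limits_j \left( \frac{\tilde{z}_{jt}}{z_{jt}} \right)^{\hat{\gamma}_j}. $$

The fixed effects, trends, and non-weather variables are identical under both scenarios, so they cancel from the ratio and only the weather terms remain.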

Applying this expression required data on \( z_{jt} \) and \( {\tilde{z}_{{jt}}} \). We focused on the five weather variables that were significant at P < 0.1 in the regression model that included the full set of variables (the first two columns in Table 3). Data on \( z_{jt} \) were simply the actual 1966–2002 state-level weather observations, aggregated to form estimates for a representative kharif rice-growing region in the country by using 1966–2002 mean areas harvested of the nine states as weights. For the total June–September and October–November rainfall variables, we used monthly series from the Indian Institute of Tropical Meteorology instead of the daily gridded data from the India Meteorological Department that we used in the regression model. We did this to obtain longer time series: as discussed in the next paragraph, we based the counterfactual weather variables \( {\tilde{z}_{{jt}}} \) on pre-1960 weather observations, and the daily rainfall series were available only since 1951.
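The area-weighted aggregation described above can be sketched as follows. The function name and the two-state example values are hypothetical; the actual weights are the 1966–2002 mean harvested areas of the nine states.

```python
import numpy as np

def area_weighted_series(state_values, mean_area):
    """Aggregate state-level series into one regional series.

    state_values: array of shape (n_years, n_states)
    mean_area:    array of shape (n_states,), mean harvested area per state
    """
    weights = np.asarray(mean_area, dtype=float)
    weights = weights / weights.sum()            # normalize so weights sum to one
    return np.asarray(state_values, dtype=float) @ weights

# Two hypothetical states over two years; the second state has three times the area.
regional = area_weighted_series([[100.0, 200.0], [300.0, 500.0]], [1.0, 3.0])
```

Because the weights are fixed over time, changes in the regional series reflect changes in weather rather than shifts in where rice was grown.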

Data on \( {\tilde{z}_{{jt}}} \) are obviously not observable. We used Monte Carlo techniques to simulate them. Exact prediction of the post-1960 weather realizations that would have occurred in the absence of climate change is impossible, but the Monte Carlo analysis enabled us to characterize their expected values. We determined the means, variances, and covariances for the five area-weighted weather variables for the pre-1960 period, using 1930–60 observations where available. We assumed that, in the absence of climate change, the 1966–2002 realizations of the \( {\tilde{z}_{{jt}}} \) variables would have been drawn from a multivariate normal climate distribution having these pre-1960 characteristics.

The means and variances of three variables in \( {\tilde{z}_{{jt}}} \)—total June–September rainfall, and total rainfall and mean T min during October–November—were set equal to the means and variances of the observed 1930–60 values, which are shown in Fig. 1. In view of the much shorter time series for June–September extreme rainfall, which was available only from 1951, we used a different approach based on regression analysis, which related the observed values of this variable to a time trend. We also used a regression-based approach for June–September drought, which related the observed values to total June–September rainfall. Table 4 provides details on the construction of all the \( {\tilde{z}_{{jt}}} \) variables, with Online Resources 1–2 providing details on the regression results for June–September extreme rainfall and drought.

Table 4 Construction of counterfactual weather variables in simulation model

The historical weather series that we used in the simulation analysis exhibited several notable changes over time (Fig. 1). The direction of these changes was consistent with ones reported in previous studies, but the magnitudes were not necessarily the same due to differences in areas analyzed: we used weather observations from just the rice-growing portions of states when we constructed the state-level weather variables (Table 1); and we weighted the state-level weather variables by mean harvested areas when we formed the aggregate, “India”-wide variables shown in Fig. 1 (Table 4). Weather trends reported in previous studies have not referred so specifically to rice-growing areas. The mean of total June–September rainfall fell from 1259 mm to 1211 mm between 1930–60 and 1961–2002, while the mean of total October–November rainfall fell from 128 mm to 109 mm. Trends in these variables within either the 1930–60 or 1961–2002 periods were not significant at P < 0.05, but the trend in June–September extreme rainfall was significant during 1961–2002, rising 0.67 mm/yr. The annual probability of one of the states in our sample experiencing a drought according to our definition (i.e., total June–September rainfall being at least 15% below a state’s mean) increased from 8.6% during 1930–60 to 16.4% during 1961–2002. October–November T min did not have a significant trend during 1930–60, but it rose 0.025°C/yr during 1961–2002 (P < 0.01). In the absence of these changes, the average predominantly rainfed rice-growing area in India would have had a stronger monsoon, with a lower risk of drought and less extreme rainfall, and more rain and cooler nights during October–November.

Consideration of covariances among the \( {\tilde{z}_{{jt}}} \) variables is potentially important, as individual aspects of weather do not change in isolation. For example, cooler periods tend to be wetter. To investigate the importance of covariances, we constructed two versions of the variance-covariance matrix for the \( {\tilde{z}_{{jt}}} \) variables (Online Resource 3): a diagonal matrix that assumed covariances equaled zero (Version A), and a version that used the pre-1960 observations to calculate nonzero covariances (Version B). Both versions omitted June–September drought, which as noted above was directly related to the Monte Carlo draws of total June–September rainfall. We ran the Monte Carlo analysis using each matrix. The mean predicted impacts on yield were virtually identical for the two matrices, differing by at most 0.04 percentage points. The results presented in this paper refer to draws from Version B.

Ten thousand draws for 1966–2002 were taken from the pre-1960 climate distribution for a given simulation. For each year in a draw, the ratio of yield without climate change (\( \tilde{y} \), based on the simulated weather realizations \( \tilde{z} \)) to yield with climate change (\( \hat{y} \), based on the observed weather realizations z) was calculated. The ratios were then averaged across draws. Simulations were run with different combinations of weather variables included, to determine the individual and combined impacts of the variables.
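The Monte Carlo procedure can be sketched as follows for weather variables that enter the model in logarithms. This is a simplified illustration, not the authors' implementation: it omits the drought term, the function name is hypothetical, and the standard deviations and the constant observed series are invented. The means (1259, 128, 1211, and 109 mm) and the parameters (0.20 and 0.031) are taken from the text.

```python
import numpy as np

def simulate_mean_ratio(z_obs, gamma, mean_pre, cov_pre, n_draws=10_000, seed=0):
    """Average ratio of yield without climate change to yield with climate change.

    z_obs:    (n_years, n_vars) observed weather, original units
    gamma:    (n_vars,) regression parameters (elasticities)
    mean_pre: (n_vars,) pre-1960 means
    cov_pre:  (n_vars, n_vars) pre-1960 variance-covariance matrix
    """
    rng = np.random.default_rng(seed)
    n_years = z_obs.shape[0]
    # Counterfactual weather: shape (n_draws, n_years, n_vars).
    # Assumes draws stay positive, i.e. means are many standard deviations above zero.
    z_tilde = rng.multivariate_normal(mean_pre, cov_pre, size=(n_draws, n_years))
    log_ratio = (np.log(z_tilde) - np.log(z_obs)) @ gamma   # (n_draws, n_years)
    return np.exp(log_ratio).mean(axis=0)                   # average across draws

gamma = np.array([0.20, 0.031])               # Jun-Sep and Oct-Nov rainfall parameters
mean_pre = np.array([1259.0, 128.0])          # 1930-60 means (mm)
cov_pre = np.diag([100.0**2, 20.0**2])        # hypothetical variances, zero covariances
z_obs = np.tile([1211.0, 109.0], (5, 1))      # hypothetical: 5 years at 1961-2002 means
ratios = simulate_mean_ratio(z_obs, gamma, mean_pre, cov_pre, n_draws=2_000)
```

Passing a diagonal matrix, as here, corresponds to an assumption of zero covariances; passing a full matrix estimated from pre-1960 observations would allow the weather variables to move together.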

3.2 Simulation results

The results shown in Table 5 are for a series of simulations that started with just total June–September rainfall and progressively added the remaining four weather variables. Results were virtually identical if the variables were added in the reverse order, thus implying that the order of addition did not matter.

Table 5 Predicted impact of historical absence of climate change: percentage difference between rice yield if 1966–2002 weather had been drawn from 1930–60 climate distribution, and rice yield under actual 1966–2002 weather realizations

The results indicate that average yield during 1966–2002 would have been higher in the absence of climate change, with all five weather variables contributing to the increase. Yield would have been roughly 2% higher if monsoon characteristics had not changed. The increase in total June–September rainfall and the reduced frequency of drought each would have accounted for about half of the increase, with the impact of reduced extreme rainfall being negligible. The increase in total October–November rainfall would have increased yield by about a third as much as the improved monsoon conditions.

The decrease in October–November T min would have had an even greater impact, 50% larger than the combined impact of the four rainfall variables. Average yield during 1966–2002 would have been nearly 6% higher if none of the five weather changes had occurred.

4 Discussion and conclusions

Actual cumulative kharif rice harvest during 1966–2002 in the nine states in our study was 1,322 million tons. Our simulation results imply that the cumulative harvest would have been 5.67% higher in the absence of climate change, an increase of about 75 million tons. This amount can be compared to a back-of-the-envelope estimate of the increase caused by the actual increase in yield. Average yield across the nine states during 1961–65, before the introduction of HYV, was 0.89 tons per hectare. If yield had stayed at that level, then cumulative harvest during 1966–2002 would have been just 967 million tons. (This calculation takes into account the roughly 10% increase in harvested area between the mid-1960s and early 2000s.) The difference between the actual cumulative harvest and this amount, 355 million tons, represents the impact of the actual increase in yield, which resulted from such changes in farming technology as the introduction of HYV and greater use of irrigation and fertilizer. The boost in yield from a better climate thus would have increased cumulative harvest by about a fifth as much as improved farming technology did.
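The arithmetic behind this comparison can be reproduced directly from the figures quoted above. The variable names are illustrative.

```python
actual_harvest_mt = 1322.0        # cumulative 1966-2002 harvest, million tons
no_yield_growth_mt = 967.0        # cumulative harvest at the 1961-65 yield level
climate_effect = 0.0567           # simulated yield gain without climate change

climate_gain_mt = actual_harvest_mt * climate_effect             # about 75 million tons
technology_gain_mt = actual_harvest_mt - no_yield_growth_mt      # 355 million tons
climate_share_of_technology = climate_gain_mt / technology_gain_mt   # about one fifth
```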

Another way to gauge the simulated 5.67% increase in yield is to compare it to predictions of the future agricultural impacts of climate change. Guiteras (2009) predicted that climate change will reduce rice yield in India by 4.5–9% by 2039, while the Fourth Assessment Report of the IPCC reported that net production of cereals (not just rice) in South Asia could decrease by 4–10% by 2100 even under “the most conservative climate change scenario” (Cruz et al. 2007, pp. 480–1). Our results indicate that impacts of these magnitudes already occurred during 1966–2002.

The simulated yield impacts in Table 5 probably understate the actual impacts of historical climate change on rice harvests in India. The simulation analysis ignored potential responses by farmers to the improved weather conditions they would have faced in the absence of climate change. Farmers might have been able to achieve even higher yields by adjusting their use of fertilizer and other inputs, which were held constant at historical levels in the simulations. Moreover, the impacts in Table 5 refer only to changes in yield, while the percentage change in harvest is given by the sum of the percentage change in yield and the percentage change in area. Previous research indicates that an increase in total June–September rainfall has an even larger impact on area than on yield (Kanwar 2004; Auffhammer et al. 2006). An increase in rice area would, however, come at the expense of other crops and non-farm land uses. Hence, the area-related portion of an increase in rice harvest would not represent as unambiguous an increase in overall economic output as the yield-related increase reported here.

There is another, more purely statistical reason for expecting the estimated impacts to be understated: measurement error. The regression model in this study, like those in several others on the impacts of climate change on agriculture in India (e.g., Kanwar 2004; Auffhammer et al. 2006, including the sources cited on p. 19670; Guiteras 2009), was based on a combination of two types of data, agricultural and meteorological. The agricultural data were reported at the state level, while the meteorological data were reported at more disaggregated levels. As indicated by Table 1, a series of steps was needed to make the meteorological data conform to the agricultural data. The resulting, aggregated meteorological variables suffer from an unknown, but surely nonzero, amount of measurement error as indicators of weather in the predominantly rainfed areas of the states. The usual consequence of measurement error in an explanatory variable is to cause the regression parameter on that variable to be biased toward zero (Greene 2008, pp. 325–327). The actual yield impacts of some of the weather variables in our regression models could therefore be greater than the parameters in Table 3 indicate.
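The bias toward zero can be illustrated with a small simulation. This is a sketch under the textbook assumption of classical measurement error (noise that is independent of the true variable and of the regression error); whether the meteorological variables here satisfy that assumption is not known. All values are synthetic.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x_centered = x - x.mean()
    return (x_centered @ (y - y.mean())) / (x_centered @ x_centered)

rng = np.random.default_rng(1)
n = 20_000
true_x = rng.normal(size=n)                           # true weather, variance 1
y = 0.2 * true_x + rng.normal(scale=0.1, size=n)      # true slope is 0.2
noisy_x = true_x + rng.normal(scale=0.5, size=n)      # measured with error, variance 0.25

slope_clean = ols_slope(true_x, y)
slope_noisy = ols_slope(noisy_x, y)
# Under classical error the expected slope shrinks by var(x) / (var(x) + var(noise)),
# here 1 / 1.25, so about 0.16 instead of 0.2.
```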

Our statistical results indicate that monsoon rainfall is not the only weather variable affecting the kharif rice yield in India. In fact, our simulation results indicate that nighttime warming at the end of the growing season had an even greater impact on yield during 1966–2002 than changes in monsoon characteristics. Regarding monsoon characteristics, both our statistical and simulation results confirm the usefulness of the standard summary measure of the strength of the monsoon, i.e. total June–September rainfall, for predicting rice yield. Our statistical results indicate that this variable is significantly correlated with rice yield and that it can be used to generate a simple drought indicator that is also significantly correlated with yield. Our simulation results indicate that changes in total June–September rainfall and drought frequency had about equal impacts on rice yield during 1966–2002. In contrast, our statistical results indicate that a second nonlinear characteristic of the monsoon, the amount of extreme rainfall, is less significantly correlated with yield, and our simulation results indicate that its impact on yield during 1966–2002 was much smaller than the impacts of the other two monsoon characteristics.

The historical changes in India’s climate analyzed here were purely observation-based—a statistical comparison of weather realizations before 1960 to realizations after that date—not the result of climate model runs. The analysis thus provides no basis for determining the extent to which the observed changes were due to human activity. The changes are, however, consistent with climate model predictions of the combined effects of increased global concentrations of greenhouse gases and increased regional concentrations of aerosols (“brown clouds”) (Ramanathan et al. 2005; Ramanathan et al. 2008). Mitigating emissions of greenhouse gases and aerosols might therefore confer benefits on India’s hundreds of millions of rice producers and consumers. Moreover, climate models predict that the monsoon will continue to weaken (Kripalani et al. 2007) and that the global area affected by drought will likely increase in the future, with the frequency of heavy precipitation events very likely to increase over most areas (Pachauri and Reisinger 2007). Future impacts of these changes on rice yield in India would thus likely be larger than the historical ones estimated here.