Introduction

Sugarcane, a traditional crop of India, plays an important role in the agricultural and industrial economy of the country. It is cultivated in most states and, though it covers an insignificant share of about 2 and 20% of the gross cropped area of our country and the world respectively, its share in our country’s economic growth has become significant. Among the sugarcane-growing states in our country, Tamilnadu ranks first in per-hectare productivity, with a cane yield of 113.9 tonnes/ha. Sugarcane is a versatile crop. Because of its diversified uses in different industries, it is known as “Karpagavirucham” and, in modern terminology, as “wonder cane” (Mohan et al. 2007).

Sugarcane occupies a significant position among the commercial crops in India. A reliable production forecast for such an important commercial crop matters greatly in an economic system, because crop production and prices are closely associated. An unexpected decrease in production reduces the marketable surplus and the income of farmers and leads to a price rise. A glut in production can lead to a slump in prices and has an adverse effect on farmers’ incomes. The price of an essential commodity plays a significant role in determining the inflation rate, wages, salaries and various policies in an economy. In the case of commercial crops like sugarcane, the production level affects the raw material cost of user industries and their competitive advantage in the market. In the present study, sugarcane area, production and productivity of Tamilnadu have been forecast using Auto Regressive Integrated Moving Average (ARIMA) models.

Previous attempts have been made to forecast sugarcane production and productivity using ARIMA models (Bajpai and Venugopalan 1996; Yaseen et al. 2005). ARIMA models have been used for modeling and forecasting fish catches (Venugopalan and Srinath 1998; Tsitsika et al. 2007). The forecasting efficiency of ARIMA models was compared with that of neural network models (Hanson et al. 1999). ARIMA models have been developed to forecast the cultivable area, production and productivity of various crops of Tamilnadu (Balanagammal et al. 2000). Wheat production in Pakistan and Canada was forecast using ARIMA models (Saeed et al. 2000; Boken 2000). ARIMA models were used to obtain seasonal forecasts of paddy in Tamilnadu and food grains in India (Balasubramanian and Dhanavanthan 2002). ARIMA models were compared with structural time series models (Ravichandran and Prajneshu 2001; Prajneshu et al. 2002). ARMA models were used in forecasting milk, fat and protein yields of Italian Simmental cows (Maccioitta et al. 2000, 2002). Univariate forecasting of state-level agricultural production was done using ARIMA models (Indira and Datta 2003). ARIMA models were compared with a nonparametric regression approach for forecasting oilseed production in India (Chandran and Prajneshu 2005). Production of irrigated crops such as potato, mustard and wheat was forecast using ARIMA models (Sahu 2006). Milk production in India was forecast using time-series modeling techniques (Pal et al. 2007). The objective of the present study is to use ARIMA models to forecast sugarcane area, production and productivity of Tamilnadu.

Materials and Methods

Data on sugarcane area (’000 ha), production (’000 tonnes) and productivity (tonnes/ha) for a period of 57 years, from 1950–1951 to 2007–2008, were collected from various volumes of the journals ‘Cooperative Sugar’ (CSJ 1980, 2007) and ‘Indian Sugar’ (ISJ 1985, 2009).

The data for a period of 55 years (1950–2006) were used for model building. The remaining 2 years of data (2007–2008) were used for validation of the model.

Description of the Model

In general, an ARIMA model is characterized by the notation ARIMA (p, d, q), where p, d and q denote the orders of auto-regression, integration (differencing) and moving average respectively. In ARIMA, the time series is a linear function of past actual values and random shocks. A stationary ARMA (p, q) process is defined by the equation

$$ Y_{t} = \Phi_{0} + \Phi_{1} Y_{t - 1} + \Phi_{2} Y_{t - 2} + \ldots + \Phi_{p} Y_{t - p} + \varepsilon_{t} - \omega_{1} \varepsilon_{t - 1} - \omega_{2} \varepsilon_{t - 2} - \ldots - \omega_{q} \varepsilon_{t - q} $$
(1)

where Yt is the response (dependent) variable at time t; Yt−1, Yt−2, …, Yt−p are the values of the response variable at times t−1, t−2, …, t−p respectively, and these Y’s act as independent variables; Φ0 is the constant term and Φ1, Φ2, …, Φp are the coefficients to be estimated; ɛt is the error term at time t, representing the effects of variables not explained by the model; ɛt−1, ɛt−2, …, ɛt−q are the error terms at the previous time periods; and ω1, ω2, …, ωq are the coefficients to be estimated. The assumptions about the error terms are the same as those for the standard regression model.
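To make the definition concrete, the following minimal Python sketch simulates a series from Eq. (1). It assumes Gaussian shocks and the usual convention in which the moving-average terms enter with minus signs; the function name `simulate_arma` and the parameter values are illustrative and are not taken from the study.

```python
import random


def simulate_arma(phi0, phi, omega, n, sigma=1.0, seed=0):
    """Simulate Y_t = phi0 + sum_i phi[i]*Y_{t-1-i} + e_t - sum_j omega[j]*e_{t-1-j}.

    Values before the start of the series are treated as absent (assumption).
    """
    rng = random.Random(seed)
    p, q = len(phi), len(omega)
    y, e = [], []
    for t in range(n):
        shock = rng.gauss(0.0, sigma)  # random shock e_t
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(omega[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y.append(phi0 + ar_part + shock - ma_part)
        e.append(shock)
    return y


# Hypothetical ARMA(1, 1) parameters, chosen only for illustration.
series = simulate_arma(phi0=2.0, phi=[0.5], omega=[0.3], n=200)
print(len(series), series[:3])
```

With the shocks switched off (`sigma=0.0`), the same recursion settles at Φ0/(1 − Φ1), which is the mean of a stationary first-order process.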

ARIMA Model Building

Identification

The foremost step in the modeling process is to check the stationarity of the series, as the estimation procedures are available only for stationary series. There are two kinds of stationarity, viz., stationarity in ‘mean’ and stationarity in ‘variance’. Visual examination of a graph of the data and of the structure of the autocorrelation and partial autocorrelation coefficients helps to check for stationarity. Another way of checking for stationarity is to fit a first-order autoregressive model to the raw data and test whether the coefficient ‘Φ1’ is less than one. If the series is found to be non-stationary, stationarity is achieved by differencing the series.
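The first-order autoregressive check described above can be sketched in a few lines of Python. This is an informal illustration only: it estimates the slope by ordinary least squares and does not carry out a formal significance test; the function name `ar1_coefficient` and the example series are hypothetical.

```python
def ar1_coefficient(x):
    """Least-squares slope from regressing x[t] on x[t-1], with an intercept."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    return sxy / sxx


# Hypothetical series with a steady upward trend (not the sugarcane data).
trend = [100 + 2 * t for t in range(20)]
phi1 = ar1_coefficient(trend)
print("estimated phi1:", phi1)
print("differencing suggested:", phi1 >= 0.99)  # 0.99 is an arbitrary cut-off
```

An estimate at or very near one, as for a trending series, points towards non-stationarity in the mean.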

If ‘Xt’ denotes the original series, the non-seasonal difference of first order is

$$ Y_{t} = X_{t} - X_{t - 1} $$
(2)
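As a small worked example of Eq. (2), the sketch below differences a short series once and twice. The helper name `difference` and the numbers are hypothetical and serve only to show the arithmetic.

```python
def difference(x, d=1):
    """Apply the non-seasonal first difference Y_t = X_t - X_{t-1}, d times."""
    for _ in range(d):
        x = [b - a for a, b in zip(x[:-1], x[1:])]
    return x


values = [100, 103, 109, 118]      # hypothetical observations
print(difference(values))          # first differences
print(difference(values, d=2))     # second differences
```

Each round of differencing shortens the series by one observation, which matters for short annual series.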

The next step in the identification process is to find initial values for the orders of the non-seasonal parameters, p and q. They are obtained by looking for significant autocorrelation and partial autocorrelation coefficients. There are no strict rules for choosing the initial values. Although sample autocorrelation coefficients are poor estimates of the population autocorrelation coefficients, they are still used to obtain initial values, and the final models are arrived at by going through the stages repeatedly.
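The sample autocorrelations and partial autocorrelations used at this stage can be computed as in the following sketch. It assumes the common estimator that divides by the lag-zero sum of squares, obtains the partial autocorrelations by the Durbin-Levinson recursion, and flags coefficients outside the approximate 95% white-noise band of ±1.96/√n. The function names and the example series are hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    result = [1.0]
    prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Hypothetical differenced series (not the sugarcane data).
y = [3.0, -1.0, 2.0, 0.5, -2.0, 1.5, -0.5, 2.5, -1.5, 0.0, 1.0, -2.5]
bound = 1.96 / math.sqrt(len(y))
for lag, (a, b) in enumerate(zip(acf(y, 4), pacf(y, 4))):
    if lag > 0:
        print(lag, round(a, 3), abs(a) > bound, round(b, 3), abs(b) > bound)
```

Lags at which the partial autocorrelation is flagged suggest a starting value for p, and flagged autocorrelations suggest a starting value for q.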

Estimation

At the identification stage, one or more models are tentatively chosen that seem to provide statistically adequate representations of the available data. Precise estimates of the parameters of each model are then obtained by least squares. Standard computer packages such as SAS and SPSS are available for finding the estimates of the relevant parameters using iterative procedures.
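The idea behind least-squares estimation can be illustrated with a conditional sum of squares for an ARMA (1, 1) model. The sketch below is not the procedure used by SAS or SPSS: it replaces their iterative optimizers with a coarse grid search for transparency, sets the pre-sample shock to zero, and holds the constant term fixed. All names and values are hypothetical.

```python
def conditional_sum_of_squares(y, phi0, phi1, omega1):
    """Sum of squared one-step errors for Y_t = phi0 + phi1*Y_{t-1} + e_t - omega1*e_{t-1}."""
    e_prev, total = 0.0, 0.0  # pre-sample shock assumed to be zero
    for t in range(1, len(y)):
        e = y[t] - phi0 - phi1 * y[t - 1] + omega1 * e_prev
        total += e * e
        e_prev = e
    return total


def grid_search(y, phi0=0.0):
    """Return (css, phi1, omega1) minimising the criterion over a coarse grid."""
    grid = [i / 10 for i in range(-9, 10)]  # -0.9, -0.8, ..., 0.9
    return min((conditional_sum_of_squares(y, phi0, a, b), a, b)
               for a in grid for b in grid)


# Hypothetical mean-centred series (not the sugarcane data).
y = [1.2, 0.4, -0.3, 0.8, 0.1, -0.9, -0.2, 0.6, 0.3, -0.5]
css, phi1_hat, omega1_hat = grid_search(y)
print(css, phi1_hat, omega1_hat)
```

A finer grid or a numerical optimizer would refine these values; the grid only shows that the estimates are the parameter values that make the squared one-step errors smallest.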

Diagnostics

Different models are obtained for various combinations of autoregressive and moving average terms, individually and collectively. The best model is selected based on the following diagnostics:

(a) Low Akaike Information Criteria (AIC)

(b) Insignificance of autocorrelations for residuals (Q-tests)

(c) Significance of the parameters

(a) Low AIC: AIC is given by AIC = −2 log L + 2m, where m = p + q and L is the likelihood function. Since −2 log L is approximately equal to n(1 + log 2π) + n log σ², where σ² is the model MSE, AIC can be written as AIC = n(1 + log 2π) + n log σ² + 2m. Because the first term in this equation is a constant, it is omitted when comparing models. As an alternative to AIC, SBC is sometimes used, which is given by SBC = log σ² + (m log n)/n.
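The two criteria can be computed from the residuals of a fitted model as in the sketch below. It follows the expressions in the text, with the constant term of AIC dropped, and assumes that the model MSE is the residual sum of squares divided by n. The function names and residual values are hypothetical.

```python
import math


def aic(residuals, m):
    """AIC without the constant term: n*log(MSE) + 2m."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / n  # assumption: divisor is n
    return n * math.log(mse) + 2 * m


def sbc(residuals, m):
    """SBC as given in the text: log(MSE) + m*log(n)/n."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / n
    return math.log(mse) + m * math.log(n) / n


# Hypothetical residuals from two candidate models fitted to the same series.
res_small = [0.9, -1.1, 1.0, -0.8, 1.2, -1.0]    # m = 2 parameters
res_large = [0.9, -1.0, 1.0, -0.8, 1.1, -1.0]    # m = 4 parameters
print(aic(res_small, 2), aic(res_large, 4))
print(sbc(res_small, 2), sbc(res_large, 4))
```

Both criteria add a penalty that grows with m, so a larger model is preferred only when it lowers the MSE enough to offset the extra parameters.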

(b) Insignificance of autocorrelations for residuals (Q-tests): After a tentative model has been fitted to the data, it is important to perform diagnostic checks to test the adequacy of the model and to suggest potential improvements. One way to accomplish this is through the analysis of residuals. An effective measure of the overall adequacy of the chosen model is the quantity Q, known as the Box-Pierce statistic (a function of the autocorrelations of the residuals), whose approximate distribution is chi-square and which is computed as follows:

$$ Q = n\sum_{j = 1}^{k} r^{2}(j) $$
(3)

where the summation extends from 1 to k, with k the maximum lag considered; n is the number of observations in the series; and r(j) is the estimated autocorrelation at lag j. Here k is a positive integer, usually around 20. Q follows a chi-square distribution with (k − m1) degrees of freedom, where m1 is the number of parameters estimated in the model. A modified Q statistic is the Ljung-Box statistic, which is given by

$$ Q = n(n + 2)\sum_{j = 1}^{k} \frac{r^{2}(j)}{n - j} $$
(4)

The Q statistic is compared with critical values from the chi-square distribution. If the model is correctly specified, the residuals should be uncorrelated and Q should be small (the probability value should be large). A significant value indicates that the chosen model does not fit well.
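Both Q statistics can be computed directly from the residuals, as in the following sketch. It assumes that the residual autocorrelations are calculated after subtracting the residual mean, and it leaves the comparison with the chi-square critical value at (k − m1) degrees of freedom to statistical tables or a statistics library. The function names and residual values are hypothetical.

```python
def residual_autocorr(res, k):
    """Estimated residual autocorrelations r(1), ..., r(k)."""
    n = len(res)
    mean = sum(res) / n
    dev = [v - mean for v in res]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - j] for t in range(j, n)) / c0
            for j in range(1, k + 1)]


def box_pierce(res, k):
    """Box-Pierce statistic: Q = n * sum of r(j)^2 for j = 1..k."""
    n = len(res)
    return n * sum(r * r for r in residual_autocorr(res, k))


def ljung_box(res, k):
    """Ljung-Box statistic: Q = n(n + 2) * sum of r(j)^2 / (n - j) for j = 1..k."""
    n = len(res)
    r = residual_autocorr(res, k)
    return n * (n + 2) * sum(r[j - 1] ** 2 / (n - j) for j in range(1, k + 1))


# Hypothetical residuals from a fitted model (not the study's residuals).
res = [0.4, -0.7, 0.2, 0.9, -0.3, -0.6, 0.5, 0.1, -0.8, 0.3]
print(box_pierce(res, 3), ljung_box(res, 3))
```

Because (n + 2)/(n − j) exceeds one at every lag, the Ljung-Box value is never smaller than the Box-Pierce value for the same residuals, and the difference is most noticeable in short series.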

Results and Discussion

The stationarity check revealed that the time series data on sugarcane area, production and productivity were not stationary. They were made stationary by first-order differencing. For different values of p and q (0, 1 or 2), various ARIMA models were fitted, and the appropriate model was chosen as the one with the minimum value of the selection criteria, i.e. Akaike Information Criteria (AIC) and Schwarz-Bayesian Information Criteria (SBC). In this way, the ARIMA (1, 1, 1) model was found to be appropriate for sugarcane area and productivity, and ARIMA (2, 1, 2) was suitable for sugarcane production. The estimates of the parameters, along with their standard errors, are presented in Tables 1, 2 and 3 for sugarcane area, production and productivity respectively. After model fitting, the next step is diagnostic checking of the fitted model. The ACF and PACF were plotted for the residuals of the fitted models. In the present study, the residual ACF and PACF lay within the limits for sugarcane area, production and productivity, which shows that the ARIMA models fitted well.

Table 1 Final estimates of parameters for sugarcane area
Table 2 Final estimates of parameters for sugarcane production
Table 3 Final estimates of parameters for sugarcane productivity

The fitted models were validated by comparing the actual values with the predicted values. The observed and predicted values for sugarcane area, production and productivity, along with the percentage deviation, are presented in Tables 4, 5 and 6.

Table 4 Performance of ARIMA (1, 1, 1) model for sugarcane area
Table 5 Performance of ARIMA (2, 1, 2) model for sugarcane production

The results of Table 4 indicate that the predicted values of sugarcane area are slightly higher than the actual values. From Table 6 it could be seen that the predicted values are much closer to the observed values for sugarcane productivity.

Table 6 Performance of ARIMA (1, 1, 1) model for sugarcane productivity
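The percentage deviation reported in the validation tables can be computed as in the sketch below. The paper does not state the formula, so the sketch assumes deviation = 100 × (predicted − actual) / actual; the function name and the numbers are hypothetical and are not values from the tables.

```python
def percent_deviation(actual, predicted):
    """Percentage deviation of each prediction from the observed value.

    Assumed definition: 100 * (predicted - actual) / actual.
    """
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]


# Hypothetical observed and predicted values for two validation years.
actual = [300.0, 320.0]
predicted = [306.0, 312.0]
for a, p, dev in zip(actual, predicted, percent_deviation(actual, predicted)):
    print(a, p, round(dev, 2))
```

Under this sign convention a positive deviation means the model over-predicted and a negative one means it under-predicted.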

Conclusions

The drought of 2009 has brought home the critical need for a short-term forecasting model for agriculture sector at sub-national level, since good and bad agricultural years are not synchronous across states.

The ARIMA models developed could be successfully used for forecasting sugarcane area, production and productivity of Tamilnadu for subsequent years. The forecast values for the years 2010–2012 for sugarcane area, production and productivity are presented in Table 7.

Table 7 Forecast values for the future