Introduction

Sugarcane (Saccharum officinarum L.) is a commercial crop belonging to the family Gramineae, order Glumiflorae. This cash crop is used mainly as raw material for the sugar industry, the second-largest agro-based industry in India. It is cultivated commercially in most tropical and sub-tropical regions of the world. India is the second-largest sugarcane producer after Brazil, accounting for nearly 25% of total global production (Yadav 2007). Sugarcane is a source of sugar, jaggery, ethanol, energy, biodegradable products and silage for livestock. The juice is used to manufacture sugar and jaggery, while by-products such as bagasse and molasses have diverse uses. Bagasse is used mainly as a source of energy and fuel and, because of its high cellulose content, for the production of paper, fiberboard, etc. Ethyl alcohol, butyl alcohol and citric acid are produced commercially from sugarcane molasses.

Sugarcane plays an essential role in the economy of the country. In 2019–2020, the area under sugarcane cultivation and the production were 4867 ha and 377,766 tonnes, respectively, with an estimated productivity of 77.6 tonnes/ha. During 2018, 79.9% of India's total sugarcane production was used to manufacture white sugar, 11.29% for jaggery production, and 8.80% as seed and feed material. In 2019, 840.16 Mt of sugarcane was exported (IISR 2019). Sugarcane production and its industries contribute 1% of the national GDP. Production is shaped by interactions between abiotic factors (such as rainfall, temperature and humidity) and edaphic conditions, and a basic understanding of these is necessary for predicting production. Because of these constraints, analyzing performance and productivity is not easy. Virtual (simulation) models can make the prediction of growth and yield more efficient: they help predict how the crop behaves in response to its environment and management practices, and they capture both external and internal changes in soil and climate.

An accurate and timely forecast of seasonal sugarcane production greatly helps the sugarcane industry in decisions on cash flow, the value chain, etc. A better model increases the significance and applicability of the research. Some important past work on sugarcane modeling is reviewed below. Yaseen et al. (2005) used time series data for 1947–2002 to forecast sugarcane yield for Pakistan and identified ARIMA(2, 1, 2) as the appropriate forecasting model. Azam and Khan (2010) examined the significance of sugarcane in Khyber Pakhtunkhwa province, Pakistan. Their results showed that the sum of the elasticities was greater than unity, implying that the agricultural sector was operating in the stage of increasing returns to scale, which indicates that the allocation of inputs to this sector is not optimal.

Krishna and Priya (2011) investigated pre-harvest forecasting of sugarcane yield in India using climatic factors. They built a forecasting model with weather variables as regressors and found that it explained 87% of the variation in sugarcane yield two months before harvest. Ali et al. (2015) forecast the production and yield of sugarcane for Pakistan using ARIMA models. They used data for 1948 to 2012 and forecast production and yield for the years 2013 to 2030.

Vishwajith et al. (2016) estimated the area, production and productivity of sugarcane and the sugar production of India, as well as of its main sugarcane-growing states, by fitting univariate ARIMA models. The authors used time series data from 1950 to 2012 on sugarcane area, production, productivity and sugar production. ARIMA(3, 1, 3) was found suitable for sugarcane area, and ARIMA(2, 1, 1) for sugarcane production, productivity and sugar production in India. These models were used to forecast values for subsequent years. Vishwajith et al. (2018) used ARIMA models for arhar (pigeon pea) production in India.

Shah et al. (2017) forecast the production of major food crops in Khyber Pakhtunkhwa, Pakistan, applying the ARIMA forecasting approach to secondary data. They found the ARIMA model adequate. Mehmood et al. (2019) used historical sugarcane production data from the Pakistan Bureau of Statistics (PBS) to predict sugarcane production for the years 2018 to 2030 via the Box and Jenkins (1970) approach. The authors identified ARIMA(2, 1, 1) as the model for forecasting sugarcane production from 2019 to 2030; the forecasts showed a significant increase, from 75,394 to 86,792 tonnes. The present study uses time series data from 1950 to 2018.

Material and Methods

For the current study, sugarcane production in five major states, viz. Andhra Pradesh, Karnataka, Maharashtra, Tamil Nadu and Uttar Pradesh, along with India as a whole, is considered.

Data on sugarcane production in the five major states were obtained for the period 1950–2018 (Agriculture at Glance 2020). To develop the best model for forecasting the series in coming years, data for 1950–2015 were used for model building and data for 2016–2018 for model validation.
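The split described above can be sketched as follows. This is a minimal Python illustration (the study itself used R); the function name `split_by_year` and the dummy values are assumptions made for the example and are not the study's data.

```python
def split_by_year(series, train_end=2015, valid_end=2018):
    """Split a {year: value} mapping into model-building and validation parts."""
    train = {y: v for y, v in series.items() if y <= train_end}
    valid = {y: v for y, v in series.items() if train_end < y <= valid_end}
    return train, valid


# Placeholder values only (NOT the study's data): one dummy number per year.
dummy_series = {year: float(year - 1949) for year in range(1950, 2019)}

train, valid = split_by_year(dummy_series)
print(len(train), "years for model building;", len(valid), "years for validation")
```

With annual data from 1950 to 2018, this leaves 66 observations for model building and 3 for validation.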

Descriptive statistics help to define patterns and general trends in data. They comprise numerical and graphical procedures that summarize a data set in a clear and understandable manner. To examine the nature of each series, several descriptive measures were computed: mean, standard error, skewness, kurtosis, minimum and maximum.
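A minimal sketch of these measures in Python is given below. The paper does not state which estimators were used, so the choices here are assumptions: moment-based skewness, excess kurtosis (so that a negative value indicates a platykurtic series), and a standard error based on the sample standard deviation. The function name `describe` and the toy input are illustrative only.

```python
import math


def describe(values):
    """Return basic descriptive measures for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (divided by n); an assumed, moment-based definition.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sample_sd = math.sqrt(m2 * n / (n - 1))
    return {
        "mean": mean,
        "standard_error": sample_sd / math.sqrt(n),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis: < 0 is platykurtic
        "minimum": min(values),
        "maximum": max(values),
    }


# Toy input, not the study's data.
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Statistical packages often apply small-sample corrections to skewness and kurtosis, so values from R or other software may differ slightly from this sketch.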

The ARIMA model was chosen as the best model for forecasting sugarcane production on the basis of several model-selection criteria: highest adjusted R2, and lowest values of the Akaike information criterion, Schwarz criterion, root mean squared error, mean absolute error, mean absolute percentage error and Theil inequality coefficient (Mishra et al. 2021) (Table 1).

Table 1 Model accuracy metrics
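The forecast-error measures among these criteria can be sketched in Python as follows. The formulas are the commonly used definitions and are an assumption here, since the body of Table 1 is not reproduced in this text; in particular, the Theil inequality coefficient is taken in its bounded form (RMSE divided by the sum of the root mean squares of the forecasts and of the actual values). The function name and the toy numbers are illustrative.

```python
import math


def accuracy_metrics(actual, forecast):
    """RMSE, MAE, MAPE (%) and Theil inequality coefficient for paired values."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE assumes no actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    theil = rmse / (rms_forecast + rms_actual)  # lies between 0 and 1
    return {"rmse": rmse, "mae": mae, "mape": mape, "theil": theil}


# Toy numbers, not the study's data.
metrics = accuracy_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower values indicate a better fit for all four measures; a Theil coefficient of 0 corresponds to a perfect forecast.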

When a model satisfies these conditions, it is used for forecasting. ARIMA models for sugarcane production were estimated with R software. Based on the best model, forecasts were made up to 2025.

Autoregressive Model

The ARIMA model is a generalization of the ARMA model in time series analysis. "Integrated" means that the trend has been removed by differencing; if the series has no significant trend, the model is simply an ARMA model. In this study, the ARIMA(p, d, q) model is considered. It has three parts. The AR(p) part of the model is the autoregressive process of order p, written as:

$$X_{t} = \alpha_{0} + \mathop \sum \limits_{i = 1}^{p} \alpha_{i} X_{t - i} + \mu_{t}$$

where \(X_{t}\) is the production of sugarcane at time t, \(X_{t - 1} , X_{t - 2} , \ldots , X_{t - p}\) are the production of sugarcane at lags t − 1, t − 2, …, t − p, \(\alpha_{0}\) is a constant, \(\alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{p}\) are the parameters of the model, and \(\mu_{t}\) is an error term with \(\mu_{t} \sim N(0,\sigma^{2} )\).
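The AR(p) recursion can be illustrated with a short Python sketch. The function names, coefficients and starting values are hypothetical and chosen only for the example; they are not estimates from the study.

```python
import random


def ar_predict(history, alpha0, alphas):
    """Conditional mean of X_t: alpha0 + sum_i alpha_i * X_{t-i}."""
    p = len(alphas)
    recent = history[-p:][::-1]  # X_{t-1}, X_{t-2}, ..., X_{t-p}
    return alpha0 + sum(a * x for a, x in zip(alphas, recent))


def simulate_ar(n, alpha0, alphas, sigma, start, seed=0):
    """Extend `start` (at least p values) by n simulated AR(p) observations."""
    rng = random.Random(seed)
    series = list(start)
    for _ in range(n):
        shock = rng.gauss(0.0, sigma)  # error term mu_t ~ N(0, sigma^2)
        series.append(ar_predict(series, alpha0, alphas) + shock)
    return series


# Hypothetical AR(2): X_t = 1.0 + 0.5 X_{t-1} + 0.2 X_{t-2} + mu_t
one_step = ar_predict([10.0, 12.0], alpha0=1.0, alphas=[0.5, 0.2])
path = simulate_ar(5, alpha0=1.0, alphas=[0.5, 0.2], sigma=1.0, start=[10.0, 12.0])
print(one_step, len(path))
```

The one-step prediction uses only the last p observations, which is what the lagged terms in the equation express.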

In the second part of the model, d denotes the order of differencing: non-stationary data are differenced d times to make them stationary, since stationarity is the primary requirement for building an ARIMA model. The notation MA(q) refers to the moving average process of order q:

$$X_{t} = \mu + \mathop \sum \limits_{i = 1}^{q} \theta_{i} \varepsilon_{t - i} + \varepsilon_{t}$$

where \(\theta_{1} , \theta_{2} , \ldots , \theta_{q}\) are the parameters of the model, μ is the expectation of \(X_{t}\) (often assumed to equal zero), \(\varepsilon_{t}\) is the white noise error term at time t, and \(\varepsilon_{t - 1} , \varepsilon_{t - 2} , \ldots , \varepsilon_{t - q}\) are the errors of preceding periods that enter the response \(X_{t}\).
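A small Python sketch shows how the MA(q) equation builds the series from current and past shocks. The function name, the coefficient and the shock values are hypothetical, and shocks before the start of the sample are assumed to be zero.

```python
def ma_series(mu, thetas, shocks):
    """X_t = mu + eps_t + sum_i theta_i * eps_{t-i}, with pre-sample shocks set to 0."""
    q = len(thetas)
    out = []
    for t in range(len(shocks)):
        value = mu + shocks[t]
        for i in range(1, q + 1):
            if t - i >= 0:
                value += thetas[i - 1] * shocks[t - i]
        out.append(value)
    return out


# Hypothetical MA(1) with theta_1 = 0.5 and hand-picked shocks.
x = ma_series(mu=0.0, thetas=[0.5], shocks=[1.0, 0.0, -2.0, 0.0])
print(x)  # [1.0, 0.5, -2.0, -1.0]
```

Each shock affects the series for only q further periods, which is why the theoretical ACF of an MA(q) process cuts off after lag q.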

ARMA Model

A time series \(X_{t}\) is an ARMA(p, q) process if it is stationary and, for all t:

$$X_{t} - \vartheta_{1} X_{t - 1} - \cdots - \vartheta_{p} X_{t - p} = Z_{t} + \theta_{1} Z_{t - 1} + \cdots + \theta_{q} Z_{t - q} \quad {\text{where}},\;Z_{t} \sim N(0,\sigma^{2} )$$

and the polynomials \(1 - \vartheta_{1} Z - \cdots - \vartheta_{p} Z^{p}\) and \(1 + \theta_{1} Z + \cdots + \theta_{q} Z^{q}\) have no common factor, where p and q are the orders of the AR and MA terms, respectively (Mishra et al. 2020).
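As a worked special case of the no-common-factor condition (an illustration, not taken from the study), consider p = q = 1 with \(\left| {\vartheta_{1} } \right| < 1\):

$$X_{t} - \vartheta_{1} X_{t - 1} = Z_{t} + \theta_{1} Z_{t - 1}$$

The two polynomials are \(1 - \vartheta_{1} Z\) and \(1 + \theta_{1} Z\), and they share a factor exactly when \(\theta_{1} = - \vartheta_{1}\). In that case the equation becomes

$$X_{t} - \vartheta_{1} X_{t - 1} = Z_{t} - \vartheta_{1} Z_{t - 1}$$

whose stationary solution is \(X_{t} = Z_{t}\), i.e., white noise. The AR and MA terms cancel, so the apparent ARMA(1, 1) model is really an ARMA(0, 0) model; the condition excludes such over-parameterized representations.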

ARIMA Model

A time series \(X_{t}\) is an ARIMA(p, d, q) process if \(Y_{t} = \left( {1 - K} \right)^{d} X_{t}\) is an ARMA(p, q) process, where K is the backshift operator (\(KX_{t} = X_{t - 1}\)). That is, \(X_{t}\) satisfies \(\vartheta^{*} \left( K \right)X_{t} = \vartheta \left( K \right)\left( {1 - K} \right)^{d} X_{t} = \theta \left( K \right)Z_{t}\), where \(Z_{t} \sim N(0,\sigma^{2} )\), \(\vartheta \left( Z \right)\) and \(\theta \left( Z \right)\) are polynomials of degree p and q, respectively, and \(\vartheta \left( Z \right) \ne 0\) for \(\left| Z \right| \le 1\). The polynomial \(\vartheta^{*} \left( Z \right)\) has a zero of order d at Z = 1. The process \(X_{t}\) is stationary if and only if d = 0, in which case it reduces to an ARMA(p, q) process.

Given a time series, one can compute its mean, variance, ACF (autocorrelation function) and PACF (partial autocorrelation function). The estimated ACF and PACF give an idea of the correlation between observations and indicate the subgroup of models worth considering; this is done by looking at the cutoffs in the ACF and PACF. At the identification stage, one tries to match the estimated ACF and PACF with the theoretical ACF and PACF as a guide for tentative model selection; the final choice is made once the model has been estimated and diagnosed.

ARIMA modeling has four major steps: model identification, estimation, diagnostic checking and forecasting. First, tentative model orders are identified through the ACF and PACF. Next, the coefficients are estimated and the best model is chosen using measures such as MSE and MAPE. Forecasts are then generated, and finally the model is validated by examining the residuals through the Ljung–Box test and the ACF plot of the residuals.
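Three of the building blocks mentioned in this section (differencing, the sample ACF and the Ljung–Box statistic) can be sketched in plain Python. This is an illustrative sketch under standard textbook definitions, not the R code used in the study; the function names and the toy series are assumptions. The PACF and the chi-square p-value of the test are left out for brevity.

```python
def difference(series, d=1):
    """Apply the differencing operator d times: each pass maps X_t to X_t - X_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(residuals)
    rho = acf(residuals, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


# Toy series with a quadratic trend: two differences leave a constant.
toy = [1.0, 4.0, 9.0, 16.0, 25.0]
print(difference(toy, d=1))  # [3.0, 5.0, 7.0, 9.0]
print(difference(toy, d=2))  # [2.0, 2.0, 2.0]
print(acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=1))
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], max_lag=1))
```

In practice Q is compared with a chi-square distribution whose degrees of freedom are the number of lags minus the number of estimated ARMA parameters; a small Q (large p-value) suggests that the residuals behave like white noise.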

Results and Discussion

From Table 2, sugarcane production in India increased over the period 1953 to 2018, with figures of 444.11 million tonnes and 410.42 million tonnes reported. The average production of sugarcane is 19.9909 million tonnes per year. The kurtosis value (−1.16) of production indicates a platykurtic distribution, and the positive skewness (0.25) indicates that there was a continuous effort to increase the yield of sugarcane. State-wise figures show that in Andhra Pradesh, production increased by 17,912 thousand tonnes during the period, from 3780 thousand tonnes to 21,692 thousand tonnes. Andhra Pradesh has the lowest sugarcane production of the five states, ranking fifth with an average of 11,345.48 thousand tonnes per year. In Karnataka, production increased by 44,022 thousand tonnes, from 1890 thousand tonnes to 45,912 thousand tonnes. In Maharashtra, production increased by 97,516 thousand tonnes, from 1084 thousand tonnes to 98,600 thousand tonnes. Although Maharashtra also has the highest sugar production, it ranks second in sugarcane, with an average production of 27,769 thousand tonnes per year. In Tamil Nadu, production increased by 39,173 thousand tonnes, from 1951 thousand tonnes to 41,124 thousand tonnes. In Uttar Pradesh, production increased by 153,935 thousand tonnes, from 21,065 thousand tonnes to 175,000 thousand tonnes. Uttar Pradesh ranks first, with an average production of 73,181.69 thousand tonnes per year. The negative kurtosis for the major states and for India as a whole, Maharashtra excepted, indicates a sustained effort to augment the per hectare yield of sugarcane in India.

Table 2 Descriptive statistics of sugarcane production (‘000 ton) in major states of India

Table 3 shows the results for all model selection criteria obtained using parametric trend models. After assessing each trend series, the series were forecast for the coming years using the ARIMA(p, d, q) methodology discussed in the Material and Methods section. Data for 1950–2015 were used for model building and data for 2016–2018 for model validation. The individual series were best fitted by ARIMA(2, 1, 3), ARIMA(2, 1, 4), ARIMA(3, 2, 4), ARIMA(2, 2, 3), ARIMA(3, 1, 3) and ARIMA(2, 1, 3) models. These were the best-fitting models for sugarcane production in India and the selected states, and they were used to generate forecasts for subsequent years.

Table 3 Selecting the best model for forecasting for sugarcane production

In Table 4, the ARIMA(2, 1, 3), ARIMA(2, 1, 4), ARIMA(3, 2, 4), ARIMA(2, 2, 3), ARIMA(3, 1, 3) and ARIMA(2, 1, 3) models were used to forecast sugarcane production several years ahead, together with 95% confidence intervals. For 2019, the forecast production for the whole of India was about 37.8760 million tonnes, with lower and upper limits of 37.8557 and 37.8963 million tonnes, respectively. The forecast for 2025 was 40.6468 million tonnes, with limits of 39.5662 and 41.7274 million tonnes. The validity of the forecasts can be checked when data for the forecast periods become available.
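The widening of the reported intervals with the forecast horizon can be checked with simple arithmetic. The Python sketch below uses only the all-India figures quoted above; the function name is illustrative, and the implied standard error assumes a symmetric normal-theory interval (forecast ± 1.96 standard errors), which the paper does not state explicitly.

```python
def interval_summary(lower, upper, z=1.96):
    """Midpoint, half-width and implied standard error of a symmetric interval."""
    half_width = (upper - lower) / 2.0
    return {
        "midpoint": (upper + lower) / 2.0,
        "half_width": half_width,
        "implied_se": half_width / z,  # assumption: interval = forecast +/- z * SE
    }


# All-India figures quoted in the text (million tonnes): (forecast, lower, upper).
reported = {
    2019: (37.8760, 37.8557, 37.8963),
    2025: (40.6468, 39.5662, 41.7274),
}

summaries = {}
for year, (point, lower, upper) in reported.items():
    summaries[year] = interval_summary(lower, upper)
    print(year, point, round(summaries[year]["half_width"], 4))
```

The half-width grows from about 0.02 million tonnes for 2019 to about 1.08 million tonnes for 2025, and in both years the interval midpoint coincides with the point forecast.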

Table 4 Forecasting for Sugarcane Production (‘000 ton) in India from 2019 to 2025

Similarly, the forecasts shown in Fig. 1 indicate that sugarcane production will increase in the whole of India and in the major states Uttar Pradesh, Maharashtra and Karnataka, while production in Tamil Nadu and Andhra Pradesh will decrease over the years 2019 to 2025. Finally, forecasts for the next five years were also computed for all three levels, and these show an increasing trend in sugarcane production. It is worth mentioning that this forecasting approach is best suited to short-term forecasting, as forecast accuracy tends to decrease as the forecast horizon lengthens. The projections of the present study may provide direct support in formulating national agricultural policy.

Fig. 1 Forecasting for sugarcane production in India from 2019 to 2025. a Andhra Pradesh sugarcane production forecast. b Karnataka sugarcane production forecast. c Maharashtra sugarcane production forecast. d Tamil Nadu sugarcane production forecast. e Uttar Pradesh sugarcane production forecast. f India sugarcane production forecast

Conclusion

It may be concluded from the present study that different ARIMA models fitted sugarcane production best in the whole of India and the selected states in terms of all assessment criteria: highest adjusted R2, lowest values of AIC, Schwarz criterion, root mean square error, mean absolute error, mean absolute percentage error and Theil inequality coefficient, along with the most significant coefficients. It is also concluded that sugarcane production will increase in future, reaching an estimated 40.6468 million tonnes in the whole of India in 2025. Similarly, production is forecast to increase in the major states Uttar Pradesh, Maharashtra and Karnataka, while it decreases in Tamil Nadu and Andhra Pradesh over the years 2019 to 2025.