Introduction

Dairy products are part of daily life, and perceptions of them have evolved over time from a luxury product accessible only to the “elite” into a common product consumed by millions of people. One glass of milk can tremendously improve the nutritional levels of children in Asia (Siddiky 2015). Milk, one of the core dairy products, has attracted the attention of governments trying to implement policies that could forecast its subproducts, whilst enterprises are becoming dairy driven as the best way to make profitable margins as consumers' preference for high-quality milk rises. Hence, manufactured dairy product output is estimated to grow 10% to INR 283,000 crore ($37.58 billion) during the current financial year (April 2020–March 2021, www.fao.org). Currently, South Asian countries, together with China, are leading milk producers (China sustain milk output growth in Asia and FAO 2020). India ranked first in the world for milk production, accounting for 196.18 million tonnes (2019), and China ranked fifth (FAO 2019).

The European Union countries have the second-highest level of milk production, whilst Africa and Oceania have the lowest levels of milk production in the world (Fig. 1).

Fig. 1 Cow's milk production (share per region)

We show cow’s milk production as a share per region, indicating that Asia accounts for the highest share in comparison to other regions.

Table 1 shows total milk production in recent years, revealing that world total milk production has increased. As the level of milk production has increased, trade has risen to 77.9 million tonnes in November 2020.

Table 1 World milk production (million tonnes)

There is a vast literature on milk production that focuses on a particular country or firm, while studies of the SAARC region including China seem scant so far. In the SAARC region, most households rear livestock either as a mainstay and/or as a complement to crop production (Ahuja and Staal 2012). Therefore, given the importance of dairying, we try to estimate and forecast milk production to promote the commercialization of dairying in SAARC member countries (SAARC 2015). Forecasting milk production is required so that the necessary policies can be formulated (Mishra et al. 2020a, b; Deshmukh and Paramasivam 2016). Lohano and Soomro (2006) used a random-walk-with-drift autoregressive model to forecast milk production in Pakistan. Schmit and Kaiser (2006) indicate that the decline in retail per capita demand would persist, but at a reduced rate compared with past years. In an approach similar to ours, Akter and Rahman (2010) forecast milk supply up to 3 years ahead for a dairy cooperative. Murphy et al. (2014) and Zhang et al. (2020) conducted studies to identify different modeling techniques for predicting total daily herd milk yield in Ireland, with particular attention to non-linear models for short-term milk-yield predictions. Li (2020) carried out a genome-wide association study of milk production using statistical models. Taye et al. (2020) considered the trends in the actual yield of cow milk production and forecast the volume of milk at the Andassa dairy farm in Ethiopia using ARIMA (1, 2, 1). Mishra et al. (2020a, b) applied time series models to milk production and produced forecasts for 2020. Uddin et al. (2020) determine that Bangladesh will be self-sufficient in milk production by 2029.

Wood (1967), Ali and Schaeffer (1987), Wilmink (1987) and Guo (1995) tried to fit a lactation curve to the data, while Ptak and Schaeffer (1993) and Shallo et al. (2004) investigated the nutrition of milk through genetic analysis and bio-economic modeling. Milk production is highly influenced by factors such as nutritional interventions (Kolver and Muller 1998), disease (Collard et al. 2000), seasonality of pasture production (Adediran et al. 2012), grazing conditions (Baudracco et al. 2012) and other factors (Olori et al. 1999a, b; Tekerli et al. 2000; Brun-Lafleur et al. 2010). Macciotta et al. (2002) and Vasconcelos et al. (2004) used autoregressive models to forecast lactation, while Sharma et al. (2006) and Sharma et al. (2007) used larger models such as multiple regression and artificial neural networks. Other studies have identified variables that could influence milk production, such as season of calving (Wood 1967), climatic conditions (Smith 1968), number of DIM (Grzesiak et al. 2003) and stocking rate (McCarthy et al. 2011).

In general, the production of dairy products has been more successful in developed countries than in developing countries such as the South Asian countries. Even though governments have implemented policies, growth has been slow. Smallholders constitute a large portion of the dairy industry, while privately owned and state-owned farms constitute the other portion. A lack of dairy animals with good genetic merit, a lack of good-quality feed, the limited knowledge and skills of farmers, the high cost of inputs and a lack of good marketing are the main challenges that South Asian countries now face. The ability to forecast milk production is important as it will affect energy consumption and farmers' income. Predicting milk production is the best tool for adjusting its supply. Hence, given the importance of milk in dairy production, and because South Asian countries lead its production, we try to forecast milk production using ARIMA/GARCH models and Holt’s linear model (Oliveros 2019).

The results of the ARIMA approach indicate that India would be the leading country in milk production among South Asian countries, with 91 MMT in 2024–2025. Pakistan ranks second, with milk production reaching 26 MMT in 2024–2025, and China ranks third with 3 MMT, while Bangladesh and Sri Lanka appear to be the countries with the lowest milk production. Since the residuals of the fitted ARIMA models for China, India, Nepal, Pakistan, and Sri Lanka show no ARCH effects, we cannot estimate an ARCH/GARCH model for these countries. Hence, we fit a GARCH model only for Bangladesh and Myanmar, and the findings suggest that the model for Bangladesh forecasts an abundance of milk production. In comparison to the ARIMA model, Holt’s linear model forecasts higher levels of milk production for the region. It indicates that India’s forecasted level will reach 105 MMT, Pakistan's 58 MMT and China's 4 MMT in 2024–2025. We compare the mean absolute percentage error (MAPE) of the ARIMA and Holt’s models, and the findings suggest that the ARIMA model shows higher errors. The exceptions are China, Nepal and Pakistan, whose errors are higher using Holt’s model (Fig. 2).

Fig. 2 Milk production forecasting

Material and methods

The main approaches to the research problem with their methodologies are discussed here.

Data collection

Milk production data for the SAARC countries and China were collated separately. The milk production data are in tonnes. The data set contains annual data from 1961 to 2018 (www.fao.org.in). Each data set was divided into two parts, 80% for model building and 20% for model validation. The statistical packages used for model building are R and EViews.
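The 80%/20% division can be sketched as follows. This is an illustration in Python rather than the R/EViews workflow used in the study. The function name `chronological_split` and the rule of rounding the cut point up are assumptions, and the year labels are placeholders standing in for the annual observations. The split is chronological, not random, so that validation always uses years later than those used for fitting.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Rounding the cut point up is an assumption, not stated in the text.
    cut = math.ceil(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Placeholder data: the year labels stand in for the annual observations.
years = list(range(1961, 2019))  # 58 annual observations
train_years, test_years = chronological_split(years)
print(train_years[0], train_years[-1], len(train_years))
print(test_years[0], test_years[-1], len(test_years))
```

With 58 annual observations, this rounding rule yields a fitting window of 1961–2007 and a validation window of 2008–2018, which agrees with the windows reported in the results.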

ARIMA model

An ARMA (p, q) model combines an autoregressive part of order p with a moving average part of order q (as defined below).

Autoregressive model

The notation AR (p) refers to the autoregressive model of order p. The AR(p) model is written as in Eq. 1:

$$ X_{t} = c + \sum\limits_{i = 1}^{p} {\rho_{i} } X_{t - i} + \varepsilon_{t} $$
(1)

where \(\rho_{1}, \rho_{2}, \ldots, \rho_{p}\) are the parameters of the model, c is a constant and \(\varepsilon_{t}\) is white noise. Sometimes the constant term is omitted.
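As a small illustration of the autoregressive recursion, the following Python sketch computes a one-step prediction from the p most recent observations, with the shock \(\varepsilon_{t}\) replaced by its mean of zero. The function name `ar_one_step`, the coefficients and the data are hypothetical and are not taken from the study.

```python
def ar_one_step(history, c, rho):
    """One-step AR(p) prediction: c + sum_i rho[i-1] * X_{t-i}.

    history : past observations, oldest first
    rho     : [rho_1, ..., rho_p]
    The shock epsilon_t is replaced by its mean of zero.
    """
    p = len(rho)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    lags = history[::-1][:p]  # X_{t-1}, X_{t-2}, ..., X_{t-p}
    return c + sum(r * x for r, x in zip(rho, lags))


# Hypothetical AR(2) coefficients and data, for illustration only.
prediction = ar_one_step([3.0, 4.0, 6.0], c=1.0, rho=[0.5, 0.2])
print(prediction)
```

Here the prediction is 1 + 0.5 × 6 + 0.2 × 4, that is, the constant plus the weighted two most recent values.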

Moving average model

The notation MA (q) refers to the moving average series of order q:

$$ X_{t} = \mu + \varepsilon_{t} + \sum\limits_{i = 1}^{q} {\theta_{i} } \varepsilon_{t - i} $$
(2)

where \(\theta_{1}, \ldots, \theta_{q}\) are the parameters of the model, \(\mu\) is the expectation of \(X_{t}\) (often assumed to equal 0), and \(\varepsilon_{t}, \varepsilon_{t - 1}, \ldots\) are white noise error terms.

Time series that become stationary after differencing can be modelled with ARIMA models. The non-seasonal ARIMA model can be written as in Eq. 3.

$$ z'_{t} = c + \phi_{1} z'_{t - 1} + \phi_{2} z'_{t - 2} + \ldots + \phi_{p} z'_{t - p} + \varepsilon_{t} + \theta_{1} \varepsilon_{t - 1} + \theta_{2} \varepsilon_{t - 2} + \ldots + \theta_{q} \varepsilon_{t - q} $$
(3)

where \(z'_{t}\) is the differenced series. The “predictors” on the right-hand side include both lagged values of \(z'_{t}\) and lagged errors. This is the ARIMA (p, d, q) model, where p, d and q represent, respectively, the order of the autoregressive part, the degree of differencing involved and the order of the moving average part. ARIMA modelling has four major steps: model identification, estimation, model diagnostics and forecasting. First, tentative model orders are identified through the ACF (autocorrelation function) and PACF (partial autocorrelation function). Then the best coefficients for the model are determined through criteria such as MSE and MAPE. Next, forecasts are produced. Finally, the model is validated and its performance checked by examining the residuals with the Ljung–Box test and the ACF plot of the residuals.
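The identification and diagnostic steps rely on differencing, the sample autocorrelation function and the Ljung–Box statistic. The Python sketch below shows these three quantities under simple textbook definitions; it is not the R/EViews code used in the study, and the helper names (`difference`, `acf`, `ljung_box_q`) and the toy series are assumptions. The Ljung–Box statistic is computed as \(Q = n(n+2)\sum_{k=1}^{h} r_{k}^{2}/(n-k)\); obtaining a p-value additionally requires a chi-square distribution function, which is omitted here.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(series)
    r = acf(series, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


# Toy series with a quadratic trend: two differences leave a constant.
toy = [1, 4, 9, 16, 25, 36]
print(difference(toy, d=2))
print(acf([1, 2, 3, 4, 5], max_lag=2))
print(ljung_box_q([1, 2, 3, 4, 5], max_lag=2))
```

In practice the statistic would be applied to the residuals of a fitted model, where a small Q relative to the chi-square reference distribution suggests that no autocorrelation remains.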

Holt’s linear trend method

Holt’s Linear Trend Method is an extension of the simple exponential smoothing and allows forecasting data with a trend. This method involves a forecast equation and two smoothing equations: one for the level and one for the trend given by Eq. 4, Eq. 5 and Eq. 6, respectively (Holt 1957).

$$ {\text{Forecast equation:}}\quad \hat{z}_{t + h|t} = k_{t} + hd_{t} $$
(4)
$$ {\text{Level equation:}}\quad k_{t} = \rho z_{t} + \left( {1 - \rho } \right)\left( {k_{t - 1} + d_{t - 1} } \right) $$
(5)
$$ {\text{Trend equation:}}\quad d_{t} = \sigma^{*} \left( {k_{t} - k_{t - 1} } \right) + \left( {1 - \sigma^{*} } \right)d_{t - 1} $$
(6)

where \(k_{t}\) denotes an estimate of the level of the series at time \(t\), \(d_{t}\) denotes an estimate of the trend (slope) of the series at time \(t\), \(\rho\) is the smoothing parameter for the level, \(0 \le \rho \le 1\), and \(\sigma^{*}\) is the smoothing parameter for the trend, \(0 \le \sigma^{*} \le 1\).
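The forecast, level and trend equations translate directly into a short recursion. The following Python sketch is illustrative only: the function name `holt_linear`, the initial values (level set to the first observation, trend set to the first difference) and the toy data are assumptions, since the text does not state how the recursion is initialised or which smoothing parameters were estimated.

```python
def holt_linear(series, rho, sigma, horizon):
    """Holt's linear trend method; returns point forecasts for h = 1..horizon.

    rho   : smoothing parameter for the level, 0 <= rho <= 1
    sigma : smoothing parameter for the trend (sigma* in the text)
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if not (0 <= rho <= 1 and 0 <= sigma <= 1):
        raise ValueError("smoothing parameters must lie in [0, 1]")
    # Initial level and trend: an assumption, not specified in the text.
    level = series[0]
    trend = series[1] - series[0]
    for z in series[1:]:
        prev_level = level
        level = rho * z + (1 - rho) * (prev_level + trend)          # level equation
        trend = sigma * (level - prev_level) + (1 - sigma) * trend  # trend equation
    return [level + h * trend for h in range(1, horizon + 1)]       # forecast equation


# Toy data on an exact straight line; smoothing parameters are hypothetical.
forecasts = holt_linear([10.0, 12.0, 14.0, 16.0], rho=0.8, sigma=0.2, horizon=3)
print([round(f, 6) for f in forecasts])
```

For input that lies exactly on a straight line, the forecasts simply continue the line (18, 20 and 22 here, up to floating-point rounding), whatever the smoothing parameters; with noisy data, larger values of \(\rho\) and \(\sigma^{*}\) make the level and trend react faster to recent observations.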

Generalized autoregressive conditionally heteroscedastic (GARCH) process

The generalized autoregressive conditional heteroscedasticity (GARCH) model describes the error variance of a model (Bollerslev 1986).

$$ h_{t} = \alpha_{0} + \alpha_{1} \varepsilon_{t - 1}^{2} + \ldots + \alpha_{q} \varepsilon_{t - q}^{2} + \beta_{1} h_{t - 1} + \ldots + \beta_{p} h_{t - p} $$
(7)

In summation form, writing \(a_{i}\) for \(\alpha_{i}\) and \(b_{j}\) for \(\beta_{j}\):

$$ h_{t} = a_{0} + \sum\limits_{i = 1}^{q} {a_{i} } \varepsilon_{t - i}^{2} + \sum\limits_{j = 1}^{p} {b_{j} } h_{t - j} $$
(8)

A sufficient condition for the conditional variance to be positive is

$$ a_{0} > 0,\quad a_{i} \ge 0,\; i = 1,2, \ldots ,q;\quad b_{j} \ge 0,\; j = 1,2, \ldots ,p $$

The GARCH model is equivalent to an infinite-order ARCH model. The GARCH (p, q) model, where p is the order of the GARCH terms \(\rho^{2}\) and q is the order of the ARCH terms \(e^{2}\), is shown in Eq. 9; here \(\rho_{t}^{2}\) denotes the conditional variance and \(e_{t}\) the error term.

$$ \rho_{t}^{2} = \theta_{0} + \theta_{1} e_{t - 1}^{2} + \cdots + \theta_{q} e_{t - q}^{2} + \omega_{1} \rho_{t - 1}^{2} + \cdots + \omega_{p} \rho_{t - p}^{2} $$
(9)
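For the GARCH (1,1) case used later in the paper, the conditional variance recursion reduces to \(h_{t} = a_{0} + a_{1}\varepsilon_{t-1}^{2} + b_{1}h_{t-1}\). The Python sketch below iterates this recursion for given coefficients. It is illustrative only: the function name `garch11_variance`, the coefficient values and the residuals are hypothetical, and starting the recursion at the unconditional variance \(a_{0}/(1 - a_{1} - b_{1})\) (which requires \(a_{1} + b_{1} < 1\)) is an assumption not stated in the text.

```python
def garch11_variance(residuals, a0, a1, b1):
    """Conditional variances h_t = a0 + a1 * eps_{t-1}^2 + b1 * h_{t-1}.

    Returns len(residuals) + 1 values; the last one is the one-step-ahead
    variance forecast.
    """
    if a0 <= 0 or a1 < 0 or b1 < 0:
        raise ValueError("need a0 > 0, a1 >= 0, b1 >= 0 for a positive variance")
    if a1 + b1 >= 1:
        raise ValueError("this sketch assumes a1 + b1 < 1")
    # Start at the unconditional variance: an assumption of this sketch.
    h = [a0 / (1.0 - a1 - b1)]
    for eps in residuals:
        h.append(a0 + a1 * eps * eps + b1 * h[-1])
    return h


# Hypothetical coefficients and residuals, for illustration only.
variances = garch11_variance([1.0, 0.0], a0=0.1, a1=0.2, b1=0.7)
print([round(v, 6) for v in variances])
```

A large squared residual raises the next period's variance through \(a_{1}\), and the effect then decays at rate \(b_{1}\), which is how the model captures volatility clustering.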

Results and discussion

Descriptive statistics such as the mean, maximum, minimum, standard deviation, skewness, and kurtosis are given in Table 2. Table 2 shows that India produced approximately three times as much milk as Pakistan, its closest competitor, between 1961 and 2018. Bangladesh had the lowest mean milk production among the seven countries studied. Table 2 also shows that, during the period under investigation, India had tremendous growth of 422.33%. Myanmar reached 193,841 tonnes in 2018, with growth of 560.41 percent. Milk production in all countries in the study is positively skewed, which indicates that milk production increased from 1961 to 2018. Except for Myanmar, the countries show negative kurtosis in milk production, indicating steadiness in production.
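The shape statistics mentioned above can be computed from central moments. The Python sketch below uses the simple moment-based definitions of skewness and excess kurtosis (kurtosis minus 3, so that a normal distribution scores 0 and negative values are possible). This is an assumption: statistical packages often apply small-sample corrections, so their output may differ slightly. The function name and the data are hypothetical.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("data are constant; shape statistics are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical, evenly spaced data: symmetric, with flatter-than-normal tails.
skew, excess_kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
print(skew, excess_kurt)
```

Evenly spaced values such as these give zero skewness and negative excess kurtosis, whereas a series with a few very large recent values would show positive skewness.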

Table 2 Descriptive statistics of milk production data (tonnes)

After examining the nature of the data through descriptive statistics, the next step is to validate and forecast the milk production time series. For projection purposes, different time series models (ARIMA, GARCH and Holt’s linear trend model) were used and compared. ARIMA models for the seven countries were selected using goodness-of-fit criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the bias-corrected Akaike information criterion (AICc), and the results are given in Table 3. Holt’s model results are also shown in Table 3.
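The three selection criteria named above are simple functions of the maximised log-likelihood, the number of estimated parameters k and the sample size n. The Python sketch below uses the standard definitions, AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L and AICc = AIC + 2k(k + 1)/(n − k − 1). The function name and the input values are hypothetical, and packages differ in which parameters they count in k, so absolute values may not match software output.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return AIC, BIC and small-sample corrected AICc for a fitted model."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    return {"AIC": aic, "BIC": bic, "AICc": aicc}


# Hypothetical log-likelihood and parameter count, for illustration only.
criteria = information_criteria(log_likelihood=-100.0, k=3, n=47)
print(criteria)
```

For each criterion, the candidate model with the smallest value is preferred. With short annual series, the AICc correction term is not negligible, which is one reason to report it alongside AIC.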

Table 3 ARIMA models fitted and Holt’s linear model for milk production time series over the period (1961–2018)

ARIMA (1,2,1) is selected as the best model for India, China, and Myanmar for the milk production data over the period 1961 to 2018. The ARIMA (0,1,0) model is specified for Sri Lanka and Bangladesh. ARIMA (1,2,2) and ARIMA (1,2,0) models are determined for Nepal and Pakistan, respectively.

For all countries except Bangladesh and Sri Lanka, the ARIMA model equation for the milk production time series is:

$$ Z_{t} = 2 Z_{t - 1} - Z_{t - 2} + \varepsilon_{t}, \quad {\mathbb{E}}\left( \varepsilon_{t} \right) = 0 $$

For Sri Lanka and Bangladesh, only first differencing is required to make the data stationary. For Bangladesh and Sri Lanka, the milk production ARIMA model equation is:

$$ Z_{t} = Z_{t - 1} + \varepsilon_{t}, \quad {\mathbb{E}}\left( \varepsilon_{t} \right) = 0 $$
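To see what the two differencing equations imply for forecasts, future shocks can be set to their expected value of zero and the recursions iterated forward. The Python sketch below does this. It covers only the differencing structure shown in the two equations, not the additional AR or MA terms of models such as ARIMA (1,2,1), and the function names and data are hypothetical.

```python
def forecast_second_difference(series, horizon):
    """Iterate Z_t = 2*Z_{t-1} - Z_{t-2} with future shocks set to zero."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    path = list(series[-2:])
    for _ in range(horizon):
        path.append(2 * path[-1] - path[-2])
    return path[2:]


def forecast_first_difference(series, horizon):
    """Iterate Z_t = Z_{t-1} with future shocks set to zero (random walk)."""
    if not series:
        raise ValueError("need at least one observation")
    return [series[-1]] * horizon


# Hypothetical production figures, for illustration only.
history = [100, 110, 125]
print(forecast_second_difference(history, horizon=2))
print(forecast_first_difference(history, horizon=2))
```

The second-difference form extends the most recent change linearly (the last step of +15 is repeated each year), whereas the first-difference form holds the last observed value flat.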

The ARIMA-GARCH models fitted to the milk production data are given in Table 4. Because the residuals of the ARIMA models for China, India, Nepal, Pakistan, and Sri Lanka do not show ARCH effects, these countries’ residuals cannot be modeled by GARCH models. These results were obtained using the ARCH test given in the third column of Table 3. A GARCH (1,1) model is specified for Bangladesh and Myanmar. The models were fitted using milk production data between 1961 and 2007.
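The logic of an ARCH test can be sketched with Engle's Lagrange multiplier statistic at a single lag: the squared residuals are regressed on their own first lag, and LM = n·R² is compared with a chi-square distribution with one degree of freedom. The Python sketch below is a simplified illustration. The choice of one lag, the function name `arch_lm_test` and the residuals are assumptions, and the test reported in Table 3 may use a different lag order.

```python
import math


def arch_lm_test(residuals):
    """Engle's ARCH LM test with one lag; returns (LM statistic, p-value)."""
    sq = [e * e for e in residuals]
    y, x = sq[1:], sq[:-1]          # regress e_t^2 on e_{t-1}^2
    n = len(y)
    if n < 3:
        raise ValueError("need at least four residuals")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("squared residuals are constant")
    r_squared = sxy * sxy / (sxx * syy)   # R^2 of a one-regressor OLS fit
    lm = n * r_squared
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical residuals whose squares double each period.
lm, p_value = arch_lm_test([1.0, math.sqrt(2.0), 2.0, math.sqrt(8.0), 4.0])
print(round(lm, 4), round(p_value, 4))
```

A small p-value indicates ARCH effects, in which case a GARCH model for the residual variance is worth fitting; a large p-value gives no evidence of such effects.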

Table 4 ARIMA-GARCH models fitting for milk production time series over the period (1961–2018)

While the milk production data between 1961 and 2007 were used for modeling, the data between 2008 and 2018 were used to test model validity. Model validation results for the ARIMA-GARCH models for 2008 to 2018 are given in Table 5. Table 5 shows that the actual values of milk production are very close to the point forecasts for both Bangladesh and Myanmar. The comparison of ARIMA and ARIMA-GARCH models is given in Table 6. The model with the lowest values of RMSE, MAE, and MAPE is the best model. From Table 6, because ARIMA(0,1,0)-GARCH(1,1) and ARIMA(1,2,1)-GARCH(1,1) have the lowest values of RMSE, MAE, and MAPE, these models are selected as the best models for Bangladesh and Myanmar, respectively.
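The three accuracy measures used for model comparison can be computed from the validation-period actuals and point forecasts as follows. This Python sketch uses the standard definitions of RMSE, MAE and MAPE; the function name `accuracy` and the numbers are hypothetical and are not values from Tables 5 or 6.

```python
import math


def accuracy(actual, forecast):
    """Return (RMSE, MAE, MAPE in percent) for paired actuals and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Hypothetical validation-period values, for illustration only.
rmse, mae, mape = accuracy(actual=[100.0, 200.0], forecast=[110.0, 180.0])
print(round(rmse, 4), mae, mape)
```

RMSE and MAE are expressed in the units of the series (tonnes), so they are not comparable across countries with very different production levels; MAPE is scale-free, which makes it the natural measure for cross-country comparison.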

Table 5 Milk production forecasting and model validation using ARIMA-GARCH models (PF: point forecast)
Table 6 Comparison of ARIMA and ARIMA-GARCH models

The best models for modeling and forecasting milk production for the seven countries are given in Table 7. For Bangladesh and Myanmar, a GARCH (1,1) model is fitted to milk production, and the equation is

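$$ \varepsilon_{t}^{2} = a_{0} + \sum\limits_{i = 1}^{\max (p,q)} {\left( {a_{i} + b_{i} } \right)\varepsilon_{t - i}^{2} } + \eta_{t} - \sum\limits_{j = 1}^{p} {b_{j} \eta_{t - j} } $$

where \(\eta_{t} = \varepsilon_{t}^{2} - h_{t}\), with \(a_{i} = 0\) for \(i > q\) and \(b_{i} = 0\) for \(i > p\).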

Thus a GARCH model can be regarded as an extension of the ARMA approach to the squared series \(\left\{ {\varepsilon_{t}^{2} } \right\}\).

Parameter estimates for the models fitted using Holt’s method are given in Table 8. The point forecast (PF), the lower bound (Lo), and the upper bound (Hi) for α = 0.2 and α = 0.2 are presented in Table 9 for milk production using Holt’s linear trend models from 2019 to 2025. From Table 9, it is concluded that the upward milk production trend in India and Pakistan will continue. Milk production in India is expected to exceed 100 million metric tonnes (MMT) in 2025, and milk production in Pakistan is expected to exceed 55 million tonnes in 2025. While milk production in China, India, and Pakistan is expected to increase significantly, production in Nepal, Sri Lanka, and Myanmar is expected to increase more slowly over the years. Milk production in Bangladesh is expected to decrease over the years.

Table 7 Best time series models selected for modeling and forecasting milk production
Table 8 Holt's linear trend models fitted for milk production time series over the period (1961–2018)
Table 9 Milk production forecasting using Holt's linear trend (PF: point forecast)

From Tables 10, 11, and 12, we find that Holt's linear model achieves the lowest MAPE for China, India, Nepal, Pakistan, Sri Lanka, and Bangladesh, and thus Holt's linear model is the best for forecasting production in these countries. For Myanmar, the GARCH model is better than ARIMA, and when we compare the MAPE of Holt's linear model and the GARCH model for Myanmar, we find that the GARCH model is more accurate than Holt's linear model, achieving a lower MAPE.

Table 10 MAPE ARIMA Model
Table 11 MAPE Holt's Linear Trend Model
Table 12 MAPE milk production forecasting and model validation using ARIMA-GARCH models (PF: point forecast)

The dairy sector is an important activity within the agriculture sector, and milk production plays a crucial role in development. Data were analyzed for the following seven countries over the study period: China, India, Nepal, Pakistan, Sri Lanka, Bangladesh, and Myanmar. For China, we expect an increase in milk production during the coming period. For India, we also expect an increase in milk production in the coming period; by 2024, dairy production in India will exceed 100 million tons annually, which will have a good impact on the other sectors in India. For Nepal, we expect an increase in milk production, with a slight increase in the annual production rate.

For Pakistan, we also expect higher annual milk production, together with a slight increase in the annual rate of dairy production. For Sri Lanka, we expect an increase in the amount of dairy production during the coming period, but a decrease in the annual production rate of milk, which will have a negative impact on the sectors related to dairy production in Sri Lanka. Therefore, attention must be paid to the dairy production sector in Sri Lanka to prevent further losses in the dairy-related sectors during the coming period. In Bangladesh, we expect the amount of dairy products to remain stable in the coming period. In Myanmar, we expect increases in dairy production, but growth rates will vary, alternately decreasing and increasing.

However, lower growth rates are expected in 2025 compared to the previous period. To increase milk production, animals need to be provided with good fodder and proper health care. This projection helps in strategizing to meet future milk demand. To increase milk production, there is also a need to educate dairy owners and farmers about animal breeding programs and health care practices.

Conclusions

This paper uses annual data from 1961 to 2018 to forecast milk production in South Asian countries using an Autoregressive Integrated Moving Average (ARIMA) model, a Generalized Autoregressive Conditional Heteroskedasticity (ARCH-GARCH) model and Holt’s linear trend method. The findings from the ARIMA approach show that India would be the leading country in milk production among South Asian countries, with 91 MMT in 2024–2025. Pakistan ranks second, with milk production reaching 26 MMT in 2024–2025, and China ranks third with 3 MMT, while Bangladesh and Sri Lanka appear to be the countries with the lowest milk production. Since the residuals of the fitted ARIMA models for China, India, Nepal, Pakistan, and Sri Lanka show no ARCH effects, we fit a GARCH model only for Bangladesh and Myanmar. The GARCH model for Bangladesh forecasts an abundance of milk production. In comparison to the ARIMA model, Holt’s linear model forecasts higher levels of milk production for the region. It indicates that India’s forecasted level will reach 105 MMT, Pakistan's 58 MMT and China's 4 MMT in 2024–2025. We compare the mean absolute percentage error (MAPE) of the ARIMA and Holt’s models, and the findings suggest that the ARIMA model shows higher errors. The exceptions are China, Nepal and Pakistan, whose errors are higher using Holt’s model. This study has policy implications, as it can be used by policymakers in the national agriculture sector to forecast milk production and other dairy production.

Limitations of the study

In this paper, we use annual data to forecast milk production in South Asian countries using autoregressive models. Autoregressive models are typically used with high-frequency data, and the use of annual data instead of quarterly or monthly data can reduce the robustness of our results. Another limitation is related to the models' properties: we use ARIMA models with different lags, while autoregressive models are sensitive to the number of lags. GARCH models, in turn, are the benchmark among the autoregressive models, but their coefficients are restricted to be positive, and imposing such artificial restrictions makes the model less reliable and further from reality. Hence, researchers should be careful when using autoregressive models, as fitting an ARIMA or GARCH model is more of an art than a science.