1 Introduction

Accurately forecasting intermittent demand is important for manufacturers, transport businesses, and retailers [3], because the diversification of consumer preferences leads to small production lots for a wide variety of products. There are many models for forecasting intermittent demand. Croston’s model [2] is one of the most popular and has many variants, including the log-Croston and modified Croston models. However, Croston’s model contains an inconsistency in its assumptions, as pointed out by Shenstone and Hyndman [9]. Furthermore, Croston’s model generally requires a round-up approximation of the inter-arrival time to estimate its parameters from discrete time-series data.

We employ non-Gaussian nonlinear state-space models to forecast intermittent demand. Specifically, we employ a mixture of zero and Poisson distributions because the occurrence of an intermittent phenomenon generally implies low average demand. As in DECOMP [5, 6], time series are decomposed into trend, seasonal, auto-regression, and external terms in our model. Consequently, we cannot obtain the parameters via ordinary maximum likelihood estimators, because the number of parameters exceeds the number of data items owing to the non-stationarity assumptions on the parameters. We therefore adopt a Bayesian framework similar to that of DECOMP. For filtering, we employ a particle filter [7] instead of the Kalman filter used in DECOMP, because of the non-Gaussian system and observation noises and the nonlinearity of the model. To show the superiority of our method to other typical intermittent demand forecasting methods, we conduct a comparison analysis using actual data from a grocery store.

2 Model

2.1 Mixture Distribution and Components

Let \(y_n~(n=1,2,\ldots , N)\) denote the observed time series of discrete product demand. Considering the non-negativity of product demand, we assume that the demand for a product at an arbitrary time step n follows a mixture distribution; unlike Croston’s model, no approximating operations are needed. This mixture distribution is composed of a point mass at zero with weight \(w_n\) and a Poisson distribution with parameter \(\lambda _n\) with weight \(1-w_n\):

$$\begin{aligned} y_n&\sim w_n \cdot 0 +(1-w_n)y_n^{\prime },\end{aligned}$$
(1)
$$\begin{aligned} y_n^{\prime }&\sim \frac{\mathrm {e}^{-\lambda _n} \lambda _n^{y_n^{\prime }}}{y_n^{\prime }!}. \end{aligned}$$
(2)

From the expectation property of the Poisson distribution, the expected value of the mixture distribution becomes \((1-w_n)\lambda _n\).
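
For completeness, this follows directly from the mixture form in Eq. (1) together with \(E[y_n^{\prime }]=\lambda _n\):

$$\begin{aligned} E[y_n] = w_n \cdot 0 + (1-w_n)\,E[y_n^{\prime }] = (1-w_n)\lambda _n. \end{aligned}$$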

Now assume that parameter \(\lambda _n\) has trend component \(t_n\), seasonal component \(s_n\), steady component \(d_n\), and external component \(e_n\):

$$\begin{aligned} \lambda _n&= \exp (t_n + s_n+d_n+e_n). \end{aligned}$$
(3)

Specifically, the fluctuations in each component are as follows:

$$\begin{aligned} \varDelta ^k w_n&= v_{0,n},\end{aligned}$$
(4)
$$\begin{aligned} \varDelta ^l t_n&= v_{1,n},\end{aligned}$$
(5)
$$\begin{aligned} \varDelta ^m_q s_n&= v_{2,n},\end{aligned}$$
(6)
$$\begin{aligned} d_n&= \sum _{i=1}^I a_i d_{n-i} +v_{3,n},\end{aligned}$$
(7)
$$\begin{aligned} e_n&= \sum _{j=1}^{J} \left( \gamma _{j} c_{j,n}+v_{e,j,n} \right) . \end{aligned}$$
(8)

Here, \(\varDelta \) denotes the difference operator and \(\varDelta _q^m\) denotes the seasonal difference operator with cycle q and degree m. k and l are the degrees of the differences in the weight and the trend component, respectively. I is the auto-regression order and J is the number of external variables. \(a_i\) is the ith auto-regression coefficient and \(\gamma _j\) is the coefficient of the jth external variable \(c_{j,n}\). \(v_{0,n}\), \(v_{1,n}\), \(v_{2,n}\), \(v_{3,n}\), and \(v_{e,j,n}\) are the noise terms for the components, and they follow Gaussian distributions:

$$\begin{aligned} v_{i,n}&\sim N(0, \tau _i^2)~~~\forall ~i=0,1,2,3,\end{aligned}$$
(9)
$$\begin{aligned} v_{e,j,n}&\sim N(0,\tau _{e,j}^2), \end{aligned}$$
(10)

where \(N(0,\sigma ^2)\) is a Gaussian distribution with mean 0 and variance \(\sigma ^2\). Although multiple seasonal cycles can in principle be employed simultaneously in the seasonal component, introducing several cycles often causes estimation problems in practice; we therefore employ a single seasonal cycle. The external component corresponds to variables and parameters such as price and promotion variables. In addition, we utilize the external component to capture holiday effects via dummy variables.
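
To make the dynamics concrete, the following Python sketch simulates the components and then draws demand from the mixture. It assumes the orders used later in the paper (\(k=2\), \(l=2\), \(m=1\), \(q=7\), \(I=2\), \(J=1\)) and interprets the seasonal operator as the sum-to-noise constraint implied by the seasonal block of Eq. (13); noise scales, coefficients, and initial values are our own illustrative choices, not values from the paper.

```python
import numpy as np

# Minimal simulation sketch of Eqs. (1)-(10) with k=2, l=2, m=1, q=7, I=2, J=1.
rng = np.random.default_rng(0)
N, q = 200, 7
tau = {"w": 0.01, "t": 0.01, "s": 0.01, "d": 0.05, "e": 0.01}  # noise std devs (assumed)
a1, a2 = 0.4, -0.2                                             # AR(2) coefficients (assumed)
gamma = -0.5                                                   # external coefficient (assumed)
c = rng.uniform(0.8, 1.2, N)                                   # external variable, e.g. relative price

w = np.zeros(N); t = np.zeros(N); s = np.zeros(N); d = np.zeros(N); e = np.zeros(N)
y = np.zeros(N, dtype=int)
w[:q - 1] = 0.6                       # initial zero-demand probability (assumed)
t[:q - 1] = np.log(2.0)               # initial log-level of demand (assumed)
for n in range(q - 1, N):
    # Delta^2 x_n = v_n  <=>  x_n = 2 x_{n-1} - x_{n-2} + v_n  (weight and trend)
    w[n] = 2 * w[n - 1] - w[n - 2] + rng.normal(0, tau["w"])
    w[n] = np.clip(w[n], 0.0, 1.0)    # kept in [0, 1] for this sketch
    t[n] = 2 * t[n - 1] - t[n - 2] + rng.normal(0, tau["t"])
    # first-degree seasonal component with cycle q: q consecutive values sum to noise
    s[n] = -s[n - q + 1:n].sum() + rng.normal(0, tau["s"])
    d[n] = a1 * d[n - 1] + a2 * d[n - 2] + rng.normal(0, tau["d"])  # AR(2), Eq. (7)
    e[n] = gamma * c[n] + rng.normal(0, tau["e"])                   # external term, Eq. (8)
    lam = np.exp(t[n] + s[n] + d[n] + e[n])                         # Eq. (3)
    y[n] = 0 if rng.random() < w[n] else rng.poisson(lam)           # zero-inflated Poisson draw
```

The sum-to-noise form of the seasonal update matches the row of \(-1\) entries in the seasonal block of Eq. (13); clipping \(w_n\) to \([0,1]\) is only a convenience of this sketch.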

2.2 State-Space Expression

The time-varying parameter \(\lambda _n\) cannot be estimated by ordinary maximum likelihood: in the simplest setting, the number of unknown variables \(\lambda _n\) already equals the number of data items \(y_n\), and in our setting the number of unknown variables \(t_n\), \(s_n\), \(d_n\), and \(e_n\) exceeds the number of data items \(y_n\).

We introduce a state-space expression to resolve this problem. When \(k=2\), \(l=2\), \(m=1\), \(q=7\), \(I=2\), and \(J=1\), the state vector can be written as

$$\begin{aligned} {\varvec{x}}_n=[w_n,w_{n-1},t_n,t_{n-1},s_n,\ldots ,s_{n-5},d_n,d_{n-1},e_n]^T. \end{aligned}$$
(11)

Therefore, the system and observation models are described as

$$\begin{aligned}&{\textsf {[System Model]}} \,{\varvec{x}}_n = {\varvec{F}}_n ({\varvec{x}}_{n-1}) + {\varvec{G}}_n {\varvec{v}}_n, \end{aligned}$$
(12)
$$\begin{aligned}&\quad \qquad {\varvec{F}}_n = \left[ \begin{array}{ll|ll|llll|ll|l} 2 &{} -1 &{} 0 &{} 0 &{} 0 &{} 0 &{} \ldots &{} 0 &{} 0 &{} 0 &{} 0\\ 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} \ldots &{} 0 &{} 0 &{} 0 &{} 0\\ \hline 0 &{} 0 &{} 2 &{} -1 &{} 0 &{} 0&{} \ldots &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{}1&{} 0 &{} 0 &{} 0 &{} \ldots &{} 0 &{} 0 &{} 0 &{} 0\\ \hline 0 &{} 0 &{} 0 &{} 0 &{} -1 &{} -1 &{} \ldots &{} -1 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 1&{} 0 &{} \ldots &{} 0 &{} 0 &{} 0 &{} 0\\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} &{} \ddots &{} \ddots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} &{} 1 &{} 0 &{} 0 &{} 0 &{} 0\\ \hline 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} \ldots &{} 0 &{} a_1 &{} a_2 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} \ldots &{} 0 &{} 1 &{} 0 &{} 0\\ \hline 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} \ldots &{} 0 &{} 0 &{} 0 &{}1 \\ \end{array} \right] , {\varvec{G}}_n = \left[ \begin{array}{lllll} 1 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ \hline 0 &{} 1 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ \hline 0 &{} 0 &{} 1 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ \vdots &{} \vdots &{}\vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ \hline 0 &{} 0 &{} 0 &{} 1 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ \hline 0 &{} 0 &{} 0 &{} 0 &{} 1\\ \end{array} \right] , \end{aligned}$$
(13)
$$\begin{aligned}&{\textsf {[Observation Model]}}\quad y_n \sim \text {Zero-inflated Poisson}(\cdot | {\varvec{x}}_n), \end{aligned}$$
(14)

where \({\varvec{v}}_n\) is an independent and identically distributed noise vector whose elements correspond to \(v_{0,n}\), \(v_{1,n}\), \(v_{2,n}\), \(v_{3,n}\), and \(v_{e,j,n}\). The observation model is neither linear nor Gaussian; therefore, we cannot employ the Kalman filter and instead employ a particle filter.
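
As a cross-check of this block structure, the matrices can be assembled programmatically. The following is a sketch of ours (the helper `build_F_G` and its coefficient values are hypothetical, not from the paper):

```python
import numpy as np

# Assemble the 13x13 F and 13x5 G of Eq. (13) for k=2, l=2, m=1, q=7, I=2, J=1.
# State layout (Eq. (11)): [w_n, w_{n-1}, t_n, t_{n-1}, s_n, ..., s_{n-5}, d_n, d_{n-1}, e_n].
def build_F_G(a1, a2, q=7):
    dim = 2 + 2 + (q - 1) + 2 + 1
    F = np.zeros((dim, dim))
    G = np.zeros((dim, 5))
    F[0, 0:2] = [2.0, -1.0]; F[1, 0] = 1.0; G[0, 0] = 1.0   # weight block (Delta^2)
    F[2, 2:4] = [2.0, -1.0]; F[3, 2] = 1.0; G[2, 1] = 1.0   # trend block (Delta^2)
    F[4, 4:4 + q - 1] = -1.0                                # seasonal sum constraint
    for i in range(5, 4 + q - 1):
        F[i, i - 1] = 1.0                                   # shift stored seasonal lags
    G[4, 2] = 1.0
    j = 4 + (q - 1)                                         # start of the AR block
    F[j, j:j + 2] = [a1, a2]; F[j + 1, j] = 1.0; G[j, 3] = 1.0    # AR(2) block
    F[j + 2, j + 2] = 1.0; G[j + 2, 4] = 1.0                      # external block as in Eq. (13)
    return F, G

F, G = build_F_G(a1=0.4, a2=-0.2)   # a1, a2 are placeholder AR coefficients
```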

2.3 Parameter Estimation

In the above setting, the elements of \({\varvec{F}}_n\) are given, with the exception of \(a_i\); we still need to estimate the (hyper)parameters \(\tau _i\), \(\tau _{e,j}\), and \(a_i\). Let \(R(y_n|{\varvec{x}}_n)\) be the likelihood at an arbitrary time n; then, the likelihood for all the data \((y_1,\ldots ,y_N)\) is given by

$$\begin{aligned} L = \prod _{n=1}^{N}R(y_n|{\varvec{x}}_n). \end{aligned}$$
(15)

We estimate the parameters by maximizing Eq. (15).

We employ a grid search algorithm to maximize Eq. (15) because the likelihood computed via a particle filter contains Monte Carlo error and therefore varies between trials, which prevents the use of gradient-based methods such as Newton's method. Within the particle filter, we use sequential importance sampling [4] and residual resampling [8] to update the particles.
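
The following Python sketch shows how Eq. (15) can be approximated with a particle filter under the state layout of Eq. (11) (for \(q=7\), indices 2, 4, 10, and 12 pick out \(t_n\), \(s_n\), \(d_n\), and \(e_n\)). The helpers `residual_resample` and `log_likelihood` are our own illustrative names, and the clipping of \(w_n\) is an assumption of the sketch rather than the authors' implementation:

```python
import numpy as np
from scipy.stats import poisson

def residual_resample(weights, rng):
    """Residual resampling: deterministic copies plus a multinomial remainder."""
    n = len(weights)
    counts = np.floor(n * weights).astype(int)              # deterministic copies
    n_rest = n - counts.sum()
    if n_rest > 0:
        residual = n * weights - counts
        residual = residual / residual.sum() if residual.sum() > 0 else np.full(n, 1.0 / n)
        counts += rng.multinomial(n_rest, residual)          # randomized remainder
    return np.repeat(np.arange(n), counts)

def log_likelihood(y, F, G, taus, n_particles=10_000, seed=0):
    """Particle-filter approximation of the log of Eq. (15)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_particles, F.shape[0]))                  # particles start at zero (assumed)
    loglik = 0.0
    for y_n in y:
        v = rng.normal(0.0, taus, size=(n_particles, len(taus)))
        x = x @ F.T + v @ G.T                                # system model, Eq. (12)
        w_mix = np.clip(x[:, 0], 0.0, 1.0)                   # zero weight w_n
        lam = np.exp(x[:, 2] + x[:, 4] + x[:, 10] + x[:, 12])
        lik = (1 - w_mix) * poisson.pmf(y_n, lam)            # Poisson part of the mixture
        if y_n == 0:
            lik += w_mix                                     # extra probability mass at zero
        loglik += np.log(lik.mean() + 1e-300)                # predictive likelihood of y_n
        weights = lik / (lik.sum() + 1e-300)
        x = x[residual_resample(weights, rng)]               # residual resampling [8]
    return loglik
```

At each step, the predictive likelihood is approximated by the particle average of the observation density, and the particles are then resampled in proportion to their weights.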

3 Comparison Analysis

3.1 Analyzed Data and Models for Comparison

To show the superiority of our method, we conduct a comparison analysis against typical intermittent demand forecasting methods and other relevant methods, namely Croston, log-Croston [10], and DECOMP. The estimation methods used for the Croston and log-Croston models are those shown in Syntetos and Boylan [11]. The smoothing parameter in the Croston model is set to \(\alpha =0.5\). The data analyzed here comprise fifty days of daily retail data for four SKU-level products in a Japanese grocery store. Further details of the data are shown in Table 1. Demand for #1 and #4 is more intermittent than demand for #2 and #3. Owing to differences in the estimation schemes, we compare forecast accuracy among these models by the root mean square (RMS) error.

Table 1 Data details
Table 2 Main results of the comparison analysis

To shorten the calculation time, the number of particles at each time step is fixed at 10,000 in this paper. The degrees and cycles in our model and DECOMP are set as \(k=2\), \(l=2\), \(m=1\), \(q=7\), \(I=2\), and \(J=1\) (the external variable is the daily price of the objective product, which is not used in DECOMP). In the grid search, each hyperparameter of the error terms has five nodes (\(\tau =0.003125, 0.00625,\ldots , 0.05\)) and each auto-regression coefficient has 10 nodes (\(a_i=-1.0,-0.8,\ldots ,1.0\)).
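
A brute-force version of this grid search can be sketched as follows; `build_F_G` and `log_likelihood` are the hypothetical helpers from the earlier sketches, `y` stands in for the observed demand series, and the exhaustive evaluation of the grid is what makes the calculation time a practical concern:

```python
import itertools
import numpy as np

# Exhaustive grid search over the noise hyperparameters and AR coefficients,
# reusing the hypothetical build_F_G and log_likelihood helpers sketched above.
y = np.zeros(50, dtype=int)          # placeholder for the 50-day demand series
tau_nodes = [0.003125, 0.00625, 0.0125, 0.025, 0.05]
a_nodes = np.round(np.arange(-1.0, 1.0 + 1e-9, 0.2), 1)

best_ll, best_theta = -np.inf, None
for taus in itertools.product(tau_nodes, repeat=5):          # noise scales for the five blocks
    for a1, a2 in itertools.product(a_nodes, repeat=2):
        F, G = build_F_G(a1, a2)
        ll = log_likelihood(y, F, G, np.array(taus), n_particles=10_000)
        if ll > best_ll:
            best_ll, best_theta = ll, (*taus, a1, a2)
```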

3.2 Results

Table 2 shows the results for each data set. The overall RMS for our model is lower than those of the other three models. For #1, the RMS of our model is the lowest, so our method is superior to the other three models. For #4, our method is superior to Croston and log-Croston but inferior to DECOMP. However, DECOMP predicts negative demand, which cannot occur, in the five-step-ahead forecast. Therefore, our method can be concluded to be superior to the other models in highly intermittent demand situations.

In contrast to the highly intermittent cases, our model shows no substantial superiority over the other three models for #2 and #3. It is conceivable that the degree of non-Gaussianity in the data drives these differences: if the data are close to Gaussian, Croston and DECOMP are suitable, whereas log-Croston and our model are suitable when demand is low (and hence far from Gaussian).

4 Conclusions

This paper proposed a method to forecast intermittent demand with non-Gaussian nonlinear state-space models using a particle filter. To show the superiority of our method to other typical intermittent demand forecasting methods, we conducted a comparison analysis using actual data for a grocery store. The results of this comparison analysis show the superiority of our method over the Croston, log-Croston, and DECOMP models in highly intermittent demand cases. In the future, we intend to shorten the calculation time, and the MCMC filter [1] is a promising method for overcoming this problem.