We have focused until now on the construction of time series models for stationary and nonstationary series and the determination, assuming the appropriateness of these models, of minimum mean squared error predictors. If the observed series had in fact been generated by the fitted model, this procedure would give minimum mean squared error forecasts. In this chapter we discuss three forecasting techniques that have less emphasis on the explicit construction of a model for the data. Each of the three selects, from a limited class of algorithms, the one that is optimal according to specified criteria.

The three techniques have been found in practice to be effective on wide ranges of real data sets (for example, the economic time series used in the forecasting competition described by Makridakis et al. 1984).

The ARAR algorithm described in Section 10.1 is an adaptation of the ARARMA algorithm (Newton and Parzen 1984; Parzen 1982) in which the idea is to apply automatically selected “memory-shortening” transformations (if necessary) to the data and then to fit an ARMA model to the transformed series. The ARAR algorithm we describe is a version of this in which the ARMA fitting step is replaced by the fitting of a subset AR model to the transformed data.

The Holt–Winters (HW) algorithm described in Section 10.2 uses a set of simple recursions that generalize the exponential smoothing recursions of Section 1.5.1 to generate forecasts of series containing a locally linear trend.

The Holt–Winters seasonal (HWS) algorithm extends the HW algorithm to handle data in which there are both trend and seasonal variation of known period. It is described in Section 10.3.

Each of these three algorithms can be applied to specific data sets with the aid of the ITSM options Forecasting>ARAR, Forecasting>Holt-Winters and Forecasting>Seasonal Holt-Winters.

10.1 The ARAR Algorithm

10.1.1 Memory Shortening

Given a data set {Y t , t = 1, 2, …, n}, the first step is to decide whether the underlying process is “long-memory,” and if so to apply a memory-shortening transformation before attempting to fit an autoregressive model. The differencing operations permitted under the option Transform of ITSM are examples of memory-shortening transformations; however, the transformations applied by the option Forecasting>ARAR are selected from a more general class. Two types are allowed:

$$\displaystyle{ \tilde{Y }_{t} = Y _{t} -\hat{\phi }\left (\hat{\tau }\right )Y _{t-\hat{\tau }} }$$
(10.1.1)

and

$$\displaystyle{ \tilde{Y }_{t} = Y _{t} -\hat{\phi }_{1}Y _{t-1} -\hat{\phi }_{2}Y _{t-2}. }$$
(10.1.2)

With the aid of the five-step algorithm described below, we classify {Y t } and take one of the following three courses of action:

  • L. Declare {Y t } to be long-memory and form \(\big\{\tilde{Y }_{t}\big\}\) using (10.1.1).

  • M. Declare {Y t } to be moderately long-memory and form \(\big\{\tilde{Y }_{t}\big\}\) using (10.1.2).

  • S. Declare {Y t } to be short-memory.

If the alternative L or M is chosen, then the transformed series \(\big\{\tilde{Y }_{t}\big\}\) is again checked. If it is found to be long-memory or moderately long-memory, then a further transformation is performed. The process continues until the transformed series is classified as short-memory. At most three memory-shortening transformations are performed, but it is very rare to require more than two. The algorithm for deciding among L, M, and S can be described as follows (a code sketch follows the list):

  1.

    For each τ = 1, 2, …, 15, we find the value \(\hat{\phi }(\tau )\) of ϕ that minimizes

    $$\displaystyle{\mathrm{ERR}(\phi,\tau ) ={ \sum _{t=\tau +1}^{n}[Y _{t} -\phi Y _{t-\tau }]^{2} \over \sum _{t=\tau +1}^{n}Y _{t}^{2}}.}$$

    We then define

    $$\displaystyle{\mathrm{Err}(\tau ) = \mathrm{ERR}{\bigl (\hat{\phi }(\tau ),\tau \bigr )}}$$

    and choose the lag \(\hat{\tau }\) to be the value of τ that minimizes Err(τ).

  2.

    If \(\mathrm{Err}{\bigl (\hat{\tau }\bigr )} \leq 8/n\), go to L.

  3.

    If \(\hat{\phi }{\bigl (\hat{\tau }\bigr )}\geq 0.93\) and \(\hat{\tau }> 2\), go to L.

  4.

    If \(\hat{\phi }{\bigl (\hat{\tau }\bigr )}\geq 0.93\) and \(\hat{\tau }= 1\) or 2, determine the values \(\hat{\phi }_{1}\) and \(\hat{\phi }_{2}\) of ϕ 1 and ϕ 2 that minimize \(\sum _{t=3}^{n}[Y _{t} -\phi _{1}Y _{t-1} -\phi _{2}Y _{t-2}]^{2}\); then go to M.

  5.

    If \(\hat{\phi }{\bigl (\hat{\tau }\bigr )}<0.93\), go to S.
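
Written out in code, the classification rule is short. The following is a minimal Python sketch, assuming NumPy; the function name and the least-squares details are ours, not ITSM's.

```python
import numpy as np

def classify_memory(y, max_lag=15):
    """Classify {Y_t} as long-memory (L), moderately long-memory (M), or
    short-memory (S), following steps 1-5 above; returns the label together
    with the fitted memory-shortening coefficients (if any)."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Step 1: for fixed tau, the minimizing phi is a ratio of sums of products,
    # and Err(tau) is the corresponding normalized sum of squares.
    best_tau, best_phi, best_err = None, None, np.inf
    for tau in range(1, max_lag + 1):
        phi = np.sum(y[tau:] * y[:-tau]) / np.sum(y[:-tau] ** 2)
        err = np.sum((y[tau:] - phi * y[:-tau]) ** 2) / np.sum(y[tau:] ** 2)
        if err < best_err:
            best_tau, best_phi, best_err = tau, phi, err

    # Steps 2 and 3: long memory; shorten using (10.1.1)
    if best_err <= 8.0 / n or (best_phi >= 0.93 and best_tau > 2):
        return "L", (best_phi, best_tau)

    # Step 4: moderately long memory (tau_hat is 1 or 2); least-squares
    # AR(2) fit, then shorten using (10.1.2)
    if best_phi >= 0.93:
        X = np.column_stack([y[1:-1], y[:-2]])
        phi12, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
        return "M", tuple(phi12)

    # Step 5: short memory; no further transformation
    return "S", None
```

In the full ARAR procedure this test is applied repeatedly (at most three times) to the successively transformed series, applying (10.1.1) or (10.1.2) after each classification of L or M.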

10.1.2 Fitting a Subset Autoregression

Let \(\{S_{t},t = k + 1,\ldots,n\}\) denote the memory-shortened series derived from {Y t } by the algorithm of the previous section and let \(\overline{S}\) denote the sample mean of S k+1, …, S n .

The next step in the modeling procedure is to fit an autoregressive process to the mean-corrected series

$$\displaystyle{X_{t} = S_{t} -\overline{S},\quad t = k + 1,\ldots,n.}$$

The fitted model has the form

$$\displaystyle{X_{t} =\phi _{1}X_{t-1} +\phi _{l_{1}}X_{t-l_{1}} +\phi _{l_{2}}X_{t-l_{2}} +\phi _{l_{3}}X_{t-l_{3}} + Z_{t},}$$

where \(\{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right )\), and for given lags, l 1, l 2, and l 3, the coefficients ϕ j and the white noise variance σ 2 are found from the Yule–Walker equations

$$\displaystyle{\left [\begin{array}{*{10}c} 1 & \hat{\rho }(l_{1} - 1) & \hat{\rho }(l_{2} - 1) & \hat{\rho }(l_{3} - 1) \\ \hat{\rho }(l_{1} - 1)& 1 &\hat{\rho }(l_{2} - l_{1})&\hat{\rho }(l_{3} - l_{1}) \\ \hat{\rho }(l_{2} - 1)&\hat{\rho }(l_{2} - l_{1})& 1 &\hat{\rho }(l_{3} - l_{2}) \\ \hat{\rho }(l_{3} - 1)&\hat{\rho }(l_{3} - l_{1})&\hat{\rho }(l_{3} - l_{2})& 1 \end{array} \right ]\left [\begin{array}{*{10}c} \phi _{1}\\ \phi _{ l_{1}} \\ \phi _{l_{2}} \\ \phi _{l_{3}} \end{array} \right ] = \left [\begin{array}{*{10}c} \hat{\rho }(1) \\ \hat{\rho }(l_{1}) \\ \hat{\rho }(l_{2}) \\ \hat{\rho }(l_{3}) \end{array} \right ]}$$

and

$$\displaystyle{\sigma ^{2} =\hat{\gamma } (0)\left [1 -\phi _{ 1}\hat{\rho }(1) -\phi _{l_{1}}\hat{\rho }(l_{1}) -\phi _{l_{2}}\hat{\rho }(l_{2}) -\phi _{l_{3}}\hat{\rho }(l_{3})\right ],}$$

where \(\hat{\gamma }(j)\) and \(\hat{\rho }(j),j = 0,1,2,\ldots,\) are the sample autocovariances and autocorrelations of the series {X t }.

The program computes the coefficients ϕ j for each set of lags such that

$$\displaystyle{1 <l_{1} <l_{2} <l_{3} \leq m,}$$

where m can be chosen to be either 13 or 26. It then selects the model for which the Yule–Walker estimate of σ 2 is minimal and prints out the lags, coefficients, and white noise variance for the fitted model.
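
A minimal sketch of this search, assuming NumPy (the function names are ours, not ITSM's): for each admissible lag set it solves the 4 × 4 Yule–Walker system above and keeps the model with the smallest estimated white noise variance.

```python
import itertools
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.sum(x[h:] * x[:n - h]) / n for h in range(max_lag + 1)])

def fit_subset_ar(x, m=26):
    """Search lags 1 < l1 < l2 < l3 <= m; return (lags, coefficients, sigma2)
    minimizing the Yule-Walker estimate of the white noise variance."""
    gamma = sample_acvf(x, m)
    rho = gamma / gamma[0]
    best = None
    for l1, l2, l3 in itertools.combinations(range(2, m + 1), 3):
        lags = [1, l1, l2, l3]
        # Yule-Walker matrix with entries rho_hat(|lags[i] - lags[j]|)
        R = np.array([[rho[abs(a - b)] for b in lags] for a in lags])
        r = np.array([rho[l] for l in lags])
        phi = np.linalg.solve(R, r)
        sigma2 = gamma[0] * (1.0 - phi @ r)
        if best is None or sigma2 < best[2]:
            best = (lags, phi, sigma2)
    return best
```

With m = 26 there are only 2300 candidate lag sets, so the exhaustive search is cheap.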

A slower procedure chooses the lags and coefficients (computed from the Yule–Walker equations as above) that maximize the Gaussian likelihood of the observations. For this option the maximum lag m is 13.

The options are displayed in the ARAR Forecasting dialog box, which appears on the screen when the option Forecasting>ARAR is selected. It also allows you to bypass memory shortening and fit a subset AR model to the original (mean-corrected) data.

10.1.3 Forecasting

If the memory-shortening filter found in the first step has coefficients ψ 0( = 1), ψ 1, …, ψ k (k ≥ 0), then the memory-shortened series can be expressed as

$$\displaystyle{ S_{t} =\psi (B)Y _{t} = Y _{t} +\psi _{1}Y _{t-1} + \cdots +\psi _{k}Y _{t-k}, }$$
(10.1.3)

where ψ(B) is the polynomial in the backward shift operator,

$$\displaystyle{\psi (B) = 1 +\psi _{1}B + \cdots +\psi _{k}B^{k}.}$$

Similarly, if the coefficients of the subset autoregression found in the second step are \(\phi _{1},\phi _{l_{1}},\phi _{l_{2}}\), and \(\phi _{l_{3}}\), then the subset AR model for the mean-corrected series \(\big\{X_{t} =\) \(S_{t} -\overline{S}\big\}\) is

$$\displaystyle{ \phi (B)X_{t} = Z_{t}, }$$
(10.1.4)

where \(\{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right )\) and

$$\displaystyle{ \phi (B) = 1 -\phi _{1}B -\phi _{l_{1}}B^{l_{1} } -\phi _{l_{2}}B^{l_{2} } -\phi _{l_{3}}B^{l_{3} }. }$$

From (10.1.3) and (10.1.4) we obtain the equations

$$\displaystyle{ \xi (B)Y _{t} =\phi (1)\overline{S} + Z_{t}, }$$
(10.1.5)

where

$$\displaystyle{\xi (B) =\psi (B)\phi (B) = 1 +\xi _{1}B + \cdots +\xi _{k+l_{3}}B^{k+l_{3} }.}$$

Assuming that the fitted model (10.1.5) is appropriate and that the white noise term Z t is uncorrelated with {Y j ,  j < t} for each t, we can determine the minimum mean squared error linear predictors P n Y n+h of Y n+h in terms of {1, Y 1, …, Y n }, for n > k + l 3, from the recursions

$$\displaystyle{ P_{n}Y _{n+h} = -\sum _{j=1}^{k+l_{3} }\xi _{j}P_{n}Y _{n+h-j} +\phi (1)\overline{S},\quad h \geq 1, }$$
(10.1.6)

with the initial conditions

$$\displaystyle{ P_{n}Y _{n+h} = Y _{n+h},\quad \mbox{ for }h \leq 0. }$$
(10.1.7)

The mean squared error of the predictor P n Y n+h is found to be (Problem 10.1)

$$\displaystyle{ E\left [(Y _{n+h} - P_{n}Y _{n+h})^{2}\right ] =\sum _{ j=0}^{h-1}\tau _{ j}^{2}\sigma ^{2}, }$$
(10.1.8)

where \(\sum _{j=0}^{\infty }\tau _{j}z^{j}\) is the Taylor expansion of 1∕ξ(z) in a neighborhood of z = 0. Equivalently the sequence {τ j } can be found from the recursion

$$\displaystyle{ \tau _{0} = 1,\sum _{j=0}^{n}\tau _{ j}\xi _{n-j} = 0,\quad n = 1,2,\ldots. }$$
(10.1.9)
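
The whole forecasting step, including the polynomial product ξ(B) = ψ(B)ϕ(B), the recursion (10.1.6)–(10.1.7), and the mean squared errors (10.1.8)–(10.1.9), can be sketched as follows (assuming NumPy; the function signature is ours, not ITSM's).

```python
import numpy as np

def arar_forecast(y, psi, phi_coeffs, lags, s_bar, sigma2, h_max):
    """ARAR forecasts and mean squared errors from (10.1.5)-(10.1.9).

    psi        : [1, psi_1, ..., psi_k], the memory-shortening filter
    phi_coeffs : [phi_1, phi_l1, phi_l2, phi_l3], subset AR coefficients
    lags       : [1, l1, l2, l3]
    s_bar      : sample mean of the memory-shortened series
    Requires n > k + l3, as in the text.
    """
    # phi(B) as a coefficient vector: 1 - phi_1 B - phi_l1 B^l1 - ...
    phi = np.zeros(max(lags) + 1)
    phi[0] = 1.0
    for coeff, lag in zip(phi_coeffs, lags):
        phi[lag] -= coeff

    # xi(B) = psi(B) phi(B): a polynomial (convolution) product; xi[0] = 1
    xi = np.convolve(psi, phi)
    const = np.sum(phi) * s_bar            # phi(1) * S_bar

    # Forecast recursion (10.1.6) with initial conditions (10.1.7)
    y_ext = list(np.asarray(y, dtype=float))
    n = len(y_ext)
    for h in range(1, h_max + 1):
        pred = const - sum(xi[j] * y_ext[n + h - 1 - j] for j in range(1, len(xi)))
        y_ext.append(pred)
    forecasts = np.array(y_ext[n:])

    # tau recursion (10.1.9) and cumulative mean squared errors (10.1.8)
    tau = np.zeros(h_max)
    tau[0] = 1.0
    for m in range(1, h_max):
        tau[m] = -sum(tau[j] * xi[m - j] for j in range(m) if m - j < len(xi))
    mse = sigma2 * np.cumsum(tau ** 2)
    return forecasts, mse
```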

10.1.4 Application of the ARAR Algorithm

To determine an ARAR model for a given data set {Y t } using ITSM, select Forecasting>ARAR and choose the appropriate options in the resulting dialog box. These include specification of the number of forecasts required, whether or not you wish to include the memory-shortening step, whether you require prediction bounds, and which of the optimality criteria is to be used. Once you have made these selections, click OK, and the forecasts will be plotted with the original data. Right-click on the graph and then Info to see the coefficients 1, ψ 1, …, ψ k of the memory-shortening filter ψ(B), the lags and coefficients of the subset autoregression

$$\displaystyle{X_{t} -\phi _{1}X_{t-1} -\phi _{l_{1}}X_{t-l_{1}} -\phi _{l_{2}}X_{t-l_{2}} -\phi _{l_{3}}X_{t-l_{3}} = Z_{t},}$$

and the coefficients ξ j of B j in the overall whitening filter

$$\displaystyle{\xi (B) = \left (1 +\psi _{1}B + \cdots +\psi _{k}B^{k}\right )\left (1 -\phi _{ 1}B -\phi _{l_{1}}B^{l_{1} } -\phi _{l_{2}}B^{l_{2} } -\phi _{l_{3}}B^{l_{3} }\right ).}$$

The numerical values of the predictors, their root mean squared errors, and the prediction bounds are also printed.

Example 10.1.1

To use the ARAR algorithm to predict 24 values of the accidental deaths data, open the file DEATHS.TSM and proceed as described above. Selecting Minimize WN variance [max lag=26] gives the graph of the data and predictors shown in Figure 10.1. Right-clicking on the graph and then Info, we find that the selected memory-shortening filter is \(\left (1 - 0.9779B^{12}\right )\). The fitted subset autoregression and the coefficients ξ j of the overall whitening filter ξ(B) are shown below:

$$\displaystyle{\begin{array}{rrrrrr} \multicolumn{2}{l}{\mathsf{Optimal\ lags}} & 1& 3& 12& 13 \\ \multicolumn{2}{l}{\mathsf{Optimal\ coeffs}} & 0.5915&-0.3822&-0.3022&0.2970 \\ \multicolumn{3}{l}{\mathsf{WN\ Variance: \quad 0.12314E + 06}}& & \\ \multicolumn{6}{l}{\mathsf{COEFFICIENTS\ OF\ OVERALL\ WHITENING\ FILTER:}}\\ 1.0000 &-0.5915 & 0.0000 &-0.2093 & 0.0000 & \\ 0.0000& 0.0000& 0.0000& 0.0000& 0.0000& \\ 0.0000 & 0.0000 & -0.6757 & 0.2814 & 0.0000 & \\ 0.2047& 0.0000& 0.0000& 0.0000& 0.0000& \\ 0.0000 & 0.0000 & 0.0000 & 0.0000 &-0.2955 & \\ 0.2904& & & & & \\ \end{array} }$$

 □ 

Fig. 10.1

The data set DEATHS.TSM with 24 values predicted by the ARAR algorithm

In Table 10.1 we compare the predictors of the next six values of the accidental deaths series with the actual observed values. The table shows the predicted values obtained from ARAR as described in the example, together with the predictors obtained by fitting the ARIMA models of Chapter 6. The observed root mean squared errors (i.e., \(\sqrt{ \sum _{h=1}^{6}(Y _{72+h}-P_{72}Y _{72+h})^{2}/6}\) ) for the three prediction methods are easily calculated to be 253 for ARAR, 583 for the ARIMA model (6.5.8), and 501 for the ARIMA model (6.5.9). The ARAR algorithm thus performs very well here. Notice that in this particular example the ARAR algorithm effectively fits a causal AR model to the data, but this is not always the case.

Table 10.1 Predicted and observed values of the accidental deaths series for t = 73, …, 78

10.2 The Holt–Winters Algorithm

10.2.1 The Algorithm

Given observations Y 1, Y 2, …, Y n from the “trend plus noise” model (1.5.2), the exponential smoothing recursions (1.5.7) allowed us to compute estimates \(\hat{m}_{t}\) of the trend at times t = 1, 2, …, n. If the series is stationary, then m t is constant and the exponential smoothing forecast of Y n+h based on the observations Y 1, …, Y n is

$$\displaystyle{ P_{n}Y _{n+h} =\hat{ m}_{n},\quad h = 1,2,\ldots. }$$
(10.2.1)

If the data have a (nonconstant) trend, then a natural generalization of the forecast function (10.2.1) that takes this into account is

$$\displaystyle{ P_{n}Y _{n+h} =\hat{ a}_{n} +\hat{ b}_{n}h,\quad h = 1,2,\ldots, }$$
(10.2.2)

where \(\hat{a}_{n}\) and \(\hat{b}_{n}\) can be thought of as estimates of the “level” a n and “slope” b n of the trend function at time n. Holt (1957) suggested a recursive scheme for computing the quantities \(\hat{a}_{n}\) and \(\hat{b}_{n}\) in (10.2.2). Denoting by \(\hat{Y }_{n+1}\) the one-step forecast P n Y n+1, we have from (10.2.2)

$$\displaystyle{\hat{Y }_{n+1} =\hat{ a}_{n} +\hat{ b}_{n}.}$$

Now, as in exponential smoothing, we suppose that the estimated level at time n + 1 is a linear combination of the observed value at time n + 1 and the forecast value at time n + 1. Thus,

$$\displaystyle{ \hat{a}_{n+1} =\alpha Y _{n+1} + (1-\alpha ){\bigl (\hat{a}_{n} +\hat{ b}_{n}\bigr )}. }$$
(10.2.3)

We can then estimate the slope at time n + 1 as a linear combination of \(\hat{a}_{n+1} -\hat{ a}_{n}\) and the estimated slope \(\hat{b}_{n}\) at time n. Thus,

$$\displaystyle{ \hat{b}_{n+1} =\beta \left (\hat{a}_{n+1} -\hat{ a}_{n}\right ) + (1-\beta )\hat{b}_{n}. }$$
(10.2.4)

In order to solve the recursions (10.2.3) and (10.2.4) we need initial conditions. A natural choice is to set

$$\displaystyle{ \hat{a}_{2} = Y _{2} }$$
(10.2.5)

and

$$\displaystyle{ \hat{b}_{2} = Y _{2} - Y _{1}. }$$
(10.2.6)

Then (10.2.3) and (10.2.4) can be solved successively for \(\hat{a}_{i}\) and \(\hat{b}_{i}\), i = 3, …, n, and the predictors P n Y n+h found from (10.2.2).

The forecasts depend on the “smoothing parameters” α and β. These can either be prescribed arbitrarily (with values between 0 and 1) or chosen in a more systematic way to minimize the sum of squares of the one-step errors \(\sum _{i=3}^{n}(Y _{i} - P_{i-1}Y _{i})^{2}\), obtained when the algorithm is applied to the already observed data. Both choices are available in the ITSM option Forecasting>Holt-Winters.
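
As a concrete illustration, the recursions (10.2.2)–(10.2.6), together with a crude grid search for the smoothing parameters, can be written in a few lines of Python. This is a sketch only, assuming NumPy; the function names are ours, and ITSM's optimizer will generally differ.

```python
import numpy as np

def holt_forecast(y, alpha, beta, h_max):
    """Holt-Winters recursions (10.2.3)-(10.2.4) with initial conditions
    (10.2.5)-(10.2.6); returns the forecasts (10.2.2) and the sum of
    squared one-step errors used for choosing alpha and beta."""
    y = np.asarray(y, dtype=float)
    a, b = y[1], y[1] - y[0]                 # a_hat_2 and b_hat_2
    sse = 0.0
    for i in range(2, len(y)):               # observations Y_3, ..., Y_n
        one_step = a + b                     # P_{i-1} Y_i
        sse += (y[i] - one_step) ** 2
        a_new = alpha * y[i] + (1 - alpha) * (a + b)       # (10.2.3)
        b = beta * (a_new - a) + (1 - beta) * b            # (10.2.4)
        a = a_new
    return a + b * np.arange(1, h_max + 1), sse            # (10.2.2)

def optimize_holt(y, h_max, grid=np.linspace(0.01, 0.99, 50)):
    """Crude grid search for the (alpha, beta) minimizing the one-step SSE."""
    _, alpha, beta = min((holt_forecast(y, al, be, h_max)[1], al, be)
                         for al in grid for be in grid)
    return alpha, beta, holt_forecast(y, alpha, beta, h_max)[0]
```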

Before illustrating the use of the Holt–Winters forecasting procedure, we discuss the connection between the recursions (10.2.3) and (10.2.4) and the steady-state solution of the Kalman filtering equations for a local linear trend model. Suppose {Y t } follows the local linear structural model with observation equation

$$\displaystyle{Y _{t} = M_{t} + W_{t}}$$

and state equation

$$\displaystyle{\left [\begin{array}{*{10}c} M_{t+1} \\ B_{t+1} \end{array} \right ] = \left [\begin{array}{*{10}c} 1&1\\ 0 &1 \end{array} \right ]\left [\begin{array}{*{10}c} M_{t} \\ B_{t} \end{array} \right ]+\left [\begin{array}{*{10}c} V _{t} \\ U_{t} \end{array} \right ]}$$

[see (9.2.4)–(9.2.7)]. Now define \(\hat{a}_{n}\) and \(\hat{b}_{n}\) to be the filtered estimates of M n and B n , respectively, i.e.,

$$\displaystyle\begin{array}{rcl} \hat{a}_{n}& =& M_{n\vert n}:= P_{n}M_{n},\phantom{well} {}\\ \hat{b}_{n}& =& B_{n\vert n}:= P_{n}B_{n}. {}\\ \end{array}$$

Using Problem 9.17 and the Kalman recursion (9.4.16), we find that

$$\displaystyle{ \left [\begin{array}{*{10}c} \hat{a}_{n+1} \\ \hat{b}_{n+1} \end{array} \right ] = \left [\begin{array}{*{10}c} \hat{a}_{n} +\hat{ b}_{n} \\ \hat{b}_{n} \end{array} \right ]+\Delta _{n}^{-1}\Omega _{ n}G'\left (Y _{n} -\hat{ a}_{n} -\hat{ b}_{n}\right ), }$$
(10.2.7)

where \(G = \left [\begin{array}{*{10}c} 1\;\ 0\end{array} \right ]\). Assuming that \(\Omega _{n}=\Omega _{1}=[\Omega _{ij}]_{i,\,j=1}^{2}\) is the steady-state solution of (9.4.2) for this model, then \(\Delta _{n}=\Omega _{11} +\sigma _{ w}^{2}\) for all n, so that (10.2.7) simplifies to the equations

$$\displaystyle{ \hat{a}_{n+1} =\hat{ a}_{n} +\hat{ b}_{n} + \frac{\Omega _{11}} {\Omega _{11} +\sigma _{ w}^{2}}\left (Y _{n} -\hat{ a}_{n} -\hat{ b}_{n}\right )\phantom{well} }$$
(10.2.8)

and

$$\displaystyle{ \hat{b}_{n+1} =\hat{ b}_{n} + \frac{\Omega _{12}} {\Omega _{11} +\sigma _{ w}^{2}}\left (Y _{n} -\hat{ a}_{n} -\hat{ b}_{n}\right ). }$$
(10.2.9)

Solving (10.2.8) for \({\bigl (Y _{n} -\hat{ a}_{n} -\hat{ b}_{n}\bigr )}\) and substituting into (10.2.9), we find that

$$\displaystyle{ \hat{a}_{n+1} =\alpha Y _{n+1} + (1-\alpha )\left (\hat{a}_{n} +\hat{ b}_{n}\right ), }$$
(10.2.10)
$$\displaystyle{ \hat{b}_{n+1} =\beta \left (\hat{a}_{n+1} -\hat{ a}_{n}\right ) + (1-\beta )\hat{b}_{n} }$$
(10.2.11)

with \(\alpha = \Omega _{11}/\left (\Omega _{11} +\sigma _{ w}^{2}\right )\) and \(\beta = \Omega _{21}/\Omega _{11}\). These equations coincide with the Holt–Winters recursions (10.2.3) and (10.2.4). Equations relating α and β to the variances σ u 2, σ v 2, and σ w 2 can be found in Harvey (1990).

Example 10.2.1

To predict 24 values of the accidental deaths series using the Holt–Winters algorithm, open the file DEATHS.TSM and select Forecasting>Holt-Winters. In the resulting dialog box specify 24 for the number of predictors and check the box marked Optimize coefficients for automatic selection of the smoothing coefficients α and β. Click OK, and the forecasts will be plotted with the original data as shown in Figure 10.2. Right-click on the graph and then Info to see the numerical values of the predictors, their root mean squared errors, and the optimal values of α and β. The predicted and observed values are shown in Table 10.2. □ 

Fig. 10.2

The data set DEATHS.TSM with 24 values predicted by the nonseasonal Holt–Winters algorithm

Table 10.2 Predicted and observed values of the accidental deaths series for t = 73, …, 78 from the (nonseasonal) Holt–Winters algorithm

The root mean squared error \({\bigl (\sqrt{\sum _{h=1 }^{6 }(Y _{72+h } - P_{72 } Y _{72+h } )^{2 } /6}\bigr )}\) for the nonseasonal Holt–Winters forecasts is found to be 1143. Not surprisingly, since we have not taken seasonality into account, this is a much larger value than for the three sets of forecasts shown in Table 10.1. In the next section we show how to modify the Holt–Winters algorithm to allow for seasonality.

10.2.2 Holt–Winters and ARIMA Forecasting

The one-step forecasts obtained by exponential smoothing with parameter α (defined by (1.5.7) and (10.2.1)) satisfy the relations

$$\displaystyle{ P_{n}Y _{n+1} = Y _{n} - (1-\alpha )(Y _{n} - P_{n-1}Y _{n}),\quad n \geq 2. }$$
(10.2.12)

But these are the same relations satisfied by the large-sample minimum mean squared error forecasts of the invertible ARIMA(0,1,1) process

$$\displaystyle{ Y _{t} = Y _{t-1} + Z_{t} - (1-\alpha )Z_{t-1},\ \ \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ). }$$
(10.2.13)

Forecasting by exponential smoothing with optimal α can therefore be viewed as fitting a member of the two-parameter family of ARIMA processes (10.2.13) to the data and using the corresponding large-sample forecast recursions initialized by P 0 Y 1 = Y 1. In ITSM, the optimal α is found by minimizing the average squared error of the one-step forecasts of the observed data Y 2, , Y n , and the parameter σ 2 is estimated by this average squared error. This algorithm could easily be modified to minimize other error measures such as average absolute one-step error and average 12-step squared error.
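
A minimal sketch of this fitting procedure, assuming NumPy (the function name and the grid search are ours): it runs the recursion (10.2.12) initialized with P 0 Y 1 = Y 1, picks α to minimize the average squared one-step error, and records that minimum as the estimate of σ 2.

```python
import numpy as np

def fit_exponential_smoothing(y, grid=np.linspace(0.01, 0.99, 99)):
    """Choose alpha to minimize the average squared one-step error of the
    recursion (10.2.12), initialized with P_0 Y_1 = Y_1; the minimum average
    squared error estimates sigma^2 in the ARIMA(0,1,1) model (10.2.13)."""
    y = np.asarray(y, dtype=float)
    best_alpha, best_mse = None, np.inf
    for alpha in grid:
        pred = y[0]                    # P_0 Y_1 = Y_1
        sq_errors = []
        for t in range(1, len(y)):
            pred = y[t - 1] - (1 - alpha) * (y[t - 1] - pred)   # (10.2.12)
            sq_errors.append((y[t] - pred) ** 2)
        mse = np.mean(sq_errors)
        if mse < best_mse:
            best_alpha, best_mse = alpha, mse
    return best_alpha, best_mse
```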

In the same way it can be shown that Holt–Winters forecasting can be viewed as fitting a member of the three-parameter family of ARIMA processes,

$$\displaystyle{ (1 - B)^{2}Y _{ t} = Z_{t} - (2 -\alpha -\alpha \beta )Z_{t-1} + (1-\alpha )Z_{t-2}, }$$
(10.2.14)

where \(\big\{Z_{t}\big\} \sim \mathrm{WN}(0,\sigma ^{2})\). The coefficients α and β are selected as described after (10.2.6), and the estimate of σ 2 is the average squared error of the one-step forecasts of Y 3, …, Y n obtained from the large-sample forecast recursions corresponding to (10.2.14).
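
For reference, the mapping from the smoothing parameters to the moving-average coefficients of (10.2.14) is immediate; a tiny sketch (the function name is ours):

```python
def hw_to_arima_coefficients(alpha, beta):
    """MA coefficients of the ARIMA(0,2,2) representation (10.2.14), written
    as (1 - B)^2 Y_t = Z_t + theta1 * Z_{t-1} + theta2 * Z_{t-2}."""
    theta1 = -(2.0 - alpha - alpha * beta)
    theta2 = 1.0 - alpha
    return theta1, theta2

# For example, alpha = 0.3 and beta = 0.1 give theta1 = -1.67 and theta2 = 0.7.
```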

10.3 The Holt–Winters Seasonal Algorithm

10.3.1 The Algorithm

If the series Y 1, Y 2, …, Y n contains not only trend, but also seasonality with period d [as in the model (1.5.11)], then a further generalization of the forecast function (10.2.2) that takes this into account is

$$\displaystyle{ P_{n}Y _{n+h} =\hat{ a}_{n} +\hat{ b}_{n}h +\hat{ c}_{n+h},\quad h = 1,2,\ldots, }$$
(10.3.1)

where \(\hat{a}_{n}\), \(\hat{b}_{n}\), and \(\hat{c}_{n}\) can be thought of as estimates of the “trend level” a n , “trend slope” b n , and “seasonal component” c n at time n. If k is the smallest integer such that \(n + h - kd \leq n\), then we set

$$\displaystyle{ \hat{c}_{n+h} = \hat{c}_{n+h-kd},\quad \ h = 1,2,\ldots, }$$
(10.3.2)

while the values of \(\hat{a}_{i}\), \(\hat{b}_{i}\), and \(\hat{c}_{i}\), \(i = d + 2,\ldots,n\), are found from recursions analogous to (10.2.3) and (10.2.4), namely,

$$\displaystyle\begin{array}{rcl} \hat{a}_{n+1}& =& \alpha \left (Y _{n+1} -\hat{ c}_{n+1-d}\right ) + (1-\alpha ){\bigl (\hat{a}_{n} +\hat{ b}_{n}\bigr )},{}\end{array}$$
(10.3.3)
$$\displaystyle\begin{array}{rcl} \hat{b}_{n+1}& =& \beta \left (\hat{a}_{n+1} -\hat{ a}_{n}\right ) + (1-\beta )\hat{b}_{n},{}\end{array}$$
(10.3.4)

and

$$\displaystyle{ \hat{c}_{n+1} =\gamma (Y _{n+1} -\hat{ a}_{n+1}) + (1-\gamma )\hat{c}_{n+1-d}, }$$
(10.3.5)

with initial conditions

$$\displaystyle\begin{array}{rcl} \hat{a}_{d+1}& =& Y _{d+1},{}\end{array}$$
(10.3.6)
$$\displaystyle\begin{array}{rcl} \hat{b}_{d+1}& =& (Y _{d+1} - Y _{1})/d,{}\end{array}$$
(10.3.7)

and

$$\displaystyle{ \hat{c}_{i} = Y _{i} -{\bigl ( Y _{1} +\hat{ b}_{d+1}(i - 1)\bigr )},\quad i = 1,\ldots,d + 1. }$$
(10.3.8)

Then (10.3.3)–(10.3.5) can be solved successively for \(\hat{a}_{i},\hat{b}_{i}\), and \(\hat{c}_{i}\), \(i = d + 2,\ldots,n\), and the predictors P n Y n+h found from (10.3.1).

As in the nonseasonal case of Section 10.2, the forecasts depend on the parameters α, β, and γ. These can either be prescribed arbitrarily (with values between 0 and 1) or chosen in a more systematic way to minimize the sum of squares of the one-step errors \(\sum _{i=d+2}^{n}(Y _{i} - P_{i-1}Y _{i})^{2}\), obtained when the algorithm is applied to the already observed data. Seasonal Holt–Winters forecasts can be computed by selecting the ITSM option Forecasting>Seasonal Holt-Winters.
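
A direct transcription of (10.3.1)–(10.3.8) into Python is given below (a sketch, assuming NumPy; the function name is ours, and ITSM's implementation may differ in details).

```python
import numpy as np

def seasonal_holt_winters(y, d, alpha, beta, gamma, h_max):
    """Additive seasonal Holt-Winters: recursions (10.3.3)-(10.3.5), initial
    conditions (10.3.6)-(10.3.8), and forecasts (10.3.1)-(10.3.2).
    Requires len(y) >= d + 2; times below are 1-indexed, so Y_t = y[t-1]."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    a = y[d]                                    # a_hat_{d+1} = Y_{d+1}
    b = (y[d] - y[0]) / d                       # b_hat_{d+1}
    c = {i: y[i - 1] - (y[0] + b * (i - 1))     # c_hat_i, i = 1, ..., d+1
         for i in range(1, d + 2)}

    sse = 0.0
    for t in range(d + 2, n + 1):               # updates at times d+2, ..., n
        one_step = a + b + c[t - d]             # P_{t-1} Y_t
        sse += (y[t - 1] - one_step) ** 2
        a_new = alpha * (y[t - 1] - c[t - d]) + (1 - alpha) * (a + b)   # (10.3.3)
        b = beta * (a_new - a) + (1 - beta) * b                         # (10.3.4)
        c[t] = gamma * (y[t - 1] - a_new) + (1 - gamma) * c[t - d]      # (10.3.5)
        a = a_new

    # Forecasts: seasonal indices are carried forward with period d, (10.3.2)
    forecasts = []
    for h in range(1, h_max + 1):
        idx = n + h
        while idx > n:
            idx -= d
        forecasts.append(a + b * h + c[idx])
    return np.array(forecasts), sse
```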

Example 10.3.1

As in Example 10.2.1, open the file DEATHS.TSM, but this time select Forecasting>Seasonal Holt-Winters. Specify 24 for the number of predicted values required, 12 for the period of the seasonality, and check the box marked Optimize Coefficients. Click OK, and the graph of the data and predicted values shown in Figure 10.3 will appear. Right-click on the graph and then on Info and you will see the numerical values of the predictors and the optimal values of the coefficients α, β, and γ (minimizing the observed one-step average squared error \(\sum _{i=14}^{72}(Y _{i} - P_{i-1}Y _{i})^{2}/59\)). Table 10.3 compares the predictors of Y 73, …, Y 78 with the corresponding observed values. □ 

Fig. 10.3

The data set DEATHS.TSM with 24 values predicted by the seasonal Holt–Winters algorithm

Table 10.3 Predicted and observed values of the accidental deaths series for t = 73, …, 78 from the seasonal Holt–Winters algorithm

The root mean squared error (\(\sqrt{\sum _{h=1 }^{6 }(Y _{72+h } - P_{72 } Y _{72+h } )^{2 } /6}\) ) for the seasonal Holt–Winters forecasts is found to be 401. This is not as good as the value 253 achieved by the ARAR model for this example but is substantially better than the values achieved by the nonseasonal Holt–Winters algorithm (1143) and the ARIMA models (6.5.8) and (6.5.9) (583 and 501, respectively).

10.3.2 Holt–Winters Seasonal and ARIMA Forecasting

As in Section 10.2.2, the Holt–Winters seasonal recursions with seasonal period d correspond to the large-sample forecast recursions of an ARIMA process, in this case defined by

$$\displaystyle\begin{array}{rcl} (1 - B)(1 - B^{d})Y _{ t}& =& Z_{t} + \cdots + Z_{t-d+1} +\gamma (1-\alpha )(Z_{t-d} - Z_{t-d-1}) {}\\ & & -(2 -\alpha -\alpha \beta )(Z_{t-1} + \cdots + Z_{t-d}) {}\\ & & +(1-\alpha )(Z_{t-2} + \cdots + Z_{t-d-1}), {}\\ \end{array}$$

where {Z t } ∼ WN\(\left (0,\sigma ^{2}\right )\). Holt–Winters seasonal forecasting with optimal α,  β, and γ can therefore be viewed as fitting a member of this four-parameter family of ARIMA models and using the corresponding large-sample forecast recursions.

10.4 Choosing a Forecasting Algorithm

Real data are rarely if ever generated by a simple mathematical model such as an ARIMA process. Forecasting methods that are predicated on the assumption of such a model are therefore not necessarily the best, even in the mean squared error sense. Nor is the measurement of error in terms of mean squared error necessarily always the most appropriate one in spite of its mathematical convenience. Even within the framework of minimum mean squared-error forecasting, we may ask (for example) whether we wish to minimize the one-step, two-step, or twelve-step mean squared error.

The use of more heuristic algorithms such as those discussed in this chapter is therefore well worth serious consideration in practical forecasting problems. But how do we decide which method to use? A relatively simple solution to this problem, given the availability of a substantial historical record, is to choose among competing algorithms by comparing the relevant errors when the algorithms are applied to the data already observed (e.g., by comparing the mean absolute percentage errors of the 12-step predictors of the historical data if 12-step prediction is of primary concern).
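
A minimal sketch of such a comparison, assuming NumPy; `actual` and `predicted` are aligned arrays of observations and of a competing algorithm's historical h-step predictions (the names are ours).

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Summary error measures for a set of historical h-step forecasts;
    `actual` and `predicted` are aligned arrays over the evaluation period."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    e = actual - predicted
    return {"RMSE": float(np.sqrt(np.mean(e ** 2))),
            "MAPE": float(100.0 * np.mean(np.abs(e / actual)))}

# Hypothetical usage: keep the algorithm with the smaller 12-step MAPE.
# scores = {name: forecast_errors(y_observed, preds_12step[name]) for name in methods}
```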

It is extremely difficult to make general theoretical statements about the relative merits of the various techniques we have discussed (ARIMA modeling, exponential smoothing, ARAR, and HW methods). For the series DEATHS.TSM we found, on the basis of average mean squared error for predicting the series at times 73–78, that the ARAR method was best, followed by the seasonal Holt–Winters algorithm, and then the ARIMA models fitted in Chapter 6. This ordering is by no means universal. For example, if we consider the natural logarithms {Y t } of the first 130 observations in the series WINE.TSM (Figure 1.1) and compare the average mean squared errors of the forecasts of Y 131, …, Y 142, we find (Problem 10.2) that an MA(12) model fitted to the mean-corrected differenced series \(\{Y _{t} - Y _{t-12}\}\) does better than seasonal Holt–Winters (with period 12), which in turn does better than ARAR and (not surprisingly) dramatically better than nonseasonal Holt–Winters. An interesting empirical comparison of these and other methods applied to a variety of economic time series is contained in Makridakis et al. (1984).

The versions of the Holt–Winters algorithms we have discussed in Sections 10.2 and 10.3 are referred to as “additive,” since the seasonal and trend components enter the forecasting function in an additive manner. “Multiplicative” versions of the algorithms can also be constructed to deal directly with processes of the form

$$\displaystyle{ Y _{t} = m_{t}s_{t}Z_{t}, }$$
(10.4.1)

where m t , s t , and Z t are trend, seasonal, and noise factors, respectively (see, e.g., Makridakis et al. 1997). An alternative approach (provided that Y t  > 0 for all t) is to apply the linear Holt–Winters algorithms to {ln Y t } (as in the case of WINE.TSM in the preceding paragraph). Because of the rather general memory shortening permitted by the ARAR algorithm, it gives reasonable results when applied directly to series of the form (10.4.1), even without preliminary transformations. In particular, if we consider the first 132 observations in the series AIRPASS.TSM and apply the ARAR algorithm to predict the last 12 values in the series, we obtain (Problem 10.4) an observed root mean squared error of 18.22. On the other hand, if we take logarithms of the same data, difference at lag 12, subtract the mean, fit an AR(13) model by maximum likelihood using ITSM, and use the fitted model to predict the last 12 values, we obtain an observed root mean squared error of 21.17. The data and predicted values from the ARAR algorithm are shown in Figure 10.4.

Fig. 10.4

The first 132 values of the data set AIRPASS.TSM and predictors of the last 12 values obtained by direct application of the ARAR algorithm

Problems

  10.1

    Establish the formula (10.1.8) for the mean squared error of the h-step forecast based on the ARAR algorithm.

  10.2

    Let {X 1, …, X 142} denote the data in the file WINE.TSM and let {Y 1, …, Y 142} denote their natural logarithms. Denote by m the sample mean of the differenced series \(\{Y _{t} - Y _{t-12},t = 13,\ldots,130\}\).

    (a)

      Use the program ITSM to find the maximum likelihood MA(12) model for the differenced and mean-corrected series \(\{Y _{t} - Y _{t-12} - m,t = 13,\ldots,130\}\).

    (b)

      Use the model in (a) to compute forecasts of {X 131, …, X 142}.

    (c)

      Tabulate the forecast errors \(\{X_{t} - P_{130}\ X_{t},t = 131,\ldots,142\}\).

    (d)

      Compute the average squared error for the 12 forecasts.

    (e)

      Repeat steps (b), (c), and (d) for the corresponding forecasts obtained by applying the ARAR algorithm to the series {X t , t = 1, …, 130}.

    (f)

      Repeat steps (b), (c), and (d) for the corresponding forecasts obtained by applying the seasonal Holt–Winters algorithm (with period 12) to the logged data \(\{Y _{t},t = 1,\ldots,130\}\). (Open the file WINE.TSM, select Transform>Box-Cox with parameter λ = 0, then select Forecasting> Seasonal Holt-Winters, and check Apply to original data in the dialog box.)

    (g)

      Repeat steps (b), (c), and (d) for the corresponding forecasts obtained by applying the nonseasonal Holt–Winters algorithm to the logged data {Y t , t = 1, …, 130}. (The procedure is analogous to that described in part (f).)

    (h)

      Compare the average squared errors obtained by the four methods.

  10.3

    In equations (10.2.10) and (10.2.11), show that \(\alpha = \Omega _{11}/{\bigl (\Omega _{11} +\sigma _{ w}^{2}\bigr )}\) and \(\beta = \Omega _{21}/\Omega _{11}\).

  10.4

    Verify the assertions made in the last paragraph of Section 10.4, comparing the forecasts of the last 12 values of the series AIRPASS.TSM obtained from the ARAR algorithm (with no log transformation) and the corresponding forecasts obtained by taking logarithms of the original series, then differencing at lag 12, mean-correcting, and fitting an AR(13) model to the transformed series.