
In this chapter we shall examine the problem of finding an appropriate model for a given set of observations {x 1, …, x n } that are not necessarily generated by a stationary time series. If the data (a) exhibit no apparent deviations from stationarity and (b) have a rapidly decreasing autocovariance function, we attempt to fit an ARMA model to the mean-corrected data using the techniques developed in Chapter 5. Otherwise, we look first for a transformation of the data that generates a new series with the properties (a) and (b). This can frequently be achieved by differencing, leading us to consider the class of ARIMA (autoregressive integrated moving-average) models, defined in Section 6.1. We have in fact already encountered ARIMA processes. The model fitted in Example 5.1.1 to the Dow Jones Utilities Index was obtained by fitting an AR model to the differenced data, thereby effectively fitting an ARIMA model to the original series. In Section 6.1 we shall give a more systematic account of such models.

In Section 6.2 we discuss the problem of finding an appropriate transformation for the data and identifying a satisfactory ARMA(p, q) model for the transformed data. The latter can be handled using the techniques developed in Chapter 5. The sample ACF and PACF and the preliminary estimators \(\hat{\phi }_{m}\) and \(\hat{\theta }_{m}\) of Section 5.1 can provide useful guidance in this choice. However, our prime criterion for model selection will be the AICC statistic discussed in Section 5.5.2. To apply this criterion we compute maximum likelihood estimators of ϕ, \(\theta\), and σ 2 for a variety of competing p and q values and choose the fitted model with smallest AICC value. Other techniques, in particular those that use the R and S arrays of Gray et al. (1978), are discussed in the survey of model identification by de Gooijer et al. (1985). If the fitted model is satisfactory, the residuals (see Section 5.3) should resemble white noise. Tests for this were described in Section 5.3 and should be applied to the minimum AICC model to make sure that the residuals are consistent with their expected behavior under the model. If they are not, then competing models (models with AICC value close to the minimum) should be checked until we find one that passes the goodness of fit tests. In some cases a small difference in AICC value (say less than 2) between two satisfactory models may be ignored in the interest of model simplicity. In Section 6.3 we consider the problem of testing for a unit root of either the autoregressive or moving-average polynomial. An autoregressive unit root suggests that the data require differencing, and a moving-average unit root suggests that they have been overdifferenced. Section 6.4 considers the prediction of ARIMA processes, which can be carried out using an extension of the techniques developed for ARMA processes in Sections 3.3 and 5.4. In Section 6.5 we examine the fitting and prediction of seasonal ARIMA (SARIMA) models, whose analysis, except for certain aspects of model identification, is quite analogous to that of ARIMA processes. Finally, we consider the problem of regression, allowing for dependence between successive residuals from the regression. Such models are known as regression models with time series residuals and often occur in practice as natural representations for data containing both trend and serially dependent errors.

6.1 ARIMA Models for Nonstationary Time Series

We have already discussed the importance of the class of ARMA models for representing stationary series. A generalization of this class, which incorporates a wide range of nonstationary series, is provided by the ARIMA processes, i.e., processes that reduce to ARMA processes when differenced finitely many times.

Definition 6.1.1

If d is a nonnegative integer, then {X t } is an ARIMA(p, d, q) process if Y t := (1 − B)^d X t is a causal ARMA(p, q) process.

This definition means that {X t } satisfies a difference equation of the form

$$\displaystyle{ \phi ^{{\ast}}(B)X_{ t} \equiv \phi (B)(1 - B)^{d}X_{ t} =\theta (B)Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ), }$$
(6.1.1)

where ϕ(z) and θ(z) are polynomials of degrees p and q, respectively, and ϕ(z) ≠ 0 for | z | ≤ 1. The polynomial ϕ*(z) has a zero of order d at z = 1. The process {X t } is stationary if and only if d = 0, in which case it reduces to an ARMA(p, q) process.

Notice that if d ≥ 1, we can add an arbitrary polynomial trend of degree (d − 1) to {X t } without violating the difference equation (6.1.1). ARIMA models are therefore useful for representing data with trend (see Sections 1.5 and 6.2). It should be noted, however, that ARIMA processes can also be appropriate for modeling series with no trend. Except when d = 0, the mean of {X t } is not determined by equation (6.1.1), and it can in particular be zero (as in Example 1.3.3). Since for d ≥ 1, equation (6.1.1) determines the second-order properties of {(1 − B)^d X t } but not those of {X t } (Problem 6.1), estimation of ϕ, \(\theta\), and σ 2 will be based on the observed differences (1 − B)^d X t . Additional assumptions are needed for prediction (see Section 6.4).

Example 6.1.1

{X t } is an ARIMA(1,1,0) process if for some ϕ ∈ (−1, 1),

$$\displaystyle{ (1 -\phi B)(1 - B)X_{t} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ). }$$

We can then write

$$\displaystyle{ X_{t} = X_{0} +\sum _{ j=1}^{t}Y _{ j},\quad t \geq 1, }$$

where

$$\displaystyle{ Y _{t} = (1 - B)X_{t} =\sum _{ j=0}^{\infty }\phi ^{\,j}Z_{ t-j}. }$$

A realization of {X 1, …, X 200} with X 0 = 0, ϕ = 0.8, and σ 2 = 1 is shown in Figure 6.1, with the corresponding sample autocorrelation and partial autocorrelation functions in Figures 6.2 and 6.3, respectively.

Fig. 6.1 200 observations of the ARIMA(1,1,0) series X t of Example 6.1.1

Fig. 6.2 The sample ACF of the data in Figure 6.1

Fig. 6.3 The sample PACF of the data in Figure 6.1

A distinctive feature of the data that suggests the appropriateness of an ARIMA model is the slowly decaying positive sample autocorrelation function in Figure 6.2. If, therefore, we were given only the data and wished to find an appropriate model, it would be natural to apply the operator ∇ = 1 − B repeatedly in the hope that for some \(j,\{\nabla ^{j}X_{t}\}\) will have a rapidly decaying sample autocorrelation function compatible with that of an ARMA process with no zeros of the autoregressive polynomial near the unit circle. For this particular time series, one application of the operator ∇ produces the realization shown in Figure 6.4, whose sample ACF and PACF (Figures 6.5 and 6.6) suggest an AR(1) [or possibly AR(2)] model for {∇X t }. The maximum likelihood estimates of ϕ and σ 2 obtained from ITSM under the assumption that E(∇X t ) = 0 (found by not subtracting the mean after differencing the data) are 0.808 and 0.978, respectively, giving the model

$$\displaystyle{ (1 - 0.808B)(1 - B)X_{t} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,0.978), }$$
(6.1.2)

which bears a close resemblance to the true underlying process,

$$\displaystyle{ (1 - 0.8B)(1 - B)X_{t} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,1). }$$
(6.1.3)
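
The fit in (6.1.2) can be imitated outside ITSM. The following Python sketch (a minimal illustration assuming numpy and statsmodels are available; the simulated path, and hence the estimates, will differ from those underlying Figures 6.1–6.3 and from (6.1.2)) generates an ARIMA(1,1,0) series with ϕ = 0.8 and σ 2 = 1, differences it once, and fits a zero-mean AR(1) model to the differences.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n, phi, sigma = 200, 0.8, 1.0

# Generate Y_t = phi*Y_{t-1} + Z_t (with a burn-in period), then X_t = X_0 + Y_1 + ... + Y_t with X_0 = 0.
z = sigma * rng.standard_normal(n + 100)
y = np.zeros(n + 100)
for t in range(1, n + 100):
    y[t] = phi * y[t - 1] + z[t]
x = np.cumsum(y[100:])          # an ARIMA(1,1,0) realization of length 200

# Difference once and fit an AR(1) model with zero mean to the differences.
dx = np.diff(x)
res = ARIMA(dx, order=(1, 0, 0), trend='n').fit()
print(res.params)               # estimates of phi and sigma^2, analogous to (6.1.2)
```
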
Fig. 6.4 199 observations of the series Y t  = ∇X t with {X t } as in Figure 6.1

Fig. 6.5 The sample ACF of the series {Y t } in Figure 6.4

Fig. 6.6 The sample PACF of the series {Y t } in Figure 6.4

Instead of differencing the series in Figure 6.1 we could proceed more directly by attempting to fit an AR(2) process as suggested by the sample PACF of the original series in Figure 6.3. Maximum likelihood estimation, carried out using ITSM after fitting a preliminary model with Burg’s algorithm and assuming that EX t  = 0, gives the model

$$\displaystyle\begin{array}{rcl} & & (1 - 1.808B + 0.811B^{2})X_{ t} = (1 - 0.825B)(1 - 0.983B)X_{t} = Z_{t}, \\ & & \{Z_{t}\} \sim \mathrm{WN}(0,0.970), {}\end{array}$$
(6.1.4)

which, although stationary, has coefficients closely resembling those of the true nonstationary process (6.1.3). (To obtain the model (6.1.4), two optimizations were carried out using the Model>Estimation>Max likelihood option of ITSM, the first with the default settings and the second after setting the accuracy parameter to 0.00001.)

From a sample of finite length it will be extremely difficult to distinguish between a nonstationary process such as (6.1.3), for which ϕ*(1) = 0, and a process such as (6.1.4), which has very similar coefficients but for which ϕ* has all of its zeros outside the unit circle. In either case, however, if it is possible by differencing to generate a series with rapidly decaying sample ACF, then the differenced data set can be fitted by a low-order ARMA process whose autoregressive polynomial ϕ has zeros that are comfortably outside the unit circle. This means that the fitted parameters will be well away from the boundary of the allowable parameter set. This is desirable for numerical computation of parameter estimates and can be quite critical for some methods of estimation. For example, if we apply the Yule–Walker equations to fit an AR(2) model to the data in Figure 6.1, we obtain the model

$$\displaystyle{ (1 - 1.282B + 0.290B^{2})\,X_{ t} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,6.435), }$$
(6.1.5)

which bears little resemblance to either the maximum likelihood model (6.1.4) or the true model (6.1.3). In this case the matrix \(\hat{R}_{2}\) appearing in (5.1.7) is nearly singular.
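
The near singularity of \(\hat{R}_{2}\) can be seen directly from the Yule–Walker equations. The short numpy sketch below (with hypothetical sample autocorrelations close to 1, as in Figure 6.2; it illustrates the ill-conditioning and does not reproduce (6.1.5)) shows how a matrix of this form becomes badly conditioned:

```python
import numpy as np

# Hypothetical sample autocorrelations for a slowly decaying, positive ACF.
rho1, rho2 = 0.99, 0.98

# Yule-Walker equations for an AR(2): R2 * phi = (rho1, rho2)'.
R2 = np.array([[1.0, rho1],
               [rho1, 1.0]])
r = np.array([rho1, rho2])

print(np.linalg.cond(R2))      # condition number of about 200: nearly singular
print(np.linalg.solve(R2, r))  # Yule-Walker estimates of the AR(2) coefficients
```

Small changes in the sample autocorrelations therefore produce large changes in the fitted coefficients, which is one way to see why the Yule–Walker fit (6.1.5) is so far from (6.1.3) and (6.1.4).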

An obvious limitation in fitting an ARIMA(p, d, q) process {X t } to data is that {X t } is permitted to be nonstationary only in a very special way, i.e., by allowing the polynomial ϕ*(B) in the representation ϕ*(B)X t  = Z t to have a zero of multiplicity d at the point 1 on the unit circle. Such models are appropriate when the sample ACF is a slowly decaying positive function as in Figure 6.2, since sample autocorrelation functions of this form are associated with models ϕ*(B)X t  = θ(B)Z t in which ϕ* has a zero either at or close to 1.

Sample autocorrelations with slowly decaying oscillatory behavior as in Figure 6.8 are associated with models ϕ*(B)X t  = θ(B)Z t in which ϕ* has a zero close to e^{iω} for some ω ∈ (−π, π] other than 0. Figure 6.8 is the sample ACF of the series of 200 observations in Figure 6.7, obtained from ITSM by simulating the AR(2) process

$$\displaystyle{ X_{t} - (2r^{-1}\cos \omega )X_{ t-1} + r^{-2}X_{ t-2} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,1), }$$
(6.1.6)

with r = 1.005 and ω = π∕3, i.e.,

$$\displaystyle{X_{t} - 0.9950X_{t-1} + 0.9901X_{t-2} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,1).}$$
Fig. 6.7 200 observations of the AR(2) process defined by (6.1.6) with r = 1.005 and ω = π∕3

Fig. 6.8 The sample ACF of the data in Figure 6.7

The autocorrelation function of the model (6.1.6) can be derived by noting that

$$\displaystyle{ 1 -\left (2r^{-1}\cos \omega \right )B + r^{-2}B^{2} = \left (1 - r^{-1}e^{i\omega }B\right )\left (1 - r^{-1}e^{-i\omega }B\right ) }$$
(6.1.7)

and using (3.2.12). This gives

$$\displaystyle{ \rho (h) = r^{-h}{\sin (h\omega +\psi ) \over \sin \psi },\quad h \geq 0, }$$
(6.1.8)

where

$$\displaystyle{ \tan \psi ={ r^{2} + 1 \over r^{2} - 1}\tan \omega. }$$
(6.1.9)

It is clear from these equations that

$$\displaystyle{ \rho (h) \rightarrow \cos (h\omega )\mbox{ as }r \downarrow 1. }$$
(6.1.10)

With r = 1.005 and ω = π∕3 as in the model generating Figure 6.7, the model ACF (6.1.8) is a damped sine wave with damping ratio 1/1.005 and period 6. These properties are reflected in the sample ACF shown in Figure 6.8. For values of r closer to 1, the damping will be even slower as the model ACF approaches its limiting form (6.1.10).

If we were simply given the data shown in Figure 6.7, with no indication of the model from which it was generated, the slowly damped sinusoidal sample ACF with period 6 would suggest trying to make the sample ACF decay more rapidly by applying the operator (6.1.7) with r = 1 and ω = π∕3, i.e., \({\bigl (1 - B + B^{2}\bigr )}\). If it happens, as in this case, that the period 2π∕ω is close to some integer s (in this case 6), then the operator 1 − B^s can also be applied to produce a series with more rapidly decaying autocorrelation function (see also Section 6.5). Figures 6.9 and 6.10 show the sample autocorrelation functions obtained after applying the operators 1 − B + B^2 and 1 − B^6, respectively, to the data shown in Figure 6.7. For either one of these two differenced series, it is then not difficult to fit an ARMA model ϕ(B)X t  = θ(B)Z t for which the zeros of ϕ are well outside the unit circle. Techniques for identifying and determining such ARMA models have already been introduced in Chapter 5. For convenience we shall collect these together in the following sections with a number of illustrative examples.
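
The model ACF (6.1.8)–(6.1.9) and the effect of applying 1 − B + B^2 can both be checked with a few lines of code. The sketch below (numpy only; the white noise stand-in is illustrative and is not the series of Figure 6.7) evaluates ρ(h) for r = 1.005 and ω = π/3 and shows the array arithmetic corresponding to the operator 1 − B + B^2:

```python
import numpy as np

r, omega = 1.005, np.pi / 3

# Model ACF (6.1.8)-(6.1.9): a damped sine wave with period 2*pi/omega = 6.
psi = np.arctan((r**2 + 1) / (r**2 - 1) * np.tan(omega))
h = np.arange(25)
rho = r**(-h) * np.sin(h * omega + psi) / np.sin(psi)
print(np.round(rho[:7], 3))    # close to cos(h*omega), the limit (6.1.10), since r is near 1

# Applying the operator 1 - B + B^2 to a series x (here white noise as a stand-in):
x = np.random.default_rng(1).standard_normal(200)
y = x[2:] - x[1:-1] + x[:-2]   # y_t = x_t - x_{t-1} + x_{t-2}
```
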

Fig. 6.9 The sample ACF of (1 − B + B^2)X t with {X t } as in Figure 6.7

Fig. 6.10 The sample ACF of (1 − B^6)X t with {X t } as in Figure 6.7

6.2 Identification Techniques

(a) Preliminary Transformations. The estimation methods of Chapter 5 enable us to find, for given values of p and q, an ARMA( p, q) model to fit a given series of data. For this procedure to be meaningful it must be at least plausible that the data are in fact a realization of an ARMA process and in particular a realization of a stationary process. If the data display characteristics suggesting nonstationarity (e.g., trend and seasonality), then it may be necessary to make a transformation so as to produce a new series that is more compatible with the assumption of stationarity.

Deviations from stationarity may be suggested by the graph of the series itself or by the sample autocorrelation function or both.

Inspection of the graph of the series will occasionally reveal a strong dependence of variability on the level of the series, in which case the data should first be transformed to reduce or eliminate this dependence. For example, Figure 1.1 shows the Australian monthly red wine sales from January 1980 through October 1991, and Figure 1.17 shows how the increasing variability with sales level is reduced by taking natural logarithms of the original series. The logarithmic transformation V t  = lnU t used here is in fact appropriate whenever {U t } is a series whose standard deviation increases linearly with the mean. For a systematic account of a general class of variance-stabilizing transformations, we refer the reader to Box and Cox (1964). The defining equation for the general Box–Cox transformation f λ is

$$\displaystyle{f_{\lambda }(U_{t}) = \left \{\begin{array}{@{}l@{\quad }l@{}} \lambda ^{-1}(U_{ t}^{\lambda } - 1),\quad &U_{ t} \geq 0,\lambda > 0, \\ \ln U_{t}, \quad &U_{t} > 0,\lambda = 0, \end{array} \right.}$$

and the program ITSM provides the option (Transform>Box-Cox) of applying f λ (with \(0 \leq \lambda \leq 1.5\)) prior to the elimination of trend and/or seasonality from the data. In practice, if a Box–Cox transformation is necessary, it is often the case that either f 0 or f 0.5 is adequate.
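
The transformation f λ is easy to apply directly; the sketch below is a plain numpy version (the series u is hypothetical, and ITSM's Transform>Box-Cox option performs the same transformation):

```python
import numpy as np

def box_cox(u, lam):
    """Box-Cox transformation f_lambda defined above."""
    u = np.asarray(u, dtype=float)
    if lam == 0:
        return np.log(u)              # f_0: natural logarithm
    return (u**lam - 1.0) / lam       # f_lambda for lambda > 0

# Hypothetical positive series whose variability grows with its level.
rng = np.random.default_rng(2)
u = np.exp(np.linspace(0, 3, 120)) * (1 + 0.1 * rng.standard_normal(120))
v0 = box_cox(u, 0.0)                  # log transform
v5 = box_cox(u, 0.5)                  # square-root-type transform
```
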

Trend and seasonality are usually detected by inspecting the graph of the (possibly transformed) series. However, they are also characterized by autocorrelation functions that are slowly decaying and nearly periodic, respectively. The elimination of trend and seasonality was discussed in Section 1.5, where we described two methods:

  (i) “classical decomposition” of the series into a trend component, a seasonal component, and a random residual component, and

  (ii) differencing.

The program ITSM (in the Transform option) offers a choice between these techniques. The results of applying methods (i) and (ii) to the transformed red wine data V t  = lnU t in Figure 1.17 are shown in Figures 6.11 and 6.12, respectively. Figure 6.11 was obtained from ITSM by estimating and removing from {V t } a linear trend component and a seasonal component with period 12. Figure 6.12 was obtained by applying the operator \({\bigl (1 - B^{12}\bigr )}\) to {V t }. Neither of the two resulting series displays any apparent deviations from stationarity, nor do their sample autocorrelation functions. The sample ACF and PACF of \(\big\{(1 - B^{12}\bigr )V _{t}\big\}\) are shown in Figures 6.13 and 6.14, respectively.
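
In code, the transformation leading to Figure 6.12 amounts to a log transform followed by differencing at lag 12. A minimal sketch (assuming the monthly sales have been read into a numpy array; the file name is illustrative only):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

wine = np.loadtxt("wine.dat")      # hypothetical file of monthly red wine sales

v = np.log(wine)                   # variance-stabilizing log transform
y = v[12:] - v[:-12]               # apply (1 - B^12): difference at lag 12

sample_acf = acf(y, nlags=40)      # compare with Figure 6.13
sample_pacf = pacf(y, nlags=40)    # compare with Figure 6.14
```
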

Fig. 6.11 The Australian red wine data after taking natural logarithms and removing a seasonal component of period 12 and a linear trend

Fig. 6.12 The Australian red wine data after taking natural logarithms and differencing at lag 12

Fig. 6.13 The sample ACF of the data in Figure 6.12

Fig. 6.14 The sample PACF of the data in Figure 6.12

After the elimination of trend and seasonality, it is still possible that the sample autocorrelation function may appear to be that of a nonstationary (or nearly nonstationary) process, in which case further differencing may be carried out.

In Section 1.5 we also mentioned a third possible approach:

  (iii) fitting a sum of harmonics and a polynomial trend to generate a noise sequence that consists of the residuals from the regression.

In Section 6.6 we discuss the modifications to classical least squares regression analysis that allow for dependence among the residuals from the regression. These modifications are implemented in the ITSM option Regression>Estimation>Generalized LS.

(b) Identification and Estimation. Let {X t } be the mean-corrected transformed series found as described in (a). The problem now is to find the most satisfactory ARMA( p, q) model to represent {X t }. If p and q were known in advance, this would be a straightforward application of the estimation techniques described in Chapter 5. However, this is usually not the case, so it becomes necessary also to identify appropriate values for p and q.

It might appear at first sight that the higher the values chosen for p and q, the better the resulting fitted model will be. However, as pointed out in Section 5.5, estimation of too large a number of parameters introduces estimation errors that adversely affect the use of the fitted model for prediction as illustrated in Section 5.4. We therefore minimize one of the model selection criteria discussed in Section 5.5 in order to choose the values of p and q. Each of these criteria includes a penalty term to discourage the fitting of too many parameters. We shall base our choice of p and q primarily on the minimization of the AICC statistic, defined as

$$\displaystyle{ \mathrm{AICC}(\phi,\theta ) = -2\ln L(\phi,\theta,S(\phi,\theta )/n) + 2(p + q + 1)n/(n - p - q - 2), }$$
(6.2.1)

where \(L(\phi,\theta,\sigma ^{2})\) is the likelihood of the data under the Gaussian ARMA model with parameters \({\bigl (\phi,\theta,\sigma ^{2}\bigr )}\), and \(S(\phi,\theta )\) is the residual sum of squares defined in (5.2.11). Once a model has been found that minimizes the AICC value, it is then necessary to check the model for goodness of fit (essentially by checking that the residuals are like white noise) as discussed in Section 5.3.

For any fixed values of p and q, the maximum likelihood estimates of ϕ and \(\theta\) are the values that minimize the AICC. Hence, the minimum AICC model (over any given range of p and q values) can be found by computing the maximum likelihood estimators for each fixed p and q and choosing from these the maximum likelihood model with the smallest value of AICC. This can be done with the program ITSM by using the option Model>Estimation>Autofit. When this option is selected and upper and lower bounds for p and q are specified, the program fits maximum likelihood models for each pair ( p, q) in the range specified and selects the model with smallest AICC value. If some of the coefficient estimates are small compared with their estimated standard deviations, maximum likelihood subset models (with those coefficients set to zero) can also be explored.

The steps in model identification and estimation can be summarized as follows:

  • After transforming the data (if necessary) to make the fitting of an ARMA(p, q) model reasonable, examine the sample ACF and PACF to get some idea of potential p and q values. Preliminary estimation using the ITSM option Model>Estimation>Preliminary is also useful in this respect. Burg’s algorithm with AICC minimization rapidly fits autoregressions of all orders up to 27 and selects the one with minimum AICC value. For preliminary estimation of models with q > 0, each pair (p, q) must be considered separately.

  • Select the option Model>Estimation>Autofit of ITSM. Specify the required limits for p and q, and the program will then use maximum likelihood estimation to find the minimum AICC model with p and q in the range specified.

  • Examination of the fitted coefficients and their standard errors may suggest that some of them can be set to zero. If this is the case, then a subset model can be fitted by clicking on the button Constrain optimization in the Maximum Likelihood Estimation dialog box and setting the selected coefficients to zero. Optimization will then give the maximum likelihood model with the chosen coefficients constrained to be zero. The constrained model is assessed by comparing its AICC value with those of the other candidate models.

  • Check the candidate model(s) for goodness of fit as described in Section 5.3. These tests can be performed by selecting the option Statistics>Residual Analysis.
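
The Autofit search can be approximated outside ITSM as well. The sketch below (a rough analogue only, assuming statsmodels; AICC is computed from the Gaussian likelihood using definition (6.2.1), and the resulting fits need not coincide with ITSM's) scans a small grid of (p, q) values for a mean-corrected series x:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def aicc(res, p, q):
    # AICC = -2 ln L + 2(p+q+1)n/(n-p-q-2), as in (6.2.1), with k = p+q+1 parameters
    n, k = res.nobs, p + q + 1
    return -2.0 * res.llf + 2.0 * k * n / (n - k - 1)

def autofit(x, max_p=5, max_q=5):
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                res = ARIMA(x, order=(p, 0, q), trend='n').fit()
            except Exception:
                continue                     # skip (p, q) pairs that fail to converge
            score = aicc(res, p, q)
            if best is None or score < best[0]:
                best = (score, p, q, res)
    return best                              # (minimum AICC, p, q, fitted results)
```

Candidate models whose AICC is within about 2 of the minimum should still be checked for goodness of fit, as described above.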

Example 6.2.1

The Australian Red Wine Data

Let {X 1, …, X 130} denote the series obtained from the red wine data of Example 1.1.1 after taking natural logarithms, differencing at lag 12, and subtracting the mean (0.0681) of the differences. The data prior to mean correction are shown in Figure 6.12. The sample PACF of {X t }, shown in Figure 6.14, suggests that an AR(12) model might be appropriate for this series. To explore this possibility we use the ITSM option Model>Estimation>Preliminary with Burg’s algorithm and AICC minimization. As anticipated, the fitted Burg models do indeed have minimum AICC when p = 12. The fitted model is

$$\displaystyle\begin{array}{rcl} \big(1 - 0.245B& -& 0.069B^{2} - 0.012B^{3} - 0.021B^{4} - 0.200B^{5}+0.025B^{6}+0.004B^{7} {}\\ & -& 0.133B^{8} + 0.010B^{9} - 0.095B^{10} + 0.118B^{11} + 0.384B^{12}\big)X_{ t} = Z_{t},{}\\ \end{array}$$

with \(\{Z_{t}\} \sim \mathrm{WN}(0,0.0135)\) and AICC value −158.77. Selecting the option Model>Estimation>Max likelihood then gives the maximum likelihood AR(12) model, which is very similar to the Burg model and has AICC value −158.87. Inspection of the standard errors of the coefficient estimators suggests the possibility of setting those at lags 2, 3, 4, 6, 7, 9, 10, and 11 equal to zero. If we do this by clicking on the Constrain optimization button in the Maximum Likelihood Estimation dialog box and then reoptimize, we obtain the model,

$$\displaystyle{{\bigl (1 - 0.270B - 0.224B^{5} - 0.149B^{8} + 0.099B^{11} + 0.353B^{12}\bigr )}X_{ t} = Z_{t},}$$

with \(\{Z_{t}\} \sim \mathrm{WN}(0,0.0138)\) and AICC value −172.49.

In order to check more general ARMA(p, q) models, select the option Model>Estimation>Autofit and specify the minimum and maximum values of p and q to be zero and 15, respectively. (The sample ACF and PACF suggest that these limits should be more than adequate to include the minimum AICC model.) In a few minutes (depending on the speed of your computer) the program selects an ARMA(1,12) model with AICC value −172.74, which is slightly better than the subset AR(12) model just found. Inspection of the estimated standard deviations of the MA coefficients at lags 1, 3, 4, 6, 7, 9, and 11 suggests setting them equal to zero and reestimating the values of the remaining coefficients. If we do this by clicking on the Constrain optimization button in the Maximum Likelihood Estimation dialog box, setting the required coefficients to zero and then reoptimizing, we obtain the model,

$$\displaystyle{(1 - 0.286B)X_{t} = \left (1 + 0.127B^{2} + 0.183B^{5} + 0.177B^{8} + 0.181B^{10} - 0.554B^{12}\right )Z_{ t},}$$

with \(\{Z_{t}\} \sim \mathrm{WN}(0,0.0120)\) and AICC value −184.09.

The subset ARMA(1,12) model easily passes all the goodness of fit tests in the Statistics>Residual Analysis option. In view of this and its small AICC value, we accept it as a plausible model for the transformed red wine series.

Example 6.2.2

The Lake Data

Let {Y t , t = 1, …, 99} denote the lake data of Example 1.3.5. We have seen already in Example 5.2.5 that the ITSM option Model>Estimation>Autofit gives the minimum-AICC model

$$\displaystyle{X_{t}-0.7446X_{t-1}=Z_{t}+0.3213Z_{t-1},\ \ \{Z_{t}\} \sim \mathrm{WN}(0,0.4750),}$$

for the mean-corrected series X t  = Y t − 9.0041. The corresponding AICC value is 212.77. Since the model passes all the goodness of fit tests, we accept it as a reasonable model for the data.

6.3 Unit Roots in Time Series Models

The unit root problem in time series arises when either the autoregressive or moving-average polynomial of an ARMA model has a root on or near the unit circle. A unit root in either of these polynomials has important implications for modeling. For example, a root near 1 of the autoregressive polynomial suggests that the data should be differenced before fitting an ARMA model, whereas a root near 1 of the moving-average polynomial indicates that the data were overdifferenced. In this section, we consider inference procedures for detecting the presence of a unit root in the autoregressive and moving-average polynomials.

6.3.1 Unit Roots in Autoregressions

In Section 6.1 we discussed the use of differencing to transform a nonstationary time series with a slowly decaying sample ACF and values near 1 at small lags into one with a rapidly decreasing sample ACF. The degree of differencing of a time series {X t } was largely determined by applying the difference operator repeatedly until the sample ACF of \(\big\{\nabla ^{d}X_{t}\big\}\) decays quickly. The differenced time series could then be modeled by a low-order ARMA( p, q) process, and hence the resulting ARIMA( p, d, q) model for the original data has an autoregressive polynomial \({\bigl (1 -\phi _{1}z -\cdots -\phi _{p}z^{p}\bigr )}(1 - z)^{d}\) [see (6.1.1)] with d roots on the unit circle. In this subsection we discuss a more systematic approach to testing for the presence of a unit root of the autoregressive polynomial in order to decide whether or not a time series should be differenced. This approach was pioneered by Dickey and Fuller (1979).

Let X 1, …, X n be observations from the AR(1) model

$$\displaystyle{ X_{t}-\mu =\phi _{1}(X_{t-1}-\mu ) + Z_{t},\qquad \{Z_{t}\} \sim \mathrm{WN}{\bigl (0,\sigma ^{2}\bigr )}, }$$
(6.3.1)

where | ϕ 1 |  < 1 and μ = EX t . For large n, the maximum likelihood estimator \(\hat{\phi }_{1}\) of ϕ 1 is approximately N\({\bigl (\phi _{1},{\bigl (1 -\phi _{1}^{2}\bigr )}/n\bigr )}\). For the unit root case, this normal approximation is no longer applicable, even asymptotically, which precludes its use for testing the unit root hypothesis H 0: ϕ 1 = 1 vs. H 1: ϕ 1 < 1. To construct a test of H 0, write the model (6.3.1) as

$$\displaystyle{ \nabla X_{t} = X_{t} - X_{t-1} =\phi _{ 0}^{{\ast}} +\phi _{ 1}^{{\ast}}X_{ t-1} + Z_{t},\qquad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ), }$$
(6.3.2)

where \(\phi_0^* = \mu(1 - \phi_1)\) and \(\phi_1^* = \phi_1 - 1\). Now let \(\hat{\phi }_{1}^{{\ast}}\) be the ordinary least squares (OLS) estimator of \(\phi_1^*\) found by regressing ∇X t on 1 and X t−1. The estimated standard error of \(\hat{\phi }_{1}^{{\ast}}\) is

$$\displaystyle{\widehat{\mathrm{SE}}\left (\hat{\phi }_{1}^{{\ast}}\right ) = S\left (\sum _{ t=2}^{n}\left (X_{ t-1} -\bar{ X}\right )^{2}\right )^{-1/2},}$$

where \(S^{2} =\sum _{ t=2}^{n}\left (\nabla X_{t} -\hat{\phi }_{0}^{{\ast}}-\hat{\phi }_{1}^{{\ast}}X_{t-1}\right )^{2}/(n - 3)\) and \(\bar{X}\) is the sample mean of X 1, …, X n−1. Dickey and Fuller derived the limit distribution as n → ∞ of the t-ratio

$$\displaystyle{ \hat{\tau }_{\mu }:=\hat{\phi }_{ 1}^{{\ast}}/\widehat{\mathrm{SE}}\left (\hat{\phi }_{ 1}^{{\ast}}\right ) }$$
(6.3.3)

under the unit root assumption \(\phi_1^* = 0\), from which a test of the null hypothesis H 0: ϕ 1 = 1 can be constructed. The 0.01, 0.05, and 0.10 quantiles of the limit distribution of \(\hat{\tau }_{\mu }\) (see Table 8.5.2 of Fuller 1976) are −3.43, −2.86, and −2.57, respectively. The augmented Dickey–Fuller test then rejects the null hypothesis of a unit root, at say, level 0.05 if \(\hat{\tau }_{\mu } < -2.86\). Notice that the cutoff value for this test statistic is much smaller than the standard cutoff value of −1.645 obtained from the normal approximation to the t-distribution, so that the unit root hypothesis is less likely to be rejected using the correct limit distribution.

The above procedure can be extended to the case where {X t } follows the AR( p) model with mean μ given by

$$\displaystyle{X_{t } -\mu =\phi _{1}\left (X_{t-1}-\mu \right ) + \cdots +\phi _{p}\left (X_{t-p}-\mu \right ) + Z_{t},\qquad \{Z_{t}\} \sim \mathrm{WN}{\bigl (0,\sigma ^{2}\bigr )}.}$$

This model can be rewritten as (see Problem 6.2)

$$\displaystyle{ \nabla X_{t} =\phi _{ 0}^{{\ast}} +\phi _{ 1}^{{\ast}}X_{ t-1} +\phi _{ 2}^{{\ast}}\nabla X_{ t-1} + \cdots +\phi _{ p}^{{\ast}}\nabla X_{ t-p+1} + Z_{t}, }$$
(6.3.4)

where \(\phi_0^* = \mu\left(1 - \phi_1 - \cdots - \phi_p\right)\), \(\phi_1^* = \sum_{i=1}^{p}\phi_i - 1\), and \(\phi_j^* = -\sum_{i=j}^{p}\phi_i\), j = 2, …, p. If the autoregressive polynomial has a unit root at 1, then \(0 =\phi \left (1\right ) = -\phi _{1}^{{\ast}}\), and the differenced series {∇X t } is an AR(p − 1) process. Consequently, testing the hypothesis of a unit root at 1 of the autoregressive polynomial is equivalent to testing \(\phi_1^* = 0\). As in the AR(1) example, \(\phi_1^*\) can be estimated as the coefficient of X t−1 in the OLS regression of ∇X t onto \(1,X_{t-1},\nabla X_{t-1},\ldots,\nabla X_{t-p+1}\). For large n the t-ratio

$$\displaystyle{ \hat{\tau }_{\mu }:=\hat{\phi }_{ 1}^{{\ast}}/\widehat{\mathrm{SE}}\left (\hat{\phi }_{ 1}^{{\ast}}\right ), }$$
(6.3.5)

where \(\widehat{\mathrm{SE}}\left (\hat{\phi }_{1}^{{\ast}}\right )\) is the estimated standard error of \(\hat{\phi }_{1}^{{\ast}}\), has the same limit distribution as the test statistic in (6.3.3). The augmented Dickey–Fuller test in this case is applied in exactly the same manner as for the AR(1) case using the test statistic (6.3.5) and the cutoff values given above.
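
The t-ratio (6.3.5) is simply the usual regression t-statistic for the coefficient of X t−1; only its reference distribution is nonstandard. The sketch below computes it by ordinary least squares in numpy (the packaged function statsmodels.tsa.stattools.adfuller performs an equivalent test, with its own lag-selection conventions):

```python
import numpy as np

def adf_tau_mu(x, p):
    """t-ratio (6.3.5) from the OLS regression of grad(X_t) on
    1, X_{t-1}, grad(X_{t-1}), ..., grad(X_{t-p+1})."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    n = len(x)
    y = dx[p - 1:]                                   # grad(X_t), t = p+1, ..., n
    cols = [np.ones(n - p), x[p - 1:-1]]             # constant and X_{t-1}
    for j in range(1, p):
        cols.append(dx[p - 1 - j:n - 1 - j])         # grad(X_{t-j})
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])       # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])              # coefficient of X_{t-1} / its SE

# Reject a unit root at level 0.05 if the returned value is less than -2.86.
```
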

Example 6.3.1

Consider testing the time series of Example 6.1.1 (see Figure 6.1) for the presence of a unit root in the autoregressive operator. The sample PACF in Figure 6.3 suggests fitting an AR(2) or possibly an AR(3) model to the data. Regressing ∇X t on 1, X t−1, ∇X t−1, ∇X t−2 for t = 4, …, 200 using OLS gives

$$\displaystyle{ \nabla X_{t} = 0.1503 - 0.0041X_{t-1} + 0.9335\nabla X_{t-1} - 0.1548\nabla X_{t-2} + Z_{t}, }$$

(with estimated standard errors 0.1135, 0.0028, 0.0707, and 0.0708 for the four coefficients, respectively),

where \(\{Z_{t}\} \sim \mathrm{WN}(0,0.9639)\). The test statistic for testing the presence of a unit root is

$$\displaystyle{\hat{\tau }_{\mu } = \frac{-0.0041} {0.0028} = -1.464.}$$

Since −1.464 > −2.57, the unit root hypothesis is not rejected at level 0.10. In contrast, if we had mistakenly used the t-distribution with 193 degrees of freedom as an approximation to \(\hat{\tau }_{\mu }\), then we would have rejected the unit root hypothesis at the 0.10 level (p-value is 0.074). The t-ratios for the other coefficients, \(\phi_0^*\), \(\phi_2^*\), and \(\phi_3^*\), have an approximate t-distribution with 193 degrees of freedom. Based on these t-ratios, the intercept should be 0, while the coefficient of ∇X t−2 is barely significant. The evidence is much stronger in favor of a unit root if the analysis is repeated without a mean term. The fitted model without a mean term is

$$\displaystyle{ \nabla X_{t} = -0.0012X_{t-1} + 0.9395\nabla X_{t-1} - 0.1585\nabla X_{t-2} + Z_{t}, }$$

(with estimated standard errors 0.0018, 0.0707, and 0.0709, respectively),

where \(\{Z_{t}\} \sim \mathrm{WN}(0,0.9677)\). The 0.01, 0.05, and 0.10 cutoff values for the corresponding test statistic when a mean term is excluded from the model are −2.58, −1.95, and −1.62 (see Table 8.5.2 of Fuller 1976). In this example, the test statistic is

$$\displaystyle{\hat{\tau }= \frac{-0.0012} {0.0018} = -0.667,}$$

which is substantially larger than the 0.10 cutoff value of −1.62.

Further extensions of the above test to AR models with \(p = O{\bigl (n^{1/3}\bigr )}\) and to ARMA( p, q) models can be found in Said and Dickey (1984). However, as reported in Schwert (1987) and Pantula (1991), this test must be used with caution if the underlying model orders are not correctly specified.

6.3.2 Unit Roots in Moving Averages

A unit root in the moving-average polynomial can have a number of interpretations depending on the modeling application. For example, let {X t } be a causal and invertible ARMA( p, q) process satisfying the equations

$$\displaystyle{\phi (B)X_{t} =\theta (B)Z_{t},\qquad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ).}$$

Then the differenced series Y t : = ∇X t is a noninvertible ARMA( p, q + 1) process with moving-average polynomial θ(z)(1 − z). Consequently, testing for a unit root in the moving-average polynomial is equivalent to testing that the time series has been overdifferenced.

As a second application, it is possible to distinguish between the competing models

$$\displaystyle{\nabla ^{k}X_{ t} = a + V _{t}}$$

and

$$\displaystyle{X_{t} = c_{0} + c_{1}t + \cdots + c_{k}t^{\,k} + W_{ t},}$$

where {V t } and {W t } are invertible ARMA processes. For the former model the differenced series \(\big\{\nabla ^{k}X_{t}\big\}\) has no moving-average unit roots, while for the latter model {∇k X t } has a multiple moving-average unit root of order k. We can therefore distinguish between the two models by using the observed values of \(\left \{\nabla ^{k}X_{t}\right \}\) to test for the presence of a moving-average unit root.

We confine our discussion of unit root tests to first-order moving-average models, the general case being considerably more complicated and not fully resolved. Let X 1, …, X n be observations from the MA(1) model

$$\displaystyle{X_{t} = Z_{t} +\theta Z_{t-1},\qquad \{Z_{t}\} \sim \mathrm{IID}\left (0,\sigma ^{2}\right ).}$$

Davis and Dunsmuir (1996) showed that under the assumption θ = −1, \(n(\hat{\theta }+1)\) (\(\hat{\theta }\) is the maximum likelihood estimator) converges in distribution. A test of H 0: θ = −1 vs. H 1: θ > −1 can be fashioned on this limiting result by rejecting H 0 when

$$\displaystyle{\hat{\theta }> -1 + c_{\alpha }/n,}$$

where c α is the (1 −α) quantile of the limit distribution of \(n{\bigl (\hat{\theta } + 1\bigr )}\). (From Table 3.2 of Davis et al. (1995), c 0.01 = 11.93, c 0.05 = 6.80, and c 0.10 = 4.90.) In particular, if n = 50, then the null hypothesis is rejected at level 0.05 if \(\hat{\theta }> -1 + 6.80/50 = -0.864\).
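
The decision rule is easily coded; a minimal sketch using the quantiles quoted above from Davis et al. (1995):

```python
def ma1_unit_root_cutoff(n, alpha=0.05):
    """Reject H0: theta = -1 in favour of H1: theta > -1
    when the MLE theta_hat exceeds -1 + c_alpha / n."""
    c_alpha = {0.01: 11.93, 0.05: 6.80, 0.10: 4.90}[alpha]
    return -1.0 + c_alpha / n

print(ma1_unit_root_cutoff(50))   # -0.864, the level-0.05 cutoff for n = 50 quoted above
```
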

The likelihood ratio test can also be used for testing the unit root hypothesis. The likelihood ratio for this problem is \(L(-1,S(-1)/n)/L\left (\hat{\theta },\hat{\sigma }^{2}\right )\), where \(L\left (\theta,\sigma ^{2}\right )\) is the Gaussian likelihood of the data based on an MA(1) model, S(−1) is the sum of squares given by (5.2.11) when θ = −1, and \(\hat{\theta }\) and \(\hat{\sigma }^{2}\) are the maximum likelihood estimators of θ and σ 2. The null hypothesis is rejected at level α if

$$\displaystyle{\lambda _{n}:= -2\ln \left (\frac{L(-1,S(-1)/n)} {L\left (\hat{\theta },\hat{\sigma }^{2}\right )} \right ) > c_{\mathrm{LR},\alpha }}$$

where the cutoff value is chosen such that P θ = −1[λ n  > c LR, α ] = α. The limit distribution of λ n was derived by Davis et al. (1995), who also gave selected quantiles of the limit. It was found that these quantiles provide a good approximation to their finite-sample counterparts for time series of length n ≥ 50. The limiting quantiles for λ n under H 0 are c LR,0.01 = 4.41, c LR,0.05 = 1.94, and c LR,0.10 = 1.00.

Example 6.3.2

For the overshort data {X t } of Example 3.2.8, the maximum likelihood MA(1) model for the mean corrected data \(\{Y _{t} = X_{t} + 4.035\}\) was (see Example 5.4.1)

$$\displaystyle{Y _{t} = Z_{t} - 0.818Z_{t-1},\quad \{Z_{t}\} \sim \mathrm{WN}(0,2040.75).}$$

In the structural formulation of this model given in Example 3.2.8, the moving-average parameter θ was related to the measurement error variances σ U 2 and σ V 2 through the equation

$$\displaystyle{ \frac{\theta } {1 +\theta ^{2}} = \frac{-\sigma _{U}^{2}} {2\sigma _{U}^{2} +\sigma _{ V }^{2}}.}$$

(These error variances correspond to the daily measured amounts of fuel in the tank and the daily measured adjustments due to sales and deliveries.) A value of θ = −1 indicates that there is no appreciable measurement error due to sales and deliveries (i.e., σ V 2 = 0), and hence testing for a unit root in this case is equivalent to testing that σ U 2 = 0. Assuming that the mean is known, the unit root hypothesis is rejected at α = 0.05, since \(-0.818 > -1 + 6.80/57 = -0.881\). The evidence against H 0 is stronger using the likelihood ratio statistic. Using ITSM and entering the MA(1) model θ = −1 and σ 2 = 2203.12, we find that −2lnL(−1, 2203.12) = 604.584, while \(-2\ln L(\hat{\theta },\hat{\sigma }^{2}) = 597.267\). Comparing the likelihood ratio statistic λ n  = 604.584 − 597.267 = 7.317 with the cutoff value c LR,0.01, we reject H 0 at level α = 0.01 and conclude that the measurement error associated with sales and deliveries is nonzero.

In the above example it was assumed that the mean was known. In practice, these tests should be adjusted for the fact that the mean is also being estimated.

Tanaka (1990) proposed a locally best invariant unbiased (LBIU) test for the unit root hypothesis. It was found that the LBIU test has slightly greater power than the likelihood ratio test for alternatives close to θ = −1 but has less power for alternatives further away from − 1 (see Davis et al. 1995). The LBIU test has been extended to cover more general models by Tanaka (1990) and Tam and Reinsel (1995). Similar extensions to tests based on the maximum likelihood estimator and the likelihood ratio statistic have been explored in Davis et al. (1996).

6.4 Forecasting ARIMA Models

In this section we demonstrate how the methods of Sections 3.3 and 5.4 can be adapted to forecast the future values of an ARIMA( p, d, q) process {X t }. (The required numerical calculations can all be carried out using the program ITSM.)

If d ≥ 1, the first and second moments EX t and E(X t+h X t ) are not determined by the difference equations (6.1.1). We cannot expect, therefore, to determine best linear predictors for {X t } without further assumptions.

For example, suppose that {Y t } is a causal ARMA( p, q) process and that X 0 is any random variable. Define

$$\displaystyle{X_{t} = X_{0} +\sum _{ j=1}^{t}Y _{ j},\quad t = 1,2,\ldots.}$$

Then {X t , t ≥ 0} is an ARIMA(p, 1, q) process with mean EX t  = EX 0 and autocovariances E(X t+h X t ) − (EX 0)2 that depend on Var(X 0) and Cov(X 0, Y j ), j = 1, 2, …. The best linear predictor of X n+1 based on {1, X 0, X 1, …, X n } is the same as the best linear predictor in terms of the set {1, X 0, Y 1, …, Y n }, since each linear combination of the latter is a linear combination of the former and vice versa. Hence, using P n to denote best linear predictor in terms of either set and using the linearity of P n , we can write

$$\displaystyle{ P_{n}X_{n+1} = P_{n}(X_{0}+Y _{1}+\cdots +Y _{n+1}) = P_{n}(X_{n}+Y _{n+1}) = X_{n}+P_{n}Y _{n+1}. }$$

To evaluate P n Y n+1 it is necessary (see Section 2.5) to know E(X 0 Y j ), j = 1, …, n + 1, and EX 0 2. However, if we assume that X 0 is uncorrelated with {Y t , t ≥ 1}, then P n Y n+1 is the same (Problem 6.5) as the best linear predictor \(\hat{Y }_{n+1}\) of Y n+1 in terms of {1, Y 1, …, Y n }, which can be calculated as described in Section 3.3. The assumption that X 0 is uncorrelated with Y 1, Y 2, … is therefore sufficient to determine the best linear predictor P n X n+1 in this case.

Turning now to the general case, we shall assume that our observed process {X t } satisfies the difference equations

$$\displaystyle{ (1 - B)^{d}X_{ t} = Y _{t},\quad t = 1,2,\ldots, }$$

where {Y t } is a causal ARMA(p, q) process, and that the random vector (X 1−d , …, X 0) is uncorrelated with Y t , t > 0. The difference equations can be rewritten in the form

$$\displaystyle{ X_{t} = Y _{t} -\sum _{j=1}^{d}{d\choose j}(-1)^{\,j}X_{ t-j},\quad t = 1,2,\ldots. }$$
(6.4.1)

It is convenient, by relabeling the time axis if necessary, to assume that we observe X 1−d , X 2−d , …, X n . (The observed values of {Y t } are then Y 1, …, Y n .) As usual, we shall use P n to denote best linear prediction in terms of the observations up to time n (in this case 1, X 1−d , …, X n or equivalently 1, X 1−d , …, X 0, Y 1, …, Y n ).

Our goal is to compute the best linear predictors P n X n+h . This can be done by applying the operator P n to each side of (6.4.1) (with t = n + h) and using the linearity of P n to obtain

$$\displaystyle{ P_{n}X_{n+h} = P_{n}Y _{n+h} -\sum _{j=1}^{d}{d\choose j}(-1)^{\,j}P_{ n}X_{n+h-j}. }$$
(6.4.2)

Now the assumption that (X 1−d , …, X 0) is uncorrelated with Y t , t > 0, enables us to identify P n Y n+h with the best linear predictor of Y n+h in terms of {1, Y 1, …, Y n }, and this can be calculated as described in Section 3.3. The predictor P n X n+1 is obtained directly from (6.4.2) by noting that P n X n+1−j  = X n+1−j for each j ≥ 1. The predictor P n X n+2 can then be found from (6.4.2) using the previously calculated value of P n X n+1. The predictors P n X n+3, P n X n+4, … can be computed recursively in the same way.
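
The recursion (6.4.2) is straightforward to code once h-step forecasts of {Y t } are available from any ARMA prediction routine. A minimal numpy sketch (the Y-forecasts are taken as given; they could come, for example, from the innovations algorithm of Section 3.3):

```python
import numpy as np
from math import comb

def arima_forecasts(x_last, y_forecasts, d):
    """Recursion (6.4.2): convert h-step forecasts of Y_t = (1-B)^d X_t into
    h-step forecasts of X_t.  x_last holds the last d observed values of X,
    most recent value last."""
    hist = list(x_last)
    preds = []
    for y_hat in y_forecasts:
        x_hat = y_hat - sum(comb(d, j) * (-1) ** j * hist[-j] for j in range(1, d + 1))
        preds.append(x_hat)
        hist.append(x_hat)        # P_n X_{n+h} is reused in the next step
    return np.array(preds)

# For d = 1 this reduces to P_n X_{n+h} = P_n X_{n+h-1} + P_n Y_{n+h},
# i.e., X_n plus cumulative sums of the Y-forecasts.
```
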

To find the mean squared error of prediction it is convenient to express P n Y n+h in terms of {X j }. For n ≥ 0 we denote the one-step predictors by \(\hat{Y }_{n+1} = P_{n}Y _{n+1}\) and \(\hat{X}_{n+1} = P_{n}X_{n+1}\). Then from (6.4.1) and (6.4.2) we have

$$\displaystyle{ X_{n+1} -\hat{ X}_{n+1} = Y _{n+1} -\hat{ Y }_{n+1},\quad n \geq 1, }$$

and hence from (3.3.12), if n > m = max(p, q) and \(h \geq 1\), we can write

$$\displaystyle{ P_{n}Y _{n+h} =\sum _{ i=1}^{p}\phi _{ i}P_{n}Y _{n+h-i} +\sum _{ j=h}^{q}\theta _{ n+h-1,\,j}\left (X_{n+h-j} -\hat{ X}_{n+h-j}\right ). }$$
(6.4.3)

Setting \(\phi^*(z) = (1 - z)^{d}\phi(z) = 1 - \phi_1^*z - \cdots - \phi_{p+d}^*z^{p+d}\), we find from (6.4.2) and (6.4.3) that

$$\displaystyle{ P_{n}X_{n+h} =\sum _{ j=1}^{p+d}\phi _{ j}^{{\ast}}P_{ n}X_{n+h-j}+\sum _{j=h}^{q}\theta _{ n+h-1,\,j}\left (X_{n+h-j} -\hat{ X}_{n+h-j}\right ), }$$
(6.4.4)

which is analogous to the h-step prediction formula (3.3.12) for an ARMA process. As in (3.3.13), the mean squared error of the h-step predictor is

$$\displaystyle{ \sigma _{n}^{2}(h) = E(X_{ n+h} - P_{n}X_{n+h})^{2} =\sum _{ j=0}^{h-1}\left (\sum _{ r=0}^{j}\chi _{ r}\theta _{n+h-r-1,\,j-r}\right )^{2}v_{ n+h-j-1}, }$$
(6.4.5)

where θ n0 = 1,

$$\displaystyle{\chi (z) =\sum _{ r=0}^{\infty }\chi _{ r}z^{r} = \left (1 -\phi _{ 1}^{{\ast}}z -\cdots -\phi _{ p+d}^{{\ast}}z^{p+d}\right )^{-1},}$$

and

$$\displaystyle{v_{n+h-j-1} = E\left (X_{n+h-j} -\hat{ X}_{n+h-j}\right )^{2} = E\left (Y _{ n+h-j} -\hat{ Y }_{n+h-j}\right )^{2}.}$$

The coefficients χ j can be found from the recursions (3.3.14) with \(\phi_j^*\) replacing ϕ j . For large n we can approximate (6.4.5), provided that θ(⋅ ) is invertible, by

$$\displaystyle{ \sigma _{n}^{2}(h) =\sum _{ j=0}^{h-1}\psi _{ j}^{2}\sigma ^{2}, }$$
(6.4.6)

where

$$\displaystyle{ \psi (z) =\sum _{ j=0}^{\infty }\psi _{ j}z^{\,j} = (\phi ^{{\ast}}(z))^{-1}\theta (z). }$$

6.4.1 The Forecast Function

Inspection of equation (6.4.4) shows that for fixed n > m = max(p, q), the h-step predictors

$$\displaystyle{g(h):= P_{n}X_{n+h},}$$

satisfy the homogeneous linear difference equations

$$\displaystyle{ g(h) -\phi _{1}^{{\ast}}g(h - 1) -\cdots -\phi _{ p+d}^{{\ast}}g(h - p - d) = 0,\quad h > q, }$$
(6.4.7)

where \(\phi_1^*, \ldots, \phi_{p+d}^*\) are the coefficients of \(z, \ldots, z^{p+d}\) in

$$\displaystyle{\phi ^{{\ast}}(z) = (1 - z)^{d}\phi (z).}$$

The solution of (6.4.7) is well known from the theory of linear difference equations (see Brockwell and Davis (1991), Section 3.6). If we assume that the zeros of ϕ(z) (denoted by ξ 1, …, ξ p ) are all distinct, then the solution is

$$\displaystyle{ g(h) = a_{0} + a_{1}h + \cdots + a_{d-1}h^{d-1} + b_{ 1}\xi _{1}^{-h} + \cdots + b_{ p}\xi _{p}^{-h},\quad h > q - p - d, }$$
(6.4.8)

where the coefficients a 0, …, a d−1 and b 1, …, b p can be determined from the p + d equations obtained by equating the right-hand side of (6.4.8) for q − p − d < h ≤ q with the corresponding value of g(h) computed numerically (for h ≤ 0, P n X n+h  = X n+h , and for 1 ≤ h ≤ q, P n X n+h can be computed from (6.4.4) as already described). Once the constants a i and b i have been evaluated, the algebraic expression (6.4.8) gives the predictors for all h > q − p − d. In the case q = 0, the values of g(h) in the equations for a 0, …, a d−1, b 1, …, b p are simply the observed values g(h) = X n+h , −p − d ≤ h ≤ 0, and the expression (6.4.6) for the mean squared error is exact.

The calculation of the forecast function is easily generalized to deal with more complicated ARIMA processes. For example, if the observations X −13, X −12, …, X n are differenced at lags 12 and 1, and \((1 - B){\bigl (1 - B^{12}\bigr )}X_{t}\) is modeled as a causal invertible ARMA(p, q) process with mean μ and max(p, q) < n, then {X t } satisfies an equation of the form

$$\displaystyle{ \phi (B)[(1 - B)(1 - B^{12})X_{ t}-\mu ] =\theta (B)Z_{t},\ \ \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ), }$$
(6.4.9)

and the forecast function g(h) = P n X n+h satisfies the analogue of (6.4.7), namely,

$$\displaystyle{ \phi (B)(1 - B)(1 - B^{12})g(h) =\phi (1)\mu,\ \ h > q. }$$
(6.4.10)

To find the general solution of these inhomogeneous linear difference equations, it suffices (see Brockwell and Davis (1991), Section 3.6) to find one particular solution of (6.4.10) and then add to it the general solution of the same equations with the right-hand side set equal to zero. A particular solution is easily found (by trial and error) to be

$$\displaystyle{g(h) ={ \mu h^{2} \over 24},}$$

and the general solution is therefore

$$\displaystyle\begin{array}{rcl} g(h) ={ \mu h^{2} \over 24} + a_{0} + a_{1}h +\sum _{ j=1}^{11}c_{ j}e^{ij\pi /6} + b_{ 1}\xi _{1}^{-h} + \cdots + b_{ p}\xi _{p}^{-h},& & \\ \quad h > q - p - 13.& &{}\end{array}$$
(6.4.11)

(The terms a 0 and a 1 h correspond to the double root z = 1 of the equation ϕ(z)(1 − z)(1 − z^12) = 0, and the subsequent terms to each of the other roots, which we assume to be distinct.) For q − p − 13 < h ≤ 0, g(h) = X n+h , and for 1 ≤ h ≤ q, the values of g(h) = P n X n+h can be determined recursively from the equations

$$\displaystyle{P_{n}X_{n+h} =\mu +P_{n}X_{n+h-1} + P_{n}X_{n+h-12} - P_{n}X_{n+h-13} + P_{n}Y _{n+h},}$$

where {Y t } is the ARMA process \(Y _{t} = (1 - B){\bigl (1 - B^{12}\bigr )}X_{t}-\mu\). Substituting these values of g(h) into (6.4.11), we obtain a set of p + 13 equations for the coefficients a i , b j , and c k . Solving these equations then completes the determination of g(h).

The large-sample approximation to the mean squared error is again given by (6.4.6), with ψ j redefined as the coefficient of z j in the power series expansion of \(\theta (z)/\big[(1 - z){\bigl (1 - z^{12}\bigr )}\phi (z)\big]\).

Example 6.4.1

An ARIMA(1,1,0) Model

In Example 5.2.4 we found the maximum likelihood AR(1) model for the mean-corrected differences X t of the Dow Jones Utilities Index (August 28–December 18, 1972). The model was

$$\displaystyle{ X_{t} - 0.4471X_{t-1} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,0.1455), }$$
(6.4.12)

where X t  = D t − D t−1 − 0.1336, t = 1, …, 77, and {D t , t = 0, 1, 2, …, 77} is the original series. The model for {D t } is thus

$$\displaystyle{(1 - 0.4471B)[(1 - B)D_{t} - 0.1336] = Z_{t},\ \ \{Z_{t}\} \sim \mathrm{WN}(0,0.1455).}$$

The recursions for g(h) therefore take the form

$$\displaystyle{ (1 - 0.4471B)(1 - B)g(h) = 0.5529 \times 0.1336 = 0.07387,\ \ h > 0. }$$
(6.4.13)

A particular solution of these equations is g(h) = 0. 1336h, so the general solution is

$$\displaystyle{ g(h) = 0.1336h + a + b(0.4471)^{h},\ \ h > -2. }$$
(6.4.14)

Substituting g(−1) = D 76 = 122 and g(0) = D 77 = 121.23 in the equations with h = −1 and h = 0, and solving for a and b gives

$$\displaystyle{g(h) = 0.1336h + 120.50 + 0.7331(0.4471)^{h}.}$$

Setting h = 1 and h = 2 gives

$$\displaystyle{P_{77}D_{78} = 120.97\ \ \mathrm{and}\ \ P_{77}D_{79} = 120.94.}$$

From (6.4.5) we find that the corresponding mean squared errors are

$$\displaystyle{\sigma _{77}^{2}(1) = v_{ 77} =\sigma ^{2} = 0.1455}$$

and

$$\displaystyle{\sigma _{77}^{2}(2) = v_{ 78} + \phi _{1}^{{\ast}}{}^{2}v_{ 77} =\sigma ^{2}\left (1 + 1.4471^{2}\right ) = 0.4502.}$$

(Notice that the approximation (6.4.6) is exact in this case.) The predictors and their mean squared errors are easily obtained from the program ITSM by opening the file DOWJ.TSM, differencing at lag 1, fitting a preliminary AR(1) model to the mean-corrected data with Burg’s algorithm, and selecting Model>Estimation>Max likelihood to find the maximum likelihood AR(1) model. Predicted values and their mean squared errors are then found using the option Forecasting>ARMA.
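
The ψ-weights and mean squared errors quoted above can be reproduced from (6.4.6) by expanding 1/ϕ*(z) with ϕ*(z) = (1 − 0.4471z)(1 − z). A short numpy sketch (an arithmetic check only, not an ITSM computation):

```python
import numpy as np

phi, sigma2 = 0.4471, 0.1455
phi_star = np.array([1.0, -(1 + phi), phi])   # phi*(z) = 1 - (1+phi)z + phi z^2

# psi-weights of 1/phi*(z) from the recursion sum_k phi*_k psi_{j-k} = 0, j >= 1
h_max = 5
psi = np.zeros(h_max)
psi[0] = 1.0
for j in range(1, h_max):
    psi[j] = -sum(phi_star[k] * psi[j - k] for k in range(1, min(j, 2) + 1))

mse = sigma2 * np.cumsum(psi**2)
print(np.round(mse[:2], 4))                   # 0.1455 and 0.4502, as quoted above
```
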

6.5 Seasonal ARIMA Models

We have already seen how differencing the series {X t } at lag s is a convenient way of eliminating a seasonal component of period s. If we fit an ARMA( p, q) model ϕ(B)Y t  = θ(B)Z t to the differenced series Y t  = (1 − B^s)X t , then the model for the original series is \(\phi (B)\left (1 - B^{s}\right )X_{t} =\theta (B)Z_{t}\). This is a special case of the general seasonal ARIMA (SARIMA) model defined as follows.

Definition 6.5.1

If d and D are nonnegative integers, then {X t } is a seasonal ARIMA \((\,\mathbf{\mathit{p,\,d,\,q}})\boldsymbol{ \times } (\mathbf{\mathit{P,\,D,\,Q}})_{s}\) process with period s if the differenced series Y t  = (1 − B)^d(1 − B^s)^D X t is a causal ARMA process defined by

$$\displaystyle{ \phi (B)\varPhi \left (B^{s}\right )Y _{ t} =\theta (B)\varTheta \left (B^{s}\right )Z_{ t},\quad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ), }$$
(6.5.1)

where ϕ(z) = 1 −ϕ 1 z −⋯ −ϕ p z^p, Φ(z) = 1 −Φ 1 z −⋯ −Φ P z^P, \(\theta (z) = 1 +\theta _{1}z + \cdots +\theta _{q}z^{q}\), and Θ(z) = 1 +Θ 1 z + ⋯ +Θ Q z^Q.

Remark 1.

Note that the process {Y t } is causal if and only if ϕ(z) ≠ 0 and Φ(z) ≠ 0 for | z | ≤ 1. In applications D is rarely more than one, and P and Q are typically less than three. □ 

Remark 2.

Equation (6.5.1) satisfied by the differenced process {Y t } can be rewritten in the equivalent form

$$\displaystyle{ \phi ^{{\ast}}(B)Y _{ t} =\theta ^{{\ast}}(B)Z_{ t}, }$$
(6.5.2)

where ϕ*(⋅ ), θ*(⋅ ) are polynomials of degree p + sP and q + sQ, respectively, whose coefficients can all be expressed in terms of ϕ 1, …, ϕ p , Φ 1, …, Φ P , θ 1, …, θ q , and Θ 1, …, Θ Q . Provided that p < s and q < s, the constraints on the coefficients of ϕ*(⋅ ) and θ*(⋅ ) can all be expressed as multiplicative relations

$$\displaystyle{\phi _{is+j}^{{\ast}} =\phi _{ is}^{{\ast}}\phi _{ j}^{{\ast}},\quad i = 1,2,\ldots;\;\;j = 1,\ldots,s - 1,}$$

and

$$\displaystyle{\theta _{is+j}^{{\ast}} =\theta _{ is}^{{\ast}}\theta _{ j}^{{\ast}},\quad i = 1,2,\ldots;\;\;j = 1,\ldots,s - 1.}$$

In Section 1.5 we discussed the classical decomposition model incorporating trend, seasonality, and random noise, namely, X t  = m t + s t + Y t . In modeling real data it might not be reasonable to assume, as in the classical decomposition model, that the seasonal component s t repeats itself precisely in the same way cycle after cycle. Seasonal ARIMA models allow for randomness in the seasonal pattern from one cycle to the next. □ 
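
The multiplicative relations of Remark 2 can be verified by expanding the operator products directly. A small numpy sketch (the coefficient values are illustrative only) computes the coefficients of ϕ(z)Φ(z^s) and θ(z)Θ(z^s):

```python
import numpy as np

def expand(short_poly, seasonal_poly, s):
    """Coefficients of short_poly(z) * seasonal_poly(z^s); both arguments are
    coefficient arrays in increasing powers of z, constant term first."""
    seasonal_full = np.zeros(s * (len(seasonal_poly) - 1) + 1)
    seasonal_full[::s] = seasonal_poly        # place seasonal coefficients at lags 0, s, 2s, ...
    return np.convolve(short_poly, seasonal_full)

# Illustrative operators with s = 12:
phi, Phi = np.array([1.0, -0.5]), np.array([1.0, -0.7])      # 1 - 0.5z and 1 - 0.7z
theta, Theta = np.array([1.0, 0.4]), np.array([1.0, 0.6])    # 1 + 0.4z and 1 + 0.6z

print(expand(phi, Phi, 12))       # nonzero at lags 0, 1, 12, 13; lag-13 entry = (-0.5)*(-0.7)
print(expand(theta, Theta, 12))   # nonzero at lags 0, 1, 12, 13; lag-13 entry = 0.4*0.6
```
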

Example 6.5.1

Suppose we have r years of monthly data, which we tabulate as follows:

Table 1 Monthly series arranged by year (rows) and month (columns):

Year/Month   Jan            Feb            ⋯   Dec
1            Y 1            Y 2            ⋯   Y 12
2            Y 13           Y 14           ⋯   Y 24
⋮            ⋮              ⋮                  ⋮
r            Y 1+12(r−1)    Y 2+12(r−1)    ⋯   Y 12r

Each column in this table may itself be viewed as a realization of a time series. Suppose that each one of these twelve time series is generated by the same ARMA(P, Q) model, or more specifically, that the series corresponding to the jth month, Y j+12t , t = 0, …, r − 1, satisfies a difference equation of the form

$$\displaystyle\begin{array}{rcl} Y _{j+12t}& =& \varPhi _{1}Y _{j+12(t-1)} + \cdots +\varPhi _{P}Y _{j+12(t-P)} + U_{j+12t} \\ & & +\,\varTheta _{1}U_{j+12(t-1)} + \cdots +\varTheta _{Q}U_{j+12(t-Q)}, {}\end{array}$$
(6.5.3)

where

$$\displaystyle{ \{U_{j+12t},t =\ldots,-1,0,1,\ldots \}\sim \mathrm{WN}\left (0,\sigma _{U}^{2}\right ). }$$
(6.5.4)

Then since the same ARMA(P, Q) model is assumed to apply to each month, (6.5.3) holds for each j = 1, …, 12. (Notice, however, that E(U t U t+h ) is not necessarily zero except when h is an integer multiple of 12.) We can thus write (6.5.3) in the compact form

$$\displaystyle{ \varPhi \left (B^{12}\right )Y _{ t} =\varTheta \left (B^{12}\right )U_{ t}, }$$
(6.5.5)

where Φ(z) = 1 −Φ 1 z −⋯ −Φ P z P, Θ(z) = 1 +Θ 1 z + ⋯ +Θ Q z Q, and \(\{U_{j+12t},t =\ldots,-1,0,1,\ldots \}\sim \mathrm{WN}\left (0,\sigma _{U}^{2}\right )\) for each j. We refer to the model (6.5.5) as the between-year model.

Example 6.5.2

​​Suppose P = 0, Q = 1, and Θ 1 = −0. 4 in (6.5.5). Then the series for any particular month is a moving-average of order 1. If E(U t U t+h ) = 0 for all h, i.e., if the white noise sequences for different months are uncorrelated with each other, then the columns themselves are uncorrelated. The correlation function for such a process is shown in Figure 6.15.

Fig. 6.15 The ACF of the model X t  = U t − 0.4U t−12 of Example 6.5.2

Example 6.5.3

​​Suppose P = 1, Q = 0, and Φ 1 = 0. 7 in (6.5.5). In this case the 12 series (one for each month) are AR(1) processes that are uncorrelated if the white noise sequences for different months are uncorrelated. A graph of the autocorrelation function of this process is shown in Figure 6.16.

Fig. 6.16 The ACF of the model X t − 0.7X t−12 = U t of Example 6.5.3

In each of the Examples 6.5.1–6.5.3, the 12 series corresponding to the different months are uncorrelated. To incorporate dependence between these series we allow the process {U t } in (6.5.5) to follow an ARMA( p, q) model,

$$\displaystyle{ \phi (B)U_{t} =\theta (B)Z_{t},\qquad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ). }$$
(6.5.6)

This assumption implies possible nonzero correlation not only between consecutive values of U t , but also within the 12 sequences {U j+12t , t = …, −1, 0, 1, …}, each of which was assumed to be uncorrelated in the preceding examples. In this case (6.5.4) may no longer hold; however, the coefficients in (6.5.6) will frequently have values such that E(U t U t+12j ) is small for j = ±1, ±2, …. Combining the two models (6.5.5) and (6.5.6) and allowing for possible differencing leads directly to Definition 6.5.1 of the general SARIMA model as given above.

The first steps in identifying SARIMA models for a (possibly transformed) data set are to find d and D so as to make the differenced observations

$$\displaystyle{Y _{t} = (1 - B)^{d}\left (1 - B^{s}\right )^{D}X_{ t}}$$

stationary in appearance (see Sections 6.1–6.3). Next we examine the sample ACF and PACF of {Y t } at lags that are multiples of s for an indication of the orders P and Q in the model (6.5.5). If \(\hat{\rho }(\cdot )\) is the sample ACF of {Y t }, then P and Q should be chosen such that \(\hat{\rho }(ks)\), k = 1, 2, …, is compatible with the ACF of an ARMA( P, Q) process. The orders p and q are then selected by trying to match \(\hat{\rho }(1),\ldots,\hat{\rho }(s - 1)\) with the ACF of an ARMA( p, q) process. Ultimately, the AICC criterion (Section 5.5) and the goodness of fit tests (Section 5.3) are used to select the best SARIMA model from competing alternatives.

For given values of p, d, q, P, D, and Q, the parameters ϕ, θ, Φ, Θ, and σ 2 can be found using the maximum likelihood procedure of Section 5.2. The differences \(Y _{t} = (1 - B)^{d}{\bigl (1 - B^{s}\bigr )}^{D}X_{t}\) constitute an ARMA( p + sP, q + sQ) process in which some of the coefficients are zero and the rest are functions of the ( p + P + q + Q)-dimensional vector β′ = (ϕ′, Φ′, θ′, Θ′). For any fixed β the reduced likelihood ℓ(β) of the differences Y t+d+sD , …, Y n is easily computed as described in Section 5.2. The maximum likelihood estimator of β is the value that minimizes ℓ(β), and the maximum likelihood estimate of σ 2 is given by (5.2.10). The estimates can be found using the program ITSM by specifying the required multiplicative relationships among the coefficients as given in Remark 2 above.

A more direct approach to modeling the differenced series {Y t } is simply to fit a subset ARMA model of the form (6.5.2) without making use of the multiplicative form of ϕ (⋅ ) and θ (⋅ ) in (6.5.1).

Example 6.5.4

Monthly Accidental Deaths

In Figure 1.27 we showed the series \(\big\{Y _{t} ={\bigl ( 1 - B^{12}\bigr )}(1 - B)X_{t}\big\}\) obtained by differencing the accidental deaths series {X t } once at lag 12 and once at lag 1. The sample ACF of {Y t } is shown in Figure 6.17.

Fig. 6.17
figure 17

The sample ACF of the differenced accidental deaths {∇∇12 X t }

The values \(\hat{\rho }(12) = -0.333\), \(\hat{\rho }(24) = -0.099\), and \(\hat{\rho }(36) = 0.013\) suggest a moving-average of order 1 for the between-year model (i.e., P = 0 and Q = 1). Moreover, inspection of \(\hat{\rho }(1),\ldots,\hat{\rho }(11)\) suggests that ρ(1) is the only short-term correlation different from zero, so we also choose a moving-average of order 1 for the between-month model (i.e., p = 0 and q = 1). Taking into account the sample mean (28.831) of the differences {Y t }, we therefore arrive at the model

$$\displaystyle{ Y _{t} = 28.831 + (1 +\theta _{1}B)(1 +\varTheta _{1}B^{12})Z_{ t},\quad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ), }$$
(6.5.7)

for the series {Y t }. The maximum likelihood estimates of the parameters are obtained from ITSM by opening the file DEATHS.TSM and proceeding as follows. After differencing (at lags 1 and 12) and then mean-correcting the data, choose the option Model>Specify. In the dialog box enter an MA(13) model with \(\theta _{1} = -0.3\), \(\theta _{12} = -0.3\), \(\theta _{13} = 0.09\), and all other coefficients zero. (This corresponds to the initial guess \(Y _{t} = (1 - 0.3B){\bigl (1 - 0.3B^{12}\bigr )}Z_{t}\).) Then choose Model>Estimation>Max likelihood and click on the button Constrain optimization. Specify the number of multiplicative relations (one in this case) in the box provided and define the relationship by entering 1, 12, 13 to indicate that \(\theta _{1} \times \theta _{12} =\theta _{13}\). Click OK to return to the Maximum Likelihood dialog box. Click OK again to obtain the parameter estimates

$$\displaystyle{\hat{\theta }_{1} = -0.478,}$$
$$\displaystyle{\hat{\varTheta }_{1} = -0.591,}$$

and

$$\displaystyle{ \hat{\sigma }^{2} = 94,255, }$$

with AICC value 855.53. The corresponding fitted model for {X t } is thus the SARIMA(0, 1, 1) × (0, 1, 1)12 process

$$\displaystyle{ \nabla \nabla _{12}X_{t} = 28.831 + (1 - 0.478B)\left (1 - 0.591B^{12}\right )Z_{ t}, }$$
(6.5.8)

where \(\{Z_{t}\} \sim \mathrm{WN}(0,94390)\).
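A rough non-ITSM counterpart of this fit can be obtained with the statsmodels package, whose SARIMAX class imposes the multiplicative seasonal structure automatically. The plain-text load below is a placeholder for however the accidental deaths series is stored, and since the sketch omits the constant 28.831 for the differenced series, its estimates will only approximate those reported above.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Placeholder load of the monthly accidental deaths series (t = 1, ..., 72)
deaths = np.loadtxt("DEATHS.TSM")

# SARIMA(0,1,1)x(0,1,1)_12; the constraint theta_13 = theta_1 * Theta_1 is
# built into the seasonal specification, so no explicit constraint is needed.
model = SARIMAX(deaths, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12), trend="n")
result = model.fit(disp=False)

print(result.params)        # MA(1), seasonal MA(1), and white noise variance estimates
print(result.aic)           # information criterion for comparison with other fits
```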

If we adopt the alternative approach of fitting a subset ARMA model to {Y t } without seeking a multiplicative structure for the operators ϕ (B) and θ (B) in (6.5.2), we begin by fitting a preliminary MA(13) model (as suggested by Figure 6.17) to the series {Y t }. We then fit a maximum likelihood MA(13) model and examine the standard errors of the coefficient estimators. This suggests setting the coefficients at lags 2, 3, 8, 10, and 11 equal to zero, since these are all less than one standard error from zero. To do this select Model>Estimation>Max likelihood and click on the button Constrain optimization. Then highlight the coefficients to be set to zero and click on the button Set to zero. Click OK to return to the Maximum Likelihood Estimation dialog box and again to carry out the constrained optimization. The coefficients that have been set to zero will be held at that value, and the optimization will be with respect to the remaining coefficients. This gives a model with substantially smaller AICC than the unconstrained MA(13) model. Examining the standard errors again we see that the coefficients at lags 4, 5, and 7 are promising candidates to be set to zero, since each of them is less than one standard error from zero. Setting these coefficients to zero in the same way and reoptimizing gives a further reduction in AICC. Setting the coefficient at lag 9 to zero and reoptimizing again gives a further reduction in AICC (to 855.61) and the fitted model

$$\displaystyle\begin{array}{rcl} & & \nabla \nabla _{12}X_{t} = 28.831 + Z_{t} - 0.596Z_{t-1} - 0.407Z_{t-6} - 0.685Z_{t-12} + 0.460Z_{t-13}, \\ & & \quad \quad \quad \quad \quad \{Z_{t}\} \sim \mathrm{WN}(0,71240). {}\end{array}$$
(6.5.9)

The AICC value 855.61 is quite close to the value 855.53 for the model (6.5.8). The residuals from the two models are also very similar, the randomness tests (with the exception of the difference-sign test) yielding high p-values for both.
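If the statsmodels route is preferred for the subset model as well, selected coefficients can be held at zero with the fit_constrained method available on SARIMAX models; the parameter names and the placeholder file below are assumptions, and the resulting estimates need not coincide exactly with the ITSM output.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Placeholder for the differenced, mean-corrected accidental deaths series {Y_t}
y = np.loadtxt("deaths_differenced.txt")

# Unconstrained MA(13) fit; compare estimates with their standard errors
ma13 = SARIMAX(y, order=(0, 0, 13), trend="n").fit(disp=False)
print(np.round(ma13.params / ma13.bse, 2))      # rough t-ratios

# Hold the MA coefficients at lags 2, 3, 8, 10, and 11 at zero and refit
constraints = {f"ma.L{k}": 0.0 for k in (2, 3, 8, 10, 11)}
subset = SARIMAX(y, order=(0, 0, 13), trend="n").fit_constrained(constraints)
print(subset.params)                            # compare with the unconstrained fit
```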

6.5.1 Forecasting SARIMA Processes

Forecasting SARIMA processes is completely analogous to the forecasting of ARIMA processes discussed in Section 6.4. Expanding out the operator \((1 - B)^{d}{\bigl (1 - B^{s}\bigr )}^{D}\) in powers of B, rearranging the equation

$$\displaystyle{(1 - B)^{d}\left (1 - B^{s}\right )^{D}X_{ t} = Y _{t},}$$

and setting t = n + h gives the analogue

$$\displaystyle{ X_{n+h} = Y _{n+h} +\sum _{ j=1}^{d+Ds}a_{ j}X_{n+h-j} }$$
(6.5.10)

of equation (6.4.2). Under the assumption that the first d + Ds observations \(X_{1-d-Ds},\ldots,X_{0}\) are uncorrelated with {Y t , t ≥ 1}, we can determine the best linear predictors P n X n+h of X n+h based on \(\{1,X_{1-d-Ds},\ldots,X_{n}\}\) by applying P n to each side of (6.5.10) to obtain

$$\displaystyle{ P_{n}X_{n+h} = P_{n}Y _{n+h} +\sum _{ j=1}^{d+Ds}a_{ j}P_{n}X_{n+h-j}. }$$
(6.5.11)

The first term on the right is just the best linear predictor of the (possibly nonzero-mean) ARMA process {Y t } in terms of {1, Y 1, , Y n }, which can be calculated as described in Section 3.3. The predictors P n X n+h can then be computed recursively for h = 1, 2, … from (6.5.11), if we note that P n X n+1−j  = X n+1−j for each j ≥ 1.
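For readers who want to see the recursion (6.5.10)–(6.5.11) spelled out, here is a plain numpy sketch; it assumes the ARMA forecasts \(P_{n}Y_{n+h}\) of the differenced series have already been computed by the methods of Section 3.3 and are supplied as an array.

```python
import numpy as np

def sarima_forecast(x, y_forecasts, d, D, s):
    """Recursive forecasts P_n X_{n+h}, h = 1, 2, ..., from (6.5.11).

    x           : observed series X_1, ..., X_n (n > d + D*s assumed)
    y_forecasts : P_n Y_{n+1}, ..., P_n Y_{n+H} for the differenced series
    """
    # Expand (1 - z)^d (1 - z^s)^D = 1 - a_1 z - ... - a_{d+Ds} z^{d+Ds}
    poly = np.array([1.0])
    for _ in range(d):
        poly = np.convolve(poly, np.r_[1.0, -1.0])
    seas = np.zeros(s + 1)
    seas[0], seas[s] = 1.0, -1.0
    for _ in range(D):
        poly = np.convolve(poly, seas)
    a = -poly[1:]                                   # a_1, ..., a_{d+Ds}

    x_ext = list(x)                                 # observed values, then forecasts
    for h, y_hat in enumerate(y_forecasts, start=1):
        # P_n X_{n+h} = P_n Y_{n+h} + sum_j a_j P_n X_{n+h-j}; values with index
        # at most n are the observations themselves, later ones are prior forecasts
        x_hat = y_hat + sum(a[j - 1] * x_ext[len(x) + h - 1 - j]
                            for j in range(1, len(a) + 1))
        x_ext.append(x_hat)
    return x_ext[len(x):]

# Example (d = D = 1, s = 12): with all P_n Y_{n+h} equal to a constant c, the
# forecasts satisfy X_{n+h} = c + X_{n+h-1} + X_{n+h-12} - X_{n+h-13}.
```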

An argument analogous to the one leading to (6.4.5) gives the prediction mean squared error as

$$\displaystyle{ \sigma _{n}^{2}(h) = E(X_{ n+h} - P_{n}X_{n+h})^{2} =\sum _{ j=0}^{h-1}\left (\sum _{ r=0}^{j}\chi _{ r}\theta _{n+h-r-1,j-r}\right )^{2}v_{ n+h-j-1}, }$$
(6.5.12)

where θ nj and v n are obtained by applying the innovations algorithm to the differenced series {Y t } and

$$\displaystyle{\chi (z) =\sum _{ r=0}^{\infty }\chi _{ r}z^{\,r} = \left [\phi (z)\varPhi {\bigl (z^{\,s}\bigr )}(1 - z)^{d}{\bigl (1 - z^{\,s}\bigr )}^{D}\right ]^{-1},\ \ \vert z\vert < 1.}$$

For large n we can approximate (6.5.12), if \(\theta (z)\varTheta \left (z^{\,s}\right )\) is nonzero for all \(\vert z\vert \leq 1\), by

$$\displaystyle{ \sigma _{n}^{2}(h) =\sum _{ j=0}^{h-1}\psi _{ j}^{2}\sigma ^{2}, }$$
(6.5.13)

where

$$\displaystyle{\psi (z) =\sum _{ j=0}^{\infty }\psi _{ j}\,z^{\,j} ={ \theta (z)\varTheta \left (z^{\,s}\right ) \over \phi (z)\varPhi \left (z^{\,s}\right )(1 - z)^{d}\left (1 - z^{\,s}\right )^{D}},\ \ \vert z\vert < 1.}$$

The required calculations can all be carried out with the aid of the program ITSM. The mean squared errors are computed from the large-sample approximation (6.5.13) if the fitted model is invertible. If the fitted model is not invertible, ITSM computes the mean squared errors by converting the model to the equivalent (in terms of Gaussian likelihood) invertible model and then using (6.5.13).
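To make the approximation (6.5.13) concrete, the following numpy sketch computes the ψ-weights by power-series division and evaluates the large-sample prediction variances for the fitted model (6.5.8); it is only an illustration of the formula, not of the ITSM implementation.

```python
import numpy as np

def psi_weights(phi, Phi, theta, Theta, d, D, s, n_weights):
    """First n_weights coefficients of psi(z) for a SARIMA model."""
    def poly(coeffs, sign, lag):            # 1 + sign*c1*z^lag + sign*c2*z^(2*lag) + ...
        p = np.zeros(lag * len(coeffs) + 1)
        p[0] = 1.0
        for i, c in enumerate(coeffs, start=1):
            p[i * lag] = sign * c
        return p

    num = np.convolve(poly(theta, +1, 1), poly(Theta, +1, s))
    den = np.convolve(poly(phi, -1, 1), poly(Phi, -1, s))
    for _ in range(d):
        den = np.convolve(den, [1.0, -1.0])
    diff_s = np.zeros(s + 1)
    diff_s[0], diff_s[s] = 1.0, -1.0
    for _ in range(D):
        den = np.convolve(den, diff_s)

    # Power-series division: psi_j = num_j - sum_{k=1}^{j} den_k * psi_{j-k}
    num = np.r_[num, np.zeros(n_weights)]
    den = np.r_[den, np.zeros(n_weights)]
    psi = np.zeros(n_weights)
    for j in range(n_weights):
        psi[j] = num[j] - sum(den[k] * psi[j - k] for k in range(1, j + 1))
    return psi

# Model (6.5.8): SARIMA(0,1,1)x(0,1,1)_12 with theta_1 = -0.478, Theta_1 = -0.591
psi = psi_weights(phi=[], Phi=[], theta=[-0.478], Theta=[-0.591],
                  d=1, D=1, s=12, n_weights=6)
sigma2 = 94390.0
mse_h = sigma2 * np.cumsum(psi ** 2)        # sigma_n^2(h), h = 1, ..., 6, via (6.5.13)
print(np.sqrt(mse_h))                       # approximate prediction standard deviations
```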

Example 6.5.5

Monthly Accidental Deaths

Continuing with Example 6.5.4, we next use ITSM to predict six future values of the Accidental Deaths series using the fitted models (6.5.8) and (6.5.9). First fit the desired model as described in Example 6.5.4 or enter the data and model directly by opening the file DEATHS.TSM, differencing at lags 12 and 1, subtracting the mean, and then entering the MA(13) coefficients and white noise variance using the option Model>Specify. Select Forecasting>ARMA, and you will see the ARMA Forecast dialog box. Enter 6 for the number of predicted values required. You will notice that the default options in the dialog box are set to generate predictors of the original series by reversing the transformations applied to the data. If for some reason you wish to predict the transformed data, these check marks can be removed. If you wish to include prediction bounds in the graph of the predictors, check the appropriate box and specify the desired coefficient (e.g., 95 %). Click OK, and you will see a graph of the data with the six predicted values appended. For numerical values of the predictors and prediction bounds, right-click on the graph and then on Info. The prediction bounds are computed under the assumption that the white noise sequence in the ARMA model for the transformed data is Gaussian. Table 6.1 shows the predictors and standard deviations of the prediction errors under both models (6.5.8) and (6.5.9) for the Accidental Deaths series.

Table 6.1 Predicted values of the Accidental Deaths series for t = 73, …, 78, the standard deviations σ t of the prediction errors, and the corresponding observed values of X t for the same period
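For comparison, six forecasts and prediction bounds of the same kind can be generated outside ITSM with statsmodels; the file name is again a placeholder, and because the constant for the differenced series is omitted the numbers will only be close to, not identical with, those in Table 6.1.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

deaths = np.loadtxt("DEATHS.TSM")    # hypothetical plain-text copy of the series
result = SARIMAX(deaths, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12),
                 trend="n").fit(disp=False)

forecast = result.get_forecast(steps=6)      # X_73, ..., X_78
print(forecast.predicted_mean)               # point predictors of the original series
print(forecast.se_mean)                      # prediction-error standard deviations
print(forecast.conf_int(alpha=0.05))         # 95% bounds under Gaussian white noise
```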

6.6 Regression with ARMA Errors

6.6.1 OLS and GLS Estimation

In standard linear regression, the errors (or deviations of the observations from the regression function) are assumed to be independent and identically distributed. In many applications of regression analysis, however, this assumption is clearly violated, as can be seen by examination of the residuals from the fitted regression and their sample autocorrelations. It is often more appropriate to assume that the errors are observations of a zero-mean second-order stationary process. Since many autocorrelation functions can be well approximated by the autocorrelation function of a suitably chosen ARMA(p, q) process, it is of particular interest to consider the model

$$\displaystyle{ Y _{t} = \mathbf{x}_{t}'\beta + W_{t},\ \ t = 1,\ldots,n, }$$
(6.6.1)

or in matrix notation,

$$\displaystyle{ \mathbf{Y} = X\beta + \mathbf{W}, }$$
(6.6.2)

where Y = (Y 1, , Y n )′ is the vector of observations at times t = 1, , n, X is the design matrix whose tth row, x t ′ = (x t1, , x tk ), consists of the values of the explanatory variables at time t, β = (β 1, , β k )′ is the vector of regression coefficients, and the components of W = (W 1, , W n )′ are values of a causal zero-mean ARMA( p, q) process satisfying

$$\displaystyle{ \phi (B)W_{t} =\theta (B)Z_{t},\ \ \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ). }$$
(6.6.3)

The model (6.6.1) arises naturally in trend estimation for time series data. For example, the explanatory variables x t1 = 1, x t2 = t, and x t3 = t 2 can be used to estimate a quadratic trend, and the variables x t1 = 1, x t2 = cos(ω t), and x t3 = sin(ω t) can be used to estimate a sinusoidal trend with frequency ω. The columns of X are not necessarily simple functions of t as in these two examples. Any specified column of relevant variables, e.g., temperatures at times t = 1, , n, can be included in the design matrix X, in which case the regression is conditional on the observed values of the variables included in the matrix.
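As a small illustration, the two design matrices just described might be set up as follows (plain numpy; the sample size and frequency are arbitrary choices).

```python
import numpy as np

n = 120
t = np.arange(1, n + 1)
omega = 2 * np.pi / 12                        # e.g., an annual cycle in monthly data

# Quadratic trend: columns 1, t, t^2
X_quad = np.column_stack([np.ones(n), t, t ** 2])

# Sinusoid with frequency omega: columns 1, cos(omega*t), sin(omega*t)
X_harm = np.column_stack([np.ones(n), np.cos(omega * t), np.sin(omega * t)])

print(X_quad.shape, X_harm.shape)             # (120, 3) (120, 3)
```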

The ordinary least squares (OLS) estimator of β is the value, \(\hat{\beta }_{\mathrm{OLS}}\), which minimizes the sum of squares

$$\displaystyle{ (\mathbf{Y} - X\beta )'(\mathbf{Y} - X\beta ) =\sum _{ t=1}^{n}\left (Y _{ t} -\mathbf{x}_{t}'\beta \right )^{2}. }$$

Equating to zero the partial derivatives with respect to each component of β and assuming (as we shall) that X′X is nonsingular, we find that

$$\displaystyle{ \hat{\beta }_{\mathrm{OLS}} = (X'X)^{-1}X'\mathbf{Y}. }$$
(6.6.4)

(If X′X is singular, \(\hat{\beta }_{\mathrm{OLS}}\) is not uniquely determined but still satisfies (6.6.4) with \((X'X)^{-1}\) any generalized inverse of X′X.) The OLS estimate also maximizes the likelihood of the observations when the errors W 1, , W n are iid and Gaussian. If the design matrix X is nonrandom, then even when the errors are non-Gaussian and dependent, the OLS estimator is unbiased (i.e., \(E{\bigl (\hat{\beta }_{\mathrm{OLS}}\bigr )} =\beta\)) and its covariance matrix is

$$\displaystyle{ \mathrm{Cov}(\hat{\beta }_{\mathrm{OLS}}) = \left (X'X\right )^{-1}X'\varGamma _{ n}X\left (X'X\right )^{-1}, }$$
(6.6.5)

where \(\varGamma _{n} = E{\bigl (\mathbf{W}\mathbf{W}'\bigr )}\) is the covariance matrix of W.

The generalized least squares (GLS) estimator of β is the value \(\hat{\beta }_{\mathrm{GLS}}\) that minimizes the weighted sum of squares

$$\displaystyle{ \left (\mathbf{Y} - X\beta \right )'\varGamma _{n}^{-1}\left (\mathbf{Y} - X\beta \right ). }$$
(6.6.6)

Differentiating partially with respect to each component of β and setting the derivatives equal to zero, we find that

$$\displaystyle{ \hat{\beta }_{\mathrm{GLS}} = \left (X'\varGamma _{n}^{-1}X\right )^{-1}X'\varGamma _{ n}^{-1}\mathbf{Y}. }$$
(6.6.7)

If the design matrix X is nonrandom, the GLS estimator is unbiased and has covariance matrix

$$\displaystyle{ \mathrm{Cov}\left (\hat{\beta }_{\mathrm{GLS}}\right ) = \left (X'\varGamma _{n}^{-1}X\right )^{-1}. }$$
(6.6.8)

It can be shown that the GLS estimator is the best linear unbiased estimator of β,  i.e., for any k-dimensional vector c and for any unbiased estimator \(\hat{\beta }\) of β that is a linear function of the observations Y 1, , Y n ,

$$\displaystyle{ \mathrm{Var}\left (\mathbf{c}'\hat{\beta }_{\mathrm{GLS}}\right ) \leq \mathrm{Var}\left (\mathbf{c}'\hat{\beta }\right ). }$$

In this sense the GLS estimator is therefore superior to the OLS estimator. However, it can be computed only if ϕ and θ are known.
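The formulas (6.6.4)–(6.6.8) are easy to check numerically when ϕ and θ (and hence Γ n ) are known. The sketch below assumes AR(1) errors with ϕ = 0.7 and a linear-trend design matrix; consistent with the best linear unbiased property, the diagonal entries of the GLS covariance matrix never exceed those of the OLS covariance matrix.

```python
import numpy as np
from scipy.linalg import toeplitz

n, phi, sigma2 = 100, 0.7, 1.0
t = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), t])              # linear-trend design matrix

# AR(1) error covariances: gamma(h) = sigma2 * phi^|h| / (1 - phi^2)
gamma = sigma2 * phi ** np.arange(n) / (1 - phi ** 2)
Gamma = toeplitz(gamma)
Ginv = np.linalg.inv(Gamma)

# One simulated realization of Y = X beta + W (needed only for the point estimates)
rng = np.random.default_rng(1)
beta_true = np.array([10.0, -0.02])
W = np.linalg.cholesky(Gamma) @ rng.standard_normal(n)
Y = X @ beta_true + W

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ Y                                   # (6.6.4)
cov_ols = XtX_inv @ X.T @ Gamma @ X @ XtX_inv                  # (6.6.5)

beta_gls = np.linalg.solve(X.T @ Ginv @ X, X.T @ Ginv @ Y)     # (6.6.7)
cov_gls = np.linalg.inv(X.T @ Ginv @ X)                        # (6.6.8)

print(np.diag(cov_ols), np.diag(cov_gls))    # GLS variances are the smaller ones
```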

Let V (ϕ, θ) denote the matrix \(\sigma ^{-2}\varGamma _{n}\) and let T(ϕ, θ) be any square root of \(V ^{-1}\) (i.e., a matrix such that \(T'T = V ^{-1}\)). Then we can multiply each side of (6.6.2) by T to obtain

$$\displaystyle{ T\mathbf{Y} = TX\beta + T\mathbf{W}, }$$
(6.6.9)

a regression equation with coefficient vector β, data vector T Y, design matrix TX, and error vector T W. Since the latter has uncorrelated, zero-mean components, each with variance σ 2, the best linear estimator of β in terms of T Y (which is clearly the same as the best linear estimator of β in terms of Y, i.e., \(\hat{\beta }_{\mathrm{GLS}}\)) can be obtained by applying OLS estimation to the transformed regression equation (6.6.9). This gives

$$\displaystyle{ \hat{\beta }_{\mathrm{GLS}} = \left (X'T'TX\right )^{-1}X'T'T\mathbf{Y}, }$$
(6.6.10)

which is clearly the same as (6.6.7). Cochrane and Orcutt (1949) pointed out that if {W t } is an AR(p) process satisfying

$$\displaystyle{ \phi (B)W_{t} = Z_{t},\ \ \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ), }$$

then application of ϕ(B) to each side of the regression equations (6.6.1) transforms them into regression equations with uncorrelated, zero-mean, constant-variance errors, so that ordinary least squares can again be used to compute best linear unbiased estimates of the components of β in terms of \(Y _{t}^{{\ast}} =\phi (B)Y _{t}\),  t = p + 1, , n. This approach eliminates the need to compute the matrix T but suffers from the drawback that \(\mathbf{Y}^{{\ast}}\) does not contain all the information in Y. Cochrane and Orcutt’s transformation can be improved, and at the same time generalized to ARMA errors, as follows.

Instead of applying the operator ϕ(B) to each side of the regression equations (6.6.1), we multiply each side of equation (6.6.2) by the matrix T(ϕ, θ) that maps {W t } into the residuals [see (5.3.1)] of {W t } from the ARMA model (6.6.3). We have already seen how to calculate these residuals using the innovations algorithm in Section 3.3. To see that T is a square root of the matrix \(V ^{-1}\) as defined in the previous paragraph, we simply recall that the residuals are uncorrelated with zero mean and variance σ 2, so that

$$\displaystyle{ \mathrm{Cov}(T\mathbf{W}) = T\varGamma _{n}T' =\sigma ^{2}I, }$$

where I is the n × n identity matrix. Hence

$$\displaystyle{ T'T =\sigma ^{2}\varGamma _{ n}^{-1} = V ^{-1}. }$$

GLS estimation of β can therefore be carried out by multiplying each side of (6.6.2) by T and applying ordinary least squares to the transformed regression model. It remains only to compute T Y and TX.

Any data vector d = (d 1, , d n )′ can be left-multiplied by T simply by reading it into ITSM, entering the model (6.6.3), and pressing the green button labeled RES, which plots the residuals. (The calculations are performed using the innovations algorithm as described in Section 3.3.) The GLS estimator \(\hat{\beta }_{\mathrm{GLS}}\) is computed as follows. The data vector Y is left-multiplied by T to generate the transformed data vector \(\mathbf{Y}^{{\ast}}\), and each column of the design matrix X is left-multiplied by T to generate the corresponding column of the transformed design matrix \(X^{{\ast}}\). Then

$$\displaystyle{ \hat{\beta }_{\mathrm{GLS}} = \left (X^{{\ast}'}X^{{\ast}}\right )^{-1}X^{{\ast}'}\mathbf{Y}^{{\ast}}. }$$
(6.6.11)

The calculations of \(\mathbf{Y}^{{\ast}}\), \(X^{{\ast}}\), and hence of \(\hat{\beta }_{\mathrm{GLS}}\), are all carried out by the program ITSM in the option Regression>Estimation>Generalized LS.
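Outside ITSM, the same computation can be imitated with any convenient square root T of \(V ^{-1}\); the sketch below uses a Cholesky factor in place of the innovations-algorithm residual map and returns the same answer as the direct formula (6.6.7). Applied to the Gamma, X, and Y of the AR(1) sketch above, it reproduces the beta_gls computed there.

```python
import numpy as np

def gls_via_transform(X, Y, Gamma, sigma2=1.0):
    """GLS estimate of beta via the transformed regression (6.6.9)-(6.6.11).

    Any T with T'T = V^{-1} = sigma2 * Gamma^{-1} will do; here a Cholesky
    factor of V^{-1} is used as one convenient square root.
    """
    Vinv = sigma2 * np.linalg.inv(Gamma)
    T = np.linalg.cholesky(Vinv).T            # T'T = Vinv
    X_star, Y_star = T @ X, T @ Y             # transformed design matrix and data
    # OLS on the transformed equation, i.e., (6.6.11); identical to (6.6.7)
    return np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)
```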

6.6.2 ML Estimation

If (as is usually the case) the parameters of the ARMA(p, q) model for the errors are unknown, they can be estimated together with the regression coefficients by maximizing the Gaussian likelihood

$$\displaystyle{L\left (\beta,\phi,\theta,\sigma ^{2}\right ) = (2\pi )^{-n/2}(\det \varGamma _{ n})^{-1/2}\exp \left \{-\frac{1} {2}{\bigl (\mathbf{Y} - X\beta \bigr )}'\varGamma _{n}^{-1}{\bigl (\mathbf{Y} - X\beta \bigr )}\right \}, }$$

where \(\varGamma _{n}\left (\phi,\theta,\sigma ^{2}\right )\) is the covariance matrix of W = YX β. Since {W t } is an ARMA(p, q) process with parameters \(\left (\phi,\theta,\sigma ^{2}\right )\), the maximum likelihood estimators \(\hat{\beta }\), \(\hat{\phi }\), and \(\hat{\theta }\) are found (as in Section 5.2) by minimizing

$$\displaystyle{ \ell(\beta,\phi,\theta ) =\ln \left (n^{-1}S(\beta,\phi,\theta )\right ) + n^{-1}\sum _{ t=1}^{n}\ln r_{ t-1}, }$$
(6.6.12)

where

$$\displaystyle{ S(\beta,\phi,\theta ) =\sum _{ t=1}^{n}\left (W_{ t} -\hat{ W}_{t}\right )^{2}/r_{ t-1}, }$$

\(\hat{W}_{t}\) is the best one-step predictor of W t , and r t−1 σ 2 is its mean squared error. The function \(\ell(\beta,\phi,\theta )\) can be expressed in terms of the observations {Y t } and the parameters β, ϕ, and θ using the innovations algorithm (see Section 3.3) and minimized numerically to give the maximum likelihood estimators, \(\hat{\beta },\ \hat{\phi }\), and \(\hat{\theta }\). The maximum likelihood estimator of σ 2 is then given, as in Section 5.2, by \(\hat{\sigma }^{2} = S\left (\hat{\beta },\hat{\phi },\hat{\theta }\right )/n\).
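In the statsmodels package, this joint maximization of the Gaussian likelihood over β and the ARMA parameters corresponds to passing the design matrix as the exog argument of SARIMAX (or of the ARIMA class). The sketch below anticipates the lake data regression of Example 6.6.2; the file name is a placeholder for however the data are stored.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Observations Y and design matrix X (intercept and linear trend), n = 98
n = 98
t = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), t])
Y = np.loadtxt("LAKE.TSM")           # hypothetical plain-text copy of the lake data

# Regression with AR(2) errors, estimated by maximum Gaussian likelihood
model = SARIMAX(Y, exog=X, order=(2, 0, 0), trend="n")
result = model.fit(disp=False)

print(result.params)                 # regression coefficients, AR parameters, sigma^2
print(result.bse)                    # standard errors from the observed information
```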

An extension of an iterative scheme, proposed by Cochrane and Orcutt (1949) for the case q = 0, simplifies the minimization considerably. It is based on the observation that for fixed ϕ and θ, the value of β that minimizes \(\ell(\beta,\phi,\theta )\) is \(\hat{\beta }_{\mathrm{GLS}}(\phi,\theta )\), which can be computed algebraically from (6.6.11) instead of by searching numerically for the minimizing value. The scheme is as follows; a code sketch of the iteration appears after the list.

  1. (i)

    Compute \(\hat{\beta }_{\mathrm{OLS}}\) and the estimated residuals \(Y _{t} -\mathbf{x}_{t}'\hat{\beta }_{\mathrm{OLS}},\ \ t = 1,\ldots,n\).

  2. (ii)

Fit an ARMA(p, q) model by maximum Gaussian likelihood to the estimated residuals.

  3. (iii)

    For the fitted ARMA model compute the corresponding estimator \(\hat{\beta }_{\mathrm{GLS}}\) from (6.6.11).

  4. (iv)

    Compute the residuals \(Y _{t} -\mathbf{x}_{t}'\hat{\beta }_{\mathrm{GLS}},\ \ t = 1,\ldots,n\), and return to (ii), stopping when the estimators have stabilized.
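A rough Python sketch of steps (i)–(iv), using statsmodels for the ARMA fit in step (ii) and assembling Γ n from the fitted ARMA autocovariances for the GLS step; the stopping rule here is simply a fixed number of iterations, whereas in practice one would monitor the change in the estimates.

```python
import numpy as np
from scipy.linalg import toeplitz
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import arma_acovf

def iterative_gls(Y, X, p, q, n_iter=5):
    """Iterated OLS / ARMA / GLS scheme, steps (i)-(iv) above (a sketch)."""
    beta = np.linalg.solve(X.T @ X, X.T @ Y)                    # (i) OLS
    for _ in range(n_iter):
        resid = Y - X @ beta                                    # current residuals
        fit = ARIMA(resid, order=(p, 0, q), trend="n").fit()    # (ii) ARMA fit
        ar, ma, sigma2 = fit.arparams, fit.maparams, fit.params[-1]
        # (iii) GLS with Gamma_n built from the fitted ARMA autocovariances
        acovf = arma_acovf(np.r_[1, -ar], np.r_[1, ma], nobs=len(Y), sigma2=sigma2)
        Ginv = np.linalg.inv(toeplitz(acovf))
        beta = np.linalg.solve(X.T @ Ginv @ X, X.T @ Ginv @ Y)  # (iv), back to (ii)
    return beta, fit
```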

If {W t } is a causal and invertible ARMA process, then under mild conditions on the explanatory variables x t , the maximum likelihood estimates are asymptotically multivariate normal (see Fuller 1976). In addition, the estimated regression coefficients are asymptotically independent of the estimated ARMA parameters.

The large-sample covariance matrix of the ARMA parameter estimators, suitably normalized, has a complicated form that involves both the regression variables x t and the covariance function of {W t }. It is therefore convenient to estimate the covariance matrix as \(-H^{-1}\), where H is the Hessian matrix of the observed log-likelihood evaluated at its maximum.

The OLS, GLS, and maximum likelihood estimators of the regression coefficients all have the same asymptotic covariance matrix, so in this sense the dependence does not play a major role. However, the asymptotic covariance of both the OLS and GLS estimators can be very inaccurate if the appropriate covariance matrix Γ n is not used in the expressions (6.6.5) and (6.6.8). This point is illustrated in the following examples.

Remark 1.

​​​The use of the innovations algorithm for GLS and ML estimation extends to regression with ARIMA errors (see Example 6.6.3 below) and FARIMA errors (FARIMA processes are defined in Section 10.5). □ 

Example 6.6.1

The Overshort Data

The analysis of the overshort data in Example 3.2.8 suggested the model

$$\displaystyle{ Y _{t} =\beta +W_{t}, }$$

where −β is interpreted as the daily leakage from the underground storage tank and {W t } is the MA(1) process

$$\displaystyle{ W_{t} = Z_{t} +\theta Z_{t-1},\ \ \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ). }$$

(Here k = 1 and x t1 = 1.) The OLS estimate of β is simply the sample mean \(\hat{\beta }_{\mathrm{OLS}} =\bar{ Y }_{n} = -4.035\). Under the assumption that {W t } is iid noise, the estimated variance of the OLS estimator of β is \(\hat{\gamma }_{_{ Y }}(0)/57 = 59.92\). However, since this estimate of the variance fails to take dependence into account, it is not reliable.

To find maximum Gaussian likelihood estimates of β and the parameters of {W t } using ITSM, open the file OSHORTS.TSM, select the option Regression>Specify and check the box marked Include intercept term only. Then press the blue GLS button and you will see the estimated value of β. (This is in fact the same as the OLS estimator since the default model in ITSM is WN(0,1).) Then select Model>Estimation>Autofit and press Start. The autofit option selects the minimum AICC model for the residuals,

$$\displaystyle{ W_{t} = Z_{t} - 0.818Z_{t-1},\ \ \{Z_{t}\} \sim \mathrm{WN}(0,2041), }$$
(6.6.13)

and displays the estimated MA coefficient \(\hat{\theta }_{1}^{(0)} = -0.818\) and the corresponding GLS estimate \(\hat{\beta }_{\mathrm{GLS}}^{(1)} = -4.745\), with a standard error of 1.188, in the Regression estimates window. (If we reestimate the variance of the OLS estimator, using (6.6.5) with Γ 57 computed from the model (6.6.13), we obtain the value 2.214, a drastic reduction from the value 59.92 obtained when dependence is ignored. For a positively correlated time series, ignoring the dependence would lead to underestimation of the variance.)
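The corrected variance quoted in the parenthesis can be checked by hand: for the MA(1) model (6.6.13), expression (6.6.5) with X a column of ones reduces to \([\gamma (0) + 2(1 - 1/n)\gamma (1)]/n\), as in the short calculation below.

```python
# MA(1) model (6.6.13): gamma(0) = sigma^2 (1 + theta^2), gamma(1) = sigma^2 * theta
theta, sigma2, n = -0.818, 2041.0, 57

gamma0 = sigma2 * (1 + theta ** 2)
gamma1 = sigma2 * theta

# Variance of the sample mean of n consecutive observations of {W_t}
var_mean = (gamma0 + 2 * (1 - 1 / n) * gamma1) / n
print(round(var_mean, 3))   # about 2.21, versus 59.92 when the dependence is ignored
```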

Pressing the blue MLE button will reestimate the MA parameters using the residuals from the updated regression and at the same time reestimate the regression coefficient, printing the new parameters in the Regression estimates window. After this operation has been repeated several times, the parameters will stabilize, as shown in Table 6.2. Estimated 95 % confidence bounds for β using the GLS estimate are \(-4.75 \pm 1.96(1.408)^{1/2} = (-7.07,-2.43)\), strongly suggesting that the storage tank has a leak. Such a conclusion would not have been reached without taking into account the dependence in the data.

Table 6.2 Estimates of β and θ 1 for the overshort data of Example 6.6.1

Example 6.6.2

The Lake Data

In Examples 5.2.4 and 5.5.2 we found maximum likelihood ARMA(1,1) and AR(2) models for the mean-corrected lake data. Now let us consider fitting a linear trend to the data with AR(2) noise. The choice of an AR(2) model was suggested by an analysis of the residuals obtained after removing a linear trend from the data using OLS. Our model now takes the form

$$\displaystyle{ Y _{t} =\beta _{0} +\beta _{1}t + W_{t}, }$$

where {W t } is the AR(2) process satisfying

$$\displaystyle{ W_{t} =\phi _{1}W_{t-1} +\phi _{2}W_{t-2} + Z_{t},\ \ \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ). }$$

From Example 1.3.5, we find that the OLS estimate of β is \(\hat{\beta }_{\mathrm{OLS}}=(10.202,-0.0242)'\). If we ignore the correlation structure of the noise, the estimated covariance matrix Γ n of W is \(\hat{\gamma }(0)I\) (where I is the identity matrix). The corresponding estimated covariance matrix of \(\hat{\beta }_{\mathrm{OLS}}\) is (from (6.6.5))

$$\displaystyle{ \hat{\gamma }_{Y }(0)\left (X'X\right )^{-1} =\hat{\gamma } _{ Y }(0)\left [\begin{array}{*{10}c} n & \sum _{t=1}^{n}t \\ \sum _{t=1}^{n}t&\sum _{t=1}^{n}t^{2} \end{array} \right ]^{-1} = \left [\begin{array}{*{10}c} 0.07203 &-0.00110 \\ -0.00110& 0.00002 \end{array} \right ]. }$$
(6.6.14)

However, the estimated model for the noise process, found by fitting an AR(2) model to the residuals \(Y _{t} -\hat{\beta }_{\mathrm{OLS}}'\mathbf{x}_{t}\), is

$$\displaystyle{ W_{t} = 1.008W_{t-1} - 0.295W_{t-2} + Z_{t},\ \ \{Z_{t}\} \sim \mathrm{WN}(0,0.4571). }$$

Assuming that this is the true model for {W t }, the GLS estimate is found to be (10.091, −0.0216)′, in close agreement with the OLS estimate. The estimated covariance matrices for the OLS and GLS estimates are given by

$$\displaystyle{ \mathop{\mathrm{Cov}}\left (\hat{\beta }_{\mathrm{OLS}}\right ) = \left [\begin{array}{*{10}c} 0.22177 &-0.00335\\ -0.00335 & 0.00007\\ \end{array} \right ] }$$

and

$$\displaystyle{ \mathop{\mathrm{Cov}}\left (\hat{\beta }_{\mathrm{GLS}}\right ) = \left [\begin{array}{*{10}c} 0.21392 &-0.00321\\ -0.00321 & 0.00006\\ \end{array} \right ]. }$$

Notice how the estimated variances of the OLS and GLS estimators are nearly three times the magnitude of the corresponding variance estimates of the OLS estimator calculated under the independence assumption [see (6.6.14)]. Estimated 95 % confidence bounds for the slope β 1 using the GLS estimate are \(-0.0216 \pm 1.96(0.00006)^{1/2} = -0.0216 \pm 0.0152\), indicating a significant decreasing trend in the level of Lake Huron during the years 1875–1972.

The iterative procedure described above was used to produce maximum likelihood estimates of the parameters. The calculations using ITSM are analogous to those in Example 6.6.1. The results from each iteration are summarized in Table 6.3. As in Example 6.6.1, the convergence of the estimates is very rapid.

Table 6.3 Estimates of β and ϕ for the lake data after 3 iterations

Example 6.6.3

Seat-Belt Legislation; SBL.TSM

Figure 6.18 shows the numbers of monthly deaths and serious injuries Y t , t = 1, , 120, on UK roads for 10 years beginning in January 1975. They are filed as SBL.TSM. Seat-belt legislation was introduced in February 1983 in the hope of reducing the mean number of monthly “deaths and serious injuries” from t = 99 onwards. In order to study whether or not there was a drop in mean from that time onwards, we consider the regression

$$\displaystyle{ Y _{t} = a + bf(t) + W_{t},\ \ t = 1,\ldots,120, }$$
(6.6.15)

where f(t) = 0 for 1 ≤ t ≤ 98 and f(t) = 1 for t ≥ 99. The seat-belt legislation will be considered effective if the estimated value of the regression coefficient b is significantly negative. This problem also falls under the heading of intervention analysis (see Section 11.2).

Fig. 6.18
figure 18

Monthly deaths and serious injuries {Y t } on UK roads, January 1975–December 1984

OLS regression based on the model (6.6.15) suggests that the error sequence {W t } is highly correlated with a strong seasonal component of period 12. (To do the regression using ITSM open the file SBL.TSM, select Regression>Specify, check only Include intercept term and Include auxiliary variables, press the Browse button, select the file SBLIN.TSM, which contains the function f(t) of (6.6.15), and enter 1 for the number of columns. Then select the option Regression>Estimation>Generalized LS. The estimates of the coefficients a and b are displayed in the Regression estimates window, and the data become the estimates of the residuals {W t }.) The graphs of the data and sample ACF clearly suggest a strong seasonal component with period 12. In order to transform the model (6.6.15) into one with stationary residuals, we therefore consider the differenced data X t  = Y t − Y t−12, which satisfy

$$\displaystyle{ X_{t} = bg_{t} + N_{t},\ t = 13,\ldots,120, }$$
(6.6.16)

where g t  = 1 for 99 ≤ t ≤ 110, g t  = 0 otherwise, and {N t  = W t − W t−12} is a stationary sequence to be represented by a suitably chosen ARMA model. The series {X t } is contained in the file SBLD.TSM, and the function g t is contained in the file SBLDIN.TSM.

The next step is to perform ordinary least squares regression of X t on g t following steps analogous to those of the previous paragraph (but this time checking only the box marked Include auxiliary variables in the Regression Trend Function dialog box) and again using the option Regression>Estimation> Generalized LS or pressing the blue GLS button. The model

$$\displaystyle{ X_{t} = -346.92g_{t} + N_{t}, }$$
(6.6.17)

is then displayed in the Regression estimates window together with the assumed noise model (white noise in this case). Inspection of the sample ACF of the residuals suggests an MA(13) or AR(13) model for {N t }. Fitting AR and MA models of order up to 13 (with no mean-correction) using the option Model>Estimation>Autofit gives an MA(12) model as the minimum AICC fit for the residuals. Once this model has been fitted, the model in the Regression estimates window is automatically updated to

$$\displaystyle{ X_{t} = -328.45g_{t} + N_{t}, }$$
(6.6.18)

with the fitted MA(12) model for the residuals also displayed. After several iterations (each iteration is performed by pressing the MLE button) we arrive at the model

$$\displaystyle{ X_{t} = -328.45g_{t} + N_{t}, }$$
(6.6.19)

with

$$\displaystyle\begin{array}{rcl} \quad N_{t}& =& Z_{t}+0.219Z_{t-1}+0.098Z_{t-2}+0.031Z_{t-3}+0.064Z_{t-4}+0.069Z_{t-5}+0.111Z_{t-6} {}\\ \quad & & +0.081Z_{t-7} + 0.057Z_{t-8}+0.092Z_{t-9} - 0.028Z_{t-10}+0.183Z_{t-11}-0.627Z_{t-12},\quad \quad \quad {}\\ \end{array}$$

where \(\{Z_{t}\} \sim \mathrm{WN}(0,12,581)\). The estimated standard deviation of the regression coefficient estimator is 49.41, so the estimated coefficient, −328.45, is very significantly negative, indicating the effectiveness of the legislation. The differenced data are shown in Figure 6.19 with the fitted regression function.

Fig. 6.19
figure 19

The differenced deaths and serious injuries on UK roads {X t  = Y t − Y t−12}, showing the fitted GLS regression line

Problems

  1. 6.1

    Suppose that {X t } is an ARIMA(p, d, q) process satisfying the difference equations

    $$\displaystyle{ \phi (B)(1 - B)^{d}X_{ t} =\theta (B)Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}\left (0,\sigma ^{2}\right ). }$$

    Show that these difference equations are also satisfied by the process W t  = X t + A 0 + A 1 t + ⋯ + A d−1 t d−1, where A 0, , A d−1 are arbitrary random variables.

  2. 6.2

    Verify the representation given in (6.3.4).

  3. 6.3

    Test the data in Example 6.3.1 for the presence of a unit root in an AR(2) model using the augmented Dickey–Fuller test.

  4. 6.4

    Apply the augmented Dickey–Fuller test to the levels of Lake Huron data (LAKE.TSM). Perform two analyses assuming AR(1) and AR(2) models.

  5. 6.5

If {Y t } is a causal ARMA process (with zero mean) and if X 0 is a random variable with finite second moment such that X 0 is uncorrelated with Y t for each t = 1, 2, , show that the best linear predictor of Y n+1 in terms of 1, X 0, Y 1, , Y n is the same as the best linear predictor of Y n+1 in terms of 1, Y 1, , Y n .

  6. 6.6

    Let {X t } be the ARIMA(2,1,0) process satisfying

    $$\displaystyle{ \left (1 - 0.8B + 0.25B^{2}\right )\nabla X_{ t} = Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,1). }$$
    1. (a)

      Determine the forecast function g(h) = P n X n+h for h > 0.

    2. (b)

      Assuming that n is large, compute σ n 2(h) for h = 1, , 5.

  7. 6.7

    Use a text editor to create a new data set ASHORT.TSM that consists of the data in AIRPASS.TSM with the last 12 values deleted. Use ITSM to find an ARIMA model for the logarithms of the data in ASHORT.TSM. Your analysis should include

    1. (a)

      a logical explanation of the steps taken to find the chosen model,

    2. (b)

      approximate 95 % bounds for the components of ϕ and θ,

    3. (c)

      an examination of the residuals to check for whiteness as described in Section 1.6,

    4. (d)

      a graph of the series ASHORT.TSM showing forecasts of the next 12 values and 95 % prediction bounds for the forecasts,

    5. (e)

      numerical values for the 12-step ahead forecast and the corresponding 95 % prediction bounds,

    6. (f)

a table of the actual forecast errors, i.e., the true value (deleted from AIRPASS.TSM) minus the forecast value, for each of the 12 forecasts.

    Does the last value of AIRPASS.TSM lie within the corresponding 95 % prediction bounds?

  8. 6.8

    Repeat Problem 6.7, but instead of differencing, apply the classical decomposition method to the logarithms of the data in ASHORT.TSM by deseasonalizing, subtracting a quadratic trend, and then finding an appropriate ARMA model for the residuals. Compare the 12 forecast errors found from this approach with those found in Problem 6.7.

  9. 6.9

    Repeat Problem 6.7 for the series BEER.TSM, deleting the last 12 values to create a file named BSHORT.TSM.

  10. 6.10

    Repeat Problem 6.8 for the series BEER.TSM and the shortened series BSHORT.TSM.

  11. 6.11

    A time series {X t } is differenced at lag 12, then at lag 1 to produce a zero-mean series {Y t } with the following sample ACF:

    $$\displaystyle{ \begin{array}{ll} \hat{\rho }(12j) \approx (0.8)^{\,j}, &\quad j = 0,\pm 1,\pm 2,\ldots, \\ \hat{\rho }(12j \pm 1) \approx (0.4)(0.8)^{\,j},&\quad j = 0,\pm 1,\pm 2,\ldots, \\ \hat{\rho }(h) \approx 0, &\quad \mathrm{otherwise},\end{array} }$$

    and \(\hat{\gamma }(0) = 25.\)

    1. (a)

      Suggest a SARIMA model for {X t } specifying all parameters.

    2. (b)

      For large n, express the one- and twelve-step linear predictors P n X n+1 and P n X n+12 in terms of X t , t = −12, −11, , n, and \(Y _{t} -\hat{ Y }_{t},\,t = 1,\ldots,n\).

    3. (c)

      Find the mean squared errors of the predictors in (b).

  12. 6.12

Use ITSM to verify the calculations of Examples 6.6.1–6.6.3.

  13. 6.13

The file TUNDRA.TSM contains the average maximum temperature over the month of February for the years 1895–1993 in an area of the USA whose vegetation is characterized as tundra.

    1. (a)

      Fit a straight line to the data using OLS. Is the slope of the line significantly different from zero?

    2. (b)

Find an appropriate ARMA model for the residuals from the OLS fit in (a).

    3. (c)

      Calculate the MLE estimates of the intercept and the slope of the line and the ARMA parameters in (a). Is the slope of the line significantly different from zero?

    4. (d)

      Use your model to forecast the average maximum temperature for the years 1994–2004.