
Many time series arising in practice are best considered as components of some vector-valued (multivariate) time series {X t } having not only serial dependence within each component series {X ti } but also interdependence between the different component series {X ti } and {X tj }, i ≠ j. Much of the theory of univariate time series extends in a natural way to the multivariate case; however, new problems arise. In this chapter we introduce the basic properties of multivariate series and consider the multivariate extensions of some of the techniques developed earlier. In Section 8.1 we introduce two sets of bivariate time series data for which we develop multivariate models later in the chapter. In Section 8.2 we discuss the basic properties of stationary multivariate time series, namely, the mean vector \(\boldsymbol{\mu } = E\mathbf{X}_{t}\) and the covariance matrices \(\varGamma (h) = E(\mathbf{X}_{t+h}\mathbf{X}_{t}') -\boldsymbol{\mu }\boldsymbol{\mu }',h = 0,\pm 1,\pm 2,\ldots\), with reference to some simple examples, including multivariate white noise. Section 8.3 deals with estimation of \(\boldsymbol{\mu }\) and Γ(⋅ ) and the question of testing for serial independence on the basis of observations of X 1, …, X n . In Section 8.4 we introduce multivariate ARMA processes and illustrate the problem of multivariate model identification with an example of a multivariate AR(1) process that also has an MA(1) representation. (Such examples do not exist in the univariate case.) The identification problem can be avoided by confining attention to multivariate autoregressive (or VAR) models. Forecasting multivariate time series with known second-order properties is discussed in Section 8.5, and in Section 8.6 we consider the modeling and forecasting of multivariate time series using the multivariate Yule–Walker equations and Whittle’s generalization of the Durbin–Levinson algorithm. Section 8.7 contains a brief introduction to the notion of cointegrated time series.

8.1 Examples

In this section we introduce two examples of bivariate time series. A bivariate time series is a series of two-dimensional vectors (X t1,  X t2)′ observed at times t (usually t = 1, 2, 3, …). The two component series \(\{X_{t1}\}\) and \(\{X_{t2}\}\) could be studied independently as univariate time series, each characterized, from a second-order point of view, by its own mean and autocovariance function. Such an approach, however, fails to take into account possible dependence between the two component series, and such cross-dependence may be of great importance, for example in predicting future values of the two component series.

We therefore consider the series of random vectors X t  = (X t1,  X t2)′ and define the mean vector

$$\displaystyle{ \boldsymbol{\mu }_{t}:= E\mathbf{X}_{t} = \left [\begin{array}{*{10}c} EX_{t1} \\ EX_{t2} \end{array} \right ] }$$

and covariance matrices

$$\displaystyle{ \varGamma (t+h,t):=\mathop{ \mathrm{Cov}}{\bigl (\mathbf{X}_{t+h},\mathbf{X}_{t}\bigr )} = \left [\begin{array}{l@{\quad }l} \mathop{\mathrm{cov}}(X_{t+h,1},X_{t1})\quad &\mathop{\mathrm{cov}}(X_{t+h,1},X_{t2}) \\ \mathop{\mathrm{cov}}(X_{t+h,2},X_{t1})\quad &\mathop{\mathrm{cov}}(X_{t+h,2},X_{t2}) \end{array} \right ]. }$$

The bivariate series \(\big\{\mathbf{X}_{t}\big\}\) is said to be (weakly ) stationary if the moments \(\boldsymbol{\mu }_{t}\) and Γ(t + h, t) are both independent of t, in which case we use the notation

$$\displaystyle{ \boldsymbol{\mu } = E\mathbf{X}_{t} = \left [\begin{array}{*{10}c} EX_{t1} \\ EX_{t2} \end{array} \right ] }$$

and

$$\displaystyle{ \varGamma (h) =\mathop{ \mathrm{Cov}}{\bigl (\mathbf{X}_{t+h},\mathbf{X}_{t}\bigr )} = \left [\begin{array}{l@{\quad }l} \gamma _{11}(h)\quad &\gamma _{12}(h) \\ \gamma _{21}(h)\quad &\gamma _{22}(h) \end{array} \right ]. }$$

The diagonal elements are the autocovariance functions of the univariate series \(\{X_{t1}\}\) and \(\{X_{t2}\}\) as defined in Chapter 2, while the off-diagonal elements are the covariances between X t+h, i and X tj , i ≠ j. Notice that γ 12(h) = γ 21(−h).

A natural estimator of the mean vector \(\boldsymbol{\mu }\) in terms of the observations X 1, …, X n is the vector of sample means

$$\displaystyle{\overline{\mathbf{X}}_{n} ={ 1 \over n}\sum _{t=1}^{n}\mathbf{X}_{ t},}$$

and a natural estimator of Γ(h) is

$$\displaystyle\begin{array}{rcl} \hat{\varGamma }(h) = \left \{\begin{array}{@{}l@{\quad }l@{}} n^{-1}\sum \limits _{ t=1}^{n-h}\left (\mathbf{X}_{ t+h} -\overline{\mathbf{X}}_{n}\right )\left (\mathbf{X}_{t} -\overline{\mathbf{X}}_{n}\right )'\quad &\mbox{ for }0 \leq h \leq n - 1,\\ \hat{\varGamma }(-h)' \quad &\mbox{ for } - n + 1 \leq h < 0. \end{array} \right.& & {}\\ \end{array}$$

The correlation ρ ij (h) between X t+h, i and X t, j is estimated by

$$\displaystyle{\hat{\rho }_{ij}(h) =\hat{\gamma } _{ij}(h)(\hat{\gamma }_{ii}(0)\hat{\gamma }_{jj}(0))^{-1/2}.}$$

If i = j, then \(\hat{\rho }_{ij}\) reduces to the sample autocorrelation function of the ith series. These estimators will be discussed in more detail in Section 8.2.
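
For readers working outside ITSM, here is a minimal numpy sketch of these estimators, assuming the observations X 1, …, X n are stored as the rows of an n × m array X (the function names are ours, not part of any package):

```python
import numpy as np

def sample_gamma(X, h):
    """Sample covariance matrix Gamma_hat(h) for the rows X_1, ..., X_n of X, 0 <= h <= n-1."""
    n = X.shape[0]
    D = X - X.mean(axis=0)
    # n^{-1} * sum_{t=1}^{n-h} (X_{t+h} - Xbar)(X_t - Xbar)'
    return D[h:].T @ D[:n - h] / n

def sample_rho(X, h):
    """Sample correlation matrix [rho_hat_ij(h)]; Gamma_hat(-h) = Gamma_hat(h)'."""
    g0 = np.diag(sample_gamma(X, 0))
    gh = sample_gamma(X, h) if h >= 0 else sample_gamma(X, -h).T
    return gh / np.sqrt(np.outer(g0, g0))
```

Applied to the bivariate returns of Example 8.1.1 below, sample_rho(X, h) should give the same 2 × 2 arrays of correlations that ITSM plots.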

Example 8.1.1

Dow Jones and All Ordinaries Indices; DJAO2.TSM

Figure 8.1 shows the closing values D 0, …, D 250 of the Dow Jones Index of stocks on the New York Stock Exchange and the closing values A 0, …, A 250 of the Australian All Ordinaries Index of Share Prices, recorded at the termination of trading on 251 successive trading days up to August 26th, 1994. (Because of the time difference between Sydney and New York, the markets do not close simultaneously in both places; however, in Sydney the closing price of the Dow Jones index for the previous day is known before the opening of the market on any trading day.) The efficient market hypothesis suggests that these processes should resemble random walks with uncorrelated increments. In order to model the data as a stationary bivariate time series we first reexpress them as percentage relative price changes or percentage returns (filed as DJAOPC2.TSM)

$$\displaystyle{X_{t1} = 100{(D_{t} - D_{t-1}) \over D_{t-1}},\quad \ t = 1,\ldots,250,}$$

and

$$\displaystyle{X_{t2} = 100{(A_{t} - A_{t-1}) \over A_{t-1}},\quad \ t = 1,\ldots,250.}$$

The estimators \(\hat{\rho }_{11}(h)\) and \(\hat{\rho }_{22}(h)\) of the autocorrelations of the two univariate series are shown in Figures 8.2 and 8.3. They are not significantly different from zero.

Fig. 8.1 The Dow Jones Index (top) and Australian All Ordinaries Index (bottom) at closing on 251 trading days ending August 26th, 1994

Fig. 8.2 The sample ACF \(\hat{\rho }_{11}\) of the observed values of \(\{X_{t1}\}\) in Example 8.1.1, showing the bounds \(\pm 1.96n^{-1/2}\)

Fig. 8.3 The sample ACF \(\hat{\rho }_{22}\) of the observed values of \(\{X_{t2}\}\) in Example 8.1.1, showing the bounds \(\pm 1.96n^{-1/2}\)

To compute the sample cross-correlations \(\hat{\rho }_{12}(h)\) and \(\hat{\rho }_{21}(h)\) using ITSM, select File>Project>Open>Multivariate. Then click OK and double-click on the file name DJAOPC2.TSM. You will see a dialog box in which Number of columns should be set to 2 (the number of components of the observation vectors). Then click OK, and the graphs of the two component series will appear. To see the correlations, press the middle yellow button at the top of the ITSM window. The correlation functions are plotted as a 2 × 2 array of graphs with \(\hat{\rho }_{11}(h)\), \(\hat{\rho }_{12}(h)\) in the top row and \(\hat{\rho }_{21}(h)\), \(\hat{\rho }_{22}(h)\) in the second row. We see from these graphs (shown in Figure 8.4) that although the autocorrelations \(\hat{\rho }_{ii}(h)\), i = 1, 2, are all small, there is a much larger correlation between X t−1, 1 and X t, 2. This indicates the importance of considering the two series jointly as components of a bivariate time series. It also suggests that the value of X t−1, 1, i.e., the Dow Jones return on day t − 1, may be of assistance in predicting the value of X t, 2, the All Ordinaries return on day t. This last observation is supported by the scatterplot of the points (x t−1, 1, x t, 2), t = 2, …, 250, shown in Figure 8.5.

Fig. 8.4 The sample correlations \(\hat{\rho }_{ij}(h)\) between X t+h, i and X t, j for Example 8.1.1. (\(\hat{\rho }_{ij}(h)\) is plotted as the jth graph in the ith row, i, j = 1, 2. Series 1 and 2 consist of the daily Dow Jones and All Ordinaries percentage returns, respectively.)

Fig. 8.5 Scatterplot of (x t−1, 1, x t, 2), t = 2, …, 250, for the data in Example 8.1.1

Example 8.1.2

Sales with a leading indicator; LS2.TSM

In this example we consider the sales data {Y t2, t = 1, …, 150} with leading indicator {Y t1, t = 1, …, 150} given by Box and Jenkins (1976, p. 537). The two series are stored in the ITSM data files SALES.TSM and LEAD.TSM, respectively, and in bivariate format as LS2.TSM. The graphs of the two series and their sample autocorrelation functions strongly suggest that both series are nonstationary. Application of the operator (1 − B) yields the two differenced series {D t1} and {D t2}, whose properties are compatible with those of low-order ARMA processes. Using ITSM, we find that the models

$$\displaystyle{ D_{t1}\,-\,0.0228 = Z_{t1}\,-\,0.474Z_{t-1,1},\!\quad \{Z_{t1}\} \sim \mathrm{WN}(0,0.0779), }$$
(8.1.1)
$$\displaystyle{D_{t2} - 0.838D_{t-1,2} - 0.0676 = Z_{t2} - 0.610Z_{t-1,2},}$$
$$\displaystyle{ \{Z_{t2}\} \sim \mathrm{WN}(0,1.754), }$$
(8.1.2)

provide good fits to the series {D t1} and {D t2}.

The sample autocorrelations and cross-correlations of {D t1} and {D t2} are computed by opening the bivariate ITSM file LS2.TSM (as described in Example 8.1.1). The option Transform>Difference, with differencing lag equal to 1, generates the bivariate differenced series {(D t1, D t2)}, and the correlation functions are then obtained as in Example 8.1.1 by clicking on the middle yellow button at the top of the ITSM screen. The sample auto- and cross-correlations \(\hat{\rho }_{ij}(h)\), i, j = 1, 2, are shown in Figure 8.6. As we shall see in Section 8.3, care must be taken in interpreting the cross-correlations without first taking into account the autocorrelations of {D t1} and {D t2}.

Fig. 8.6 The sample correlations \(\hat{\rho }_{ij}(h)\) of the series {D t1} and {D t2} of Example 8.1.2, showing the bounds \(\pm 1.96n^{-1/2}\). (\(\hat{\rho }_{ij}(h)\) is plotted as the jth graph in the ith row, i, j = 1, 2.)

8.2 Second-Order Properties of Multivariate Time Series

Consider m time series {X ti , t = 0, ±1, …}, i = 1, …, m, with \(EX_{ti}^{2} < \infty \) for all t and i. If all the finite-dimensional distributions of the random variables {X ti } were multivariate normal, then the distributional properties of {X ti } would be completely determined by the means

$$\displaystyle{ \mu _{ti}:= EX_{ti} }$$
(8.2.1)

and the covariances

$$\displaystyle{ \gamma _{ij}(t + h,t):= E[(X_{t+h,i} -\mu _{ti})(X_{tj} -\mu _{tj})]. }$$
(8.2.2)

Even when the observations {X ti } do not have joint normal distributions, the quantities μ ti and γ ij (t + h, t) specify the second-order properties, the covariances providing us with a measure of the dependence, not only between observations in the same series, but also between the observations in different series.

It is more convenient in dealing with m interrelated series to use vector notation. Thus we define

$$\displaystyle{ \mathbf{X}_{t}:= \left [\begin{array}{*{10}c} X_{t1}\\ \vdots \\ X_{tm} \end{array} \right ],\quad t = 0,\pm 1,\ldots. }$$
(8.2.3)

The second-order properties of the multivariate time series \(\{\mathbf{X}_{t}\}\) are then specified by the mean vectors

$$\displaystyle{ \boldsymbol{\mu }_{t}:= E\mathbf{X}_{t} = \left [\begin{array}{*{10}c} \mu _{t1}\\ \vdots\\ \mu _{ tm} \end{array} \right ] }$$
(8.2.4)

and covariance matrices

$$\displaystyle{ \varGamma (t+h,t):= \left [\begin{array}{*{10}c} \gamma _{11}(t + h,t) &\cdots & \gamma _{1m}(t + h,t)\\ \vdots & \ddots & \vdots \\ \gamma _{m1}(t + h,t)&\cdots &\gamma _{mm}(t + h,t) \end{array} \right ], }$$
(8.2.5)

where

$$\displaystyle{\gamma _{ij}(t + h,t):=\mathop{ \mathrm{Cov}}(X_{t+h,i},\ X_{t,\,j}).}$$

Remark 1.

The matrix Γ(t + h, t) can also be expressed as

$$\displaystyle{\varGamma (t + h,t):= E[(\mathbf{X}_{t+h} -\boldsymbol{\mu }_{t+h})(\mathbf{X}_{t} -\boldsymbol{\mu }_{t})'],}$$

where as usual, the expected value of a random matrix A is the matrix whose components are the expected values of the components of A. □ 

As in the univariate case, a particularly important role is played by the class of multivariate stationary time series, defined as follows.

Definition 8.2.1

The m-variate series \(\{\mathbf{X}_{t}\}\) is (weakly ) stationary if

$$\displaystyle{ \mathrm{(i)}\quad \boldsymbol{\mu }_{X}(t)\mbox{ is independent of }t }$$

and

$$\displaystyle{ \mathrm{(ii)}\quad \varGamma _{X}(t + h,t)\mbox{ is independent of $t$ for each $h$}. }$$

For a stationary time series we shall use the notation

$$\displaystyle{ \boldsymbol{\mu }:= E\mathbf{X}_{t} = \left [\begin{array}{*{10}c} \mu _{1}\\ \vdots\\ \mu _{m} \end{array} \right ] }$$
(8.2.6)

and

$$\displaystyle{ \varGamma (h):= E[(\mathbf{X}_{t+h}-\boldsymbol{\mu })(\mathbf{X}_{t}-\boldsymbol{\mu })'] = \left [\begin{array}{c@{\quad }c@{\quad }c} \gamma _{11}(h)\quad &\cdots \quad &\gamma _{1m}(h)\\ \vdots \quad & \ddots \quad & \vdots \\ \gamma _{m1}(h)\quad &\cdots \quad &\gamma _{mm}(h) \end{array} \right ]. }$$
(8.2.7)

We shall refer to \(\boldsymbol{\mu }\) as the mean of the series and to Γ(h) as the covariance matrix at lag h. Notice that if \(\{\mathbf{X}_{t}\}\) is stationary with covariance matrix function Γ(⋅ ), then for each i, {X ti } is stationary with covariance function γ ii (⋅ ). The function γ ij (⋅ ), i ≠ j, is called the cross-covariance function of the two series {X ti } and {X tj }. It should be noted that γ ij (⋅ ) is not in general the same as γ ji (⋅ ). The correlation matrix function R(⋅ ) is defined by

$$\displaystyle{ R(h):= \left [\begin{array}{c@{\quad }c@{\quad }c} \rho _{11}(h)\quad &\cdots \quad &\rho _{1m}(h)\\ \vdots \quad & \ddots \quad & \vdots \\ \rho _{m1}(h)\quad &\cdots \quad &\rho _{mm}(h) \end{array} \right ], }$$
(8.2.8)

where ρ ij (h) = γ ij (h)∕[γ ii (0)γ jj (0)]1∕2. The function R(⋅ ) is the covariance matrix function of the normalized series obtained by subtracting \(\boldsymbol{\mu }\) from X t and then dividing each component by its standard deviation.

Example 8.2.1

​​Consider the bivariate stationary process \(\{\mathbf{X}_{t}\}\) defined by

$$\displaystyle{X_{t1} = Z_{t},}$$
$$\displaystyle{X_{t2} = Z_{t} + 0.75Z_{t-10},}$$

where {Z t } ∼ WN(0, 1). Elementary calculations yield \(\boldsymbol{\mu } = \mathbf{0}\),

$$\displaystyle\begin{array}{rcl} \varGamma (-10) = \left [\begin{array}{*{10}c} 0 & 0.75\\ 0 & 0.75 \end{array} \right ],\quad \varGamma (0) = \left [\begin{array}{*{10}c} 1 & 1\\ 1 & 1.5625 \end{array} \right ],\quad \varGamma (10) = \left [\begin{array}{*{10}c} 0 & 0\\ 0.75 & 0.75 \end{array} \right ],& & {}\\ \end{array}$$

and Γ( j) = 0 otherwise. The correlation matrix function is given by

$$\displaystyle\begin{array}{rcl} R(-10) = \left [\begin{array}{*{10}c} 0 & 0.60\\ 0 & 0.48 \end{array} \right ],\quad R(0) = \left [\begin{array}{*{10}c} 1 & 0.8\\ 0.8 & 1 \end{array} \right ],\quad R(10) = \left [\begin{array}{*{10}c} 0 & 0\\ 0.60 & 0.48 \end{array} \right ],& & {}\\ \end{array}$$

and R( j) = 0 otherwise. □ 
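
As a quick numerical check of these calculations (a simulation sketch, not part of the chapter's ITSM workflow), one can generate a long realization of the process and compare the sample covariance matrices with Γ(0) and Γ(±10):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
Z = rng.standard_normal(n + 10)                            # {Z_t} ~ WN(0, 1)
X = np.column_stack([Z[10:], Z[10:] + 0.75 * Z[:-10]])     # (X_t1, X_t2)'

def gamma(X, h):                                           # sample Gamma(h), h >= 0
    D = X - X.mean(axis=0)
    return D[h:].T @ D[:len(D) - h] / len(D)

print(np.round(gamma(X, 0), 2))       # approx [[1, 1], [1, 1.56]]
print(np.round(gamma(X, 10), 2))      # approx [[0, 0], [0.75, 0.75]]
print(np.round(gamma(X, 10).T, 2))    # approx Gamma(-10) = [[0, 0.75], [0, 0.75]]
```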

Basic Properties of Γ(⋅ ):

  1. Γ(h) = Γ′(−h),

  2. \(\vert \gamma _{ij}(h)\vert \leq [\gamma _{ii}(0)\gamma _{jj}(0)]^{1/2}\), i, j = 1, …, m,

  3. γ ii (⋅ ) is an autocovariance function, i = 1, …, m, and

  4. \(\sum _{j,k=1}^{n}\mathbf{a}_{j}'\varGamma (\,j - k)\mathbf{a}_{k} \geq 0\) for all n ∈ { 1, 2, …} and \(\mathbf{a}_{1},\ldots,\mathbf{a}_{n} \in \mathbb{R}^{m}\).

Proof

The first property follows at once from the definition, the second from the fact that correlations cannot be greater than one in absolute value, and the third from the observation that γ ii (⋅ ) is the autocovariance function of the stationary series {X ti , t = 0, ±1, }. Property 4 is a statement of the obvious fact that

$$\displaystyle{E{\biggl (\sum _{j=1}^{n}\mathbf{a}_{ j}'(\mathbf{X}_{j} -\boldsymbol{\mu })\biggr )}^{2} \geq 0.\mbox{ $\square $}}$$

Remark 2.

The basic properties of the matrices Γ(h) are shared also by the corresponding matrices of correlations \(R(h) = [\rho _{ij}(h)]_{i,j=1}^{m}\), which have the additional property

$$\displaystyle{\rho _{ii}(0) = 1\quad \mathrm{for\ all\ }i.}$$

The correlation ρ ij (0) is the correlation between X ti and X tj , which is generally not equal to 1 if i ≠ j (see Example 8.2.1). It is also possible that | γ ij (h) |  >  | γ ij (0) | if i ≠ j (see Problem 7.1). □ 

The simplest multivariate time series is multivariate white noise, the definition of which is quite analogous to that of univariate white noise.

Definition 8.2.2.

The m-variate series {Z t } is called white noise with mean 0 and covariance matrix \(\mathop{\varSigma \vert }\,\), written

$$\displaystyle{ \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,), }$$
(8.2.9)

if {Z t } is stationary with mean vector 0 and covariance matrix function

$$\displaystyle{ \varGamma (h) = \left \{\begin{array}{@{}l@{\quad }l@{}} \mathop{\varSigma \vert }\,, \quad &\mbox{ if }h = 0,\\ 0,\quad &\mbox{ otherwise}. \end{array} \right. }$$
(8.2.10)

Definition 8.2.3.

The m-variate series {Z t } is called iid noise with mean 0 and covariance matrix \(\mathop{\varSigma \vert }\,\), written

$$\displaystyle{ \{\mathbf{Z}_{t}\} \sim \mathrm{iid}(\mathbf{0},\mathop{\varSigma \vert } \,), }$$
(8.2.11)

if the random vectors {Z t } are independent and identically distributed with mean 0 and covariance matrix \(\mathop{\varSigma \vert }\,\).

Multivariate white noise {Z t } is used as a building block from which can be constructed an enormous variety of multivariate time series. The linear processes are generated as follows.

Definition 8.2.4.

The m-variate series \(\{\mathbf{X}_{t}\}\) is a linear process if it has the representation

$$\displaystyle{ \mathbf{X}_{t} =\sum _{ j=-\infty }^{\infty }C_{ j}\mathbf{Z}_{t-j},\quad \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,), }$$
(8.2.12)

where {C j } is a sequence of m × m matrices whose components are absolutely summable.

The linear process (8.2.12) is stationary (Problem 7.2) with mean 0 and covariance function

$$\displaystyle{ \varGamma (h) =\sum _{ j=-\infty }^{\infty }C_{ j+h}\mathop{\varSigma \vert } \,C_{j}',\quad h = 0,\pm 1,\ldots. }$$
(8.2.13)
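
For a linear process with only finitely many nonzero coefficient matrices (an MA(q)), the sum in (8.2.13) can be evaluated directly; a small sketch (the function name and the example matrices are ours):

```python
import numpy as np

def linear_gamma(C, Sigma, h):
    """Gamma(h) = sum_j C_{j+h} Sigma C_j' for coefficient matrices C = [C_0, ..., C_q]
    (C_j taken as 0 outside this range); works for negative h as well."""
    q = len(C) - 1
    G = np.zeros_like(Sigma, dtype=float)
    for j in range(q + 1):
        if 0 <= j + h <= q:
            G += C[j + h] @ Sigma @ C[j].T
    return G

# MA(1) example: X_t = Z_t + Theta Z_{t-1}, so Gamma(1) = Theta Sigma and Gamma(-1) = Sigma Theta'.
Theta = np.array([[0.5, 0.2], [0.0, -0.3]])
print(linear_gamma([np.eye(2), Theta], np.eye(2), 1))
```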

An MA(∞) process is a linear process with C j  = 0 for j < 0. Thus \(\{\mathbf{X}_{t}\}\) is an MA(∞) process if and only if there exists a white noise sequence {Z t } and a sequence of matrices C j with absolutely summable components such that

$$\displaystyle{\mathbf{X}_{t} =\sum _{ j=0}^{\infty }C_{ j}\mathbf{Z}_{t-j}.}$$

Multivariate ARMA processes will be discussed in Section 8.4, where it will be shown in particular that any causal ARMA( p, q) process can be expressed as an MA(∞) process, while any invertible ARMA( p, q) process can be expressed as an AR(∞) process, i.e., a process satisfying equations of the form

$$\displaystyle{ \mathbf{X}_{t} +\sum _{ j=1}^{\infty }A_{ j}\mathbf{X}_{t-j} = \mathbf{Z}_{t}, }$$

in which the matrices A j have absolutely summable components.

8.2.1 Second-Order Properties in the Frequency Domain

Provided that the components of the covariance matrix function Γ(⋅ ) have the property \(\sum _{h=-\infty }^{\infty }\vert \gamma _{ij}(h)\vert < \infty \), i, j = 1, …, m, Γ has a matrix-valued spectral density function

$$\displaystyle{f(\lambda ) ={ 1 \over 2\pi }\sum _{h=-\infty }^{\infty }e^{-i\lambda h}\varGamma (h),\quad -\pi \leq \lambda \leq \pi,}$$

and Γ can be expressed in terms of f as

$$\displaystyle{\varGamma (h) =\int _{ -\pi }^{\pi }e^{i\lambda h}f(\lambda )d\lambda.}$$
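
In practice f(λ) can be approximated by truncating the defining sum at the lags for which Γ(h) is available; a sketch (our function name, with Γ(h) supplied as a list and the symmetry Γ(−h) = Γ(h)′ used for the negative lags):

```python
import numpy as np

def spectral_density(Gammas, lam):
    """Truncated f(lambda) = (1/2pi) * sum_{|h|<=K} exp(-i*lambda*h) * Gamma(h),
    given Gammas = [Gamma(0), Gamma(1), ..., Gamma(K)]."""
    f = Gammas[0].astype(complex)
    for h, G in enumerate(Gammas[1:], start=1):
        f += np.exp(-1j * lam * h) * G + np.exp(1j * lam * h) * G.T
    return f / (2 * np.pi)
```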

The second-order properties of the stationary process \(\{\mathbf{X}_{t}\}\) can therefore be described equivalently in terms of f(⋅ ) rather than Γ(⋅ ). Similarly, \(\{\mathbf{X}_{t}\}\) has a spectral representation

$$\displaystyle{\mathbf{X}_{t} =\int _{ -\pi }^{\pi }e^{i\lambda t}d\mathbf{Z}(\lambda ),}$$

where {Z(λ), −π ≤ λ ≤ π} is a process whose components are complex-valued processes satisfying

$$\displaystyle{ E\left (dZ_{j}(\lambda )d\overline{Z}_{k}(\mu )\right ) = \left \{\begin{array}{@{}l@{\quad }l@{}} f_{jk}(\lambda )d\lambda \quad &\mbox{ if }\lambda =\mu, \\ 0 \quad &\mbox{ if }\lambda \neq \mu, \end{array} \right. }$$

and \(\overline{Z}_{k}\) denotes the complex conjugate of Z k . We shall not go into the spectral representation in this book. For details see Brockwell and Davis (1991).

8.3 Estimation of the Mean and Covariance Function

As in the univariate case, the estimation of the mean vector and covariances of a stationary multivariate time series plays an important role in describing and modeling the dependence structure of the component series. In this section we introduce estimators, for a stationary m-variate time series \(\{\mathbf{X}_{t}\}\), of the components μ j , γ ij (h), and ρ ij (h) of \(\boldsymbol{\mu }\), Γ(h), and R(h), respectively. We also examine the large-sample properties of these estimators.

8.3.1 Estimation of \(\boldsymbol{\mu }\)

A natural unbiased estimator of the mean vector \(\boldsymbol{\mu }\) based on the observations X 1, …, X n is the vector of sample means

$$\displaystyle{\overline{\mathbf{X}}_{n} ={ 1 \over n}\sum _{t=1}^{n}\mathbf{X}_{ t}.}$$

The resulting estimate of the mean of the jth time series is then the univariate sample mean \((1/n)\sum _{t=1}^{n}X_{tj}\). If each of the univariate autocovariance functions γ ii (⋅ ), i = 1, …, m, satisfies the conditions of Proposition 2.4.1, then the consistency of the estimator \(\overline{\mathbf{X}}_{n}\) can be established by applying the proposition to each of the component time series {X ti }. This immediately gives the following result.

Proposition 8.3.1.

If \(\{\mathbf{X}_{t}\}\) is a stationary multivariate time series with mean \(\boldsymbol{\mu }\) and covariance function Γ(⋅), then as n →∞,

$$\displaystyle{E\left (\overline{\mathbf{X}}_{n} -\boldsymbol{\mu }\right )'\left (\overline{\mathbf{X}}_{n} -\boldsymbol{\mu }\right ) \rightarrow 0\quad \mathrm{if}\ \gamma _{ii}(n) \rightarrow 0,\quad 1 \leq i \leq m,}$$

and

$$\displaystyle{nE\left (\overline{\mathbf{X} } _{n} -\boldsymbol{\mu }\right )'\left (\overline{\mathbf{X}}_{n} -\boldsymbol{\mu }\right ) \rightarrow \sum _{i=1}^{m}\sum _{ h=-\infty }^{\infty }\gamma _{ ii}(h)\quad \mathrm{if}\ \sum _{h=-\infty }^{\infty }\vert \gamma _{ ii}(h)\vert < \infty,\quad 1 \leq i \leq m.}$$

Under more restrictive assumptions on the process \(\{\mathbf{X}_{t}\}\) it can also be shown that \(\overline{\mathbf{X}}_{n}\) is approximately normally distributed for large n. Determination of the covariance matrix of this distribution would allow us to obtain confidence regions for \(\boldsymbol{\mu }\). However, this is quite complicated, and the following simple approximation is useful in practice.

For each i we construct a confidence interval for μ i based on the sample mean \(\overline{X}_{i}\) of the univariate series X 1i , …, X ni and combine these to form a confidence region for \(\boldsymbol{\mu }\). If f i (ω) is the spectral density of the ith process {X ti } and if the sample size n is large, then we know, under the same conditions as in Section 2.4, that \(\sqrt{n}\left (\overline{X}_{ i} -\mu _{i}\right )\) is approximately normally distributed with mean zero and variance

$$\displaystyle{2\pi \ f_{i}(0) =\sum _{ k=-\infty }^{\infty }\gamma _{ ii}(k).}$$

It can also be shown (see, e.g., Anderson 1971) that

$$\displaystyle{2\pi \,\hat{f}_{i}(0):=\sum _{\vert h\vert \leq r}\left (1 -{\vert h\vert \over r} \right )\hat{\gamma }_{ii}(h)}$$

is a consistent estimator of 2π f i (0), provided that r = r n is a sequence of numbers depending on n in such a way that r n  → ∞ and r n ∕n → 0 as n → ∞. Thus if \(\overline{X}_{i}\) denotes the sample mean of the ith process and Φ α is the α-quantile of the standard normal distribution, then the bounds

$$\displaystyle{\overline{X}_{i} \pm \varPhi _{1-\alpha /2}\left (2\pi \,\hat{f}_{i}(0)/n\right )^{1/2}}$$

are asymptotic (1 −α) confidence bounds for μ i . Hence

$$\displaystyle\begin{array}{rcl} P\bigg(\vert \mu _{i} -\overline{X}_{i}\vert & \leq & \varPhi _{1-\alpha /2}\left (2\pi \ \,\hat{f}_{i}(0)/n\right )^{1/2},i = 1,\ldots,m\bigg) {}\\ & \geq & 1 -\sum _{i=1}^{m}P\left (\left \vert \mu _{ i} -\overline{X}_{i}\right \vert >\varPhi _{1-\alpha /2}\left (2\pi \ \,\hat{f}_{i}(0)/n\right )^{1/2}\right ), {}\\ \end{array}$$

where the right-hand side converges to 1 − m α as n → ∞. Consequently, as n → ∞, the set of m-dimensional vectors bounded by

$$\displaystyle{ \left \{x_{i} = \overline{X}_{i} \pm \varPhi _{1-\left (\alpha /(2m)\right )}\left (2\pi \,\hat{f}_{i}(0)/n\right )^{1/2},i = 1,\ldots,m\right \} }$$
(8.3.1)

has a confidence coefficient that converges to a value greater than or equal to 1 −α (and substantially greater if m is large). Nevertheless, the region defined by (8.3.1) is easy to determine and is of reasonable size, provided that m is not too large.
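
A sketch of the bounds (8.3.1) in numpy/scipy (the truncation lag r and the helper name are our choices; scipy.stats.norm.ppf supplies the quantile \(\varPhi _{1-\alpha /(2m)}\)):

```python
import numpy as np
from scipy.stats import norm

def mean_confidence_bounds(X, r=None, alpha=0.05):
    """Bonferroni-style bounds (8.3.1) for the components of mu; X is an n x m data matrix."""
    n, m = X.shape
    r = r or int(np.sqrt(n))                     # a common (but arbitrary) truncation choice
    D = X - X.mean(axis=0)
    # sample autocovariances gamma_hat_ii(h), h = 0, ..., r, for each component i
    gam = np.array([(D[h:] * D[:n - h]).sum(axis=0) / n for h in range(r + 1)])
    two_pi_f0 = gam[0] + 2 * ((1 - np.arange(1, r + 1) / r)[:, None] * gam[1:]).sum(axis=0)
    half = norm.ppf(1 - alpha / (2 * m)) * np.sqrt(two_pi_f0 / n)
    return X.mean(axis=0) - half, X.mean(axis=0) + half
```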

8.3.2 Estimation of \(\boldsymbol{\varGamma }(h)\)

As in the univariate case, a natural estimator of the covariance \(\varGamma (h) = E\big[{\bigl (\mathbf{X}_{t+h} -\boldsymbol{\mu }\bigr )}{\bigl (\mathbf{X}_{t} -\boldsymbol{\mu }\bigr )}'\big]\) is

$$\displaystyle{ \hat{\varGamma }(h) = \left \{\begin{array}{@{}l@{\quad }l@{}} n^{-1}\sum _{ t=1}^{n-h}\left (\mathbf{X}_{ t+h} -\overline{\mathbf{X}}_{n}\right )\left (\mathbf{X}_{t} -\overline{\mathbf{X}}_{n}\right )'\quad &\mbox{ for }0 \leq h \leq n - 1, \\ \hat{\varGamma }'(-h) \quad &\mbox{ for } - n + 1 \leq h < 0. \end{array} \right. }$$

Writing \(\hat{\gamma }_{ij}(h)\) for the (i, j)-component of \(\hat{\varGamma }(h)\), i, j = 1, 2, …, we estimate the cross-correlations by

$$\displaystyle{\hat{\rho }_{ij}(h) =\hat{\gamma } _{ij}(h)(\hat{\gamma }_{ii}(0)\hat{\gamma }_{jj}(0))^{-1/2}.}$$

If i = j, then \(\hat{\rho }_{ij}\) reduces to the sample autocorrelation function of the ith series.

Derivation of the large-sample properties of \(\hat{\gamma }_{ij}\) and \(\hat{\rho }_{ij}\) is quite complicated in general. Here we shall simply note one result that is of particular importance for testing the independence of two component series. For details of the proof of this and related results, see Brockwell and Davis (1991).

Theorem 8.3.1.

Let \(\{\mathbf{X}_{t}\}\) be the bivariate time series whose components are defined by

$$\displaystyle{X_{t1} =\sum _{ k=-\infty }^{\infty }\alpha _{ k}Z_{t-k,1},\quad \{Z_{t1}\} \sim \mathrm{IID}\left (0,\sigma _{1}^{2}\right ),}$$

and

$$\displaystyle{X_{t2} =\sum _{ k=-\infty }^{\infty }\beta _{ k}Z_{t-k,2},\quad \{Z_{t2}\} \sim \mathrm{IID}\left (0,\sigma _{2}^{2}\right ),}$$

where the two sequences {Z t1 } and {Z t2 } are independent, \(\sum _{k}\vert \alpha _{k}\vert < \infty \), and \(\sum _{k}\vert \beta _{k}\vert < \infty \).

Then for all integers h and k with h ≠ k, the random variables \(n^{1/2}\hat{\rho }_{12}(h)\) and \(n^{1/2}\hat{\rho }_{12}(k)\) are approximately bivariate normal with mean 0, variance \(\sum _{j=-\infty }^{\infty }\rho _{11}(\,j)\rho _{22}(\,j)\), and covariance \(\sum _{j=-\infty }^{\infty }\rho _{11}(\,j)\rho _{22}(\,j + k - h)\), for n large.

[For a related result that does not require the independence of the two series \(\{X_{t1}\}\) and \(\{X_{t2}\}\) see Bartlett’s Formula, Section 8.3.4 below. ]

Theorem 8.3.1 is useful in testing for correlation between two time series. If one of the two processes in the theorem is white noise, then it follows at once from the theorem that \(\hat{\rho }_{12}(h)\) is approximately normally distributed with mean 0 and variance 1∕n, in which case it is straightforward to test the hypothesis that ρ 12(h) = 0. However, if neither process is white noise, then a value of \(\hat{\rho }_{12}(h)\) that is large relative to \(n^{-1/2}\) does not necessarily indicate that ρ 12(h) is different from zero. For example, suppose that \(\{X_{t1}\}\) and \(\{X_{t2}\}\) are two independent AR(1) processes with ρ 11(h) = ρ 22(h) = \(0.8^{\vert h\vert }\). Then the large-sample variance of \(\hat{\rho }_{12}(h)\) is \(n^{-1}\left (1 + 2\sum _{k=1}^{\infty }(0.64)^{k}\right ) = 4.556n^{-1}\). It would therefore not be surprising to observe a value of \(\hat{\rho }_{12}(h)\) as large as \(3n^{-1/2}\) even though \(\{X_{t1}\}\) and \(\{X_{t2}\}\) are independent. If, on the other hand, ρ 11(h) = \(0.8^{\vert h\vert }\) and ρ 22(h) = \((-0.8)^{\vert h\vert }\), then the large-sample variance of \(\hat{\rho }_{12}(h)\) is \(0.2195n^{-1}\), and an observed value of \(3n^{-1/2}\) for \(\hat{\rho }_{12}(h)\) would be very unlikely.
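
The two constants quoted above are just values of \(\sum _{j}\rho _{11}(\,j)\rho _{22}(\,j)\) for geometric autocorrelations and are easily checked numerically:

```python
import numpy as np

j = np.arange(-200, 201)
print((0.8 ** np.abs(j) * 0.8 ** np.abs(j)).sum())      # 4.556: both ACFs equal to 0.8^|h|
print((0.8 ** np.abs(j) * (-0.8) ** np.abs(j)).sum())   # 0.2195: second ACF equal to (-0.8)^|h|
```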

8.3.3 Testing for Independence of Two Stationary Time Series

Since by Theorem 8.3.1 the large-sample distribution of \(\hat{\rho }_{12}(h)\) depends on both ρ 11(⋅ ) and ρ 22(⋅ ), any test for independence of the two component series cannot be based solely on estimated values of ρ 12(h), h = 0, ±1, …, without taking into account the nature of the two component series.

This difficulty can be circumvented by “prewhitening” the two series before computing the cross-correlations \(\hat{\rho }_{12}(h)\), i.e., by transforming the two series to white noise by application of suitable filters. If \(\{X_{t1}\}\) and \(\{X_{t2}\}\) are invertible ARMA ( p, q) processes, this can be achieved by the transformations

$$\displaystyle{Z_{ti} =\sum _{ j=0}^{\infty }\pi _{ j}^{(i)}X_{ t-j,i},}$$

where \(\sum _{j=0}^{\infty }\pi _{j}^{(i)}z^{\,j} =\phi ^{(i)}(z)/\theta ^{(i)}(z)\) and ϕ (i), θ (i) are the autoregressive and moving-average polynomials of the ith series, i = 1, 2.

Since in practice the true model is nearly always unknown and since the data X tj , t ≤ 0, are not available, it is convenient to replace the sequences {Z ti } by the residuals \(\big\{\hat{W}_{ti}\big\}\) after fitting a maximum likelihood ARMA model to each of the component series (see (5.3.1)). If the fitted ARMA models were in fact the true models, the series \(\big\{\hat{W}_{ti}\big\}\) would be white noise sequences for i = 1, 2.

To test the hypothesis H 0 that \(\{X_{t1}\}\) and \(\{X_{t2}\}\) are independent series, we observe that under H 0, the corresponding two prewhitened series {Z t1} and {Z t2} are also independent. Theorem 8.3.1 then implies that the sample cross-correlations \(\hat{\rho }_{12}(h)\), \(\hat{\rho }_{12}(k)\), h ≠ k, of {Z t1} and {Z t2} are for large n approximately independent and normally distributed with means 0 and variances n −1. An approximate test for independence can therefore be obtained by comparing the values of \(\vert \hat{\rho }_{12}(h)\vert \) with \(1.96n^{-1/2}\), exactly as in Section 5.3.2. If we prewhiten only one of the two original series, say \(\{X_{t1}\}\), then under H 0 Theorem 8.3.1 implies that the sample cross-correlations \(\tilde{\rho }_{12}(h)\), \(\tilde{\rho }_{12}(k)\), h ≠ k, of {Z t1} and \(\{X_{t2}\}\) are for large n approximately normal with means 0, variances n −1 and covariance n −1 ρ 22(k − h), where ρ 22(⋅ ) is the autocorrelation function of \(\{X_{t2}\}\). Hence, for any fixed h, \(\tilde{\rho }_{12}(h)\) also falls (under H 0) between the bounds \(\pm 1.96n^{-1/2}\) with a probability of approximately 0.95.
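
A rough illustration of the prewhitening idea (a simplification of the procedure just described: each series is whitened here with a crude AR(1) fit rather than a maximum likelihood ARMA model, and the function names are ours):

```python
import numpy as np

def prewhiten_ar1(x):
    """Fit an AR(1) to x by the sample lag-one regression and return the residuals."""
    x = x - x.mean()
    phi = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
    return x[1:] - phi * x[:-1]

def cross_corr(w1, w2, h):
    """Sample cross-correlation rho_hat_12(h) between w_{t+h,1} and w_{t,2}."""
    n = min(len(w1), len(w2))
    w1, w2 = w1[:n] - w1[:n].mean(), w2[:n] - w2[:n].mean()
    c = (w1[h:] * w2[:n - h]).sum() / n if h >= 0 else (w1[:n + h] * w2[-h:]).sum() / n
    return c / np.sqrt((w1 @ w1 / n) * (w2 @ w2 / n))

# Approximate test: after prewhitening both series, reject independence at lag h if
# |cross_corr(w1, w2, h)| exceeds 1.96 / sqrt(n).
```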

Example 8.3.1.

The sample correlation functions \(\hat{\rho }_{ij}(\cdot )\), i, j = 1, 2, of the bivariate time series E731A.TSM (of length n = 200) are shown in Figure 8.7. Without taking into account the autocorrelations \(\hat{\rho }_{ii}(\cdot )\), i = 1, 2, it is impossible to decide on the basis of the cross-correlations whether or not the two component processes are independent of each other. Notice that many of the sample cross-correlations \(\hat{\rho }_{ij}(h)\), i ≠ j, lie outside the bounds \(\pm 1.96n^{-1/2} = \pm 0.139\). However, these bounds are relevant only if at least one of the component series is white noise. Since this is clearly not the case, a whitening transformation must be applied to at least one of the two component series. Analysis using ITSM leads to AR(1) models for each. The residuals from these maximum likelihood models are stored as a bivariate series in the file E731B.TSM, and their sample correlations, obtained from ITSM, are shown in Figure 8.8. All but two of the cross-correlations are between the bounds ±0.139, suggesting by Theorem 8.3.1 that the two residual series (and hence the two original series) are uncorrelated. The data for this example were in fact generated as two independent AR(1) series with ϕ = 0.8 and \(\sigma ^{2} = 1\).

Fig. 8.7 The sample correlations of the bivariate series E731A.TSM of Example 8.3.1, showing the bounds \(\pm 1.96n^{-1/2}\)

Fig. 8.8 The sample correlations of the bivariate series of residuals E731B.TSM, whose components are the residuals from the AR(1) models fitted to each of the component series in E731A.TSM

8.3.4 Bartlett’s Formula

In Section 2.4 we gave Bartlett’s formula for the large-sample distribution of the sample autocorrelation vector \(\hat{\boldsymbol{\rho }}= \left (\hat{\rho }(1),\ldots,\hat{\rho }(k)\right )'\) of a univariate time series. The following theorem gives a large-sample approximation to the covariances of the sample cross-correlations \(\hat{\rho }_{12}(h)\) and \(\hat{\rho }_{12}(k)\) of the bivariate time series \(\{\mathbf{X}_{t}\}\) under the assumption that \(\{\mathbf{X}_{t}\}\) is Gaussian. However, it is not assumed (as in Theorem 8.3.1) that \(\{X_{t1}\}\) is independent of \(\{X_{t2}\}\).

Bartlett’s Formula:

If \(\{\mathbf{X}_{t}\}\) is a bivariate Gaussian time series with covariances satisfying \(\sum _{h=-\infty }^{\infty }\vert \gamma _{ij}(h)\vert < \infty \), i, j = 1, 2, then

$$\displaystyle\begin{array}{rcl} \lim _{n\rightarrow \infty }n\mathrm{Cov}\big(\hat{\rho }_{12}(h),\hat{\rho }_{12}(k)\bigr )& =& \sum _{j=-\infty }^{\infty }\bigg[\rho _{ 11}(\,j)\rho _{22}(\,j + k - h) +\rho _{12}(\,j + k)\rho _{21}(\,j - h)\bigg. {}\\ & & \begin{array}{l} -\rho _{12}(h)\{\rho _{11}(\,j)\rho _{12}(\,j + k) +\rho _{22}(\,j)\rho _{21}(\,j - k)\} \\ -\rho _{12}(k)\{\rho _{11}(\,j)\rho _{12}(\,j + h) +\rho _{22}(\,j)\rho _{21}(\,j - h)\} \\ +\rho _{12}(h)\rho _{12}(k)\Bigg\{\dfrac{1} {2}\rho _{11}^{2}(\,j) +\rho _{ 12}^{2}(\,j) + \dfrac{1} {2}\rho _{22}^{2}(\,j)\Bigg\}\!\bigg.\bigg] \end{array} {}\\ \end{array}$$

Corollary 8.3.1.

If \(\{\mathbf{X}_{t}\}\) satisfies the conditions for Bartlett’s formula, if either \(\{X_{t1}\}\) or \(\{X_{t2}\}\) is white noise, and if

$$\displaystyle{\rho _{12}(h) = 0,\quad h\notin [a,b],}$$

then

$$\displaystyle{\lim _{n\rightarrow \infty }n\mathrm{Var}\left (\hat{\rho }_{12}(h)\right ) = 1,\quad h\notin [a,b].}$$

Example 8.3.2.

Sales with a leading indicator

We consider again the differenced series {D t1} and {D t2} of Example 8.1.2, for which we found the maximum likelihood models (8.1.1) and (8.1.2) using ITSM. The residuals from the two models (which can be filed by ITSM) are the two “whitened” series \(\big\{\hat{W}_{t1}\big\}\) and \(\big\{\hat{W}_{t2}\big\}\) with sample variances 0.0779 and 1.754, respectively. This bivariate series is contained in the file E732.TSM.

The sample auto- and cross-correlations of {D t1} and {D t2} were shown in Figure 8.6. Without taking into account the autocorrelations, it is not possible to draw any conclusions about the dependence between the two component series from the cross-correlations.

Examination of the sample cross-correlation function of the whitened series \(\big\{\hat{W}_{t1}\big\}\) and \(\big\{\hat{W}_{t2}\big\}\), on the other hand, is much more informative. From Figure 8.9 it is apparent that there is one large sample cross-correlation (between \(\hat{W}_{t+3,2}\) and \(\hat{W}_{t,1}\)), while the others are all between \(\pm 1.96n^{-1/2}\).

Fig. 8.9 The sample correlations of the whitened series \(\hat{W}_{t+h,1}\) and \(\hat{W}_{t2}\) of Example 8.3.2, showing the bounds \(\pm 1.96n^{-1/2}\)

If \(\big\{\hat{W}_{t1}\big\}\) and \(\big\{\hat{W}_{t2}\big\}\) are assumed to be jointly Gaussian, Corollary 8.3.1 indicates the compatibility of the cross-correlations with a model for which

$$\displaystyle{\rho _{12}(-3)\neq 0}$$

and

$$\displaystyle{\rho _{12}(h) = 0,\quad h\neq - 3.}$$

The value \(\hat{\rho }_{12}(-3) = 0.969\) suggests the model

$$\displaystyle{ \hat{W}_{t2} = 4.74\hat{W}_{t-3,1} + N_{t}, }$$
(8.3.2)

where the stationary noise {N t } has small variance compared with \(\big\{\hat{W}_{t2}\big\}\) and \(\big\{\hat{W}_{t1}\big\}\), and the coefficient 4.74 is the square root of the ratio of sample variances of \(\big\{\hat{W}_{t2}\big\}\) and \(\big\{\hat{W}_{t1}\big\}\). A study of the sample values of \(\big\{\hat{W}_{t2} - 4.74\hat{W}_{t-3,1}\big\}\) suggests the model

$$\displaystyle{ (1 + 0.345B)N_{t} = U_{t},\ \ \{U_{t}\} \sim \mathrm{WN}(0,0.0782) }$$
(8.3.3)

for {N t }. Finally, replacing \(\hat{W}_{t2}\) and \(\hat{W}_{t-3,1}\) in (8.3.2) by Z t2 and Z t−3, 1, respectively, and then using (8.1.1) and (8.1.2) to express Z t2 and Z t−3, 1 in terms of {D t2} and {D t1}, we obtain a model relating {D t1}, {D t2}, and {U t }, namely,

$$\displaystyle\begin{array}{rcl} D_{t2} + 0.0773 = (1 - 0.610B)(1 - 0.838B)^{-1}[4.74(1 - 0.474B)^{-1}D_{ t-3,1}& & {}\\ \,+(1 + 0.345B)^{-1}U_{ t}].\qquad & & {}\\ \end{array}$$

This model should be compared with the one derived later in Section 11.1 by the more systematic technique of transfer function modeling.

8.4 Multivariate ARMA Processes

As in the univariate case, we can define an extremely useful class of multivariate stationary processes \(\{\mathbf{X}_{t}\}\) by requiring that \(\{\mathbf{X}_{t}\}\) should satisfy a set of linear difference equations with constant coefficients. Multivariate white noise {Z t } (see Definition 8.2.2) is a fundamental building block from which these ARMA processes are constructed.

Definition 8.4.1.

\(\{\mathbf{X}_{t}\}\) is an ARMA( p,q) process if \(\{\mathbf{X}_{t}\}\) is stationary and if for every t,

$$\displaystyle{ \mathbf{X}_{t} -\varPhi _{1}\mathbf{X}_{t-1} -\cdots -\varPhi _{p}\mathbf{X}_{t-p} = \mathbf{Z}_{t} +\varTheta _{1}\mathbf{Z}_{t-1} + \cdots +\varTheta _{q}\mathbf{Z}_{t-q}, }$$
(8.4.1)

where \(\{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert }\,)\). (\(\{\mathbf{X}_{t}\}\) is an ARMA(p,q) process with mean \(\boldsymbol{\mu }\) if \(\{\mathbf{X}_{t} -\boldsymbol{\mu }\}\) is an ARMA( p, q) process.)

Equations (8.4.1) can be written in the more compact form

$$\displaystyle{ \varPhi (B)\mathbf{X}_{t} =\varTheta (B)\mathbf{Z}_{t},\ \ \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,), }$$
(8.4.2)

where \(\varPhi (z):= I -\varPhi _{1}z -\cdots -\varPhi _{p}z^{p}\) and \(\varTheta (z):= I +\varTheta _{1}z + \cdots +\varTheta _{q}z^{q}\) are matrix-valued polynomials, I is the m × m identity matrix, and B as usual denotes the backward shift operator. (Each component of the matrices Φ(z), Θ(z) is a polynomial with real coefficients and degree less than or equal to p, q, respectively.)

Example 8.4.1.

The multivariate AR(1) process

Setting p = 1 and q = 0 in (8.4.1) gives the defining equations

$$\displaystyle{ \mathbf{X}_{t} =\varPhi \mathbf{X}_{t-1} + \mathbf{Z}_{t},\quad \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,), }$$
(8.4.3)

for the multivariate AR(1) series \(\{\mathbf{X}_{t}\}\). By exactly the same argument as used in Example 2.2.1, we can express X t as

$$\displaystyle{ \mathbf{X}_{t} =\sum _{ j=0}^{\infty }\varPhi ^{\,j}\mathbf{Z}_{ t-j}, }$$
(8.4.4)

provided that all the eigenvalues of Φ are less than 1 in absolute value, i.e., provided that

$$\displaystyle{ \det (I - z\varPhi )\neq 0\quad \mathrm{for\ all}\ z \in \mathbb{C}\ \mathrm{such\ that}\ \vert z\vert \leq 1. }$$
(8.4.5)

If this condition is satisfied, then the coefficients Φ j are absolutely summable, and hence the series in (8.4.4) converges; i.e., each component of the matrix \(\sum _{j=0}^{n}\varPhi ^{\,j}\mathbf{Z}_{t-j}\) converges (see Remark 1 of Section 2.2). The same argument as in Example 2.2.1 also shows that (8.4.4) is the unique stationary solution of (8.4.3). The condition that all the eigenvalues of Φ should be less than 1 in absolute value (or equivalently (8.4.5)) is just the multivariate analogue of the condition | ϕ |  < 1 required for the existence of a causal stationary solution of the univariate AR(1) equations (2.2.8). □ 
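
Condition (8.4.5) holds exactly when every eigenvalue of Φ has modulus less than 1, which is easy to check numerically; a small sketch (the matrix below is an arbitrary illustration, not taken from the examples of this chapter):

```python
import numpy as np

Phi = np.array([[0.5, 0.3],
                [0.1, 0.4]])
eig = np.linalg.eigvals(Phi)
print(eig, np.all(np.abs(eig) < 1))   # True here, so (8.4.5) holds and (8.4.4) converges
```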

Causality and invertibility of a multivariate ARMA( p, q) process are defined precisely as in Section 3.1, except that the coefficients ψ j , π j in the representations \(X_{t} =\sum _{j=0}^{\infty }\psi _{j}Z_{t-j}\) and \(Z_{t} =\sum _{j=0}^{\infty }\pi _{j}X_{t-j}\) are replaced by m × m matrices Ψ j and Π j whose components are required to be absolutely summable. The following two theorems (proofs of which can be found in Brockwell and Davis (1991)) provide us with criteria for causality and invertibility analogous to those of Section 3.1.

Causality:

An ARMA(p, q) process \(\{\mathbf{X}_{t}\}\) is causal, or a causal function of \(\boldsymbol{\{}\mathbf{Z}_{t}\boldsymbol{\}}\), if there exist matrices {Ψ j } with absolutely summable components such that

$$\displaystyle{ \mathbf{X}_{t} =\sum _{ j=0}^{\infty }\varPsi _{ j}\mathbf{Z}_{t-j}\quad \mathrm{for\ all\ }t. }$$
(8.4.6)

Causality is equivalent to the condition

$$\displaystyle{ \det \varPhi (z)\neq 0\mbox{ for all }z \in \mathbb{C}\mbox{ such that }\vert z\vert \leq 1. }$$
(8.4.7)

The matrices Ψ j are found recursively from the equations

$$\displaystyle{ \varPsi _{j} =\varTheta _{j} +\sum _{ k=1}^{\infty }\varPhi _{ k}\varPsi _{j-k},\quad j = 0,1,\ldots, }$$
(8.4.8)

where we define Θ 0 = I, Θ j  = 0 for j > q, Φ j  = 0 for j > p, and Ψ j  = 0 for j < 0.
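
The recursions (8.4.8) are easy to implement; the sketch below (our function name) takes the coefficient lists [Φ 1, …, Φ p ] and [Θ 1, …, Θ q ] and returns Ψ 0, …, Ψ J . With an empty Θ list and a single Φ it reproduces \(\varPsi _{j} =\varPhi ^{\,j}\), as in Example 8.4.1.

```python
import numpy as np

def psi_matrices(Phis, Thetas, m, J):
    """Psi_0, ..., Psi_J from (8.4.8), with Theta_0 = I and all matrices outside
    their defined ranges taken to be 0."""
    p, q = len(Phis), len(Thetas)
    Psi = []
    for j in range(J + 1):
        Pj = np.eye(m) if j == 0 else (Thetas[j - 1] if j <= q else np.zeros((m, m)))
        for k in range(1, min(j, p) + 1):
            Pj = Pj + Phis[k - 1] @ Psi[j - k]
        Psi.append(Pj)
    return Psi
```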

Invertibility:

An ARMA( p, q) process \(\{\mathbf{X}_{t}\}\) is invertible if there exist matrices {Π j } with absolutely summable components such that

$$\displaystyle{ \mathbf{Z}_{t} =\sum _{ j=0}^{\infty }\varPi _{ j}\mathbf{X}_{t-j}\mbox{ for all }t. }$$
(8.4.9)

Invertibility is equivalent to the condition

$$\displaystyle{ \det \varTheta (z)\neq 0\mbox{ for all }z \in \mathbb{C}\mbox{ such that }\vert z\vert \leq 1. }$$
(8.4.10)

The matrices Π j are found recursively from the equations

$$\displaystyle{ \varPi _{j} = -\varPhi _{j} -\sum _{k=1}^{\infty }\varTheta _{ k}\varPi _{j-k},\quad j = 0,1,\ldots, }$$
(8.4.11)

where we define Φ 0 = −I, Φ j  = 0 for j > p, Θ j  = 0 for j > q, and Π j  = 0 for j < 0.

Example 8.4.2.

For the multivariate AR(1) process defined by (8.4.3), the recursions (8.4.8) give

$$\displaystyle{\varPsi _{0} = I,}$$
$$\displaystyle{\varPsi _{1} =\varPhi \varPsi _{0} =\varPhi,}$$
$$\displaystyle{\varPsi _{2} =\varPhi \varPsi _{1} =\varPhi ^{2},}$$
$$\displaystyle{\vdots}$$
$$\displaystyle{\varPsi _{j} =\varPhi \varPsi _{j-1} =\varPhi ^{\,j},\ j \geq 3,}$$

as already found in Example 8.4.1. □ 

Remark 3.

For the bivariate AR(1) process (8.4.3) with

$$\displaystyle{\varPhi = \left [\begin{array}{*{10}c} 0 &0.5\\ 0 & 0 \end{array} \right ]}$$

it is easy to check that \(\varPsi _{j} =\varPhi ^{\,j} = 0\) for j > 1 and hence that \(\{\mathbf{X}_{t}\}\) has the alternative representation

$$\displaystyle{\mathbf{X}_{t} = \mathbf{Z}_{t} +\varPhi \mathbf{Z}_{t-1}}$$

as an MA(1) process. This example shows that it is not always possible to distinguish between multivariate ARMA models of different orders without imposing further restrictions. If, for example, attention is restricted to pure AR processes, the problem does not arise. For detailed accounts of the identification problem for general ARMA( p, q) models see Hannan and Deistler (1988) and Lütkepohl (1993). □ 

8.4.1 The Covariance Matrix Function of a Causal ARMA Process

From (8.2.13) we can express the covariance matrix Γ(h) = E(X t+h X t ′) of the causal process (8.4.6) as

$$\displaystyle{ \varGamma (h) =\sum _{ j=0}^{\infty }\varPsi _{ h+j}\mathop{\varSigma \vert } \,\varPsi _{j}',\quad h = 0,\pm 1,\ldots, }$$
(8.4.12)

where the matrices Ψ j are found from (8.4.8) and Ψ j : = 0 for j < 0.

The covariance matrices Γ(h), h = 0, ±1, …, can also be found by solving the Yule–Walker equations

$$\displaystyle{ \varGamma (\,j) -\sum _{r=1}^{p}\varPhi _{ r}\varGamma (\,j - r) =\sum _{j\leq r\leq q}\varTheta _{r}\mathop{\varSigma \vert } \,\varPsi _{r-j},\quad j = 0,1,2,\ldots, }$$
(8.4.13)

obtained by postmultiplying (8.4.1) by X t−j ′ and taking expectations. The first p + 1 of the equations (8.4.13) can be solved for the components of Γ(0), …, Γ(p) using the fact that Γ(−h) = Γ′(h). The remaining equations then give Γ(p + 1), Γ(p + 2), … recursively. An explicit form of the solution of these equations can be written down by making use of Kronecker products and the vec operator (see e.g., Lütkepohl 1993).

Remark 4.

If z 0 is the root of detΦ(z) = 0 with smallest absolute value, then it can be shown from the recursions (8.4.8) that \(\varPsi _{j}r^{-j} \rightarrow 0\) as j → ∞ for all r such that \(\vert z_{0}\vert ^{-1} < r < 1\). Hence, there is a constant C such that each component of Ψ j is smaller in absolute value than \(Cr^{\,j}\). This implies in turn that there is a constant K such that each component of the matrix \(\varPsi _{h+j}\mathop{\varSigma \vert } \,\varPsi _{j}'\) on the right of (8.4.12) is bounded in absolute value by \(Kr^{\,2j}\). Provided that | z 0 | is not very close to 1, this means that the series (8.4.12) converges rapidly, and the error incurred in each component by truncating the series after the term with j = k − 1 is smaller in absolute value than \(\sum _{j=k}^{\infty }Kr^{\,2j} = Kr^{\,2k}/\left (1 - r^{\,2}\right )\).
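
In view of this rapid convergence, a truncated evaluation of (8.4.12) is usually adequate; a sketch (our function name; for an AR(1) the coefficients Ψ j are simply the powers \(\varPhi ^{\,j}\)):

```python
import numpy as np

def arma_gamma(Psi, Sigma, h):
    """Truncated (8.4.12): Gamma(h) ~ sum_{j=0}^{k-1} Psi_{h+j} Sigma Psi_j',
    where Psi is a sufficiently long list [Psi_0, Psi_1, ...] and h >= 0."""
    return sum(Psi[h + j] @ Sigma @ Psi[j].T for j in range(len(Psi) - h))

# AR(1) check: Gamma(0) should satisfy Gamma(0) = Phi Gamma(0) Phi' + Sigma.
Phi = np.array([[0.4, 0.1], [0.0, 0.3]])
Sigma = np.eye(2)
Psi = [np.linalg.matrix_power(Phi, j) for j in range(60)]
G0 = arma_gamma(Psi, Sigma, 0)
print(np.allclose(G0, Phi @ G0 @ Phi.T + Sigma))          # True (up to truncation error)
```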

8.5 Best Linear Predictors of Second-Order Random Vectors

Let \(\big\{\mathbf{X}_{t} = (X_{t1},\ldots,X_{tm})'\big\}\) be an m-variate time series with means \(E\mathbf{X}_{t} = \boldsymbol{\mu }_{t}\) and covariance function given by the m × m matrices

$$\displaystyle{K(i,j) = E\left (\mathbf{X}_{i}\mathbf{X}_{j}'\right ) -\boldsymbol{\mu }_{i}\boldsymbol{\mu }_{j}'.}$$

If Y = (Y 1, …, Y m )′ is a random vector with finite second moments and \(E\mathbf{Y} = \boldsymbol{\mu }\), we define

$$\displaystyle{ P_{n}(\mathbf{Y}) = (P_{n}Y _{1},\ldots,P_{n}Y _{m})', }$$
(8.5.1)

where P n Y j is the best linear predictor of the component Y j of Y in terms of all of the components of the vectors X t , t = 1, …, n, and the constant 1. It follows immediately from the properties of the prediction operator (Section 2.5) that

$$\displaystyle{ P_{n}(\mathbf{Y}) = \boldsymbol{\mu } + A_{1}(\mathbf{X}_{n} -\boldsymbol{\mu }_{n}) + \cdots + A_{n}(\mathbf{X}_{1} -\boldsymbol{\mu }_{1}) }$$
(8.5.2)

for some matrices A 1, …, A n , and that

$$\displaystyle{ \mathbf{Y} - P_{n}(\mathbf{Y}) \perp \mathbf{X}_{n+1-i},\ \ i = 1,\ldots,n, }$$
(8.5.3)

where we say that two m-dimensional random vectors X and Y are orthogonal (written X ⊥ Y) if E(X Y′) is a matrix of zeros. The vector of best predictors (8.5.1) is uniquely determined by (8.5.2) and (8.5.3), although it is possible that there may be more than one possible choice for A 1, …, A n .

As a special case of the above, if \(\{\mathbf{X}_{t}\}\) is a zero-mean time series, the best linear predictor \(\hat{\mathbf{X}}_{n+1}\) of X n+1 in terms of X 1, …, X n is obtained on replacing Y by X n+1 in (8.5.1). Thus

$$\displaystyle{\hat{\mathbf{X}}_{n+1} = \left \{\begin{array}{@{}l@{\quad }l@{}} \mathbf{0}, \quad &\mbox{ if }n = 0,\\ P_{ n}(\mathbf{X}_{n+1}),\quad &\mbox{ if }n \geq 1. \end{array} \right.}$$

Hence, we can write

$$\displaystyle{ \hat{\mathbf{X}}_{n+1} =\varPhi _{n1}\mathbf{X}_{n} + \cdots +\varPhi _{nn}\mathbf{X}_{1},\ \ n = 1,2,\ldots, }$$
(8.5.4)

where, from (8.5.3), the coefficients Φ nj ,  j = 1, …, n, are such that

$$\displaystyle{ E\left (\hat{\mathbf{X}}_{n+1}\mathbf{X}_{n+1-i}'\right ) = E\left (\mathbf{X}_{n+1}\mathbf{X}_{n+1-i}'\right ),\quad i = 1,\ldots,n, }$$
(8.5.5)

i.e.,

$$\displaystyle{\sum _{j=1}^{n}\varPhi _{ nj}K(n + 1 - j,n + 1 - i) = K(n + 1,n + 1 - i),\quad i = 1,\ldots,n.}$$

In the case where \(\{\mathbf{X}_{t}\}\) is stationary with K(i,  j) = Γ(i − j), the prediction equations simplify to the m-dimensional analogues of (2.5.7), i.e.,

$$\displaystyle{ \sum _{j=1}^{n}\varPhi _{ nj}\varGamma (i - j) =\varGamma (i),\quad i = 1,\ldots,n. }$$
(8.5.6)

Provided that the covariance matrix of the nm components of X 1, …, X n is nonsingular for every n ≥ 1, the coefficients {Φ nj } can be determined recursively using a multivariate version of the Durbin–Levinson algorithm given by Whittle (1963) (for details see Brockwell and Davis (1991), Proposition 11.4.1). Whittle’s recursions also determine the covariance matrices of the one-step prediction errors, namely, V 0 = Γ(0) and, for n ≥ 1,

$$\displaystyle{V _{n} = E(\mathbf{X}_{n+1} -\hat{\mathbf{X}}_{n+1})(\mathbf{X}_{n+1} -\hat{\mathbf{X}}_{n+1})'}$$
$$\displaystyle{ =\varGamma (0) -\varPhi _{n1}\varGamma (-1) -\cdots -\varPhi _{nn}\varGamma (-n). }$$
(8.5.7)
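
Whittle’s recursions solve (8.5.6) and (8.5.7) efficiently; for modest n the same quantities can be obtained by a direct solve of the block system, as in the following sketch (not Whittle’s algorithm, just brute force; the function name is ours, and Gamma is assumed to be a callable returning the m × m matrix Γ(h)):

```python
import numpy as np

def one_step_prediction(Gamma, n):
    """Solve (8.5.6) for Phi_n1, ..., Phi_nn and evaluate V_n from (8.5.7)."""
    m = Gamma(0).shape[0]
    A = np.block([[Gamma(i - j) for i in range(1, n + 1)] for j in range(1, n + 1)])
    B = np.hstack([Gamma(i) for i in range(1, n + 1)])      # [Gamma(1), ..., Gamma(n)]
    PhiRow = np.linalg.solve(A.T, B.T).T                    # PhiRow @ A = B
    Phis = [PhiRow[:, m * j:m * (j + 1)] for j in range(n)]
    Vn = Gamma(0) - sum(Phis[j] @ Gamma(-(j + 1)) for j in range(n))
    return Phis, Vn
```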

Remark 5.

The innovations algorithm also has a multivariate version that can be used for prediction in much the same way as the univariate version described in Section 2.5.4 (for details see Brockwell and Davis (1991), Proposition 11.4.2). □ 

8.6 Modeling and Forecasting with Multivariate AR Processes

If \(\{\mathbf{X}_{t}\}\) is any zero-mean second-order multivariate time series, it is easy to show from the results of Section 8.5 (Problem 8.4) that the one-step prediction errors \(\mathbf{X}_{j} -\hat{\mathbf{X}}_{j}\), j = 1, …, n, have the property

$$\displaystyle{ E\left (\mathbf{X}_{j} -\hat{\mathbf{X}}_{j}\right )\left (\mathbf{X}_{k} -\hat{\mathbf{X}}_{k}\right )' = 0\ \mathrm{for}\ j\neq k. }$$
(8.6.1)

Moreover, the matrix M such that

$$\displaystyle{ \left [\begin{array}{*{10}c} \mathbf{X}_{1} -\hat{\mathbf{X}}_{1} \\ \mathbf{X}_{2} -\hat{\mathbf{X}}_{2} \\ \mathbf{X}_{3} -\hat{\mathbf{X}}_{3}\\ \vdots \\ \mathbf{X}_{n} -\hat{\mathbf{X}}_{n} \end{array} \right ] = M\left [\begin{array}{*{10}c} \mathbf{X}_{1} \\ \mathbf{X}_{2} \\ \mathbf{X}_{3}\\ \vdots \\ \mathbf{X}_{n} \end{array} \right ] }$$
(8.6.2)

is lower triangular with ones on the diagonal and therefore has determinant equal to 1.

If the series \(\{\mathbf{X}_{t}\}\) is also Gaussian, then (8.6.1) implies that the prediction errors \(\mathbf{U}_{j} = \mathbf{X}_{j} -\hat{\mathbf{X}}_{j}\), j = 1, …, n, are independent with covariance matrices V 0, …, V n−1, respectively (as specified in (8.5.7)). Consequently, the joint density of the prediction errors is the product

$$\displaystyle{f(\mathbf{u}_{1},\ldots,\mathbf{u}_{n}) = (2\pi )^{-nm/2}\left (\prod _{ j=1}^{n}\mathrm{det}V _{ j-1}\right )^{-1/2}\exp \left [-{1 \over 2}\sum _{j=1}^{n}\mathbf{u}_{ j}'V _{j-1}^{-1}\mathbf{u}_{ j}\right ].}$$

Since the determinant of the matrix M in (8.6.2) is equal to 1, the joint density of the observations X 1, …, X n at x 1, …, x n is obtained on replacing u 1, …, u n in the last expression by the values of \(\mathbf{X}_{j} -\hat{\mathbf{X}}_{j}\) corresponding to the observations x 1, …, x n .

If we suppose that \(\{\mathbf{X}_{t}\}\) is a zero-mean m-variate AR( p) process with coefficient matrices \(\boldsymbol{\varPhi }=\{\varPhi _{1},\ldots,\varPhi _{p}\}\) and white noise covariance matrix \(\mathop{\varSigma \vert }\,\), we can therefore express the likelihood of the observations X 1, …, X n as

$$\displaystyle{L(\boldsymbol{\varPhi },\mathop{\varSigma \vert } \,) = (2\pi )^{-nm/2}\left (\prod _{ j=1}^{n}\mathrm{det}V _{ j-1}\right )^{-1/2}\exp \left [-{1 \over 2}\sum _{j=1}^{n}\mathbf{U}_{ j}'V _{j-1}^{-1}\mathbf{U}_{ j}\right ],}$$

where \(\mathbf{U}_{j} = \mathbf{X}_{j} -\hat{\mathbf{X}}_{j}\), j = 1, …, n, and \(\hat{\mathbf{X}}_{j}\) and V j are found from (8.5.4), (8.5.6), and (8.5.7).

Maximization of the Gaussian likelihood is much more difficult in the multivariate than in the univariate case because of the potentially large number of parameters involved and the fact that it is not possible to compute the maximum likelihood estimator of \(\boldsymbol{\varPhi }\) independently of \(\mathop{\varSigma \vert }\,\) as in the univariate case. In principle, maximum likelihood estimators can be computed with the aid of efficient nonlinear optimization algorithms, but it is important to begin the search with preliminary estimates that are reasonably close to the maximum. For pure AR processes good preliminary estimates can be obtained using Whittle’s algorithm or a multivariate version of Burg’s algorithm given by Jones (1978). We shall restrict our discussion here to the use of Whittle’s algorithm (the multivariate option AR-Model>Estimation>Yule-Walker in ITSM), but Jones’s multivariate version of Burg’s algorithm is also available (AR-Model>Estimation>Burg). Other useful algorithms can be found in Lütkepohl (1993), in particular the method of conditional least squares and the method of Hannan and Rissanen (1982), the latter being useful also for preliminary estimation in the more difficult problem of fitting ARMA( p, q) models with q > 0. Spectral methods of estimation for multivariate ARMA processes are also frequently used. A discussion of these (as well as some time-domain methods) is given in Anderson (1980).

Order selection for multivariate autoregressive models can be made by minimizing a multivariate analogue of the univariate AICC statistic

$$\displaystyle{ \mathrm{AICC} = -2\ln L(\varPhi _{1},\ldots,\varPhi _{p},\mathop{\varSigma \vert } \,) +{ 2(\,pm^{2} + 1)nm \over nm - pm^{2} - 2}. }$$
(8.6.3)

8.6.1 Estimation for Autoregressive Processes Using Whittle’s Algorithm

If \(\{\mathbf{X}_{t}\}\) is the (causal) multivariate AR( p) process defined by the difference equations

$$\displaystyle{ \mathbf{X}_{t} =\varPhi _{1}\mathbf{X}_{t-1} + \cdots +\varPhi _{p}\mathbf{X}_{t-p} + \mathbf{Z}_{t},\quad \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,), }$$
(8.6.4)

then postmultiplying by X t−j ′, j = 0, …, p, and taking expectations gives the equations

$$\displaystyle{ \mathop{\varSigma \vert } \,=\varGamma (0) -\sum _{j=1}^{p}\varPhi _{ j}\varGamma (-j) }$$
(8.6.5)

and

$$\displaystyle{ \varGamma (i) =\sum _{ j=1}^{p}\varPhi _{ j}\varGamma (i - j),\quad i = 1,\ldots,p. }$$
(8.6.6)

Given the matrices Γ(0), …, Γ(p), equation (8.6.6) can be used to determine the coefficient matrices Φ 1, …, Φ p . The white noise covariance matrix \(\mathop{\varSigma \vert }\,\) can then be found from (8.6.5). The solution of these equations for Φ 1, …, Φ p , and \(\mathop{\varSigma \vert }\,\) is identical to the solution of (8.5.6) and (8.5.7) for the prediction coefficient matrices Φ p1, …, Φ pp and the corresponding prediction error covariance matrix V p . Consequently, Whittle’s algorithm can be used to carry out the algebra.
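
The same kind of direct linear solve applied to (8.6.5) and (8.6.6) gives Φ 1, …, Φ p and the white noise covariance matrix from Γ(0), …, Γ(p); feeding in the sample matrices \(\hat{\varGamma }(\,j)\) instead yields the Yule–Walker estimators described next (a plain solve rather than Whittle’s recursions; the function name is ours).

```python
import numpy as np

def yule_walker_var(Gammas):
    """Solve (8.6.6) for Phi_1, ..., Phi_p and then (8.6.5) for the noise covariance,
    given Gammas = [Gamma(0), Gamma(1), ..., Gamma(p)]."""
    p, m = len(Gammas) - 1, Gammas[0].shape[0]
    G = lambda h: Gammas[h] if h >= 0 else Gammas[-h].T
    A = np.block([[G(i - j) for i in range(1, p + 1)] for j in range(1, p + 1)])
    B = np.hstack([G(i) for i in range(1, p + 1)])
    PhiRow = np.linalg.solve(A.T, B.T).T
    Phis = [PhiRow[:, m * j:m * (j + 1)] for j in range(p)]
    Sigma = G(0) - sum(Phis[j] @ G(-(j + 1)) for j in range(p))
    return Phis, Sigma
```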

The Yule–Walker estimators \(\hat{\varPhi }_{1},\ldots,\hat{\varPhi }_{p}\), and \(\hat{\mathop{\varSigma \vert }\,}\) for the model (8.6.4) fitted to the data X 1, …, X n are obtained by replacing Γ( j) in (8.6.5) and (8.6.6) by \(\hat{\varGamma }(\,j)\), j = 0, …, p, and solving the resulting equations for Φ 1, …, Φ p , and \(\mathop{\varSigma \vert }\,\). The solution of these equations is obtained from ITSM by selecting the multivariate option AR-Model>Estimation>Yule-Walker. The mean vector of the fitted model is the sample mean of the data, and Whittle’s algorithm is used to solve the equations (8.6.5) and (8.6.6) for the coefficient matrices and the white noise covariance matrix. The fitted model is displayed by ITSM in the form

$$\displaystyle{\mathbf{X}_{t} =\phi _{0} +\varPhi _{1}\mathbf{X}_{t-1} + \cdots +\varPhi _{p}\mathbf{X}_{t-p} + \mathbf{Z}_{t},\{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,).}$$

Note that the mean \(\boldsymbol{\mu }\) of this model is not the vector ϕ 0, but

$$\displaystyle{\boldsymbol{\mu } = (I -\varPhi _{1} -\cdots -\varPhi _{p})^{-1}\phi _{ 0}.}$$

In fitting multivariate autoregressive models using ITSM, check the box Find minimum AICC model to find the AR(p) model with 0 ≤ p ≤ 20 that minimizes the AICC value as defined in (8.6.3).

Analogous calculations using Jones’s multivariate version of Burg’s algorithm can be carried out by selecting AR-Model>Estimation>Burg.

Example 8.6.1.

The Dow Jones and All Ordinaries Indices

To find the minimum AICC Yule–Walker model (of order less than or equal to 20) for the bivariate series {(X t1, X t2)′, t = 1, …, 250} of Example 8.1.1, proceed as follows. Select File>Project>Open> Multivariate, click OK, and then double-click on the file name, DJAOPC2.TSM. Check that Number of columns is set to 2, the dimension of the observation vectors, and click OK again to see graphs of the two component time series. No differencing is required (recalling from Example 8.1.1 that \(\{X_{t1}\}\) and \(\{X_{t2}\}\) are the daily percentage price changes of the original Dow Jones and All Ordinaries Indices). Select AR-Model>Estimation>Yule-Walker, check the box Find minimum AICC Model, click OK, and you will obtain the model

$$\displaystyle{\left [\begin{array}{*{10}c} X_{t1} \\ X_{t2} \end{array} \right ] = \left [\begin{array}{*{10}c} 0.0288\\ 0.00836 \end{array} \right ]+\left [\begin{array}{*{10}c} -0.0148 &0.0357\\ 0.6589 &0.0998 \end{array} \right ]\left [\begin{array}{*{10}c} X_{t-1,1} \\ X_{t-1,2} \end{array} \right ]+\left [\begin{array}{*{10}c} Z_{t1} \\ Z_{t2} \end{array} \right ],}$$

where

$$\displaystyle{\left [\begin{array}{*{10}c} Z_{t1} \\ Z_{t2} \end{array} \right ] \sim \mathrm{WN}\left (\left [\begin{array}{*{10}c} 0\\ 0 \end{array} \right ],\left [\begin{array}{*{10}c} 0.3653 &0.0224\\ 0.0224 &0.6016 \end{array} \right ]\right ).}$$

 □ 
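
As a check on the formula for the mean just given, one can recover \(\boldsymbol{\mu }\) from the printed (rounded) coefficients of this fitted VAR(1) with a few lines of Python:

```python
import numpy as np

phi0 = np.array([0.0288, 0.00836])
Phi1 = np.array([[-0.0148, 0.0357],
                 [ 0.6589, 0.0998]])

# mu = (I - Phi_1)^{-1} phi_0 for a VAR(1)
mu = np.linalg.solve(np.eye(2) - Phi1, phi0)
print(mu)        # approximately [0.029, 0.031]
```

The second component, approximately 0.0309, agrees with the sample mean \(\hat{\mu }_{2}\) quoted in Example 8.6.3, as it must, since the Yule–Walker fit uses the sample mean of the data.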

Example 8.6.2.

Sales with a leading indicator

The series {Y t1} (leading indicator) and {Y t2} (sales) are stored in bivariate form (Y t1 in column 1 and Y t2 in column 2) in the file LS2.TSM. On opening this file in ITSM you will see the graphs of the two component time series. Inspection of the graphs immediately suggests, as in Example 8.2.2, that the differencing operator ∇ = 1 − B should be applied to the data before a stationary AR model is fitted. Select Transform>Difference and specify 1 for the differencing lag. Click OK and you will see the graphs of the two differenced series. Inspection of the series and their correlation functions (obtained by pressing the second yellow button at the top of the ITSM window) suggests that no further differencing is necessary. The next step is to select AR-Model>Estimation>Yule-Walker with the option Find minimum AICC model. The resulting model has order p = 5 and parameters ϕ 0 = (0.0328  0.0156)′,

$$\displaystyle\begin{array}{rcl} \hat{\varPhi }_{1}& =& \left [\begin{array}{*{10}c} -0.517 & 0.024\\ -0.019 &-0.051 \end{array} \right ],\hat{\varPhi }_{2} = \left [\begin{array}{*{10}c} -0.192 &-0.018\\ 0.047 & 0.250 \end{array} \right ],\hat{\varPhi }_{3} = \left [\begin{array}{*{10}c} -0.073 &0.010\\ 4.678 &0.207 \end{array} \right ], {}\\ \hat{\varPhi }_{4}& =& \left [\begin{array}{*{10}c} -0.032 &-0.009\\ 3.664 & 0.004 \end{array} \right ],\hat{\varPhi }_{5} = \left [\begin{array}{*{10}c} 0.022 &0.011\\ 1.300 &0.029 \end{array} \right ],\hat{\mathop{\varSigma \vert } \,}= \left [\begin{array}{*{10}c} 0.076 &-0.003\\ -0.003 & 0.095 \end{array} \right ], {}\\ \end{array}$$

with AICC = 109.49. (Analogous calculations using Burg’s algorithm give an AR(8) model for the differenced series.) The sample cross-correlations of the residual vectors \(\hat{\mathbf{Z}}_{t}\) can be plotted by clicking on the last blue button at the top of the ITSM window. These are nearly all within the bounds \(\pm 1.96/\sqrt{n}\), suggesting that the model is a good fit. The components of the residual vectors themselves are plotted by selecting AR Model>Residual Analysis>Plot Residuals. Simulated observations from the fitted model can be generated using the option AR Model>Simulate. The fitted model has the interesting property that the upper right component of each of the coefficient matrices is close to zero. This suggests that \(\{X_{t1}\}\) can be effectively modeled independently of \(\{X_{t2}\}\). In fact, the MA(1) model

$$\displaystyle{ X_{t1} = (1 - 0.474B)U_{t},\ \ \{U_{t}\} \sim \mathrm{WN}(0,0.0779), }$$
(8.6.7)

provides an adequate fit to the univariate series \(\{X_{t1}\}\). Inspecting the bottom rows of the coefficient matrices and deleting small entries, we find that the relation between \(\{X_{t1}\}\) and \(\{X_{t2}\}\) can be expressed approximately as

$$\displaystyle{ X_{t2 } =\, 0.250X_{t-2,2}\,+\,0.207X_{t-3,2}\,+\,4.678X_{t-3,1}\,+\,3.664X_{t-4,1}\,+\,1.300X_{t-5,1}\,+\,W_{t}, }$$

or equivalently,

$$\displaystyle{ X_{t2} ={ 4.678B^{3}(1 + 0.783B + 0.278B^{2}) \over 1 - 0.250B^{2} - 0.207B^{3}} X_{t1} +{ W_{t} \over 1 - 0.250B^{2} - 0.207B^{3}}, }$$
(8.6.8)

where {W t } ∼ WN(0, 0.095). Moreover, since the estimated noise covariance matrix is essentially diagonal, it follows that the two sequences \(\{X_{t1}\}\) and {W t } are uncorrelated. This reduced model defined by (8.6.7) and (8.6.8) is an example of a transfer function model that expresses the “output” series \(\{X_{t2}\}\) as the output of a linear filter with “input” \(\{X_{t1}\}\) plus added noise. A more direct approach to the fitting of transfer function models is given in Section 11.1 and applied to this same data set. □ 
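
The factorization used to pass from the preceding relation to (8.6.8) amounts to writing 4.678B^3 + 3.664B^4 + 1.300B^5 = 4.678B^3(1 + 0.783B + 0.278B^2), which is easily checked numerically (a minimal sketch, using the coefficients printed above):

```python
import numpy as np

b = np.array([4.678, 3.664, 1.300])   # coefficients of B^3, B^4, B^5 acting on X_{t1}
print(b / b[0])                       # approximately [1.0, 0.783, 0.278]
```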

8.6.2 Forecasting Multivariate Autoregressive Processes

The technique developed in Section 8.5 allows us to compute the minimum mean squared error one-step linear predictors \(\hat{\mathbf{X}}_{n+1}\) for any multivariate stationary time series from the mean \(\boldsymbol{\mu }\) and autocovariance matrices Γ(h) by recursively determining the coefficients Φ ni , i = 1, …, n, and evaluating

$$\displaystyle{ \hat{\mathbf{X}}_{n+1} = \boldsymbol{\mu } +\varPhi _{n1}(\mathbf{X}_{n} -\boldsymbol{\mu }) + \cdots +\varPhi _{nn}(\mathbf{X}_{1} -\boldsymbol{\mu }). }$$
(8.6.9)

The situation is simplified when \(\{\mathbf{X}_{t}\}\) is the causal AR(p) process defined by (8.6.4), since for n ≥ p (as is almost always the case in practice)

$$\displaystyle{ \hat{\mathbf{X}}_{n+1} =\varPhi _{1}\mathbf{X}_{n} + \cdots +\varPhi _{p}\mathbf{X}_{n+1-p}. }$$
(8.6.10)

To verify (8.6.10) it suffices to observe that the right-hand side has the required form (8.5.2) and that the prediction error

$$\displaystyle{\mathbf{X}_{n+1} -\varPhi _{1}\mathbf{X}_{n} -\cdots -\varPhi _{p}\mathbf{X}_{n+1-p} = \mathbf{Z}_{n+1}}$$

is orthogonal to X 1, …, X n in the sense of (8.5.3). (In fact, the prediction error is orthogonal to all X j , −∞ < j ≤ n, showing that if n ≥ p, then (8.6.10) is also the best linear predictor of X n+1 in terms of all components of X j , −∞ < j ≤ n.) The covariance matrix of the one-step prediction error is clearly \(E(\mathbf{Z}_{n+1}\mathbf{Z}_{n+1}') = \mathop{\varSigma \vert }\,\).

To compute the best h-step linear predictor P n X n+h based on all the components of X 1, …, X n we apply the linear operator P n to (8.6.4) to obtain the recursions

$$\displaystyle{ P_{n}\mathbf{X}_{n+h} =\varPhi _{1}P_{n}\mathbf{X}_{n+h-1} + \cdots +\varPhi _{p}P_{n}\mathbf{X}_{n+h-p}. }$$
(8.6.11)

These equations are easily solved recursively, first for P n X n+1, then for P n X n+2, P n X n+3, etc. If n ≥ p, then the h-step predictors based on all components of X j , −∞ < j ≤ n, also satisfy (8.6.11) and are therefore the same as the h-step predictors based on X 1, …, X n .
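
A minimal Python sketch of the recursions (8.6.10) and (8.6.11) is given below. The coefficient matrices are assumed known (or estimated as in Section 8.6.1), and the function name and arguments are illustrative only; predicted values are fed back into the recursion exactly as in (8.6.11).

```python
import numpy as np

def var_forecast(Phi, X, h, mu=None):
    """h-step predictors P_n X_{n+1}, ..., P_n X_{n+h} for a causal VAR(p).

    Phi : list [Phi_1, ..., Phi_p] of (m x m) coefficient matrices
    X   : (n x m) array of observations, n >= p
    mu  : process mean (zero vector if omitted)
    """
    p, m = len(Phi), X.shape[1]
    mu = np.zeros(m) if mu is None else np.asarray(mu)
    history = [X[-j] - mu for j in range(1, p + 1)]      # X_n - mu, ..., X_{n-p+1} - mu
    preds = []
    for _ in range(h):
        x_hat = sum(Phi[j] @ history[j] for j in range(p))   # (8.6.10) / (8.6.11)
        preds.append(mu + x_hat)
        history = [x_hat] + history[:-1]                 # predictors feed back in
    return np.array(preds)
```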

To compute the h-step error covariance matrices, recall from (8.4.6) that

$$\displaystyle{ \mathbf{X}_{n+h} =\sum _{ j=0}^{\infty }\varPsi _{ j}\mathbf{Z}_{n+h-j}, }$$
(8.6.12)

where the coefficient matrices Ψ j are found from the recursions (8.4.8) with q = 0. From (8.6.12) we find that for n ≥ p,

$$\displaystyle{ P_{n}\mathbf{X}_{n+h} =\sum _{ j=h}^{\infty }\varPsi _{ j}\mathbf{Z}_{n+h-j}. }$$
(8.6.13)

Subtracting (8.6.13) from (8.6.12) gives the h-step prediction error

$$\displaystyle{ \mathbf{X}_{n+h} - P_{n}\mathbf{X}_{n+h} =\sum _{ j=0}^{h-1}\varPsi _{ j}\mathbf{Z}_{n+h-j}, }$$
(8.6.14)

with covariance matrix

$$\displaystyle{ E\left [(\mathbf{X}_{n+h} - P_{n}\mathbf{X}_{n+h})(\mathbf{X}_{n+h} - P_{n}\mathbf{X}_{n+h})'\right ] =\sum _{ j=0}^{h-1}\varPsi _{ j}\mathop{\varSigma \vert } \,\varPsi _{j}',\ \ n \geq p. }$$
(8.6.15)
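
The error covariance calculation (8.6.15) can be sketched in the same style. Here the Ψ j are generated by the recursions (8.4.8) with q = 0, which for a causal AR(p) reduce to Ψ 0 = I and Ψ j = Φ 1 Ψ j−1 + ⋯ + Φ p Ψ j−p (with the convention Ψ i = 0 for i < 0); the function below is illustrative only.

```python
import numpy as np

def var_error_cov(Phi, Sigma, h):
    """h-step prediction error covariance (8.6.15) for a causal VAR(p)."""
    p, m = len(Phi), Sigma.shape[0]
    Psi = [np.eye(m)]                                    # Psi_0 = I
    for j in range(1, h):
        Psi.append(sum(Phi[k - 1] @ Psi[j - k] for k in range(1, min(j, p) + 1)))
    return sum(P @ Sigma @ P.T for P in Psi)             # sum_{j=0}^{h-1} Psi_j Sigma Psi_j'
```

For h = 1 this returns \(\mathop{\varSigma \vert }\,\), and for h = 2 it returns \(\mathop{\varSigma \vert }\,+\varPhi _{1}\mathop{\varSigma \vert }\,\varPhi _{1}'\), the matrices that appear again in Example 8.6.4 below.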

For the (not necessarily zero-mean) causal AR(p) process defined by

$$\displaystyle{\mathbf{X}_{t} =\phi _{0} +\varPhi _{1}\mathbf{X}_{t-1} + \cdots +\varPhi _{p}\mathbf{X}_{t-p} + \mathbf{Z}_{t},\ \ \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,),}$$

Equations (8.6.10) and (8.6.11) remain valid, provided that \(\boldsymbol{\phi }_{0}\) is added to each of their right-hand sides. The error covariance matrices are the same as in the case ϕ 0 = 0.

The above calculations are all based on the assumption that the AR( p) model for the series is known. However, in practice, the parameters of the model are usually estimated from the data, and the uncertainty in the predicted values of the series will be larger than indicated by (8.6.15) because of parameter estimation errors. See Lütkepohl (1993).

Example 8.6.3.

The Dow Jones and All Ordinaries Indices

The VAR(1) model fitted to the series {X t , t = 1, , 250} in Example 8.6.1 was

$$\displaystyle{\left [\begin{array}{*{10}c} X_{t1} \\ X_{t2} \end{array} \right ] = \left [\begin{array}{*{10}c} 0.0288\\ 0.00836 \end{array} \right ]+\left [\begin{array}{*{10}c} -0.0148 &0.0357\\ 0.6589 &0.0998 \end{array} \right ]\left [\begin{array}{*{10}c} X_{t-1,1} \\ X_{t-1,2} \end{array} \right ]+\left [\begin{array}{*{10}c} Z_{t1} \\ Z_{t2} \end{array} \right ],}$$

where

$$\displaystyle{\left [\begin{array}{*{10}c} Z_{t1} \\ Z_{t2} \end{array} \right ] \sim \mathrm{WN}\left (\left [\begin{array}{*{10}c} 0\\ 0 \end{array} \right ],\left [\begin{array}{*{10}c} 0.3653 &0.0224\\ 0.0224 &0.6016 \end{array} \right ]\right ).}$$

The one-step mean squared error for prediction of X t2, assuming the validity of this model, is thus 0.6016. This is a substantial reduction from the estimated mean squared error \(\hat{\gamma }_{22}(0) = 0.7712\) when the sample mean \(\hat{\mu }_{2} = 0.0309\) is used as the one-step predictor.

If we fit a univariate model to the series \(\{X_{t2}\}\) using ITSM, we find that the autoregression with minimum AICC value (645.0) is

$$\displaystyle{X_{t2} = 0.0273 + 0.1180X_{t-1,2} + Z_{t},\quad \{Z_{t}\} \sim \mathrm{WN}(0,0.7604).}$$

Assuming the validity of this model, we thus obtain a mean squared error for one-step prediction of 0.7604, which is slightly less than the estimated mean squared error (0.7712) incurred when the sample mean is used for one-step prediction.

The preceding calculations suggest that there is little to be gained from the point of view of one-step prediction by fitting a univariate model to \(\{X_{t2}\}\), while there is a substantial reduction achieved by the bivariate AR(1) model for {X t  = (X t1, X t2)′}.

To test the models fitted above, we consider the next forty values {X t , t = 251, …, 290}, which are stored in the file DJAOPCF.TSM. We can use these values, in conjunction with the bivariate and univariate models fitted to the data for t = 1, …, 250, to compute one-step predictors of \(X_{t2},\ t = 251,\ldots,290\). The results are as follows:

$$\displaystyle{\begin{array}{c@{\quad }c} \mathrm{Predictor}\quad &\mathrm{Average\ Squared\ Error} \\ \hat{\mu } = 0.0309\quad & 0.4706 \\ \mathrm{AR}(1) \quad & 0.4591 \\ \mathrm{VAR}(1) \quad & 0.3962 \end{array} }$$

It is clear from these results that the sample variance of the series {X t2, t = 251, …, 290} is rather less than that of the series {X t2, t = 1, …, 250}, and consequently, the average squared errors of all three predictors are substantially less than expected from the models fitted to the latter series. Both the AR(1) and VAR(1) models show an improvement in one-step average squared error over the sample mean \(\hat{\mu }\), but the improvement shown by the bivariate model is much more pronounced. □ 
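
The comparison in the table is easy to reproduce once the observations have been exported from ITSM in numerical form. The sketch below assumes (hypothetically) that the fitting-period data are available as an array x of shape (250, 2) and the next forty observations as xf of shape (40, 2); it uses the printed (rounded) coefficients of the fitted models.

```python
import numpy as np

phi0 = np.array([0.0288, 0.00836])
Phi1 = np.array([[-0.0148, 0.0357],
                 [ 0.6589, 0.0998]])

def avg_sq_errors(x, xf):
    """Average squared one-step prediction errors for X_{t2}, t = 251, ..., 290."""
    prev = x[-1]                                  # X_250
    mean_errs, ar1_errs, var1_errs = [], [], []
    for obs in xf:                                # t = 251, ..., 290
        var1_pred = phi0 + Phi1 @ prev            # bivariate VAR(1)
        ar1_pred = 0.0273 + 0.1180 * prev[1]      # univariate AR(1) for X_{t2}
        mean_errs.append((obs[1] - 0.0309) ** 2)  # sample-mean predictor
        ar1_errs.append((obs[1] - ar1_pred) ** 2)
        var1_errs.append((obs[1] - var1_pred[1]) ** 2)
        prev = obs
    return np.mean(mean_errs), np.mean(ar1_errs), np.mean(var1_errs)
```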

The calculation of predictors and their error covariance matrices for multivariate ARIMA and SARIMA processes is analogous to the corresponding univariate calculation, so we shall simply state the pertinent results. Suppose that {Y t } is a nonstationary process satisfying D(B)Y t  = U t , where D(z) = 1 − d 1 z − ⋯ − d r z^r is a polynomial with D(1) = 0 and {U t } is a causal invertible ARMA process with mean \(\boldsymbol{\mu }\). Then \(\mathbf{X}_{t} = \mathbf{U}_{t} -\boldsymbol{\mu }\) satisfies

$$\displaystyle{ \varPhi (B)\mathbf{X}_{t} =\varTheta (B)\mathbf{Z}_{t},\quad \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,). }$$
(8.6.16)

Under the assumption that the random vectors Y −r+1, …, Y 0 are uncorrelated with the sequence {Z t }, the best linear predictors \(\tilde{P}_{n}\mathbf{Y}_{j}\) of Y j , j > n > 0, based on 1 and the components of Y j , −r + 1 ≤ j ≤ n, are found as follows. Compute the observed values of U t  = D(B)Y t , t = 1, …, n, and use the ARMA model for \(\mathbf{X}_{t} = \mathbf{U}_{t} -\boldsymbol{\mu }\) to compute predictors P n U n+h . Then use the recursions

$$\displaystyle{ \tilde{P}_{n}\mathbf{Y}_{n+h} = P_{n}\mathbf{U}_{n+h} +\sum _{ j=1}^{r}d_{ j}\tilde{P}_{n}\mathbf{Y}_{n+h-j} }$$
(8.6.17)

to compute successively \(\tilde{P}_{n}\mathbf{Y}_{n+1}\), \(\tilde{P}_{n}\mathbf{Y}_{n+2}\), \(\tilde{P}_{n}\mathbf{Y}_{n+3}\), etc. The error covariance matrices are approximately (for large n)

$$\displaystyle{ E\left [(\mathbf{Y}_{n+h} -\tilde{ P}_{n}\mathbf{Y}_{n+h})(\mathbf{Y}_{n+h} -\tilde{ P}_{n}\mathbf{Y}_{n+h})'\right ] =\sum _{ j=0}^{h-1}\varPsi _{ j}^{{\ast}}\mathop{\varSigma \vert }\,{\varPsi _{ j}^{{\ast}}}', }$$
(8.6.18)

where Ψ j ∗ is the coefficient of z^j in the power series expansion

$$\displaystyle{\sum _{j=0}^{\infty }\varPsi _{ j}^{{\ast}}z^{\,j} = D(z)^{-1}\varPhi ^{-1}(z)\varTheta (z),\quad \vert z\vert < 1.}$$

The matrices Ψ j ∗ are most readily found from the recursions (8.4.8) after replacing Φ j , j = 1, …, p, by Φ j ∗, j = 1, …, p + r, where Φ j ∗ is the coefficient of z^j in D(z)Φ(z).
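
A minimal Python sketch of the recursion (8.6.17) follows. It assumes that the ARMA predictors P n U n+h (that is, \(\boldsymbol{\mu }\) plus the predictors of X n+h ) have already been computed, and that the differencing coefficients d 1, …, d r are known; for simple differencing, D(z) = 1 − z, so r = 1 and d 1 = 1. The function name is illustrative only.

```python
def arima_forecast(pred_U, Y_hist, d, h):
    """Recursion (8.6.17): P~_n Y_{n+h} = P_n U_{n+h} + sum_j d_j P~_n Y_{n+h-j}.

    pred_U : list [P_n U_{n+1}, ..., P_n U_{n+h}] of m-vectors (numpy arrays)
    Y_hist : list [..., Y_{n-1}, Y_n] of observed vectors (at least r of them)
    d      : list [d_1, ..., d_r] of differencing coefficients
    """
    r = len(d)
    Y = list(Y_hist)                 # extended below with the predictors themselves
    preds = []
    for k in range(h):
        y_hat = pred_U[k] + sum(d[j] * Y[-(j + 1)] for j in range(r))
        Y.append(y_hat)              # P~_n Y_{n+k+1} is reused at later steps
        preds.append(y_hat)
    return preds
```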

Remark 6.

In the special case where Θ(z) = I (i.e., in the purely autoregressive case) the expression (8.6.18) for the h-step error covariance matrix is exact for all n ≥ p (i.e., if there are at least p + r observed vectors). The program ITSM allows differencing transformations and subtraction of the mean before fitting a multivariate autoregression. Predicted values for the original series and the standard deviations of the prediction errors can be determined using the multivariate option Forecasting>AR Model. □ 

Remark 7.

In the multivariate case, simple differencing of the type discussed in this section where the same operator D(B) is applied to all components of the random vectors is rather restrictive. It is useful to consider more general linear transformations of the data for the purpose of generating a stationary series. Such considerations lead to the class of cointegrated models discussed briefly in Section 8.7 below. □ 

Example 8.6.4.

Sales with a leading indicator

Assume that the model fitted to the bivariate series {Y t , t = 0, …, 149} in Example 8.6.2 is correct, i.e., that

$$\displaystyle{\varPhi (B)\mathbf{X}_{t} = \mathbf{Z}_{t},\quad \{\mathbf{Z}_{t}\} \sim \mathrm{WN}\left (\mathbf{0},\hat{\mathop{\varSigma \vert } \,} \right ),}$$

where

$$\displaystyle{\mathbf{X}_{t} = (1 - B)\mathbf{Y}_{t} - (0.0228,0.420)',\quad t = 1,\ldots,149,}$$

\(\varPhi (B) = I -\hat{\varPhi }_{1}B -\cdots -\hat{\varPhi }_{5}B^{5}\), and \(\hat{\varPhi }_{1},\ldots,\hat{\varPhi }_{5},\hat{\mathop{\varSigma \vert }\,}\) are the matrices found in Example 8.6.2. Then the one- and two-step predictors of X 150 and X 151 are obtained from (8.6.11) as

$$\displaystyle{P_{149}\mathbf{X}_{150} =\hat{\varPhi } _{1}\mathbf{X}_{149}+\cdots +\hat{\varPhi }_{5}\mathbf{X}_{145} = \left [\begin{array}{*{10}c} 0.163\\ -0.217 \end{array} \right ]}$$

and

$$\displaystyle{P_{149}\mathbf{X}_{151} =\hat{\varPhi } _{1}P_{149}\mathbf{X}_{150}+\hat{\varPhi }_{2}\mathbf{X}_{149}+\cdots +\hat{\varPhi }_{5}\mathbf{X}_{146} = \left [\begin{array}{*{10}c} -0.027\\ 0.816 \end{array} \right ]}$$

with error covariance matrices, from (8.6.15),

$$\displaystyle{\mathop{\varSigma \vert }\,= \left [\begin{array}{*{10}c} 0.076 &-0.003\\ -0.003 & 0.095 \end{array} \right ]}$$

and

$$\displaystyle{\mathop{\varSigma \vert }\,+\hat{\varPhi }_{1}\mathop{\varSigma \vert } \,\hat{\varPhi }_{1}' = \left [\begin{array}{*{10}c} 0.096 &-0.002\\ -0.002 & 0.095 \end{array} \right ],}$$

respectively.

Similarly, the one- and two-step predictors of Y 150 and Y 151 are obtained from (8.6.17) as

$$\displaystyle{\tilde{P}_{149}\mathbf{Y}_{150} = \left [\begin{array}{*{10}c} 0.0228\\ 0.420 \end{array} \right ]+P_{149}\mathbf{X}_{150}+\mathbf{Y}_{149} = \left [\begin{array}{*{10}c} 13.59\\ 262.90 \end{array} \right ]}$$

and

$$\displaystyle{\tilde{P}_{149}\mathbf{Y}_{151} = \left [\begin{array}{*{10}c} 0.0228\\ 0.420 \end{array} \right ]+P_{149}\mathbf{X}_{151}+\tilde{P}_{149}\mathbf{Y}_{150} = \left [\begin{array}{*{10}c} 13.59\\ 264.14 \end{array} \right ]}$$

with error covariance matrices, from (8.6.18),

$$\displaystyle{\mathop{\varSigma \vert }\,= \left [\begin{array}{*{10}c} 0.076 & -0.003\\ -0.003 & 0.095 \end{array} \right ]}$$

and

$$\displaystyle{\mathop{\varSigma \vert }\,+\left (I +\hat{\varPhi } _{1}\right )\mathop{\varSigma \vert } \,\left (I +\hat{\varPhi } _{1}\right )' = \left [\begin{array}{*{10}c} 0.094 & -0.003\\ -0.003 & 0.181 \end{array} \right ],}$$

respectively. The predicted values and the standard deviations of the predictors can easily be verified with the aid of the program ITSM. It is also of interest to compare the results with those obtained by fitting a transfer function model to the data as described in Section 11.1 below. □ 
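
The error covariance matrices quoted in this example are easily verified numerically; a short Python check using the printed values of \(\hat{\mathop{\varSigma \vert }\,}\) and \(\hat{\varPhi }_{1}\) from Example 8.6.2:

```python
import numpy as np

Sigma = np.array([[ 0.076, -0.003],
                  [-0.003,  0.095]])
Phi1  = np.array([[-0.517,  0.024],
                  [-0.019, -0.051]])
I = np.eye(2)

# Two-step error covariance for X, from (8.6.15):
print(Sigma + Phi1 @ Sigma @ Phi1.T)              # approx [[0.096, -0.002], [-0.002, 0.095]]

# Two-step error covariance for Y, from (8.6.18), where Psi*_1 = I + Phi_1:
print(Sigma + (I + Phi1) @ Sigma @ (I + Phi1).T)  # approx [[0.094, -0.003], [-0.003, 0.181]]
```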

8.7 Cointegration

We have seen that nonstationary univariate time series can frequently be made stationary by applying the differencing operator ∇ = 1 − B repeatedly. If \(\big\{\nabla ^{d}X_{t}\big\}\) is stationary for some positive integer d but \(\big\{\nabla ^{d-1}X_{t}\big\}\) is nonstationary, we say that {X t } is integrated of order d, or more concisely, {X t } ∼ I(d). Many macroeconomic time series are found to be integrated of order 1.

If \(\{\mathbf{X}_{t}\}\) is a k-variate time series, we define \(\big\{\nabla ^{d}\mathbf{X}_{t}\big\}\) to be the series whose jth component is obtained by applying the operator (1 − B)^d to the jth component of \(\{\mathbf{X}_{t}\}\), j = 1, …, k. The idea of a cointegrated multivariate time series was introduced by Granger (1981) and developed by Engle and Granger (1987). Here we use the slightly different definition of Lütkepohl (1993). We say that the k-dimensional time series \(\{\mathbf{X}_{t}\}\) is integrated of order d (or {X t } ∼ I(d)) if d is a positive integer, \(\big\{\nabla ^{d}\mathbf{X}_{t}\big\}\) is stationary, and \(\big\{\nabla ^{d-1}\mathbf{X}_{t}\big\}\) is nonstationary. The I(d) process \(\{\mathbf{X}_{t}\}\) is said to be cointegrated with cointegration vector \(\boldsymbol{\alpha }\) if \(\boldsymbol{\alpha }\) is a k × 1 vector such that \(\{\boldsymbol{\alpha }'\mathbf{X}_{t}\}\) is integrated of order less than d.

Example 8.7.1.

A simple example is provided by the bivariate process whose first component is the random walk

$$\displaystyle{X_{t} =\sum _{ j=1}^{t}Z_{ j},\quad t = 1,2,\ldots,\quad \{Z_{t}\} \sim \mathrm{IID}\left (0,\sigma ^{2}\right ),}$$

and whose second component consists of noisy observations of the same random walk,

$$\displaystyle{Y _{t} = X_{t} + W_{t},\quad t = 1,2,\ldots,\quad \{W_{t}\} \sim \mathrm{IID}\left (0,\tau ^{2}\right ),}$$

where {W t } is independent of {Z t }. Then {(X t , Y t )′} is integrated of order 1 and cointegrated with cointegration vector \(\boldsymbol{\alpha }= (1,-1)'\).

The notion of cointegration captures the idea of univariate nonstationary time series “moving together.” Thus, even though {X t } and {Y t } in Example 8.7.1 are both nonstationary, they are linked in the sense that they differ only by the stationary sequence {W t }. Series that behave in a cointegrated manner are often encountered in economics. Engle and Granger (1991) give as an illustrative example the prices of tomatoes U t and V t in Northern and Southern California. These are linked by the fact that if one were to increase sufficiently relative to the other, the profitability of buying in one market and selling for a profit in the other would tend to push the prices (U t , V t )′ toward the straight line v = u in \(\mathbb{R}^{2}\). This line is said to be an attractor for (U t , V t )′, since although U t and V t may both vary in a nonstationary manner as t increases, the points (U t , V t )′ will exhibit relatively small random deviations from the line v = u. □ 
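
The behavior described in Example 8.7.1 is easy to see in simulation. The following Python sketch (taking σ = τ = 1 for illustration) generates the two components and shows that the linear combination determined by the cointegration vector α = (1, −1)′ behaves like the stationary sequence {−W t }, while the components themselves wander like random walks.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
Z = rng.normal(0.0, 1.0, n)          # sigma^2 = 1
W = rng.normal(0.0, 1.0, n)          # tau^2 = 1, independent of {Z_t}

X = np.cumsum(Z)                     # random walk
Y = X + W                            # noisy observations of the same walk

print(np.var(X), np.var(Y))          # sample variances along the path: large
print(np.var(X - Y))                 # close to tau^2 = 1, since X - Y = -W
```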

Example 8.7.2.

If we apply the operator ∇ = 1 − B to the bivariate process defined in Example 8.7.1 in order to render it stationary, we obtain the series (U t , V t )′, where

$$\displaystyle{U_{t} = Z_{t}}$$

and

$$\displaystyle{V _{t} = Z_{t} + W_{t} - W_{t-1}.}$$

The series {(U t , V t )′} is clearly a stationary multivariate MA(1) process

$$\displaystyle{\left [\begin{array}{*{10}c} U_{t} \\ V _{t} \end{array} \right ] = \left [\begin{array}{*{10}c} 1 &0\\ 0 &1 \end{array} \right ]\left [\begin{array}{*{10}c} Z_{t} \\ Z_{t} + W_{t} \end{array} \right ]-\left [\begin{array}{*{10}c} 0 &0\\ -1 &1 \end{array} \right ]\left [\begin{array}{*{10}c} Z_{t-1} \\ Z_{t-1} + W_{t-1} \end{array} \right ].}$$

However, the process {(U t , V t )′} cannot be represented as an AR() process, since the matrix \(\left [\begin{array}{*{10}c} 1&0\\ 0 &1 \end{array} \right ]-z\left [\begin{array}{*{10}c} 0 &0\\ -1 &1 \end{array} \right ]\) has zero determinant when z = 1, thus violating condition (8.4.10). Care is therefore needed in the estimation of parameters for such models (and the closely related error-correction models). We shall not go into the details here but refer the reader to Engle and Granger (1987) and Lütkepohl (1993). □ 
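
The failure of condition (8.4.10) noted above can be confirmed with a one-line numerical check (a sketch; the matrix is the MA(1) coefficient appearing in the display above):

```python
import numpy as np

M = np.array([[ 0.0, 0.0],
              [-1.0, 1.0]])

# det(I - Mz) at z = 1; a zero value rules out an AR(infinity) representation
print(np.linalg.det(np.eye(2) - M))      # 0.0
```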

Problems

  1. 8.1

    Let {Y t } be a stationary process and define the bivariate process X t1 = Y t , X t2 = Y t−d , where d ≠ 0. Show that {(X t1, X t2)′} is stationary and express its cross-correlation function in terms of the autocorrelation function of {Y t }. If ρ Y (h) → 0 as h → ∞, show that there exists a lag k for which ρ 12(k) > ρ 12(0).

  2. 8.2

    Show that the covariance matrix function of the multivariate linear process defined by (8.2.12) is as specified in (8.2.13).

  3. 8.3

    Let \(\{\mathbf{X}_{t}\}\) be the bivariate time series whose components are the MA(1) processes defined by

    $$\displaystyle{X_{t1} = Z_{t,1} + 0.8Z_{t-1,1},\quad \{Z_{t1}\} \sim \mathrm{IID}\left (0,\sigma _{1}^{2}\right ),}$$

    and

    $$\displaystyle{X_{t2} = Z_{t,2} - 0.6Z_{t-1,2},\quad \{Z_{t2}\} \sim \mathrm{IID}\left (0,\sigma _{2}^{2}\right ),}$$

    where the two sequences {Z t1} and {Z t2} are independent.

    1. a.

      Find a large-sample approximation to the variance of \(n^{1/2}\hat{\rho }_{12}(h)\).

    2. b.

      Find a large-sample approximation to the covariance of \(n^{1/2}\hat{\rho }_{12}(h)\) and \(n^{1/2}\hat{\rho }_{12}(k)\) for h ≠ k.

  4. 8.4

    Use the characterization (8.5.3) of the multivariate best linear predictor of Y in terms of {X 1, …, X n } to establish the orthogonality of the one-step prediction errors \(\mathbf{X}_{j} -\hat{\mathbf{X}}_{j}\) and \(\mathbf{X}_{k} -\hat{\mathbf{X}}_{k}\), j ≠ k, as asserted in (8.6.1).

  5. 8.5

    Determine the covariance matrix function of the ARMA(1,1) process satisfying

    $$\displaystyle{\mathbf{X}_{t} -\varPhi \mathbf{X}_{t-1} = \mathbf{Z}_{t} +\varTheta \mathbf{Z}_{t-1},\quad \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},I_{2}),}$$

    where I 2 is the 2 × 2 identity matrix and \(\varPhi =\varTheta ' = \left [\begin{array}{*{10}c} 0.5&0.5\\ 0 &0.5\end{array} \right ]\).

  6. 8.6
    1. a.

      Let {X t } be a causal AR( p) process satisfying the recursions

      $$\displaystyle{\mathbf{X}_{t} =\varPhi _{1}\mathbf{X}_{t-1} + \cdots +\varPhi _{p}\mathbf{X}_{t-p} + \mathbf{Z}_{t},\ \ \{\mathbf{Z}_{t}\} \sim \mathrm{WN}(\mathbf{0},\mathop{\varSigma \vert } \,).}$$

      For n ≥ p write down recursions for the predictors P n X n+h , h ≥ 0, and find explicit expressions for the error covariance matrices in terms of the AR coefficients and \(\mathop{\varSigma \vert }\,\) when h = 1, 2, and 3.

    2. b.

      Suppose now that {Y t } is the multivariate ARIMA( p, 1, 0) process satisfying ∇Y t  = X t , where {X t } is the AR process in (a). Assuming that E(Y 0 X t ′) = 0, for t ≥ 1, show (using (8.6.17) with r = 1 and d = 1) that

      $$\displaystyle{\tilde{P}_{n}(\mathbf{Y}_{n+h}) = \mathbf{Y}_{n} +\sum _{ j=1}^{h}P_{ n}\mathbf{X}_{n+j},}$$

      and derive the error covariance matrices when h = 1, 2, and 3. Compare these results with those obtained in Example 8.6.4.

  7. 8.7

    Use the program ITSM to find the minimum AICC AR model of order less than or equal to 20 for the bivariate series {(X t1, X t2)′, t = 1, …, 200} with components filed as APPJK2.TSM. Use the fitted model to predict (X t1, X t2)′, t = 201, 202, 203 and estimate the error covariance matrices of the predictors (assuming that the fitted model is appropriate for the data).

  8. 8.8

    Let {X t1, t = 1, …, 63} and {X t2, t = 1, …, 63} denote the differenced series \(\{\nabla \ln Y _{t1}\}\) and {∇lnY t2}, where {Y t1} and {Y t2} are the annual mink and muskrat trappings (filed as APPH.TSM and APPI.TSM, respectively).

    1. a.

      Use ITSM to construct and save the series \(\{X_{t1}\}\) and \(\{X_{t2}\}\) as univariate data files X1.TSM and X2.TSM, respectively. (After making the required transformations press the red EXP button and save each transformed series to a file with the appropriate name.) To enter X1 and X2 as a bivariate series in ITSM, open X1 as a multivariate series with Number of columns equal to 1. Then open X2 as a univariate series. Click the project editor button (at the top left of the ITSM window), click on the plus signs next to the projects X1.TSM and X2.TSM, then click on the series that appears just below X2.TSM and drag it to the first line of the project X1.TSM. It will then be added as a second component, making X1.TSM a bivariate project consisting of the two component series X1 and X2. Click OK to close the project editor and close the ITSM window labeled X2.TSM. You will then see the graphs of X1 and X2. Press the second yellow button to see the correlation functions of \(\{X_{t1}\}\) and \(\{X_{t2}\}\). For more information on the project editor in ITSM consult the Project Editor section of the PDF file ITSM_HELP.

    2. b.

      Conduct a test for independence of the two series \(\{X_{t1}\}\) and \(\{X_{t2}\}\).

  9. 8.9

    Use ITSM to open the data file STOCK7.TSM, which contains the daily returns on seven different stock market indices from April 27th, 1998, through April 9th, 1999. (Consult the Data Sets section of the PDF file ITSM_HELP for more information.) Fit a multivariate autoregression to the trivariate series consisting of the returns on the Dow Jones Industrials, All Ordinaries, and Nikkei indices. Check the model for goodness of fit and interpret the results.