Keywords

JEL Classifications

Introduction

This article reviews the derivation of formulas for linear least squares and robust prediction of stationary time series and geometrically discounted distributed leads of such series. The derivations employed are the classical, frequency-domain procedures employed by Whittle (1983) and Whiteman (1983), and result in nearly closed-form expressions. The formulas themselves are useful directly in forecasting, and have also found uses in economic modelling, primarily in macroeconomics. Indeed, Hansen and Sargent (1980) refer to the cross-equation restrictions connecting the time series representation of driving variables to the analogous representation for predicting the present value of such variables as the ‘hallmark of rational expectations models’.

The Wold Representation

Suppose that {xt} is a covariance-stationary stochastic process and assume (without loss of generality) that Ext = 0. Covariance stationarity ensures that first and second unconditional moments of the process do not vary with time. Then, by the Wold decomposition theorem (see Sargent 1987, for an elementary exposition and proof), xt can be represented by:

$$ {x}_t=\sum\limits_{j=0}^{\infty }{a}_j{\varepsilon}_{t-j} $$
(1)

with

$$ {a}_0=1,\sum\limits_{j=0}^{\infty}\kern0.24em {a}_j^2<\infty $$

and

$$ {\varepsilon}_t={x}_t-P\left({x}_t|{x}_{t-1},{x}_{t-2},\dots \right),E{\varepsilon}_t^2={\sigma}^2 $$

where P(xt|xt−1, xt−2, …) denotes the linear least squares projection (population regression) of xt on xt−1, xt−2, … Here, ‘represented by’ need not mean ‘generated by’, but rather ‘has the same variance and covariance structure as’. By construction, the ‘fundamental’ innovation εt is uncorrelated with information dated prior to t, including earlier values of the process itself: \( E{\varepsilon}_t{x}_{t-s}=0 \) for all s > 0. This fact makes the Wold representation very convenient for computing predictions. The convolution in (1) is often written xt = A(L)εt using the polynomial \( A(L)={\sum\limits}_{j=0}^{\infty}\kern0.24em {a}_j{L}^j \) in the ‘lag operator’ L, where \( L{\varepsilon}_t={\varepsilon}_{t-1} \).
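As a concrete illustration (not part of the original exposition; the value a = 0.8 and the truncation length are assumptions), the following sketch builds a truncated Wold representation for an AR(1) process and checks that it reproduces the simulated path and the variance implied by \( {\sigma}^2{\sum}_j{a}_j^2 \):

```python
import numpy as np

# Illustrative sketch (values assumed): the Wold coefficients of the AR(1)
# x_t = a x_{t-1} + eps_t are a_j = a**j, so that A(L) = 1/(1 - aL).
rng = np.random.default_rng(0)
a, sigma, n, J = 0.8, 1.0, 100_000, 200

eps = rng.normal(0.0, sigma, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + eps[t]

wold = a ** np.arange(J)                     # truncated Wold weights a_j
x_wold = np.convolve(eps, wold)[:n]          # x_t ~= sum_j a_j eps_{t-j}

print(np.max(np.abs(x[J:] - x_wold[J:])))    # ~ 0: same path up to truncation
print(x.var(), sigma**2 * wold @ wold, sigma**2 / (1 - a**2))  # all ~ 2.78
```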

Squared-Error Loss Optimal Prediction

The optimal prediction problem under squared-error loss can be thought of as follows. Given {xt} with the Wold representation (1) we want to find the stochastic process yt,

$$ {y}_t=\sum\limits_{j=0}^{\infty}\kern0.24em {c}_j{\varepsilon}_{t-j}=C(L){\varepsilon}_t $$

that will minimize the squared forecast error of the h-step ahead prediction

$$ \underset{\left\{{y}_t\right\}}{\min}\;E{\left({x}_{t+h}-{y}_t\right)}^2. $$

Equivalently, the problem can be written as

$$ \underset{\left\{{y}_t\right\}}{\min}\;E{\left({L}^{-h}{x}_t-{y}_t\right)}^2 $$

or

$$ \underset{\left\{{c}_j\right\}}{\min}\;E{\left({L}^{-h}\sum\limits_{j=0}^{\infty}\kern0.24em {a}_j{\varepsilon}_{t-j}-\sum\limits_{j=0}^{\infty}\kern0.24em {c}_j{\varepsilon}_{t-j}\right)}^2. $$
(2)

The problem in (2) involves finding a sequence of coefficients in the Wold representation of the unknown prediction process yt, and is referred to as the time domain problem. By virtue of the Riesz–Fisher theorem (see again Sargent 1987, for an exposition), the time-domain problem is equivalent to a frequency domain problem of finding an analytic function C(z) on the unit disk |z| ≤ 1 corresponding to the ‘z-transform’ of the {cj} sequence

$$ C(z)=\sum\limits_{j=0}^{\infty}\kern0.24em {c}_j{z}^j $$

that solves

$$ \underset{C(z)\in {H}^2}{\min}\frac{1}{2\pi i}\oint {\left|{z}^{-h}A(z)-C(z)\right|}^2\frac{dz}{z} $$
(3)

where H2 denotes the Hardy space of square-integrable analytic functions on the unit disk, and ∮ denotes (counterclockwise) integration about the unit circle. The requirement that C(z) ∈ H2 ensures that the forecast is causal, and contains no future values of the ε’s; this is equivalent to the requirement that C(z) have a well-behaved power series expansion in non-negative powers of z.

Each formulation of the problem is useful, as often one or the other will be simpler to solve. This stems from the fact that convolution in the time domain becomes multiplication in the frequency domain and vice versa. To see this, consider the two sequences \( {\left\{{g}_k\right\}}_{k=-\infty}^{\infty } \) and \( {\left\{{h}_k\right\}}_{k=-\infty}^{\infty } \). The convolution of {gk} and {hk} is the sequence {fk}, in which a typical element would be:

$$ {f}_k=\sum\limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{h}_{k-j}. $$

The z-transform of the convolution is given by

$$ {\displaystyle \begin{array}{ll}\sum\limits \limits_{k=-\infty}^{\infty }{f}_k{z}^k& =\sum\limits \limits_{k=-\infty}^{\infty}\left(\sum\limits \limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{h}_{k-j}\right){z}^k\hfill \\ {}& =\sum\limits \limits_{k=-\infty}^{\infty}\;\sum\limits \limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{z}^j{h}_{k-j}{z}^{k-j}\hfill \\ {}& =\sum\limits \limits_{\left(k-j\right)=-\infty}^{\infty}\;\sum\limits \limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{z}^j{h}_{k-j}{z}^{k-j}\hfill \\ {}& =\sum\limits \limits_{s=-\infty}^{\infty}\;\sum\limits \limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{z}^j{h}_s{z}^s\;\left(\mathrm{Substituting}\;s=k-j\right)\hfill \\ {}& =\sum\limits \limits_{s=-\infty}^{\infty }{h}_s{z}^s\sum\limits \limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{z}^j=g(z)h(z).\hfill \end{array}} $$

Thus the ‘z-transform’ of the convolution of the sequences {gk} and {hk} is the product of the z-transforms of the two sequences.
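A quick numerical check of this duality, using two short sequences whose values are arbitrary assumptions chosen for illustration:

```python
import numpy as np

# Check (with two arbitrary, assumed sequences) that the z-transform of a
# convolution is the product of the z-transforms.
g = np.array([1.0, 0.5, 0.25])            # g(z) = 1 + 0.5z + 0.25z**2
h = np.array([2.0, -1.0, 0.3, 0.1])       # h(z) = 2 - z + 0.3z**2 + 0.1z**3
f = np.convolve(g, h)                     # f_k = sum_j g_j h_{k-j}

ztrans = lambda c, z: sum(ck * z**k for k, ck in enumerate(c))
z = 0.7 * np.exp(1.3j)                    # an arbitrary point with |z| <= 1
print(ztrans(f, z), ztrans(g, z) * ztrans(h, z))   # equal up to rounding
```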

Similarly, the z-transform of the product of two sequences is the convolution of the z-transforms:

$$ \sum\limits_{k=-\infty}^{\infty}\kern0.24em {g}_k{h}_k{z}^k=\frac{1}{2\pi i}\oint g(p)h\left(z/p\right)\frac{dp}{p}. $$

To see why this is the case, note that

$$ g(p)h\left(z/p\right){p}^{-1}=\sum\limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{p}^j\sum\limits_{k=-\infty}^{\infty}\kern0.24em {h}_k{z}^k{p}^{-k-1}, $$

implying

$$ {\displaystyle \begin{array}{ll}\hfill & \frac{1}{2\pi i}\oint g(p)h\left(z/p\right){p}^{-1} dp\\ {}& =\frac{1}{2\pi i}\oint \sum\limits \limits_{j=-\infty}^{\infty}\kern0.5em \sum\limits \limits_{k=-\infty}^{\infty}\kern0.24em {g}_j{h}_k{z}^k{p}^{j-k-1} dp.\hfill \end{array}} $$

But all of the terms vanish except where j = k because

$$ \frac{1}{2\pi i}\oint {z}^k\frac{dz}{z}=0 $$

except when k = 0. To see why, let \( z={e}^{i\theta} \). As θ increases from 0 to 2π, z goes around the unit circle. So, since \( dz=i{e}^{i\theta }d\theta \), we have that

$$ \frac{1}{2\pi i}\oint {z}^k\frac{dz}{z}=\frac{i}{2\pi i}\oint {e}^{i\theta k} d\theta =\left\{\begin{array}{ll}1\hfill & \mathrm{if}\kern0.6em k=0\hfill \\ {}{\left.\frac{1}{2\pi}\frac{1}{i k}{e}^{i\theta k}\right|}_0^{2\pi }=0\hfill & \mathrm{otherwise}.\hfill \end{array}\right. $$

Thus,

$$ \frac{1}{2\pi i}\oint g(p)h\left(z/p\right){p}^{-1} dp=\sum\limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{h}_j{z}^j\frac{1}{2\pi i}\oint \frac{dp}{p}=\sum\limits_{j=-\infty}^{\infty}\kern0.24em {g}_j{h}_j{z}^j $$

by Cauchy’s Integral formula.

The frequency domain formulas can now be used to calculate moments quickly and conveniently. Consider \( {Ex}_t^2 \):

$$ {Ex}_t^2=E{\left(A(L){\varepsilon}_t\right)}^2=E{\left(\sum\limits_{j=0}^{\infty }{a}_j{\varepsilon}_{t-j}\right)}^2={\sigma}_{\varepsilon}^2\sum\limits_{j=0}^{\infty }{a}_j^2. $$
(4)

The result in Eq. (4) comes from the fact that \( E{\varepsilon}_t{\varepsilon}_{t-s}=0 \) for all s ≠ 0. Using the product-convolution relation, we see that

$$ \sum\limits_{j=0}^{\infty }{a}_j^2={\left.\sum\limits_{j=0}^{\infty }{a}_j^2{z}^j\right|}_{z=1}={\left.\frac{1}{2\pi i}\oint A(p)A\left(z/p\right)\frac{dp}{p}\right|}_{z=1}=\frac{1}{2\pi i}\oint A(p)A\left({p}^{-1}\right)\frac{dp}{p}=\frac{1}{2\pi i}\oint {\left|A(z)\right|}^2\frac{dz}{z}. $$
(5)
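A numerical sketch of (5) for an assumed AR(1) example with a = 0.8: the time-domain sum, the average of |A(z)|² around the unit circle, and the closed form 1/(1 − a²) all agree.

```python
import numpy as np

# Sketch of (5) for an assumed AR(1) with a = 0.8, A(z) = 1/(1 - a z):
# sum_j a_j**2 equals the average of |A(e^{i w})|**2 around the unit circle.
a = 0.8
A = lambda z: 1.0 / (1.0 - a * z)

time_domain = sum(a ** (2 * j) for j in range(2000))          # sum_j a_j**2
w = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
freq_domain = np.mean(np.abs(A(np.exp(1j * w))) ** 2)

print(time_domain, freq_domain, 1.0 / (1.0 - a**2))           # all ~ 2.7778
```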

Returning to the prediction problem, the task is to choose c0, c1, c2, … to

$$ \underset{\left\{{c}_j\right\}}{\min}\frac{1}{2\pi i}\oint {\left|{z}^{-h}A(z)-\sum\limits_{j=0}^{\infty}\kern0.24em {c}_j{z}^j\right|}^2\frac{dz}{z}. $$
(6)

The first-order conditions for the optimization in expression (6) are

$$ {\displaystyle \begin{array}{ll}0& =\frac{1}{2\pi i}\oint \left\{{z}^j\left[{z}^hA\left({z}^{-1}\right)-C\left({z}^{-1}\right)\right]+{z}^{-j}\left[{z}^{-h}A(z)-C(z)\right]\right\}\frac{dz}{z}\hfill \\ {}& =\frac{1}{2\pi i}\oint {z}^{-j}\left[{z}^{-h}A(z)-C(z)\right]\frac{dz}{z}-\frac{1}{2\pi i}\oint {p}^{-j}\left[{p}^{-h}A(p)-C(p)\right]\frac{dp}{p}\hfill \end{array}} $$
(7)

for j = 0, 1, 2, … , where the second integral is the result of a change of variable \( p={z}^{-1} \) so that \( dp=-{z}^{-2} dz \), resulting in

$$ \frac{dp}{p}=z\left(-{z}^{-2} dz\right)=-\frac{dz}{z}. $$

The result is that in the second integral, the direction of the contour integration is clockwise. Multiplying by −1 and integrating counterclockwise, the second integral becomes identical to the first, and we can write the set of first-order conditions as

$$ 0=\frac{1}{\pi i}\oint {z}^{-j}\left[{z}^{-h}A(z)-C(z)\right]\frac{dz}{z},\kern1em j=0,1,2,\dots $$
(8)

Define F(z) such that

$$ F(z)={z}^{-h}A(z)-C(z)=\sum\limits_{j=-\infty}^{\infty }{F}_j{z}^j. $$

From Eq. (8), it must be the case that all coefficients on non-negative powers of z equal zero:

$$ {F}_j=0,\kern0.62em j=0,1,2,\dots . $$

Multiplying by \( {z}^j \) and summing over all j = 0, ±1, ±2, … , we obtain

$$ F(z)=\sum\limits_{-\infty}^{-1} $$
(9)

where the term on the right-hand-side of (9) represents an unknown function in negative powers of z. Thus

$$ {z}^{-h}A(z)-C(z)=\sum\limits_{-\infty}^{-1}\kern0.36em , $$

which is an example of a ‘Wiener–Hopf’ equation. Now apply the (linear) ‘plussing’ operator, [⋅]+, which means ‘ignore negative powers of z’. The unknown function in negative powers of z is ‘annihilated’ by this operation, resulting in

$$ {\displaystyle \begin{array}{ll}C(z)& ={\left[{z}^{-h}A(z)\right]}_{+}\hfill \\ {}& ={\left[{z}^{-h}{a}_0+{z}^{-h+1}{a}_1+{z}^{-h+2}{a}_2+\dots \right]}_{+}\hfill \\ {}& =\left[{z}^0{a}_h+{z}^1{a}_{h+1}+{z}^2{a}_{h+2}+\dots \right]\hfill \\ {}& =\sum\limits_{j=h}^{\infty }{a}_j{z}^{j-h}\hfill \\ {}& ={z}^{-h}A(z)- pr\left[{z}^{-h}A(z)\right]\hfill \end{array}} $$

where \( pr\left[{z}^{-h}A(z)\right] \) is the principal part of the Laurent expansion of \( {z}^{-h}A(z) \) about z = 0. (The principal part of the Laurent expansion about z = 0 is the part involving negative powers of z.) This provides a very simple formula for computing forecasts.

AR(1) Example

Suppose that xt = axt−1 + εt. This means that A(z) = 1/(1 − az). In this case:

$$ {\displaystyle \begin{array}{ll}C(z)& ={\left[{z}^{-h}A(z)\right]}_{+}\hfill \\ {}& ={\left[{z}^{-h}\left(1+ az+{a}^2{z}^2+\dots \right)\right]}_{+}\hfill \\ {}& ={a}^h\left(1+ az+{a}^2{z}^2+\dots \right)\hfill \\ {}& =\frac{a^h}{\left(1- az\right)}\hfill \end{array}} $$

and the least squares loss predictor of xt+h using information dated t and earlier is

$$ {P}_t^{LS}{x}_{t+h}={y}_t=C(L){\varepsilon}_t=C(L){A}^{-1}(L){x}_t={a}^h{x}_t. $$

The forecast error is

$$ {x}_{t+h}-{a}^h{x}_t={\varepsilon}_{t+h}+a{\varepsilon}_{t+h-1}+\dots \kern0.5em +{a}^{h-1}{\varepsilon}_{t+1}, $$

which is serially correlated (for h ≥ 2), but not correlated with information dated t and earlier.
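A simulation sketch of this example (the values a = 0.8, h = 3 and the truncation length are assumptions): dropping the first h Wold coefficients, which is all the annihilation operation does here, reproduces the closed-form forecast \( {a}^h{x}_t \).

```python
import numpy as np

# Sketch (a = 0.8, h = 3 and the truncation are assumed): the annihilation
# operator applied to z**(-h) A(z) just drops the first h Wold coefficients,
# and the resulting forecast coincides with the closed form a**h * x_t.
a, h, J, n = 0.8, 3, 200, 50_000
eps = np.random.default_rng(1).normal(size=n)

a_coeffs = a ** np.arange(J)                 # Wold coefficients a_j = a**j
x = np.convolve(eps, a_coeffs)[:n]           # x_t = A(L) eps_t (truncated)

c = a_coeffs[h:]                             # c_j = a_{j+h}: [z**(-h) A(z)]_+
y = np.convolve(eps, c)[:n]                  # y_t = C(L) eps_t

print(np.allclose(y[J:], a**h * x[J:]))      # True
```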

MA(1) Example

Suppose that \( {x}_t={\varepsilon}_t-\alpha {\varepsilon}_{t-1} \), meaning A(z) = 1 − αz. Thus,

$$ C(z)={\left[{z}^{-h}A(z)\right]}_{+}={\left[{z}^{-h}\left(1-\alpha z\right)\right]}_{+}=\left\{\begin{array}{l}-\alpha \kern0.62em \mathrm{if}\kern0.62em h=1,\hfill \\ {}0\kern0.62em \mathrm{otherwise}.\hfill \end{array}\right. $$

So, the best one-step ahead predictor is

$$ -{\alpha \varepsilon}_t=-\alpha \left(1+\alpha L+{\alpha}^2{L}^2+\dots \right){x}_t $$

and the best predictor for forecasts of horizon two or more is exactly zero. For two-step-ahead (and beyond) prediction, the forecast error is xt+h itself, which is serially correlated but not correlated with information dated t and earlier.

Least Squares Prediction of Geometric Distributed Leads

A prediction problem that characterizes many models in economics involves the expectation of a discounted value. Perhaps the most common and widely studied example is the present value formula for stock prices. Abstracting from mean and trend, suppose the dividend process has a Wold representation given by

$$ {d}_t=\sum\limits_{j=0}^{\infty }{q}_j{\varepsilon}_{t-j}=q(L){\varepsilon}_t\kern0.86em E\left({\varepsilon}_t\right)=0,\kern0.86em E\left({\varepsilon}_t^2\right)=1. $$
(10)

Assuming that the constant discount factor is given by γ, we have the present value formula

$$ {p}_t={E}_t\sum\limits_{j=0}^{\infty }{\gamma}^j{d}_{t+j}={E}_t\left(\frac{q(L)}{1-\gamma {L}^{-1}}{\varepsilon}_t\right)={E}_t\left({p}_t^{\ast}\right). $$
(11)

The least-squares minimization problem the predictor faces is to find a stochastic process pt to minimize the expected squared prediction error \( E{\left({p}_t-{p}_t^{\ast}\right)}^2 \). In terms of the information known at date t, the agent’s task is to find a linear combination of current and past dividends, or, equivalently, of current and past dividend innovations εt, that is ‘close’ to \( {p}_t^{\ast } \). Writing pt = f(L)εt, the problem becomes one of finding the coefficients fj in f(L) = f0 + f1L + f2L2 + … to minimize \( E{\left(f(L){\varepsilon}_t-{p}_t^{\ast}\right)}^2. \) Using the method described in the previous section, the problem has an equivalent, frequency-domain representation

$$ \underset{f(z)\in {H}^2}{\min}\frac{1}{2\pi i}\oint {\left|\frac{q(z)}{1-\gamma {z}^{-1}}-f(z)\kern0.5em \right|}^2\frac{dz}{z}. $$
(12)

The first-order conditions for choosing fj are, after employing the same simplification used in (7),

$$ -\frac{2}{2\pi i}\oint {z}^{-j}\left[\frac{q(z)}{1-\gamma {z}^{-1}}-f(z)\right]\frac{dz}{z}=0,\kern1em j=0,1,2,\dots . $$
(13)

Now define

$$ H(z)=\frac{q(z)}{1-\gamma {z}^{-1}}-f(z) $$

so that (13) becomes

$$ -\frac{2}{2\pi i}\oint {z}^{-j}H(z)\frac{dz}{z}=0. $$

Then multiplying by \( {z}^j \) and summing over all j = 0, ±1, ±2, … as above, we obtain

$$ H(z)=\frac{q(z)}{1-\gamma {z}^{-1}}-f(z)=\sum\limits_{-\infty}^{-1}, $$

the Wiener–Hopf equation for this problem. Applying the plussing operator to both sides yields

$$ {\left[\frac{q(z)}{1-\gamma {z}^{-1}}\right]}_{+}-{\left[f(z)\right]}_{+}=0 $$

implying

$$ f(z)={\left[\frac{q(z)}{1-\gamma {z}^{-1}}\right]}_{+}={\left[\frac{zq(z)}{z-\gamma}\right]}_{+} $$

because f(z) is, by construction, one-sided in non-negative powers of z. As in the previous section,

$$ {\left[A(z)\right]}_{+}=A(z)-P(z) $$

where P(z) is the principal part of the Laurent series expansion of A(z). To determine the principal part of \( {\left(z-\gamma \right)}^{-1} zq(z) \), note that zq(z) has a well-behaved power series expansion about z = γ, where ‘well-behaved’ means ‘involving no negative powers of (z − γ)’. Thus \( {\left(z-\gamma \right)}^{-1} zq(z) \) has a power series expansion about z = γ involving a single term in \( {\left(z-\gamma \right)}^{-1} \):

$$ \left(\frac{zq(z)}{z-\gamma}\right)=\frac{b_{-1}}{z-\gamma }+{b}_0+{b}_1{\left(z-\gamma \right)}^1+{b}_2{\left(z-\gamma \right)}^2+\dots . $$

The principal part here is the part involving negative powers of (z − γ): \( {b}_{-1}{\left(z-\gamma \right)}^{-1} \). To determine it, multiply both sides by (z − γ) and evaluate what is left at z = γ to find b−1 = γq(γ). Thus

$$ f(z)={\left[\frac{q(z)}{1-\gamma {z}^{-1}}\right]}_{+}={\left[\frac{zq(z)}{z-\gamma}\right]}_{+}=\frac{zq(z)-\gamma q\left(\gamma \right)}{z-\gamma }. $$
(14)

The ‘cross-equation restrictions’ of rational expectations refer to the connection between the serial correlation structure of the driving process (here dividends) and the serial correlation structure of the expected discounted value of the driving process (here prices). That is, when dividends are characterized by q(z), prices are characterized by f(z), and f(z) depends upon q(z) as depicted in (14).
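Read coefficient by coefficient, (14) says that the weight on εt−j in pt is the discounted tail sum \( {\sum}_{k\ge 0}{\gamma}^k{q}_{j+k} \). A small numerical check of this reading, with an assumed (arbitrary) square-summable sequence qj and γ = 0.95:

```python
import numpy as np

# Sketch (q_j and gamma assumed for illustration): the coefficients of (14)
# are the discounted tail sums f_j = sum_k gamma**k q_{j+k}.
gamma = 0.95
q = 0.9 ** np.arange(400) * np.cos(0.3 * np.arange(400))   # square-summable q_j

f = np.array([np.sum(gamma ** np.arange(len(q) - j) * q[j:]) for j in range(200)])

series = lambda c, z: sum(ck * z**k for k, ck in enumerate(c))
for z in [0.3, -0.5 + 0.2j, 0.7j]:
    closed_form = (z * series(q, z) - gamma * series(q, gamma)) / (z - gamma)
    print(np.abs(series(f, z) - closed_form))              # ~ 0 at each test point
```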

To illustrate how the formula works, suppose detrended dividends are described by a first-order autoregression; that is, that q(L) = (1 − ρL)−1. Then

$$ {p}_t=f(L){\varepsilon}_t=\frac{Lq(L)-\gamma q\left(\gamma \right)}{L-\gamma }{\varepsilon}_t=\left(\frac{1}{1-\rho \gamma}\right){d}_t. $$
(15)
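One way to see how (15) follows from (14): with q(z) = (1 − ρz)−1 and hence q(γ) = (1 − ργ)−1,

$$ f(z)=\frac{zq(z)-\gamma q\left(\gamma \right)}{z-\gamma }=\frac{1}{z-\gamma}\left[\frac{z}{1-\rho z}-\frac{\gamma }{1-\rho \gamma}\right]=\frac{z\left(1-\rho \gamma \right)-\gamma \left(1-\rho z\right)}{\left(z-\gamma \right)\left(1-\rho z\right)\left(1-\rho \gamma \right)}=\frac{z-\gamma }{\left(z-\gamma \right)\left(1-\rho z\right)\left(1-\rho \gamma \right)}=\frac{q(z)}{1-\rho \gamma }, $$

so the factor (z − γ) cancels and pt = f(L)εt = dt/(1 − ργ), as in (15).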

It is instructive to note that, while the pricing formula (15) makes pt the best least squares predictor of \( {p}_t^{\ast } \), the prediction errors \( {p}_t-{p}_t^{\ast } \) will not be serially uncorrelated. Indeed

$$ {\displaystyle \begin{array}{ll}{p}_t-{p}_t^{\ast }& =\left\{\frac{Lq(L)-\gamma q\left(\gamma \right)}{L-\gamma }-\frac{q(L)}{1-\gamma {L}^{-1}}\right\}{\varepsilon}_t\hfill \\ {}& =\frac{-\gamma q\left(\gamma \right)}{L-\gamma }{\varepsilon}_t=-\gamma q\left(\gamma \right)\frac{L^{-1}}{1-\gamma {L}^{-1}}{\varepsilon}_t\hfill \\ {}& =-\gamma q\left(\gamma \right)\left\{{\varepsilon}_{t+1}+{\gamma \varepsilon}_{t+2}+{\gamma}^2{\varepsilon}_{t+3}+\dots \right\}.\hfill \end{array}} $$

Thus the prediction errors will be described by a highly persistent (γ is close to unity) first-order autoregression. But because this autoregression involves future εt’s, the serial correlation structure of the errors cannot be exploited to improve the quality of the prediction of \( {p}_t^{\ast } \). The reason is that the predictor ‘knows’ the model for price setting (the present value formula) and the dividend process; the best predictor \( {p}_t={E}_t{p}_t^{\ast } \) of \( {p}_t^{\ast } \) ‘tolerates’ the serial correlation because the (correct) model implies that it involves future εt’s and therefore cannot be predicted. If one only had data on the errors (and did not know the model that generated them), they would appear (rightly) to be characterized by a first-order autoregression; fitting an AR(1) (that is, the best linear model) and using it to ‘adjust’ pt by accounting for the serial correlation in the errors \( {p}_t-{p}_t^{\ast } \) would decrease the quality of the estimate of \( {p}_t^{\ast } \). The reason is the usual one that the Wold representation for \( {p}_t-{p}_t^{\ast } \) is not the economic model of \( {p}_t-{p}_t^{\ast } \), and (correct) models always beat Wold representations. This also serves as a reminder of circumstances under which one should be willing to tolerate serially correlated errors: when one knows the model that generated them, and the model implies that they are as small as they can be made.

Robust Optimal Prediction of Time Series

The squared-error loss function employed to this point is appropriate for situations in which the model (either the time series model or the economic model) is thought to be correct. But in many settings the forecaster or model builder may wish to guard against the possibility of misspecification. There are many ways to do this; an approach popular in the engineering literature and recently introduced into the economics literature by Hansen and Sargent (2007) involves behaving so as to minimize the maximum loss sustainable by using an approximating model when the truth may be something else. The ‘robust’ approach to this involves replacing the squared-error loss problem

$$ \underset{\left\{C(z)\right\}}{\min}\frac{1}{2\pi i}\oint {\left|{z}^{-h}A(z)-C(z)\right|}^2\frac{dz}{z} $$

with the ‘min-max’ problem

$$ \underset{\left\{C\left(\mathrm{z}\right)\right\}}{\min}\kern0.24em \underset{\mid z\mid =1}{\sup }{\left|{z}^{-h}A(z)-C(z)\right|}^2, $$

so that minimizing the ‘average’ value on the unit circle has been replaced by minimizing the max. This problem can also be written

$$ \underset{\left\{C(z)\right\}}{\min}\kern0.24em \underset{\mid z\mid =1}{\sup }{\left|A(z)-{z}^hC(z)\right|}^2. $$

This is known as the ‘minimum norm interpolation problem’ and amounts to finding a function φ(z) to

$$ \min {\left\Vert \varphi (z)\right\Vert}_{\infty } $$

subject to the restriction that the power series expansion of φ(z) matches that of A(z) in the coefficients on \( {z}^0,{z}^1,\dots, {z}^{h-1} \). This means that the following must hold:

$$ \sum\limits_{j=0}^{h-1}{\varphi}_j{z}^j=\sum\limits_{j=0}^{h-1}{a}_j{z}^j. $$
(16)

Theorem 1

The minimizing φ(z) function is such that \( {\left|\varphi (z)\right|}^2 \) is constant on |z| = 1. Moreover,

$$ \varphi (z)=M\prod\limits_{j=1}^{h-1}\frac{z-{\alpha}_j}{1-{\overline{\alpha}}_jz} $$

where \( M,{\alpha}_1,{\alpha}_2,\dots, {\alpha}_{h-1} \) are chosen to ensure that (16) holds.

Proof: see Nehari (1957).

To see that φ(z) must be of the indicated form, note that the ‘Blaschke factors’ in the product have unit modulus:

$$ \frac{z-{\alpha}_j}{1-{\overline{\alpha}}_jz}\left(\frac{z^{-1}-{\overline{\alpha}}_j}{1-{\alpha}_j{z}^{-1}}\right)=\left(\frac{z-{\alpha}_j}{1-{\overline{\alpha}}_jz}\right)\left({z}^{-1}z\right)\left(\frac{z^{-1}-{\overline{\alpha}}_j}{1-{\alpha}_j{z}^{-1}}\right)=\left(\frac{1-{\alpha}_j{z}^{-1}}{1-{\overline{\alpha}}_jz}\right)\left(\frac{1-{\overline{\alpha}}_jz}{1-{\alpha}_j{z}^{-1}}\right)=1, $$

so that |φ(z)|2 = M2.

In the general h-step-ahead prediction problem, we have that

$$ \varphi (z)=M\prod\limits_{j=1}^{h-1}\frac{z-{\alpha}_j}{1-{\overline{\alpha}}_jz}=A(z)-{z}^hC(z), $$

meaning that

$$ C(z)=\frac{1}{z^h}\left(A(z)-M\prod\limits_{j=1}^{h-1}\frac{z-{\alpha}_j}{1-{\overline{\alpha}}_jz}\right). $$

This is analogous to the solution in the least-squares case, but, instead of subtracting the principal part of \( {z}^{-h}A(z) \), we subtract a different function from \( {z}^{-h}A(z) \). Note also that because

$$ M\prod\limits_{j=1}^{h-1}\frac{z-{\alpha}_j}{1-{\overline{\alpha}}_jz} $$

matches the power series expansion of A(z) up to the power \( {z}^{h-1} \), C(z) is of the form

$$ C(z)={c}_0+{c}_1z+{c}_2{z}^2+\dots $$

Finally, note that the forecast error is serially uncorrelated because \( {\left|\varphi (z)\right|}^2 \) is constant on |z| = 1.

Example. AR(1)

Let

$$ A(z)=\frac{1}{1- az}. $$

For h = 1, we see that φ(z) = A(z) − zC(z) must be constant on |z| = 1, and that φ(0) = A(0) = 1. Thus, φ(z) = M = 1, so that

$$ C(z)=\frac{A(z)-1}{z}=\frac{az}{\left(1- az\right)z}=\frac{a}{1- az}, $$

which implies that the robust one-step ahead forecast is

$$ {y}_t^R={ax}_t, $$

which coincides with the best least-squares forecast. This equivalence between the robust and least-squares one-step ahead forecasts is to be expected because the best one-step-ahead least-squares forecast also has serially uncorrelated errors. For h = 2, we have that

$$ \varphi (z)=\frac{M\left(z-\alpha \right)}{1-\overline{\alpha}z} $$

where (again) φ(0) = 1, but now we also see that φ′(0) = a. Thus,

$$ \varphi (0)=1=-\alpha M\Rightarrow M=-\frac{1}{\alpha }, $$

and furthermore

$$ {\left.{\varphi}^{\prime }(0)=a=\frac{\left(1-\overline{\alpha}z\right)M-M\left(z-\alpha \right)\left(-\overline{\alpha}\right)}{{\left(1-\overline{\alpha}z\right)}^2}\right|}_{z=0}=M-M\left(\alpha \overline{\alpha}\right)=M\left(1-\alpha \overline{\alpha}\right). $$

Therefore, the solution will have the property that

$$ a=-\frac{1}{\alpha}\left(1-\alpha \overline{\alpha}\right)\kern0.5em \Rightarrow \kern0.5em -a\alpha =1-\alpha \overline{\alpha}\kern0.5em \Rightarrow \kern0.5em 0=1+a\alpha -\alpha \overline{\alpha}. $$

That is, the two roots have product −1, so their moduli are reciprocals. Notice that the discriminant is positive \( \left({a}^2+4>0\right) \), meaning that we will always have a real solution, and we choose the root with |α| < 1. Then, we have that

$$ {\displaystyle \begin{array}{ll}C(z)& =\frac{1}{z^2}\left[\frac{1}{1- az}-\frac{M\left(z-\alpha \right)}{1-\alpha z}\right]\hfill \\ {}& =\frac{1}{z^2}\frac{1-\alpha z-\left(1- az\right)\left(1-\frac{1}{\alpha }z\right)}{\left(1- az\right)\left(1-\alpha z\right)}\hfill \\ {}& =\frac{1-\alpha z-1+ az+\frac{1}{\alpha }z-\frac{a}{\alpha }{z}^2}{z^2\left(1- az\right)\left(1-\alpha z\right)}\hfill \\ {}& =\frac{-\frac{a}{\alpha }}{\left(1- az\right)\left(1-\alpha z\right)}.\hfill \end{array}} $$

where the terms linear in z in the numerator cancel because the quadratic implies \( a=\alpha -1/\alpha \). So, the robust prediction is given by

$$ {P}_t^R{x}_{t+2}=-\frac{a}{\alpha}\sum\limits_{j=0}^{\infty}\kern0.24em {\alpha}^j{x}_{t-j}, $$

in contrast to the least-squares prediction

$$ {P}_t^{LS}{x}_{t+2}={a}^2{x}_t. $$
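A simulation sketch of this comparison (a = 0.8, the sample size, and the helper calculations are assumptions for illustration): the robust two-step error is approximately white noise with variance 1/α², while the least squares error has the smaller variance 1 + a² but is serially correlated.

```python
import numpy as np

# Simulation sketch (a = 0.8 and sample sizes assumed; helper computations are
# illustrative): the robust two-step error is ~white noise with variance
# 1/alpha**2, the least squares error has smaller variance 1 + a**2 but is
# serially correlated.
a = 0.8
roots = np.roots([1.0, -a, -1.0])                 # alpha**2 - a*alpha - 1 = 0
alpha = float(np.real(roots[np.abs(roots) < 1][0]))

rng = np.random.default_rng(2)
eps = rng.normal(size=100_000)
x = np.zeros_like(eps)
for t in range(1, len(x)):
    x[t] = a * x[t - 1] + eps[t]

J = 200
w_rob = -(a / alpha) * alpha ** np.arange(J)      # -(a/alpha) * alpha**j
robust = np.convolve(x, w_rob)[: len(x)]          # robust forecast of x_{t+2}
ls = a**2 * x                                     # least squares forecast

e_rob, e_ls = x[2:] - robust[:-2], x[2:] - ls[:-2]
lag1 = lambda e: np.corrcoef(e[1:], e[:-1])[0, 1]
print(e_ls.var(), lag1(e_ls))      # ~ 1 + a**2 = 1.64, lag-1 corr ~ a/(1+a**2)
print(e_rob.var(), lag1(e_rob))    # ~ 1/alpha**2 ~ 2.18, lag-1 corr ~ 0
```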

Example. MA(1)

Suppose that the process follows an MA(1), \( {x}_t={\varepsilon}_t-\beta {\varepsilon}_{t-1} \), and therefore A(z) = 1 − βz. The analysis from the previous example still holds, and all of the following are true:

$$ \varphi (z)=\frac{M\left(z-\alpha \right)}{1-\overline{\alpha}z} $$

while

$$ \varphi (0)=1=-\alpha M\kern0.36em \Rightarrow \kern0.36em M=-{\alpha}^{-1} $$

and

$$ {\varphi}^{\prime }(0)=-\beta =-\frac{1}{\alpha}\left(1-\alpha \overline{\alpha}\right). $$

Therefore,

$$ 0=1-\alpha \beta -\alpha \overline{\alpha}, $$

meaning that, again, we have real roots which are reciprocal pairs and we can choose |α| < 1. Of course, α will depend upon the value of β, and we write α(β). Thus

$$ {\displaystyle \begin{array}{ll}C(z)& =\frac{1}{z^2}\left[1-\beta z-\frac{M\left(z-\alpha \left(\beta \right)\right)}{1-\alpha \left(\beta \right)z}\right]\hfill \\ {}& =\frac{1}{z^2}\left[\frac{\left(1-\beta z\right)\left(1-\alpha \left(\beta \right)z\right)-M\left(z-\alpha \left(\beta \right)\right)}{1-\alpha \left(\beta \right)z}\right]\hfill \\ {}& =\frac{1}{z^2}\left[\frac{1-\beta z-\alpha \left(\beta \right)z+\beta \alpha \left(\beta \right){z}^2- Mz+M\alpha \left(\beta \right)}{1-\alpha \left(\beta \right)z}\right]\hfill \\ {}& =\frac{\beta \alpha \left(\beta \right)}{1-\alpha \left(\beta \right)z}.\hfill \end{array}} $$

Therefore, we have the robust prediction

$$ {P}_t^R{x}_{t+2}=\frac{\beta \alpha \left(\beta \right)}{1-\alpha \left(\beta \right)L}{\varepsilon}_t=\frac{\beta \alpha \left(\beta \right)}{1-\alpha \left(\beta \right)L}\left[{x}_t+\beta {x}_{t-1}+{\beta}^2{x}_{t-2}+\dots \right], $$

while the least-squares prediction is the standard

$$ {P}_t^{LS}{x}_{t+2}=0. $$

Robust Prediction of Geometric Distributed Leads

Following the excellent treatment in Kasa (2001), a robust present-value predictor fears that dividends may not be generated by the process in (10), and so, instead of choosing an f(z) to minimize the average loss around the unit circle, chooses f(z) to minimize the maximum loss:

$$ \underset{f(z)\in {H}^{\infty }}{\min}\kern0.24em \underset{\mid z\mid =1}{\sup }{\left|\frac{q(z)}{1-\gamma {z}^{-1}}-f(z)\right|}^2\iff \underset{f(z)\in {H}^{\infty }}{\min}\kern0.24em \underset{\mid z\mid =1}{\sup }{\left|\frac{zq(z)}{z-\gamma }-f(z)\right|}^2. $$

Unlike in the least squares case (12), where f(z) was restricted to the class H2 of functions square integrable on the unit circle, the restriction now is to the class H∞ of functions with finite maximum modulus on the unit circle, and the H2 norm has been replaced by the H∞ norm.

To begin the solution process, note that there is considerable freedom in designing the minimizing function f(z): it must be well-behaved (that is, must have a convergent power series in non-negative powers of z on the unit disk), but is otherwise unrestricted. Recalling the Laurent expansion

$$ \frac{zq(z)}{z-\gamma }=\frac{b_{-1}}{z-\gamma }+{b}_0+{b}_1\left(z-\gamma \right)+{b}_2{\left(z-\gamma \right)}^2+\dots, $$

while in the least squares case f(z) was set to ‘cancel’ all the terms of this series except the first, here f(z) will be set to do something else. Now define the Blaschke factor Bγ(z) = (z − γ)/(1 − γz) and note that, because of the unit modulus condition, the problem can be written

$$ \underset{\left\{f(z)\right\}}{\min}\kern0.24em \underset{\mid z\mid =1}{\sup }{\left|\frac{zq(z)}{1-\gamma z}-\frac{z-\gamma }{1-\gamma z}f(z)\right|}^2. $$

Defining

$$ T(z)=\frac{zq(z)}{1-\gamma z} $$

we have

$$ \underset{f\in {H}^{\infty }}{\min}\kern0.24em \underset{\mid z\mid =1}{\sup }{\left|T(z)-{B}_{\gamma }(z)f(z)\right|}^2\iff \underset{f\in {H}^{\infty }}{\min }{\left\Vert T(z)-{B}_{\gamma }(z)f(z)\right\Vert}_{\infty }. $$

Define the function inside the ∥’s as

$$ \varphi (z)=T(z)-{B}_{\gamma }(z)f(z) $$

and note that φ(γ) = T(γ), since Bγ(γ) = 0. Thus the problem of finding f(z) reduces to the problem of finding the smallest φ(z) satisfying φ(γ) = T(γ):

$$ \underset{\varphi \in {H}^{\infty }}{\min }{\left\Vert \varphi (z)\right\Vert}_{\infty}\kern1em \mathrm{s}.\mathrm{t}.\kern0.5em \varphi \left(\gamma \right)=T\left(\gamma \right). $$

Theorem 2

(Kasa 2001). The solution to the minimization problem above is the constant function φ(z) = T(γ).

Proof. To see this, first note that the norm of a constant function is the squared modulus of the constant itself. This is written as

$$ {\left\Vert \varphi (z)\right\Vert}_{\infty }={\left\Vert T\left(\gamma \right)\right\Vert}_{\infty }={\left|T\left(\gamma \right)\right|}^2. $$
(17)

Next, suppose that there exists another function Ψ(z) ∈ H∞, with Ψ(γ) = T(γ) and also

$$ {\left\Vert \Psi (z)\right\Vert}_{\infty }<{\left\Vert \varphi (z)\right\Vert}_{\infty }. $$
(18)

Recalling the definition of the H∞ norm and using Eqs. (17) and (18):

$$ {\left\Vert \Psi (z)\right\Vert}_{\infty }=\underset{\mid z\mid =1}{\sup }{\left|\Psi (z)\right|}^2<{\left|T\left(\gamma \right)\right|}^2. $$

The maximum modulus theorem states that a function f which is analytic on the disk U attains its maximum modulus on the boundary ∂U of the disk. That is

$$ \underset{z\in U}{\sup }{\left|f(z)\right|}^2\le \underset{z\in \partial U}{\sup }{\left|f(z)\right|}^2. $$

Therefore, we can see that

$$ \underset{\mid z\mid <1}{\sup }{\left|\Psi (z)\right|}^2\le \underset{\mid z\mid =1}{\sup }{\left|\Psi (z)\right|}^2<{\left|T\left(\gamma \right)\right|}^2. $$

However, one of the points in the interior of the unit disk is z = γ, which can be inserted into the far left-hand side of the preceding inequality to get the result

$$ {\left|\Psi \left(\gamma \right)\right|}^2\le \underset{\mid z\mid =1}{\sup }{\left|\Psi (z)\right|}^2<{\left|T\left(\gamma \right)\right|}^2\Rightarrow {\left|\Psi \left(\gamma \right)\right|}^2<{\left|T\left(\gamma \right)\right|}^2. $$

This contradicts the requirement that Ψ(γ) = T(γ). Therefore, we have verified that there does not exist another function Ψ(z) ∈ H∞ such that Ψ(γ) = T(γ) and \( {\left\Vert \Psi (z)\right\Vert}_{\infty }<{\left\Vert \varphi (z)\right\Vert}_{\infty } \). □

Given the form for φ(z), the form for f(z) follows. After some tedious algebra, we obtain

$$ f(z)=\frac{T(z)-\varphi (z)}{B_{\gamma }(z)}=\frac{zq(z)-\gamma q\left(\gamma \right)}{z-\gamma }+\frac{\gamma^2}{1-{\gamma}^2}q\left(\gamma \right) $$

which is the least squares solution plus a constant. Thus the robust cross-equation restrictions likewise differ from the least squares cross-equation restrictions. After the initial period, the impulse response function for the robust predictor is identical to that of the least squares predictor. In the initial period, the least squares impulse response is q(γ), while the robust impulse response is larger: q(γ)/(1 − γ2).
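The ‘tedious algebra’ can be checked numerically. The sketch below, assuming AR(1) dividend weights qj = ρj with ρ = 0.9 and γ = 0.95, verifies that (T(z) − T(γ))/Bγ(z) coincides with the least squares f(z) plus the constant γ²q(γ)/(1 − γ²):

```python
import numpy as np

# Sketch (AR(1) dividend weights q_j = rho**j with rho = 0.9, gamma = 0.95
# assumed): (T(z) - T(gamma))/B_gamma(z) equals the least squares f(z) plus
# the constant gamma**2 q(gamma)/(1 - gamma**2).
rho, gamma = 0.9, 0.95
q = lambda z: 1.0 / (1.0 - rho * z)
T = lambda z: z * q(z) / (1.0 - gamma * z)
B = lambda z: (z - gamma) / (1.0 - gamma * z)

f_ls = lambda z: (z * q(z) - gamma * q(gamma)) / (z - gamma)
f_rob = lambda z: f_ls(z) + gamma**2 * q(gamma) / (1.0 - gamma**2)

for z in [0.2, -0.6, 0.5j]:
    print(np.abs((T(z) - T(gamma)) / B(z) - f_rob(z)))   # ~ 0
```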

Because γ is the discount factor, and therefore close to unity, the robust impulse response can be considerably larger than that of the least squares response. Relatedly, the volatility of prices in the robust case will be larger as well. For example, in the first-order autoregressive case studied above,

$$ {p}_t=f(L){\varepsilon}_t=\frac{1}{1-\rho \gamma}{d}_t+\frac{\gamma^2}{\left(1-{\gamma}^2\right)\left(1-\rho \gamma \right)}{\varepsilon}_t $$
(19)

from which the variance can be calculated as

$$ {\sigma}^2\left({p}_t\right)={\left(\frac{1}{1-\rho \gamma}\right)}^2{\sigma}^2\left({d}_t\right)+\frac{2{\gamma}^2-{\gamma}^4}{{\left(1-\rho \gamma \right)}^2{\left(1-{\gamma}^2\right)}^2}. $$

When the discount factor is large and dividends are highly persistent, the variance of the robust present value prediction can be considerably larger than that of the least squares prediction (the first term on the right alone).

Finally, recall that the least-squares present-value predictor behaved in such a way as to minimize the variance of the error \( {p}_t-{p}_t^{\ast } \). Here, robust prediction results in an error with Wold representation

$$ {p}_t-{p}_t^{\ast }=\left\{\frac{Lq(L)-\gamma q\left(\gamma \right)}{L-\gamma }+\frac{\gamma^2}{1-{\gamma}^2}q\left(\gamma \right)-\frac{q(L)}{1-\gamma {L}^{-1}}\right\}{\varepsilon}_t=-\frac{\gamma q\left(\gamma \right)}{1-{\gamma}^2}\left\{\frac{1-\gamma L}{L-\gamma}\right\}{\varepsilon}_t. $$

The term in braces has the form of a Blaschke factor. Applying such factors in the lag operator to a serially uncorrelated process like εt leaves a serially uncorrelated result; thus the robust present value predictor has behaved in such a way that the resulting errors are white noise. Of course this comes at a cost: to make the error serially uncorrelated, the robust predictor must tolerate an error variance that is larger than the least squares error variance by a factor of \( 1/\left(1-{\gamma}^2\right) \), which can be substantial when γ is close to unity.
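A closing simulation sketch (ρ = 0.9, γ = 0.95 and the truncation length are assumptions chosen for illustration) illustrates these claims: the robust errors are approximately white noise, the least squares errors are highly autocorrelated, and the error-variance ratio is roughly 1/(1 − γ²).

```python
import numpy as np

# Simulation sketch (rho = 0.9, gamma = 0.95, truncation K assumed): robust
# present-value errors are ~white noise but their variance exceeds the least
# squares error variance by roughly the factor 1/(1 - gamma**2).
rho, gamma, n, K = 0.9, 0.95, 200_000, 600
rng = np.random.default_rng(3)
eps = rng.normal(size=n)
d = np.zeros(n)
for t in range(1, n):
    d[t] = rho * d[t - 1] + eps[t]

w = gamma ** np.arange(K + 1)
p_star = np.convolve(d[::-1], w)[:n][::-1]        # p*_t = sum_k gamma**k d_{t+k}

q_gamma = 1.0 / (1.0 - rho * gamma)
p_ls = d / (1.0 - rho * gamma)                                # eq. (15)
p_rob = p_ls + gamma**2 * q_gamma / (1.0 - gamma**2) * eps    # eq. (19)

keep = slice(500, n - K)                           # drop start-up and truncation
e_ls, e_rob = (p_ls - p_star)[keep], (p_rob - p_star)[keep]
lag1 = lambda e: np.corrcoef(e[1:], e[:-1])[0, 1]
print(e_rob.var() / e_ls.var(), 1.0 / (1.0 - gamma**2))   # ratio ~ 10.3
print(lag1(e_ls), lag1(e_rob))                            # ~ gamma and ~ 0
```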

See Also