Abstract
Prediction formulas for multi-step forecasts and geometric distributed leads of stationary time series are derived using classical, frequency domain methods. Starting with the Wold representation, optimal squared-error loss predictions are derived using the analytic function theory approach of Whittle. This approach is easily adapted to the problem of making predictions that are robust under model misspecification. Forecasts and expected present value calculations are illustrated under both objectives for low-order autoregressive and moving average processes.
Keywords
- Blaschke factors
- Contour integral
- Cross-equation restrictions
- Distributed leads
- Frequency domain problems
- Least squares
- Linear least squares projection
- Minimum norm interpolation problem
- Min-max problem
- Misspecification
- Prediction formulas
- Rational expectations
- Riesz–Fisher theorem
- Robustness
- Squared-error loss optimal prediction
- Time domain problems
- Wiener–Hopf equation
- Wold decomposition theorem
- Wold representation
Introduction
This article reviews the derivation of formulas for linear least squares and robust prediction of stationary time series and geometrically discounted distributed leads of such series. The derivations follow the classical, frequency-domain procedures employed by Whittle (1983) and Whiteman (1983), and result in nearly closed-form expressions. The formulas themselves are useful directly in forecasting, and have also found uses in economic modelling, primarily in macroeconomics. Indeed, Hansen and Sargent (1980) refer to the cross-equation restrictions connecting the time series representation of driving variables to the analogous representation for predicting the present value of such variables as the ‘hallmark of rational expectations models’.
The Wold Representation
Suppose that {xt} is a covariance-stationary stochastic process and assume (without loss of generality) that Ext = 0. Covariance stationarity ensures that the first and second unconditional moments of the process do not vary with time. Then, by the Wold decomposition theorem (see Sargent 1987, for an elementary exposition and proof), xt can be represented by
\( {x}_t={\sum}_{j=0}^{\infty }{a}_j{\varepsilon}_{t-j},\kern2em (1) \)
with
\( {a}_0=1,\kern1em {\sum}_{j=0}^{\infty }{a}_j^2<\infty, \)
and
\( {\varepsilon}_t={x}_t-P\left({x}_t|{x}_{t-1},{x}_{t-2},\dots \right), \)
where P(xt|xt−1, xt−2, …) denotes the linear least squares projection (population regression) of xt on xt−1, xt−2, … Here, ‘represented by’ need not mean ‘generated by’, but rather ‘has the same variance and covariance structure as’. By construction, the ‘fundamental’ innovation εt is uncorrelated with information dated prior to t, including earlier values of the process itself: Eεtεt−s = 0 ∀ s > 0. This fact makes the Wold representation very convenient for computing predictions. The convolution in (1) is often written xt = A(L)εt using the polynomial \( A(L)={\sum\limits}_{j=0}^{\infty}\kern0.24em {a}_j{L}^j \) in the ‘lag operator’ L, where Lεt = εt−1.
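As a concrete numerical sketch of the Wold convolution xt = A(L)εt, the fragment below builds an AR(1) from its MA(∞) coefficients and checks that the autoregressive recursion recovers the innovations. The parameter value, truncation length, and simulated shock sequence are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Sketch: build an AR(1) x_t = a x_{t-1} + eps_t from its Wold (MA-infinity)
# representation x_t = sum_j a_j eps_{t-j} with a_j = a^j, truncated at J terms.
# The value a = 0.7 and the simulated shocks are illustrative assumptions.
a, J, T = 0.7, 200, 1000
wold = a ** np.arange(J)                       # Wold coefficients, a_0 = 1

rng = np.random.default_rng(0)
eps = rng.standard_normal(T)                   # fundamental innovations

# x_t = sum_j a_j eps_{t-j}: a one-sided convolution of {a_j} with {eps_t}
x = np.convolve(eps, wold)[:T]

# The representation should satisfy the AR recursion: x_t - a x_{t-1} = eps_t
recovered = x[1:] - a * x[:-1]
max_err = np.max(np.abs(recovered - eps[1:]))  # tiny, up to truncation of a^J
```

The near-zero residual illustrates ‘has the same structure as’: the MA(∞) series built from the fundamental innovations satisfies the same recursion as the AR(1) that generated them.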
Squared-Error Loss Optimal Prediction
The optimal prediction problem under squared-error loss can be thought of as follows. Given {xt} with the Wold representation (1), we want to find the stochastic process yt,
\( {y}_t={\sum}_{j=0}^{\infty }{c}_j{\varepsilon}_{t-j}, \)
that will minimize the squared forecast error of the h-step ahead prediction,
\( E{\left({x}_{t+h}-{y}_t\right)}^2. \)
Equivalently, the problem can be written as
\( \min_{\left\{{c}_j\right\}}\kern0.30em E{\left({\sum}_{j=0}^{\infty }{a}_j{\varepsilon}_{t+h-j}-{\sum}_{j=0}^{\infty }{c}_j{\varepsilon}_{t-j}\right)}^2\kern2em (2) \)
or
\( \min_{\left\{{c}_j\right\}}\kern0.30em {\sigma}^2\left[{\sum}_{j=0}^{h-1}{a}_j^2+{\sum}_{j=0}^{\infty }{\left({a}_{j+h}-{c}_j\right)}^2\right]. \)
The problem in (2) involves finding a sequence of coefficients in the Wold representation of the unknown prediction process yt, and is referred to as the time domain problem. By virtue of the Riesz–Fisher theorem (see again Sargent 1987, for an exposition), the time-domain problem is equivalent to a frequency domain problem of finding an analytic function C(z) on the unit disk |z| ≤ 1 corresponding to the ‘z-transform’ of the {cj} sequence,
\( C(z)={\sum}_{j=0}^{\infty }{c}_j{z}^j, \)
that solves
\( \min_{C(z)\in {H}^2}\kern0.30em \frac{\sigma^2}{2\pi i}\oint {\left|{z}^{-h}A(z)-C(z)\right|}^2\frac{dz}{z}, \)
where H2 denotes the Hardy space of square-integrable analytic functions on the unit disk, and ∮ denotes (counterclockwise) integration about the unit circle. The requirement that C(z) ∈ H2 ensures that the forecast is causal, and contains no future values of the ε's; this is equivalent to the requirement that C(z) have a well-behaved power series expansion in non-negative powers of z.
Each formulation of the problem is useful, as often one or the other will be simpler to solve. This stems from the fact that convolution in the time domain becomes multiplication in the frequency domain and vice versa. To see this, consider the two sequences \( {\left\{{g}_k\right\}}_{k=-\infty}^{\infty } \) and \( {\left\{{h}_k\right\}}_{k=-\infty}^{\infty } \). The convolution of {gk} and {hk} is the sequence {fk}, in which a typical element would be:
\( {f}_k={\sum}_{j=-\infty}^{\infty }{g}_j{h}_{k-j}. \)
The z-transform of the convolution is given by
\( F(z)={\sum}_k{f}_k{z}^k={\sum}_k{\sum}_j{g}_j{h}_{k-j}{z}^k={\sum}_j{g}_j{z}^j{\sum}_m{h}_m{z}^m=G(z)H(z). \)
Thus the ʻz-transform’ of the convolution of the sequences {gk} and {hk} is the product of the z-transforms of the two sequences.
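The convolution–multiplication correspondence can be verified numerically: convolve two short sequences and check that the z-transform of the result, evaluated at an arbitrary point, equals the product of the individual transforms. The sequences and the evaluation point are illustrative choices.

```python
import numpy as np

# Sketch: the z-transform of a convolution is the product of z-transforms.
# The two short sequences below are arbitrary illustrative choices.
g = np.array([1.0, 2.0, 3.0])
h = np.array([4.0, 5.0, 6.0])
f = np.convolve(g, h)                          # {f_k}: convolution of {g_k}, {h_k}

# Evaluate F, G, H at an arbitrary point z0 inside the unit circle
z0 = 0.37
polyval = np.polynomial.polynomial.polyval
F, G, H = polyval(z0, f), polyval(z0, g), polyval(z0, h)
ok = np.isclose(F, G * H)                      # F(z0) = G(z0) H(z0)
```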
Similarly, the z-transform of the product of two sequences is the convolution of the z-transforms: if fk = gkhk, then
\( F(z)=\frac{1}{2\pi i}\oint G(p)H\left(z{p}^{-1}\right)\frac{dp}{p}. \)
To see why this is the case, note that
\( \frac{1}{2\pi i}\oint G(p)H\left(z{p}^{-1}\right)\frac{dp}{p}=\frac{1}{2\pi i}\oint {\sum}_j{g}_j{p}^j{\sum}_k{h}_k{z}^k{p}^{-k}\frac{dp}{p}, \)
implying
\( \frac{1}{2\pi i}\oint G(p)H\left(z{p}^{-1}\right)\frac{dp}{p}={\sum}_j{\sum}_k{g}_j{h}_k{z}^k\cdot \frac{1}{2\pi i}\oint {p}^{j-k}\frac{dp}{p}. \)
But all of the terms vanish except where j = k because
\( \frac{1}{2\pi i}\oint {z}^k\frac{dz}{z}=0 \)
except when k = 0. To see why, let z = eiθ. As θ increases from 0 to 2π, z goes around the unit circle. So, since dz = ieiθdθ, we have that
\( \frac{1}{2\pi i}\oint {z}^k\frac{dz}{z}=\frac{1}{2\pi i}{\int}_0^{2\pi }{e}^{ik\theta}\,i\,d\theta =\frac{1}{2\pi }{\int}_0^{2\pi }{e}^{ik\theta}\,d\theta . \)
Thus,
\( \frac{1}{2\pi i}\oint {z}^k\frac{dz}{z}=\begin{cases}1, & k=0\\ 0, & k\ne 0\end{cases} \)
by Cauchy’s integral formula.
The frequency domain formulas can now be used to calculate moments quickly and conveniently. Consider \( {Ex}_t^2 \):
\( {Ex}_t^2=E{\left({\sum}_{j=0}^{\infty }{a}_j{\varepsilon}_{t-j}\right)}^2={\sigma}^2{\sum}_{j=0}^{\infty }{a}_j^2,\kern2em (4) \)
where \( {\sigma}^2\equiv E{\varepsilon}_t^2 \). The result in Eq. (4) comes from the fact that Eεtεt−s = 0, ∀s ≠ 0. Using the product-convolution relation, we see that
\( {Ex}_t^2={\sigma}^2\cdot \frac{1}{2\pi i}\oint A(z)A\left({z}^{-1}\right)\frac{dz}{z}. \)
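For the AR(1), the variance can be computed three ways and compared: the time-domain sum of squared Wold coefficients, the closed form 1/(1 − a²), and a numerical average of |A(z)|² around the unit circle. The parameter values are illustrative assumptions.

```python
import numpy as np

# Sketch: compute E x_t^2 for an AR(1) three ways -- the time-domain sum
# sigma^2 * sum_j a_j^2, the closed form sigma^2/(1 - a^2), and the average of
# |A(e^{i theta})|^2 around the unit circle. Parameter values are illustrative.
a, sigma2 = 0.9, 1.0

# Time domain: sum of squared Wold coefficients (truncated)
wold = a ** np.arange(5000)
var_time = sigma2 * np.sum(wold ** 2)

# Closed form for the AR(1)
var_closed = sigma2 / (1 - a ** 2)

# Frequency domain: average |A(z)|^2 over equispaced points z = e^{i theta}
n = 200000
theta = 2 * np.pi * np.arange(n) / n
A = 1.0 / (1 - a * np.exp(1j * theta))
var_freq = sigma2 * np.mean(np.abs(A) ** 2)
```

All three agree to high precision; the equispaced average is the discrete counterpart of the contour integral (1/2πi)∮ A(z)A(z⁻¹) dz/z.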
Returning to the prediction problem, the task is to choose c0, c1, c2, … to
\( \min \kern0.30em \frac{\sigma^2}{2\pi i}\oint \left[{z}^{-h}A(z)-C(z)\right]\left[{z}^hA\left({z}^{-1}\right)-C\left({z}^{-1}\right)\right]\frac{dz}{z}.\kern2em (7) \)
The first-order conditions for the optimization in expression (7) are
\( 0=\oint \left[{z}^{-h}A(z)-C(z)\right]{z}^{-j}\frac{dz}{z}+\oint \left[{z}^hA\left({z}^{-1}\right)-C\left({z}^{-1}\right)\right]{z}^j\frac{dz}{z} \)
for j = 0, 1, 2, …, where the second integral is the result of a change of variable p = z−1 so that dp = −z−2dz, resulting in
\( -\oint \left[{p}^{-h}A(p)-C(p)\right]{p}^{-j}\frac{dp}{p}. \)
The result is that in the second integral, the direction of the contour integration is clockwise. Multiplying by −1 and integrating counterclockwise, the second integral becomes identical to the first, and we can write the set of first-order conditions as
\( \oint \left[{z}^{-h}A(z)-C(z)\right]{z}^{-j}\frac{dz}{z}=0,\kern1em j=0,1,2,\dots \kern2em (8) \)
Define F(z) such that
\( F(z)={z}^{-h}A(z)-C(z). \)
From Eq. (8), it must be the case that all coefficients on non-negative powers of z equal zero:
\( {F}_j=0,\kern1em j=0,1,2,\dots \)
Multiplying by zj and summing over all j = 0, ±1, ±2, …, we obtain
\( {z}^{-h}A(z)-C(z)=g\left({z}^{-1}\right),\kern2em (9) \)
where the term on the right-hand-side of (9) represents an unknown function in negative powers of z. Thus
\( {z}^{-h}A(z)=C(z)+g\left({z}^{-1}\right), \)
which is an example of a ‘Wiener–Hopf’ equation. Now apply the (linear) ‘plussing’ operator, [⋅]+, which means ‘ignore negative powers of z’. The unknown function in negative powers of z is ‘annihilated’ by this operation, resulting in
\( C(z)={\left[{z}^{-h}A(z)\right]}_{+}={z}^{-h}A(z)-\mathrm{pr}\left[{z}^{-h}A(z)\right], \)
where pr[z−hA(z)] is the principal part of the Laurent expansion of z−hA(z) about z = 0. (The principal part of the Laurent expansion about z = 0 is the part involving negative powers of z.) This provides a very simple formula for computing forecasts.
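In terms of coefficient sequences, the plussing operation is simply dropping the first h Wold coefficients of A(z) (those become the principal part of z⁻ʰA(z)). The helper name and the AR(1) parameters below are illustrative assumptions.

```python
import numpy as np

# Sketch: the prediction formula C(z) = [z^{-h} A(z)]_+ amounts to dropping the
# first h Wold coefficients (the principal part of z^{-h}A(z)) and keeping the
# rest. Helper name and AR(1) parameters are illustrative assumptions.
def plus_operator(a_coeffs, h):
    """Coefficients of [z^{-h} A(z)]_+ given the coefficients {a_j} of A(z)."""
    return a_coeffs[h:]

a, J, h = 0.6, 60, 3
A = a ** np.arange(J)                 # AR(1) Wold coefficients: a_j = a^j
C = plus_operator(A, h)

# For the AR(1), [z^{-h}A(z)]_+ = a^h A(z): the h-step forecast is a^h x_t
ok = np.allclose(C, (a ** h) * A[: J - h])
```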
AR(1) Example
Suppose that xt = axt−1 + εt, with |a| < 1. This means that A(z) = 1/(1 − az). In this case:
\( C(z)={\left[\frac{z^{-h}}{1- az}\right]}_{+}={\sum}_{k=h}^{\infty }{a}^k{z}^{k-h}=\frac{a^h}{1- az}={a}^hA(z), \)
and the least squares loss predictor of xt+h using information dated t and earlier is
\( {y}_t={a}^h{x}_t. \)
The forecast error is
\( {x}_{t+h}-{a}^h{x}_t={\sum}_{j=0}^{h-1}{a}^j{\varepsilon}_{t+h-j}, \)
which is serially correlated (for h ≥ 2), but not correlated with information dated t and earlier.
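A short simulation can illustrate both properties of the AR(1) forecast: the error variance matches σ²(1 + a² + … + a^{2(h−1)}), and the error is uncorrelated with information dated t. All parameter values and the simulated sample are illustrative assumptions.

```python
import numpy as np

# Sketch: simulate an AR(1) and check that the h-step forecast a^h x_t has
# errors uncorrelated with current information, with variance
# sigma^2 (1 + a^2 + ... + a^{2(h-1)}). All parameter values are illustrative.
a, h, T = 0.8, 3, 200000
rng = np.random.default_rng(1)
eps = rng.standard_normal(T)

x = np.empty(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = a * x[t - 1] + eps[t]

forecast = (a ** h) * x[:-h]          # a^h x_t predicts x_{t+h}
error = x[h:] - forecast

theory_var = (1 - a ** (2 * h)) / (1 - a ** 2)   # with sigma^2 = 1 here
corr = np.corrcoef(error, x[:-h])[0, 1]          # ~0: error orthogonal to x_t
```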
MA(1) Example
Suppose that xt = εt − αεt−1, meaning A(z) = 1 − αz. Thus,
\( C(z)={\left[{z}^{-h}\left(1-\alpha z\right)\right]}_{+}={\left[{z}^{-h}-\alpha {z}^{1-h}\right]}_{+}=\begin{cases}-\alpha, & h=1\\ 0, & h\ge 2.\end{cases} \)
So, the best one-step ahead predictor is
\( {y}_t=-\alpha {\varepsilon}_t, \)
and the best predictor for forecasts of horizon two or more is exactly zero. For two-step-ahead (and beyond) prediction, the forecast error is xt+h itself, which is serially correlated but not correlated with information dated t and earlier.
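In coefficient terms, the MA(1) case is the simplest possible instance of the coefficient-dropping operation: one coefficient survives at h = 1, none at h ≥ 2. The value of α is an illustrative choice.

```python
import numpy as np

# Sketch: for the MA(1) x_t = eps_t - alpha eps_{t-1}, A(z) = 1 - alpha z, and
# the annihilation [z^{-h}A(z)]_+ leaves -alpha for h = 1 and nothing for h >= 2.
# alpha = 0.4 is an illustrative choice.
alpha = 0.4
A = np.array([1.0, -alpha])           # coefficients of A(z) = 1 - alpha z

C1 = A[1:]                            # h = 1: predictor is y_t = -alpha eps_t
C2 = A[2:]                            # h >= 2: empty -> best forecast is zero
```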
Least Squares Prediction of Geometric Distributed Leads
A prediction problem that characterizes many models in economics involves the expectation of a discounted value. Perhaps the most common and widely studied example is the present value formula for stock prices. Abstracting from mean and trend, suppose the dividend process has a Wold representation given by
\( {d}_t=q(L){\varepsilon}_t={\sum}_{j=0}^{\infty }{q}_j{\varepsilon}_{t-j}.\kern2em (10) \)
Assuming that the constant discount factor is given by γ ∈ (0, 1), we have the present value formula
\( {p}_t^{\ast }={\sum}_{j=0}^{\infty }{\gamma}^j{d}_{t+j}. \)
The least-squares minimization problem the predictor faces is to find a stochastic process pt to minimize the expected squared prediction error \( E{\left({p}_t-{p}_t^{\ast}\right)}^2 \). In terms of the information known at date t, the agent’s task is to find a linear combination of current and past dividends, or, equivalently, of current and past dividend innovations εt, that is ‘close’ to \( {p}_t^{\ast } \). Writing pt = f(L)εt, the problem becomes one of finding the coefficients fj in f(L) = f0 + f1L + f2L2 + … to minimize \( E{\left(f(L){\varepsilon}_t-{p}_t^{\ast}\right)}^2. \) Using the method described in the previous section, the problem has an equivalent, frequency-domain representation
\( \min_{f(z)\in {H}^2}\kern0.30em \frac{\sigma^2}{2\pi i}\oint {\left|f(z)-\frac{zq(z)}{z-\gamma}\right|}^2\frac{dz}{z}. \)
The first-order conditions for choosing fj are, after employing the same simplification used in (7),
\( \oint \left[f(z)-\frac{zq(z)}{z-\gamma}\right]{z}^{-j}\frac{dz}{z}=0,\kern1em j=0,1,2,\dots \kern2em (13) \)
Now define
\( G(z)=f(z)-\frac{zq(z)}{z-\gamma }, \)
so that (13) becomes
\( {G}_j=0,\kern1em j=0,1,2,\dots \)
Then multiplying by zj and summing over all j = 0, ±1, ±2, … as above, we obtain
\( f(z)-\frac{zq(z)}{z-\gamma }=g\left({z}^{-1}\right), \)
the Wiener–Hopf equation for this problem. Applying the plussing operator to both sides yields
\( {\left[f(z)\right]}_{+}={\left[\frac{zq(z)}{z-\gamma}\right]}_{+}, \)
implying
\( f(z)={\left[\frac{zq(z)}{z-\gamma}\right]}_{+} \)
because f(z) is, by construction, one-sided in non-negative powers of z. As in the previous section,
\( f(z)=\frac{zq(z)}{z-\gamma }-P(z), \)
where P(z) is the principal part of the Laurent series expansion of (z − γ)−1zq(z) about z = γ. To determine the principal part of [(z − γ)−1zq(z)], note that zq(z) has a well-behaved power series expansion about z = γ, where ‘well-behaved’ means ‘involving no negative powers of (z − γ)’. Thus [(z − γ)−1zq(z)] has a power series expansion about z = γ involving a single term in (z − γ)−1:
\( \frac{zq(z)}{z-\gamma }={b}_{-1}{\left(z-\gamma \right)}^{-1}+{b}_0+{b}_1\left(z-\gamma \right)+\dots \)
The principal part here is the part involving negative powers of (z − γ): b−1(z − γ)−1. To determine it, multiply both sides by (z − γ) and evaluate what is left at z = γ to find b−1 = γq(γ). Thus
\( f(z)=\frac{zq(z)-\gamma q\left(\gamma \right)}{z-\gamma },\kern2em (14) \)
The ‘cross-equation restrictions’ of rational expectations refer to the connection between the serial correlation structure of the driving process (here dividends) and the serial correlation structure of the expected discounted value of the driving process (here prices). That is, when dividends are characterized by q(z), prices are characterized by f(z), and f(z) depends upon q(z) as depicted in (14).
To illustrate how the formula works, suppose detrended dividends are described by a first-order autoregression; that is, that q(L) = (1 − ρL)−1. Then
\( f(z)=\frac{z{\left(1-\rho z\right)}^{-1}-\gamma {\left(1-\rho \gamma \right)}^{-1}}{z-\gamma }=\frac{1}{\left(1-\gamma \rho \right)\left(1-\rho z\right)}, \)
so that
\( {p}_t=\frac{1}{1-\gamma \rho }{d}_t.\kern2em (15) \)
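For AR(1) dividends the formula can be checked directly: summing the discounted forecasts Etdt+j = ρʲdt term by term reproduces the price dt/(1 − γρ). The values of γ, ρ, and the current dividend are illustrative assumptions.

```python
# Sketch: check the present-value formula p_t = d_t/(1 - gamma*rho) for AR(1)
# dividends by summing the discounted forecasts E_t d_{t+j} = rho^j d_t directly.
# gamma, rho, and the current dividend are illustrative values.
gamma, rho, d_t = 0.95, 0.8, 2.0

J = 5000                                   # truncation of the infinite sum
p_direct = sum((gamma * rho) ** j for j in range(J)) * d_t
p_formula = d_t / (1 - gamma * rho)

gap = abs(p_direct - p_formula)            # ~0 up to truncation
```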
It is instructive to note that, while the pricing formula (15) makes pt the best least squares predictor of \( {p}_t^{\ast } \), the prediction errors \( {p}_t-{p}_t^{\ast } \) will not be serially uncorrelated. Indeed
\( {p}_t-{p}_t^{\ast }=-\frac{1}{1-\gamma \rho }{\sum}_{m=1}^{\infty }{\gamma}^m{\varepsilon}_{t+m}=\gamma \left({p}_{t+1}-{p}_{t+1}^{\ast}\right)-\frac{\gamma }{1-\gamma \rho }{\varepsilon}_{t+1}. \)
Thus the prediction errors will be described by a highly persistent (γ is close to unity) first-order autoregression. But because this autoregression involves future εt’s, the serial correlation structure of the errors cannot be exploited to improve the quality of the prediction of \( {p}_t^{\ast } \). The reason is that the predictor ‘knows’ the model for price setting (the present value formula) and the dividend process; the best predictor \( {p}_t={E}_t{p}_t^{\ast } \) of \( {p}_t^{\ast } \) ‘tolerates’ the serial correlation because the (correct) model implies that it involves future εt’s and therefore cannot be predicted. If one only had data on the errors (and did not know the model that generated them), they would appear (rightly) to be characterized by a first-order autoregression; fitting an AR(1) (that is, the best linear model) and using it to ‘adjust’ pt by accounting for the serial correlation in the errors \( {p}_t-{p}_t^{\ast } \) would decrease the quality of the estimate of \( {p}_t^{\ast } \). The reason is the usual one that the Wold representation for \( {p}_t-{p}_t^{\ast } \) is not the economic model of \( {p}_t-{p}_t^{\ast } \), and (correct) models always beat Wold representations. This also serves as a reminder of circumstances under which one should be willing to tolerate serially correlated errors: when one knows the model that generated them, and the model implies that they are as small as they can be made.
Robust Optimal Prediction of Time Series
The squared-error loss function employed to this point is appropriate for situations in which the model (either the time series model or the economic model) is thought to be correct. But in many settings the forecaster or model builder may wish to guard against the possibility of misspecification. There are many ways to do this; an approach popular in the engineering literature and recently introduced into the economics literature by Hansen and Sargent (2007) involves behaving so as to minimize the maximum loss sustainable by using an approximating model when the truth may be something else. The ‘robust’ approach to this involves replacing the squared-error loss problem
\( \min_{C(z)\in {H}^2}\kern0.30em \frac{\sigma^2}{2\pi i}\oint {\left|{z}^{-h}A(z)-C(z)\right|}^2\frac{dz}{z} \)
with the ‘min-max’ problem
\( \min_{C(z)}\kern0.30em \max_{\left|z\right|=1}{\left|{z}^{-h}A(z)-C(z)\right|}^2, \)
so that minimizing the ‘average’ value on the unit circle has been replaced by minimizing the max. This problem can also be written
\( \min_{C(z)}{\left\Vert A(z)-{z}^hC(z)\right\Vert}_{\infty }, \)
where ∥⋅∥∞ denotes the supremum of the modulus on |z| = 1. This is known as the ‘minimum norm interpolation problem’ and amounts to finding a function φ(z) = A(z) − zhC(z) to
\( \min {\left\Vert \varphi (z)\right\Vert}_{\infty } \)
subject to the restriction that the power series expansion of φ(z) matches that of A(z) for the first h − 1 powers of z. This means that the following must hold:
\( {\varphi}_j={a}_j,\kern1em j=0,1,\dots, h-1.\kern2em (16) \)
Theorem 1
The minimizing φ(z) function is such that |φ(z)|2 is constant on |z| = 1. Moreover,
\( \varphi (z)=M{\prod}_{k=1}^n\frac{z-{\alpha}_k}{1-{\overline{\alpha}}_kz},\kern1em n\le h-1, \)
where M, α1, α2, …, αn are chosen to ensure that (16) holds.
Proof: see Nehari (1957).
To see that φ(z) must be of the indicated form, note that the ‘Blaschke factors’ in the product have unit modulus on |z| = 1:
\( \left|\frac{z-{\alpha}_k}{1-{\overline{\alpha}}_kz}\right|=1, \)
so that |φ(z)|2 = M2.
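The unit-modulus property is easy to confirm numerically: evaluate a Blaschke factor at points on the unit circle and check that every value has modulus one. The value of α is an illustrative choice.

```python
import numpy as np

# Sketch: verify that a Blaschke factor (z - alpha)/(1 - conj(alpha) z) has
# unit modulus on |z| = 1. The value of alpha is an illustrative choice.
alpha = 0.3 + 0.2j
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
z = np.exp(1j * theta)                       # points on the unit circle

B = (z - alpha) / (1.0 - np.conj(alpha) * z)
unit = np.allclose(np.abs(B), 1.0)           # |B(z)| = 1 everywhere on the circle
```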
In the general h-step-ahead prediction problem, we have that
\( \varphi (z)=A(z)-{z}^hC(z), \)
meaning that
\( C(z)={z}^{-h}\left[A(z)-\varphi (z)\right]. \)
This is analogous to the solution in the least-squares case, but, instead of subtracting the principal part of z−hA(z), we subtract a different function from z−hA(z). Note also that because φ(z) matches the power series expansion of A(z) up to the power zh−1, C(z) is of the form
\( C(z)={\sum}_{j=0}^{\infty}\left({a}_{j+h}-{\varphi}_{j+h}\right){z}^j, \)
involving no negative powers of z.
Finally, note that the forecast error is serially uncorrelated because φ(z) is constant on |z| = 1.
Example. AR(1)
Let
\( A(z)=\frac{1}{1- az}. \)
For h = 1, we see that φ(z) = A(z) − zC(z) must be constant on |z| = 1, and that φ(0) = A(0) = 1. Thus, φ(z) = M = 1, so that
\( C(z)={z}^{-1}\left[A(z)-1\right]=\frac{a}{1- az}= aA(z), \)
which implies that the robust one-step ahead forecast is
\( {y}_t={ax}_t, \)
which coincides with the best least-squares forecast. This equivalence between the robust and least-squares one-step ahead forecasts is to be expected because the best one-step-ahead least-squares forecast also has serially uncorrelated errors. For h = 2, we have that
\( \varphi (z)=M\frac{z-\alpha }{1-\overline{\alpha}z}, \)
where (again) φ(0) = 1, but now we also see that φ′(0) = a. Thus,
\( -M\alpha =1, \)
and furthermore
\( M\left(1-\alpha \overline{\alpha}\right)=a. \)
Therefore, the solution will have the property that
\( {\alpha}^2-a\alpha -1=0. \)
That is, the roots are reciprocal pairs. Notice that the discriminant is positive \( \left({a}^2+4>0\right) \), meaning that we will always have a real solution, and we choose |α| < 1. Then, we have that
So, the robust prediction is given by
in contrast to the least-squares prediction
\( {y}_t={a}^2{x}_t. \)
Example. MA(1)
Suppose that the process follows an MA(1), xt = εt − βεt−1, and therefore A(z) = 1 − βz. The analysis from the previous example still holds, and all of the following are true:
\( \varphi (z)=M\frac{z-\alpha }{1-\overline{\alpha}z},\kern1em \varphi (0)=1, \)
while
\( -M\alpha =1, \)
and
\( \varphi^{\prime }(0)=M\left(1-\alpha \overline{\alpha}\right)=-\beta . \)
Therefore,
\( {\alpha}^2+\beta \alpha -1=0, \)
meaning that, again, we have real roots which are reciprocal pairs and we can choose |α| < 1. Of course, α will depend upon the value of β, and we write α(β). Thus
Therefore, we have the robust prediction
while the least-squares prediction is the standard
\( {y}_t=0. \)
Robust Prediction of Geometric Distributed Leads
Following the excellent treatment in Kasa (2001), a robust present-value predictor fears that dividends may not be generated by the process in (10), and so, instead of choosing an f(z) to minimize the average loss around the unit circle, chooses f(z) to minimize the maximum loss:
\( \min_{f(z)\in {H}^{\infty }}{\left\Vert f(z)-\frac{zq(z)}{z-\gamma}\right\Vert}_{\infty }. \)
Unlike in the least squares case (14), where f(z) was restricted to the class H2 of functions finitely square-integrable on the unit circle, the restriction now is to the class of functions with finite maximum modulus on the unit circle, and the H2 norm has been replaced by the H∞ norm.
To begin the solution process, note that there is considerable freedom in designing the minimizing function f(z): it must be well-behaved (that is, must have a convergent power series in non-negative powers of z on the unit disk), but is otherwise unrestricted. Recalling the Laurent expansion
\( \frac{zq(z)}{z-\gamma }=\gamma q\left(\gamma \right){\left(z-\gamma \right)}^{-1}+{b}_0+{b}_1\left(z-\gamma \right)+\dots, \)
while in the least squares case f(z) was set to ‘cancel’ all the terms of this series except the first, here f(z) will be set to do something else. Now define the Blaschke factor Bγ(z) = (z − γ)/(1 − γz) and note that, because of the unit modulus condition, the problem can be written
\( \min_{f(z)\in {H}^{\infty }}{\left\Vert {B}_{\gamma }(z)\left[f(z)-\frac{zq(z)}{z-\gamma}\right]\right\Vert}_{\infty }. \)
Defining
\( T(z)=-\frac{zq(z)}{1-\gamma z}, \)
we have
\( \min_{f(z)\in {H}^{\infty }}{\left\Vert {B}_{\gamma }(z)f(z)+T(z)\right\Vert}_{\infty }. \)
Define the function inside the ∥⋅∥’s as
\( \varphi (z)={B}_{\gamma }(z)f(z)+T(z), \)
and note that φ(γ) = T(γ). Thus the problem of finding f(z) reduces to the problem of finding the smallest φ(z) satisfying φ(γ) = T(γ):
\( \min_{\varphi (z)\in {H}^{\infty }}{\left\Vert \varphi (z)\right\Vert}_{\infty}\kern1em \mathrm{subject}\ \mathrm{to}\kern0.5em \varphi \left(\gamma \right)=T\left(\gamma \right).\kern2em (17) \)
Theorem 2
(Kasa 2001). The solution to (17) is the constant function φ(z) = T(γ).
Proof. To see this, first note that the norm of a constant function is the modulus of the constant itself. This is written as
\( {\left\Vert \varphi (z)\right\Vert}_{\infty }=\left|T\left(\gamma \right)\right|. \)
Next, suppose that there exists another function Ψ(z) ∈ H∞, with Ψ(γ) = T(γ) and also
\( {\left\Vert \Psi (z)\right\Vert}_{\infty }<{\left\Vert \varphi (z)\right\Vert}_{\infty }.\kern2em (18) \)
Recalling the definition of the H∞ norm and using Eqs. (17) and (18):
\( \sup_{\left|z\right|=1}\left|\Psi (z)\right|<\left|T\left(\gamma \right)\right|. \)
The maximum modulus theorem states that a function which is analytic on the disk U achieves its maximum modulus on the boundary of the disk. That is,
\( \sup_{z\in U}\left|\Psi (z)\right|=\sup_{\left|z\right|=1}\left|\Psi (z)\right|. \)
Therefore, we can see that
\( \left|\Psi (z)\right|<\left|T\left(\gamma \right)\right|\kern1em \mathrm{for}\ \mathrm{all}\kern0.5em \left|z\right|\le 1. \)
However, one of the values on the interior of the unit disk is z = γ, which can be inserted into the far left-hand-side of the inequality above to get the result
\( \left|\Psi \left(\gamma \right)\right|<\left|T\left(\gamma \right)\right|. \)
This contradicts the requirement that Ψ(γ) = T(γ). Therefore, we have verified that there does not exist another function Ψ(z) ∈ H∞ such that Ψ(γ) = T(γ) and ∥Ψ(z)∥∞ < ∥φ(z)∥∞. □
Given the form for φ(z), the form for f(z) follows. After some tedious algebra, we obtain
\( f(z)=\frac{zq(z)-\gamma q\left(\gamma \right)}{z-\gamma }+\frac{\gamma^2q\left(\gamma \right)}{1-{\gamma}^2}, \)
which is the least squares solution plus a constant. Thus the robust cross-equation restrictions likewise differ from the least squares cross-equation restrictions. After the initial period, the impulse response function for the robust predictor is identical to that of the least squares predictor. In the initial period, the least squares impulse response is q(γ), while the robust impulse response is larger: q(γ)/(1 − γ2).
Because γ is the discount factor, and therefore close to unity, the robust impulse response can be considerably larger than that of the least squares response. Relatedly, the volatility of prices in the robust case will be larger as well. For example, in the first-order autoregressive case studied above,
\( f(z)=\frac{1}{\left(1-\gamma \rho \right)\left(1-\rho z\right)}+\frac{\gamma^2}{\left(1-\gamma \rho \right)\left(1-{\gamma}^2\right)}, \)
from which the variance can be calculated as
\( E{p}_t^2=\frac{\sigma^2}{{\left(1-\gamma \rho \right)}^2\left(1-{\rho}^2\right)}+\frac{\sigma^2}{{\left(1-\gamma \rho \right)}^2}\left[\frac{1}{{\left(1-{\gamma}^2\right)}^2}-1\right]. \)
When the discount factor is large and dividends are highly persistent, the variance of the robust present value prediction can be considerably larger than that of the least squares prediction (the first term on the right alone).
Finally, recall that the least-squares present-value predictor behaved in such a way as to minimize the variance of the error \( {p}_t-{p}_t^{\ast } \). Here, robust prediction results in an error with Wold representation
\( {p}_t-{p}_t^{\ast }=\frac{\gamma }{\left(1-\gamma \rho \right)\left(1-{\gamma}^2\right)}\left\{\frac{\gamma -F}{1-\gamma F}\right\}{\varepsilon}_t, \)
where F denotes the forward-shift operator, Fεt = εt+1.
The term in braces has the form of a Blaschke factor. Applying such factors in the lag operator to a serially uncorrelated process like εt leaves a serially uncorrelated result; thus the robust present value predictor has behaved in such a way that the resulting errors are white noise. Of course this comes at a cost: to make the error serially uncorrelated, the robust predictor must tolerate an error variance that is larger than the least squares error variance by a factor of 1/(1 − γ2), which can be substantial when γ is close to unity.
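The whitening property of a Blaschke factor applied in the lag operator can be checked from its impulse response: (L − γ)/(1 − γL) has coefficients b0 = −γ and bj = (1 − γ²)γ^{j−1} for j ≥ 1, and the filtered series Σj bj εt−j should have zero autocovariance at every nonzero lag and unchanged variance. The value of γ and the truncation length are illustrative choices.

```python
import numpy as np

# Sketch: a Blaschke factor applied in the lag operator preserves white noise.
# For B(L) = (L - gamma)/(1 - gamma L), the impulse response is b_0 = -gamma and
# b_j = (1 - gamma^2) gamma^{j-1} for j >= 1; the filtered series sum_j b_j eps_{t-j}
# has zero autocovariance at nonzero lags and unit variance (for unit-variance eps).
gamma, J = 0.95, 4000

b = np.empty(J)
b[0] = -gamma
b[1:] = (1.0 - gamma ** 2) * gamma ** np.arange(J - 1)

variance = np.dot(b, b)                              # = 1: variance preserved
acov = [np.dot(b[:-k], b[k:]) for k in range(1, 6)]  # ~0 at lags 1..5
```

The lag-k autocovariance of the filtered series is Σj bj bj+k, which vanishes term-for-term against the geometric tail; numerically both checks hold to machine precision.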
Bibliography
Hansen, L.P., and T.J. Sargent. 1980. Formulating and estimating dynamic linear rational expectations models. Journal of Economic Dynamics and Control 2: 7–46.
Hansen, L.P., and T.J. Sargent. 2007. Robustness. Princeton: Princeton University Press.
Kasa, K. 2001. A robust Hansen–Sargent prediction formula. Economics Letters 71: 43–48.
Nehari, Z. 1957. On bounded bilinear forms. Annals of Mathematics 65(1): 153–162.
Sargent, T.J. 1987. Macroeconomic theory. New York: Academic Press.
Whiteman, C.H. 1983. Linear rational expectations: A user’s guide. Minneapolis: University of Minnesota Press.
Whittle, P. 1983. Prediction and regulation by linear least-square methods. 2nd ed. Minneapolis: University of Minnesota Press.
Whiteman, C.H., Lewis, K.F. (2018). Prediction Formulas. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-349-95189-5_2180