Having selected a model and fitted its parameters to a given time series, the model can then be used to estimate new values of the time series. If such values are estimated for a time period following the final data value X T of the given time series, we speak of a prediction or forecast. The estimation of values lying between given data points is called interpolation. The question now arises as to how a model such as those given in Eqs. 32.6 or 32.13 can be used to obtain an “optimal” estimate. To answer this question the forecasting error

$$\displaystyle \begin{aligned} X_{T+k}-\widehat{X}_{T+k}\;,\quad k\in\mathbb{N} \end{aligned}$$

between the estimated values \(\widehat {X}_{T+k}\) and the actual observed time series values X T+k can be used if the last value used in the calibration of the model was X T. The best forecast is that which minimizes the mean square error (MSE for short). The MSE is defined as the expectation of the squared forecasting error

$$\displaystyle \begin{aligned} \text{MSE}:=\text{E}[(X_{T+k}-\widehat{X}_{T+k})^{2}] \;.{} \end{aligned} $$
(33.1)

This expression is the mathematical formulation of the intuitive concept of the “distance” between the estimated and the actual values, which is to be minimized on average (more cannot be expected when dealing with random variables). Minimizing this mean square error yields the result that the best forecasting estimate (called the optimal forecast) is given by the conditional expectation

$$\displaystyle \begin{aligned} \widehat{X}_{T+k}=\text{E}[X_{T+k}|X_{T},\ldots,X_{2},X_{1}]\;.{} \end{aligned} $$
(33.2)

This is the expectation of X T+k, conditional on all available information about the time series up to and including T.

In practice, however, the concrete computation of this conditional expectation is generally very difficult since the joint distribution of the random variables must be known. Therefore, we often limit our consideration to the linear forecast

$$\displaystyle \begin{aligned} \widehat{X}_{T+k}=u_{1}X_{1}+u_{2}X_{2}+\cdots+u_{T}X_{T} {} \end{aligned} $$
(33.3)

with appropriate coefficients u i. This linear forecast is, in contrast to the optimal forecast, often more easily interpreted. For the special case that the {X t} are normally distributed, the linear forecast and the optimal forecast agree. The best linear forecast can be characterized by the fact that the forecasting error \(X_{T+k}-\widehat {X}_{T+k}\) and the X 1, X 2, …, X T are uncorrelated. The intuitive interpretation is that the X 1, X 2, …, X T cannot provide any additional information for the forecast and the error is thus purely random.

1 Forecasting with Autoregressive Models

This forecasting procedure will now be applied to an AR(p) process, Eq. 32.6. The optimal one-step forecast is, according to Eq. 33.2, given directly by the conditional expectation in Eq. 32.7

$$\displaystyle \begin{aligned} \widehat{X}_{T+1}^{\text{optimal}}=\text{E}[X_{T+1}|X_{T},\ldots,X_{1} ]=\sum_{i=1}^{p}\phi_{i}X_{T+1-i}\;.{} \end{aligned} $$
(33.4)

This has the form indicated in Eq. 33.3. The optimal one-step forecast is thus the best linear one-step forecast. Equation 32.6 shows that the forecasting error \(X_{T+1}-\widehat {X}_{T+1}\) is precisely ε T+1 and independent of X 1, X 2, …, X T. The MSE for the one-step forecast is given by Eq. 33.1 with k = 1. Thus,

$$\displaystyle \begin{aligned} \text{MSE}=\text{E}[(X_{T+1}-\widehat{X}_{T+1})^{2}]=\text{E}[\varepsilon_{T+1}^{2}]=\sigma^{2}\;. \end{aligned}$$

The optimal two-step forecast is the conditional expectation of X T+2 on the basis of knowledge of X T, …, X 1, as can be seen from Eq. 33.2:

$$\displaystyle \begin{aligned} \widehat{X}_{T+2}^{\text{optimal}}=\text{E}[X_{T+2}|X_{T},\ldots,X_{1}]\;. \end{aligned} $$

This is not equivalent to the conditional expectation of X T+2 on the basis of knowledge of X T+1, …, X 1. Hence, Eq. 32.7 cannot be applied directly. An additional difficulty arises from the fact that X T+1 is unknown; the optimal two-step forecast therefore cannot be calculated. We proceed instead by computing the linear two-step forecast. The best linear two-step forecast is obtained by calculating the conditional expectation of X T+2 as if all the X T+1, …, X 1 were known and replacing the (unknown) value X T+1 by its best estimate \(\widehat {X}_{T+1}\) (calculated in the previous step):

$$\displaystyle \begin{aligned} \widehat{X}_{T+2}^{\text{linear}}=\text{E}[X_{T+2}|\widehat{X}_{T+1},X_{T}, \ldots,X_{1}]\;. \end{aligned} $$

Now Eq. 32.7 can be applied to this conditional expectation and utilizing Eq. 33.4 we obtain

$$\displaystyle \begin{aligned} \widehat{X}_{T+2}^{\text{linear}} & =\phi_{1}\widehat{X}_{T+1}+ \sum_{j=2}^{p}\phi_{j}X_{T+2-j}=\phi_{1}\sum_{i=1}^{p}\phi_{i}X_{T+1-i}+ \sum_{j=2}^{p}\phi_{j}X_{T+1-(j-1)}\\ & =\phi_{1}\sum_{i=1}^{p}\phi_{i}X_{T+1-i}+\sum_{i=1}^{p-1}\phi_{i+1}X_{T+1-i}\\ & =\phi_{1}\phi_{p}X_{T+1-p}+\sum_{i=1}^{p-1}\left[ \phi_{1}\phi_{i} +\phi_{i+1}\right] X_{T+1-i}\;. \end{aligned} $$

The linear two-step forecast then has the form indicated in Eq. 33.3. The forecasting error is found to be

$$\displaystyle \begin{aligned} X_{T+2}-\widehat{X}_{T+2} &=\underset{X_{T+2},\;\text{see Eqn. 32.6} }{\underbrace{\sum_{i=1}^{p}\phi_{i}X_{T+2-i}+\varepsilon_{T+2}}}-\phi _{1}\widehat{X}_{T+1}-\sum_{j=2}^{p}\phi_{j}X_{T+2-j}\\ & =\varepsilon_{T+2}+\phi_{1}(X_{T+1}-\widehat{X}_{T+1})\\ & =\varepsilon_{T+2}+\phi_{1}\varepsilon_{T+1}\;. \end{aligned} $$

Thus, the forecasting error is a sum of two normally distributed random variables and therefore itself normally distributed. The MSE can now be computed as follows:

$$\displaystyle \begin{aligned} \text{MSE} & =\text{E}[(X_{T+2}-\widehat{X}_{T+2})^{2}]=\text{E} [(\varepsilon_{T+2}+\phi_{1}\varepsilon_{T+1})^{2}]\\[.3em] & =\text{E}[\varepsilon_{T+2}^{2}]+\phi_{1}^{2}\text{E}[\varepsilon_{T+1} ^{2}]+2\phi_{1}\text{E}[\varepsilon_{T+2}\,\varepsilon_{T+1}]\\[.3em] & =\operatorname{var}[\varepsilon_{T+2}]+\phi_{1}^{2} \operatorname{var}[\varepsilon_{T+1}]+2\phi_{1}\cdot 0\\[.3em] & =\sigma^{2}(1+\phi_{1}^{2})\;. \end{aligned} $$

This implies that the forecasting error \(X_{T+2}-\widehat {X}_{T+2}\) of the two-step forecast is normally distributed with variance \(\sigma ^{2}(1+\phi _{1}^{2}),\) i.e., N\((0,\sigma ^{2}(1+\phi _{1}^{2}))\).

Proceeding analogously, the best linear h-step forecast is obtained by taking the conditional expectation of X T+h as if all X t up to X T+h−1 were known, and then replacing the as yet unknown values X t for T < t < T + h with their best estimates, calculated inductively in the previous steps as described above:

$$\displaystyle \begin{aligned} \widehat{X}_{T+h}^{\text{linear}}=\text{E}[X_{T+h}|\widehat{X}_{T+h-1} ,\widehat{X}_{T+h-2},\ldots,\widehat{X}_{T+1},X_{T},\ldots,X_{1}]\;. \end{aligned} $$

Equation 32.7 is then applied to these conditional expectations resulting in

$$\displaystyle \begin{aligned} \widehat{X}_{T+h}=\sum_{i=1}^{\min(h-1,\ p)}\phi_{i}\widehat{X}_{T+h-i} +\sum_{j=h}^{p}\phi_{j}X_{T+h-j}\;. \end{aligned}$$

The forecasting error of the h-step forecast is

$$\displaystyle \begin{aligned} X_{T+h}-\widehat{X}_{T+h} & =\underset{X_{T+h},\;\text{see Eqn. 32.6}} {\underbrace{\sum_{i=1}^{p}\phi_{i}X_{T+h-i}+\varepsilon_{T+h}}}-\sum _{i=1}^{\min(h-1, p)}\phi_{i}\widehat{X}_{T+h-i}-\sum_{j=h}^{p}\phi _{j}X_{T+h-j}\\ & =\varepsilon_{T+h}+\sum_{i=1}^{\min(h-1, p)}\phi_{i}X_{T+h-i}+\sum _{i=h}^{p}\phi_{i}X_{T+h-i}\\ &\quad -\sum_{i=1}^{\min(h-1, p)}\phi_{i}\widehat{X}_{T+h-i}-\sum_{j=h}^{p} \phi_{j}X_{T+h-j}\\ & =\varepsilon_{T+h}+\sum_{i=1}^{\min(h-1, p)}\phi_{i}\left(X_{T+h-i} -\widehat{X}_{T+h-i}\right)\;. \end{aligned} $$

This is a recursion expressing the h-step forecasting error in terms of the forecasting errors for fewer than h steps. Unwinding the recursion shows that the error is a linear combination of ε T+1, …, ε T+h and is thus normally distributed with mean zero; its variance is \(\sigma ^{2}(1+\psi _{1}^{2}+\cdots +\psi _{h-1}^{2})\), where the ψ i are the coefficients of the moving average representation of the AR(p) process (in particular ψ 1 = ϕ 1, so that for h = 2 this reproduces the variance \(\sigma ^{2}(1+\phi _{1}^{2})\) computed above).

The unknown coefficients ϕ 1, ϕ 2, …, ϕ p are estimated from the time series as shown in Sect. 32.2.1. The ϕ i in the forecast equation are simply replaced with \(\widehat {\phi }_{i}\).
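To make the recursion concrete, the following minimal Python sketch (the function name and interface are ours, not from the text) computes the best linear 1- to h-step forecasts by feeding each forecast back into the history, exactly as in the derivation above; in practice the ϕ i would be the estimates \(\widehat {\phi }_{i}\):

```python
def ar_forecast(x, phi, h):
    """Best linear 1- to h-step forecasts for an AR(p) process.

    x   : observed series [X_1, ..., X_T]
    phi : coefficients [phi_1, ..., phi_p]
    Each still unknown future value is replaced by its own forecast
    from the earlier steps, as in the recursion of the text.
    """
    p = len(phi)
    hist = list(x)                   # known values, extended by forecasts
    forecasts = []
    for _ in range(h):
        xhat = sum(phi[i] * hist[-1 - i] for i in range(p))
        forecasts.append(xhat)
        hist.append(xhat)            # feed the forecast back in
    return forecasts
```

For an AR(1) process this reproduces the familiar geometric decay \(\widehat {X}_{T+h}=\phi _{1}^{h}X_{T}\).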

2 Volatility Forecasts with GARCH(p, q) Processes

GARCH models of the form indicated in Eq. 32.13 are not suitable for the prediction of the actual values X i of a time series since the random variable in Eq. 32.13 appears as a product (rather than a sum as in Eq. 32.6). In consequence, the conditional expectations of the X i are identically zero. However, GARCH models are well adapted for forecasting the (conditional) variance of time series values. According to Eq. 33.2, the conditional expectation is in general the optimal forecast. We are therefore looking for the conditional expectation of the conditional variance.

2.1 Forecast Over Several Time Steps

2.1.1 The One-Step Forecast

Equation 32.16 shows that the conditional variance of X T is equal to H T if all X t for t ≤ T − 1 are known. Its conditional expectation is then the conditional expectation of H T. Based on Eq. 32.17, the conditional expectation of H T is simply H T itself if the X values are known up to the time T − 1. Hence, the optimal one-step forecast for the conditional variance is

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+1}^{\text{optimal}} & =\text{E}[\operatorname{var}_{T+1}|X_{T},\ldots,X_{1}]\\ & =\text{E}[H_{T+1}|X_{T},\ldots,X_{1}]\\ & =H_{T+1}\\ & =\alpha_{0}+\sum_{j=1}^{q}\alpha_{j}X_{T+1-j}^{2}+\sum_{i=1}^{p}\beta _{i}H_{T+1-i} \;.{} \end{aligned} $$
(33.5)

To clarify the argument used in the derivation of this result, each of the equalities in Eq. 33.5 will receive somewhat more scrutiny. The first equation is obtained from the general forecast equation, Eq. 33.2. The second follows from Eq. 32.16. The third holds as a result of Eq. 32.17 while the fourth equation is derived from Eq. 32.13 used in the construction of the GARCH process.

2.1.2 The Two-Step Forecast

The two-step forecast is somewhat more complicated. The optimal two-step forecast is, according to Eq. 33.2, the conditional expectation of \(\operatorname {var}_{T+2}\) under the condition that X T, …, X 1 are known:

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+2}^{\text{optimal}}=\text{E} [\operatorname{var}_{T+2}|X_{T},\ldots,X_{1}]\;. \end{aligned} $$

Again, this is not equal to the conditional expectation of \(\operatorname {var}_{T+2}\) under the condition that X T+1, …, X 1 are known. Equation 32.16 cannot be applied directly. Indeed, the optimal two-step forecast cannot be computed. We calculate instead, analogously to the linear forecast of the AR(p) process illustrated in Sect. 33.1, the best possible two-step forecast by replacing the expectation of \(\operatorname {var}_{T+2}\) conditional upon X T, …, X 1 with the conditional expectation of \(\operatorname {var}_{T+2}\) as if the X T+1, …, X 1 were all known:

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+2}=\text{E}[\operatorname{var}_{T+2} |X_{T+1},\ldots,X_{1}]\;. \end{aligned} $$

Now Eq. 32.16 can be applied to obtain

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+2}=\text{E}[\operatorname{var}_{T+2} |X_{T+1},\ldots,X_{1}]=H_{T+2}\;. \end{aligned}$$

Remember however, that X T+1 is not known and therefore H T+2 appearing here is not known at time T. The best we can do is to replace H T+2 by its optimal estimator which, according to Eq. 33.2, is given by its conditional expectation

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+2}=\text{E}[H_{T+2}|X_{T},\ldots,X_{1}]\;. {} \end{aligned} $$
(33.6)

Inside this expectation, we now replace H T+2 in accordance with the construction in Eq. 32.13:

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+2} & =\text{E}[H_{T+2}|X_{T},\ldots,X_{1}]\\ & =\alpha_{0}+\sum_{j=1}^{q}\alpha_{j}\text{E}[X_{T+2-j}^{2}|X_{T},\ldots,X_{1}] +\sum_{i=1}^{p}\beta_{i}\text{E}[H_{T+2-i}|X_{T},\ldots,X_{1}]\\ & =\alpha_{0}+\alpha_{1}\text{E}[X_{T+1}^{2}|X_{T},\ldots,X_{1}]+\sum _{j=2}^{q}\alpha_{j}X_{T+2-j}^{2}+\sum_{i=1}^{p}\beta_{i}H_{T+2-i}\;. {} \end{aligned} $$
(33.7)

In the last step, we have exploited the fact that all X t are known for times t ≤ T and, according to Eq. 32.17, all of the H t for times t ≤ T + 1. The expectations of the known quantities can be replaced by the quantities themselves. Only one unknown quantity remains, namely \(X_{T+1}^{2}.\) Because of Eq. 32.16, the conditional expectation of X T+h is zero for every h > 0. The expectation of \(X_{T+h}^{2}\) can therefore be replaced by the variance of X T+h:

$$\displaystyle \begin{aligned} \text{E}[X_{T+h}^{2}|X_{T},\ldots,X_{1}] & =\text{E}[X_{T+h}^{2} |X_{T},\ldots,X_{1}]-(\underset{0}{\underbrace{\text{E}[X_{T+h}|X_{T}, \ldots,X_{1}]}})^{2}\\ & =\operatorname{var}[X_{T+h}|X_{T},\ldots,X_{1}] \quad \text{for every}\;h>0\;.{} \end{aligned} $$
(33.8)

For h = 1 this implies

$$\displaystyle \begin{aligned} \text{E}[X_{T+1}^{2}|X_{T},\ldots,X_{1}]=\operatorname{var} [X_{T+1}|X_{T},\ldots,X_{1}]=H_{T+1}\;, \end{aligned}$$

where Eq. 32.16 has again been used in the last step. The two-step forecast then becomes

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+2}=\alpha_{0}+\alpha_{1}H_{T+1} +\sum_{j=2}^{q}\alpha_{j}X_{T+2-j}^{2}+\sum_{i=1}^{p}\beta_{i}H_{T+2-i}\;.{} \end{aligned} $$
(33.9)

2.1.3 The Three-Step Forecast

For the two-step forecast, only the value of X T+1 in Eq. 33.7 was unknown; the necessary H values were known up to time T + 1. This is no longer the case for the three-step forecast, in which some of the H values are also unknown. Because of this additional difficulty, it is advisable to demonstrate the computation of a three-step forecast before generalizing to arbitrarily many steps.

The three-step forecast proceeds analogously to the two-step forecast: the optimal forecast is, as indicated in Eq. 33.2, the conditional expectation of \(\operatorname {var}_{T+3}\) under the condition that X T, …, X 1 are known:

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+3}^{\text{optimal}}=\text{E} [\operatorname{var}_{T+3}|X_{T},\ldots,X_{1}]\;. \end{aligned} $$

Again, Eq. 32.16 cannot be directly applied since the X are only known up to X T and not up to X T+2. The best possible three-step forecast is thus, analogous to Eq. 33.6

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+3}=\text{E}[H_{T+3}|X_{T},\ldots,X_{1}] \;.{} \end{aligned} $$
(33.10)

In this expectation we now replace H T+3 with its expression constructed in Eq. 32.13 to obtain

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+3} & = \text{E}[H_{T+3}|X_{T},\ldots,X_{1}]\\ & =\alpha_{0}+\sum_{i=1}^{p}\beta_{i}\text{E}[H_{T+3-i}|X_{T}, \ldots,X_{1}]+\sum_{j=1}^{q}\alpha_{j}\text{E}[X_{T+3-j}^{2}|X_{T}, \ldots,X_{1}]\\ & =\alpha_{0}+\sum_{i=2}^{p}\beta_{i}H_{T+3-i}+\beta_{1}\text{E} [H_{T+2}|X_{T},\ldots,X_{1}]\\ & +\alpha_{1}\text{E}[X_{T+2}^{2}|X_{T},\ldots,X_{1}]+\alpha_{2} \text{E}[X_{T+1}^{2}|X_{T},\ldots,X_{1}]+ \sum_{j=3}^{q}\alpha_{j}X_{T+3-j}^{2}\;.{} \end{aligned} $$
(33.11)

In the last step, the expectations of the known values were again replaced by the values themselves (all X t for times t ≤ T and all H t for times t ≤ T + 1). Only three unknown values remain, namely \(X_{T+1}^{2}\), \(X_{T+2}^{2}\) and H T+2. For the conditional expectation of \(X_{T+1}^{2}\) and \(X_{T+2}^{2}\) we can use Eq. 33.8 to write

$$\displaystyle \begin{aligned} \text{E}[X_{T+1}^{2}|X_{T},\ldots,X_{1}] & =\operatorname{var}[X_{T+1} |X_{T},\ldots,X_{1}]=H_{T+1}\\ \text{E}[X_{T+2}^{2}|X_{T},\ldots,X_{1}] & =\operatorname{var}[X_{T+2} |X_{T},\ldots,X_{1}]=\widehat{\operatorname{var}}_{T+2}\;. \end{aligned} $$

Equation 32.16 has been used in the first of the above two equations. In the second equation, this is not possible since taking the conditional variance at time T + 2 under the condition that X T, …, X 1 are known is not the same as taking it conditional upon knowing the values of X T+1, …, X 1. We have no other choice than to replace the unknown \(\operatorname {var}[X_{T+2}|X_{T},\ldots ,X_{1}]\) with the (previously calculated) estimator \(\widehat {\operatorname {var}}_{T+2}\).

For the expectation E[H T+2|X T, …, X 1] we make use of the fact that, according to Eq. 33.6, it is equal to the two-step forecast for the variance

$$\displaystyle \begin{aligned} \text{E}[H_{T+2}|X_{T},\ldots,X_{1}]=\widehat{\operatorname{var} }_{T+2}\;. \end{aligned}$$

Substituting all this into Eq. 33.11 finally yields

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+3}=\alpha_{0}+\left(\alpha_{1}+\beta _{1}\right) \widehat{\operatorname{var}}_{T+2}+\alpha_{2}H_{T+1}+\sum _{j=3}^{q}\alpha_{j}X_{T+3-j}^{2}+\sum_{i=2}^{p}\beta_{i}H_{T+3-i} \;.{} \end{aligned} $$
(33.12)

2.1.4 The Forecast for h Steps

The generalization to the forecast for an arbitrary number of steps h is now quite simple. Analogous to Eqs. 33.6 and 33.10 the best possible estimate is

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+h}=\text{E}[H_{T+h}|X_{T},\ldots,X_{1}] \quad \text{for every}\;h>0\;.{} \end{aligned} $$
(33.13)

Within this expectation, we now replace H T+h as in the construction Eq. 32.13 and obtain an equation analogous to Eq. 33.11

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+h} & =\text{E}[H_{T+h}|X_{T},\ldots,X_{1}]\\ & =\alpha_{0}+\sum_{j=1}^{q}\alpha_{j} \text{E}[X_{T+h-j}^{2}|X_{T},\ldots,X_{1}] +\sum_{i=1}^{p}\beta_{i}\text{E}[H_{T+h-i}|X_{T},\ldots,X_{1}]\\ & =\sum_{j=1}^{\min(h-1,\,q)}\alpha_{j} \text{E}[X_{T+h-j}^{2}|X_{T},\ldots,X_{1}] +\sum_{j=h}^{q}\alpha_{j}X_{T+h-j}^{2}+\alpha_{0}\\ & +\sum_{i=1}^{\min(h-2,\,p)}\beta_{i} \text{E}[H_{T+h-i}|X_{T},\ldots,X_{1}] +\sum_{i=h-1}^{p}\beta_{i}H_{T+h-i}\;. \end{aligned} $$

In the last step, the expectations of the known values have again been replaced by the values themselves (all X t for times t ≤ T and all H t for times t ≤ T + 1). The remaining expectations of the H s are replaced, according to Eq. 33.13, by the respective variance estimators. We again use Eq. 33.8 for the conditional expectations of the X 2 and write

$$\displaystyle \begin{aligned} \text{E}[X_{T+1}^{2}|X_{T},\ldots,X_{1}] & =\operatorname{var}[X_{T+1} |X_{T},\ldots,X_{1}]=H_{T+1}\\ \text{E}[X_{T+k}^{2}|X_{T},\ldots,X_{1}] & =\operatorname{var}[X_{T+k} |X_{T},\ldots,X_{1}]=\widehat{\operatorname{var}}_{T+k} \quad \text{for}\; k>1\;. \end{aligned} $$

Substituting all of these relations for the conditional expectations finally yields the general h-step forecast of the conditional volatility in the GARCH(p, q) model:

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+h} & =\sum_{j=1}^{\min (h-2,\,q)}\alpha_{j}\widehat{\operatorname{var}}_{T+h-j} +\sum_{j=h}^{q}\alpha_{j}X_{T+h-j}^{2}+\alpha_{h-1}H_{T+1}+\alpha_{0} {}\\ & +\sum_{i=1}^{\min(h-2,\, p)}\beta_{i}\widehat{\operatorname{var}} _{T+h-i}+\sum_{i=h-1}^{p}\beta_{i}H_{T+h-i}\;. \end{aligned} $$
(33.14)

Together with the start value, Eq. 33.5, in the form \(\widehat {\operatorname {var}}_{T+1}=H_{T+1}\), the h-step forecast can be computed recursively for all h.

From Eqs. 33.13 and 33.2, the estimator for the variance is simultaneously the estimator for H, and thus

$$\displaystyle \begin{aligned} \widehat{H}_{T+h}=\widehat{\operatorname{var}}_{T+h} \quad \text{for all}\, h>0\;. \end{aligned}$$
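The recursion of Eqs. 33.5 and 33.14 can be sketched in a few lines of Python (names and interface are ours; we assume the observations X 1, …, X T and the conditional variances H 1, …, H T+1 are available). Every conditional expectation of an unknown \(X^{2}\) or H is replaced by the corresponding variance forecast, exactly as derived above:

```python
def garch_var_forecast(alpha0, alpha, beta, x, H, h):
    """h-step forecasts of the conditional variance of a GARCH(p, q)
    process via the recursion of Eq. 33.14.

    x : [X_1, ..., X_T]      observed series
    H : [H_1, ..., H_{T+1}]  conditional variances (H_{T+1} from Eq. 33.5)
    Returns the forecasts [var_{T+1}, ..., var_{T+h}].
    """
    T = len(x)
    v = [None, H[T]]                     # v[k] = forecast of var_{T+k}; v[1] = H_{T+1}

    def ex2(m):                          # E[X_{T+m}^2 | X_T, ..., X_1]
        return x[T + m - 1] ** 2 if m <= 0 else v[m]

    def eh(m):                           # E[H_{T+m} | X_T, ..., X_1]
        return H[T + m - 1] if m <= 1 else v[m]

    for k in range(2, h + 1):
        v.append(alpha0
                 + sum(a * ex2(k - j) for j, a in enumerate(alpha, start=1))
                 + sum(b * eh(k - i) for i, b in enumerate(beta, start=1)))
    return v[1:]
```

For a GARCH(1, 1) process the result coincides with the closed form derived in Sect. 33.3 below.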

2.2 Forecast for the Total Variance

In the financial world, the time series \(\left \{X_{T}\right \}\) is usually taken to represent a relative price change (yield), see for example Fig. 32.1 of the FTSE data. We are therefore also interested in the variance of the return over an entire time period of length h, i.e., in the variance of the sum \(\sum _{j=1}^{h}X_{T+j}\). Forecasts such as Eq. 33.14 only predict the conditional variance after h steps and not the variance over the entire term of h steps (from X T to X T+h). In other words, Eq. 33.14 forecasts the conditional variance of X T+h alone, not the variance of the sum \(\sum _{j=1}^{h}X_{T+j}\). The variance of the total return \(\sum _{j=1}^{h}X_{T+j}\) for independent (in particular uncorrelated) returns is simply the sum of the variances, as can be seen in Eq. A.17. Since the X t of the process in Eq. 32.13 are uncorrelated (because the ε t are iid), the estimator for the total variance of the GARCH process over h steps is simply

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}[\sum_{j=1}^{h}X_{T+j}|X_{T}, \ldots,X_{1}]=\sum_{j=1}^{h}\widehat{\operatorname{var}}[X_{T+j} |X_{T},\ldots,X_{1}]=\sum_{j=1}^{h} \widehat{\operatorname{var}}_{T+j}\;.{} \end{aligned} $$
(33.15)

Even in the case of weak autocorrelations between the returns in a given time series, this result holds to a good approximation.

2.3 Volatility Term Structure

The variance of the total return over a term from T until T + h is a function of this term. The square root of the (annualized) variance of the total return as a function of the term is called the volatility term structure. This plays an important role in pricing options since for an option with a lifetime of h, the volatility associated with this term is the relevant parameter value. From the estimator for the variance of the total return over the pertinent term, we obtain the estimator of the volatility structure as

$$\displaystyle \begin{aligned} \sigma(T,T+h)=\sqrt{\frac{1}{h}\widehat{\operatorname{var}} \left[\sum_{i=1}^{h}X_{T+i}\Big|X_{T},\ldots,X_{1}\right]} =\sqrt{\frac{1}{h}\sum_{j=1} ^{h}\widehat{\operatorname{var}}_{T+j}}\;. {} \end{aligned} $$
(33.16)
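Eqs. 33.15 and 33.16 combine the per-step variance forecasts into the term structure. A small sketch (the helper name is ours), omitting any annualization factor:

```python
import math

def vol_term_structure(var_forecasts):
    """sigma(T, T+h) for h = 1, ..., len(var_forecasts), per Eqs. 33.15
    and 33.16: the conditional variances of the individual steps add up
    (uncorrelated returns), and the term volatility is the square root
    of the average variance over the term."""
    total = 0.0
    sigmas = []
    for h, v in enumerate(var_forecasts, start=1):
        total += v                           # Eq. 33.15: variances add
        sigmas.append(math.sqrt(total / h))  # Eq. 33.16
    return sigmas
```

With constant per-step variance forecasts this yields a flat term structure, as expected.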

3 Volatility Forecasts with GARCH (1,1) Processes

For the GARCH(1, 1) process (q = 1, p = 1), all of the above estimators can be computed explicitly and the recursion equation 33.14 can be carried out. The start value of the recursion is simply

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+1}^{{}}=H_{T+1}=\alpha_{0}+\alpha _{1}X_{T}^{2}+\beta_{1}H_{T} \;.{} \end{aligned} $$
(33.17)

This follows from Eq. 33.5. The two-step forecast as given by Eq. 33.9 simplifies to

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+2}=\alpha_{0}+\kappa H_{T+1} \end{aligned}$$

where we defined the abbreviation

$$\displaystyle \begin{aligned} \kappa:=\alpha_{1}+\beta_{1}\;. \end{aligned}$$

For h > 2 and p = q = 1, the upper limits in the sums over the variance estimators in the general recursion equation 33.14 are simply

$$\displaystyle \begin{aligned} \min(h-2,\ q)=\min(h-2,\ p)=1 \quad \text{for}\; h>2\;. \end{aligned}$$

Neither of the other sums makes any contribution since the lower limit in these sums is greater than the upper limit. The term α h−1H T+1 likewise vanishes: q = 1 implies that only α 0 and α 1 exist, while h − 1 is greater than 1 for h > 2. All things considered, the h-step forecast in Eq. 33.14 reduces to

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+h}=\alpha_{0}+\kappa\,\widehat {\operatorname{var}}_{T+h-1} \end{aligned} $$

where κ = α 1 + β 1. This recursion relation has a closed form expression in the form of a geometric series:

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}_{T+h} & =\alpha_{0}+\kappa\widehat {\operatorname{var}}_{T+h-1}\\ & =\alpha_{0}+\kappa(\alpha_{0}+\kappa\widehat{\operatorname{var}} _{T+h-2})=\alpha_{0}(1+\kappa)+\kappa^{2}\widehat{\operatorname{var}} _{T+h-2}\\ & =\alpha_{0}(1+\kappa)+\kappa^{2}(\alpha_{0}+\kappa\widehat {\operatorname{var}}_{T+h-3})=(\alpha_{0}(1+\kappa+\kappa^{2})+\kappa ^{3}\widehat{\operatorname{var}}_{T+h-3}) \\ & \cdots\\ & =\alpha_{0}\underset{\text{geometric series}}{\underbrace{\sum_{i=1} ^{h-1}\kappa^{i-1}}}+\,\,\kappa^{h-1}\underset{H_{T+1}}{\underbrace {\widehat{\operatorname{var}}_{T+1}}}\\ & =\alpha_{0}\left(\frac{1-\kappa^{h-1}}{1-\kappa}\right) +\kappa ^{h-1}H_{T+1}\\ & =\widetilde{\alpha}_{0}+\kappa^{h-1}\left(H_{T+1}-\widetilde{\alpha} _{0}\right) \\ & =\widetilde{\alpha}_{0}+\kappa^{h-1}\left(\alpha_{0}+\beta_{1} H_{T}+\alpha_{1}X_{T}^{2}-\widetilde{\alpha}_{0}\right)\,,\qquad h>1 \;,{} \end{aligned} $$
(33.18)

where for \(\widehat {\operatorname {var}}_{T+1}\) the start-value of the recursion H T+1 was used and the geometric series was calculated according to Eq. 15.10. Here

$$\displaystyle \begin{aligned} \widetilde{\alpha}_{0}:=\frac{\alpha_{0}}{1-\alpha_{1}-\beta_{1}} \end{aligned}$$

again denotes the unconditional variance from Eq. 32.18. The GARCH(1,1) prediction for the conditional variance after h steps is therefore equal to the unconditional variance plus the difference between the one-step forecast and the unconditional variance, dampened by the factor κ h−1. The stationarity condition requiring that α 1 + β 1 < 1 implies that for h → ∞ (a long prediction period) the GARCH prediction converges towards the unconditional variance.
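The closed form of Eq. 33.18 is easy to code and can be checked numerically against the recursion \(\widehat {\operatorname {var}}_{T+h}=\alpha _{0}+\kappa \,\widehat {\operatorname {var}}_{T+h-1}\); a minimal sketch (the function name is ours):

```python
def garch11_var_forecast(alpha0, alpha1, beta1, H_T1, h):
    """Closed-form h-step conditional variance forecast for a
    GARCH(1, 1) process, Eq. 33.18.  H_T1 is the one-step forecast
    H_{T+1} of Eq. 33.17."""
    kappa = alpha1 + beta1                 # kappa := alpha_1 + beta_1
    var_uncond = alpha0 / (1.0 - kappa)    # unconditional variance, Eq. 32.18
    return var_uncond + kappa ** (h - 1) * (H_T1 - var_uncond)
```

For growing h the forecast converges towards the unconditional variance, as discussed above.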

The variance of the total return \(\sum _{i=1}^{h}X_{T+i}\) over a term of length h, as the sum of the conditional forecasts, is obtained for the GARCH(1, 1) process as indicated in Eq. 33.15:

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}\left[\sum_{i=1}^{h}X_{T+i}\Big|X_{T},\ldots,X_{1}\right] & =\sum_{j=1}^{h}\widehat{\operatorname{var}}_{T+j}=\sum_{j=1}^{h}\left[\widetilde{\alpha}_{0}+\kappa^{j-1}\left(H_{T+1}-\widetilde{\alpha}_{0}\right)\right]\\ & =h\,\widetilde{\alpha}_{0}+\left(\frac{1-\kappa^{h}}{1-\kappa}\right)\left(\alpha_{0}+\beta_{1}H_{T}+\alpha_{1}X_{T}^{2}-\widetilde{\alpha}_{0}\right)\;,{} \end{aligned} $$
(33.19)

where Eq. 15.10 for the geometric series is used again in the last step. The volatility term structure of the GARCH(1, 1) process, obtained from Eq. 33.16, is thus

$$\displaystyle \begin{aligned} \sigma(T,T+h) & =\sqrt{\frac{1}{h}\widehat{\operatorname{var}} [\sum_{i=1}^{h}X_{T+i}|X_{T},\ldots,X_{1}]}\\ & =\sqrt{\widetilde{\alpha}_{0}+\frac{1}{h}\left(\frac{1-\kappa^{h}} {1-\kappa}\right) \left(\alpha_{0}+\beta_{1}H_{T}+\alpha_{1}X_{T} ^{2}-\widetilde{\alpha}_{0}\right)} \;. \end{aligned} $$

For large h, the term under the square root approaches the unconditional variance \(\widetilde {\alpha }_{0}\) like 1/h, so that the volatility term structure converges towards the unconditional volatility \(\sqrt {\widetilde {\alpha }_{0}}\).

4 Volatility Forecasts with Moving Averages

In addition to the relatively modern GARCH models, older methods such as moving averages exist in the market which, despite their obvious shortcomings, are still widely used thanks to their simplicity. Before entering into a discussion of volatility forecasts via moving averages and comparing them with those of the GARCH models, we first introduce the two most important varieties: the simple moving average, abbreviated here as MA, and the exponentially weighted moving average, abbreviated as EWMA.

The (simple) moving average measures the conditional variance (of a time series with zero mean) simply as the sum of evenly weighted squared time series values over a time window of width b. The form for the MA corresponding to Eq. 32.16 is simply

$$\displaystyle \begin{aligned} \operatorname{var}[X_{t}|X_{t-1},\ldots,X_{1}]=\frac{1}{b}\sum _{k=1}^{b}X_{t-k}^{2}\;. \end{aligned}$$

The well-known phantom structures arise from this equation because every swing in the \(X_{t}^{2}\) is felt fully for b periods and then suddenly disappears completely when the term causing the perturbation no longer contributes to the average. An improvement would be to consider weighted sums where time series values further in the past are weighted less than values closer to the present. This can be realized, for example, by the exponentially weighted moving average EWMA. The conditional variance in the EWMA is

$$\displaystyle \begin{aligned} \operatorname{var}[X_{t}|X_{t-1},\ldots,X_{1}]=\frac{\sum_{k=1}^{b}\lambda^{k}X_{t-k}^{2}}{\sum_{k=1}^{b}\lambda^{k}}\;. \end{aligned}$$

For λ < 1 the values lying further back contribute less. The values commonly assigned to the parameter λ lie between 0.8 and 0.98. Naturally, the simple MA can be interpreted as the special case of the EWMA with λ = 1. The conditional variance of the EWMA is very similar to that of a GARCH(1,1) process, since the recursion for H T in Eq. 32.16 can be performed explicitly for p = q = 1 and the conditional variance of the GARCH(1,1) process becomes

$$\displaystyle \begin{aligned} H_{T}=\alpha_{0}\sum_{k=0}^{b-1}\beta_{1}^{k}+\alpha_{1}\sum_{k=1}^{b}\beta_{1}^{k-1}X_{T-k}^{2}+\beta_{1}^{b}H_{T-b}\;. \end{aligned}$$

If we now choose the parameters α 0 = 0, β 1 = λ and \(\alpha _{1}=\lambda (\sum _{j=1}^{b}\lambda ^{j})^{-1}\) then this conditional variance after b steps (apart from remainder term \(\beta _{1}^{b}H_{T-b}\) which contains the influence of factors lying still further in the past) is exactly the same expression as for the EWMA. The difference between the GARCH(1, 1) and EWMA models first appears clearly in variance forecasts over more than one time step.
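Both averages amount to a few lines of code; the following sketch (names are ours) uses the exponential weights λ k normalized to sum to one, so that λ = 1 reproduces the simple MA:

```python
def ma_var(x, b):
    """Simple moving average: equally weighted squares over a window
    of width b (zero-mean series assumed, as in the text)."""
    return sum(v * v for v in x[-b:]) / b

def ewma_var(x, b, lam):
    """Exponentially weighted moving average: weight lam**k for the
    value k steps in the past, normalized by the sum of the weights."""
    weights = [lam ** k for k in range(1, b + 1)]
    weighted = sum(w * x[-k] ** 2 for k, w in enumerate(weights, start=1))
    return weighted / sum(weights)
```

Here x is the observed series up to the current time; both functions estimate the conditional variance of the next value.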

The conditional variances presented above can be interpreted as a one-step forecast for the conditional variance. The forecast over h steps delivers nothing new for the moving averages since both the MA and EWMA are static and fail to take the time structure into consideration. They start with the basic assumption that prices are lognormally distributed with a constant volatility. This implies that

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}[X_{T+h}|X_{T},\ldots,X_{1} ]=\widehat{\operatorname{var}}[X_{T+1}|X_{T},\ldots,X_{1} ]=\operatorname{var}[X_{T+1}|X_{T},\ldots,X_{1}] \end{aligned}$$

holds for the h-step forecast of the MA as well as for the EWMA. As mentioned after Eq. 33.14, it is the conditional variance after h steps that is being forecast, not the variance of the total return over a term of h steps. The prediction for the variance of the total return, as the sum over the conditional one-step forecasts, is for moving averages (MA and EWMA) simply

$$\displaystyle \begin{aligned} \widehat{\operatorname{var}}[\sum_{i=1}^{h}X_{T+i}|X_{T}, \ldots,X_{1}]=\sum_{i=1}^{h}\underset{\widehat{\operatorname{var}} [X_{T+1}|X_{T},\ldots,X_{1}]}{\underbrace{\widehat{\operatorname{var} }[X_{T+i}|X_{T},\ldots,X_{1}]}}=h\,\operatorname{var}[X_{T+1} |X_{T},\ldots,X_{1}]\;. \end{aligned} $$

This is again the famous square root law for the growth of the standard deviation over time. The variance simply increases linearly over time and the standard deviation is therefore proportional to the square root of time. This leads to a static prediction of the volatility, and extrapolating, for example, daily to yearly volatilities in this way can easily result in an overestimation of the volatilities. The volatility term structure for the moving average is then, as expected, a constant:

$$\displaystyle \begin{aligned} \sigma(T,T+h)=\sqrt{\frac{1}{h}\widehat{\operatorname{var}} [\sum_{i=1}^{h}X_{T+i}|X_{T},\ldots,X_{1}]}=\sqrt{\operatorname{var} [X_{T+1}|X_{T},\ldots,X_{1}]}\;. \end{aligned}$$

In the Excel workbook Garch.xlsx, the one-step forecast of a GARCH(1, 1) process, an MA with b = 80 and an EWMA with b = 80 and λ = 0.95 are presented. Furthermore, the ten-step forecast of the GARCH(1, 1) process is shown. Since the time series we are dealing with is a simulated GARCH process, the “true” volatility is known (it is the H t from the simulated series) and a direct comparison can be made with each of the various estimates. As is clearly illustrated in Garch.xlsx, the one-step GARCH(1, 1) forecast (H t with the parameters \(\widehat {\alpha }_{0},\) \(\widehat {\alpha }_{1}\) and \(\widehat {\beta }_{1}\) fitted by simulated annealing) produces estimates which are quite close to the true volatility. The computation of the GARCH volatility term structure is presented in Garch.xlsx as well.