
In this chapter we discuss some of the time series models which have been found useful in the analysis of financial data. These include both discrete-time and continuous-time models, the latter being used widely, following the celebrated work of Black, Merton and Scholes, for the pricing of stock options. The closing price on trading day t, say P t , of a particular stock or stock-price index, typically appears to be non-stationary, while the log asset price, X t : = log(P t ), has sample-paths like those of a random walk with stationary uncorrelated increments, i.e., the differenced log asset price, Z t : = X t − X t−1, known as the log return (or simply return) for day t, has sample-paths resembling those of white noise. Although the sequence Z t appears to be white noise, there is strong evidence to suggest that it is not independent white noise. Much of the analysis of financial time series is devoted to representing and exploiting this dependence, which is not visible in the sample autocorrelation function of {Z t }. The continuous-time analogue of a random walk with independent and identically distributed increments is known as a Lévy process, the most familiar examples of which are the Poisson process and Brownian motion. Lévy processes play a key role in the continuous-time modeling of financial data, both as models for the log asset price itself and as building blocks for more complex models. We give a brief introduction to these processes and some of the continuous-time models constructed from them. Finally we consider the pricing of European stock options using the geometric Brownian motion model for stock prices, a model which, in spite of its limitations, has been found useful in practice.

7.1 Historical Overview

For more than 30 years now, discrete-time models (including stochastic volatility, ARCH, GARCH and their many generalizations) have been developed to reflect the so-called stylized features of financial time series. These properties, which include tail heaviness, asymmetry, volatility clustering and serial dependence without correlation, cannot be captured with traditional linear time series models such as the ARMA models considered earlier in this book. If P t denotes the price of a stock or other financial asset at time t, t ∈ ℤ, then the series of log returns, {Z t : = log P t − log P t−1}, is typically modeled as a stationary time series. An ARMA model for the series {Z t } would have the property that the conditional variance h t of Z t given {Z s , s < t} is independent of t and of {Z s , s < t}. However, even a cursory inspection of most empirical log return series (see e.g., Figure 7.4) strongly suggests that this is not the case in practice. The fundamental idea of the ARCH (autoregressive conditional heteroscedasticity) model (Engle 1982) is to incorporate the sequence {h t } into the model by postulating that

$$\displaystyle{Z_{t} = \sqrt{h_{t}}e_{t},\ \ \mathrm{where}\ \ \{e_{t}\} \sim \mathrm{IID\ N}(0,1)}$$

and h t (known as the volatility) is related to the past values of Z t 2 via a relation of the form,

$$\displaystyle{h_{t} =\alpha _{0} +\sum _{ i=1}^{p}\alpha _{ i}Z_{t-i}^{2},}$$

for some positive integer p, where α 0 > 0 and α i  ≥ 0, i = 1, …, p. The GARCH (generalized ARCH) model of Bollerslev (1986) postulates a more general relation,

$$\displaystyle{h_{t} =\alpha _{0} +\sum _{ i=1}^{p}\alpha _{ i}Z_{t-i}^{2} +\sum _{ i=1}^{q}\beta _{ i}h_{t-i},}$$

with α 0 > 0, α i  ≥ 0, i = 1, …, p, and β i  ≥ 0, i = 1, …, q. These models have been studied intensively since their introduction and a variety of parameter estimation techniques have been developed. They will be discussed in Section 7.2 and some of their extensions in Section 7.3.

An alternative approach to modeling the changing variability of log returns, due to Taylor (1982), is to suppose that \(Z_{t} = \sqrt{h_{t}}e_{t}\), where {e t } ∼ IID(0, 1) and the volatility sequence {h t } is independent of {e t }. (Taylor originally allowed {e t } to be an autoregression, but it is now customary to use the more restrictive definition just given.) A critical difference from the ARCH and GARCH models is the fact that the conditional distribution of h t given {h s , s < t} is independent of {e s , s < t}. A widely used special case of this model is the so-called log-normal stochastic volatility (SV) model in which {e t } ∼ IID N(0, 1), ln h t  = γ 0 + γ 1 ln h t−1 + η t , {η t } ∼ IID N(0, σ 2), and {η t } and {e t } are independent. We shall discuss this model in Section 7.4.

Continuous-time models for financial time series have a long history, going back at least to Bachelier (1900), who used Brownian motion to represent the prices {P(t), t ≥ 0} of a stock in the Paris stock exchange. This model had the unfortunate feature of permitting negative stock prices, a shortcoming which was eliminated in the geometric Brownian motion model of Samuelson (1965), according to which P(t) satisfies an Itô stochastic differential equation of the form,

$$\displaystyle{\mathrm{d}P(t) =\mu P(t)\ \mathrm{d}t +\sigma P(t)\ \mathrm{d}B(t),}$$

where μ ∈ ℝ, σ > 0 and B is standard Brownian motion. For any fixed positive value of P(0) the solution (see Section 7.5.2 and Appendix D.4) is

$$\displaystyle{P(t) = P(0)\exp \left [(\mu -\sigma ^{2}/2)t +\sigma B(t)\right ],\ t \geq 0,}$$

so that the log asset price, X(t): = logP(t), is Brownian motion and the log return over the time-interval (t, t +Δ) is

$$\displaystyle{X(t+\varDelta ) - X(t) = (\mu -{1 \over 2}\sigma ^{2})\varDelta +\sigma (B(t+\varDelta ) - B(t)).}$$

For disjoint intervals of length Δ the log returns are therefore independent normally distributed random variables with mean (μ − σ 2∕2)Δ and variance σ 2 Δ. The normality is a conclusion which can easily be checked against observed log returns, and it is found that although the observed values are approximately normally distributed for intervals Δ greater than 1 day, the deviations from normality are substantial for shorter time intervals. This is one of the reasons for developing the more realistic models described in Section 7.5. The parameter σ 2 is called the volatility parameter of the geometric Brownian motion model and plays a key role in the celebrated option pricing results (see Section 7.6) developed for this model by Black, Scholes and Merton, earning the Nobel Economics Prize for Merton and Scholes in 1997 (unfortunately Black died before the award was made). These results inspired an explosion of interest, not only in the pricing of more complicated financial derivatives, but also in the development of new continuous-time models which, like the discrete-time ARCH, GARCH and stochastic volatility models, better reflect the observed properties of financial time series.
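The distributional claim above is easy to illustrate numerically. The following sketch (not from the text; numpy assumed, with arbitrary illustrative parameter values) generates log returns of a geometric Brownian motion over disjoint intervals and compares their sample mean and variance with (μ − σ 2∕2)Δ and σ 2 Δ:

```python
import numpy as np

# Illustrative check: log returns of geometric Brownian motion over disjoint
# intervals of length Delta are iid N((mu - sigma^2/2)*Delta, sigma^2*Delta).
# Parameter values below are arbitrary (roughly "daily" with Delta = 1/252).
rng = np.random.default_rng(1)
mu, sigma, delta, n = 0.05, 0.2, 1.0 / 252.0, 100_000

# B(t + Delta) - B(t) over disjoint intervals: iid N(0, Delta)
increments = rng.normal(0.0, np.sqrt(delta), n)
log_returns = (mu - 0.5 * sigma**2) * delta + sigma * increments

print(log_returns.mean())  # close to (mu - sigma^2/2)*delta
print(log_returns.var())   # close to sigma^2*delta
```

The same construction, exponentiated and cumulated, gives sample paths of P(t) itself.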

7.2 GARCH Models

For modeling changing volatility as discussed above, Engle (1982) introduced the ARCH( p ) process {Z t } as a stationary solution of the equations

$$\displaystyle{ Z_{t} = \sqrt{h_{t}}e_{t},\{e_{t}\} \sim \mathrm{IID\ N}(0,1), }$$
(7.2.1)

where h t is the (positive) function of {Z s , s < t}, defined by

$$\displaystyle{ h_{t} =\alpha _{0} +\sum _{ i=1}^{p}\alpha _{ i}Z_{t-i}^{2}, }$$
(7.2.2)

with α 0 > 0 and α j  ≥ 0, j = 1, …, p. The name ARCH signifies autoregressive conditional heteroscedasticity and h t is the conditional variance of Z t given {Z s , s < t}.

The simplest such process is the ARCH(1) process. In this case the recursions (7.2.1) and (7.2.2) give

$$\displaystyle\begin{array}{rcl} Z_{t}^{2}& =& \alpha _{ 0}e_{t}^{2} +\alpha _{ 1}Z_{t-1}^{2}e_{ t}^{2} {}\\ & =& \alpha _{0}e_{t}^{2} +\alpha _{ 1}\alpha _{0}e_{t}^{2}e_{ t-1}^{2} +\alpha _{ 1}^{2}Z_{ t-2}^{2}e_{ t}^{2}e_{ t-1}^{2} \phantom{well} {}\\ & =& \ \cdots {}\\ & =& \alpha _{0}\sum _{j=0}^{n}\alpha _{ 1}^{\,j}e_{ t}^{2}e_{ t-1}^{2}\cdots e_{ t-j}^{2} +\alpha _{ 1}^{n+1}Z_{ t-n-1}^{2}e_{ t}^{2}e_{ t-1}^{2}\cdots e_{ t-n}^{2}. {}\\ \end{array}$$

If α 1 < 1 and {Z t } is stationary and causal (i.e., Z t is a function of {e s , s ≤ t}), then the last term has expectation α 1 n+1 EZ t 2, which converges to zero as n → ∞. The first term converges as n → ∞, since it is non-decreasing in n and its expected value is bounded above by α 0∕(1 −α 1). Hence

$$\displaystyle{ Z_{t}^{2} =\alpha _{ 0}\sum _{j=0}^{\infty }\alpha _{ 1}^{\,j}e_{ t}^{2}e_{ t-1}^{2}\cdots e_{ t-j}^{2} }$$
(7.2.3)

and

$$\displaystyle{ EZ_{t}^{2} =\alpha _{ 0}/(1 -\alpha _{1}). }$$
(7.2.4)

Since

$$\displaystyle{ Z_{t} = e_{t}\sqrt{\alpha _{0 } \left (1 +\sum _{ j=1 }^{\infty }\alpha _{1 }^{\,j }e_{t-1 }^{2 }\cdots e_{t-j }^{2 } \right )}, }$$
(7.2.5)

it is clear that {Z t } is strictly stationary and hence, since EZ t 2 < ∞, also stationary in the weak sense. We have now established the following result.

Solution of the ARCH(1) Equations:

If α 1 < 1, the unique causal stationary solution of the ARCH(1) equations is given by (7.2.5). It has the properties

$$\displaystyle\begin{array}{rcl} & & E(Z_{t}) = E(E(Z_{t}\vert e_{s},s < t)) = 0, \phantom{well} {}\\ & & \mathrm{Var}(Z_{t}) =\alpha _{0}/(1 -\alpha _{1}), {}\\ \end{array}$$

and

$$\displaystyle{E(Z_{t+h } Z_{t}) = E(E(Z_{t+h}Z_{t}\vert e_{s},s < t + h)) = 0\mbox{ for }h > 0.}$$

Thus the ARCH(1) process with α 1 < 1 is strictly stationary white noise. However, it is not an iid sequence, since from (7.2.1) and (7.2.2),

$$\displaystyle{E(Z_{t}^{2}\vert Z_{ t-1}) = (\alpha _{0} +\alpha _{1}Z_{t-1}^{2})E(e_{ t}^{2}\vert Z_{ t-1}) =\alpha _{0} +\alpha _{1}Z_{t-1}^{2}.}$$

This also shows that {Z t } is not Gaussian, since strictly stationary Gaussian white noise is necessarily iid. From (7.2.5) it is clear that the distribution of Z t is symmetric, i.e., that Z t and − Z t have the same distribution. From (7.2.3) it is easy to calculate E(Z t 4) (Problem 7.1) and hence to show that E(Z t 4) is finite if and only if 3α 1 2 < 1. More generally (see Engle 1982), it can be shown that for every α 1 in the interval (0, 1), E(Z t 2k ) = ∞ for some positive integer k. This indicates the “heavy-tailed” nature of the marginal distribution of Z t . If EZ t 4 < ∞, the squared process Y t  = Z t 2 has the same ACF as the AR(1) process W t  = α 1 W t−1 + e t , a result that extends also to ARCH(p) processes (see Problem 7.3).
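The moment formula (7.2.4) is easy to check by simulation. The following sketch (numpy assumed; the values α 0 = 1, α 1 = 0.5 are illustrative) iterates the squared recursion Z t 2 = (α 0 + α 1 Z t−1 2)e t 2 directly:

```python
import numpy as np

# Monte Carlo check of EZ_t^2 = alpha0/(1 - alpha1) for a causal ARCH(1).
# Parameter values are illustrative; alpha1 = 0.5 also satisfies 3*alpha1^2 < 1,
# so the fourth moment is finite and the sample mean settles down quickly.
rng = np.random.default_rng(2)
alpha0, alpha1, n = 1.0, 0.5, 200_000

e2 = rng.standard_normal(n) ** 2
z2 = np.empty(n)
z2[0] = alpha0 / (1 - alpha1)            # start at the stationary mean of Z_t^2
for t in range(1, n):
    z2[t] = (alpha0 + alpha1 * z2[t - 1]) * e2[t]

print(z2.mean())   # close to alpha0/(1 - alpha1) = 2.0
```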

The ARCH(p) process is conditionally Gaussian, in the sense that for given values of {Z s , s = t − 1, t − 2, …, t − p}, Z t is Gaussian with known distribution. This makes it easy to write down the likelihood of Z p+1, …, Z n conditional on {Z 1, …, Z p } and hence, by numerical maximization, to compute conditional maximum likelihood estimates of the parameters. For example, the conditional likelihood of observations {z 2, …, z n } of the ARCH(1) process given Z 1 = z 1 is

$$\displaystyle{L =\prod _{ t=2 }^{n }{ 1 \over \sqrt{2\pi \left (\alpha _{0 } +\alpha _{1 } z_{t-1 }^{\,2 } \right )}}\exp \left \{-{ z_{t}^{\,2} \over 2{\bigl (\alpha _{0} +\alpha _{1}z_{t-1}^{\,2}\bigr )}}\right \}.}$$
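Maximizing this conditional likelihood numerically is straightforward. The sketch below (numpy and scipy assumed; the “true” values α 0 = 1, α 1 = 0.5 are hypothetical) simulates an ARCH(1) series and minimizes −2 ln L:

```python
import numpy as np
from scipy.optimize import minimize

def arch1_neg2loglik(params, z):
    """-2 ln L of the ARCH(1) conditional likelihood above, given Z_1 = z[0]."""
    alpha0, alpha1 = params
    h = alpha0 + alpha1 * z[:-1] ** 2        # h_t for t = 2, ..., n
    return np.sum(np.log(2 * np.pi * h) + z[1:] ** 2 / h)

# Simulate an ARCH(1) series with known parameters, then recover them.
rng = np.random.default_rng(3)
a0_true, a1_true, n = 1.0, 0.5, 5000
z = np.empty(n)
z[0] = np.sqrt(a0_true / (1 - a1_true)) * rng.standard_normal()
for t in range(1, n):
    z[t] = np.sqrt(a0_true + a1_true * z[t - 1] ** 2) * rng.standard_normal()

res = minimize(arch1_neg2loglik, x0=[0.5, 0.2], args=(z,),
               bounds=[(1e-6, None), (0.0, 0.999)])
print(res.x)   # estimates close to (1.0, 0.5)
```

The bounds keep α 0 > 0 and 0 ≤ α 1 < 1, the constraints under which the stationary solution exists.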

Example 7.2.1

An ARCH(1) Series Figure 7.1 shows a realization of the ARCH(1) process with α 0 = 1 and α 1 = 0.5. The graph of the realization and the sample autocorrelation function shown in Figure 7.2 suggest that the process is white noise. This conclusion is correct from a second-order point of view.

Fig. 7.1
figure 1

A realization of the process \(Z_{t} = e_{t}\sqrt{1 + 0.5Z_{t-1 }^{2}}\)

Fig. 7.2
figure 2

The sample autocorrelation function of the series in Figure 7.1

However, the fact that the series is not a realization of iid noise is very strongly indicated by Figure 7.3, which shows the sample autocorrelation function of the series {Z t 2}. (The sample ACF of { | Z t  | } and that of {Z t 2} can be plotted in ITSM by selecting Statistics>Residual Analysis>ACF abs values/Squares.)

Fig. 7.3
figure 3

The sample autocorrelation function of the squares of the data shown in Figure 7.1

It is instructive to apply the Ljung–Box and McLeod–Li portmanteau tests for white noise to this series (see Section 1.6). To do this using ITSM, open the file ARCH.TSM, and then select Statistics>Residual Analysis>Tests of Randomness. We find (with h = 20) that the Ljung–Box test (and all the others except for the McLeod–Li test) are passed comfortably at level 0.05. However, the McLeod–Li test gives a p-value of 0 to five decimal places, clearly rejecting the hypothesis that the series is iid. □ 
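The contrast between the two tests can also be reproduced outside ITSM. The following sketch (numpy/scipy assumed) computes the Ljung–Box statistic for a simulated ARCH(1) series, applied once to Z t and once to Z t 2 (the latter is the McLeod–Li test):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^h rho(k)^2/(n-k), with its chi^2(h) p-value."""
    n = len(x)
    xc = x - x.mean()
    rho = np.array([np.dot(xc[k:], xc[:-k]) for k in range(1, h + 1)]) / np.dot(xc, xc)
    q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))
    return q, chi2.sf(q, h)

# Simulate an ARCH(1) series with alpha0 = 1, alpha1 = 0.5, as in Example 7.2.1.
rng = np.random.default_rng(4)
n = 2000
z = np.empty(n)
z[0] = np.sqrt(2.0) * rng.standard_normal()
for t in range(1, n):
    z[t] = np.sqrt(1.0 + 0.5 * z[t - 1] ** 2) * rng.standard_normal()

_, p_lb = ljung_box(z, 20)       # Ljung-Box on Z_t: typically not rejected
_, p_ml = ljung_box(z ** 2, 20)  # McLeod-Li (Ljung-Box on Z_t^2): rejected
print(p_lb, p_ml)
```

The p-value for the squares is essentially zero, mirroring the behavior reported for the ARCH.TSM series.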

The GARCH( p, q ) process (see Bollerslev 1986) is a generalization of the ARCH(p) process in which the variance equation (7.2.2) is replaced by

$$\displaystyle{ h_{t} =\alpha _{0} +\sum _{ i=1}^{p}\alpha _{ i}Z_{t-i}^{2} +\sum _{ j=1}^{q}\beta _{ j}h_{t-j}, }$$
(7.2.6)

with α 0 > 0 and α j , β j  ≥ 0, j = 1, 2, ….
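To see how (7.2.6) produces the volatility clustering visible in financial data, here is a minimal GARCH(1,1) simulation (numpy assumed; the parameter values are illustrative and satisfy α 1 + β 1 < 1):

```python
import numpy as np

def simulate_garch11(alpha0, alpha1, beta1, n, burn=500, seed=0):
    """Simulate Z_t = sqrt(h_t) e_t with h_t = alpha0 + alpha1*Z_{t-1}^2 + beta1*h_{t-1}."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + burn)
    h = np.empty(n + burn)
    z = np.empty(n + burn)
    h[0] = alpha0 / (1.0 - alpha1 - beta1)   # start at the stationary variance
    z[0] = np.sqrt(h[0]) * e[0]
    for t in range(1, n + burn):
        h[t] = alpha0 + alpha1 * z[t - 1] ** 2 + beta1 * h[t - 1]
        z[t] = np.sqrt(h[t]) * e[t]
    return z[burn:], h[burn:]                # discard the burn-in segment

z, h = simulate_garch11(alpha0=0.1, alpha1=0.1, beta1=0.8, n=20_000)
print(z.var())   # near the stationary variance alpha0/(1 - alpha1 - beta1) = 1.0
```

Plotting z reveals the sustained quiet and turbulent stretches characteristic of series such as Figure 7.4.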

In the analysis of empirical financial data such as percentage daily stock returns (defined as 100 ln(P t ∕P t−1), where P t is the closing price on trading day t), it is usually found that better fits to the data are obtained by relaxing the Gaussian assumption in (7.2.1) and supposing instead that the distribution of Z t given {Z s , s < t} has a heavier-tailed zero-mean distribution such as Student’s t-distribution. To incorporate such distributions we can define a general GARCH(p, q) process as a stationary process {Z t } satisfying (7.2.6) and the generalized form of (7.2.1),

$$\displaystyle{ Z_{t} = \sqrt{h_{t}}e_{t},\quad \{e_{t}\} \sim \mathrm{IID}(0,1). }$$
(7.2.7)

For modeling purposes it is usually assumed in addition that either

$$\displaystyle{ e_{t} \sim N(0,1), }$$
(7.2.8)

(as in (7.2.1)) or that

$$\displaystyle{ \sqrt{ \frac{\nu } {\nu -2}}e_{t} \sim t_{\nu },\quad \nu > 2, }$$
(7.2.9)

where t ν denotes Student’s t-distribution with ν degrees of freedom. (The scale factor on the left of (7.2.9) is introduced to make the variance of e t equal to 1.) Other distributions for e t can also be used.

One of the striking features of stock return data that is reflected by GARCH models is the “persistence of volatility,” or the phenomenon that large (small) fluctuations in the data tend to be followed by fluctuations of comparable magnitude. GARCH models reflect this by incorporating correlation in the sequence {h t } of conditional variances.

Example 7.2.2

Fitting GARCH Models to Stock Data

The top graph in Figure 7.4 shows the percentage daily returns of the Dow Jones Industrial Index for the period July 1st, 1997, through April 9th, 1999, contained in the file E1032.TSM. The graph suggests that there are sustained periods of both high volatility (in October, 1997, and August, 1998) and of low volatility. The sample autocorrelation function of this series, like that in Example 7.2.1, has very small values; however, the sample autocorrelations of the absolute values and squares of the data (like those in Example 7.2.1) are significantly different from zero, indicating dependence in spite of the lack of autocorrelation. (The sample autocorrelations of the absolute values and squares of the residuals (or of the data if no transformations have been made and no model fitted) can be seen by clicking on the third green button at the top of the ITSM window.) These properties suggest that an ARCH or GARCH model might be appropriate for this series. □ 

Fig. 7.4
figure 4

The daily percentage returns of the Dow Jones Industrial Index (E1032.TSM) from July 1, 1997, through April 9, 1999 (above), and the estimates of \(\sigma _{t} = \sqrt{h_{t}}\) for the conditional Gaussian GARCH(1,1) model of Example 7.2.2

The model

$$\displaystyle{ Y _{t} = a + Z_{t}, }$$
(7.2.10)

where {Z t } is the GARCH(p, q) process defined by (7.2.6)–(7.2.8), can be fitted using ITSM as follows. Open the project E1032.TSM and click on the red button labeled GAR at the top of the ITSM screen. In the resulting dialog box enter the desired values of p and q, e.g., 1 and 1 if you wish to fit a GARCH(1,1) model. You may also enter initial values for the coefficients α 0, …, α p , and β 1, …, β q , or alternatively use the default values specified by the program. Make sure that Use normal noise is selected, click on OK and then click on the red MLE button. You will be advised to subtract the sample mean (unless you wish to assume that the parameter a in (7.2.10) is zero). If you subtract the sample mean it will be used as the estimate of a in the model (7.2.10). The GARCH Maximum Likelihood Estimation box will then open. When you click on OK the optimization will proceed. Denoting by \(\{\tilde{Z}_{t}\}\) the (possibly) mean-corrected observations, the GARCH coefficients are estimated by numerically maximizing the likelihood of \(\tilde{Z}_{p+1},\ldots,\tilde{Z}_{n}\) conditional on the known values \(\tilde{Z}_{1},\ldots,\tilde{Z}_{p}\), and with assumed values 0 for each \(\tilde{Z}_{t}\), t ≤ 0, and \(\hat{\sigma }^{2}\) for each h t , t ≤ 0, where \(\hat{\sigma }^{2}\) is the sample variance of \(\{\tilde{Z}_{1},\ldots,\tilde{Z}_{n}\}\). In other words the program maximizes

$$\displaystyle{ L(\alpha _{0},\ldots,\alpha _{p},\beta _{1},\ldots,\beta _{q}) =\prod _{ t=p+1}^{n}\frac{1} {\sigma _{t}} \phi \bigg(\frac{\tilde{Z}_{t}} {\sigma _{t}} \bigg), }$$
(7.2.11)

with respect to the coefficients α 0, …, α p and β 1, …, β q , where ϕ denotes the standard normal density, and the standard deviations \(\sigma _{t} = \sqrt{h_{t}},t \geq 1\), are computed recursively from (7.2.6) with Z t replaced by \(\tilde{Z}_{t}\), and with \(\tilde{Z}_{t} = 0\) and \(h_{t} = \hat{\sigma }^{2}\) for t ≤ 0. To find the minimum of − 2ln(L) it is advisable to repeat the optimization by clicking on the red MLE button and then on OK several times until the result stabilizes. It is also useful to try other initial values for α 0, …, α p , and β 1, …, β q , to minimize the chance of finding only a local minimum of − 2ln(L). Note that the optimization is constrained so that the estimated parameters are all non-negative with

$$\displaystyle{ \hat{\alpha }_{1} + \cdots + \hat{\alpha }_{p} + \hat{\beta }_{1} + \cdots + \hat{\beta }_{q} < 1, }$$
(7.2.12)

and \(\hat{\alpha }_{0} > 0\). Condition (7.2.12) is necessary and sufficient for the corresponding GARCH equations to have a causal weakly stationary solution.
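The recursion behind (7.2.11) is easy to express directly. The sketch below (numpy assumed; GARCH(1,1) case, so p = q = 1) computes −2 ln L with exactly the initialization described above, namely \(\tilde{Z}_{t} = 0\) and h t equal to the sample variance for t ≤ 0:

```python
import numpy as np

def garch11_neg2loglik(alpha0, alpha1, beta1, z):
    """-2 ln L of (7.2.11) for a GARCH(1,1) fitted to mean-corrected data z,
    with Z~_t = 0 and h_t = sample variance for t <= 0, as described above."""
    n = len(z)
    h = np.empty(n)
    h[0] = alpha0 + beta1 * z.var()          # h_1 uses Z~_0 = 0 and h_0 = sigma-hat^2
    for t in range(1, n):
        h[t] = alpha0 + alpha1 * z[t - 1] ** 2 + beta1 * h[t - 1]
    # the product in (7.2.11) runs over t = p+1, ..., n; here p = 1
    return np.sum(np.log(2 * np.pi * h[1:]) + z[1:] ** 2 / h[1:])

# Quick check on a simulated GARCH(1,1) series with known parameters:
rng = np.random.default_rng(6)
n = 5000
z = np.empty(n)
hv = 1.0
z[0] = rng.standard_normal()
for t in range(1, n):
    hv = 0.1 + 0.1 * z[t - 1] ** 2 + 0.8 * hv
    z[t] = np.sqrt(hv) * rng.standard_normal()

val_true = garch11_neg2loglik(0.1, 0.1, 0.8, z)
val_off  = garch11_neg2loglik(1.0, 0.7, 0.1, z)
print(val_true < val_off)   # the true parameters should fit better
```

Passing this function to a constrained optimizer (non-negative coefficients with α 1 + β 1 < 1, as in (7.2.12)) reproduces the kind of conditional maximum likelihood estimation that ITSM performs.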

Comparison of models with different orders p and q can be made with the aid of the AICC, which is defined in terms of the conditional likelihood L as

$$\displaystyle{ \mathrm{AICC}:= -2 \frac{n} {n - p}\mathrm{ln}L + 2(p + q + 2)n/(n - p - q - 3). }$$
(7.2.13)

The factor n∕(n − p) multiplying the first term on the right has been introduced to correct for the fact that the number of factors in (7.2.11) is only n − p. Notice also that the GARCH(p, q) model has p + q + 1 coefficients.
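For reference, (7.2.13) translates directly into a small helper (the inputs are the value of −2 ln L from the conditional likelihood (7.2.11), the sample size and the orders):

```python
def garch_aicc(neg2lnL, n, p, q):
    """AICC of (7.2.13), given the value of -2 ln L for the conditional likelihood."""
    return (n / (n - p)) * neg2lnL + 2 * (p + q + 2) * n / (n - p - q - 3)

print(round(garch_aicc(100.0, 500, 1, 1), 4))   # → 108.2812
```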

The estimated mean is \(\hat{a} = 0.0608\) and the minimum-AICC GARCH model (with Gaussian noise) for the residuals, \(\tilde{Z}_{t} = Y _{t} -\hat{a}\), is found to be the GARCH(1,1) with estimated parameter values

$$\displaystyle{\hat{\alpha }_{0} = 0.1300,\hat{\alpha }_{1} = 0.1266,\hat{\beta }_{1} = 0.7922,}$$

and an AICC value [defined by (7.2.13)] of 1469.02. The bottom graph in Figure 7.4 shows the corresponding estimated conditional standard deviations, \(\hat{\sigma }_{t}\), which clearly reflect the changing volatility of the series {Y t }. This graph is obtained from ITSM by clicking on the red SV (stochastic volatility) button. Under the model defined by (7.2.6)–(7.2.8) and (7.2.10), the GARCH residuals, \(\big\{\tilde{Z}_{t}/\hat{\sigma }_{t}\big\}\), should be approximately IID N(0,1). A check on the independence is provided by the sample ACF of the absolute values and squares of the residuals, which is obtained by clicking on the fifth red button at the top of the ITSM window. These are found to be not significantly different from zero. To check for normality, select Garch>Garch residuals>QQ-Plot(normal). If the model is appropriate the resulting graph should approximate a straight line through the origin with slope 1. It is found that the deviations from the expected line are quite large for large values of \(\big\vert \tilde{Z}_{t}\big\vert \), suggesting the need for a heavier-tailed model, e.g., a model with conditional t-distribution as defined by (7.2.9).

To fit the GARCH model defined by (7.2.6), (7.2.7), (7.2.9) and (7.2.10) (i.e., with conditional t-distribution), we proceed in the same way, but with the conditional likelihood replaced by

$$\displaystyle{ L(\alpha _{0},\ldots,\alpha _{p},\beta _{1},\ldots,\beta _{q},\nu ) =\prod _{ t=p+1}^{n} \frac{\sqrt{\nu }} {\sigma _{t}\sqrt{\nu -2}}t_{\nu }\left ( \frac{\tilde{Z}_{t}\sqrt{\nu }} {\sigma _{t}\sqrt{\nu -2}}\right ). }$$
(7.2.14)

Maximization is now carried out with respect to the coefficients, α 0, , α p , β 1, , β q and the degrees of freedom ν of the t-density, t ν . The optimization can be performed using ITSM in exactly the same way as described for the GARCH model with Gaussian noise, except that the option Use t-distribution for noise should be checked in each of the dialog boxes where it appears. In order to locate the minimum of − 2ln(L) it is often useful to initialize the coefficients of the model by first fitting a GARCH model with Gaussian noise and then carrying out the optimization using t-distributed noise.

The estimated mean is \(\hat{a} = 0.0608\) as before and the minimum-AICC GARCH model for the residuals, \(\tilde{Z}_{t} = Y _{t} -\hat{a}\), is the GARCH(1,1) with estimated parameter values

$$\displaystyle{\hat{\alpha }_{0 }= 0.1324,\quad \hat{\alpha }_{1} = 0.0672,\quad \hat{\beta }_{1} = 0.8400,\quad \hat{\nu } = 5.714,}$$

and an AICC value (as in (7.2.13) with q replaced by q + 1) of 1437.89. Thus from the point of view of AICC, the model with conditional t-distribution is substantially better than the conditional Gaussian model. The sample ACF of the absolute values and squares of the GARCH residuals are much the same as those found using Gaussian noise, but the qq plot (obtained by clicking on the red QQ button and based on the t-distribution with 5.714 degrees of freedom) is closer to the expected line than was the case for the model with Gaussian noise.

There are many important and interesting theoretical questions associated with the existence and properties of stationary solutions of the GARCH equations and their moments and of the sampling properties of these processes. As indicated above, in maximizing the conditional likelihood, ITSM constrains the GARCH coefficients to be non-negative and to satisfy the condition (7.2.12) with \(\hat{\alpha }_{0} > 0\). These conditions are sufficient for the process defined by the GARCH equations to be stationary. It is frequently found in practice that the estimated values of α 1, …, α p and β 1, …, β q have a sum which is very close to 1. A GARCH(p, q) model with α 1 + ⋯ + α p + β 1 + ⋯ + β q  = 1 is called I-GARCH (or integrated GARCH). Many generalizations of GARCH processes (ARCH-M, E-GARCH, I-GARCH, T-GARCH, FI-GARCH, etc., as well as ARMA models driven by GARCH noise, and regression models with GARCH errors) can now be found in the econometrics literature; see Andersen et al. (2009).

ITSM can be used to fit ARMA and regression models with GARCH noise by using the procedures described in Example 7.2.2 to fit a GARCH model to the residuals \(\{\tilde{Z}_{t}\}\) from the ARMA (or regression) fit.

Example 7.2.3

Fitting ARMA Models Driven by GARCH Noise

If we open the data file SUNSPOTS.TSM, subtract the mean and use the option Model>Estimation>Autofit with the default ranges for p and q, we obtain an ARMA(3,4) model for the mean-corrected data. Clicking on the second green button at the top of the ITSM window, we see that the sample ACF of the ARMA residuals is compatible with iid noise. However the sample autocorrelation functions of the absolute values and squares of the residuals (obtained by clicking on the third green button) indicate that they are not independent. To fit a Gaussian GARCH(1,1) model to the ARMA residuals click on the red GAR button, enter the value 1 for both p and q and click OK. Then click on the red MLE button, click OK in the dialog box, and the GARCH ML Estimates window will open, showing the estimated parameter values. Repeat the steps in the previous sentence two more times and the window will display the following ARMA(3,4) model for the mean-corrected sunspot data and the fitted GARCH model for the ARMA noise process {Z t },

$$\displaystyle\begin{array}{rcl} X_{t}& =& 2.463X_{t-1} - 2.248X_{t-2} + 0.757X_{t-3} + Z_{t} - 0.948Z_{t-1} \phantom{well} {}\\ & & \qquad - 0.296Z_{t-2} + 0.313Z_{t-3} + 0.136Z_{t-4}, {}\\ \end{array}$$

where

$$\displaystyle{Z_{t} = \sqrt{h_{t}}e_{t}}$$

and

$$\displaystyle{h_{t} = 31.152 + 0.223Z_{t-1}^{2} + 0.596h_{ t-1}.}$$

The AICC value for the GARCH fit (805.12) should be used for comparing alternative GARCH models for the ARMA residuals. The AICC value adjusted for the ARMA fit (821.70) should be used for comparison with alternative ARMA models (with or without GARCH noise). Standard errors of the estimated coefficients are also displayed.

Simulation using the fitted ARMA(3,4) model with GARCH (1,1) noise can be carried out by selecting Garch>Simulate Garch process. If you retain the settings in the ARMA Simulation dialog box and click OK you will see a simulated realization of the model for the original data in SUNSPOTS.TSM. □ 

Some useful references for extensions and further properties of GARCH models are Weiss (1986), Engle (1995), Shephard (1996), Gourieroux (1997), Lindner (2009) and Francq and Zakoian (2010).

7.3 Modified GARCH Processes

The following are so-called “stylized features” associated with observed time series of financial returns:

  (i) the marginal distributions have heavy tails,

  (ii) there is persistence of volatility,

  (iii) the returns exhibit aggregational Gaussianity,

  (iv) there is asymmetry with respect to negative and positive disturbances, and

  (v) the volatility frequently exhibits long-range dependence.

The properties (i), (ii) and (iii) are well accounted for by the GARCH models of Section 7.2. Property (iii) means that the sum, S n  = ∑ t=1 n Z t , of the daily returns, Z t  = ln P t − ln P t−1, is approximately normally distributed if n is large. For the GARCH model with EZ t 2 = σ 2 < ∞ it follows from the martingale central limit theorem (see e.g., Billingsley 1995) that n −1∕2(ln P n − ln P 0) = n −1∕2 ∑ t=1 n Z t is asymptotically N(0, σ 2), in accordance with (iii).

To account for properties (iv) and (v) the EGARCH and FIGARCH models were devised.

7.3.1 EGARCH Models

To allow negative and positive values of e t in the definition of the GARCH process to have different impacts on the subsequent volatilities, h s ,  (s > t), Nelson (1991) introduced EGARCH models, illustrated in the following simple example.

Example 7.3.1

EGARCH(1,1)

Consider the process {Z t } defined by the equations,

$$\displaystyle{ Z_{t} = \sqrt{h_{t}}e_{t},\ \ \{e_{t}\} \sim \mathrm{IID}(0,1), }$$
(7.3.1)

where {ℓ t : = ln h t } is the weakly and strictly stationary solution of

$$\displaystyle{ \ell_{t} = c +\alpha _{1}g(e_{t-1}) +\gamma _{1}\ell_{t-1}, }$$
(7.3.2)

c ∈ ℝ, α 1 ∈ ℝ, | γ 1 |  < 1,

$$\displaystyle{ g(e_{t}) = e_{t} +\lambda (\vert e_{t}\vert - E\vert e_{t}\vert ), }$$
(7.3.3)

and e t has a distribution symmetric about zero, i.e., e t and − e t have the same distribution.

The process is defined in terms of ℓ t to ensure that \(h_{t}(= e^{\ell_{t}}) > 0\). Equation (7.3.3) can be rewritten as

$$\displaystyle{g(e_{t}) = \left \{\begin{array}{@{}l@{\quad }l@{}} (1+\lambda )e_{t} -\lambda E\vert e_{t}\vert \quad &\mathrm{if}\ e_{t} \geq 0, \\ (1-\lambda )e_{t} -\lambda E\vert e_{t}\vert \quad &\mathrm{if}\ e_{t} < 0. \end{array} \right.}$$

showing that the function g is piecewise linear, with slope 1 + λ on (0, ∞) and slope 1 − λ on (−∞, 0). This asymmetry in g allows ℓ t to respond differently to positive and negative shocks e t−1 of the same magnitude. If λ = 0 there is no asymmetry.
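The asymmetry is easy to see numerically. In this sketch (numpy assumed) e t is standard normal, so E | e t  | = √(2∕π), and λ = −0.3 is an illustrative value with the sign typically estimated for stock returns:

```python
import numpy as np

lam = -0.3                      # illustrative; negative, as typically estimated
E_abs_e = np.sqrt(2.0 / np.pi)  # E|e_t| for standard normal e_t

def g(e):
    """News-impact function of (7.3.3)."""
    return e + lam * (np.abs(e) - E_abs_e)

# Equal-magnitude shocks of opposite sign have unequal impact on ln h_t:
print(g(1.0), g(-1.0))   # |g(-1)| > |g(1)| when lam < 0

# and E g(e_t) = 0, checked by Monte Carlo:
rng = np.random.default_rng(7)
print(g(rng.standard_normal(200_000)).mean())   # near 0
```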

When fitting EGARCH models to stock prices it is usually found that the estimated value of λ is negative, corresponding to large negative shocks having greater impact on volatility than positive ones of the same magnitude.

Properties of {g(e t )}: (i) {g(e t )} is iid.

(ii) Eg(e t ) = 0.

(iii) Var(g(e t )) = 1 +λ 2Var( | e t  | ).

(The symmetry of e t implies that e t and | e t  | − E | e t  | are uncorrelated.)

 □ 

More generally, the EGARCH(p,q) process is obtained by replacing the equation (7.3.2) for ℓ t : = ln h t by

$$\displaystyle{ \ell_{t} = c +\alpha (B)g(e_{t}) +\gamma (B)\ell_{t}, }$$
(7.3.4)

where

$$\displaystyle{\alpha (B) =\sum _{ i=1}^{p}\alpha _{ i}B^{i},\ \ \gamma (B) =\sum _{ i=1}^{q}\gamma _{ i}B^{i}.}$$

Clearly {ℓ t }, {h t } and {Z t } are all strictly stationary and causal if 1 −γ(z) is non-zero for all complex z such that | z | ≤ 1.

Nelson also proposed the use of the generalized error distribution (GED) for e t , with density

$$\displaystyle{f(x) ={ \nu \exp [(-1/2)\vert x/\xi \vert ^{\nu }] \over \xi \cdot 2^{1+1/\nu }\varGamma (1/\nu )},}$$

where

$$\displaystyle{\xi = \left \{{2^{(-2/\nu )}\varGamma (1/\nu ) \over \varGamma (3/\nu )} \right \}^{1/2}}$$

and ν > 0. The value of ξ ensures that Var(e t ) = 1 and the parameter ν determines the tail heaviness. For ν = 2, e t  ∼ N(0, 1). Tail heaviness increases as ν decreases.

Properties of the GED: (i) f is symmetric and \(\frac{1} {2}\vert e_{t}/\xi \vert ^{\nu }\) has the gamma distribution with parameters 1∕ν and 1 (see Appendix A.1, Example (d)).

(ii) The specified value of ξ ensures that Var(e t ) = 1.

(iii) \(E\vert e_{t}\vert ^{k} ={ \varGamma ((k+1)/\nu ) \over \varGamma (1/\nu )} \cdot \left [{\varGamma (1/\nu ) \over \varGamma (3/\nu )}\right ]^{k/2}.\)
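Property (i) gives a direct way to simulate GED noise (a sketch using numpy/scipy; ν = 1.5 is an illustrative heavy-tailed choice): draw G ∼ Gamma(1∕ν, 1), set | e t  | = ξ(2G) 1∕ν , and attach a random sign.

```python
import numpy as np
from scipy.special import gamma as Gamma

def ged_sample(nu, size, rng):
    """Sample GED noise with Var(e_t) = 1, using (1/2)|e/xi|^nu ~ Gamma(1/nu, 1)."""
    xi = np.sqrt(2.0 ** (-2.0 / nu) * Gamma(1.0 / nu) / Gamma(3.0 / nu))
    G = rng.gamma(shape=1.0 / nu, scale=1.0, size=size)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * xi * (2.0 * G) ** (1.0 / nu)

rng = np.random.default_rng(8)
e = ged_sample(nu=1.5, size=200_000, rng=rng)
print(e.mean(), e.var())   # near 0 and 1, confirming property (ii)
```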

Inference via Conditional Maximum Likelihood

As in Section 7.2 we initialize the recursions (7.3.1) and (7.3.4) by supposing that

  (i) \(h_{t} =\hat{\sigma } ^{2},\ \ t \leq 0,\)

  (ii) e t  = 0,   t ≤ 0. 

Then \(h_{1},\ e_{1}\ (= Z_{1}/\sqrt{h_{1}}),\ h_{2},\ e_{2},\ldots,\) can be computed recursively from the observations Z 1, Z 2, …, and the recursions defining the process.

The conditional likelihood is then computed as

$$\displaystyle{L =\prod _{ t=1}^{n}{ 1 \over \sqrt{h_{t}}}f\left ({ Z_{t} \over \sqrt{h_{t}}}\right ).}$$

We therefore need to minimize

$$\displaystyle{-2\ln L =\sum _{ t=1}^{n}\ln h_{ t} +\sum _{ t=1}^{n}\left \vert { Z_{t} \over \xi \sqrt{h_{t}}}\right \vert ^{\nu } + 2n\ln \left ({2\xi \over \nu } \cdot 2^{1/\nu }\varGamma (1/\nu )\right )}$$

with respect to

$$\displaystyle{c,\lambda,\nu,\alpha _{1},\ldots,\alpha _{p},\gamma _{1},\ldots,\gamma _{q}.}$$

Since h t is automatically positive, the only constraints in this optimization are the conditions

$$\displaystyle{\nu > 0}$$

and

$$\displaystyle{1 -\gamma (z)\neq 0\ \ \mathrm{for\ all\ complex}\ z\ \mathrm{such\ that}\ \vert z\vert \leq 1.}$$

7.3.2 FIGARCH and IGARCH Models

To allow for the very slow decay of the sample ACF frequently observed in long daily squared return series, the FIGARCH (fractionally integrated GARCH) models were developed. Before introducing them we first give a very brief account of fractionally integrated ARMA processes. (For more details see Section 11.4 and Brockwell and Davis (1991), Section 13.2.)

Fractionally Integrated ARMA Processes and “Long Memory”

The autocorrelation function ρ(⋅ ) of an ARMA process at lag h converges rapidly to zero as h → ∞ in the sense that there exists r > 1 such that

$$\displaystyle{r^{h}\rho (h) \rightarrow 0,\ \ \mathrm{as}\ h \rightarrow \infty.}$$

The fractionally integrated ARMA (or ARFIMA) process of order (p, d, q), where p and q are non-negative integers and 0 < d < 0.5, is a stationary time series with an autocorrelation function which for large lags decays at a much slower rate. It is defined to be the zero-mean stationary solution {X t } of the difference equations

$$\displaystyle{ (1 - B)^{d}\phi (B)X_{ t} =\theta (B)Z_{t}, }$$
(7.3.5)

where ϕ(z) and θ(z) are polynomials of degrees p and q respectively, with no common zeroes, satisfying

$$\displaystyle{\phi (z)\neq 0\ \ \mathrm{and}\ \ \theta (z)\neq 0\ \ \ \ \mbox{ for all complex $z$ such that $\vert z\vert \leq 1$},}$$

{Z t } ∼ WN(0, σ 2), B is the backward shift operator, and \((1 - B)^{r}\) is defined via the power series expansion,

$$\displaystyle{(1 - z)^{r}:= 1 +\sum _{ j=1}^{\infty }{r(r - 1)\ldots (r - j + 1) \over j!} (-z)^{j},\ \vert z\vert < 1,\ r \in \mathbb{R}.}$$

The zero-mean stationary process {X t } defined by (7.3.5) has the mean-square convergent MA(∞) representation,

$$\displaystyle{X_{t} =\sum _{ j=0}^{\infty }\psi _{ j}Z_{t-j},}$$

where ψ j is the coefficient of z j in the power series expansion,

$$\displaystyle{\psi (z) = (1 - z)^{-d}\theta (z)/\phi (z),\ \vert z\vert < 1.}$$

The autocorrelations ρ(j) of {X t } at lag j and the coefficients ψ j both converge to zero at hyperbolic rates as j → ∞; specifically, there exist non-zero constants γ and δ such that

$$\displaystyle{j^{1-d}\psi _{ j} \rightarrow \gamma \ \mathrm{and}\ j^{1-2d}\rho (j) \rightarrow \delta.}$$

Thus ψ j and ρ(j) converge to zero as j → ∞ at much slower rates than the corresponding coefficients and autocorrelations of an ARMA process. Consequently fractionally integrated ARMA processes are said to have “long memory”. The spectral density of {X t } is given by

$$\displaystyle{f(\lambda ) ={ \sigma ^{2} \over 2\pi }{ \vert \theta (e^{-i\lambda })\vert ^{2} \over \vert \phi (e^{-i\lambda })\vert ^{2}}\vert 1 - e^{-i\lambda }\vert ^{-2d}.}$$
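The hyperbolic decay of the coefficients ψ j can be checked numerically. The sketch below (an illustration, not part of the text) specializes to the ARFIMA(0, d, 0) case, where ψ(z) = (1 − z)^{−d}, computes the coefficients by the recursion ψ j = ψ j−1 (j − 1 + d)∕j, and compares j^{1−d} ψ j with its known limit 1∕Γ(d).

```python
import math

def arfima_psi(d, n):
    """Coefficients psi_0, ..., psi_n of psi(z) = (1 - z)^(-d), the
    MA(infinity) representation of an ARFIMA(0, d, 0) process."""
    psi = [1.0]
    for j in range(1, n + 1):
        psi.append(psi[-1] * (j - 1 + d) / j)
    return psi

d = 0.3
psi = arfima_psi(d, 5000)
# hyperbolic decay: j^(1-d) * psi_j -> 1/Gamma(d), roughly 0.334 for d = 0.3
print(5000 ** (1 - d) * psi[5000], 1 / math.gamma(d))
```

For a general ARFIMA(p, d, q) the limiting constant is multiplied by θ(1)∕ϕ(1), but the j^{d−1} rate is the same.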

The exact Gaussian likelihood L of observations x n  = (x 1, …, x n )′ of a fractionally integrated ARMA process is given by

$$\displaystyle{-2\ln (L) = n\ln (2\pi ) +\ln \det \varGamma _{n} + \mathbf{x}_{n}'\varGamma _{n}^{-1}\mathbf{x}_{ n},}$$

where Γ n  = E(X n X n ′). Calculation and maximization with respect to the parameters d, ϕ 1, …, ϕ p , θ 1, …, θ q and σ 2 is difficult. It is much easier to maximize the Whittle approximation L W (see (11.4.10)), i.e. to minimize

$$\displaystyle{-2\ln (L_{W}) = n\ln (2\pi ) +\sum _{j}\ln (2\pi f(\omega _{j})) +\sum _{j}{I_{n}(\omega _{j}) \over 2\pi f(\omega _{j})},}$$

where I n is the periodogram, and \(\sum _{j}\) denotes the sum over all nonzero Fourier frequencies, ω j  = 2π j∕n ∈ (−π, π]. The program ITSM allows estimation of parameters for ARIMA(p, d, q) models either by minimizing − 2ln(L W ), or by the slower and more computationally intensive process of minimizing − 2ln(L).
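A direct (if slow) implementation of the periodogram and the Whittle objective can be sketched as follows; the O(n²) Fourier sums and the ARFIMA(0, d, 0) spectral density at the end are illustrative choices, not the computation ITSM performs.

```python
import cmath
import math

def periodogram(x):
    """I_n(w_j) = |sum_t x_t e^{-i t w_j}|^2 / (2 pi n); by the symmetry
    I_n(w) = I_n(2 pi - w) for real data, it is enough to take the
    frequencies w_j = 2 pi j / n, j = 1, ..., n - 1."""
    n = len(x)
    out = {}
    for j in range(1, n):
        w = 2 * math.pi * j / n
        s = sum(x[t] * cmath.exp(-1j * (t + 1) * w) for t in range(n))
        out[w] = abs(s) ** 2 / (2 * math.pi * n)
    return out

def neg2_whittle(x, spec):
    """-2 ln L_W for a candidate spectral density spec(w)."""
    out = len(x) * math.log(2 * math.pi)
    for w, Iw in periodogram(x).items():
        out += math.log(2 * math.pi * spec(w)) + Iw / (2 * math.pi * spec(w))
    return out

# example candidate: ARFIMA(0, d, 0), f(w) = (s2 / 2 pi) |1 - e^{-iw}|^{-2d}
d, s2 = 0.2, 1.0
spec = lambda w: s2 / (2 * math.pi) * abs(1 - cmath.exp(-1j * w)) ** (-2 * d)
```

A useful sanity check is Parseval's identity: the periodogram ordinates over the nonzero Fourier frequencies sum to \((n\sum x_{t}^{2} - (\sum x_{t})^{2})/(2\pi n)\).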

Fractionally Integrated GARCH Processes

In order to incorporate long memory into the family of GARCH models, Baillie et al. (1996) defined a fractionally integrated GARCH (FIGARCH) process as a causal strictly stationary solution of the difference equations (7.3.9) and (7.3.10) specified below.

To motivate the definition, we recall that the GARCH(p, q) process is the causal stationary solution of the equations,

$$\displaystyle{ Z_{t} = \sqrt{h_{t}}e_{t},\ \ h_{t} =\alpha _{0} +\sum _{ i=1}^{p}\alpha _{ i}Z_{t-i}^{2} +\sum _{ i=1}^{q}\beta _{ i}h_{t-i}, }$$
(7.3.6)

where α 0 > 0, α 1, …, α p  ≥ 0 and β 1, …, β q  ≥ 0. It follows (Problem 7.5) that

$$\displaystyle{ (1 -\alpha (B) -\beta (B))Z_{t}^{2} =\alpha _{ 0} + (1 -\beta (B))W_{t}, }$$
(7.3.7)

where \(W_{t}:= Z_{t}^{2} - h_{t}\) is white noise, \(\alpha (B) =\sum _{i=1}^{p}\alpha _{i}B^{i}\) and \(\beta (B) =\sum _{i=1}^{q}\beta _{i}B^{i}\). There is a causal weakly stationary solution for {Z t } if and only if the zeroes of 1 −α(z) −β(z) have absolute value greater than 1 and there is then exactly one such solution (Bollerslev 1986).

In order to define the IGARCH(p,q) (integrated GARCH(p, q)) process, Engle and Bollerslev (1986) supposed that the polynomial (1 −α(z) −β(z)) has a simple zero at z = 1, and that the other zeroes all fall outside the closed unit disc as in (7.3.6). Under these assumptions we can write

$$\displaystyle{(1 -\beta (z) -\alpha (z)) = (1 - z)\phi (z),}$$

where ϕ(z) is a polynomial with all of its zeroes outside the unit circle. We then say [cf. (7.3.6)] that {Z t } is an IGARCH(p, q) process if it satisfies

$$\displaystyle{ \phi (B)(1 - B)Z_{t}^{2} =\alpha _{ 0} + (1 -\beta (B))W_{t}, }$$
(7.3.8)

with \(Z_{t} = \sqrt{h_{t}}e_{t}\), \(W_{t} = Z_{t}^{2} - h_{t}\) and {e t } ∼ IID(0, 1). Bougerol and Picard (1992) showed that if the distribution of e t has unbounded support and no atom at zero then there is a unique strictly stationary causal solution of these equations for {Z t }. The solution has the property that \(EZ_{t}^{2} = \infty \). In practice, for GARCH models fitted to empirical data, it is often found that α(1) +β(1) ≈ 1, supporting the practical relevance of the IGARCH model even though \(EZ_{t}^{2} = \infty \).

Baillie et al. (1996) defined the FIGARCH(p,d,q) process {Z t } to be a causal strictly stationary solution of the equations,

$$\displaystyle{ Z_{t} = \sqrt{h_{t}}e_{t}, }$$
(7.3.9)

and [cf. (7.3.8)]

$$\displaystyle{ \phi (B)(1 - B)^{d}Z_{ t}^{2} =\alpha _{ 0} + (1 -\beta (B))W_{t},\ \ 0 < d < 1, }$$
(7.3.10)

where \(W_{t} = Z_{t}^{2} - h_{t}\), {e t } ∼ IID(0, 1) and the polynomials ϕ(z) and 1 −β(z) are non-zero for all complex z such that | z | ≤ 1. Substituting \(W_{t} = Z_{t}^{2} - h_{t}\) in (7.3.10) we see that (7.3.10) is equivalent to the equation,

$$\displaystyle{ h_{t} ={ \alpha _{0} \over 1 -\beta (1)} + \left [1 - (1 -\beta (B))^{-1}\phi (B)(1 - B)^{d}\right ]Z_{ t}^{2}, }$$
(7.3.11)

which means that the FIGARCH(p, d, q) process can be regarded as a special case of the IARCH(∞) process defined by (7.3.9) and

$$\displaystyle{ h_{t} = a_{0} +\sum _{ j=1}^{\infty }a_{ j}Z_{t-j}^{2}, }$$
(7.3.12)

with a 0 > 0 and \(\sum _{j=1}^{\infty }a_{j} = 1\). The questions of existence and uniqueness of causal strictly stationary solutions of the IARCH(∞) (including FIGARCH) equations have not yet been fully resolved. Any strictly stationary solution must have infinite variance, since if \(\sigma ^{2}:= EZ_{t}^{2} = Eh_{t} < \infty \) then, since \(\sum _{j=1}^{\infty }a_{j} = 1\), it follows from (7.3.12) that σ 2 = a 0 +σ 2, contradicting the finiteness of σ 2. Sufficient conditions for the existence of a causal strictly stationary solution of the IARCH(∞) equations, and in particular of the FIGARCH equations, have been given by Douc et al. (2008).

Other models, based on changing volatility levels, have been proposed to explain the “long-memory” effect in stock and exchange rate returns. Fractionally integrated E-GARCH models have also been introduced (Bollerslev and Mikkelsen 1996) in order to account for both long memory and asymmetry of the effects of positive and negative shocks e t .

7.4 Stochastic Volatility Models

The general discrete-time stochastic volatility (SV) model for the log return sequence {Z t } defined in Section 7.1 is [cf. (7.2.1)]

$$\displaystyle{ Z_{t} = \sqrt{h_{t}}e_{t},\ t \in \mathbb{Z}, }$$
(7.4.1)

where {e t } ∼ IID(0, 1), {h t } is a strictly stationary sequence of non-negative random variables, independent of {e t }, and h t is known, like the corresponding quantity in the GARCH models, as the volatility at time t. Note however that in the GARCH models, the sequences {h t } and {e t } are not independent since h t depends on e s ,  s < t through the defining equation (7.2.6).

The independence of {h t } and {e t } in the SV model (7.4.1) allows us to model the volatility process with any non-negative strictly stationary sequence we may wish to choose. This contrasts with the GARCH models in which the processes {Z t } and {h t } are inextricably linked. Inference for the GARCH models, based on observations of Z 1, , Z n , can be carried out using the conditional likelihood, which is easily written down, as in (7.2.14), in terms of the marginal probability density of the sequence {e t }. Inference for an SV model based on observations of {Z t } however is considerably more difficult since the process is driven by two independent random sequences rather than one and only {Z t } is observed. The unobserved sequence {h t } is said to be latent.

A general account of the probabilistic properties of SV models can be found in Davis and Mikosch (2009) and an extensive history and overview of both discrete-time and continuous-time SV models in Shephard and Andersen (2009). In this section we shall focus attention on an early, but still widely used, special case of the SV model due to Taylor (1982, 1986) known as the lognormal SV model.

The lognormal SV process {Z t } is defined as,

$$\displaystyle{ Z_{t} = \sqrt{h_{t}}e_{t},\ \{e_{t}\} \sim \mathrm{IID\ N(0,1)}, }$$
(7.4.2)

where \(h_{t} = e^{\ell_{t}}\), {ℓ t } is a (strictly and weakly) stationary solution of the equations

$$\displaystyle{ \ell_{t} =\gamma _{0} +\gamma _{1}\ell_{t-1} +\eta _{t},\ \{\eta _{t}\} \sim \mathrm{IID\ N}(0,\sigma ^{2}), }$$
(7.4.3)

 | γ 1 |  < 1 and the sequences {e t } and {η t } are independent. The sequence {ℓ t } is clearly a Gaussian AR(1) process with mean

$$\displaystyle{ \mu _{\ell}:= E\ell_{t} ={ \gamma _{0} \over 1 -\gamma _{1}} }$$
(7.4.4)

and variance

$$\displaystyle{ v_{\ell}:= \mathrm{Var}(\ell_{t}) ={ \sigma ^{2} \over 1 -\gamma _{1}^{2}}. }$$
(7.4.5)

Properties of {Z t }.

  1. (i)

    {Z t } is strictly stationary.

  2. (ii)

    Moments:

    $$\displaystyle{EZ_{t}^{r} = E(e_{ t}^{r})E\exp (r\ell_{ t}/2)}$$
    $$\displaystyle{\qquad \! = \left \{\begin{array}{@{}l@{\quad }l@{}} 0, \quad &\mathrm{if}\ r\ \mathrm{is\ odd}, \\{} [\prod _{i=1}^{m}(2i - 1)]\exp \left ({ m\gamma _{0} \over 1-\gamma _{1}} +{ m^{2}\sigma ^{2} \over 2(1-\gamma _{1}^{2})}\right ),\quad &\mathrm{if}\ r = 2m.\\ \quad \end{array} \right.}$$
  3. (iii)

    Kurtosis:

    $$\displaystyle{{ EZ_{t}^{4} \over (EZ_{t}^{2})^{2}} = 3\exp \left ({ \sigma ^{2} \over 1 -\gamma _{1}^{2}}\right ) \geq 3.}$$

    Kurtosis (defined by the ratio on the left) is a standard measure of tail heaviness. For a normally distributed random variable it has the value 3, so, as measured by kurtosis, the tails of the marginal distribution of the lognormal SV process are heavier than those of a normally distributed random variable.

  4. (iv)

    The autocovariance function of {Z t 2}:

    We first observe that if t > s,

    $$\displaystyle{E(Z_{t}^{2}Z_{ s}^{2}\vert e_{ u},\eta _{u},u < t) = h_{s}h_{t}e_{s}^{2}E(e_{ t}^{2}\vert e_{ u},\eta _{u},u < t) = h_{s}h_{t}e_{s}^{2},}$$

    since h s , h t and e s 2 are each functions of {e u , η u , u < t} and e t 2 is independent of {e u , η u , u < t}. Taking expectations on both sides of the last equation and using the independence of {h t } and {e t } and the relation \(h_{t} =\exp (\ell_{t})\) gives

    $$\displaystyle{E(Z_{t}^{2}Z_{ s}^{2}) = E\exp (\ell_{ t} +\ell _{s}).}$$

    Hence, for h > 0,

    $$\displaystyle{\mathrm{Cov}(Z_{t+h}^{2},Z_{ t}^{2}) = E\exp (\ell_{ t+h} +\ell _{t}) - E\exp (\ell_{t+h})E\exp (\ell_{t})}$$
    $$\displaystyle{=\exp [2\mu _{\ell} + v_{\ell}(1 +\gamma _{1}^{h})] -\exp [2\mu _{\ell} + v_{\ell}].}$$

    Here we have used the facts that \(\ell_{t+h} +\ell _{t}\) is normally distributed with mean and variance which are easily computed from (7.2.17) and that for a normally distributed random variable X with mean μ and variance v, Eexp(X) = exp(μ + v∕2). From (ii) we also have

    $$\displaystyle{\mathrm{Var}(Z_{t}^{2}) = EZ_{ t}^{4} - (EZ_{ t}^{2})^{2} = 3\exp (2\mu _{\ell} + 2v_{ l}) -\exp (2\mu _{\ell} + v_{l}).}$$

    Hence, for h > 0,

    $$\displaystyle{\rho _{Z_{t}^{2}}(h) ={ \mathrm{Cov}(Z_{t+h}^{2},Z_{t}^{2}) \over \mathrm{Var}(Z_{t}^{2})} ={ \exp (v_{\ell}\gamma _{1}^{h}) - 1 \over 3\exp (v_{\ell}) - 1} \sim { v_{\ell} \over 3\exp (v_{\ell}) - 1}\gamma _{1}^{h},\ \mathrm{as}\ h \rightarrow \infty,}$$

    suggesting the approximation of the autocorrelation function of {Z t 2} by that of an ARMA(1,1) process. (Recall from Example 3.2.1 that the autocorrelation function of an ARMA(1,1) process has the form ρ(h) = c ϕ h, h ≥ 1, with ρ(0) = 1.) The squared GARCH(1,1) process is similar in this respect: it too (see Problem 7.3) has the autocovariance function of an ARMA(1,1) process.

  5. (v)

    The process {lnZ t 2}:

    $$\displaystyle{ \ln Z_{t}^{2} =\ell _{ t} +\ln e_{t}^{2}. }$$
    (7.4.6)

    If e t  ∼ N(0, 1) then \(E\ln e_{t}^{2} = -1.27\) and \(\mathrm{Var}(\ln e_{t}^{2}) = 4.93\). From (7.4.6) we find at once that \(\mathrm{Var}(\ln Z_{t}^{2}) = v_{\ell} + 4.93\) and \(\mathrm{Cov}(\ln Z_{t+h}^{2},\ln Z_{t}^{2}) = v_{\ell}\gamma _{1}^{\vert h\vert }\) for h ≠ 0. Hence the process {lnZ t 2} has the autocovariance function of an ARMA(1,1) process with autocorrelation function

    $$\displaystyle{\rho _{\ln Z_{t}^{2}}(h) ={ v_{l}\gamma _{1}^{\vert h\vert } \over v_{l} + 4.93},\ h\neq 0.}$$
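The constants −1.27 and 4.93 in property (v) are \(E\ln e_{t}^{2} =\psi (1/2) +\ln 2 \approx -1.2704\) and \(\mathrm{Var}(\ln e_{t}^{2}) =\pi ^{2}/2 \approx 4.9348\) for standard normal e t . A quick Monte Carlo sketch (illustrative; the seed and sample size are arbitrary) confirms them:

```python
import math
import random

# Monte Carlo check of E ln e_t^2 = -1.27 and Var(ln e_t^2) = 4.93
# for e_t ~ N(0,1), i.e. the moments of a log chi-squared(1) variable.
rng = random.Random(0)
n = 300_000
y = [math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)]
mean = sum(y) / n
var = sum((v - mean) ** 2 for v in y) / n
print(mean, var)   # approximately -1.27 and 4.93
```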

Estimation for the lognormal SV model

The parameters to be estimated in the defining equations (7.4.2) and (7.4.3) are σ 2, γ 0 and γ 1. They can be estimated by maximization of the Gaussian likelihood which can be calculated, for any specified values of the parameters, as follows.

By property (v) above, the process \(\{Y _{t}:=\ln Z_{t}^{2} - E\ln Z_{t}^{2}\}\) satisfies the ARMA(1,1) equations,

$$\displaystyle{ Y _{t} -\phi Y _{t-1} = U_{t} +\theta U_{t-1},\ \{U_{t}\} \sim \mathrm{WN}(0,\sigma _{U}^{2}), }$$
(7.4.7)

for some coefficients ϕ and θ in the interval (−1, 1) and white-noise variance σ U 2, where the white-noise sequence is written {U t } to avoid confusion with the log returns {Z t }. Comparing the autocorrelation function of (7.4.7) with the autocorrelation function of {lnZ t 2} given above in Property (v), we find that

$$\displaystyle{ \gamma _{1} =\phi }$$
(7.4.8)

and

$$\displaystyle{{ v_{\ell} \over v_{\ell} + 4.93} ={ (\theta +\phi )(1+\theta \phi ) \over 1 + 2\theta \phi +\theta ^{2}}. }$$
(7.4.9)

To ensure that the right-hand side falls in the interval (0, 1) it is necessary and sufficient (assuming that ϕ ∈ (−1, 1) and θ ∈ (−1, 1)) that ϕ +θ > 0. The maximum Gaussian likelihood estimators \(\hat{\phi }\) and \(\hat{\theta }\) can be found using the program ITSM and the corresponding estimators \(\hat{\gamma }_{1}\) and \(\hat{v}_{\ell}\) on replacing ϕ and θ by their estimators in (7.4.8) and (7.4.9) respectively. From (7.4.5) the corresponding estimator of σ 2 is

$$\displaystyle{\hat{\sigma ^{2}} = (1 -\hat{\gamma _{ 1}}^{2})\hat{v}_{\ell},}$$

where \(\hat{\gamma }_{1} = \hat{\phi }\) and, from (7.4.4) and (7.4.6), the corresponding estimator of γ 0 is

$$\displaystyle{\hat{\gamma _{0}} = (1 -\hat{\gamma _{1}})(\overline{\ln \,Z_{t}^{2}} + 1.27),}$$

where \(\overline{\ln \,Z_{t}^{2}}\) denotes the sample mean of the observations of ln Z t 2. If it turns out that the estimators \(\hat{\phi }\) and \(\hat{\theta }\) satisfy \(\hat{\phi } + \hat{\theta } \leq 0\) then, from (7.4.9), \(\hat{v}_{\ell} \leq 0\), suggesting that the lognormal SV model is not appropriate in this case.
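The back-substitution from fitted ARMA(1,1) parameters to the lognormal SV parameters, i.e. equations (7.4.8), (7.4.9), (7.4.5) and (7.4.4), can be sketched as follows (the function name and the test values are illustrative, not from the text):

```python
def sv_params_from_arma(phi, theta, mean_log_z2):
    """Map fitted ARMA(1,1) parameters (phi, theta) for ln Z_t^2, plus the
    sample mean of ln Z_t^2, to the lognormal SV parameters."""
    if phi + theta <= 0:
        raise ValueError("phi + theta <= 0: lognormal SV model inappropriate")
    gamma1 = phi                                           # (7.4.8)
    # lag-1 autocorrelation of the fitted ARMA(1,1)
    rho1 = (theta + phi) * (1 + theta * phi) / (1 + 2 * theta * phi + theta ** 2)
    v_ell = 4.93 * rho1 / (1 - rho1)                       # solve (7.4.9)
    sigma2 = (1 - gamma1 ** 2) * v_ell                     # (7.4.5)
    gamma0 = (1 - gamma1) * (mean_log_z2 + 1.27)           # (7.4.4), E ln e^2 = -1.27
    return gamma1, gamma0, sigma2, v_ell
```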

Forecasting the log volatility

The minimum mean-squared error predictor of ℓ t+h conditional on {ℓ s , s ≤ t} is easily found from (7.4.3) to be

$$\displaystyle{ P_{t}\ell_{t+h} =\gamma _{ 1}^{h}\ell_{ t} +\gamma _{0}{1 -\gamma _{1}^{h} \over 1 -\gamma _{1}}, }$$
(7.4.10)

with mean-squared error,

$$\displaystyle{ E(\ell_{t+h} - P_{t}\ell_{t+h})^{2} =\sigma ^{2}{1 -\gamma _{1}^{2h} \over 1 -\gamma _{1}^{2}}. }$$
(7.4.11)

We have seen how to estimate γ 0, γ 1 and σ 2, but unfortunately ℓ t is not observed. In order to forecast ℓ t+h using the observations {Z s , s ≤ t}, we can however use the Kalman recursions as described in Section 9.4, Example 9.4.2.
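Equations (7.4.10) and (7.4.11) translate directly into code. The sketch below (illustrative parameter values) also checks the limits as h → ∞: the predictor tends to μ ℓ  = γ 0∕(1 −γ 1) and the mean-squared error to v ℓ  = σ 2∕(1 −γ 1 2).

```python
def forecast_log_vol(ell_t, h, gamma0, gamma1, sigma2):
    """h-step predictor (7.4.10) of ell_{t+h} given ell_t, and its
    mean-squared error (7.4.11), for the AR(1) log-volatility (7.4.3)."""
    pred = gamma1 ** h * ell_t + gamma0 * (1 - gamma1 ** h) / (1 - gamma1)
    mse = sigma2 * (1 - gamma1 ** (2 * h)) / (1 - gamma1 ** 2)
    return pred, mse
```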

7.5 Continuous-Time Models

7.5.1 Lévy Processes

Continuous-time models for asset prices have a long history, going back to Bachelier (1900) who used Brownian motion to represent the movement of asset prices in the Paris stock exchange. Continuous-time models have since moved to a central place in mathematical finance, largely because of their use in the field of option-pricing, initiated by the Nobel-Prize-winning work of Black, Scholes and Merton, and partly also because of the current availability of high-frequency and irregularly-spaced transaction data which are represented most naturally by continuous-time models.

We earlier defined the daily return on day t of a stock whose closing price is P t  as

$$\displaystyle{ Z_{t} = X_{t} - X_{t-1}, }$$
(7.5.1)

where

$$\displaystyle{ X_{t} =\log P_{t} }$$
(7.5.2)

is the log asset price at the close of day t. If the daily returns were iid this would mean that the process {X t } is a random walk (Example 1.4.3). This is an over-simplified model for daily asset prices as there is very strong evidence suggesting that the daily returns, although exhibiting little or no autocorrelation, are not independent.

Nevertheless the random walk is a useful starting point. In constructing continuous-time models we introduce its continuous-time analogue, known as a Lévy process. Like iid noise in discrete time, it is the building block for the construction of a large family of more complex models for financial data.

Definition 7.5.1

A Lévy process, {L(t), t ∈ ℝ}, is a process with the following properties:

  1. (i)

    L(0) = 0. 

  2. (ii)

    L(t) − L(s) has the same distribution as L(t − s) for all s and t such that s ≤ t.

  3. (iii)

    If (s, t) and (u, v) are disjoint intervals then L(t) − L(s) and L(v) − L(u) are independent.

  4. (iv)

    {L(t)} is continuous in probability, i.e. for all ε > 0 and for all t ∈ ℝ,

    $$\displaystyle{\lim _{s\rightarrow t}P(\vert L(t) - L(s)\vert >\epsilon ) = 0.}$$

The essential properties of Lévy processes are discussed in Appendix D. For thorough accounts of Lévy processes and their properties see the books of Applebaum (2004), Protter (2010) and Sato (1999) and for an extensive account of their applications to finance see Schoutens (2003) and Andersen et al. (2009). For now we restrict attention to two of the most familiar examples of Lévy processes, Brownian motion, whose sample-paths are continuous, and the compound Poisson process, whose sample-paths are constant except for jumps.

Example 7.5.1

Brownian Motion

This is a Lévy process for which L(t) ∼ N(μ t, σ 2 t),  t ≥ 0, with parameters μ ∈ ℝ and σ > 0. The sample-paths are continuous and the characteristic function of L(t) for t > 0 is

$$\displaystyle{ Ee^{i\theta L(t)} = e^{t\xi (\theta )},\ \ \theta \in \mathbb{R}, }$$
(7.5.3)

where

$$\displaystyle{\xi (\theta ) = i\theta \mu -\theta ^{2}\sigma ^{2}/2.}$$

The defining properties (ii) and (iii) imply that for any finite collection of times t 1 < t 2 < ⋯ < t n , the increments Δ i : = L(t i+1) − L(t i ),  i = 1, …, n − 1, are independent random variables satisfying Δ i  ∼ N(μ(t i+1 − t i ), σ 2(t i+1 − t i )). Brownian motion with μ = 0 and σ = 1 is known as standard Brownian motion. We shall denote it henceforth as {B(t),  t ∈ ℝ}. A realization of B(t), 0 ≤ t ≤ 10, is shown in Figure 7.5.

Fig. 7.5
figure 5

A realization of standard Brownian motion B(t), 0 ≤ t ≤ 10

 □ 
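A realization like that in Figure 7.5 can be generated directly from the independent-increments property. The following sketch (illustrative grid size and seed) simulates μt + σB(t) on [0, T]:

```python
import math
import random

def brownian_path(T, n, mu=0.0, sigma=1.0, seed=0):
    """Simulate Brownian motion with drift mu and scale sigma on [0, T] at
    n equal time steps, using independent N(mu*dt, sigma^2*dt) increments."""
    rng = random.Random(seed)
    dt = T / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path

# standard Brownian motion on [0, 10], as in Figure 7.5
path = brownian_path(10.0, 1000)
```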

Example 7.5.2

The Poisson Process

The Poisson process {N(t), t ∈ ℝ} with intensity or jump-rate λ is a Lévy process such that N(t), for t ≥ 0, has the Poisson distribution with mean λ t. Its sample paths are right-continuous functions which are constant except for jumps of size 1, the number of jumps occurring in any time interval of length ℓ having the Poisson distribution with mean λ ℓ. The characteristic function of N(t) for t > 0 is given by (7.5.3) with

$$\displaystyle{\xi (\theta ) =\lambda (e^{i\theta } - 1).}$$

A sample-path of a Poisson process with λ = 5 on the time-interval [0, 10] is shown in Figure 7.6. □ 

Fig. 7.6
figure 6

A realization of a Poisson process N(t), 0 ≤ t ≤ 10, with jump-rate 5 per unit time

Example 7.5.3

The Compound Poisson Process

The compound Poisson process {X(t), t ∈ ℝ} with jump-rate λ and jump-size distribution function F is a Lévy process with sample-paths which are constant except for jumps. The jump-times are those of a Poisson process {N(t)} with jump-rate λ and the sizes of the jumps are independent random variables, independent of the process {N(t)}, with a distribution function F assigning probability zero to the value zero. The characteristic function of X(t) for t > 0 is again given by (7.5.3) but now with

$$\displaystyle{ \xi (\theta ) = i\theta c +\int _{\mathbb{R}}(e^{i\theta x} - 1 - i\theta xI_{ (-1,1)}(x))\lambda dF(x), }$$
(7.5.4)

where c = λ ∫  | x | < 1 xdF(x) and I (−1, 1)(x) = 1 if | x |  < 1 and zero otherwise. A realization of a compound Poisson process on the interval [0,10] is shown in Figure 7.7.

Fig. 7.7
figure 7

A realization of a compound Poisson process X(t), 0 ≤ t ≤ 10, with jump-rate 5 per unit time and jump-size distribution normal with mean 0 and variance 1

 □ 
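Sampling X(t) for the compound Poisson process can be sketched as follows (illustrative: the Poisson variate is drawn by counting unit-rate exponential inter-arrival times, and the jump-size distribution is taken to be standard normal, as in Figure 7.7, so that Var X(t) = λt):

```python
import random

def poisson_sample(mean, rng):
    """Poisson(mean) variate: count unit-rate arrivals in [0, mean]."""
    n, acc = 0, rng.expovariate(1.0)
    while acc <= mean:
        n += 1
        acc += rng.expovariate(1.0)
    return n

def compound_poisson(t, lam, jump, rng):
    """Sample X(t): a Poisson(lam * t) number of iid jumps jump(rng)."""
    return sum(jump(rng) for _ in range(poisson_sample(lam * t, rng)))

rng = random.Random(7)
samples = [compound_poisson(1.0, 5.0, lambda r: r.gauss(0.0, 1.0), rng)
           for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)   # approximately 0 and lam * t * E(jump^2) = 5
```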

The above examples give some idea of the immense variety in the class of Lévy processes. The Lévy-Itô decomposition implies that every Lévy process L can be expressed as the sum of a Brownian motion and an independent pure-jump process. The marginal distribution of L(t) can be any distribution from the class of infinitely divisible distributions (which includes the gamma, Gaussian, Student’s t, stable, compound Poisson and many additional well-known distributions). See Appendix D and the references given there for more details.

7.5.2 The Geometric Brownian Motion (GBM) Model for Asset Prices

In his pioneering mathematical analysis of stock prices, contained in his doctoral thesis, Théorie de la spéculation, Bachelier (1900) introduced a model in which the price of an asset {P(t)} is Brownian motion with parameters μ and σ (see Example 7.5.1). Measuring time in units of 1 day, this implies in particular that the daily closing prices, P(t), t = 0, 1, 2, , constitute a random walk with increments P(t) − P(t − 1) which are independent and normally distributed with mean μ and variance σ 2. The normality of these increments and the fact that P(t) takes negative values with positive probability clearly limit the value of this model as a realistic approximation to observed daily prices. However, interest in the work of Bachelier and his use of the Brownian motion model to solve problems in mathematical finance led Samuelson (1965) to develop and apply the more realistic geometric Brownian motion model for asset prices. A fascinating account of Bachelier’s work, including an English translation of his thesis and comments on its place in the history of both probability theory and mathematical finance is contained in the book of Davis and Etheridge (2006). The geometric Brownian motion model is the one for which the celebrated option-pricing formulae of Black, Scholes and Merton were first derived.

In the Brownian motion model the asset price {P(t), t ≥ 0} satisfies the stochastic differential equation,

$$\displaystyle{ dP(t) =\mu dt +\sigma dB(t), }$$
(7.5.5)

where {B(t)} is standard Brownian motion, i.e., Brownian motion with EB(t) = 0 and VarB(t) = t, t ≥ 0. Equation (7.5.5) is shorthand for the integrated form,

$$\displaystyle{P(t) - P(0) =\mu t +\sigma B(t).}$$

In addition to the obvious flaw that P(t) will take negative values for some values of t, the increments P(t) − P(t − 1) are normally distributed, while in practice it is observed that these increments have marginal distributions with heavier tails than the normal distribution. The geometric Brownian motion model addresses both of these shortcomings.

The geometric Brownian motion model for {P(t),  t ≥ 0} is defined by the Itô stochastic differential equation,

$$\displaystyle{ dP(t) = P(t)[\mu dt +\sigma dB(t)],\ \mathrm{with}\ P(0) > 0. }$$
(7.5.6)

Solution of this equation requires knowledge of Itô calculus, a brief introduction to which is given in Appendix D. A more extensive and very readable account with financial applications can be found in the book of Mikosch (1998). The solution of (7.5.6) satisfies (see Appendix D)

$$\displaystyle{ P(t) = P(0)\exp \left [(\mu -{\sigma ^{2} \over 2})t +\sigma B(t)\right ], }$$
(7.5.7)

from which it follows at once that the log asset price X(t) = logP(t) satisfies

$$\displaystyle{ X(t) = X(0) + (\mu -{\sigma ^{2} \over 2})t +\sigma B(t), }$$
(7.5.8)

or equivalently

$$\displaystyle{ dX(t) = \left (\mu -{\sigma ^{2} \over 2}\right )dt +\sigma dB(t). }$$
(7.5.9)

A realization of the process P(t), 0 ≤ t ≤ 10, with P(0) = 1, μ = 0 and σ = 0.01 is shown in Figure 7.8.

Fig. 7.8
figure 8

A realization of GBM, P(t), 0 ≤ t ≤ 10, with P(0) = 1, μ = 0 and σ = 0.01

The return for the time interval (t −Δ, t) is

$$\displaystyle{ Z_{\varDelta }(t) = X(t) - X(t-\varDelta ) = (\mu -{\sigma ^{2} \over 2})\varDelta +\sigma [B(t) - B(t-\varDelta )]. }$$
(7.5.10)

For disjoint intervals of length Δ the returns are therefore independent normally distributed random variables with mean (μ −σ 2∕2)Δ and variance σ 2 Δ. The normality of the returns implied by this model is a property which can easily be checked against observed returns. It is found from empirically observed returns that the deviations from normality are substantial for time intervals of the order of a day or less, becoming less apparent as Δ increases. This is one of the reasons for developing the more complex models described in later sections.

Remark 1.

An asset-price model which overcomes the normality constraint is the so-called Lévy market model (LMM), in which the log asset price X is assumed to be a Lévy process, not necessarily Brownian motion as in the GBM model. For a discussion of such models see Eberlein (2009).

The parameter σ 2 in the GBM model is called the volatility parameter. It plays a key role in the option pricing analysis of Black and Scholes (1973) and Merton (1973) to be discussed in Section 7.6. Although σ 2 cannot be determined from discrete observations of a GBM process it can be estimated from closely-spaced discrete observations X(i∕N), i = 1, …, N, with large N, as described in the following paragraph.

From (7.5.8) we can write

$$\displaystyle{ (\varDelta _{i}X)^{2}:= [X(i/N) - X((i - 1)/N)]^{2} = (c/N +\sigma \varDelta _{ i}B)^{2}, }$$
(7.5.11)

where Δ i B = B(i∕N) − B((i − 1)∕N) and c = μ −σ 2∕2. A simple calculation then gives

$$\displaystyle{\mathbf{E}[(\varDelta _{i}X)^{2}] = \frac{\sigma ^{2}} {N} + \frac{c^{2}} {N^{2}},}$$

and

$$\displaystyle{\mathrm{Var}[(\varDelta _{i}X)^{2}] = \frac{4\sigma ^{2}c^{2}} {N^{3}} + \frac{2\sigma ^{4}} {N^{2}}.}$$

By the independence of the summands, \(\sum _{i=1}^{N}(\varDelta _{i}X)^{2}\) has mean σ 2 + c 2∕N and variance 2σ 4∕N + 4σ 2 c 2∕N 2, showing that, as N → ∞,

$$\displaystyle{ \sum _{i=1}^{N}(\varDelta _{ i}X)^{2}\longrightarrow ^{\mathrm{m.s. } } \sigma ^{2} =\int _{ 0}^{1}\sigma ^{2}\mathrm{d}t. }$$
(7.5.12)

This calculation shows that, for the GBM process, the sum on the left is a consistent estimator of σ 2 as N → ∞. The sum (for suitably large N) is known as the realized volatility for the time interval [0, 1] and the integral on the right is known as the integrated volatility for the same interval. σ 2 itself is known as the spot volatility. The realized volatility is widely used as an estimator of the integrated volatility and is consistent for a wide class of models in which the spot volatility is not necessarily constant as it is in the GBM model. For a discussion of realized volatility in a more general context see the article of Andersen and Benzoni (2009).
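The consistency in (7.5.12) is easy to check by simulation. The sketch below (illustrative values μ = 0.05, σ = 0.2, N = 100,000) generates the increments Δ i X = c∕N +σΔ i B of (7.5.11) directly and compares the realized volatility with σ 2:

```python
import math
import random

# Realized volatility for GBM: the sum of squared increments of X over
# [0, 1] converges to sigma^2 as the sampling frequency N grows (7.5.12).
rng = random.Random(42)
mu, sigma, N = 0.05, 0.2, 100_000
c = mu - sigma ** 2 / 2
dx = [c / N + sigma * math.sqrt(1 / N) * rng.gauss(0.0, 1.0) for _ in range(N)]
realized_vol = sum(d * d for d in dx)
print(realized_vol, sigma ** 2)   # close to sigma^2 = 0.04
```

The standard deviation of the estimator is about \(\sqrt{2\sigma ^{4}/N}\), so the agreement improves at rate \(N^{-1/2}\).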

We shall denote the realized volatility, computed for day n, n = 1, 2, 3, …, by \(\hat{\sigma }_{n}^{2}\). It is found in practice to vary significantly from one day to the next. The sequence \(\{\hat{\sigma }_{n}^{2}\}\) of realized volatilities exhibits clustering, i.e., periods of low values interrupted by bursts of large values, and has the appearance of a positively correlated stationary sequence, reinforcing the view that volatility is not constant as in the GBM model and suggesting the need for a model in which volatility is stochastic. Such observations are precisely those which led to the development in discrete time of stochastic volatility, ARCH, and GARCH models, and suggest the need for analogous models with continuous time parameter.

7.5.3 A Continuous-Time SV Model

In the discrete-time modeling of asset prices we have seen how both the GARCH and SV models allow for the variation of the volatility with time by modeling {h t } as a random process. A continuous-time analogue of this idea was introduced by Barndorff-Nielsen and Shephard (2001) in their celebrated continuous-time SV model for the log asset price X(t) [cf. (7.5.9)],

$$\displaystyle{ dX(t) = [m + bh(t)]dt + \sqrt{h(t)}dB(t),\ t \geq 0,\ \mathrm{with}\ X(0) = 0, }$$
(7.5.13)

where m ∈ ℝ, b ∈ ℝ, {B(t)} is standard Brownian motion and {h(t)} is a stationary subordinator-driven Ornstein-Uhlenbeck process independent of {B(t)}. The connection with discrete-time SV models is clear if we set m = b = 0 in (7.5.13) and compare with (7.4.1). Notice also that (7.5.13) has the same form as the GBM equation (7.5.9) except that the constant volatility parameter σ 2 has been replaced by the random volatility h(t).

A subordinator is a Lévy process with non-decreasing sample paths. The simplest example of a subordinator is the Poisson process of Example 7.5.2. If the compound Poisson process in Example 7.5.3 has non-negative jumps, i.e., if the jump-size distribution function F satisfies F(0) = 0, then it too is a subordinator. Other examples of subordinators are the gamma process (see Appendix D), whose increments on disjoint intervals have a gamma distribution, and the stable subordinators, whose increments on disjoint intervals are independent non-negative stable random variables.

An Ornstein-Uhlenbeck process driven by the subordinator L satisfies the stochastic differential equation,

$$\displaystyle{ dh(t) =\lambda h(t)dt + dL(t),\ t \in \mathbb{R}, }$$
(7.5.14)

where λ < 0. If \(EL(1)^{r} < \infty \) for some r > 0 this equation has a unique strictly stationary causal solution

$$\displaystyle{ h(t) =\int _{ -\infty }^{t}e^{\lambda (t-u)}dL(u). }$$
(7.5.15)

(Causal here means that h(t) is independent of the increments {L(u) − L(t): u > t} for every t.) A crucial feature of (7.5.15) is the non-negativity of h(t) which follows from the non-decreasing sample-paths of the subordinator {L(t)} and the non-negativity of the integrand. Non-negativity is clearly a necessary property if h(t) is to represent volatility. For a detailed account of Lévy-driven stochastic differential equations and integrals with respect to Lévy processes, see Protter (2010). In the case when L is a subordinator, (7.5.15) has the very simple interpretation as a pathwise integral with respect to the non-decreasing sample-path of L.
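Because L is a subordinator, (7.5.15) can be simulated exactly when L is compound Poisson: between jumps of L the volatility decays as e^{λt}, and at each jump it jumps up by the jump size. The sketch below (illustrative choices: jump-rate 5, exponential jump sizes of mean 0.2, λ = −1) checks the stationary mean Eh = EL(1)∕(−λ) via a long-run time average:

```python
import math
import random

# Exact path of the subordinator-driven OU volatility (7.5.15), with L a
# compound Poisson subordinator: rate r, Exp(mean m) jump sizes, so that
# E L(1) = r * m and the stationary mean is E h = r * m / (-lambda).
rng = random.Random(1)
lam, r, m, T = -1.0, 5.0, 0.2, 5_000.0
h, t, integral = r * m / (-lam), 0.0, 0.0   # start h at its stationary mean
while t < T:
    gap = min(rng.expovariate(r), T - t)    # time to next jump (or to T)
    # exact integral of h over the gap: h * (1 - exp(lam*gap)) / (-lam)
    integral += h * (1.0 - math.exp(lam * gap)) / (-lam)
    h *= math.exp(lam * gap)                # exponential decay between jumps
    t += gap
    if t < T:
        h += rng.expovariate(1.0 / m)       # non-negative jump of L
time_avg = integral / T
print(time_avg)   # close to E h = 1.0
```

Note that h(t) stays non-negative throughout, as the text requires of a volatility process.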

Quantities associated with the model (7.5.13) which are of particular interest are the returns over time intervals of length Δ > 0, i.e.

$$\displaystyle{Y _{n}:= X(n\varDelta ) - X((n - 1)\varDelta ),\ n \in \mathbb{N},}$$

and the integrated volatilities,

$$\displaystyle{I_{n} =\int _{ (n-1)\varDelta }^{n\varDelta }h(t)dt,\ n \in \mathbb{N}.}$$

The interval Δ is frequently one trading day. The return for the day is an observable quantity and the integrated volatility, although not directly observable, can be estimated from high-frequency within-day observations of X(t), as discussed in Section 7.5.2 for the GBM model.

For the model (7.5.13) with any second-order stationary non-negative volatility process h which is independent of B and has the properties,

$$\displaystyle{Eh(t) =\xi,\ \mathrm{Var}(h(t)) =\omega ^{2}}$$

and

$$\displaystyle{\mathrm{Cov}(h(t),h(t + s)) =\omega ^{2}\rho (s),\ s \in \mathbb{R},}$$

it can be shown (Problem 7.8) that the stationary sequence {I n } has mean,

$$\displaystyle{ EI_{n} =\xi \varDelta. }$$
(7.5.16)

and autocovariance function,

$$\displaystyle{ \gamma _{I}(k) = \left \{\begin{array}{@{}l@{\quad }l@{}} 2\omega ^{2}r(\varDelta ), \quad &\mathrm{if}\ k = 0,\\ \quad \\ \quad \\ \quad \\ \quad \\ \omega ^{2}\left [r((k + 1)\varDelta ) - 2r(k\varDelta ) + r((k - 1)\varDelta )\right ],\quad &\mathrm{if}\ k \geq 1. \end{array} \right. }$$
(7.5.17)

where

$$\displaystyle{ r(t):=\int _{ 0}^{t}\int _{ 0}^{y}\rho (u)du\ dy. }$$
(7.5.18)

The stationary sequence of log returns {Y n } has mean (m + bξ)Δ and autocovariance function,

$$\displaystyle{ \gamma _{Y }(k) = \left \{\begin{array}{@{}l@{\quad }l@{}} b^{2}\gamma _{ I}(0)+\xi \varDelta,\quad &\mathrm{if}\ k = 0,\\ \quad \\ \quad \\ \quad \\ \quad \\ b^{2}\gamma _{I}(k), \quad &\mathrm{if}\ k \geq 1. \end{array} \right. }$$
(7.5.19)

If in addition m = b = 0 then the log returns {Y n } are uncorrelated while the squared sequence {Y n 2} (see Problem 7.11) has mean,

$$\displaystyle{ EY _{n}^{2} =\xi \varDelta }$$
(7.5.20)

and autocovariance function,

$$\displaystyle{ \gamma _{Y ^{2}}(k) = \left \{\begin{array}{@{}l@{\quad }l@{}} \omega ^{2}\left [6r(\varDelta ) + 2\varDelta ^{2}\xi ^{2}/\omega ^{2}\right ], \quad &\mathrm{if}\ k = 0,\\ \omega ^{2}\left [r((k + 1)\varDelta ) - 2r(k\varDelta ) + r((k - 1)\varDelta )\right ],\quad &\mathrm{if}\ k \geq 1. \end{array} \right. }$$
(7.5.21)

Thus, under these assumptions, the log returns, Y n , calculated from the model are uncorrelated while the squares, Y n 2, are correlated, showing that the log returns are uncorrelated but not independent, in keeping with the “stylized facts” associated with empirically observed log returns.

Example 7.5.4.

The Ornstein-Uhlenbeck SV Model with m = b = 0

We can use the results (7.5.16)–(7.5.21) to determine properties of the sequences {Y n }, {Y n 2} and {I n } associated with the Ornstein-Uhlenbeck SV model,

$$\displaystyle{ dX(t) = \sqrt{h(t)}dB(t),\ t \geq 0,\ \mathrm{with}\ X(0) = 0, }$$
(7.5.22)

where

$$\displaystyle{ h(t) =\int _{ -\infty }^{t}e^{\lambda (t-u)}dL(u), }$$
(7.5.23)

λ < 0 and EL(1)2 < ∞.

In order to apply (7.5.16)–(7.5.21) we need to determine ξ = Eh(t), ω 2 = Var(h(t)) and the autocorrelation function ρ of h. To this end we rewrite (7.5.23) as

$$\displaystyle{ h(t) =\int _{ -\infty }^{\infty }g(t - u)dL(u), }$$
(7.5.24)

where

$$\displaystyle{ g(x):= \left \{\begin{array}{@{}l@{\quad }l@{}} e^{\lambda x},\quad &\mathrm{if}\ x \geq 0,\\ 0, \quad &\mathrm{otherwise}. \end{array} \right. }$$
(7.5.25)

The function g in the representation (7.5.24) is called a kernel function. If EL(1)2 < ∞, as we shall assume from now on, and if f and g are integrable and square-integrable functions on ℝ, we have (see Appendix D),

$$\displaystyle{ E\int _{-\infty }^{\infty }f(t - u)dL(u) =\mu \int _{ -\infty }^{\infty }f(u)du }$$
(7.5.26)

and

$$\displaystyle{ \mathrm{Cov}\left (\int _{-\infty }^{\infty }f(t - u)dL(u),\int _{ -\infty }^{\infty }g(t - u)dL(u)\right ) =\sigma ^{2}\int _{ -\infty }^{\infty }f(u)g(u)du, }$$
(7.5.27)

where μ = EL(1) and σ 2 = Var(L(1)). Taking g as in (7.5.25) and f(x) = g(s + x),  x ∈ ℝ, we find from these equations that the mean and autocovariance function of the volatility process {h(t)} defined by (7.5.23) are given by

$$\displaystyle{\xi = Eh(t) ={ \mu \over \vert \lambda \vert }}$$

and

$$\displaystyle{\mbox{ Cov}(h(t + s),h(t)) ={ \sigma ^{2} \over 2\vert \lambda \vert }e^{\lambda s} =\omega ^{2}\rho (s),\quad s \geq 0,}$$

where ω 2 = Var(h(t)) = σ 2∕(2 | λ | ) and ρ(s) = e λ s,  s ≥ 0. Substituting for ρ in (7.5.18) gives

$$\displaystyle{r(t) ={ 1 \over \lambda ^{2}} \left (e^{\lambda t} - 1 -\lambda t\right ).}$$
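This expression for r(t) can be checked numerically against the double integral (7.5.18) with ρ(u) = e λ u ; a small Python sketch (the parameter values are arbitrary):

```python
import numpy as np

lam, t = -0.8, 2.5

# r(t) = int_0^t int_0^y exp(lam*u) du dy.  The inner integral is available in
# closed form; the outer integral is evaluated by the trapezoidal rule.
y = np.linspace(0.0, t, 200_001)
inner = (np.exp(lam * y) - 1.0) / lam
dy = y[1] - y[0]
r_numeric = dy * (inner.sum() - 0.5 * (inner[0] + inner[-1]))

r_closed = (np.exp(lam * t) - 1.0 - lam * t) / lam ** 2
print(r_numeric, r_closed)   # agree to high accuracy
```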

We can now substitute for ξ, ω 2, ρ and r in equations (7.5.16)–(7.5.21) to get the second-order properties of the sequences {Y n }, {Y n 2} and {I n }. In particular we find that

$$\displaystyle{\{Y _{n}\} \sim \mathrm{WN}(0,\vert \lambda \vert ^{-1}\mu \varDelta ),}$$
$$\displaystyle{EY _{n}^{2} = EI_{ n} = \vert \lambda \vert ^{-1}\mu \varDelta }$$

and

$$\displaystyle{\gamma _{Y ^{2}}(k) =\gamma _{I}(k) ={ 1 \over 2}\vert \lambda \vert ^{-3}\sigma ^{2}e^{(k-1)\lambda \varDelta }(1 - e^{\lambda \varDelta })^{2},\ k \geq 1.}$$

The validity of the latter expressions for k ≥ 1 but not for k = 0 indicates that both the squared return sequence {Y n 2} and the integrated volatility sequence {I n } have the autocovariance functions of ARMA(1, 1) processes. This exhibits explicitly, for this particular model, the covariance structure of the sequence {Y n 2} and the consequent dependence of the white-noise return sequence {Y n }. □ 
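These second-order properties can also be checked by simulation. The sketch below discretizes (7.5.22)–(7.5.23) with a compound Poisson subordinator (an Euler-type approximation, not an exact scheme; all parameter values are illustrative choices of our own) and compares the sample moments of the daily returns with ξΔ, while confirming that the returns are nearly uncorrelated although their squares are not:

```python
import numpy as np

rng = np.random.default_rng(7)

# Compound Poisson subordinator with rate c and Exp(1) jumps, so that
# mu = EL(1) = c and sigma^2 = Var(L(1)) = 2c.
lam, c, Delta, steps_per_day, N = -0.2, 1.0, 1.0, 20, 20_000
dt = Delta / steps_per_day
S = N * steps_per_day

n_jumps = rng.poisson(c * dt, size=S)
dL = np.where(n_jumps > 0, rng.gamma(np.maximum(n_jumps, 1), 1.0, size=S), 0.0)
Z = rng.standard_normal(S)

decay = np.exp(lam * dt)
h = c / abs(lam)                 # start at the stationary mean xi = mu/|lam|
dX = np.empty(S)
for k in range(S):
    dX[k] = np.sqrt(h * dt) * Z[k]   # sqrt(h(t)) dB(t), with m = b = 0
    h = decay * h + dL[k]            # exact exponential decay plus new jumps

Y = dX.reshape(N, steps_per_day).sum(axis=1)   # daily log returns
Y2 = Y ** 2

def acf1(x):
    """Sample lag-one autocorrelation."""
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

xi = c / abs(lam)
print(Y2.mean(), xi * Delta)    # sample vs. theoretical E(Y_n^2) = xi * Delta
print(acf1(Y), acf1(Y2))        # near zero vs. clearly positive
```

The positive lag-one correlation of {Y n 2}, together with the negligible correlation of {Y n } itself, is the simulated counterpart of the conclusion of Example 7.5.4.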

Remark 2.

Since equations (7.5.16)–(7.5.19) (derived by Barndorff-Nielsen and Shephard 2001) apply to any second-order stationary non-negative stochastic volatility process, h, independent of B in (7.5.13), they can be used to calculate the second-order properties of {Y n } and {I n } for more general models than the Ornstein-Uhlenbeck model defined by (7.5.13) and (7.5.15). If m = b = 0, the second-order properties of {Y n 2} can also be calculated using equations (7.5.20) and (7.5.21). In particular, we can replace the Ornstein-Uhlenbeck process, h, in Example 7.5.4 by a non-negative CARMA process (see Section 11.5) to allow a more general class of autocovariance functions for the sequences {I n } and {Y n 2} in order to better represent empirically observed financial data.

Remark 3.

Continuous-time generalizations of the GARCH process have also been developed (see Klüppelberg et al. 2004 and Brockwell et al. 2006). Details, however, are beyond the scope of this book.

7.6 An Introduction to Option Pricing

We saw in Section 7.5.2 that, under the geometric Brownian motion model, the asset price P(t) satisfies the Itô equation,

$$\displaystyle{ \mathrm{d}P(t) = P(t)[\mu \mathrm{d}t +\sigma \mathrm{d}B(t)]\ \mathrm{with}\ P(0) > 0, }$$
(7.6.1)

which leads to the relation,

$$\displaystyle{ P(t) = P(0)\exp \left [(\mu -\sigma ^{2}/2)t +\sigma B(t)\right ]. }$$
(7.6.2)

In this section we shall determine the value of a European call option on an asset whose price satisfies (7.6.2). The result, derived by Black and Scholes (1973) and Merton (1973), clearly demonstrates the key role played by the volatility parameter σ 2.

A European call option, if sold at time 0, gives the buyer the right, but not the obligation, to buy one unit of the stock at the strike time T for the strike price K. At time T the option has the cash value h(P(T)) = max(P(T) − K, 0), since the option will be exercised only if P(T) > K, in which case the holder of the option can buy the stock at the price K and resell it instantly for P(T). However it is not clear at time 0, since P(T) is random, what price the buyer should pay for this privilege. Assuming

  1. (i)

    the existence of a risk-free asset with price process,

    $$\displaystyle{ D(t) = D(0)\exp (rt),\ r > 0, }$$
    (7.6.3)
  2. (ii)

    the ability to buy and sell arbitrary (positive or negative) amounts of the stock and the risk-free asset continuously with no transaction costs, and

  3. (iii)

an arbitrage-free market (i.e., a market in which it is impossible to make a profit which is non-negative with probability one and strictly positive with probability greater than zero).

Black, Scholes and Merton showed that there is a unique value for the option in the sense that both higher and lower prices introduce demonstrable arbitrage opportunities. Details of the derivation can be found in most books dealing with mathematical finance (e.g., Campbell et al. 1996; Mikosch 1998; Klebaner 2005). In the following paragraphs we give a sketch of two arguments, following Mikosch (1998), which determine this value under the assumption that the asset price follows the GBM model.

In the first argument, we attempt to construct a self-financing portfolio, consisting at time t of a t shares of the stock and b t shares of the risk-free asset, where a t and b t are random variables which, for each t, are functions of {B(s), s ≤ t}. We require the value of this portfolio at time t, namely

$$\displaystyle{ V (t) = a_{t}P(t) + b_{t}D(t), }$$
(7.6.4)

to satisfy the self-financing condition,

$$\displaystyle{ \mathrm{d}V (t) = a_{t}\ \mathrm{d}P(t) + b_{t}\ \mathrm{d}D(t), }$$
(7.6.5)

and to match the value of the option at time T, i.e.,

$$\displaystyle{ V (T) = h(P(T)) =\max (P(T) - K,0). }$$
(7.6.6)

If such an investment strategy, {(a t , b t ), 0 ≤ t ≤ T}, can be found, then V (0) must be the value of the option at the purchase time t = 0. A higher price for the option would allow the seller to pocket the difference, δ say, and invest the amount V (0) in such a way as to match the value of the option at time T. Then at time T, if P(T) < K, the option will not be exercised and the portfolio and the option will both have value zero. If P(T) > K, the seller sells the portfolio for P(T) − K, then buys one share of the stock for P(T) and receives K for it from the holder of the option. Since there is no loss involved in this transaction, the seller is left with a net profit of δ. The seller of the option therefore makes a profit which is certainly non-negative and strictly positive with non-zero probability, in violation of the no-arbitrage assumption. Similarly a lower price than V (0) would create an arbitrage opportunity for the buyer. In order to determine V (t), a t and b t we look for a smooth function v(t, x),  t ∈ [0, T],  x > 0, such that

$$\displaystyle{ V (t) = v(t,P(t)),\ t \in [0,T], }$$
(7.6.7)

satisfies the conditions (7.6.4)–(7.6.6).

Writing x for P(t) in v(t, P(t)) and applying Itô’s formula (see Appendix D) gives

$$\displaystyle{ dv ={ \partial v \over \partial t} dt +{ \partial v \over \partial x}dx +{ 1 \over 2}{ \partial ^{2}v \over \partial x^{2}}(dx)^{2} }$$
(7.6.8)

where, from (7.6.1),

$$\displaystyle{ dx = x(\mu dt +\sigma dB(t)) }$$
(7.6.9)

and

$$\displaystyle{ (dx)^{2} = x^{2}\sigma ^{2}dt. }$$
(7.6.10)

Applying Itô’s formula to (7.6.5) and using (7.6.3) and (7.6.4) gives

$$\displaystyle{ dv = a_{t}x(\mu dt +\sigma dB(t)) + r(v - a_{t}x)dt. }$$
(7.6.11)

Substituting (7.6.9) and (7.6.10) into (7.6.8) and comparing with (7.6.11), we find that

$$\displaystyle{ a_{t} ={ \partial v \over \partial x}(t,P(t)) }$$
(7.6.12)

and that v(t, x) satisfies the equation,

$$\displaystyle{{ \partial v \over \partial t} +{ 1 \over 2}\sigma ^{2}x^{2}{ \partial ^{2}v \over \partial x^{2}} + rx{\partial v \over \partial x} = rv. }$$
(7.6.13)

The condition (7.6.6) yields the boundary condition,

$$\displaystyle{ v(T,x) = h(x) =\max (x - K,0), }$$
(7.6.14)

which, with (7.6.13), uniquely determines the function v and hence V (t), a t and b t  = (V (t) − a t P(t))∕D(t) for each t ∈ [0, T]. The corresponding investment strategy {(a t , b t ), 0 ≤ t ≤ T} satisfies (7.6.5) and (7.6.6) and can, under the assumed idealized trading conditions, be implemented in practice. Since at time T this portfolio has the same value as the option, V (0) must be the fair value of the option at time t = 0, otherwise an arbitrage opportunity would arise. The option is said to be hedged by the investment strategy {(a t , b t )}. A key feature of this solution [apparent from (7.6.12)–(7.6.14)] is that both the strategy and the fair price of the option are independent of μ, depending on the price process P only through the volatility parameter σ 2 .

Instead of attempting to solve (7.6.13) directly we now outline the martingale argument which leads to the explicit solution for v(t, x), a t and b t . It is based on the fact that for the GBM model with B(t) defined on the probability space \((\varOmega,\mathcal{F},\varPi )\), there is a unique probability measure Q on \((\varOmega,\mathcal{F})\) which is equivalent to Π (i.e., it has the same null sets) and which, when substituted for Π, causes the discounted price process \(\tilde{P}(t):= e^{-rt}P(t),\ 0 \leq t \leq T\), to be a B-martingale, i.e., to satisfy the conditions that \(E_{Q}\tilde{P}(t) < \infty \) and

$$\displaystyle{ E_{Q}(\tilde{P}(t)\vert B(u),u \leq s) = \tilde{P}(s)\ \ \mathrm{for\ all}\ 0 \leq s \leq t \leq T. }$$
(7.6.15)

The measure Q and the relation (7.6.15) can be derived as follows. Applying Itô’s formula to the expression \(\tilde{P}(t) = e^{-rt}P(t)\) and using (7.6.1) gives

$$\displaystyle{{ \mathrm{d}\tilde{P}(t) \over \tilde{P}(t)} = (\mu -r)\mathrm{d}t +\sigma \mathrm{d}B(t) =\sigma \mathrm{d}\tilde{B}(t), }$$
(7.6.16)

where \(\tilde{B}(t):= (\mu -r)t/\sigma + B(t)\). The solution of (7.6.16) satisfies

$$\displaystyle{ \tilde{P}(t) = \tilde{P}(0)e^{\sigma \tilde{B}(t)-\sigma ^{2}t/2 }. }$$
(7.6.17)

By Girsanov’s theorem (see Mikosch 1998), if we define Q by

$$\displaystyle{ Q(A) =\int _{A}\exp \left (-{\mu -r \over \sigma } B(T) -{ (\mu -r)^{2} \over 2\sigma ^{2}} T\right )d\varPi, }$$
(7.6.18)

then, on the new probability space \((\varOmega,\mathcal{F},Q)\), \(\tilde{B}\) is standard Brownian motion. A simple calculation using (7.6.17) then shows that the discounted price process \(\tilde{P}\) is a B-martingale on \((\varOmega,\mathcal{F},Q)\), i.e. \(E_{Q}\tilde{P}(t) < \infty \) and (7.6.15) holds.
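The effect of the change of measure (7.6.18) can be illustrated by Monte Carlo: reweighting paths simulated under Π by dQ∕dΠ should make the sample mean of the discounted price \(\tilde{P}(T)\) equal to P(0), in accordance with the martingale property (7.6.15), whereas the unweighted Π-mean exceeds P(0) when μ > r. A minimal sketch, with illustrative parameter values of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Under Pi, B(T) ~ N(0, T).  Reweighting by dQ/dPi from (7.6.18) should make
# the discounted price P~(T) = e^{-rT} P(T) have Q-mean equal to P(0).
mu, r, sigma, T, P0, n = 0.08, 0.03, 0.25, 1.0, 100.0, 1_000_000
m = (mu - r) / sigma                           # market price of risk

B_T = np.sqrt(T) * rng.standard_normal(n)      # B(T) sampled under Pi
weights = np.exp(-m * B_T - 0.5 * m ** 2 * T)  # dQ/dPi evaluated on each path
P_tilde_T = P0 * np.exp((mu - sigma ** 2 / 2 - r) * T + sigma * B_T)

print(np.mean(weights * P_tilde_T))   # Q-mean: close to P0 = 100
print(np.mean(P_tilde_T))             # Pi-mean: close to P0*exp((mu - r)T) > P0
```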

Assuming the existence of a portfolio (7.6.4) which satisfies the self-financing condition (7.6.5) and the boundary condition (7.6.6), the discounted portfolio value is

$$\displaystyle{ \tilde{V }(t) = e^{-rt}V (t). }$$
(7.6.19)

Applying Itô’s formula to this expression we obtain

$$\displaystyle{\mathrm{d}\tilde{V }(t) = e^{-rt}(-rV (t)dt + \mathrm{d}V (t)) = a_{ t}e^{-rt}(-rP(t)\mathrm{d}t + \mathrm{d}P(t)) = a_{ t}\mathrm{d}\tilde{P}(t),}$$

and hence, from (7.6.16),

$$\displaystyle{ \tilde{V }(t) = \tilde{V }(0) +\int _{ 0}^{t}a_{ s}\mathrm{d}\tilde{P}(s) = V (0) +\sigma \int _{ 0}^{t}a_{ s}\tilde{P}(s)\mathrm{d}\tilde{B}(s). }$$
(7.6.20)

Since \(a_{t}\tilde{P}(t)\) is a function of {B(s), s ≤ t} for each t ∈ [0, T] and since, under the probability measure Q, \(\tilde{B}\) is Brownian motion and \(\tilde{B}(t)\) is a function of {B(s), s ≤ t} for each t ∈ [0, T], we conclude that \(\tilde{V }\) is a B-martingale. Hence

$$\displaystyle{\tilde{V }(t) = E_{Q}[\tilde{V }(T)\vert B(s),s \leq t],\ \ t \in [0,T],}$$

and

$$\displaystyle{ V (t) = e^{rt}\tilde{V }(t) = E_{ Q}[e^{-r(T-t)}h(P(T))\vert B(s),s \leq t], }$$
(7.6.21)

where h(P(T)) is the value of the option at time T. For the European call option h(P(T)) = max(P(T) − K, 0).

It only remains to calculate v(t, x) from (7.6.21). To do this we define θ: = T − t. Then, expressing P(T) in terms of P(t),

$$\displaystyle{V (t) = E_{Q}[e^{-r\theta }h(P(t)e^{(r-\frac{\sigma ^{2}} {2} )\theta +\sigma (\tilde{B}(T)-\tilde{B}(t))})\vert B(s),s \leq t] = v(t,P(t)),}$$

where

$$\displaystyle{ v(t,x) = e^{-r\theta }\int _{-\infty }^{\infty }h(xe^{(r-\frac{\sigma ^{2}} {2} )\theta +\sigma y\theta ^{1/2} })\phi (y)\mathrm{d}y }$$
(7.6.22)

and ϕ is the standard normal density function,

$$\displaystyle{\phi (y) = \frac{1} {\sqrt{2\pi }}\exp (-y^{2}/2).}$$

Substituting max(x − K, 0) for h(x) in (7.6.22) gives

$$\displaystyle{ v(t,x) = x\varPhi (z_{1}) - Ke^{-r(T-t)}\varPhi (z_{ 2}), }$$
(7.6.23)

where Φ is the standard normal cumulative distribution function, \(\varPhi (x) =\int _{ -\infty }^{x}\phi (u)\mathrm{d}u\),

$$\displaystyle{z_{1} ={ \log (x/K) + (r +\sigma ^{2}/2)(T - t) \over \sigma \sqrt{T - t}} \ \mathrm{and}\ z_{2} = z_{1} -\sigma \sqrt{T - t}.}$$

The value of the option at time 0 is V (0) = v(0, P(0)) and the investment strategy {a t , b t , 0 ≤ t ≤ T} required to hedge it is determined by the relations \(a_{t} ={ \partial v \over \partial x}(t,P(t))\) and b t  = (v(t, P(t)) − a t P(t))∕D(t). It can be verified by direct substitution (Problem 7.12) that the function v given by (7.6.23) satisfies the partial differential equation (7.6.13) and the boundary condition (7.6.14).
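The closed form (7.6.23), the hedge ratio a t  = Φ(z 1 ), and the substitution check of Problem 7.12 are straightforward to implement. A Python sketch (the parameter values are our own, chosen for illustration):

```python
from math import log, sqrt, exp, erf

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(t, x, K, r, sigma, T):
    """v(t, x) of (7.6.23): value at time t of a European call when P(t) = x."""
    tau = T - t
    z1 = (log(x / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    z2 = z1 - sigma * sqrt(tau)
    return x * Phi(z1) - K * exp(-r * tau) * Phi(z2)

def bs_delta(t, x, K, r, sigma, T):
    """Hedge ratio a_t = dv/dx = Phi(z1)."""
    tau = T - t
    z1 = (log(x / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    return Phi(z1)

# Illustrative parameters: an at-the-money call.
K, r, sigma, T, P0 = 100.0, 0.03, 0.25, 1.0, 100.0
v0 = bs_call(0.0, P0, K, r, sigma, T)
a0 = bs_delta(0.0, P0, K, r, sigma, T)
print(v0, a0)

def pde_residual(t, x, eps=1e-3):
    """Finite-difference residual of the PDE (7.6.13); should be ~0 (cf. Problem 7.12)."""
    v = bs_call(t, x, K, r, sigma, T)
    vt = (bs_call(t + eps, x, K, r, sigma, T) - bs_call(t - eps, x, K, r, sigma, T)) / (2 * eps)
    vx = (bs_call(t, x + eps, K, r, sigma, T) - bs_call(t, x - eps, K, r, sigma, T)) / (2 * eps)
    vxx = (bs_call(t, x + eps, K, r, sigma, T) - 2 * v + bs_call(t, x - eps, K, r, sigma, T)) / eps ** 2
    return vt + 0.5 * sigma ** 2 * x ** 2 * vxx + r * x * vx - r * v

print(pde_residual(0.5, 110.0))   # ~0
```

Note that neither v0 nor a0 involves μ, in line with the observation above that the fair price and the hedging strategy depend on the price process only through σ 2.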

The quantity m = (μ − r)∕σ which appears in the integrand in (7.6.18) is called the market price of risk and represents the excess, in units of σ, of the instantaneous rate of return μ of the risky asset over that of the risk-free asset D. If m = 0 then Q = Π and the model is said to be risk-neutral.

Although the model (7.6.1) has many shortcomings as a representation of asset prices, the remarkable achievement of Black, Scholes and Merton in using it to derive a unique arbitrage-free option price has inspired enormous interest and progress in the field of financial mathematics. As a result of their pioneering work, research in continuous-time financial models has blossomed, with much of it directed at the construction, estimation and analysis of more realistic continuous-time models for the evolution of stock prices, and the pricing of options based on such models. A nice account of option-pricing for a broad class of Lévy-driven stock-price models can be found in the book of Schoutens (2003).

Problems

  1. 7.1

Evaluate EZ t 4 for the ARCH(1) process (7.2.5) with 0 < α 1 < 1 and {e t } ∼ IID N(0, 1). Deduce that EZ t 4 < ∞ if and only if 3α 1 2 < 1.

  2. 7.2

Let {Z t } be a causal stationary solution of the ARCH(p) equations (7.2.1) and (7.2.2) with EZ t 4 < ∞. Assuming that such a process exists, show that Y t  = Z t 2∕α 0 satisfies the equations

    $$\displaystyle{Y _{t} = e_{t}^{2}\left (1 +\sum _{ i=1}^{p}\alpha _{ i}Y _{t-i}\right )}$$

    and deduce that {Y t } has the same autocorrelation function as the AR(p) process

    $$\displaystyle{W_{t} =\sum _{ i=1}^{p}\alpha _{ i}W_{t-i} + e_{t},\ \ \{e_{t}\} \sim \mathrm{WN}(0,1).}$$

(In the case p = 1, a necessary and sufficient condition for existence of a causal stationary solution of (7.2.1) and (7.2.2) with EZ t 4 < ∞ is 3α 1 2 < 1, as shown by the results of Section 7.2 and Problem 7.1.)

  3. 7.3

Suppose that {Z t } is a causal stationary GARCH(p, q) process \(Z_{t} = \sqrt{h_{t}}e_{t}\), where {e t } ∼ IID(0,1), \(\sum _{i=1}^{p}\alpha _{i} +\sum _{j=1}^{q}\beta _{j} < 1\) and

    $$\displaystyle{h_{t} =\alpha _{0} +\alpha _{1}Z_{t-1}^{2} + \cdots +\alpha _{ p}Z_{t-p}^{2} +\beta _{ 1}h_{t-1} + \cdots +\beta _{q}h_{t-q}.}$$
    1. a.

Show that E(Z t 2 | Z t−1 2, Z t−2 2, …) = h t .

    2. b.

      Show that the squared process {Z t 2} is an ARMA(m, q) process satisfying the equations

      $$\displaystyle\begin{array}{rcl} Z_{t}^{2}& =& \alpha _{ 0} + (\alpha _{1} +\beta _{1})Z_{t-1}^{2} + \cdots + (\alpha _{ m} +\beta _{m})Z_{t-m}^{2} {}\\ & & +U_{t} -\beta _{1}U_{t-1} -\cdots -\beta _{q}U_{t-q}, {}\\ \end{array}$$

where m = max{p, q}, α j  = 0 for j > p, β j  = 0 for j > q, and U t  = Z t 2 − h t is white noise if EZ t 4 < ∞.

    3. c.

      For p ≥ 1, show that the conditional variance process {h t } is an ARMA(m, p − 1) process satisfying the equations

$$\displaystyle\begin{array}{rcl} h_{t} =\alpha _{0}& +(\alpha _{1} +\beta _{1})h_{t-1} + \cdots + (\alpha _{m} +\beta _{m})h_{t-m} {}\\ & +V _{t} +\alpha _{ 1}^{{\ast}}V _{t-1} + \cdots +\alpha _{ p-1}^{{\ast}}V _{t-p+1}, & {}\\ \end{array}$$

where V t  = α 1 U t−1 and α j ∗  = α j+1∕α 1 for j = 1, …, p − 1.

  4. 7.4

To each of the seven components of the multivariate time series filed as STOCK7.TSM, fit an ARMA model driven by GARCH noise. Compare the fitted models for the various series and comment on the differences. (For exporting components of a multivariate time series to a univariate project, see the topic Getting started in the PDF file ITSM_HELP which is included in the ITSM software package.)

  5. 7.5

    Verify equation (7.3.7).

  6. 7.6

Show that the return, Z Δ (t): = logP(t) − logP(t − Δ), approximates the fractional gain, F Δ (t): = (P(t) − P(t − Δ))∕P(t − Δ), in the sense that

    $$\displaystyle{{Z_{\varDelta }(t) \over F_{\varDelta }(t)} \rightarrow 1\ \mathrm{as}\ F_{\varDelta }(t) \rightarrow 0.}$$
  7. 7.7

    For the GBM model (7.5.7) with P(0) = 1, evaluate the mean and variance of P(t) and the mean and variance of the return, Z Δ (t).

  8. 7.8

    If h is any second-order stationary non-negative volatility process with mean ξ, variance ω 2 and autocorrelation function ρ, verify the relations (7.5.16)–(7.5.18).

  9. 7.9

    Use (7.5.26) and (7.5.27) to evaluate the mean and autocovariance function of the stationary Ornstein-Uhlenbeck process (7.5.23).

  10. 7.10

If h is the stationary Ornstein-Uhlenbeck process (7.5.23) and s is any fixed value in [0, Δ], show that application of the operator ϕ(B): = (1 − e λ Δ B) to the sequence {h(n Δ + s), n ∈ ℤ} gives

    $$\displaystyle{\phi (B)h(n\varDelta + s) = W_{n}(s),}$$

where {W n (s), n ∈ ℤ} is the i.i.d. sequence,

    $$\displaystyle{W_{n}(s) =\int _{ (n-1)\varDelta +s}^{n\varDelta +s}e^{\lambda (n\varDelta +s-u)}dL(u).}$$

Deduce that the integrated volatility sequence, \(I_{n} =\int _{-\varDelta }^{0}h(n\varDelta + s)\,\mathrm{d}s\), satisfies

    $$\displaystyle{(1 - e^{\lambda \varDelta }B)I_{n} =\int _{ -\varDelta }^{0}W_{ n}(s)ds.}$$

    Since the right-hand side is 1-correlated, it follows from Proposition 2.1.1 that it is an MA(1) process and hence that the integrated volatility sequence is an ARMA(1,1) process.

  11. 7.11

For the stochastic volatility model (7.5.13) with m = b = 0 and second-order stationary volatility process h independent of B, establish (7.5.20) and (7.5.21).

  12. 7.12

Verify that the expression (7.6.23) for v(t, x) satisfies (7.6.13) and (7.6.14) and use it to write down the value of the option at time t = 0 and the corresponding investment strategy {(a t , b t ), 0 ≤ t ≤ T}.