Summary
The likelihood function of a continuous-discrete state space model is computed recursively by Monte Carlo integration, using importance sampling techniques. A functional integral representation of the transition density is utilized and importance densities are obtained by smoothing. Examples are the likelihood surfaces of an AR(2) process, a Ginzburg-Landau model and stock price models with stochastic volatilities.
1 Introduction
The continuous-discrete state space model is a convenient specification for the dynamic modeling of quantitative variables in continuous time, which are subject to random disturbances both in the dynamics and in the discrete process of observation (Jazwinski, 1970). In order to meet requirements of empirical data analysis, the dynamics of the state vector is given in continuous time (system equation), whereas the measurements are assumed to be given only at discrete, possibly unequally spaced time points (measurement model). Moreover, analogously to structural equation models (SEM) or factor analysis, the state is only incompletely observable and subject to measurement error.
In the linear case with gaussian random errors, the system can be estimated efficiently by maximum likelihood (ML) using the Kalman filter algorithm (cf. Jones, 1984, Harvey and Stock, 1985, Jones and Tryon, 1987, Zadrozny, 1988, Jones and Ackerson, 1990, Singer, 1993, 1995), but in nonlinear systems complicated equations for the transition density arise, which must be solved numerically. One approach to do this is the Monte Carlo simulation method, where many sample trajectories are simulated and unknown probability densities and integrals are estimated from these data. In order to keep close contact to the linear Kalman filter, a sequence of time and measurement updates (continuous-discrete filter) is utilized, and the resulting integral expressions (expectation values) are approximated by statistical averages. To reduce the simulation error of the likelihood function, importance sampling and other variance reduction techniques (such as antithetical sampling) are used.
Simulation-based filtering methods in discrete time have been used in the literature, such as Markov chain Monte Carlo (MCMC; Carlin et al., 1992, Kim et al., 1998), rejection sampling using density estimators (Tanizaki, 1996, Tanizaki and Mariano, 1995, Hürzeler and Künsch, 1998), importance sampling and antithetic variables (Durbin and Koopman, 1997, 2000) and recursive bootstrap resampling (Gordon et al., 1993, Kitagawa, 1996).
In this paper the time update is generalized to the continuous time case by using the Chapman-Kolmogorov equation and importance sampling is implemented by approximate smoothing in order to reduce the variance of the simulated likelihood function. For this purpose the Gaussian sum filter of Alspach and Sorenson (1972) is used. In linear systems, the smoothing is exact and the simulation error of the likelihood estimate is zero, given the data (cf. section 7.1).
Section 2 defines the continuous-discrete state space model and section 3 presents the recursive computation of the filter densities and the likelihood. Sections 4 and 5 derive the variance reduction and its implementation by smoothing, whereas sections 6 and 7 discuss practical issues and present three examples.
2 Nonlinear continuous-discrete state space models
We discuss the nonlinear continuous-discrete state space model (Jazwinski, 1970)
where discrete time measurements zi are taken at times {t0, t1, …, tT}, t0 ≤ t ≤ tT. In state equation (1), the process error W(t) is an r-dimensional Wiener process and the state is described by the p-dimensional state vector y(t). It fulfils a system of stochastic differential equations (SDE) in the sense of Itô (cf. Arnold, 1974) with random initial condition y(t0) ∼ P0(y, t0) (prior distribution). The functions f : ℝp × ℝ × ℝu → ℝp and g : ℝp × ℝ × ℝu → ℝp × ℝr are called drift and diffusion coefficients, respectively. In measurement equation (2), ϵi ∼ N(0, R(ti, ψ)) is a k-dimensional discrete time white noise process (measurement error) and h: ℝp × ℝ × ℝu → ℝk is the output function. It is assumed that the error processes dW(t), ϵi and the initial state y(t0) are mutually independent. Parametric estimation is based on the u-dimensional parameter vector ψ. The key quantity for the computation of the likelihood function is the transition probability p(y, t|x, s) between states y and x at times t and s, respectively, which is a solution of the Fokker-Planck equation
subject to the initial condition p(y, s|x, s) = δ(y − x) (Dirac delta function). The symbol F(y, t, ψ) denotes the Fokker-Planck operator. The diffusion matrix is given by Ω = gg′: ℝp × ℝ × ℝu → ℝp × ℝp. Under certain technical conditions the solution of (3) is the conditional density of y(t) given y(s) = x (see, e.g., Wong and Hajek, 1985, ch. 4).
In order to model exogenous influences, f, g, h and R are assumed to depend on deterministic regressor variables x(t): ℝ → ℝq, i.e. f(y, t, ψ) = f(y, t, x(t), ψ) etc. For notational simplicity, the dependence on x(t) and on ψ will be suppressed.
It may be noted that state space model (1, 2) allows the modeling of ARIMA systems, since unobserved higher order derivatives can be accommodated in an extended state vector \(\eta = {\rm{\{ }}y,\dot y,\ddot y, \ldots {\rm{\} }}\).
Furthermore, the functions f, g and h, R may depend on earlier measurements Zt = {z(tj);tj ≤ t} and \({Z^{{t_{i - 1}}}}\; = \;{\rm{\{ }}z({t_j});{t_j}\; \le \;{t_{i - 1}}{\rm{\} }}\), respectively, which allows the modeling of (G)ARCH effects ((generalized) autoregressive conditional heteroskedasticity). For example, the diffusion matrix g(y, t) may depend on earlier innovations νj = zj − E[zj|zj−1, …, z0]; tj ≤ t and if the functions are linear in the state y, the state space model is conditionally gaussian (cf. Liptser and Shiryayev, 1978, vol. II, ch. 13). Again, for notational simplicity, the dependence on the Zt will be suppressed.
3 Computation of the likelihood function
The exact time and measurement updates of the continuous-discrete filter are given by the recursive scheme (Jazwinski, 1970) for the a priori and a posteriori densities pi+1|i, pi|i:
time update:
measurement update:
i = 0, …, T − 1, where F is the Fokker-Planck operator, Zi = {z(tj)|tj ≤ ti} are the observations up to time ti and Li+1 := p(zi+1|Zi) is the likelihood function of observation zi+1. The time update describes the time evolution of the conditional density p(y, t|Zi) given information up to the last measurement and the measurement update is a discontinuous change due to new information zi+1 using the Bayes formula. Thus the likelihood of the complete observation ZT = {zT, …, z0} can be computed sequentially and new observations zT+1 can be processed with only one more update step.
Some remarks may be in order:
1.
In the linear case, the conditional densities are gaussian and the recursive steps can be implemented for the conditional moments y(t|ti) = E[y(t)|Zi] and P(t|ti) = Var[y(t)|Zi]. Instead of the Fokker-Planck equation, only linear ordinary differential equations must be solved. Furthermore, the measurement update can be computed analytically since all involved quantities are jointly (conditionally) gaussian. This is the celebrated Kalman filter algorithm extensively used in engineering, control theory, statistics, economics and the social sciences (cf. Jazwinski, 1970, Gelb, 1974, Liptser and Shiryayev, 1977, 1978, Harvey, 1989, Fahrmeir and Kaufmann, 1991, Singer, 1993).
2.
In the general nonlinear case, the time and measurement updates require the solution of partial differential equations and integrals (likelihood function) which can be obtained only numerically by several approximation methods. Linearizing the system one obtains the extended Kalman filter (EKF), but more elaborate methods such as the second order nonlinear filter (SNF; Jazwinski, 1970), the gaussian sum filter (Alspach and Sorenson, 1972), numerical integration (Kitagawa, 1987), and simulation methods have been used (Kitagawa, 1996, Tanizaki, 1996, Singer, 1997, Kim et al. 1998, Hürzeler and Künsch, 1998). Usually, the filters are formulated in discrete time, however.
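The exact linear recursion mentioned in remark 1 can be sketched in a few lines. The following is a minimal discrete-time Kalman likelihood recursion; function and variable names are illustrative, and the constant transition matrices A, Q stand in for the solution of the linear moment differential equations between measurements:

```python
import numpy as np

def kalman_loglik(z, A, Q, H, R, y0, P0):
    """Log-likelihood of observations z[0..T] under a linear Gaussian
    state space model y[i+1] = A y[i] + w, z[i] = H y[i] + eps
    (a discrete-time stand-in for the sampled linear SDE)."""
    y, P, ll = y0, P0, 0.0
    for zi in z:
        # measurement update: innovation and its covariance
        v = zi - H @ y
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        # likelihood contribution of this observation (Gaussian innovation)
        ll += -0.5 * (len(v) * np.log(2 * np.pi)
                      + np.log(np.linalg.det(S))
                      + v @ np.linalg.solve(S, v))
        y = y + K @ v
        P = P - K @ H @ P
        # time update: propagate the conditional moments to the next time
        y = A @ y
        P = A @ P @ A.T + Q
    return ll
```

In the linear gaussian case this recursion is exact, which is the benchmark against which the simulated likelihood of section 7.1 is compared.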
In order to compute the solution of Fokker-Planck equation (4), a Monte Carlo approach was utilized (Wagner, 1988, Kloeden and Platen, 1992). We use an integral representation based on the Chapman-Kolmogorov equation for Markov processes
which can be iterated to express
as a product of transition densities inserted into the interval
The auxiliary variables on the grid are defined as \({\eta _j} = y({\tau _j});\;{\tau _j} = {t_i} + j\delta t;\;j = 0, \ldots ,{J_i} = \Delta {t_i}{\rm{/}}\delta t;\;{y_i} = {\eta _0}, \ldots ,{\eta _{{J_i}}} = {y_{i + 1}}\), so that δt = Δti/Ji can be chosen so small that
Ω:= gg′, can be approximated by a Gaussian density φ (cf. equation (1)). The Ji-fold product of Gaussian densities is called an Euler density (cf. Kloeden and Platen, 1992, ch. 16.3). In the limit Ji → ∞ the so called path integral (functional integral) representation
of the transition density is obtained (Haken, 1977, ch. 6.6, Risken, 1989, ch. 4.4.2). The exponent
is the Onsager-Machlup functional. The expression is only formal since y(t) is not differentiable. Analogous expressions are obtained when computing likelihood functionals, which can be transformed to formally existing limits (likelihood ratios) on dividing by a reference density and using Itô or Stratonovich integrals (cf. Wong and Hajek, 1985, ch. 6, p. 216 and the remark in ch. 7, p. 257; Stratonovich, 1989).
In numerical computations one does not go to the limit, but uses a δt small enough (a so called ϵ-Version in the sense of Stratonovich). The resulting (Ji − 1)-dimensional integral (8) can be estimated by the mean value
where N is the Monte Carlo sample size. Here \({\eta _n} = {\rm{\{ }}{\eta _{n,{J_i} - 1}}, \ldots ,{\eta _{n,1}},{\eta _0}{\rm{\} }}\) are replications of the vector {y(ti + (Ji − 1)δt), …, y(ti + δt),y(ti)}, which represents the path of the Itô process y(t) on the time grid (conditioned on the initial value yi = η0). The approximation errors are controlled by the parameters δt and N, where the first corresponds to the approximation of the SDE and the second reflects the accuracy of the Monte Carlo integration.
The SDE is simulated by using the Euler-Maruyama scheme
(cf. Kloeden and Platen, 1992, ch. 9 and 14).
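For a scalar state, the Euler-Maruyama step just cited can be sketched as follows (names are illustrative; the multivariate case replaces the scalar diffusion by a matrix square root):

```python
import numpy as np

def euler_maruyama(f, g, y0, t0, dt, n_steps, rng):
    """Simulate one trajectory of dy = f(y,t) dt + g(y,t) dW on a grid
    of step dt (scalar state, scalar Wiener process for simplicity)."""
    y = np.empty(n_steps + 1)
    y[0] = y0
    for j in range(n_steps):
        t = t0 + j * dt
        dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment ~ N(0, dt)
        y[j + 1] = y[j] + f(y[j], t) * dt + g(y[j], t) * dW
    return y
```

The step dt plays the role of δt above, so one measurement interval Δti corresponds to Ji such steps.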
In the desired extrapolation integral (time update)
an additional integration over the initial condition η0 = yi is required, which can be simulated by drawing yn,i = ηn,0 ∼ pi|i. The result is an estimator of delta type, similar to a kernel density estimator with variable band widths (variances) Ω(ηn, Ji−1, τJi−1)δt (cf. Silverman, 1986):
where \({y_{n,i + 1{\rm{\vert}}i}}: = {\eta _{n,{J_i} - 1}} + f({\eta _{n,{J_i} - 1}},{\tau _{{J_i} - 1}})\delta t\) and \({P_{n,i + 1{\rm{\vert}}i}}: = \Omega ({\eta _{n,{J_i} - 1}},{\tau _{{J_i} - 1}})\delta t\). The Gaussian form occurs only if the process errors dW are gaussian, however. In contrast, a kernel density estimator can be chosen of gaussian form even if nongaussian error terms (e.g. Poisson processes) are used, as in some finance applications (cf. Lo, 1988). See Singer (1997) for a comparison of several filter algorithms.
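The delta-type estimator just described can be sketched for a scalar state as follows: given the N simulated trajectory endpoints at time τJi−1, the estimated a priori density is a mixture of Gaussians with trajectory-dependent bandwidths Ω δt (names are illustrative):

```python
import numpy as np

def time_update_density(eta_last, f, Omega, tau, dt):
    """Delta-type estimate of p(y_{i+1}|Z^i): a mixture of N Gaussians
    centred at one final Euler drift step from the simulated endpoints
    eta_last (shape (N,)), with variances Omega*dt (scalar state)."""
    means = eta_last + f(eta_last, tau) * dt     # y_{n,i+1|i}
    var = Omega(eta_last, tau) * dt              # P_{n,i+1|i}
    def density(y):
        # average of Gaussian kernels with variable bandwidths
        return np.mean(np.exp(-(y - means) ** 2 / (2 * var))
                       / np.sqrt(2 * np.pi * var))
    return density
```

This mirrors a kernel density estimate in which the kernel variances are dictated by the diffusion matrix rather than chosen by a bandwidth rule.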
4 Importance sampling
The integral representation (15) can be rewritten to reduce the variance of the estimate (16). In general, the integral
can be approximated by a variance reduced unbiased estimate
yn ∼ p2, if the density p2 (importance density) is chosen appropriately. One can show that the optimal density is given by
and the variance of (18) is zero if g is positive (cf. Kloeden and Platen, 1992, ch. 16.3). Unfortunately, the definition involves the desired quantity E1[|g(y)|], and p2,opt must be approximated (see below). Setting \(g({\eta _{{J_i} - 1}}) = p({y_{i + 1}}{\rm{\vert}}{\eta _{{J_i} - 1}})\) and \({p_1} = p({\eta _{{J_i} - 1}}{\rm{\vert}}{\eta _{{J_i} - 2}}) \ldots p({\eta _1}{\rm{\vert}}{\eta _0})p({\eta _0}{\rm{\vert}}{Z^i})\) leads to the optimal importance density
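As a toy illustration of this identity (not part of the filter itself), consider a positive function g concentrated away from the mass of p1; sampling from a shifted importance density p2 and reweighting by p1/p2 keeps the estimate unbiased while shrinking its variance. The densities and shift below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: I = E_{p1}[g(y)] with p1 = N(0,1) and g concentrated near y = 3,
# so plain sampling from p1 rarely hits the region where g is large.
def g(y):
    return np.exp(-0.5 * (y - 3.0) ** 2)

def log_normal_pdf(y, mu, sig):
    return -0.5 * np.log(2 * np.pi * sig ** 2) - 0.5 * ((y - mu) / sig) ** 2

N = 10_000
# crude estimate: y ~ p1
y1 = rng.normal(0.0, 1.0, N)
crude = g(y1)
# importance estimate: y ~ p2 = N(1.5, 1), reweighted by p1/p2
y2 = rng.normal(1.5, 1.0, N)
w = np.exp(log_normal_pdf(y2, 0.0, 1.0) - log_normal_pdf(y2, 1.5, 1.0))
weighted = g(y2) * w
# both sample means are unbiased for I; the weighted one has smaller variance
```

The exact value here is I = e^{−9/4}/√2 ≈ 0.0745, and the weighted estimator's standard deviation is several times smaller than the crude one's.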
where the transition densities are conditioned on future states yi+1, which are not observed. Replacing yi+1 by the measurement zi+1 yields a modified density \({\tilde p_{2,opt}}\) and an estimate
which can be shown to imply zero variance (given the data Zi) for the unbiased estimated likelihood
Furthermore the expressions (22) and (25) coincide for δt → 0 (see appendix).
Since the time update p(yi+1|Zi) (15) can be approximated by a sum of gaussian densities (21) of small band width ∝ δt, the usual measurement updates of the extended Kalman filter (EKF) can be applied to each element in the superposition (21) (cf. Anderson and Moore, 1979, theorem 2.1). One obtains the estimated a posteriori density
where the EKF updates are given by
The sequence of estimated a priori and a posteriori densities \(\tilde p({y_{i + 1}}{\rm{\vert}}{Z^i})\), \(\tilde p({y_{i + 1}}{\rm{\vert}}{Z^{i + 1}})\) (21, 23) yields a numerical implementation of the continuous-discrete filter (4, 5) and permits the variance reduced computation of the likelihood function \(\tilde L = \prod\nolimits_{i = 0}^{T - 1} {{{\tilde L}_{i + 1}}{L_0}} \), where L0 is the likelihood of the first observation z0. Since a functional integral representation of the density p(yi+1|Zi) is utilized the algorithm will be called functional integral filter (FIF).
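The recursive structure of the likelihood computation can be sketched as follows. This is a crude, non-importance-sampled variant for a scalar model (a bootstrap-style resampling replaces the gaussian-sum measurement update of the FIF; all names are illustrative):

```python
import numpy as np

def simulated_loglik(z, f, g, h, R, y0_samples, dt, n_sub, rng):
    """Crude Monte Carlo log-likelihood of a continuous-discrete model:
    propagate N sample paths by Euler-Maruyama between measurements and
    average the measurement densities p(z_{i+1}|y_{i+1})
    (scalar state and observation; y0_samples ~ p_{0|0})."""
    y = y0_samples.copy()
    N = len(y)
    ll = 0.0
    for zi in z:
        # time update: Euler-Maruyama sub-steps over the interval
        for j in range(n_sub):
            y = y + f(y) * dt + g(y) * rng.normal(0, np.sqrt(dt), N)
        # likelihood contribution L_{i+1} = mean of p(z_{i+1}|y_{i+1})
        py = np.exp(-0.5 * (zi - h(y)) ** 2 / R) / np.sqrt(2 * np.pi * R)
        ll += np.log(py.mean())
        # measurement update: resample from the weighted posterior
        w = py / py.sum()
        y = rng.choice(y, size=N, p=w)
    return ll
```

In the FIF the resampling step is replaced by the gaussian-sum update (23) and the trajectories are drawn from the importance density p2, which is what drives the variance reduction.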
5 Implementation of the importance density by smoothing
In general the optimal weights (likelihood ratio)
i = 0, …, T − 1, determined by the importance density p2,opt are difficult to compute, since they involve the unknown conditional densities p(ηj+1|ηj, zi+1) and p(yi|Zi, zi+1), but for the linear system an exact result is available which can be generalized to the nonlinear case.
Thus in linear systems the likelihood estimate (22,25) is dispersion free and the mean is exact with one trajectory. In nonlinear systems, the approximate importance density p2 leads to a suboptimal estimate \({{\tilde L}_{i + 1}}\) with variance > 0 but still to a variance reduction.
Since in the linear case all densities are gaussian, one obtains the conditional mean and variance (ηj= y(τj); τj = ti + jδt; j = 0, …, Ji − 1)
which characterize the density p(ηj+1|ηj, zi+1) in p2,opt.
The smoother gain Fj and the update formulas are similar to the fixed interval smoother (cf. Anderson and Moore, 1979). The quantities in the update formulas can be obtained by solving the differential equations (τj ≤ t ≤ ti+1)
and setting
For the density p(yi|Zi, zi+1) in p2,opt one obtains the moments:
Again the quantities in the update formulas can be obtained by solving the differential equations (ti ≤ t ≤ ti+1), but with different initial conditions:
and setting
In the limit of small δt one can write
which may be interpreted as a correction to the drift f and the diffusion matrix Ω. Therefore the optimal density p2,opt and trajectories ηn = {ηn,Ji−1, …, ηn,0} drawn from it can be obtained by using a modified drift f2 and diffusion coefficient Ω2. More precisely, a stochastic Euler-Maruyama scheme (13) with drift f2 and diffusion coefficient Ω2 is used to simulate ηn ∼ p2,opt.
Since the moment equations (37-41, 46-48) can be generalized to nonlinear systems by replacing A(t) → fy(y, t), Ω(t) → Ω(y, t), and H(t) → hy(y, t), one obtains a sampling scheme where the (sub)optimal density p2 is implemented by means of the EKF updates and trajectories \({{\rm{\{ }}{\eta _{{J_i} - 1}}, \ldots ,{\eta _0}{\rm{\} }}_n}\sim{p_2}\) can be simulated using f2 and Ω2.
6 Practical implementation
6.1 Smoothing with Gaussian Sums
If the density p0|−1 = p0(y0, t0) (initial condition) is represented by a gaussian mixture distribution of N populations
using appropriate weights αn,0|−1, all updates preserve the structure of a gaussian sum and the computation of the importance density proceeds by an N-fold solution of the smoother equations and related trajectories drawn from the density p2. More precisely, the measurement update
is again a gaussian sum and the smoothed density p(y0|Z0, z1) = p0|1 required in p2 may be represented by the moments E[yn,0|Z0, z1], Var[yn,0|Z0, z1] of population n and an updated weight \({\alpha _{n,0{\rm{\vert}}1}} = {\alpha _{n,0{\rm{\vert}}0}}{{{L_{n,1}}} \over {{L_1}}},{L_1} = \sum {{\alpha _{n,0{\rm{\vert}}0}}{L_{n,1}}} \), i.e.
Therefore, the EKF’s and smoothers computing the optimal weights and the importance density run on a gaussian sum, which is called a gaussian sum filter (Anderson and Moore, 1979, chapter 8). This filter does not involve any simulation and computes an approximate time update via deterministic moment equations using N EKF’s. Whereas these are only valid for small sampling intervals Δti, the stochastic simulation of trajectories via (13) leads to an estimate of the a priori density valid for arbitrary sampling intervals.
From the density p0|1, N random initial conditions ηn,0|1 can be drawn and used to simulate the trajectories \({\rm{\{ }}{\eta _{n,{J_0} - 1}}, \ldots ,{\eta _{n,0}}{\rm{\} }} \sim {p_2}\) using f2 and Ω2 in (13). From this the a priori density \({{\tilde p}_{1{\rm{\vert}}0}} = \tilde p({y_1}{\rm{\vert}}{Z^0})\) (21) can be estimated, and the a posteriori density \({{\tilde p}_{1{\rm{\vert}}1}} = \tilde p({y_1}{\rm{\vert}}{Z^1})\) and the likelihood estimate \({{\tilde L}_1}\) in (23) are obtained. \({{\tilde p}_{1{\rm{\vert}}1}}\) is again a gaussian sum and the update \({{\tilde p}_{1{\rm{\vert}}2}} = \tilde p({y_1}{\rm{\vert}}{Z^2})\) may be computed as before etc. The algorithm runs recursively for i = 0, …, T and yields a sequence of likelihood contributions \({{\tilde L}_i}\).
6.2 Resampling Strategies and Antithetical Sampling
Drawing yj, j = 1, …, N from the mixture distribution (a posteriori density)
can be accomplished by drawing a population n with probability αn and then setting \({y_j} = {y_n} + P_n^{1/2}{z_j}\) where zj ∼ N(0, I) and \(P_n^{1/2}\) is a Cholesky root (or another matrix square root). The drawing of n may be implemented by drawing uj ∼ U[0, 1] (uniform distribution) and solving \(\sum\nolimits_{m = 1}^{n - 1} {{\alpha _m} < {u_j} \le \sum\nolimits_{m = 1}^n {{\alpha _m}} } \). Alternatively, the deterministic values uj = (j − c)/N; c ∈ (0, 1) or stratified values uj = (j + Uj − 0.5 − c)/N; Uj ∼ U[0, 1], c ∈ (0, 1) could be used. According to Kitagawa (1996, appendix), when using deterministic or stratified drawing, it is preferable to sort the mixture in order of magnitude in advance, i.e. \(\left\Vert {{{\tilde y}_n}} \right\Vert < \left\Vert {{{\tilde y}_{n + 1}}} \right\Vert\), and draw from the sorted \({{\tilde y}_n},{{\tilde P}_n}\) and \({{\tilde \alpha }_n}\). In my experience, sorting improves the smoothness of the simulated likelihood surface as a function of the parameter vector ψ (cf. example 7.3, figs. (10–15)).
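The stratified, sorted drawing just described can be sketched for a scalar mixture as follows (one draw per component, as in the filter; names are illustrative):

```python
import numpy as np

def draw_mixture_stratified(alpha, mu, P_sqrt, rng, c=0.5):
    """Draw N values from the scalar Gaussian mixture sum_n alpha_n
    N(mu_n, P_n) using stratified uniforms u_j = (j + U_j - 0.5 - c)/N
    and components sorted by |mu_n| (as suggested by Kitagawa, 1996)."""
    N = len(alpha)
    order = np.argsort(np.abs(mu))                 # sort the mixture first
    alpha, mu, P_sqrt = alpha[order], mu[order], P_sqrt[order]
    cum = np.cumsum(alpha)
    # stratified uniforms, one per stratum of width 1/N
    u = (np.arange(1, N + 1) + rng.uniform(0, 1, N) - 0.5 - c) / N
    # component index n with cum_{n-1} < u <= cum_n
    idx = np.minimum(np.searchsorted(cum, u), N - 1)
    return mu[idx] + P_sqrt[idx] * rng.normal(0, 1, N)
```

Because each uniform falls in its own stratum, every region of the sorted mixture is visited, which is what smooths the simulated likelihood surface as a function of ψ.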
Another device for reducing sampling error is antithetical sampling (Hammersley and Handscomb, 1964, p. 60). Instead of simulating zj ∼ N(0, I); j = 1, …, N, pairs {zj, −zj}; j = 1, …, N/2, are drawn. The negatively correlated sample leads to estimators with smaller variance. When simulating the Euler scheme (13), the i.i.d. sequence −zj; j = 0, …, Ji − 2 can be used to simulate a trajectory ηj(−z) which is anticorrelated with ηj(+z).
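A toy demonstration of the antithetic pair construction (the function g and sample sizes are illustrative, not part of the filter):

```python
import numpy as np

rng = np.random.default_rng(1)

# Antithetic pairs {z, -z}: estimate E[g(Z)], Z ~ N(0,1), for a monotone g.
def g(z):
    return np.exp(z)          # E[exp(Z)] = exp(1/2)

N = 20_000
plain = g(rng.normal(0, 1, N))            # N independent draws
z = rng.normal(0, 1, N // 2)              # N/2 antithetic pairs
anti = 0.5 * (g(z) + g(-z))               # pair averages, negatively correlated
# per random number used, the antithetic estimator has smaller variance
```

For a monotone g the pair members are negatively correlated, so the variance of the pair average is less than half that of a single draw, i.e. the antithetic scheme beats plain sampling at equal cost.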
6.3 Implementation Details
The algorithm was programmed with Mathematica (Wolfram Research, 1992) and the MPW C compiler (Apple Computer, 2001) using the Mathlink communication library and run on Apple Power PC 604e and G3 computers. The Mathlink routines allow the calling of C programs from within Mathematica. For numerical computations (Cholesky roots, random numbers, sorting), the C algorithms in Numerical Recipes in C (Press et al., 1992) have been used.
7 Examples
7.1 AR(2) process
In order to test the performance of the importance sampling algorithm, a linear AR(2) model was simulated. The sampler should give a variance free estimate of the likelihood which must coincide with the exact result obtained by the Kalman filter. I used the state space model (equivalent to a 2nd order differential equation)
where Var(ϵi) := R = 0.1, the data were equispaced with Δt = 2, t0 = 0 ≤ t ≤ tT = 50, and the discretization interval was chosen as δt = 0.01, Ji = Δt/δt = 200, i = 0, …, T = 25. A time series was computed according to (55) and using these data the likelihood function l(ψ) was simulated.
Results: The results are displayed in figures (1–4), where the variance reduction in M = 10 replications of the likelihood surface is summarized. Also shown is the exact result using the (linear) Kalman filter. The likelihoods and scores are plotted as a function of the parameter ψ3 (true value −16) in the interval [−20, −14]. Even in the case N = 1 (fig. 2), the sampling error is very small. Using larger discretization intervals δt = 0.1, 0.05, one can show numerically that the variance of the estimated likelihood increases. Therefore, approximation errors in the simulation (13) and in the transition density (9) lead to deviations from the (theoretically) exact variance-free estimate \(\tilde p({z_{i + 1}}{\rm{\vert}}{Z^i})\) (22).
7.2 Ginzburg-Landau model
The Ginzburg-Landau equation is a nonlinear diffusion equation where the drift coefficient is the gradient of a double well potential \(\Phi (y,{\rm{\{ }}\alpha ,\beta {\rm{\} }}){\rm{ = }}{\textstyle{\alpha \over 2}}{y^2} + {\textstyle{\beta \over 2}}{y^4},\;f = - \partial \Phi /\partial y\):
Models of this kind have been used to model limit cycles, bifurcations, phase transitions and normal forms of nonlinear systems (cf. V.I. Arnold, 1973, 1986, Haken, 1977, Holmes, 1981, normal form theorem 4.4). Other applications are in the modelling of equilibrium states of an economy (Herings, 1996) and the theory of system failure (Frey, 1996).
In the present context a parameter constellation of ψ = {α, β, σ, R} = {−1, 0.1, 2, 0.1}, R = Var(ϵ), was chosen, which corresponds to a potential with two minima and noisy sampled measurements (Δt = 2; 0 ≤ t ≤ 50; δt = 0.1).
The convergence of the simulated likelihood as a function of sample size N is shown in figure (5). Again the variance of the estimates is considerably reduced by importance sampling. The form of the likelihood surface and of the score as a function of ψ1 = α is compared in figures (6–7).
Finally, figs. (8–9) explain the effect of importance sampling on the trajectories drawn from p2. As shown in section 5, the actual drift and diffusion coefficients f, g are modified to f2, g2 in order to draw from the importance density, which yields more random numbers near the points in state space where p(zi+1|yi+1) is averaged in the likelihood expression Li+1 = ∫ p(zi+1|yi+1)p(yi+1|Zi)dyi+1 (cf. eq. 6). Clearly, without importance sampling, only a few trajectories are near the measurement at the end of the interval (see fig. 8) and the mean shows high dispersion.
Simulation studies (Singer, 1999b) compared the performance of the functional integral filter (FIF) with a filter based on kernel density estimates and with approximations based on Taylor expansions (EKF, 2nd order nonlinear filter SNF and local linearization (LL), cf. Shoji and Ozaki (1997, 1998)). It was shown that for large sampling intervals, the FIF with importance sampling exhibits the smallest bias even for small Monte Carlo sample sizes (N = 10), whereas without importance sampling, sample sizes of at least N = 50 are required. Moreover, Taylor expansion methods (EKF, SNF, LL) only yield good results for small measurement intervals.
7.3 Stochastic Volatility
Stochastic volatility models such as
Cov(dW, dV) = ρ dt (Scott, 1987, Hull and White, 1987), where the volatility process u(t) is not observable, can account for the fact that the returns
on financial time series exhibit a time dependent variance and for the leptokurtosis of the return distribution. In contrast to ARCH and GARCH models, which also exhibit conditional heteroskedasticity, the variance equation is driven by a separate Wiener process, so the variance cannot be eliminated. For example, the discrete time GARCH(1,1) process
permits the recursive computation of σi given measurements of the innovation process εi (which corresponds to σdW) and an initial value σ0. It has been shown by Nelson (1990), however, that a continuous time limit of the GARCH(1,1)-M model (Engle and Bollerslev, 1986) in the mean corrected log returns log(Si+1/Si) := yi+1 − yi
leads to the system of stochastic differential equations
where W and V are independent standard Wiener processes and the coefficients are scaled as \(dt \to 0\;(\omega \to \omega dt,\;\beta \to 1 - \alpha \sqrt {dt{\rm{/}}2} - \theta dt,\;\alpha \to \alpha \sqrt {dt{\rm{/}}2} )\). This differs somewhat from equation (58), where the volatility satisfies an Ornstein-Uhlenbeck process.
Stochastic volatility models in discrete time have been used as approximations to the stochastic differential equation (58) or some variants such as
where the log-volatility h(t) is modeled by an Ornstein-Uhlenbeck process to ensure a positive σ (cf. Wiggins, 1987, Nelson, 1990). Taking logarithms and using Itô’s lemma, y = log S fulfils
This has been shown to be the continuous time limit of an AR(1)-EGARCH model (Nelson, 1990, sect. 3.3) and corresponds to the discrete time model
used for the mean corrected returns by Kim et al. (1998).
Since available data are measured in discrete time (daily or weekly), but the models in the option pricing literature are mostly formulated in continuous time, time series formulations are only approximations to the sampled stochastic processes. The continuous time asymptotics and asymptotic GARCH filters developed by Nelson (1990, 1992) are only valid in the limit of small sampling interval. In analogy to linear theory, the differential equations should be filtered and estimated using discrete data with arbitrary time interval. This involves sampled diffusion processes with latent variables, since the volatility is not observed. In contrast to linear models, where exact discrete time series can be derived explicitly (cf. Bergstrom, 1990), the exact analogs of nonlinear systems involve transition densities which are solutions of the Fokker-Planck equation.
The following example serves to illustrate the simulated properties of the continuous time stochastic log volatility model (61). We assume that T = 365 daily data are simulated (y0 = log 100, h0 = log 0.2²), but measurements are taken only weekly, i.e. δt = 1/365; Δt = 7/365. Parameters were chosen as \(\psi \; = \;{\rm{\{ }}\mu ,\lambda ,\bar h,\gamma ,\rho {\rm{\} }}\; = \;{\rm{\{ }}0.07, - 1,\log ({0.2^2})\; = \; - 3.21888,2,0{\rm{\} }}\) and the prior distribution of the state η(0) = {y(0), h(0)} was set to P0|−1 ∼ N({4, −3}, diag(1, 1)). Thus the correlation ρ between the Wiener processes W and V is zero, in accordance with Kim et al. In the measurement model the measurement error variance was set to R = 0.0001. Figures (10–15) show the simulated likelihood surface as a function of ψ3 = \(\bar h\) in the interval [−6, 1]. It is seen that sorting of the posterior distribution (cf. section 6.2) improves the smoothness of the likelihood and lowers the sampling error of the score function ∂l/∂ψ. In figure (12), antithetical sampling still further improves the smoothness, as seen in the score (right picture). Figures (13–15) demonstrate that importance sampling permits the use of a much smaller Monte Carlo sample size N.
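The simulation design above can be sketched in a few lines. The exact drift forms (dy = (μ − exp(h)/2) dt + exp(h/2) dW, dh = λ(h − h̄) dt + γ dV) are assumed here for illustration, since equations (61–62) are not reproduced in this excerpt; the parameter values are those quoted in the text:

```python
import numpy as np

# Sketch: simulate T = 365 daily states of the log-volatility model and
# keep weekly observations (Delta t = 7/365). Drift forms are assumed
# (see lead-in); rho = 0, so dW and dV are independent.
rng = np.random.default_rng(2)

mu, lam, hbar, gam = 0.07, -1.0, np.log(0.2 ** 2), 2.0
dt = 1.0 / 365.0
T = 365
y = np.empty(T + 1)
h = np.empty(T + 1)
y[0], h[0] = np.log(100.0), np.log(0.2 ** 2)

for t in range(T):
    dW, dV = rng.normal(0, np.sqrt(dt), 2)   # independent Wiener increments
    y[t + 1] = y[t] + (mu - np.exp(h[t]) / 2) * dt + np.exp(h[t] / 2) * dW
    h[t + 1] = h[t] + lam * (h[t] - hbar) * dt + gam * dV

weekly = y[::7]   # measurements taken only weekly
```

Estimation then proceeds from the weekly series alone: h(t) is a latent state and enters the filter only through the simulated transition densities.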
8 Conclusion
We have shown how the likelihood function of a continuous-discrete state space model can be simulated using Monte Carlo integration. The variance of the estimate is considerably reduced by using importance sampling. The importance density was computed by approximate smoothing algorithms, which run on gaussian sums and are only suboptimal in general nonlinear systems. Nevertheless, a strong reduction in dispersion is achieved. Currently the algorithms are tested in estimating the parameters of stochastic volatility models and the Lorenz model.
References
Alspach, D. & Sorenson, H. (1972), ‘Nonlinear Bayesian estimation using Gaussian sum approximations’, IEEE Transactions on Automatic Control 17, 439–448.
Anderson, B. & Moore, J. (1979), Optimal Filtering, Prentice Hall, Englewood Cliffs.
Apple Computer (2001), Macintosh Programmer’s Workshop, http://developer.apple.com/tools/mpw-tools/.
Arnold, L. (1974), Stochastic Differential Equations, John Wiley, New York.
Arnold, V. (1973), Ordinary Differential Equations, MIT Press, Cambridge (Mass.), London.
Arnold, V. (1986), Catastrophe Theory, Springer, Berlin.
Bergstrom, A. (1990), Continuous Time Econometric Modelling, Oxford University Press, Oxford.
Carlin, B., Polson, N. & Stoffer, D. (1992), ‘A Monte Carlo Approach to Nonnormal and Nonlinear State-Space Modeling’, Journal of the American Statistical Association 87, 493–500.
Durbin, J. & Koopman, S. (1997), ‘Monte Carlo maximum likelihood estimation for non-Gaussian state space models’, Biometrika 84,3, 669–684.
Durbin, J. & Koopman, S. (2000), ‘Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives.’, Journal of the Royal Statistical Association B 62, 1, 3–56.
Engle, R. & Bollerslev, T. (1986), ‘Modelling the persistence of conditional variances’, Econometric Reviews 5, 1–50.
Fahrmeir, L. & Kaufmann, H. (1991), ‘On Kalman Filtering, Posterior Mode Estimation and Fisher Scoring in Dynamic Exponential Family Regression’, Metrika 38, 37–60.
Frey, M. (1996), ‘A Wiener Filter, State-Space Flux-Optimal Control Against Escape from a Potential Well’, IEEE Transactions on Automatic Control 41, 2, 216–223.
Gelb, A., ed. (1974), Applied Optimal Estimation, MIT Press, Cambridge, Mass.
Gordon, N., Salmond, D. & Smith, A. (1993), ‘Novel approach to nonlinear/non-Gaussian Bayesian state estimation’, IEE Proceedings F (Radar and Signal Processing) 140, 2, 107–113.
Haken, H. (1977), Synergetics, Springer, Berlin.
Hammersley, J. & Handscomb, D. (1964), Monte Carlo Methods, Methuen, London.
Harvey, A. (1989), Forecasting, structural time series models and the Kalman filter, Cambridge University Press, Cambridge.
Harvey, A. & Stock, J. (1985), ‘The estimation of higher order continuous time autoregressive models’, Econometric Theory 1, 97–112.
Herings, J. (1996), Static and Dynamic Aspects of General Disequilibrium Theory, Kluwer, Boston, London, Dordrecht.
Holmes, P. J. (1981), ‘Center manifolds, normal forms and bifurcations of vector fields’, Physica D 2, 449–481.
Hull, J. & White, A. (1987), ‘The Pricing of Options with Stochastic Volatilities’, Journal of Finance XLII,2, 281–300.
Hürzeler, M. & Künsch, H. (1998), ‘Monte Carlo Approximations for General State-Space Models’, Journal of Computational and Graphical Statistics 7,2, 175–193.
Jazwinski, A. (1970), Stochastic Processes and Filtering Theory, Academic Press, New York.
Jones, R. (1984), Fitting multivariate models to unequally spaced data, in E. Parzen, ed., ‘Time Series Analysis of Irregularly Observed Data’, Springer, New York, pp. 158–188.
Jones, R. & Ackerson, L. (1990), ‘Serial correlation in unequally spaced longitudinal data’, Biometrika 77, 721–731.
Jones, R. & Tryon, P. (1987), ‘Continuous time series models for unequally spaced data applied to modeling atomic clocks’, SIAM J. Sci. Stat. Comput. 8, 71–81.
Kim, S., Shephard, N. & Chib, S. (1998), ‘Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models’, Review of Economic Studies 65, 361–393.
Kitagawa, G. (1987), ‘Non-Gaussian state space modeling of nonstationary time series’, Journal of the American Statistical Association 82, 1032–1063.
Kitagawa, G. (1996), ‘Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models’, Journal of Computational and Graphical Statistics 5,1, 1–25.
Kloeden, P. & Platen, E. (1992), Numerical Solution of Stochastic Differential Equations, Springer, Berlin.
Liptser, R. & Shiryayev, A. (1977, 1978), Statistics of Random Processes, Volumes I and II, Springer, New York, Heidelberg, Berlin.
Lo, A. (1988), ‘Maximum Likelihood Estimation of Generalized Itô Processes with Discretely Sampled Data’, Econometric Theory 4, 231–247.
Nelson, D. (1990), ‘ARCH models as diffusion approximations’, Journal of Econometrics 45, 7–38.
Nelson, D. (1992), ‘Filtering and forecasting with misspecified ARCH models I’, Journal of Econometrics 52, 61–90.
Press, W., Teukolsky, S., Vetterling, W. & Flannery, B. (1992), Numerical Recipes in C, second edn, Cambridge University Press, Cambridge.
Risken, H. (1989), The Fokker-Planck Equation, second edn, Springer, Berlin, Heidelberg, New York.
Scott, L. (1987), ‘Option pricing when the variance changes randomly: Theory, estimation, and an application’, Journal of Financial and Quantitative Analysis 22, 419–438.
Shoji, I. & Ozaki, T. (1997), ‘Comparative Study of Estimation Methods for Continuous Time Stochastic Processes’, Journal of Time Series Analysis 18, 5, 485–506.
Shoji, I. & Ozaki, T. (1998), ‘A statistical method of estimation and simulation for systems of stochastic differential equations’, Biometrika 85, 1, 240–243.
Silverman, B. (1986), Density estimation for statistics and data analysis, Chapman and Hall, London.
Singer, H. (1993b), ‘Continuous-time dynamical systems with sampled data, errors of measurement and unobserved components’, Journal of Time Series Analysis 14, 5, 527–545.
Singer, H. (1995), ‘Analytical score function for irregularly sampled continuous time stochastic processes with control variables and missing values’, Econometric Theory 11, 721–735.
Singer, H. (1997a), Nonlinear Continuous-Discrete Filtering and ML Estimation using Kernel Density Estimates and Functional Integrals, Regensburger Beiträge zur Statistik und Ökonometrie 40, Universität Regensburg.
Singer, H. (1999b), Parameter Estimation of Nonlinear Stochastic Differential Equations: Simulated Maximum Likelihood vs. Extended Kalman Filter and Itô-Taylor Expansion, Regensburger Beiträge zur Statistik und Ökonometrie 41, Universität Regensburg.
Stratonovich, R. (1989), Some Markov methods in the theory of stochastic processes in nonlinear dynamic systems, in F. Moss & P. McClintock, eds, ‘Noise in nonlinear dynamic systems’, Cambridge University Press, pp. 16–71.
Tanizaki, H. (1996), Nonlinear filters: estimation and applications, second edn, Springer, Berlin.
Tanizaki, H. & Mariano, R. (1995), Prediction, Filtering and Smoothing in Nonlinear and Non-normal Cases using Monte-Carlo Integration, in H. Van Dijk, A. Monfort & B. Brown, eds, ‘Econometric Inference using Simulation Techniques’, John Wiley, pp. 245–261.
Wagner, W. (1988), ‘Monte Carlo Evaluation of Functionals of Solutions of Stochastic Differential Equations. Variance Reduction and Numerical Examples’, Stochastic Analysis and Applications 6, 447–468.
Wiggins, J. (1987), ‘Option values under stochastic volatility’, Journal of Financial Economics 19, 351–372.
Wolfram, S. (1992), Mathematica, 2nd edn, Addison-Wesley, Redwood City.
Wong, E. & Hajek, B. (1985), Stochastic Processes in Engineering Systems, Springer, New York.
Zadrozny, P. (1988), ‘Gaussian likelihood of continuous-time armax models when data are stocks and flows at different frequencies’, Econometric Theory 4, 108–124.
Appendix
In this appendix it is shown that the EKF update of the a priori density (21) leads to the correct variance-reduced Monte Carlo estimator of the likelihood
Using the Bayes formula this can be rewritten as
where dη = dη_{J_i−1} ⋯ dη_0, and for this the optimal variance-reducing density is
Therefore the optimal estimator is
which is the same as inserting (21) into (62). It remains to note that
since
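The variance-reduction argument above — drawing from a density close to the smoothing (posterior) density instead of the prior, and reweighting by prior × likelihood over proposal — can be illustrated numerically. The following sketch uses a toy scalar linear-Gaussian model (an illustrative stand-in chosen here, not the paper's state space model), where the likelihood is available in closed form and the exact posterior plays the role of the approximate smoothing density:

```python
import numpy as np

# Toy model (illustrative assumption, not the paper's model):
#   prior        x ~ N(0, P),      P = 1
#   measurement  y | x ~ N(x, R),  R = 0.1
# The likelihood p(y) = N(y; 0, P + R) is known exactly here,
# so both Monte Carlo estimators can be checked against it.
rng = np.random.default_rng(0)
P, R, y = 1.0, 0.1, 1.5
M = 10_000  # number of Monte Carlo draws

def normal_pdf(z, mean, var):
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Naive estimator: average p(y | x_i) over draws from the prior.
x_prior = rng.normal(0.0, np.sqrt(P), M)
w_naive = normal_pdf(y, x_prior, R)
L_naive = w_naive.mean()

# Importance sampling: draw from the posterior q(x) = p(x | y)
# (here exact; in the nonlinear case an EKF/smoothing approximation)
# and reweight by prior * likelihood / proposal.
post_var = 1.0 / (1.0 / P + 1.0 / R)
post_mean = post_var * y / R
x_post = rng.normal(post_mean, np.sqrt(post_var), M)
w_is = (normal_pdf(y, x_post, R) * normal_pdf(x_post, 0.0, P)
        / normal_pdf(x_post, post_mean, post_var))
L_is = w_is.mean()

L_true = normal_pdf(y, 0.0, P + R)
print(f"true {L_true:.6f}  naive {L_naive:.6f} (sd {w_naive.std():.4f})  "
      f"IS {L_is:.6f} (sd {w_is.std():.2e})")
```

Because the proposal equals the exact posterior in this toy case, the importance weights are constant and the estimator has (numerically) zero variance; with an approximate smoothing density the weights vary, but the variance remains far below that of sampling from the prior.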
Singer, H. Simulated Maximum Likelihood in Nonlinear Continuous-Discrete State Space Models: Importance Sampling by Approximate Smoothing. Computational Statistics 18, 79–106 (2003). https://doi.org/10.1007/s001800300133