1 Introduction

Continuous time stochastic processes are appropriate models for phenomena where no natural time interval for the dynamics is given. Examples are mechanical systems (Newton’s equations) or stock price movements where no natural trading interval can be identified. Measurements of this continuous time process Y(t) are frequently obtained only at discrete time points \(t_i\) (daily, weekly, quarterly, etc.), so that dynamical models in econometrics are mostly formulated for the measurement times (time series models). In contrast, we consider stochastic differential equations (SDE) for the state Y(t), but assume that only a sampled trajectory \(Y_i := Y(t_i)\) can be measured (cf. e.g. Bergstrom, 1990; Singer, 1995). Therefore, maximum likelihood estimation for sampled continuous time models must be based on transition probabilities in the observation interval Δt. Unfortunately, this key quantity is not analytically available in most cases and must be computed by approximate schemes. The simplest is based on the Euler approximation of the SDE. The resulting discrete time scheme leads to conditionally Gaussian transition densities. A related approach is based on the moment equations for the first and second moment (for a survey, cf. Singer 2002). Again, a conditionally Gaussian scheme is obtained. Alternatively, the drift coefficient can be expanded around the measurements to obtain a locally linear SDE leading again to a conditionally Gaussian scheme (Shoji and Ozaki 1997, 1998). Quasi likelihood methods using conditional moments are also discussed in Shoji (2002). Still another Gaussian approach using stopping times is discussed by Yu and Phillips (2001).

Whereas these approximations are extremely useful for small sampling intervals where the transition density only slightly deviates from normality, for larger intervals corrections are necessary which take account of skewness and kurtosis (and higher order characteristics) of the true density. Among these approaches are Monte Carlo simulations (Pedersen 1995; Andersen and Lund 1997; Elerian et al. 2001; Singer 2002, 2003), approximate analytical approaches (Aït-Sahalia 2002) and finite difference methods for the Fokker-Planck equation (cf. Jensen and Poulsen 2002). Using the first and second conditional moments, Bibby and Sorensen (1995) discuss martingale estimators which emerge from corrections to the discretized likelihood.

In this paper we consider a Hermite expansion with leading Gaussian term, but in contrast to Aït-Sahalia (2002) the expansion coefficients are expressed in terms of conditional moments and computed by solving deterministic moment equations.

The article is outlined as follows: In Sect. 2 the basic model and the maximum likelihood method are defined. Section 3 introduces the Hermite expansion used to approximate the transition density, and the moment equations are presented in Sect. 4. In Sect. 5, a simulation study is performed using an SDE with nonlinear diffusion coefficient [Constant Elasticity of Variance (CEV)], and the performance of the several density approximations is compared with the exact solution. Finally, the moment equations are derived in an appendix.

2 Model and likelihood

We discuss the nonlinear stochastic differential equation (SDE)

$$\mathrm{d} Y(t)=f(Y(t), t, \psi) \mathrm{d} t+g(Y(t), t, \psi) \mathrm{d} W(t)$$
(1)

where discrete measurements \(Y_i\) are taken at times \(\{t_0, t_1, \ldots, t_T\}\), \(t_0 \leq t \leq t_T\), \(Y_i := Y(t_i)\). In the state equation (1), W(t) denotes an r-dimensional Wiener process and the state is described by the p-dimensional state vector Y(t). It fulfils a system of stochastic differential equations in the sense of Itô (cf. Arnold 1974) with initial condition \(Y(t_0)\). The functions \(f: \mathbb{R}^p \times \mathbb{R} \times \mathbb{R}^u \to \mathbb{R}^p\) and \(g: \mathbb{R}^p \times \mathbb{R} \times \mathbb{R}^u \to \mathbb{R}^p \times \mathbb{R}^r\) are called drift and diffusion coefficients, respectively. Parametric estimation is based on the u-dimensional parameter vector ψ. The key quantity for the computation of the likelihood function is the transition probability \(p(y, t | x, s)\) which is a solution of the Fokker-Planck equation (see appendix). Extensions to nonlinear noisy measurements are given in Gordon et al. (1993), Kitagawa (1987, 1996), Hürzeler and Künsch (1998) and Singer (2003). Using the Markov property of Y(t) the likelihood is given by

$$L_{\psi}(y) :=p\left(y_{T}, \ldots, y_{1} | y_{0} ; \psi\right)=\prod_{i=0}^{T-1} p\left(y_{i+1} | y_{i} ; \psi\right)$$
(2)

The transition density \(p(y_{i+1} | y_i; \psi)\) can be computed analytically only in the linear or some special cases [e.g. constant elasticity of variance (CEV) diffusion process; cf. Feller 1951; Cox and Ross 1976]. In the general case, approximate numerical procedures must be employed (cf. the references in Sect. 1). Conditional Gaussian approximations work well when the sampling intervals \(\Delta t_i = t_{i+1} - t_i\) are not too large in comparison with the dynamics as specified in f and g. On the other hand, time series and panel data often involve large sampling intervals which are fixed by the design of the study. Therefore, corrections must be made to the Gaussian transition probability. Here we use a Hermite expansion with leading Gaussian term and corrections involving higher order moments.
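To illustrate how the likelihood (2) is evaluated with a conditionally Gaussian transition density, the following minimal sketch uses the Euler approximation mentioned in Sect. 1 for a scalar SDE. The function names `euler_loglik`, `f_cev` and `g_cev` are hypothetical, and the drift and diffusion are assumed to be time-homogeneous for simplicity; this is not the implementation used in the study.

```python
import numpy as np

def euler_loglik(y, dt, f, g, psi):
    """Euler (conditionally Gaussian) approximation of log L_psi(y), cf. Eq. (2).

    y   : sampled trajectory y_0, ..., y_T (scalar state)
    dt  : constant sampling interval
    f,g : drift and diffusion coefficient, called as f(y, psi), g(y, psi)
    """
    y = np.asarray(y, dtype=float)
    y0, y1 = y[:-1], y[1:]
    mean = y0 + f(y0, psi) * dt      # conditional mean under the Euler scheme
    var = g(y0, psi) ** 2 * dt       # conditional variance under the Euler scheme
    logp = -0.5 * np.log(2 * np.pi * var) - 0.5 * (y1 - mean) ** 2 / var
    return float(np.sum(logp))       # sum of log transition densities

# CEV specification with psi = (r, alpha, sigma); alpha = 1 is the square root model
def f_cev(y, psi):
    r, alpha, sigma = psi
    return r * y

def g_cev(y, psi):
    r, alpha, sigma = psi
    return sigma * y ** (alpha / 2)
```

Maximising `euler_loglik` over `psi` with a numerical optimiser gives the Euler estimator; the corrections discussed below replace the Gaussian log density by a skewed one.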

3 Hermite expansion

The transition density \(p(y_{i+1} | y_i; \psi)\) can be expanded into a Fourier series (cf. Courant and Hilbert 1968, ch. II, 9; Abramowitz and Stegun 1965, ch. 22) by using the complete set of Hermite polynomials which are orthogonal with respect to the weight function \(w(x) = \phi(x) = (2\pi)^{-1/2} \exp(-x^2/2)\) (standard Gaussian density), i.e.,

$$\int\limits_{-\infty}^{\infty} H_{n}(x) H_{m}(x) w(x) \mathrm{d} x=n ! \delta_{n m}$$
(3)

The Hermite polynomials Hn(x) are defined by

$$\phi^{(n)}(x) :=(\mathrm{d} / \mathrm{d} x)^{n} \phi(x)=(-1)^{n} \phi(x) H_{n}(x).$$
(4)

and are given explicitly by \(H_0 = 1\), \(H_1 = x\), \(H_2 = x^2 - 1\), \(H_3 = x^3 - 3x\), \(H_4 = x^4 - 6x^2 + 3\), etc. Therefore, the density function p(x) can be expanded as (footnote 1)

$$p(x)=\phi(x) \sum_{n=0}^{\infty} c_{n} H_{n}(x).$$
(5)

and the Fourier coefficients are given by

$$c_{n} :=(1 / n !) \int\limits_{-\infty}^{\infty} H_{n}(x) p(x) \mathrm{d} x=(1 / n !) E\left[H_{n}(X)\right]$$
(6)

where X is a random variable with density p(x). Since the Hermite polynomials contain powers of x, the expansion coefficients can be expressed in terms of moments of X, i.e. \(\mu_k = E[X^k]\). Using the standardized variables \(Z = (X - \mu)/\sigma\) with \(\mu = E[X]\), \(\sigma^2 = E[X^2] - \mu^2\), \(E[Z] = 0\), \(E[Z^2] = 1\), \(E[Z^k] := v_k\) one obtains the simplified expressions \(c_0 = 1\), \(c_1 = 0\), \(c_2 = 0\),

$$c_{3} :=(1 / 3 !) E\left[Z^{3}\right]=(1 / 3 !) v_{3}$$
(7)
$$c_{4} :=(1 / 4 !) E\left[Z^{4}-6 Z^{2}+3\right]=(1 / 24)\left(v_{4}-3\right)$$
(8)

and the density expansion

$$p_{z}(z) :=\phi(z)\left[1+(1 / 6) v_{3} H_{3}(z)+(1 / 24)\left(v_{4}-3\right) H_{4}(z)+\cdots\right]$$
(9)

which shows that the leading Gaussian term is corrected by higher order contributions containing skewness and kurtosis excess. For a Gaussian random variable, \(p_z(z) = \phi(z)\), so the coefficients \(c_k\), k ≥ 3 all vanish. For example, the kurtosis of Z is \(E[Z^4] = 3\), so \(c_4 = 0\).
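The orthogonality relation (3) and the vanishing of the Gaussian coefficients can be checked numerically. The sketch below uses Gauss-Hermite quadrature from NumPy's `numpy.polynomial.hermite_e` module, whose polynomials coincide with the \(H_n\) defined in (4); the helper names `H`, `gram` and `coeffs_gaussian` and the choice of 40 quadrature nodes are illustrative assumptions.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial import hermite_e as He

# Quadrature nodes and weights for the weight function exp(-x^2/2);
# dividing by sqrt(2*pi) turns sums into expectations under phi(x).
x, w = He.hermegauss(40)
w = w / sqrt(2 * pi)

def H(n, x):
    """Evaluate the Hermite polynomial H_n(x) of Eq. (4)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return He.hermeval(x, c)

def gram(nmax):
    """Matrix of integrals of H_n * H_m * phi, which Eq. (3) says is n! on the diagonal."""
    return np.array([[np.sum(w * H(n, x) * H(m, x)) for m in range(nmax + 1)]
                     for n in range(nmax + 1)])

def coeffs_gaussian(nmax):
    """Fourier coefficients c_n = E[H_n(Z)] / n! of Eq. (6) for Z ~ N(0, 1)."""
    return np.array([np.sum(w * H(n, x)) / factorial(n) for n in range(nmax + 1)])
```

Because the quadrature is exact for the polynomial integrands involved, `gram(4)` should reproduce the diagonal matrix with entries n! up to rounding error, and `coeffs_gaussian(4)` should have only the leading coefficient different from zero.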

Using the expansion for the standardized variable and the change of variables formula \(p_x(x) = (1/\sigma)\, p_z(z)\); \(z = (x - \mu)/\sigma\) one obtains the desired Hermite expansion for \(p_x(x)\). The standardized moments \(v_k = E[Z^k] = E[(X - \mu)^k]/\sigma^k := m_k/\sigma^k\) can be expressed in terms of centered moments

$$m_{k} :=E\left[M_{k}\right] :=E\left[(X-\mu)^{k}\right].$$
(10)

For these moments differential equations will be derived in the following.
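A minimal sketch of the resulting density approximation, truncated after the \(H_4\) term of (9) and written in terms of the centered moments (10), could look as follows. The function name `hermite_density` and the convention that a missing fourth moment defaults to its Gaussian value are assumptions made for illustration.

```python
import numpy as np

def hermite_density(x, mu, m2, m3=0.0, m4=None):
    """Hermite approximation of p_x(x) from mu and centered moments m2, m3, m4.

    Uses Eq. (9) up to H_4 and the change of variables p_x(x) = p_z(z) / sigma.
    """
    x = np.asarray(x, dtype=float)
    sigma = np.sqrt(m2)
    z = (x - mu) / sigma
    v3 = m3 / sigma**3                            # standardized third moment (skewness)
    v4 = 3.0 if m4 is None else m4 / sigma**4     # Gaussian value if m4 is not supplied
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    h3 = z**3 - 3 * z                             # H_3(z)
    h4 = z**4 - 6 * z**2 + 3                      # H_4(z)
    pz = phi * (1.0 + v3 / 6.0 * h3 + (v4 - 3.0) / 24.0 * h4)
    return pz / sigma
```

With `m3 = 0` and `m4 = 3 * m2**2` the function reduces to the normal density. Note that a truncated expansion of this type integrates to one but is not guaranteed to be nonnegative for strongly non-Gaussian moments.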

4 Scalar moment equations

4.1 Differential equations for the moments

The first conditional moment \(E[Y(t) | Y^i]\) fulfils the exact equation

$$\dot{\mu}\left(t | t_{i}\right)=E\left[f(Y, t) | Y^{i}\right].$$
(11)

The time dependence of higher order conditional moments

$$m_{k}\left(t | t_{i}\right) :=E\left[M_{k}\left(t | t_{i}\right) | Y^{i}\right] :=E\left[\left(Y(t)-\mu\left(t | t_{i}\right)\right)^{k} | Y^{i}\right].$$
(12)

is (see appendix)

$$\dot{m}_{k}=k E\left[f(Y) *\left(M_{k-1}-m_{k-1}\right)\right]+\frac{1}{2} k(k-1) E\left[\Omega(Y) * M_{k-2}\right]$$
(13)

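with initial condition \(m_k(t_i | t_i) = 0\). Since the expectations on the right hand sides of (11) and (13) involve the unknown conditional density, these are not yet differential equations for the moments alone. Taylor expansion of f and Ω around μ and insertion into (11, 13) yields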

$$\dot{\mu}=\sum_{l=0}^{\infty} f^{(l)}(\mu) \frac{m_{l}}{l !}$$
(14)
$$\begin{aligned} \dot{m}_{k}=& k \sum_{l=1}^{\infty} \frac{f^{(l)}(\mu)}{l !}\left(m_{l+k-1}-m_{l} m_{k-1}\right) \\ &+\frac{1}{2} k(k-1) \sum_{l=0}^{\infty} \frac{\Omega^{(l)}(\mu)}{l !} m_{l+k-2}(k \geq 2). \end{aligned}$$
(15)

For practical applications, three problems must be solved:

  1. One must choose a number K of moments to consider.

  2. The expansion of f and Ω must be truncated somewhere (l = 0, …, L).

  3. On the right hand side moments of maximum order L + K − 1 occur, so that only in the special case L = 1 (locally linear approximation of f and Ω) a closed system of equations results. In other cases, two methods can be used:

    (a) Higher order moments are neglected: \(m_k = 0\); k > K

    (b) Higher order moments are factorized by the Gaussian assumption

      $$m_{k}=\left\{\begin{array}{ll}{(k-1) ! ! m_{2}^{k / 2} ;} & {k>K\ \ \text { is even }} \\ {0} & {k>K}\ \ {\text { is odd }}\end{array}\right.$$
      (16)

Using only K = 2 moments, truncation at L = 1 yields the time update of the extended Kalman filter (EKF), and truncation at L = 2 yields that of the second order nonlinear filter (SNF). In analogy to EKF and SNF, the abbreviation HNF(K, L) (higher order nonlinear filter) will be used.
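The truncated moment equations (14) and (15) together with the closure rules (a) and (b) can be sketched as a right hand side function for an ODE solver. The code below is an illustrative scalar sketch, not the implementation used in the study: the names `hnf_rhs`, `full_moments` and `double_factorial` are hypothetical, L ≥ 1 is assumed, and the user is assumed to supply the derivatives \(f^{(l)}(\mu)\) and \(\Omega^{(l)}(\mu)\), l = 0, …, L, as lists.

```python
import numpy as np
from math import factorial

def double_factorial(n):
    """n!! with the convention (-1)!! = 0!! = 1."""
    return 1 if n <= 0 else n * double_factorial(n - 2)

def full_moments(m_state, K, L, closure="gauss"):
    """Return m_0, ..., m_{K+L-1}: m_0 = 1, m_1 = 0, m_2..m_K from the state,
    higher moments set to zero (closure 'zero') or factorized as in Eq. (16)."""
    m = np.zeros(K + L)
    m[0] = 1.0
    m[2:K + 1] = m_state
    for k in range(K + 1, K + L):
        if closure == "gauss" and k % 2 == 0:
            m[k] = double_factorial(k - 1) * m[2] ** (k // 2)
    return m

def hnf_rhs(mu, m_state, f_derivs, omega_derivs, K, L, closure="gauss"):
    """Right hand sides of Eqs. (14) and (15), truncated at Taylor order L."""
    fd = f_derivs(mu)        # [f(mu), f'(mu), ..., f^(L)(mu)]
    od = omega_derivs(mu)    # [Omega(mu), ..., Omega^(L)(mu)]
    m = full_moments(m_state, K, L, closure)
    mu_dot = sum(fd[l] * m[l] / factorial(l) for l in range(L + 1))
    m_dot = np.zeros(K - 1)
    for k in range(2, K + 1):
        drift = sum(fd[l] / factorial(l) * (m[l + k - 1] - m[l] * m[k - 1])
                    for l in range(1, L + 1))
        diff = sum(od[l] / factorial(l) * m[l + k - 2] for l in range(L + 1))
        m_dot[k - 2] = k * drift + 0.5 * k * (k - 1) * diff
    return mu_dot, m_dot
```

Calling `hnf_rhs` with K = 2 and L = 1 gives the EKF-type time update in this notation; larger K and L give the HNF(K, L) hierarchy, with the closure only becoming active for L > 1.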

4.2 Example: square root stock price model

The square root stock price model (cf. Feller 1951; Cox and Ross 1976)

$$\mathrm{d} Y(t)=r Y(t) \mathrm{d} t+\sigma Y(t)^{1 / 2} \mathrm{d} W(t).$$
(17)

has a linear drift and diffusion term \(\Omega(y) = \sigma^2 y\) with derivatives \(\Omega'(y) = \sigma^2\), \(\Omega^{(l)}(y) = 0\), l ≥ 2 and yields a closed system (see, e.g. Bibby and Sorensen 1995). The first and second moments fulfil

$$\dot{\mu}=r \mu$$
(18)
$$\dot{m}_{2}=2 r m_{2}+\sigma^{2} \mu$$
(19)

with explicit solution

$$\mu\left(t | t_{i}\right)=\exp \left[r\left(t-t_{i}\right)\right] Y_{i}$$
(20)
$$m_{2}\left(t | t_{i}\right)=\frac{\sigma^{2}}{r}\left[\exp \left(2 r\left(t-t_{i}\right)\right)-\exp \left(r\left(t-t_{i}\right)\right)\right] Y_{i}.$$
(21)

The equation for the third moment

$$\dot{m}_{3}=3 r m_{3}+3 \sigma^{2} m_{2}$$
(22)

contains an inhomogeneous term yielding a skewed density after some time. Moreover,

$$\dot{m}_{4}=4 r m_{4}+6 \sigma^{2}\left(\mu m_{2}+m_{3}\right)$$
(23)

is not solved anymore by the Gaussian factorization \(m_{4}=3 m_{2}^{2}\) due to the skewness term \(\sigma^2 m_3\). For the parameter vector ψ = {r, σ} = {0.1, 0.2} the moment equations were solved by an Euler scheme with discretization interval δt = 1/250 year and T = 1,000 time steps corresponding to Tδt = 4 years and initial condition \(m(t_i | t_i) = [1, 0, 0, 0]'\). Figure 1 demonstrates the improvement of the EKF density over the Euler approximation. The square root model (17) can be solved exactly using Bessel functions (Feller 1951) and the moments \(m_k\) were computed numerically from this exact density. Inspection of the function q2(z) = \(p_z(z)^2/\phi(z)\), where \(p_z(z) = p_y(\mu + \sigma z)\,\sigma\) is the standardized density function, reveals that the convergence condition (footnote 1) is not fulfilled. Expanding up to order k = 21, the nonconvergence is shown in Fig. 2. However, low order approximations such as k = 3, 6 are nevertheless quite good. Alternatively, one could transform the process such that the diffusion coefficient is constant. It can be shown that the resulting transition density \(p_y(y)\) is sufficiently close to a normal density (Aït-Sahalia, loc. cit., prop. 2), so that a convergent Hermite expansion is possible. In the multivariate case, such a transformation is difficult, so the idea is to study low order approximants of the nonconvergent Hermite series.
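The Euler integration of the closed system (18), (19), (22), (23) with the settings stated above can be sketched as follows and compared with the explicit solutions (20) and (21). The function names `integrate_moments` and `exact_mu_m2` are hypothetical; the default arguments are the values given in the text, with the initial state \(Y_i = 1\) taken from the initial condition vector.

```python
import numpy as np

def integrate_moments(r=0.1, sigma=0.2, y0=1.0, dt=1 / 250, n_steps=1000):
    """Euler scheme for Eqs. (18), (19), (22), (23) of the square root model."""
    mu, m2, m3, m4 = y0, 0.0, 0.0, 0.0
    for _ in range(n_steps):
        dmu = r * mu
        dm2 = 2 * r * m2 + sigma**2 * mu
        dm3 = 3 * r * m3 + 3 * sigma**2 * m2
        dm4 = 4 * r * m4 + 6 * sigma**2 * (mu * m2 + m3)
        # update all moments simultaneously from the old state
        mu, m2, m3, m4 = mu + dt * dmu, m2 + dt * dm2, m3 + dt * dm3, m4 + dt * dm4
    return mu, m2, m3, m4

def exact_mu_m2(tau, r=0.1, sigma=0.2, y0=1.0):
    """Explicit solutions (20) and (21) after elapsed time tau = t - t_i."""
    mu = np.exp(r * tau) * y0
    m2 = sigma**2 / r * (np.exp(2 * r * tau) - np.exp(r * tau)) * y0
    return mu, m2
```

Comparing `integrate_moments()` with `exact_mu_m2(4.0)` gives a quick check of the discretization error in the first two moments; the third moment returned by the scheme is positive, in line with the skewness induced by the inhomogeneous term in (22).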

Fig. 1 Square root model: exact density (red), approximate density \(p_{2,1}(y)\) with Hermite expansion up to K = 2 (EKF, orange) and Euler density (green)

Fig. 2 Square root model: Approximate densities \(p_k(y)\) with Hermite expansion up to k = 21 (orange) and exact density (red). The moments were computed from the exact density function

5 Simulation studies

The Hermite expansion approach was tested in simulation studies and compared with the Euler approach, the Nowman method (footnote 2) and the exact ML method using the Feller density. Weekly and monthly observations of the square root model were generated on a daily basis, i.e. we chose a discretization interval of δt = 1/365 (year) and simulated daily series using the Euler-Maruyama scheme

$$y_{j+1}=y_{j}+f\left(y_{j}, t_{j}\right) \delta t+g\left(y_{j}, t_{j}\right) \delta W_{j},$$
(24)

\(\delta W_j \sim N(0, \delta t)\) i.i.d., j = 0, …, J. The data were sampled weekly and monthly at times \(j_i = (\Delta t/\delta t)\, i\), i = 0, …, T = 500 with Δt = 7/365, 30/365 (year). Thus for the sampled data of length T = 500, J = 500 × 7 = 3,500 and J = 500 × 30 = 15,000 daily data were simulated. The parameter values in the CEV model

$$\mathrm{d} Y(t)=r Y(t) \mathrm{d} t+\sigma Y(t)^{\alpha / 2} \mathrm{d} W(t)$$
(25)

are ψ = {r = 0.1, α = 1, σ = 0.2} corresponding to a square root model. Also, scenarios with high drift r = 0.2 and high volatility σ = 0.4 were considered. The data were simulated using these true parameter vectors, but in the estimation procedure no restrictions (such as α = 1) were employed.
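The data generation step can be sketched as follows: a daily Euler-Maruyama path of the CEV model (25) is simulated according to (24) and then subsampled every 7 or 30 days. The function name `simulate_cev`, the initial value `y0 = 1.0`, the seed handling and the clipping of the state at zero are assumptions made for this illustration; they are not stated in the text.

```python
import numpy as np

def simulate_cev(r=0.1, alpha=1.0, sigma=0.2, y0=1.0, T=500,
                 days_per_obs=7, dt=1 / 365, seed=0):
    """Simulate a daily Euler-Maruyama path of the CEV model and sample it.

    days_per_obs = 7 gives weekly data, days_per_obs = 30 monthly data.
    Returns the T + 1 sampled values y_0, ..., y_T.
    """
    rng = np.random.default_rng(seed)
    J = T * days_per_obs                           # number of daily steps
    dW = rng.normal(0.0, np.sqrt(dt), size=J)      # Wiener increments ~ N(0, dt)
    y = np.empty(J + 1)
    y[0] = y0
    for j in range(J):
        step = y[j] + r * y[j] * dt + sigma * y[j] ** (alpha / 2) * dW[j]
        y[j + 1] = max(step, 0.0)   # assumption: clip at zero so the power stays real
    return y[::days_per_obs]        # keep every days_per_obs-th daily value
```

For example, `simulate_cev(days_per_obs=30)` produces one monthly series of length 501 from 15,000 daily steps; repeating the call with different seeds gives the replications of a Monte Carlo study.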

5.1 Weekly data

Table 1 displays the estimation results for the weekly sampling interval Δt = 7/365 year. Comparing the estimation methods, the exact ML approach is best in terms of root mean square error \(\mathrm{RMSE}=\sqrt{\mathrm{Bias}^{2}+\mathrm{Std}^{2}}\), \(\mathrm{Bias} :=\overline{\hat{\theta}}-\theta\), \(\mathrm{Std}=\sqrt{(M-1)^{-1} \sum\nolimits_{m}\left(\hat{\theta}_{m}-\overline{\hat{\theta}}\right)^{2}}\), except for r, where the HNF(4,4) is better. The third order approximation HNF(3,3) is only slightly worse than the exact approach and HNF(4,4).
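The three summary statistics can be computed from the M replicated estimates of a single parameter as in the following sketch. The function name `bias_std_rmse` is hypothetical.

```python
import numpy as np

def bias_std_rmse(estimates, theta_true):
    """Bias, Std (with divisor M - 1) and RMSE = sqrt(Bias^2 + Std^2)."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - theta_true
    std = est.std(ddof=1)            # ddof=1 gives the (M - 1)^{-1} normalisation
    rmse = np.sqrt(bias**2 + std**2)
    return bias, std, rmse
```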

Table 1 Square root model: means and standard deviations of ML estimates in M = 500 replications

Nowman’s method, which approximates the moment equation for m2 with constant diffusion term, leads to slightly better results than the EKF. Also, the simple Euler estimator performs well.

Generally, for weekly sampling, all methods show small bias and are comparable in terms of RMSE.

5.2 Monthly data

The bias of the approximation methods (deviations from conditional normal distribution) should show up for larger sampling intervals. Indeed, using monthly data, Table 2 shows that the Euler method and other approximations [except HNF(3,3) and HNF(4,4)] have slight disadvantages in relation to exact ML. Again, the differences are not very pronounced. Surprisingly, Euler is best for parameter r, but with higher RMSE for the other parameters. In this case, all 500 samples converged except for exact ML, where 15 samples ran into a step halving with no higher likelihood value (cf. Dennis and Schnabel 1983, for convergence issues).

Table 2 Square root model: means and standard deviations of ML estimates in M = 500 replications

5.3 Other parameter constellations

The approximation methods were also tested using a scenario with higher drift, i.e. ψ = {0.2,1,0.2} and higher volatility ψ = {0.1,1,0.4}.

5.3.1 High drift

For high drift r = 0.2, the exact ML is best for both weekly and monthly sampling intervals. The Euler method displays higher RMSE and the other methods perform roughly the same.

5.3.2 High volatility

In the high volatility scenario σ = 0.4, a considerable number of samples did not converge (neither ∥score∥ nor ∣step∣ was smaller than ϵ = 0.01). Since the percentages (up to 25%) depend on the method, the results are difficult to compare and are not reported.

5.3.3 Small sample properties

The normal scenario (ψ = {0.1, 1, 0.2}) was also tested for T = 50 sampled data (i.e., 50 weeks or months). In this case the estimates of α (true value α = 1) were strongly biased and the convergence rates were different for each method. No clear picture emerged as to which method should be preferred.

6 Conclusion

The transition density of a diffusion process was approximated as a Hermite series and the expansion coefficients were expressed in terms of conditional moments. Taylor expansion of the drift and diffusion functions leads to a hierarchy of approximations indexed by the number K of moments and the order L of the Taylor series. The square root model, which is an important model for stock prices, was estimated using a CEV specification. Using weekly and monthly sampling intervals, the exact ML method is best, followed by HNF(4,4), HNF(3,3) and the other approximation methods except Euler. The simple Euler approximation has degraded performance in relation to the EKF type Gaussian likelihood and higher order skewed densities. For the chosen parameter values which are typical for stock prices, the differences are not very pronounced, however.

The practical implications are as follows: In cases where the exact density is unknown, use the HNF(4,4), HNF(3,3) or at least the EKF, SNF or Nowman. The simple Euler estimator is fast but has degraded performance.

Further studies will use higher order Hermite approximations and derive equations for the expansion coefficients of a convergent Hermite series using oscillator eigenfunctions. Moreover, generalizations to the vector case will be derived.