1 Introduction

This paper is concerned with the particular hidden stochastic model:

$$\begin{aligned} \left\{ \begin{array}{ll} Y_i=X_{i}+\varepsilon _{i}\\ X_{i+1}=b_{\phi _0}(X_{i})+\eta _{i+1}, \end{array} \right. \end{aligned}$$
(1)

where \((\varepsilon _{i})_{i\ge 1}\) and \((\eta _{i})_{i\ge 1}\) are two independent sequences of independent and identically distributed (i.i.d.) centered random variables with respective variances \(\sigma ^{2}_{\varepsilon }\) and \(\sigma ^{2}_{0}\). It is assumed that the variance \(\sigma ^{2}_{\varepsilon }\) is known. The terminology hidden comes from the unobservable character of the process \((X_i)_{i\ge 1}\), since the only available observations are \(Y_{1},\ldots , Y_n\).

The dynamics of the process \(X_i\) are described by a measurable function \(b_{\phi _0}\), which depends on an unknown parameter \(\phi _0\), and by a sequence of i.i.d. centered random variables with unknown variance \(\sigma _0^2\). We denote by \(\theta _0\) the vector of parameters governing the process \(X_i\) and suppose that the model is correctly specified: that is, \(\theta _0\) belongs to the parameter space \(\varTheta \subset \mathbb{R }^{r}\), with \(r \in \mathbb N ^{*}\).

Inference in hidden Markov models is a real challenge and has been studied by many authors (see Cappé et al. 2005a; Doucet et al. 2001; Robert et al. 2000). Chanda (1995) provided an asymptotically normal estimator of the vector of parameters \(\theta _0\) by using modified Yule–Walker equations, but this approach assumes that the function \(b_{\phi _0}\) is linear in \(\phi _0\) and \(X_i\), so that model (1) reduces to an autoregressive model with measurement error.

Recently, in Douc et al. (2011), the authors propose an efficient estimator of the vector of parameters \(\theta _0\) for a nonlinear function \(b_{\phi _0}\). They prove that their Maximum Likelihood Estimator (MLE) is consistent and asymptotically normal. The main difficulty with their approach comes from the unobservable character of the process \(X_i\), which makes the computation of the exact likelihood intractable in practice: the likelihood is only available in the form of a multiple integral, so exact likelihood methods require simulations and therefore have an intensive computational cost. In many cases, the MLE has to be approximated. A popular approach to approximate the MLE consists in using Markov Chain Monte Carlo (MCMC) simulation techniques. Thanks to the development of these methods, approximation of the MLE has made huge progress and Bayesian estimation has received more attention (see Smith and Roberts 1993). Another method for computing the MLE consists in using the Expectation-Maximization (EM) algorithm proposed by Dempster et al. (1977). Nevertheless, since \(X_i\) is unobservable, this method requires introducing an MCMC step in the Expectation step. Although these methods are used in practice, they are expensive from a computational point of view.

Some authors have proposed Sequential Monte Carlo (SMC) algorithms, known as particle filtering methods, which allow one to approximate the likelihood. The computational cost is reduced by a recursive construction. We refer to the books of Doucet et al. (2001) and Cappé et al. (2005a) for a complete review of these methods.

Particle Markov Chain Monte Carlo (PMCMC) is another method for estimating model (1). This method combines particle filtering methods and MCMC methods to estimate the vector of parameters \(\theta _0\). From a computational point of view, this approach is expensive and we refer the reader to Andrieu et al. (2010) for more details. In Peters et al. (2010), the authors propose an adaptive PMCMC method to estimate ecological hidden stochastic models.

We propose here an approach based on M-estimation, which consists in the optimisation of a well-chosen contrast function (see Van der Vaart 1998, p. 41, for a partial review), combined with a deconvolution strategy. The deconvolution problem is encountered in many statistical situations where the observations are collected with random errors. In this setting, a method for estimating the parameter \(\phi _0\) has been proposed by Comte and Taupin (2001); their estimation procedure is based on a modified least squares minimization. In the same perspective, Dedecker et al. (2011) propose an estimation procedure based on weighted least squares: their assumptions on the process \(X_i\) are less restrictive than those proposed by F. Comte and M. Taupin, and they provide a consistent estimator of the parameter \(\phi _0\) with a parametric rate of convergence in a very general framework. Their general estimator is based on the introduction of a kernel deconvolution density and depends on the choice of a weight function.

The approach proposed here is different: it is not based on weighted least squares estimation, so the choice of a weight function is not encountered in this paper. Moreover, it allows us to estimate both parameters \(\phi _0\) and \(\sigma ^{2}_0\). Our estimation principle relies on the Nadaraya–Watson strategy proposed by Comte et al. (2011) in a nonparametric setting to estimate the function \(b_{\phi }\) as a ratio of an estimate of \(l_{\theta }=b_{\phi }f_{\theta }\) and an estimate of \(f_{\theta }\), where \(f_{\theta }\) represents the invariant density of the \(X_i\). We propose to adapt their approach to a parametric context and suppose that the form of the stationary density \(f_{\theta _0}\) is known up to the unknown parameter \(\theta _0\). Our work is purely parametric, but we go further in this direction by providing an analytical expression of the asymptotic variance matrix \(\varSigma (\hat{\theta }_{n})\), which allows us to construct confidence intervals. Furthermore, this approach is much less computationally demanding than the MLE and its implementation is straightforward.

Applications: Applications include epidemiology, meteorology, neuroscience, ecology (see Ionides et al. 2011) and finance (see Johannes et al. 2009). For example, our approach can be applied to the five ecological state space models described in Peters et al. (2010). Although the scope of our method is general, we have chosen to focus on the so-called discrete time Stochastic Volatility (SV) model introduced by Taylor (2005). We also investigate the behavior of our method on the simpler autoregressive AR(1) process with measurement noise, which has been widely studied and on which our method can be more easily understood and compared with other ones. Our procedure allows the estimation of the parameters of a large class of discrete Stochastic Volatility models (ARCH-E model, Vasicek model, Merton model, ...), which is a real challenge in financial applications.

(i) Gaussian Autoregressive AR(1) with measurement noise: It has the following form:

$$\begin{aligned} \left\{ \begin{array}{ll} Y_{i+1}=X_{i+1}+\varepsilon _{i+1}\\ X_{i+1}=\phi _0 X_{i}+\eta _{i+1}, \end{array} \right. \end{aligned}$$
(2)

where \(\varepsilon _{i+1}\) and \(\eta _{i+1}\) are two centered Gaussian random variables with variances \(\sigma _{\varepsilon }^{2}\), assumed to be known, and \(\sigma ^{2}_{0}\), assumed to be unknown. Additionally, we assume that \(|\phi _0|<1\), which implies stationarity and ergodicity of the process \(X_i\) (see Dedecker et al. 2007).

(ii) SV model: It is directly connected to the type of diffusion process used in asset-pricing theory (see Melino and Turnbull 1990):

$$\begin{aligned} \left\{ \begin{array}{ll} R_{i+1}=\exp \left( \frac{X_{i+1}}{2}\right) \xi _{i+1},\\ X_{i+1}=\phi _{0}X_{i}+\eta _{i+1}, \end{array} \right. \end{aligned}$$
(3)

where \(\xi _{i+1}\) and \(\eta _{i+1}\) are two centered Gaussian random variables with variances \(\sigma _{\xi }^{2}\), assumed to be known and equal to one, and \(\sigma ^{2}_{0}\), assumed to be unknown. The variables \(R_{i+1}\) represent the returns and \(X_{i+1}\) is the log-volatility process.

By applying the log-transformation \(Y_{i+1}=\log (R^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\) with \(\varepsilon _{i+1}=\log (\xi ^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\), the SV model becomes a particular case of (1). We assume that \(|\phi _0|<1\) and refer the reader to Genon-Catalot et al. (2000) for the mixing properties of stochastic volatility models.
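For illustration, here are minimal simulation sketches of the two examples above, including the log-transformation that casts (3) into the form (1). The default values \(\phi _0=0.7\) and \(\sigma ^2_0=0.3\) are those used later in the simulation study (Sect. 2.4), \(\sigma ^{2}_{\varepsilon }=0.1\) for example (i) likewise, and all function names are ours:

```python
import numpy as np

E_LOG_CHI2 = -1.27   # E[log(xi^2)] for a standard Gaussian xi (approximate value)

def simulate_ar1_with_noise(n, phi0=0.7, sigma2_0=0.3, sigma2_eps=0.1, seed=0):
    """Example (i): X_{i+1} = phi0 * X_i + eta_{i+1}, Y_i = X_i + eps_i."""
    rng = np.random.default_rng(seed)
    X = np.empty(n)
    X[0] = rng.normal(0.0, np.sqrt(sigma2_0 / (1.0 - phi0**2)))   # stationary start
    for i in range(n - 1):
        X[i + 1] = phi0 * X[i] + rng.normal(0.0, np.sqrt(sigma2_0))
    Y = X + rng.normal(0.0, np.sqrt(sigma2_eps), size=n)          # noisy observations
    return Y, X

def simulate_sv_log_transform(n, phi0=0.7, sigma2_0=0.3, seed=0):
    """Example (ii): simulate the SV model (3) and return the log-transformed series Y_i."""
    rng = np.random.default_rng(seed)
    X = np.empty(n + 1)
    X[0] = rng.normal(0.0, np.sqrt(sigma2_0 / (1.0 - phi0**2)))   # stationary start
    for i in range(n):
        X[i + 1] = phi0 * X[i] + rng.normal(0.0, np.sqrt(sigma2_0))
    xi = rng.normal(0.0, 1.0, size=n)                 # sigma_xi^2 = 1
    R = np.exp(X[1:] / 2.0) * xi                      # returns
    Y = np.log(R**2) - E_LOG_CHI2                     # Y_{i+1} = log(R_{i+1}^2) - E[log(xi^2)]
    return Y, X[1:]
```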

Most of the computational problems stem from the assumption that the innovations of the returns are Gaussian, which translates into a logarithmic chi-square distribution when the model (12) is transformed into a linear state space model. Many authors have ignored this in their implementation, while others use a mixture of Gaussians to approximate the log-chi-square density. For example, in the Quasi-Maximum Likelihood (QML) method implemented by Jacquier et al. (2002) and in the Simulated Expectation-Maximization Likelihood Estimator (SIEMLE) proposed by Kim et al. (1994), a mixture of Gaussian distributions is used to approximate the log-chi-square distribution. Harvey (1994) used the Kalman filter to estimate the likelihood of the transformed state space model, hence the model was also assumed to be Gaussian.

Organization of the paper: The first purpose of the paper is to present our estimator and its statistical properties in Sect. 1.1: Under weak assumptions, we show that it is a consistent and asymptotically normal estimator (Table 1).

Table 1 Comparison of the computing time (CPU in seconds) and the MSE with respect to the number of observations \(n=200\) up to \(1{,}500\) for the Gaussian AR(1) and the SV models

The second purpose of this paper is to compare our contrast estimator with different estimators: the QML, the SIEMLE and Bayesian estimators. Section 2 contains the numerical study: in Sect. 2.4 we give the parameter estimates and compare them with the other methods on simulated data, and Sect. 2.6 contains the study on real data, where we compare our contrast estimator with the other ones on the S&P500 and FTSE indices. From a computational point of view, we show that the implementation of our estimator is straightforward and that it is faster than the SIEMLE (see Table 2 in Sect. 2.5.1). Besides, we show that our estimator outperforms the QML and Bayesian estimators.

Table 2 SIEMLE estimation for the SV model

Notations: We denote by \(u^{*}(t)=\int e^{itx}u(x)dx\) the Fourier transform of the function \(u(x)\) and \(\left\langle u,v \right\rangle =\int u(x)\overline{v}(x)dx\) with \(v\overline{v}=|v|^{2}\). We write \(||u||_{2}=\left( \int |u(x)|^{2} dx\right) ^{1/2}\) for the norm of \(u(x)\) on the space of functions \(\mathbb L ^{2}(\mathbb{R })\). By properties of the Fourier transform, we have \((u^{*})^{*}(x)=2\pi u(-x)\) and \(\left\langle u_1,u_2\right\rangle =\frac{1}{2\pi }\left\langle u^{*}_1,u^{*}_2\right\rangle \). The vector of partial derivatives of \(f\) with respect to (w.r.t) \(\theta \) is denoted by \(\nabla _{\theta }f\) and the Hessian matrix of \(f\) w.r.t \(\theta \) is denoted by \(\nabla ^{2}_{\theta }f\). The Euclidean matrix norm, that is, the square root of the sum of the squares of all the elements of a matrix \(A\), is written \(\left\| {A} \right\| \). We denote by \(\mathbf{Y }_i\) the pair \((Y_i,Y_{i+1})\) and by \(\mathbf{y }_{i}=(y_{i},y_{i+1})\) a given realisation of \(\mathbf{Y }_i\).

In the following, \(\mathbb P , \mathbb{E }, \mathbb{V }ar\) and \({\mathbb{C }ov}\) denote respectively the probability \(\mathbb P _{\theta _0}\), the expected value \(\mathbb{E }_{\theta _0}\), the variance \(\mathbb{V }ar_{\theta _0}\) and the covariance \({\mathbb{C }ov}_{\theta _0}\) when the true parameter is \(\theta _0\). Additionally, we write \(\mathbf{P }_n\) (resp. \(\mathbf{P }\)) for the empirical (resp. theoretical) expectation, that is, for any random variable \(X\): \(\mathbf P _{n}(X) = \frac{1}{n}\sum _{i=1}^{n} X_i\) (resp. \(\mathbf P (X)=\mathbb{E }[X]\)).

1.1 Procedure: contrast estimator

Hereafter, we propose an explicit estimator of the parameter \(\theta _0\). This estimator, called the contrast estimator, is based on the minimization of a suitable function of the observations usually called a “contrast function”. We refer the reader to Van der Vaart (1998) for a general account of this notion. Furthermore, in this part, we use the contrast function proposed by Comte et al. (2010), that is:

$$\begin{aligned} \mathbf P _{n}m_{\theta }=\frac{1}{n}\sum _{i=1}^{n}m_{\theta }(\mathbf{Y }_i), \end{aligned}$$
(4)

with \(n\) the number of observations and:

$$\begin{aligned} m_{\theta }(\mathbf{y }_{i}): (\theta , \mathbf{y }_{i})\in (\varTheta \times \mathbb R ^{2})\mapsto m_{\theta }(\mathbf{y }_{i})=||l_{\theta }||^{2}_{2}-2y_{i+1}u^{*}_{l_{\theta }}(y_{i}), \end{aligned}$$

where the function \(l_{\theta }\) and \(u_{v}\) are given by:

$$\begin{aligned} l_{\theta }(x)=b_{\phi }(x)f_{\theta }(x) \quad \text{ and} \quad u_{v}(x)=\frac{1}{2\pi }\frac{v^{*}(-x)}{f_{\varepsilon }^{*}(x)} \end{aligned}$$
(5)

with \(f_{\theta }\) the invariant density of \(X_i\).

Some assumptions. As our procedure relies on the Fourier deconvolution strategy, in order to construct our estimator we assume that the density of the noise \(\varepsilon _i\), denoted by \(f_\varepsilon \), is fully known, belongs to \(\mathbb L _2(\mathbb R )\), and satisfies \(f^{*}_{\varepsilon }(x)\ne 0\) for all \(x \in \mathbb R \). Furthermore, we assume that the function \(l_{\theta }\) belongs to \(\mathbb L _1(\mathbb R )\cap \mathbb L _2(\mathbb R )\) and that the function \(u_{l_{\theta }}\) is integrable.

For the statistical study, the key assumption is that the process \((X_i)_{i \ge 1}\) is stationary and ergodic (see Genon-Catalot et al. 2000 for a definition).

Remark 1

In this paper we consider the situation in which the observation noise variance is known. This assumption, which does not generally hold in practice, is necessary for the identifiability of the model and is thus a standard assumption for state-space models of the form (1).

There are some restrictions on the distributions of the observation and process errors in the Nadaraya–Watson approach. It is known that the rate of convergence for estimating the function \(l_{\theta }\) is related to the rate of decrease of \(f^{*}_{\varepsilon }\). In particular, the smoother \(f_{\varepsilon }\), the slower the rate of convergence (the Gaussian, log-chi-square or Cauchy distributions are super-smooth, whereas the Laplace distribution is ordinary smooth). This rate of convergence can be improved by assuming some additional regularity conditions on \(l_{\theta }\). A thorough discussion of this subject can be found in Comte et al. (2010) and Dedecker et al. (2011).

The procedure. Let us explain the choice of the contrast function and how the deconvolution strategy works. Obviously, as model (1) shows, the \(\mathbf{Y }_i\) are not i.i.d. However, by assumption, they are stationary and ergodic, so the convergence of \(\mathbf P _{n}m_{\theta }\) to \(\mathbf P m_{\theta }=\mathbb{E }\left[ m_{\theta }(\mathbf{Y }_{1})\right] \) as \(n\) tends to infinity is provided by the Ergodic Theorem. Moreover, the limit \(\mathbb{E }\left[ m_{\theta }(\mathbf{Y }_{1})\right] \) of the contrast function can be computed explicitly:

$$\begin{aligned} \mathbb{E }\left[ m_{\theta }(\mathbf{Y }_{1})\right] =\left\| l_{\theta }\right\| _{2}^{2}-2\mathbb{E }\left[ Y_{2}u_{l_{\theta }}^{*}(Y_{1})\right] \!. \end{aligned}$$

By Eq. (1) and under the independence assumptions of the noise \((\varepsilon _2)\) and \((\eta _2)\), we have:

$$\begin{aligned} \mathbb{E }\left[ Y_{2} u^*_{l_{\theta }}(Y_{1})\right] = \mathbb{E }\left[ b_{\phi _{0}}(X_1) u^*_{l_{\theta }}(Y_{1})\right] \!. \end{aligned}$$

Using Fubini’s Theorem and Eq. (1), we obtain:

$$\begin{aligned} \mathbb{E }\left[ b_{\phi _{0}}(X_1) u^*_{l_{\theta }}(Y_{1})\right]&= \mathbb{E }\left[ b_{\phi _{0}}(X_1) \int e^{iY_{1}z} u_{l_{\theta }}(z) dz \right] \nonumber \\&= \mathbb{E }\left[ b_{\phi _{0}}(X_1) \int \frac{1}{2\pi }\frac{1}{f_{\varepsilon }^*(z)}e^{iY_{1}z} (l_{\theta }(-z))^*dz \right] \nonumber \\&= \frac{1}{2\pi } \int \mathbb{E }\left[ b_{\phi _{0}}(X_1)e^{i(X_1+\varepsilon _1)z} \right] \frac{1}{f_{\varepsilon }^*(z)} (l_{\theta }(-z))^* dz \nonumber \\&= \frac{1}{2\pi } \int \frac{\mathbb{E }\left[ e^{i\varepsilon _1z}\right] }{f_{\varepsilon }^*(z)} \mathbb{E }\left[ b_{\phi _{0}}(X_1)e^{iX_1z}\right] (l_{\theta }(-z))^* dz\nonumber \\&= \frac{1}{2\pi } \mathbb{E }\left[ b_{\phi _{0}}(X_1) \int e^{iX_1z} (l_{\theta }(-z))^* dz \right] \nonumber \\&= \frac{1}{2\pi } \mathbb{E }\left[ b_{\phi _{0}}(X_1) \left( (l_{\theta }(-X_1))^{*}\right) ^*\right] \nonumber \\&= \mathbb{E }\left[ b_{\phi _{0}}(X_1)l_{\theta }(X_1)\right] .\nonumber \\&= \int b_{\phi _{0}}(x)f_{\theta _{0}}(x)b_{\phi }(x)f_{\theta }(x)dx \nonumber \\&= \left\langle l_{\theta }, l_{\theta _{0}} \right\rangle \!. \end{aligned}$$
(6)

Then,

$$\begin{aligned} \mathbb{E }\left[ m_{\theta }(\mathbf{Y }_{1})\right]&= \left\| l_{\theta }\right\| _{2}^{2}-2\left\langle l_{\theta }, l_{\theta _{0}} \right\rangle ,\end{aligned}$$
(7)
$$\begin{aligned}&= \left\| l_{\theta }-l_{\theta _0}\right\| _{2}^{2}-\left\| l_{\theta _{0}}\right\| _{2}^{2}. \end{aligned}$$
(8)

Under the uniqueness assumption (CT) given below, this quantity is minimal when \(\theta = \theta _{0}\). Hence, the associated minimum-contrast estimator \(\widehat{\theta }_n\) is defined as any solution of:

$$\begin{aligned} \widehat{\theta }_n=\arg \min _{\theta \in \varTheta }\mathbf P _{n}m_{\theta }. \end{aligned}$$
(9)

Remark 2

One can see from the deconvolution strategy described in Eq. (6) that temporal correlation in the observation or latent process errors could be allowed. The procedure would still be applicable, but the covariance matrix \(\varOmega _{j-1}(\theta _0)\) appearing in the CLT no longer has an analytic expression in this case, since the Fourier deconvolution argument no longer applies.

We refer the reader to Dedecker et al. (2007) for the proof that if \(X_i\) is an ergodic process, then the process \(Y_i\), which is the sum of an ergodic process and an i.i.d. noise, is again stationary and ergodic. Furthermore, by the definition of an ergodic process, if \(Y_i\) is ergodic then the couple \(\mathbf{Y }_i=(Y_i, Y_{i+1})\) inherits this property (see Genon-Catalot et al. 2000).

1.2 Asymptotic properties of the contrast estimator

Our proof holds under the following assumptions. For the reader's convenience, we denote by (E) [resp. (C) and (T)] the assumptions used for the existence (resp. consistency and Central Limit Theorem). If the same assumption is needed for two results, for example for the existence and the consistency, it is denoted by (EC).

  • (ECT): The parameter space \(\varTheta \) is a compact subset of \(\mathbb R ^{r}\) and \(\theta _0\) is an element of the interior of \(\varTheta \).

  • (C): (Local dominance): \(\mathbb{E }\left[ \sup _{\theta \in \varTheta }\left| Y_{2}u^{*}_{l_{\theta }}(Y_{1})\right| \right] <\infty \).

  • (CT): The application \(\theta \mapsto \mathbf P m_{\theta }\) admits a unique minimum and its Hessian matrix, denoted by \(V_{\theta }\), is non-singular at \(\theta _0\).

  • (T): (Regularity): We assume that the function \(l_{\theta }\) is twice continuously differentiable w.r.t \(\theta \in \varTheta \) for any \(x\) and measurable w.r.t \(x\) for all \(\theta \) in \(\varTheta \). Additionally, each coordinate of \(\nabla _{\theta }l_{\theta }\) and each coordinate of \(\nabla ^{2}_{\theta }l_{\theta }\) belong to \(\mathbb L _1(\mathbb R )\cap \mathbb L _2(\mathbb R )\) and each coordinate of \(u_{\nabla _{\theta }l_{\theta }}\) and \(u_{\nabla _{\theta }^{2}l_{\theta }}\) have to be integrable as well.

  •       (Moment condition): For some \(\delta >0\) and for \(j \in \left\{ 1,\ldots ,r\right\} \):

    $$\begin{aligned} \mathbb{E }\left[ \left| Y_{2}u^*_{\frac{{\partial }l_{\theta }}{{\partial }\theta _j}}(Y_{1})\right| ^{2+\delta }\right] <\infty . \end{aligned}$$
  •       (Hessian Local dominance): For some neighbourhood \(\fancyscript{U}\) of \(\theta _0\) and for \(j,k \in \left\{ 1,\ldots ,r\right\} \):

    $$\begin{aligned} \mathbb{E }\left[ \sup _{\theta \in \fancyscript{U}}\left| Y_{2}u^{*}_{\frac{{\partial }^2 l_{\theta }}{{\partial }\theta _j{\partial }\theta _k}}(Y_{1})\right| \right] <\infty . \end{aligned}$$

Let us introduce the matrix:

$$\begin{aligned} \varSigma (\theta )=V_{\theta }^{-1} \varOmega (\theta ) V_{\theta }^{-1^{\prime }} \quad \text{ with} \quad \varOmega (\theta )=\varOmega _{0}(\theta )+2 \sum _{j=2}^{+\infty } \varOmega _{j-1}(\theta ), \end{aligned}$$

where \(\varOmega _{0}(\theta )=\mathbb{V }ar\left( \nabla _{\theta }m_{\theta }(\mathbf Y_{1} )\right) \) and \(\varOmega _{j-1}(\theta )={\mathbb{C }ov}\left( \nabla _{\theta }m_{\theta }(\mathbf Y_{1} ),\nabla _{\theta }m_{\theta }(\mathbf Y_{j} )\right) \).

Theorem 1

Under our assumptions, let \(\widehat{\theta }_{n}\) be the minimum-contrast estimator defined by (9). Then:

$$\begin{aligned} \widehat{\theta }_{n} \longrightarrow \theta _{0} \quad \text{ in} \text{ probability} \text{ as} n \rightarrow \infty . \end{aligned}$$

Moreover, if \(\mathbf{Y }_i\) is geometrically ergodic (see Definition 1 in “Appendix A”), then:

$$\begin{aligned} \sqrt{n}(\widehat{\theta }_{n}-\theta _{0})\rightarrow \fancyscript{N}\left( 0,\varSigma (\theta _{0})\right) \quad \text{ in} \text{ law} \text{ as} n \rightarrow \infty . \end{aligned}$$

The following corollary gives expressions of the matrices \(\varOmega (\theta _0)\) and \(V_{\theta _0}\) of Theorem 1 for the practical implementation:

Corollary 1

Under our assumptions, the matrix \(\varOmega (\theta _0)\) is given by:

$$\begin{aligned} \varOmega (\theta _0) = \varOmega _{0}(\theta _0)+ 2\sum _{j=2}^{+\infty } \varOmega _{j-1}(\theta _0), \end{aligned}$$

where:

$$\begin{aligned} \varOmega _{0}(\theta _0)\!=\! 4\mathbb{E }\left[ Y^{2}_{2}\left( u^{*}_{\nabla _{\theta }l_{\theta }}(Y_{1})\right) ^{2}\right] \!-\!4\mathbb{E }\left[ b_{\phi _{0}}(X_1) \left( \nabla _{\theta }l_{\theta }(X_1) \right) \right] \mathbb{E }\left[ b_{\phi _{0}}(X_1) \left( \!\nabla _{\theta }l_{\theta }(X_1)\! \right) \right] ^{\prime }, \end{aligned}$$

and, the covariance terms are given by:

$$\begin{aligned} \varOmega _{j-1}(\theta _0)=4\left[ \tilde{C}_{j-1}-\mathbb{E }\left[ b_{\phi _{0}}(X_1)\left( \nabla _{\theta }l_{\theta }(X_1)\right) \right] \mathbb{E }\left[ b_{\phi _{0}}(X_1)\left( \nabla _{\theta }l_{\theta }(X_1)\right) \right] ^{\prime }\right] , \end{aligned}$$

where \(\tilde{C}_{j-1}=\mathbb{E }\left[ b_{\phi _{0}}(X_1)\left( \nabla _{\theta }l_{\theta }(X_1)\right) \left( b_{\phi _{0}}(X_j)\nabla _{\theta }l_{\theta }(X_j)\right) ^{\prime }\right] \) and the differential \(\nabla _{\theta }l_{\theta }\) is taken at point \(\theta =\theta _0\).

Furthermore, the Hessian matrix \(V_{\theta _0}\) is given by:

$$\begin{aligned} \left( \left[ V_{\theta _0}\right] _{j,k}\right) _{1\le j,k\le r}&= 2\left( \left\langle \frac{{\partial }l_{\theta }}{{\partial }\theta _k}, \frac{{\partial }l_{\theta }}{{\partial }\theta _j}\right\rangle \right) _{j,k} \text{ at} \text{ point} \theta =\theta _0\text{.} \end{aligned}$$

Let us now outline the strategy of the proof; the full proof is given in “Appendix B”. Clearly, the proof of Theorem 1 relies on M-estimator properties and on the deconvolution strategy. The existence of our estimator follows from regularity properties of the function \(l_{\theta }\) and a compactness argument on the parameter space; this is explained in “Appendix B.1”. The core of the proof consists in establishing the asymptotic properties of our estimator. This is done by splitting the proof into two parts: we first give the consistency result in “Appendix B.2” and then the asymptotic normality in “Appendix B.3”. Let us introduce the principal arguments:

The main idea for proving the consistency of an M-estimator comes from the following observation: if \(\mathbf P _{n}m_{\theta }\) converges to \(\mathbf P m_{\theta }\) in probability, and if the true parameter solves the limit minimization problem, then the limit of the argminimum \(\widehat{\theta }_n\) is \(\theta _0\). By using an argument of uniform convergence in probability and by compactness of the parameter space, we show that the argminimum of the limit is the limit of the argminimum. A standard method to prove the uniform convergence is to use the Uniform Law of Large Numbers (see Lemma 1 in “Appendix A”). Combining these arguments with the dominance argument (C) gives the consistency of our estimator, and hence the first part of Theorem 1.

The asymptotic normality follows essentially from a Central Limit Theorem for mixing processes (see Jones 2004). Given the consistency, the proof is based on a moment condition on the Jacobian vector of the function \(m_{\theta }(\mathbf{y })\) and on a local dominance condition on its Hessian matrix. By analogy with likelihood results, one can see these assumptions as a moment condition on the score function and a local dominance condition on the Hessian.

2 Applications

2.1 Contrast estimator for the Gaussian AR(1) model with measurement noise

Consider the following autoregressive process AR(1) with measurement noise:

$$\begin{aligned} \left\{ \begin{array}{ll} Y_i=X_{i}+\varepsilon _{i}\\ X_{i+1}=\phi _0 X_{i}+\eta _{i+1}, \end{array} \right. \end{aligned}$$
(10)

The noises \(\varepsilon _i\) and \(\eta _i\) are supposed to be centered Gaussian random variables with respective variances \(\sigma ^2_{\varepsilon }\) and \(\sigma ^2_0\). We assume that \(\sigma ^2_{\varepsilon }\) is known. Here, the unknown vector of parameters is \(\theta _0=(\phi _0,\sigma ^2_0)\) and, to ensure stationarity and ergodicity of the process \(X_i\), we assume that the parameter \(\phi _0\) satisfies \(|\phi _0|<1\) (see Dedecker et al. 2007). The functions \(b_{\phi }\) and \(l_{\theta }\) are defined by:

$$\begin{aligned}&b_{\phi }(x):\ (x,\theta ) \in (\mathbb{R }\times \varTheta ) \mapsto b_{\phi }(x)=\phi x,\\&l_{\theta }(x):\ (x,\theta ) \in (\mathbb{R }\times \varTheta ) \mapsto l_{\theta }(x)=b_{\phi }(x)f_{\theta }(x)=\frac{\phi }{\sqrt{2\pi \gamma ^{2}}} x \exp \left( -\frac{1}{2\gamma ^{2}}x^{2}\right) \!, \end{aligned}$$

where \(\gamma ^{2}=\frac{\sigma ^{2}}{1-\phi ^{2}}\). The vector of parameters \(\theta \) belongs to the compact subset \(\varTheta \) given by \(\varTheta = [-1+r; 1-r]\times [\sigma ^{2}_{min}; \sigma ^{2}_{max}]\) with \(\sigma ^{2}_{min}\ge \sigma _{\varepsilon }^2+\overline{r}\), where \(r,\, \overline{r},\, \sigma ^{2}_{min}\) and \(\sigma ^{2}_{max}\) are positive real constants. We consider this subset since, by stationarity of \(X_i\), the parameter satisfies \(|\phi |<1\), and by construction the function \(u^{*}_{l_{\theta }}\) is well defined for \(\sigma ^{2}> \sigma _{\varepsilon }^2(1-\phi ^2)\) with \(\phi \in [-1+r; 1-r]\), which is implied by \(\sigma ^{2}>\sigma _{\varepsilon }^2\). The contrast estimator defined in Sect. 1.1 has the following form:

$$\begin{aligned} \widehat{\theta }_n= \arg \min _{\theta \in \varTheta }\left\{ \frac{\phi ^{2}\gamma }{4\sqrt{\pi }}-\sqrt{\frac{2}{\pi }}\frac{\phi \gamma ^{2}}{n(\gamma ^{2}-\sigma ^{2}_{\epsilon })^{3/2}}\sum _{j=1}^{n} Y_{j+1} Y_{j}\exp \left( -\frac{1}{2} \frac{Y^{2}_{j}}{(\gamma ^{2}-\sigma ^{2}_{\epsilon })}\right) \right\} \nonumber \\ \end{aligned}$$
(11)

with \(n\) the number of observations. Theorem 1 applies for \(\theta _0=(0.7, 0.3)\) and the corresponding result for the Gaussian AR(1) model is given in “Appendix C.1”. As already mentioned, Corollary 1 allows us to compute confidence intervals: for all \(i=1,2\):

$$\begin{aligned} \mathbb{P }\left( \hat{\theta }_{n,i}-z_{1-\alpha /2}\sqrt{\frac{\mathbf{e }_{i}^{\prime }\varSigma (\hat{\theta }_{n})\mathbf{e }_{i}}{n}}\le \theta _{0,i}\le \hat{\theta }_{n,i}+z_{1-\alpha /2}\sqrt{\frac{\mathbf{e }_{i}^{\prime }\varSigma (\hat{\theta }_{n})\mathbf{e }_{i}}{n}}\right) \rightarrow 1-\alpha , \end{aligned}$$

as \(n \rightarrow \infty \), where \(z_{1-\alpha /2}\) is the \(1-\alpha /2\) quantile of the standard Gaussian law, \(\theta _{0,i}\) is the \(i{\mathrm{th}}\) coordinate of \(\theta _0\) and \(\mathbf{e }_{i}\) is the \(i{\mathrm{th}}\) vector of the canonical basis of \(\mathbb{R }^2\). The covariance matrix \(\varSigma (\hat{\theta }_{n})\) is computed in Lemma 3 in “Appendix C.1.3”.
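For concreteness, here is a minimal numerical sketch of the minimization in (11); the starting point, the bounds defining \(\varTheta \) and the function names are illustrative choices of ours:

```python
import numpy as np
from scipy.optimize import minimize

def contrast_ar1(theta, Y, sigma2_eps):
    """Empirical contrast of Eq. (11) for the Gaussian AR(1) model with measurement noise."""
    phi, sig2 = theta
    gamma2 = sig2 / (1.0 - phi**2)           # stationary variance of X_i
    d = gamma2 - sigma2_eps                  # positive on the chosen parameter set
    n = len(Y) - 1
    cross = np.sum(Y[1:] * Y[:-1] * np.exp(-0.5 * Y[:-1]**2 / d))
    return (phi**2 * np.sqrt(gamma2) / (4.0 * np.sqrt(np.pi))
            - np.sqrt(2.0 / np.pi) * phi * gamma2 * cross / (n * d**1.5))

def fit_ar1(Y, sigma2_eps=0.1, r=0.01, r_bar=0.01, sig2_max=5.0):
    """Minimize the contrast over Theta = [-1+r, 1-r] x [sigma2_eps + r_bar, sig2_max]."""
    bounds = [(-1.0 + r, 1.0 - r), (sigma2_eps + r_bar, sig2_max)]
    res = minimize(contrast_ar1, x0=np.array([0.5, 0.5]), args=(Y, sigma2_eps),
                   bounds=bounds, method="L-BFGS-B")
    return res.x                             # (phi_hat, sigma2_hat)
```

Once \(\widehat{\theta }_n\) is obtained, the confidence interval above only requires plugging it into the analytic expression of \(\varSigma (\hat{\theta }_{n})\) from Lemma 3.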

2.2 Contrast estimator for the SV model

We consider the following SV model:

$$\begin{aligned} \left\{ \begin{array}{ll} R_{i+1}=\exp \left( \frac{X_{i+1}}{2}\right) \xi _{i+1},\\ X_{i+1}=\phi _{0}X_{i}+\eta _{i+1}, \end{array} \right. \end{aligned}$$
(12)

The noises \(\xi _{i+1}\) and \(\eta _{i+1}\) are two centered Gaussian random variables with variances \(\sigma _{\xi }^{2}\), assumed to be known, and \(\sigma ^{2}_{0}\), assumed to be unknown. We assume that \(|\phi _0|<1\) and we refer the reader to Genon-Catalot et al. (2000) for the mixing properties of this model.

By applying a log-transformation \(Y_{i+1}=\log (R^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\) and \(\varepsilon _{i+1}=\log (\xi ^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\), the log-transform SV model is given by:

$$\begin{aligned} \left\{ \begin{array}{ll} Y_{i+1}=X_{i+1}+\varepsilon _{i+1}\\ X_{i+1}=\phi _{0}X_{i}+\eta _{i+1}, \end{array} \right. \end{aligned}$$
(13)

The Fourier transform of the noise \(\varepsilon _{i+1}\) is given by:

$$\begin{aligned} f^{*}_{\varepsilon }(x)=\frac{1}{\sqrt{\pi }}2^{ix}\varGamma \left( \frac{1}{2}+ix\right) e^{-i\fancyscript{E}x} \end{aligned}$$

where \(\fancyscript{E}=\mathbb{E }[\log (\xi ^{2}_{i+1})]=-1.27\) and \(\mathbb{V }ar[\log (\xi ^{2}_{i+1})]\) = \(\sigma ^{2}_{\varepsilon }=\frac{\pi ^2}{2}\). Here, \(\varGamma \) represents the gamma function given by:

$$\begin{aligned} \varGamma : u\rightarrow \int \limits _{0}^{+\infty }t^{u-1}e^{-t}dt \quad \forall u \in \mathbb C \text{ such} \text{ that} \fancyscript{R}_{e}(u)>0. \end{aligned}$$
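As a side remark, this characteristic function can be evaluated directly, since standard special-function libraries handle the Gamma function with complex argument; a minimal sketch (the helper name is ours):

```python
import numpy as np
from scipy.special import gamma   # accepts complex arguments

E_LOG_CHI2 = -1.27                # script E = E[log(xi^2)], approximate value

def f_eps_star(x):
    """Fourier transform f_eps^*(x) of the centered log-chi-square noise."""
    x = np.asarray(x, dtype=float)
    return (2.0**(1j * x) * gamma(0.5 + 1j * x)
            * np.exp(-1j * E_LOG_CHI2 * x) / np.sqrt(np.pi))

# Sanity check: a characteristic function equals 1 at the origin.
assert np.isclose(f_eps_star(0.0), 1.0)
```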

The vector of parameters \(\theta =(\phi ,\sigma ^2)\) belongs to the compact subset \(\varTheta \) given by \([-1+r; 1-r]\times [ \sigma ^{2}_{min} ; \sigma ^{2}_{max}]\) with \(r,\, \sigma ^{2}_{min}\) and \(\sigma ^{2}_{max}\) positive real constants.

Our contrast estimator (see Sect. 1.1) is given by:

$$\begin{aligned} \widehat{\theta }_n= \arg \min _{\theta \in \varTheta }\left\{ \frac{\phi ^{2}\gamma }{4\sqrt{\pi }}-\frac{2}{n}\sum _{i=1}^{n}Y_{i+1}u^{*}_{l_{\theta }}(Y_i)\right\} , \end{aligned}$$
(14)

with \(u_{l_{\theta }}(y)=\frac{1}{2\sqrt{\pi }}\left( \frac{-i\phi y\gamma ^2\exp \left( \frac{-y^2}{2}\gamma ^2\right) }{\exp \left( -i\fancyscript{E}y\right) 2^{i y}\varGamma \left( \frac{1}{2}+i y\right) }\right) \).

Theorem 1 applies for \(\theta _0=(0.7, 0.3)\) and, by Slutsky's Lemma, we also obtain confidence intervals. We refer the reader to “Appendix C.2” for the proof.
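To make this concrete, a rough numerical sketch of (14) is given below: \(u^{*}_{l_{\theta }}(y)=\int e^{iyz}u_{l_{\theta }}(z)dz\) is approximated by quadrature (as noted in Sect. 2.5.1, it has no explicit form), and the truncation level, starting point, bounds and names are illustrative choices of ours:

```python
import numpy as np
from scipy.special import gamma          # complex-argument Gamma function
from scipy.integrate import quad
from scipy.optimize import minimize

E_LOG_CHI2 = -1.27                       # script E, approximate value

def u_l_theta(z, phi, sig2):
    """Closed-form u_{l_theta}(z) for the log-transformed SV model (formula above)."""
    gamma2 = sig2 / (1.0 - phi**2)                       # stationary variance of X_i
    num = -1j * phi * z * gamma2 * np.exp(-0.5 * gamma2 * z**2)
    den = np.exp(-1j * E_LOG_CHI2 * z) * 2.0**(1j * z) * gamma(0.5 + 1j * z)
    return num / (2.0 * np.sqrt(np.pi) * den)

def u_star(y, phi, sig2, zmax=50.0):
    """Numerical approximation of u*_{l_theta}(y) = int e^{iyz} u_{l_theta}(z) dz.
    zmax must be large enough for the factor exp(-gamma^2 z^2 / 2) to have died out."""
    integrand = lambda z: np.real(np.exp(1j * y * z) * u_l_theta(z, phi, sig2))
    val, _ = quad(integrand, -zmax, zmax, limit=300)     # imaginary part cancels by symmetry
    return val

def contrast_sv(theta, Y):
    """Empirical contrast of Eq. (14)."""
    phi, sig2 = theta
    gamma2 = sig2 / (1.0 - phi**2)
    norm_term = phi**2 * np.sqrt(gamma2) / (4.0 * np.sqrt(np.pi))    # ||l_theta||_2^2
    data_term = np.mean([Y[i + 1] * u_star(Y[i], phi, sig2) for i in range(len(Y) - 1)])
    return norm_term - 2.0 * data_term

# theta_hat = minimize(contrast_sv, x0=[0.5, 0.5], args=(Y,),
#                      bounds=[(-0.99, 0.99), (0.1, 5.0)]).x
```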

2.3 Comparison with other methods

2.3.1 QML estimator

For the SV model, the QML estimator, proposed independently by Harvey et al. (1994), is based on the log-transformed model given in (13). Treating the \(\varepsilon _i\) as if they were Gaussian in the log-transform of the SV model, the Kalman filter (Kalman 1960) can be applied in order to obtain the Quasi-Maximum Likelihood function of \(Y_{1:n}=(Y_{1},\ldots , Y_n)\), where \(n\) is the sample length. For the AR(1) and the log-transform of the SV model, the log-likelihood \(l(\theta )\) is given by:

$$\begin{aligned} l(\theta )=\log f_{\theta }(Y_{1:n})=-\frac{n}{2}\log (2\pi )-\frac{1}{2}\sum _{i=1}^{n}\log F_i-\frac{1}{2}\sum _{i=1}^{n}\frac{\nu _{i}^{2}}{F_i}, \end{aligned}$$

where \(\nu _{i}\) is the one-step ahead prediction error for \(Y_i\), and \(F_i\) is the corresponding mean square error. More precisely, the two quantities are given by:

$$\begin{aligned} \nu _i=(Y_i-\hat{Y}_i^{-}) \quad \text{ and} \quad F_i=\mathbb{V }ar_{\theta }[\nu _i]=P_i^{-}+\sigma ^{2}_{\varepsilon }, \end{aligned}$$

where \(\hat{Y}_i^{-}=\mathbb{E }_{\theta }[Y_i| Y_{1:i-1}]\) is the one-step ahead prediction for \(Y_i\) and \(P_i^{-}=\mathbb{E }_{\theta }[(X_i-\hat{X}_{i}^{-})^{2}]\) is the one-step ahead error variance for \(X_i\).

Hence, the associated estimator of \(\theta _0\) is defined as a solution of:

$$\begin{aligned} \hat{\theta }_n=\arg \max _{\theta \in \varTheta }l(\theta ). \end{aligned}$$
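A compact sketch of this prediction-error decomposition and the resulting QML maximization for the state space (10)/(13) is given below; the zero-mean stationary initialization of the filter and all names are our choices:

```python
import numpy as np

def qml_loglik(theta, Y, sigma2_eps):
    """Gaussian quasi log-likelihood l(theta) computed by the Kalman filter."""
    phi, sig2 = theta
    n = len(Y)
    x_pred, P_pred = 0.0, sig2 / (1.0 - phi**2)   # stationary initialization for X_1
    loglik = -0.5 * n * np.log(2.0 * np.pi)
    for y in Y:
        F = P_pred + sigma2_eps                   # F_i = Var_theta(nu_i)
        nu = y - x_pred                           # one-step ahead prediction error
        loglik -= 0.5 * (np.log(F) + nu**2 / F)
        K = P_pred / F                            # Kalman gain
        x_filt = x_pred + K * nu                  # filtering step
        P_filt = (1.0 - K) * P_pred
        x_pred = phi * x_filt                     # prediction of X_{i+1}
        P_pred = phi**2 * P_filt + sig2
    return loglik

# QML estimate: maximize l(theta), e.g.
# scipy.optimize.minimize(lambda t: -qml_loglik(t, Y, sigma2_eps), x0=[0.5, 0.5],
#                         bounds=[(-0.99, 0.99), (1e-3, 5.0)])
```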

Note that this procedure can be inefficient: the method does not rely on the exact likelihood of \(Y_{1:n}\), and approximating the true log-chi-square density by a normal density can be rather inappropriate (see Fig. 1 below).

Fig. 1
figure 1

Approximation of the \(\log \)-chi-square density (red) by a Gaussian density with mean \(\fancyscript{E}=-1.27\) and variance \(\sigma ^{2}_{\varepsilon }=\frac{\pi ^2}{2}\) (black) (color figure online)

2.3.2 Particle filters estimators: bootstrap, APF and KSAPF

For the particle filters, the vector of parameters \(\theta =(\phi ,\sigma ^2)\) is supposed to be random, following a prior distribution assumed to be known. We propose to use the approach of Kitagawa et al. (2001, Chapter 10, p. 189), in which the parameters are supposed to be time-varying: \(\theta _{i+1}=\theta _{i}+\fancyscript{G}_{i+1}\), where \(\fancyscript{G}_{i+1}\) is a centered Gaussian random variable with a covariance matrix \(Q\) assumed to be known. We then consider the augmented state vector \(\tilde{X}_{i+1}=(X_{i+1}, \theta _{i+1})^{\prime }\), where \(X_{i+1}\) is the hidden state variable and \(\theta _{i+1}\) the unknown vector of parameters. In this paragraph, we use the terminology of the particle filtering method, that is, we call a random variable a particle. The sequential particle estimation of the vector \(\tilde{X}_{i+1}\) consists in a combined estimation of \(X_{i+1}\) and \(\theta _{i+1}\). For initialization, the distribution of \(X_1\) (Footnote 1) conditionally on \(\theta _1\) is given by the stationary density \(f_{\theta _1}\).

For the comparison with our contrast estimator (see Sect. 1.1), we use three methods: the Bootstrap filter, the Auxiliary Particle Filter (APF) and the Kernel Smoothing Auxiliary Particle Filter (KSAPF). We refer the reader to Doucet et al. (2001), Pitt and Shephard (1999) and Liu and West (2001) for a complete review of these methods.
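For illustration, a bare-bones sketch of the Bootstrap filter on the augmented state \((X_i,\theta _i)\) described above could read as follows (written for Gaussian observation noise as in the AR(1) case; the clipping of \(\phi \), the reflection of \(\sigma ^2\) and all names are crude illustrative choices of ours, and the APF and KSAPF refine the proposal and resampling steps):

```python
import numpy as np

def bootstrap_filter_augmented(Y, M, sigma2_eps, Q_diag, prior_sampler, seed=0):
    """Bootstrap filter on the augmented state (X_i, theta_i), with theta_{i+1} = theta_i + G_{i+1}.
    prior_sampler(M, rng) must return M initial particles (phi, sig2) drawn from the prior."""
    rng = np.random.default_rng(seed)
    phi, sig2 = prior_sampler(M, rng)                        # theta_1 particles
    X = rng.normal(0.0, np.sqrt(sig2 / (1.0 - phi**2)))      # X_1 | theta_1 ~ stationary law
    for y in Y:
        # Weight each particle by the observation density y | X ~ N(X, sigma2_eps).
        w = np.exp(-0.5 * (y - X)**2 / sigma2_eps) + 1e-300
        w /= w.sum()
        idx = rng.choice(M, size=M, p=w)                     # multinomial resampling
        X, phi, sig2 = X[idx], phi[idx], sig2[idx]
        # Propagate the augmented state: artificial parameter dynamics, then the hidden state.
        phi = np.clip(phi + rng.normal(0.0, np.sqrt(Q_diag[0]), M), -0.999, 0.999)
        sig2 = np.abs(sig2 + rng.normal(0.0, np.sqrt(Q_diag[1]), M))   # crude positivity fix
        X = phi * X + rng.normal(0.0, np.sqrt(sig2), M)
    return phi.mean(), sig2.mean()                           # point estimates of (phi, sigma^2)
```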

Remark 3

Let us underline some particularities of the combined state and parameter estimation. For the Bootstrap and APF estimators, an important issue concerns the choice of the parameter variance \(Q\), since the parameter is itself unobservable. If one could choose an optimal variance \(Q\), the APF estimator could be a very good estimator, since even with an arbitrary variance the results are acceptable (see Table 4). In practice, \(Q\) is chosen by an empirical optimization. The KSAPF is an enhanced version of the APF and depends on a smoothing factor \(0<h<1\) (see Liu and West 2001). Therefore, the choice of \(h\) is another problem in practice.

A common approach to estimate the vector of parameters is to maximize the likelihood. Nevertheless, for state space models, the main difficulty with the Maximum Likelihood Estimator (MLE) comes from the unobservable character of the state \(x_t\), which makes the computation of the likelihood intractable in practice: the likelihood is only available in the form of a multiple integral, so exact likelihood methods require simulations and therefore have an intensive computational cost. In many cases, the MLE has to be approximated. A popular approach to approximate it consists in using MCMC simulation techniques (see Smith and Roberts 1993; Cappé et al. 2005b). Another approach to approximate the likelihood consists in using particle filtering algorithms. Recently, in Rue et al. (2009), the authors propose an Integrated Nested Laplace Approximation approach to obtain approximations of the likelihood.

In Chopin et al. (2011) the authors propose a sequential \(SMC^{2}\) algorithm which allows an efficient approximation of the complete distribution \(p(x_{0:t}, \theta \vert y_{1:t})\). Their approach is an extension of the Iterated Batch Importance Sampling (IBIS) proposed in Chopin (2002). In Andrieu et al. (2010) the authors develop a general MCMC algorithm that uses the particle filter to approximate the intractable density \(p_{\theta }(y_{1:n})\), combined with an MCMC step that samples from \(p(\theta \vert y_{1:n})\). They show that their PMCMC algorithm admits as stationary density the distribution of interest \(p(x_{0:t}, \theta \vert y_{1:t})\). Other methods exist and we refer the reader to Johansen et al. (2008); Poyiadjis et al. (2011) for more details.

2.4 A simulation study

For the AR(1) and SV models, we sample the trajectory of the \(X_i\) with the parameters \(\phi _0=0.7\) and \(\sigma _0^{2}=0.3\). Conditionally on the trajectory, we sample the variables \(Y_i\) for \(i=1,\ldots ,n\), where \(n\) represents the number of observations. We take \(n=1{,}000\) and \(\sigma ^{2}_{\varepsilon }=0.1\) for the two models. This means that we consider the following model:

$$\begin{aligned} \left\{ \begin{array}{ll} R_{i+1}=\exp \left( \frac{X_{i+1}}{2}\right) \xi _{i+1}^{\beta },\\ X_{i+1}=\phi _{0}X_{i}+\eta _{i+1}, \end{array} \right. \end{aligned}$$

with \(\beta =\frac{1}{\sqrt{5}\pi }\). In this case, the Fourier transform of \(\varepsilon _{i+1}\) is given by: \(f^{*}_{\varepsilon }(y)=\exp \bigl (-i\tilde{\fancyscript{E}}y\bigr )\frac{2^{i\beta y}}{\sqrt{\pi }}\varGamma \left( \frac{1}{2}+i\beta y\right) \) with \(\tilde{\fancyscript{E}}=\beta \fancyscript{E}\) (see “Appendix C.2”).

For the three particle filters, we take a number of particles \(M\) equal to \(5{,}000\). Note that for the Bayesian procedures (Bootstrap, APF and KSAPF), we need a prior on \(\theta \), and only at the first step. The prior for \(\theta _1\) is taken to be the Uniform law and, conditionally on \(\theta _1\), the distribution of \(X_1\) is the stationary law:

$$\begin{aligned} \left\{ \begin{array}{ll} p(\theta _1)=\fancyscript{U}(0.5, 0.9)\times \fancyscript{U}(0.1, 0.4)\\ f_{\theta _1}(X_1)=\fancyscript{N}\left( 0, \frac{\sigma ^{2}_{1}}{1-\phi _1^2}\right) \end{array} \right. \end{aligned}$$

We take \(h=0.1\) for the KSAPF and \(Q=\begin{pmatrix} 0.6\times 10^{-6} &{} 0\\ 0 &{} 0.1\times 10^{-6} \end{pmatrix}\) for the APF and Bootstrap filter.

Remark 4

Note that, in practice, there is no constraint on the parameters for the contrast function, contrary to the particle filters, where we take the stationary law for \(p_{\theta }(X_0)\) and the Uniform law around the true parameters. Hence, we favourably bias the particle filters.

2.5 Numerical results

In the numerical section we compare the different estimators: the QML estimator defined in Sect. 2.3.1, the Bayesian estimators defined in Sect. 2.3.2 and our contrast estimator defined in Sect. 1.1. For the comparison of the computing time, we also compare our contrast estimator with the SIEMLE proposed by Kim et al. [see “Appendix D.1” and Kim and Shephard (1994) for more general details].

2.5.1 Computing time

From a theoretical point of view, the MLE is asymptotically efficient. However, in practice, since the states \((X_{1},\ldots , X_{n})\) are unobservable and the SV model is non-Gaussian, the likelihood is intractable. We have to use numerical methods to approximate it. In this section, we illustrate the SIEMLE, which consists in approximating the likelihood and applying the Expectation-Maximization algorithm introduced by Dempster et al. (1977) to find the parameter \(\theta \).

To illustrate the SIEMLE for the SV model, we run an estimation with a number of observations \(n\) equal to \(1{,}000\). Although the estimation is good, the computing time is very long compared with the other methods (see Tables 1 and 2). This result illustrates the numerical complexity of the SIEMLE (see “Appendix D.1”). Therefore, in the following, we only compare our contrast estimator with the QML and Bayesian estimators. The computing times are reported in Table 1. We can see that our contrast estimator is the fastest for the Gaussian AR(1) model. The QML is the fastest for the SV model since it assumes that the measurement errors are Gaussian, but we show in Figs. 2, 3 and 4 that it is a biased estimator with a large mean square error. For our algorithm, in the Gaussian AR(1) model, the function \(u^{*}_{l_{\theta }}\) has an explicit expression, but for the SV model the function \(u^{*}_{l_{\theta }}\) is approximated numerically since the Fourier transform of the function \(u_{l_{\theta }}\) does not have an explicit form. This explains why our algorithm is slower on the SV model than on the Gaussian AR(1) model (Footnote 2). In spite of this approximation, our contrast estimator is fast and its implementation is straightforward.

Fig. 2
figure 2

Boxplot of \(\phi \). True value: \(\phi _0=0.7\). Top panel Gaussian AR(1) model. Bottom panel SV model

Fig. 3
figure 3

Boxplot of \(\sigma ^2\). True value: \(\sigma ^{2}_0=0.3\). Left Gaussian AR(1) model. Right SV model

Fig. 4
figure 4

MSE computed by Eq. (15). Top panel Gaussian AR(1) model. Bottom panel SV model

2.5.2 Parameter estimates

For the Gaussian AR(1) model, we run \(N=1{,}000\) estimates for each method (QML, APF, KSAPF and Bootstrap filter) and \(N=500\) for the SV model. The number of observations \(n\) is equal to \(1{,}000\) for the two models.

In order to compare the performance of our estimator with the other methods, we compute for each method the Mean Square Error (MSE) defined by:

$$\begin{aligned} MSE=\frac{1}{N}\sum _{j=1}^{N}\left[ (\hat{\phi }_{j}-\phi _0)^{2}+(\hat{\sigma }^{2}_{j}-\sigma ^{2}_0)^{2}\right] . \end{aligned}$$
(15)
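A direct sketch of this computation, with the \(N\) estimates stored in arrays and the true values of the simulation study as defaults:

```python
import numpy as np

def mse(phi_hat, sig2_hat, phi0=0.7, sigma2_0=0.3):
    """Mean Square Error of Eq. (15) over N replications."""
    phi_hat, sig2_hat = np.asarray(phi_hat), np.asarray(sig2_hat)
    return np.mean((phi_hat - phi0)**2 + (sig2_hat - sigma2_0)**2)
```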

We illustrate the different estimates by boxplots (see Figs. 2, 3). We also illustrate in Fig. 4 the MSE of each estimator computed by Eq. (15). We can see that, for the parameter \(\phi _0\), the QML estimator is better for the Gaussian AR(1) model than for the SV model (see Fig. 2). Indeed, the Gaussianity assumption is wrong for the SV model. Moreover, the estimate of \(\sigma ^{2}_0\) by QML is very poor for the two models (see Fig. 3) and the corresponding boxplots have the largest dispersion, meaning that the QML method is not very stable. The Bootstrap, APF and KSAPF also show a large dispersion in their boxplots, in particular for the parameter \(\phi _0\) (see Fig. 2). Besides, the Bootstrap filter is less efficient than the APF and KSAPF. For the Gaussian and SV models, the boxplots of our contrast estimator show that it is the most stable with respect to \(\phi _0\), and we obtain similar results for \(\sigma ^{2}_0\). The MSE is better for the SV model and is the smallest for our contrast estimator.

2.5.3 Confidence interval of the contrast estimator

To illustrate the statistical properties of our contrast estimator, we compute for each model the confidence intervals at confidence level \(1-\alpha =0.95\) for \(N=1\) estimator, and the coverage for \(N=1{,}000\) estimators, with respect to the number of observations. The coverage corresponds to the proportion of times for which the true parameter \(\theta _{0,i}, i=1,2\), belongs to the confidence interval. The results are illustrated in Figs. 5, 6 and 7: for the Gaussian and SV models, the coverage converges to \(95\,\%\) for a small number of observations. As expected, the confidence interval shrinks as the number of observations grows. Note that, of course, an MLE confidence interval would be smaller since the MLE is efficient, but the corresponding computing time would be huge.

Fig. 5
figure 5

Coverage with respect to the number of observations \(n=100\) up to \(5{,}000\) for \(N=1{,}000\) estimators. Top panel Gaussian AR(1) model. Bottom panel SV model (color figure online)

Fig. 6
figure 6

Confidence interval for the parameter \(\phi _0\) with respect to the number of observations \(n=100\) up to 5,000 for \(N=1\) estimator. Top panel Gaussian AR(1) model. Bottom panel SV model

Fig. 7
figure 7

Confidence interval for the parameter \(\sigma ^2_0\) with respect to the number of observations \(n=100\) up to \(5{,}000\) for \(N=1\) estimator. Top panel Gaussian AR(1) model. Bottom panel SV model

2.6 Application to real data

The data consist of daily observations of the FTSE stock price index and the S&P500 stock price index. The series, taken from boursorama.com, are closing prices from January 3, 2004 to January 2, 2007 for the FTSE and the S&P500, leaving a sample of 759 observations for the two series.

The daily prices \(S_i\) are transformed into compounded rate returns centered around their sample mean \(c\) for self-normalization (see Mathieu and Schotman 1998; Ghysels et al. 1996): \(R_i=100\times \log \left( \frac{S_i}{S_{i-1}}\right) -c\). We want to model these data by the SV model defined in (13), leading to:

$$\begin{aligned} Y_i&= \log (R_i^2)-\mathbb{E }[\log (\xi _i^2)]\\&= \log (R_i^2)+1.27 \end{aligned}$$
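A short sketch of this preprocessing step (the function name is ours; S denotes the array of daily closing prices):

```python
import numpy as np

def to_log_squared_returns(S):
    """Turn daily closing prices S_i into the centered series Y_i of the SV model (13)."""
    R = 100.0 * np.diff(np.log(S))        # compounded rate returns (in percent)
    R = R - R.mean()                      # center around the sample mean c
    return np.log(R**2) + 1.27            # Y_i = log(R_i^2) - E[log(xi_i^2)]
```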

These data are represented in Fig. 8.

Fig. 8
figure 8

Top left panel Graph of \(Y_i= \text{ FTSE}\). Top right panel Graph of \(Y_i= \text{ SP}500\). Bottom left panel Autocorrelation of \(Y_i=\text{ FTSE}\). Bottom right panel Autocorrelation of \(Y_i=\text{ SP}500\)

2.6.1 Parameter estimates

In the empirical analysis, we compare the QML, the Bootstrap filter, the APF, the KSAPF and our contrast estimator. The variance of the measurement noise is \(\sigma ^2_{\varepsilon }=\frac{\pi ^2}{2}\), that is, \(\beta \) is equal to \(1\) (see Sect. 2.4). Table 3 summarises the parameter estimates and the computing time for the five methods. For the initialization of the Bayesian procedures, we take the Uniform law for the parameters, \(p(\theta _1)=\fancyscript{U}(0.4, 0.95)\times \fancyscript{U}(0.1, 0.5)\), and the stationary law for the log-volatility process \(X_1\), i.e., \(f_{\theta _1}(X_1)=\fancyscript{N}\biggl (0, \frac{\sigma ^{2}_{1}}{1-\phi _1^2}\biggr )\).

Table 3 Parameter estimates: \(n=1{,}000\) and the number of particles \(M=5{,}000\) for the particle filters

The estimates of \(\phi \) are in full accordance with results reported in previous studies of SV models. This parameter is in general close to 1, which implies persistent logarithmic volatility. We compute the corresponding confidence intervals at level \(5\,\%\) (see Table 4). For the S&P500 and the FTSE, note that the Bootstrap filter and QML estimates do not lie in the confidence interval for either of the two parameters \(\phi \) and \(\sigma ^2\). These results are consistent with the simulations, where we showed that both methods were biased for the SV model (see Sect. 2.5.2). Note also that, as expected, the computing time for the QML is the shortest because it assumes Gaussianity, which is probably not the case here. Apart from the QML, the contrast estimator is the fastest method. The results are presented in Table 3.

Table 4 Confidence interval at level \(5\,\%\)

2.7 Summary and conclusions

In this paper we propose a new method to estimate a hidden stochastic model of the form (1). This method is based on the deconvolution strategy and leads to a consistent and asymptotically normal estimator. We empirically study the performance of our estimator for the Gaussian AR(1) model and the SV model, and we are able to construct confidence intervals (see Figs. 6, 7). As the boxplots in Figs. 2 and 3 show, only the Contrast, APF and KSAPF estimators are comparable. Indeed, the QML and Bootstrap filter estimators are biased and their MSEs are large; in particular, the QML method is the worst estimator (see Fig. 4). One can see that the QML estimator proposed by Harvey et al. is not suitable for the SV model because the approximation of the log-chi-square density by the Gaussian density is not robust (see Fig. 1). Furthermore, if we compare the MSE of the three sequential Bayesian estimators, the KSAPF is the best method. From a Bayesian point of view, it is known that the Bootstrap filter is less efficient than the APF and KSAPF filters since, by using the transition density as the importance density, the propagation step of the particles is made without taking the observations into account (see Doucet et al. 2001).

Among the three estimators (Contrast, APF and KSAPF) which give good results, our estimator outperforms the others in terms of MSE (see Fig. 4). Moreover, as already mentioned, in the combined state and parameter estimation the difficulties are the choice of \(Q,\, h\) and the prior law, since the results depend on these choices. In the numerical section, we have used the stationary law for the variable \(X_1\) and this choice yields good results, but we expect that the behavior of the Bayesian estimation would be worse for another prior. The implementation of the contrast estimator is the easiest, and it leads to confidence intervals with a larger variance than the SIEMLE but at a smaller computing cost, in particular for the Gaussian AR(1) model (see Table 1). Furthermore, the contrast estimator does not require an arbitrary choice of tuning parameters in practice.