Abstract
We study a new parametric approach for particular hidden stochastic models. This method is based on contrast minimization and deconvolution and can be applied, for example, to ecological and financial state space models. After proving consistency and asymptotic normality of the estimator, which leads to asymptotic confidence intervals, we provide a thorough numerical study comparing most of the classical methods used in practice (the Quasi-Maximum Likelihood estimator, the Simulated Expectation-Maximization Likelihood estimator and Bayesian estimators) on the Stochastic Volatility model. We show that our estimator clearly outperforms the Maximum Likelihood Estimator in terms of computing time, and also most of the other methods. We also show that this contrast method is the most robust with respect to non-Gaussianity of the error and does not require any tuning parameter.
1 Introduction
This paper is concerned with the particular hidden stochastic model:
where \((\varepsilon _{i})_{i\ge 1}\) and \((\eta _{i})_{i\ge 1}\) are two independent sequences of independent and identically distributed (i.i.d.) centered random variables with variances \(\sigma ^{2}_{\epsilon }\) and \(\sigma ^{2}_{0}\). It is assumed that the variance \(\sigma ^{2}_{\epsilon }\) is known. The terminology hidden comes from the unobservable character of the process \((X_i)_{i\ge 1}\), since the only available observations are \(Y_{1},\ldots , Y_n\).
The dynamics of the process \(X_i\) are described by a measurable function \(b_{\phi _0}\), which depends on an unknown parameter \(\phi _0\), and by a sequence of i.i.d. centered random variables with unknown variance \(\sigma _0^2\). We denote by \(\theta _0\) the vector of parameters governing the process \(X_i\) and suppose that the model is correctly specified: that is, \(\theta _0\) belongs to the parameter space \(\varTheta \subset \mathbb{R }^{r}\), with \(r \in \mathbb N ^{*}\).
Inference in hidden Markov models is a real challenge and has been studied by many authors (see Cappé et al. 2005a; Doucet et al. 2001; Robert et al. 2000). Chanda (1995) provided an asymptotically normal estimator of the vector of parameters \(\theta _0\) by using modified Yule–Walker equations, but his method assumes that the function \(b_{\phi _0}\) is linear in \(\phi _0\) and \(X_i\), so that the model (1) reduces to an autoregressive model with measurement error.
Recently, in Douc et al. (2011), the authors proposed an efficient estimator of the vector of parameters \(\theta _0\) for a nonlinear function \(b_{\phi _0}\). They prove that their Maximum Likelihood Estimator (MLE) is consistent and asymptotically normal. The main difficulty with their approach comes from the unobservable character of the process \(X_i\), which makes the computation of the exact likelihood intractable in practice: the likelihood is only available in the form of a multiple integral, so exact likelihood methods require simulations and therefore have an intensive computational cost. In many cases, the MLE has to be approximated. A popular approach to approximate the MLE consists in using Markov Chain Monte Carlo (MCMC) simulation techniques. Thanks to the development of these methods, likelihood-based inference has made huge progress and Bayesian estimation has received more attention (see Smith and Roberts 1993). Another method for computing the MLE consists in using the Expectation-Maximization (EM) algorithm proposed by Dempster et al. (1977). Nevertheless, since \(X_i\) is unobservable, this method requires introducing an MCMC step in the Expectation step. Although these methods are used in practice, they are expensive from a computational point of view.
Some authors have proposed Sequential Monte Carlo (SMC) algorithms, known as particle filtering methods, which allow one to approximate the likelihood. The computational cost is reduced by a recursive construction. We refer to the books of Doucet et al. (2001) and Cappé et al. (2005a) for a complete review of these methods.
Particle Markov Chain Monte Carlo (PMCMC) is another method for estimating the model (1). It combines particle filtering methods and MCMC methods to estimate the vector of parameters \(\theta _0\). From a computational point of view, this approach is expensive; we refer the reader to Andrieu et al. (2010) for more details. Peters et al. (2010) propose an adaptive PMCMC method to estimate ecological hidden stochastic models.
We propose here an approach based on M-estimation: it consists in optimizing a well-chosen contrast function (see Van der Vaart 1998, p. 41 for a partial review) combined with a deconvolution strategy. The deconvolution problem is encountered in many statistical situations where the observations are collected with random errors. In this setting, a method for estimating the parameter \(\phi _0\) has been proposed by Comte and Taupin (2001); their estimation procedure is based on a modified least squares minimization. In the same perspective, Dedecker et al. (2011) propose a related procedure based on weighted least squares estimation: their assumptions on the process \(X_i\) are less restrictive than those of Comte and Taupin, and they provide a consistent estimator of the parameter \(\phi _0\) with a parametric rate of convergence in a very general framework. Their general estimator is based on the introduction of a kernel deconvolution density and depends on the choice of a weight function.
The approach proposed here is different: it is not based on weighted least squares estimation, so the choice of a weight function does not arise in this paper. Moreover, it allows us to estimate both the parameters \(\phi _0\) and \(\sigma ^{2}_0\). Our estimation principle relies on the Nadaraya–Watson strategy proposed by Comte et al. (2011) in a nonparametric setting to estimate the function \(b_{\phi }\) as the ratio of an estimate of \(l_{\theta }=b_{\phi }f_{\theta }\) and an estimate of \(f_{\theta }\), where \(f_{\theta }\) represents the invariant density of the \(X_i\). We adapt their approach to a parametric context and suppose that the form of the stationary density \(f_{\theta _0}\) is known up to the unknown parameter \(\theta _0\). Our work is purely parametric, but we go further in this direction by providing an analytical expression of the asymptotic variance matrix \(\varSigma (\hat{\theta }_{n})\), which allows the construction of confidence intervals. Furthermore, this approach is much less greedy from a computational point of view than the MLE, and its implementation is straightforward.
Applications: Applications include epidemiology, meteorology, neuroscience, ecology (see Ionides et al. 2011) and finance (see Johannes et al. 2009). For example, our approach can be applied to the five ecological state space models described in Peters et al. (2010). Although the scope of our method is general, we have chosen to focus on the so-called discrete time Stochastic Volatility (SV) model introduced by Taylor (2005). We also investigate the behavior of our method on the simpler autoregressive AR(1) process with measurement noise, which has been widely studied and on which our method can be more easily understood and compared with other ones. Our procedure allows the estimation of the parameters of a large class of discrete Stochastic Volatility models (ARCH-E model, Vasicek model, Merton model, etc.), which is a real challenge in financial applications.
(i) Gaussian Autoregressive AR(1) with measurement noise: It has the following form:
where \(\varepsilon _{i+1}\) and \(\eta _{i+1}\) are two centered Gaussian random variables with variances \(\sigma _{\epsilon }^{2}\), assumed to be known, and \(\sigma ^{2}_{0}\), assumed to be unknown. Additionally, we assume that \(|\phi _0|<1\), which implies the stationarity and ergodicity of the process \(X_i\) (see Dedecker et al. 2007).
(ii) SV model: It is directly connected to the type of diffusion process used in asset-pricing theory (see Melino and Turnbull 1990):
where \(\xi _{i+1}\) and \(\eta _{i+1} \) are two centered Gaussian random variables with variances \(\sigma _{\xi }^{2}\), assumed to be known and equal to one, and \(\sigma ^{2}_{0}\), assumed to be unknown. The variables \(R_{i+1}\) represent the returns and \(X_{i+1}\) is the log-volatility process.
By applying a log-transformation \(Y_{i+1}=\log (R^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\) and \(\varepsilon _{i+1}=\log (\xi ^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\), the SV model is a particular version of (1). We assume that \(|\phi _0|<1\) and we refer the reader to Genon-Catalot et al. (2000) for the mixing properties of stochastic volatility models.
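To make the transformation concrete, the following sketch simulates the SV model and checks the moments of the centred noise by Monte Carlo. It assumes the standard return equation \(R_{i+1}=\exp (X_{i+1}/2)\,\xi _{i+1}\) with \(\xi _{i+1}\sim \mathcal{N}(0,1)\) (our reading of the displayed model, which is not repeated here); under this form, \(\varepsilon _{i+1}\) has mean zero and variance \(\pi ^2/2\approx 4.93\):

```python
import math
import random

# Sketch of the log-transformation above, assuming the return equation
# R_{i+1} = exp(X_{i+1}/2) * xi_{i+1} with xi_{i+1} ~ N(0,1) (an assumption:
# the displayed model equations are not reproduced in the text). Then
# log(R^2) = X + log(xi^2), and centering by E[log(xi^2)] ≈ -1.27 gives
# Y = X + eps with E[eps] = 0 and Var(eps) = pi^2/2 ≈ 4.93.
random.seed(0)
E_LOG_CHI2 = -0.57721566490 - math.log(2.0)    # E[log(xi^2)] ≈ -1.27
phi0, sig2_0 = 0.7, 0.3
n = 200_000
x = random.gauss(0, math.sqrt(sig2_0 / (1 - phi0 ** 2)))  # stationary start
eps = []
for _ in range(n):
    xi = random.gauss(0, 1)
    r = math.exp(x / 2) * xi                   # return R
    y = math.log(r ** 2) - E_LOG_CHI2          # centred log-squared return
    eps.append(y - x)                          # realised eps = log(xi^2) - E
    x = phi0 * x + random.gauss(0, math.sqrt(sig2_0))
mean = sum(eps) / n
var = sum(e * e for e in eps) / n - mean ** 2
print(round(mean, 3), round(var, 2))
```

The empirical mean and variance of \(\varepsilon \) should be close to \(0\) and \(\pi ^2/2\), matching the values \(\fancyscript{E}\approx -1.27\) and \(\sigma ^{2}_{\varepsilon }=\pi ^2/2\) used later in Sect. 2.2.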
Most of the computational problems stem from the assumption that the innovations of the returns are Gaussian, which translates into a logarithmic chi-square distribution when the model (12) is transformed into a linear state space model. Many authors have ignored this in their implementation, while others use a mixture of Gaussians to approximate the log-chi-square density. For example, in the Quasi-Maximum Likelihood (QML) method implemented by Jacquier et al. (2002) and in the Simulated Expectation-Maximization Likelihood estimator (SIEMLE) proposed by Kim et al. (1994), a mixture of Gaussian distributions is used to approximate the log-chi-square distribution. Harvey (1994) used the Kalman filter to estimate the likelihood of the transformed state space model, hence the model was also assumed to be Gaussian.
Organization of the paper: The first purpose of the paper is to present our estimator and its statistical properties in Sect. 1.1: under weak assumptions, we show that it is a consistent and asymptotically normal estimator (Table 1).
The second purpose of this paper consists in comparing our contrast estimator with different estimators: the QML, the SIEMLE and Bayesian estimators. Section 2 contains the numerical study: in Sect. 2.4 we give the parameter estimates and the comparison with the other methods on simulated data, and Sect. 2.6 contains the study on real data, where we compare our contrast estimator with the other ones on the S&P 500 and FTSE indices. From a computational point of view, we show that the implementation of our estimator is straightforward and that it is faster than the SIEMLE (see Table 2 in Sect. 2.5.1). Besides, we show that our estimator outperforms the QML and Bayesian estimators.
Notations: We denote by \(u^{*}(t)=\int _{}^{}e^{itx}u(x)dx\) the Fourier transform of the function \(u(x)\), and \(\left\langle u,v \right\rangle =\int _{}^{}u(x)\overline{v}(x)dx\) with \(v\overline{v}=|v|^{2}\). We write \(||u||_{2}=\left( \int _{}^{} |u(x)|^{2} dx\right) ^{1/2}\) for the norm of \(u(x)\) on the space of functions \(\mathbb L ^{2}(\mathbb{R })\). By properties of the Fourier transform, we have \((u^{*})^{*}(x)=2\pi u(-x)\) and \(\left\langle u_1,u_2\right\rangle =\frac{1}{2\pi }\left\langle u^{*}_1,u^{*}_2\right\rangle \). The vector of the partial derivatives of \(f\) with respect to (w.r.t.) \(\theta \) is denoted by \(\nabla _{\theta }f\) and the Hessian matrix of \(f\) w.r.t. \(\theta \) is denoted by \(\nabla ^{2}_{\theta }f\). The Euclidean matrix norm, that is, the square root of the sum of the squares of all the elements of a matrix \(A\), is written \(\left\| {A} \right\| \). We denote by \(\mathbf{Y }_i\) the pair \((Y_i,Y_{i+1})\), and \(\mathbf{y }_{i}=(y_{i},y_{i+1})\) is a given realisation of \(\mathbf{Y }_i\).
In the following, \(\mathbb P , \mathbb{E }, \mathbb{V }ar\) and \({\mathbb{C }ov}\) denote respectively the probability \(\mathbb P _{\theta _0}\), the expected value \(\mathbb{E }_{\theta _0}\), the variance \(\mathbb{V }ar_{\theta _0}\) and the covariance \({\mathbb{C }ov}_{\theta _0}\) when the true parameter is \(\theta _0\). Additionally, we write \(\mathbf{P }_n\) (resp. \(\mathbf{P }\)) the empirical expectation (resp. theoretical), that is: for any stochastic variable \(X\): \(\mathbf P _{n}(X) = \frac{1}{n}\sum _{i=1}^{n} X_i\) (resp. \(\mathbf P (X)=\mathbb{E }[X]\)).
1.1 Procedure: contrast estimator
Hereafter, we propose an explicit estimator of the parameter \(\theta _0\). This estimator, called the contrast estimator, is based on the minimization of a suitable function of the observations, usually called a “contrast function”. We refer the reader to Van der Vaart (1998) for a general account of this notion. Furthermore, in this part, we use the contrast function proposed by Comte et al. (2010), that is:
with \(n\) the number of observations and:
where the function \(l_{\theta }\) and \(u_{v}\) are given by:
with \(f_{\theta }\) the invariant density of \(X_i\).
Some assumptions. As our procedure relies on the Fourier deconvolution strategy, in order to construct our estimator we assume that the density of the noise \(\varepsilon _i\), denoted by \(f_\varepsilon \), is fully known, belongs to \(\mathbb L _2(\mathbb R )\), and satisfies \(f^{*}_{\varepsilon }(x)\ne 0\) for all \(x \in \mathbb R \). Furthermore, we assume that the function \(l_{\theta }\) belongs to \(\mathbb L _1(\mathbb R )\cap \mathbb L _2(\mathbb R )\). The function \(u_{l_{\theta }}\) must be integrable.
For the statistical study, the key assumption is that the process \((X_i)_{i \ge 1}\) is stationary and ergodic (see Genon-Catalot et al. 2000 for a definition).
Remark 1
In this paper we consider the situation in which the observation noise variance is known. This assumption, which does not generally hold in practice, is necessary for the identifiability of the model and is therefore a standard assumption for state space models such as (1).
There are some restrictions on the distributions of the observation and process errors in the Nadaraya–Watson approach. It is known that the rate of convergence for estimating the function \(l_{\theta }\) is related to the rate of decrease of \(f^{*}_{\varepsilon }\). In particular, the smoother \(f_{\varepsilon }\), the slower the rate of convergence (the Gaussian, log-chi-square and Cauchy distributions are super-smooth; the Laplace distribution is ordinary smooth). This rate of convergence can be improved by assuming some additional regularity conditions on \(l_{\theta }\). There is a good discussion of this subject in Comte et al. (2010) and Dedecker et al. (2011).
The procedure. Let us explain the choice of the contrast function and how the deconvolution strategy works. Obviously, as the model (1) shows, the \(\mathbf{Y }_i\) are not i.i.d. However, by assumption, they are stationary and ergodic, so the convergence of \(\mathbf P _{n}m_{\theta }\) to \(\mathbf P m_{\theta }=\mathbb{E }\left[ m_{\theta }(\mathbf{Y }_{1})\right] \) as \(n\) tends to infinity is provided by the Ergodic Theorem. Moreover, the limit \(\mathbb{E }\left[ m_{\theta }(\mathbf{Y }_{1})\right] \) of the contrast function can be explicitly computed:
By Eq. (1) and under the independence assumptions of the noise \((\varepsilon _2)\) and \((\eta _2)\), we have:
Using Fubini’s Theorem and Eq. (1), we obtain:
Then,
Under the uniqueness assumption (CT) given below, this quantity is minimal when \(\theta =\theta _{0}\). Hence, the associated minimum-contrast estimator \(\widehat{\theta }_n\) is defined as any solution of:
Remark 2
One can see in the deconvolution strategy described in Eq. (6) that temporal correlation in the observation or latent process errors could be allowed. The procedure would still be applicable, but the covariance matrix \(\varOmega _{j-1}(\theta _0)\) for the CLT no longer has an analytic expression in this case, since the Fourier deconvolution approach does not apply.
We refer the reader to Dedecker et al. (2007) for the proof that if \(X_i\) is an ergodic process, then the process \(Y_i\), which is the sum of an ergodic process and an i.i.d. noise, is again stationary and ergodic. Furthermore, by the definition of an ergodic process, if \(Y_i\) is ergodic then the couple \(\mathbf{Y }_i=(Y_i, Y_{i+1})\) inherits the property (see Genon-Catalot et al. 2000).
1.2 Asymptotic properties of the contrast estimator
Our proof holds under the following assumptions. For the reader's convenience, we denote by (E) [resp. (C) and (T)] the assumptions which serve for the existence (resp. consistency and Central Limit Theorem). If the same assumption is needed for two results, for example for the existence and the consistency, it is denoted by (EC).
-
(ECT): The parameter space \(\varTheta \) is a compact subset of \(\mathbb R ^{r}\) and \(\theta _0\) is an element of the interior of \(\varTheta \).
-
(C): (Local dominance): \(\mathbb{E }\left[ \sup _{\theta \in \varTheta }\left| Y_{2}u^{*}_{l_{\theta }}(Y_{1})\right| \right] <\infty \).
-
(CT): The application \(\theta \mapsto \mathbf P m_{\theta }\) admits a unique minimum and its Hessian matrix, denoted by \(V_{\theta }\), is non-singular at \(\theta _0\).
-
(T): (Regularity): We assume that the function \(l_{\theta }\) is twice continuously differentiable w.r.t. \(\theta \in \varTheta \) for any \(x\) and measurable w.r.t. \(x\) for all \(\theta \) in \(\varTheta \). Additionally, each coordinate of \(\nabla _{\theta }l_{\theta }\) and of \(\nabla ^{2}_{\theta }l_{\theta }\) must belong to \(\mathbb L _1(\mathbb R )\cap \mathbb L _2(\mathbb R )\), and each coordinate of \(u_{\nabla _{\theta }l_{\theta }}\) and \(u_{\nabla _{\theta }^{2}l_{\theta }}\) must be integrable as well.
-
(Moment condition): For some \(\delta >0\) and for \(j \in \left\{ 1,\ldots ,r\right\} \):
$$\begin{aligned} \mathbb{E }\left[ \left| Y_{2}u^*_{\frac{{\partial }l_{\theta }}{{\partial }\theta _j}}(Y_{1})\right| ^{2+\delta }\right] <\infty . \end{aligned}$$
-
(Hessian Local dominance): For some neighbourhood \(\fancyscript{U}\) of \(\theta _0\) and for \(j,k \in \left\{ 1,\ldots ,r\right\} \):
$$\begin{aligned} \mathbb{E }\left[ \sup _{\theta \in \fancyscript{U}}\left| Y_{2}u^{*}_{\frac{{\partial }^2 l_{\theta }}{{\partial }\theta _j{\partial }\theta _k}}(Y_{1})\right| \right] <\infty . \end{aligned}$$
Let us introduce the matrix:
where \(\varOmega _{0}(\theta )=\mathbb{V }ar\left( \nabla _{\theta }m_{\theta }(\mathbf Y_{1} )\right) \) and \(\varOmega _{j-1}(\theta )={\mathbb{C }ov}\left( \nabla _{\theta }m_{\theta }(\mathbf Y_{1} ),\nabla _{\theta }m_{\theta }(\mathbf Y_{j} )\right) \)
Theorem 1
Under our assumptions, let \(\widehat{\theta }_{n}\) be the minimum-contrast estimator defined by (9). Then:
Moreover, if \(\mathbf{Y }_i\) is geometrically ergodic (see Definition 1 in “Appendix A”), then:
The following corollary gives expressions of the matrices \(\varOmega (\theta _0)\) and \(V_{\theta _0}\) of Theorem 1 for the practical implementation:
Corollary 1
Under our assumptions, the matrix \(\varOmega (\theta _0)\) is given by:
where:
and, the covariance terms are given by:
where \(\tilde{C}_{j-1}=\mathbb{E }\left[ b_{\phi _{0}}(X_1)\left( \nabla _{\theta }l_{\theta }(X_1)\right) \left( b_{\phi _{0}}(X_j)\nabla _{\theta }l_{\theta }(X_j)\right) ^{\prime }\right] \) and the differential \(\nabla _{\theta }l_{\theta }\) is taken at point \(\theta =\theta _0\).
Furthermore, the Hessian matrix \(V_{\theta _0}\) is given by:
Let us now state the strategy of the proof; the full proof is given in “Appendix B”. Clearly, the proof of Theorem 1 relies on M-estimator properties and on the deconvolution strategy. The existence of our estimator follows from regularity properties of the function \(l_{\theta }\) and a compactness argument on the parameter space; this is explained in “Appendix B.1”. The key part of the proof consists in establishing the asymptotic properties of our estimator. This is done by splitting the proof into two parts: we first give the consistency result in “Appendix B.2” and then the asymptotic normality in “Appendix B.3”. Let us introduce the principal arguments:
The main idea for proving the consistency of an M-estimator comes from the following observation: if \(\mathbf P _{n}m_{\theta }\) converges to \(\mathbf P m_{\theta }\) in probability, and if the true parameter solves the limit minimization problem, then the limit of the argminimum \(\widehat{\theta }_n\) is \(\theta _0\). By using an argument of uniform convergence in probability and by compactness of the parameter space, we show that the argminimum of the limit is the limit of the argminimum. A standard method to prove the uniform convergence is to use the Uniform Law of Large Numbers (see Lemma 1 in “Appendix A”). Combining these arguments with the dominance argument (C) gives the consistency of our estimator, and hence the first part of Theorem 1.
The asymptotic normality follows essentially from the Central Limit Theorem for mixing processes (see Jones 2004). Thanks to the consistency, the proof is based on a moment condition on the Jacobian vector of the function \(m_{\theta }(\mathbf{y })\) and on a local dominance condition on its Hessian matrix. By analogy with likelihood results, one can see these assumptions as a moment condition on the score function and a local dominance condition on the Hessian.
2 Applications
2.1 Contrast estimator for the Gaussian AR(1) model with measurement noise
Consider the following autoregressive process AR(1) with measurement noise:
The noises \(\varepsilon _i\) and \(\eta _i\) are supposed to be centered Gaussian random variables with variances \(\sigma ^2_{\varepsilon }\) and \(\sigma ^2_0\) respectively. We assume that \(\sigma ^2_{\varepsilon }\) is known. Here, the unknown vector of parameters is \(\theta _0=(\phi _0,\sigma ^2_0)\) and, for the stationarity and ergodicity of the process \(X_i\), we assume that the parameter \(\phi _0\) satisfies \(|\phi _0|<1\) (see Dedecker et al. 2007). The functions \(b_{\phi }\) and \(l_{\theta }\) are defined by:
where \(\gamma ^{2}=\frac{\sigma ^{2}}{1-\phi ^{2}}\). The vector of parameters \(\theta \) belongs to the compact subset \(\varTheta \) given by \(\varTheta = [-1+r; 1-r]\times [\sigma ^{2}_{min}; \sigma ^{2}_{max}]\) with \(\sigma ^{2}_{min}\ge \sigma _{\varepsilon }^2+\overline{r}\), where \(r,\, \overline{r},\, \sigma ^{2}_{min}\) and \(\sigma ^{2}_{max}\) are positive real constants. We consider this subset since, by stationarity of \(X_i\), the parameter satisfies \(|\phi |<1\), and by construction the function \(u^{*}_{l_{\theta }}\) is well defined for \(\sigma ^{2}> \sigma _{\varepsilon }^2(1-\phi ^2)\) with \(\phi \in [-1+r; 1-r]\), which is implied by \(\sigma ^{2}>\sigma _{\varepsilon }^2\). The contrast estimator defined in (1.1) has the following form:
with \(n\) the number of observations. Theorem 1 applies for \(\theta _0=(0.7, 0.3)\) and the corresponding result for the Gaussian AR(1) model is given in “Appendix C.1”. As we already mentioned, Corollary 1 allows to compute confidence intervals: For all \(i=1,2\):
as \(n \rightarrow \infty \) where \(z_{1-\alpha /2}\) is the \(1-\alpha /2\) quantile of the Gaussian law, \(\theta _{0,i}\) is the \(i{\mathrm{th}}\) coordinate of \(\theta _0\) and \(\mathbf{e }_{i}\) is the \(i{\mathrm{th}}\) coordinate of the vector of the canonical basis of \(\mathbb{R }^2\). The covariance matrix \(\varSigma (\hat{\theta }_{n})\) is computed in Lemma 3 in “Appendix C.1.3”.
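As an illustration, the following sketch implements this contrast estimator for the Gaussian AR(1) model on simulated data. Two ingredients are made explicit here under stated assumptions: we take the contrast in the form \(m_{\theta }(y_i,y_{i+1})=\Vert l_{\theta }\Vert _2^{2}-2\,y_{i+1}u^{*}_{l_{\theta }}(y_i)\) (the form of Comte et al. 2010), and, for \(l_{\theta }(x)=\phi x f_{\theta }(x)\) with \(f_{\theta }=\mathcal{N}(0,\gamma ^2)\), Gaussian Fourier calculus gives the closed forms \(\Vert l_{\theta }\Vert _2^{2}=\phi ^{2}\gamma /(4\sqrt{\pi })\) and \(u^{*}_{l_{\theta }}(y)=\phi \gamma ^{2}y\exp \bigl (-y^{2}/(2a^{2})\bigr )/(\sqrt{2\pi }\,a^{3})\) with \(a^{2}=\gamma ^{2}-\sigma _{\varepsilon }^{2}\), well defined exactly under the constraint \(\gamma ^2>\sigma _{\varepsilon }^2\) discussed above. A crude grid search stands in for a proper optimiser:

```python
import math
import random

# Sketch (not the paper's code) of the contrast estimator for the Gaussian
# AR(1) model with measurement noise. Assumed contrast:
#   m_theta(y_i, y_{i+1}) = ||l_theta||_2^2 - 2*y_{i+1}*u*_{l_theta}(y_i),
# with the closed forms (a^2 = gamma^2 - sigma_eps^2 > 0):
#   ||l_theta||_2^2 = phi^2*gamma/(4*sqrt(pi)),
#   u*_{l_theta}(y) = phi*gamma^2*y*exp(-y^2/(2*a^2))/(sqrt(2*pi)*a^3).
random.seed(1)
phi0, sig2_0, sig2_eps = 0.7, 0.3, 0.1       # true parameters of Sect. 2.4
n = 5000
x = random.gauss(0, math.sqrt(sig2_0 / (1 - phi0 ** 2)))  # stationary start
y = []
for _ in range(n):
    y.append(x + random.gauss(0, math.sqrt(sig2_eps)))    # Y_i = X_i + eps_i
    x = phi0 * x + random.gauss(0, math.sqrt(sig2_0))     # X_{i+1}

def contrast(phi, sig2):
    """Empirical contrast P_n m_theta for the Gaussian AR(1) model."""
    gam2 = sig2 / (1 - phi ** 2)
    a2 = gam2 - sig2_eps                      # must stay positive on Theta
    c = phi * gam2 / (math.sqrt(2 * math.pi) * a2 ** 1.5)
    s = sum(y[i + 1] * c * y[i] * math.exp(-y[i] ** 2 / (2 * a2))
            for i in range(n - 1))
    norm2 = phi ** 2 * math.sqrt(gam2) / (4 * math.sqrt(math.pi))
    return norm2 - 2 * s / (n - 1)

# crude grid search over a compact Theta (a stand-in for a real optimiser)
grid = [(p / 50, s / 50) for p in range(25, 46) for s in range(8, 23)]
phi_hat, sig2_hat = min(grid, key=lambda t: contrast(*t))
print(phi_hat, sig2_hat)
```

On a simulated path of length \(n=5000\), the grid minimiser lands near the true value \((0.7, 0.3)\), up to the grid resolution and sampling noise.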
2.2 Contrast estimator for the SV model
We consider the following SV model:
The noises \(\xi _{i+1}\) and \(\eta _{i+1} \) are two centered Gaussian random variables with variances \(\sigma _{\xi }^{2}\), assumed to be known, and \(\sigma ^{2}_{0}\), assumed to be unknown. We assume that \(|\phi _0|<1\) and we refer the reader to Genon-Catalot et al. (2000) for the mixing properties of this model.
By applying a log-transformation \(Y_{i+1}=\log (R^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\) and \(\varepsilon _{i+1}=\log (\xi ^{2}_{i+1})-\mathbb{E }[\log (\xi ^{2}_{i+1})]\), the log-transform SV model is given by:
The Fourier transform of the noise \(\varepsilon _{i+1}\) is given by:
where \(\fancyscript{E}=\mathbb{E }[\log (\xi ^{2}_{i+1})]=-1.27\) and \(\mathbb{V }ar[\log (\xi ^{2}_{i+1})]\) = \(\sigma ^{2}_{\varepsilon }=\frac{\pi ^2}{2}\). Here, \(\varGamma \) represents the gamma function given by:
The vector of parameters \(\theta =(\phi ,\sigma ^2)\) belongs to the compact subset \(\varTheta \) given by \([-1+r; 1-r]\times [ \sigma ^{2}_{min} ; \sigma ^{2}_{max}]\) with \(r,\, \sigma ^{2}_{min}\) and \(\sigma ^{2}_{max}\) positive real constants.
Our contrast estimator (1.1) is given by:
with \(u_{l_{\theta }}(y)=\frac{1}{2\sqrt{\pi }}\left( \frac{-i\phi y\gamma ^2\exp \left( \frac{-y^2}{2}\gamma ^2\right) }{\exp \left( -i\fancyscript{E}y\right) 2^{i y}\varGamma \left( \frac{1}{2}+i y\right) }\right) \).
Theorem 1 applies for \(\theta _0=(0.7, 0.3)\) and by Slutsky’s Lemma we also obtain confidence intervals. We refer the reader to “Appendix C.2” for the proof.
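The Fourier transform \(f^{*}_{\varepsilon }\) used above can be checked numerically. The sketch below compares the closed form \(f^{*}_{\varepsilon }(y)=\exp (-i\fancyscript{E}y)\,2^{iy}\varGamma (\tfrac{1}{2}+iy)/\sqrt{\pi }\) against a Monte Carlo estimate of \(\mathbb{E }[e^{iy\varepsilon }]\); since the complex gamma function is not in the Python standard library, a Lanczos approximation is used (an implementation detail of ours, not part of the paper):

```python
import cmath
import math
import random

# Sanity check (sketch) of the noise Fourier transform of this section:
# f*_eps(y) = exp(-i*E*y) * 2^{iy} * Gamma(1/2+iy)/sqrt(pi),
# where E = E[log(xi^2)] = -gamma_EM - log(2) ≈ -1.27. The complex Gamma
# function is computed with a Lanczos approximation (g = 7, 9 coefficients).
_LANCZOS = [0.99999999999980993, 676.5203681218851, -1259.1392167224028,
            771.32342877765313, -176.61502916214059, 12.507343278686905,
            -0.13857109526572012, 9.9843695780195716e-6,
            1.5056327351493116e-7]

def cgamma(z):
    """Complex Gamma function for Re(z) >= 0.5 (Lanczos approximation)."""
    z = z - 1
    x = _LANCZOS[0] + sum(_LANCZOS[i] / (z + i) for i in range(1, 9))
    t = z + 7.5
    return math.sqrt(2 * math.pi) * t ** (z + 0.5) * cmath.exp(-t) * x

E_LOG_CHI2 = -0.5772156649015329 - math.log(2.0)   # E[log(xi^2)] ≈ -1.27

def f_star_eps(y):
    """Closed-form Fourier transform of eps = log(xi^2) - E[log(xi^2)]."""
    return (cmath.exp(-1j * E_LOG_CHI2 * y) * cmath.exp(1j * y * math.log(2))
            * cgamma(0.5 + 1j * y) / math.sqrt(math.pi))

# Monte Carlo estimate of E[exp(i*y*eps)] for comparison
random.seed(3)
y0 = 0.8
mc = sum(cmath.exp(1j * y0 * (math.log(random.gauss(0, 1) ** 2) - E_LOG_CHI2))
         for _ in range(100_000)) / 100_000
print(abs(mc - f_star_eps(y0)))   # small (within Monte Carlo error)
```

Note that \(f^{*}_{\varepsilon }(0)=\varGamma (1/2)/\sqrt{\pi }=1\), as required of a characteristic function, which gives a quick internal check of the Lanczos routine.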
2.3 Comparison with the other methods
2.3.1 QML estimator
For the SV model, the QML estimator, proposed independently by Harvey et al. (1994), is based on the log-transformed model given in (13). Treating the \(\varepsilon _i\) as if they were Gaussian in the log-transform of the SV model, the Kalman filter (Kalman 1960) can be applied in order to obtain the Quasi-Maximum Likelihood function of \(Y_{1:n}=(Y_{1},\ldots , Y_n)\), where \(n\) is the sample length. For the AR(1) model and the log-transform of the SV model, the log-likelihood \(l(\theta )\) is given by:
where \(\nu _{i}\) is the one-step ahead prediction error for \(Y_i\), and \(F_i\) is the corresponding mean square error. More precisely, the two quantities are given by:
where \(\hat{Y}_i^{-}=\mathbb{E }_{\theta }[Y_i| Y_{1:i-1}]\) is the one-step ahead prediction for \(Y_i\) and \(P_i^{-}=\mathbb{E }_{\theta }[(X_i-\hat{X}_{i}^{-})^{2}]\) is the one-step ahead error variance for \(X_i\).
Hence, the associated estimator of \(\theta _0\) is defined as a solution of:
Note that this procedure can be inefficient: the method does not rely on the exact likelihood of \(Y_{1:n}\), and approximating the true log-chi-square density by a normal density can be rather inappropriate (see Fig. 1 below).
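The Kalman recursion behind the QML can be sketched as follows for the linear model \(Y_i=X_i+\varepsilon _i\), \(X_{i+1}=\phi X_i+\eta _{i+1}\): the filter produces the prediction errors \(\nu _i\) and their variances \(F_i\), from which the quasi log-likelihood is accumulated. This is an illustrative stand-in (with a stationary initialisation, our choice), not the paper's implementation:

```python
import math
import random

# Scalar Kalman filter producing the one-step prediction errors nu_i and
# their variances F_i for Y_i = X_i + eps_i, X_{i+1} = phi*X_i + eta_{i+1};
# the quasi log-likelihood l(theta) is the Gaussian prediction-error
# decomposition. The QML estimator maximises this function over theta.
def quasi_loglik(y, phi, sig2, sig2_eps):
    x_pred = 0.0
    p_pred = sig2 / (1 - phi ** 2)     # stationary variance of X_1
    ll = 0.0
    for yi in y:
        f = p_pred + sig2_eps          # F_i: prediction-error variance
        nu = yi - x_pred               # nu_i: one-step prediction error
        ll -= 0.5 * (math.log(2 * math.pi) + math.log(f) + nu * nu / f)
        k = p_pred / f                 # Kalman gain
        x_filt = x_pred + k * nu       # filtered state
        p_filt = (1 - k) * p_pred
        x_pred = phi * x_filt          # predict X_{i+1}
        p_pred = phi ** 2 * p_filt + sig2
    return ll

# quick check on simulated data: the quasi-likelihood prefers the true
# parameters over badly wrong ones
random.seed(2)
phi0, sig2_0, sig2_eps = 0.7, 0.3, 0.1
x = random.gauss(0, math.sqrt(sig2_0 / (1 - phi0 ** 2)))
y = []
for _ in range(2000):
    y.append(x + random.gauss(0, math.sqrt(sig2_eps)))
    x = phi0 * x + random.gauss(0, math.sqrt(sig2_0))
print(quasi_loglik(y, 0.7, 0.3, 0.1) > quasi_loglik(y, 0.2, 0.8, 0.1))
```

For the SV model the same recursion is applied to the log-transformed observations, which is exactly where the Gaussianity of \(\varepsilon _i\) is only an approximation.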
2.3.2 Particle filters estimators: bootstrap, APF and KSAPF
For the particle filters, the vector of parameters \(\theta =(\phi ,\sigma ^2)\) is supposed to be random, with a prior distribution assumed to be known. We propose to use the approach of Kitagawa et al. (2001, chapter 10, p. 189), in which the parameters are supposed to be time-varying: \(\theta _{i+1}=\theta _{i}+\fancyscript{G}_{i+1}\), where \(\fancyscript{G}_{i+1}\) is a centered Gaussian random variable with a variance matrix \(Q\) supposed to be known. Now, we consider the augmented state vector \(\tilde{X}_{i+1}=(X_{i+1}, \theta _{i+1})^{\prime }\), where \(X_{i+1}\) is the hidden state variable and \(\theta _{i+1}\) the unknown vector of parameters. In this paragraph, we use the terminology of the particle filtering method, that is, we call a random variable a particle. The sequential particle estimation of the vector \(\tilde{X}_{i+1}\) consists in a combined estimation of \(X_{i+1}\) and \(\theta _{i+1}\). For the initialisation, the distribution of \(X_1\) conditionally on \(\theta _1\) is given by the stationary density \(f_{\theta _1}\).
For the comparison with our contrast estimator (1.1), we use three methods: the Bootstrap filter, the Auxiliary Particle Filter (APF) and the Kernel Smoothing Auxiliary Particle Filter (KSAPF). We refer the reader to Doucet et al. (2001), Pitt and Shephard (1999) and Liu and West (2001) for a complete review of these methods.
Remark 3
Let us underline a particularity of the combined state and parameter estimation: for the Bootstrap and APF estimators, an important issue concerns the choice of the parameter variance \(Q\), since the parameter is itself unobservable. If one could choose an optimal variance \(Q\), the APF estimator could be a very good estimator, since even with an arbitrary variance the results are acceptable (see Table 4). In practice, \(Q\) is chosen by an empirical optimization. The KSAPF is an enhanced version of the APF and depends on a smoothing factor \(0<h<1\) (see Liu and West 2001). Therefore, the choice of \(h\) is another problem in practice.
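The combined state/parameter Bootstrap filter described above can be sketched as follows for the Gaussian AR(1) model with measurement noise. The particle number, the artificial parameter-walk variance \(Q\) and the uniform prior ranges are arbitrary illustrative choices of ours (the sensitivity to \(Q\) is exactly the issue raised in Remark 3):

```python
import math
import random

# Bootstrap filter on the augmented state (X_i, phi_i, sigma2_i): the
# parameters follow a small artificial random walk of variance q, and at
# each step particles are weighted by the Gaussian observation density
# of Y_i given X_i and resampled.
random.seed(4)
phi0, sig2_0, sig2_eps = 0.7, 0.3, 0.1
n, M = 500, 2000
q = 1e-5                                  # artificial parameter-walk variance

# simulate observations Y_i = X_i + eps_i
x = random.gauss(0, math.sqrt(sig2_0 / (1 - phi0 ** 2)))
obs = []
for _ in range(n):
    obs.append(x + random.gauss(0, math.sqrt(sig2_eps)))
    x = phi0 * x + random.gauss(0, math.sqrt(sig2_0))

# initial particles: uniform prior around the true parameters (as in the
# simulation study) and the stationary law for X_1 given theta_1
parts = []
for _ in range(M):
    phi = random.uniform(0.5, 0.9)
    sig2 = random.uniform(0.1, 0.5)
    xs = random.gauss(0, math.sqrt(sig2 / (1 - phi ** 2)))
    parts.append((xs, phi, sig2))

for yi in obs:
    # weight by the Gaussian observation density and resample (bootstrap)
    w = [math.exp(-(yi - xs) ** 2 / (2 * sig2_eps)) for xs, _, _ in parts]
    parts = random.choices(parts, weights=w, k=M)
    # propagate: parameter random walk, then state transition
    nxt = []
    for xs, phi, sig2 in parts:
        phi = min(max(phi + random.gauss(0, math.sqrt(q)), -0.99), 0.99)
        sig2 = max(sig2 + random.gauss(0, math.sqrt(q)), 1e-4)
        nxt.append((phi * xs + random.gauss(0, math.sqrt(sig2)), phi, sig2))
    parts = nxt

phi_hat = sum(p[1] for p in parts) / M
sig2_hat = sum(p[2] for p in parts) / M
print(round(phi_hat, 2), round(sig2_hat, 2))
```

Because the prior is centred around the true parameters, the posterior means land near \((0.7, 0.3)\); this favourable bias of the particle filters is acknowledged in Remark 4 below.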
A common approach to estimate the vector of parameters is to maximize the likelihood. Nevertheless, for state space models, the main difficulty with the Maximum Likelihood Estimator (MLE) comes from the unobservable character of the state \(x_t\), which makes the computation of the likelihood intractable in practice: the likelihood is only available in the form of a multiple integral, so exact likelihood methods require simulations and therefore have an intensive computational cost. In many cases, the MLE has to be approximated. A popular approach to approximate it consists in using MCMC simulation techniques (see Smith and Roberts 1993; Cappé et al. 2005b). Another approach to approximate the likelihood consists in using particle filtering algorithms. Recently, in Rue et al. (2009), the authors proposed an Integrated Nested Laplace Approximation approach to obtain approximations of the likelihood.
In Chopin et al. (2011) the authors propose a sequential \(SMC^{2}\) algorithm which allows an efficient approximation of the complete distribution \(p(x_{0:t}, \theta \vert y_{1:t})\). Their approach is an extension of the Iterated Batch Importance Sampling (IBIS) proposed in Chopin (2002). In Andrieu et al. (2010) the authors develop a general MCMC algorithm that uses the particle filter to approximate the intractable density \(p_{\theta }(y_{1:n})\), combined with an MCMC step that samples from \(p(\theta \vert y_{1:n})\). They show that their PMCMC algorithm admits the distribution of interest \(p(x_{0:t}, \theta \vert y_{1:t})\) as stationary density. There exist other methods and we refer the reader to Johansen et al. (2008) and Poyiadjis et al. (2011) for more details.
2.4 A simulation study
For the AR(1) and SV models, we sample the trajectory of the \(X_i\) with the parameters \(\phi _0=0.7\) and \(\sigma _0^{2}=0.3\). Conditionally on the trajectory, we sample the variables \(Y_i\) for \(i=1,\ldots ,n\), where \(n\) represents the number of observations. We take \(n=1000\) and \(\sigma ^{2}_{\varepsilon }=0.1\) for the two models. This means that we consider the following model:
with \(\beta =\frac{1}{\sqrt{5}\pi }\). In this case, the Fourier transform of \(\varepsilon _{i+1}\) is given by: \(f^{*}_{\varepsilon }(y)=\exp \bigl (-i\tilde{\fancyscript{E}}y\bigr )\frac{2^{i\beta y}}{\sqrt{\pi }}\varGamma \left( \frac{1}{2}+i\beta y\right) \) with \(\tilde{\fancyscript{E}}=\beta \fancyscript{E}\) (see “Appendix C.2”).
For the three methods, we take a number of particles \(M\) equal to 5000. Note that for the Bayesian procedures (Bootstrap, APF and KSAPF), we need a prior on \(\theta \), and this only at the first step. The prior for \(\theta _1\) is taken to be the uniform law, and conditionally on \(\theta _1\) the distribution of \(X_1\) is the stationary law:
We take \(h=0.1\) for the KSAPF and \(Q=\begin{pmatrix} 0.6.10^{-6} &{} 0\\ 0 &{} 0.1.10^{-6} \end{pmatrix}\) for the APF and Bootstrap filter.
Remark 4
Note that, in practice, there is no constraint on the parameters for the contrast function, contrary to the particle filters, where we take the stationary law for \(p_{\theta }(X_0)\) and the uniform law around the true parameters. Hence, we favourably bias the particle filters.
2.5 Numerical results
In this numerical section we compare the different estimators: the QML estimator defined in Sect. 2.3.1, the Bayesian estimators defined in Sect. 2.3.2 and our contrast estimator defined in Sect. 1.1. For the comparison of the computing time, we also compare our contrast estimator with the SIEMLE proposed by Kim et al. [see “Appendix D.1” and Kim and Shephard (1994) for more general details].
2.5.1 Computing time
From a theoretical point of view, the MLE is asymptotically efficient. However, in practice, since the states \((X_{1},\ldots , X_{n})\) are unobservable and the SV model is non-Gaussian, the likelihood is intractable. We have to use numerical methods to approximate it. In this section, we illustrate the SIEMLE, which consists in approximating the likelihood and applying the Expectation-Maximization algorithm introduced by Dempster et al. (1977) to find the parameter \(\theta \).
To illustrate the SIEMLE for the SV model, we run an estimation with a number of observations \(n\) equal to 1000. Although the estimate is good, the computing time is very long compared with the other methods (see Tables 1 and 2). This result illustrates the numerical complexity of the SIEMLE (see “Appendix D.1”). Therefore, in the following, we only compare our contrast estimator with the QML and Bayesian estimators. The results are illustrated by Fig. 1. Our contrast estimator is the fastest for the Gaussian AR(1) model. The QML is the fastest for the SV model since it assumes that the measurement errors are Gaussian, but we show in Figs. 2, 3 and 4 that it is a biased estimator with a large mean square error. As for our algorithm, the function \(u^{*}_{l_{\theta }}\) has an explicit expression for the Gaussian AR(1) model, whereas for the SV model it must be approximated numerically, since the Fourier transform of the function \(u_{l_{\theta }}\) does not have an explicit form. This explains why our algorithm is slower on the SV model than on the Gaussian AR(1) model.Footnote 2 In spite of this approximation, our contrast estimator is fast and its implementation is straightforward.
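The numerical step just described can be sketched as follows. This is a hedged Python illustration (not the Matlab quadrature we actually use): the integral \(u^{*}_{l_{\theta }}(x)=\frac{1}{2\pi }\int e^{ixy}\frac{l^{*}_{\theta }(-y)}{f^{*}_{\varepsilon }(y)}dy\) is approximated by a Riemann sum on a truncated grid, and the toy transforms below (with purely hypothetical constants \(a>b\)) are chosen so that the exact answer is a known Gaussian density:

```python
import numpy as np

def u_star(x, l_star, f_star_eps, y):
    """Riemann-sum approximation of (1/(2*pi)) * int e^{ixy} l*(-y)/f*_eps(y) dy."""
    dy = y[1] - y[0]
    integrand = np.exp(1j * x * y) * l_star(-y) / f_star_eps(y)
    return (integrand.sum() * dy).real / (2 * np.pi)

# toy check: l*(y) = exp(-a y^2 / 2), f*_eps(y) = exp(-b y^2 / 2) with a > b,
# for which u* is exactly the N(0, a - b) density
a, b = 1.0, 0.3
y = np.linspace(-30.0, 30.0, 120001)
val = u_star(0.5, lambda t: np.exp(-a * t**2 / 2), lambda t: np.exp(-b * t**2 / 2), y)
exact = np.exp(-0.5**2 / (2 * (a - b))) / np.sqrt(2 * np.pi * (a - b))
```

The ratio \(l^{*}_{\theta }/f^{*}_{\varepsilon }\) must decay fast enough for the truncation of the grid to be harmless, which is exactly the trade-off between the smoothness of \(l_{\theta }\) and of the noise density discussed in this paper.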
2.5.2 Parameter estimates
For the AR(1) Gaussian model, we run \(N=1{,}000\) estimates for each method (QML, APF, KSAPF and Bootstrap filter) and \(N=500\) for the SV model. The number of observations \(n\) is equal to \(1{,}000\) for the two models.
In order to compare the performance of our estimator with that of the others, we compute for each method the Mean Square Error (MSE) defined by:
We illustrate the different estimates by boxplots (see Figs. 2, 3). We also illustrate in Fig. 4 the MSE of each estimator computed by Eq. (15). For the parameter \(\phi _0\), the QML estimator performs better on the Gaussian AR(1) model than on the SV model (see Fig. 2); indeed, the Gaussianity assumption is wrong for the SV model. Moreover, the QML estimate of \(\sigma ^{2}_0\) is poor for both models (see Fig. 3) and its boxplots have the largest dispersion, meaning that the QML method is not very stable. The Bootstrap, APF and KSAPF also show a large dispersion in their boxplots, in particular for the parameter \(\phi _0\) (see Fig. 2). Besides, the Bootstrap filter is less efficient than the APF and KSAPF. For both the Gaussian and the SV model, the boxplots of our contrast estimator show that it is the most stable with respect to \(\phi _0\), and we obtain similar results for \(\sigma ^{2}_0\). The MSE is smaller for the SV model and is the smallest for our contrast estimator.
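The MSE of Eq. (15) is a plain componentwise average of squared errors over the \(N\) replicates; a minimal sketch (the array names are hypothetical):

```python
import numpy as np

def mse(estimates, theta0):
    """Componentwise MSE (1/N) * sum_k (theta_hat_k - theta0)^2 over N replicates."""
    est = np.asarray(estimates, dtype=float)
    return np.mean((est - np.asarray(theta0, dtype=float))**2, axis=0)
```

For instance, `mse([[0.7, 0.3], [0.9, 0.1]], [0.8, 0.2])` returns `[0.01, 0.01]`.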
2.5.3 Confidence interval of the contrast estimator
To illustrate the statistical properties of our contrast estimator, we compute, for each model, the confidence intervals at confidence level \(1-\alpha \) equal to \(0.95\) for \(N=1\) estimate and the coverages for \(N=1{,}000\) with respect to the number of observations. The coverage is the proportion of replications for which the true parameter \(\theta _{0,i}, i=1,2\) belongs to the confidence interval. The results are illustrated in Figs. 5, 6 and 7: for the Gaussian and SV models, the coverage converges to \(95\,\%\) already for a small number of observations. As expected, the length of the confidence interval decreases with the number of observations. Note that an MLE confidence interval would of course be smaller, since the MLE is efficient, but the corresponding computing time would be huge.
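The empirical coverage is simply the proportion of replicated intervals that contain the true value. Below is a minimal sketch; the Gaussian-mean example is a hypothetical stand-in for the model-specific intervals derived from Theorem 1:

```python
import numpy as np

def coverage(theta0, lower, upper):
    """Proportion of replicated intervals [lower_k, upper_k] containing theta0."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    return np.mean((lower <= theta0) & (theta0 <= upper), axis=0)

# toy check: nominal 95% intervals for the mean of N(0,1) over N replicates
rng = np.random.default_rng(1)
N, n, z = 1000, 200, 1.96
draws = rng.standard_normal((N, n))
centers = draws.mean(axis=1)
half = z * draws.std(axis=1, ddof=1) / np.sqrt(n)
cov = coverage(0.0, centers - half, centers + half)  # close to 0.95
```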
2.6 Application to real data
The data consist of daily observations of the FTSE and S&P500 stock price indices. The series, taken from boursorama.com, are closing prices from January 3, 2004 to January 2, 2007, leaving a sample of 759 observations for each of the two series.
The daily prices \(S_i\) are transformed into compound returns centered around their sample mean \(c\) for self-normalization (see Mathieu and Schotman 1998; Ghysels et al. 1996): \(R_i=100\times \log \left( \frac{S_i}{S_{i-1}}\right) -c\). We want to model those data by the SV model defined in (13), leading to:
These data are represented in Fig. 8.
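The transformation from prices to centered compound returns is a one-liner; a minimal sketch (the price array is hypothetical):

```python
import numpy as np

def centered_log_returns(prices):
    """R_i = 100 * log(S_i / S_{i-1}) - c, with c the sample mean of the raw returns."""
    raw = 100.0 * np.diff(np.log(np.asarray(prices, dtype=float)))
    return raw - raw.mean()
```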
2.6.1 Parameter estimates
In the empirical analysis, we compare the QML, the Bootstrap filter, the APF, the KSAPF and our contrast estimator. The variance of the measurement noise is \(\sigma ^2_{\varepsilon }=\frac{\pi ^2}{2}\), that is, \(\beta \) is equal to \(1\) (see Sect. 2.4). Table 3 summarises the parameter estimates and the computing time for the five methods. For the initialization of the Bayesian procedures, we take the Uniform law for the parameters, \(p(\theta _1)=\fancyscript{U}(0.4, 0.95)\times \fancyscript{U}(0.1, 0.5)\), and the stationary law for the log-volatility process \(X_1\), i.e., \(f_{\theta _1}(X_1)=\fancyscript{N}\biggl (0, \frac{\sigma ^{2}_{1}}{1-\phi _1^2}\biggr )\).
The estimates of \(\phi \) are in full accordance with results reported in previous studies of SV models. This parameter is in general close to 1, which implies persistent logarithmic volatility. We compute the corresponding confidence intervals at level \(5\,\%\) (see Table 4). For the S&P500 and the FTSE, note that the Bootstrap filter and QML estimates do not fall in the confidence interval for either of the two parameters \(\phi \) and \(\sigma ^2\). These results are consistent with the simulations, where we showed that both methods were biased for the SV model (see Sect. 2.5.2). Note also that, as expected, the computing time for the QML is the shortest because it assumes Gaussianity, which is probably not the case here. Except for the QML, the contrast is the fastest method. The results are presented in Table 3 below.
2.7 Summary and conclusions
In this paper we propose a new method to estimate a hidden stochastic model of the form (1). This method is based on a deconvolution strategy and leads to a consistent and asymptotically normal estimator. We empirically study the performance of our estimator for the Gaussian AR(1) model and the SV model, and we are able to construct confidence intervals (see Figs. 6, 7). As the boxplots in Figs. 2 and 3 show, only the Contrast, APF and KSAPF estimators are comparable. Indeed, the QML and Bootstrap Filter estimators are biased and their MSE are poor; in particular, the QML method is the worst estimator (see Fig. 4). One can see that the QML estimator proposed by Harvey et al. is not suitable for the SV model because the approximation of the log-chi-square density by the Gaussian density is not robust (see Fig. 1). Furthermore, if we compare the MSE of the three Sequential Bayesian estimators, the KSAPF is the best. From a Bayesian point of view, it is known that the Bootstrap filter is less efficient than the APF and KSAPF filters since, by using the transition density as the importance density, the propagation step of the particles is made without taking the observations into account (see Doucet et al. 2001).
Among the three estimators (Contrast, APF and KSAPF) which give good results, our estimator outperforms the others in terms of MSE (see Fig. 4). Moreover, as already mentioned, in combined state and parameter estimation the difficulties are the choice of \(Q,\, h\) and the prior law, since the results depend on these choices. In the numerical section, we used the stationary law for the variable \(X_1\) and this choice yields good results, but we expect that the behaviour of the Bayesian estimators would be worse under another prior. The implementation of the contrast estimator is the easiest and it leads to confidence intervals with a larger variance than the SIEMLE but at a much smaller computing cost, in particular for the AR(1) Gaussian model (see Table 1). Furthermore, the contrast estimator does not require any arbitrary choice of tuning parameter in practice.
Notes
To avoid confusion between the true value \(\theta _0\) and the initial value \(\theta _1\) in the Bayesian algorithms, we start the algorithms with \(i=1\).
We use a quadrature method implemented in Matlab to approximate the Fourier transform of \(u_{l_{\theta }}(y)\). One could also use the FFT, and we expect that the contrast estimator would be faster in that case.
References
Andrieu C, Doucet A, Holenstein R (2010) Particle Markov chain Monte Carlo methods. J R Stat Soc Ser B Stat Methodol 72(3):269–342. doi:10.1111/j.1467-9868.2009.00736.x
Cappé O, Moulines E, Rydén T (2005a) Inference in hidden Markov models. Springer Series in Statistics, Springer, New York, with Randal Douc’s contributions to Chapter 9 and Christian P. Robert’s to Chapters 6, 7 and 13, With Chapter 14 by Gersende Fort, Philippe Soulier and Moulines, and Chapter 15 by Stéphane Boucheron and Elisabeth Gassiat
Chanda KC (1995) Large sample analysis of autoregressive moving-average models with errors in variables. J Time Ser Anal 16(1):1–15. doi:10.1111/j.1467-9892.1995.tb00220.x
Chopin N (2002) A sequential particle filter method for static models. Biometrika 89(3):539–551. doi:10.1093/biomet/89.3.539
Chopin N, Jacob PE, Papaspiliopoulos O (2011) SMC\(^2\): an efficient algorithm for sequential analysis of state-space models. Preprint: arXiv:1101.1528
Comte F, Lacour C (2011) Data driven density estimation in presence of unknown convolution operator. J R Stat Soc Ser B Stat Methodol 73(4):601–627. doi:10.1111/j.1467-9868.2011.00775.x
Comte F, Taupin ML (2001) Semiparametric estimation in the (auto)-regressive \(\beta \)-mixing model with errors-in-variables. Math Methods Stat 10(2):121–160
Comte F, Lacour C, Rozenholc Y (2010) Adaptive estimation of the dynamics of a discrete time stochastic volatility model. J Econometr 154(1):59–73. doi:10.1016/j.jeconom.2009.07.001
Dedecker J, Doukhan P, Lang G, León RJR, Louhichi S, Prieur C (2007) Weak dependence: with examples and applications, vol 190, Lecture notes in statistics. Springer, New York
Dedecker J, Samson A, Taupin ML (2011) Estimation in autoregressive model with measurement noise. Preprint: hal-00591114
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc Ser B 39(1):1–38 (with discussion)
Douc R, Moulines E, Olsson J, van Handel R (2011) Consistency of the maximum likelihood estimator for general hidden Markov models. Ann Statist 39(1):474–513. doi:10.1214/10-AOS834
Doucet A, de Freitas N, Gordon N (eds) (2001) Sequential Monte Carlo methods in practice. Statistics for engineering and information science. Springer, New York
Fan J (1991) Asymptotic normality for deconvolution kernel density estimators. Sankhyā Ser A 53(1):97–110
Genon-Catalot V, Jeantheau T, Larédo C (2000) Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 6(6):1051–1079. doi:10.2307/3318471
Ghysels E, Harvey AC, Renault E (1996) Stochastic volatility. In: Statistical methods in finance, Handbook of statistics, vol 14. North-Holland, Amsterdam, pp 119–191. doi:10.1016/S0169-7161(96)14007-4
Hansen BE, Horowitz JL (1997) Handbook of econometrics, vol 4, Robert F. Engle and Daniel L. McFadden, editors, Elsevier Science B.V., 1994. Econometr Theory 13(01):119–132. URL http://ideas.repec.org/a/cup/etheor/v13y1997i01p119-132_00.html
Harvey A, Ruiz E, Shephard N (1994) Multivariate stochastic variance models. Rev Econ Stud 61(2):247–264
Hayashi F (2000) Econometrics. Princeton University Press, Princeton
Ionides EL, Bhadra A, Atchadé Y, King A (2011) Iterated filtering. Ann Statist 39(3):1776–1802. doi:10.1214/11-AOS886
Jacquier E, Polson NG, Rossi PE (2002) Bayesian analysis of stochastic volatility models. J Bus Econom Statist 20(1):69–87. doi:10.1198/073500102753410408, twentieth anniversary commemorative issue
Johannes MS, Polson NG, Stroud JR (2009) Optimal filtering of jump diffusions: extracting latent states from asset prices. Rev Financial Stud 22(7):2559–2599
Johansen AM, Doucet A, Davy M (2008) Particle methods for maximum likelihood estimation in latent variable models. Stat Comput 18(1):47–57. doi:10.1007/s11222-007-9037-8
Jones GL (2004) On the Markov chain central limit theorem. Probab Surv 1:299–320. doi:10.1214/154957804100000051
Kalman RE (1960) A new approach to linear filtering and prediction problems. Trans ASME J Basic Eng 82(1):35–45. URL http://www.cs.unc.edu/welch/kalman/media/pdf/Kalman1960.pdf
Kim S, Shephard N (1994) Stochastic volatility: likelihood inference and comparison with ARCH models. Economics Papers 3, Economics Group, Nuffield College, University of Oxford. URL http://ideas.repec.org/p/nuf/econwp/0003.html
Liu J, West M (2001) Combined parameter and state estimation in simulation-based filtering. In: Doucet A, de Freitas N, Gordon N (eds) Sequential Monte Carlo methods in practice. Springer, New York
Mathieu RJ, Schotman PC (1998) An empirical application of stochastic volatility models. Rev Econ Stud 13(4):333–360
Melino A, Turnbull SM (1990) Pricing foreign currency options with stochastic volatility. J Econometr 45(1–2):239–265 URL http://ideas.repec.org/a/eee/econom/v45y1990i1-2p239-265.html
Newey WK (1987) Advanced econometrics by Takeshi Amemiya, Harvard University Press, 1986. Econometr Theory 3(01):153–158. URL http://ideas.repec.org/a/cup/etheor/v3y1987i01p153-158_00.html
Newey WK, McFadden D (1994) Large sample estimation and hypothesis testing. In: Handbook of econometrics, vol IV, Handbooks in Econometrics, vol 2, North-Holland, Amsterdam, pp 2111–2245
Peters GW, Hosack GR, Hayes KR (2010) Ecological non-linear state space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC). Preprint: arXiv:1005.2238v1
Pitt MK, Shephard N (1999) Filtering via simulation: auxiliary particle filters. J Am Statist Assoc 94(446):590–599. doi:10.2307/2670179
Poyiadjis G, Doucet A, Singh SS (2011) Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika 98(1):65–80 URL http://ideas.repec.org/a/oup/biomet/v98y2011i1p65-80.html
Robert CP, Rydén T, Titterington DM (2000) Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method. J R Stat Soc Ser B Stat Methodol 62(1):57–75. doi:10.1111/1467-9868.00219
Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B Stat Methodol 71(2):319–392. doi:10.1111/j.1467-9868.2008.00700.x
Smith AFM, Roberts GO (1993) Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J Roy Statist Soc Ser B 55(1):3–23. URL http://links.jstor.org/sici?sici=0035-9246(1993)55:1%3c3:BCVTGS%3e2.0.CO;2-&origin=MSN
Taylor S (2005) Financial returns modelled by the product of two stochastic processes, a study of daily sugar prices vol 1, Oxford University Press, pp 203–226. URL http://eprints.lancs.ac.uk/29900/
Van der Vaart AW (1998) Asymptotic statistics, Cambridge series in statistical and probabilistic mathematics, vol 3. Cambridge University Press, Cambridge
Acknowledgments
I thank my co-director Patricia Reynaud-Bouret for her idea behind this paper and for her help and generosity. I also thank my director Frédéric Patras for his supervision throughout this work, for his careful reading of the paper and for his extensive comments. I would like to thank F. Comte, A. Samson, N. Chopin, M. Miniconi and F. Pelgrin for their suggestions and their interest in this framework.
Appendices
Appendix A: M-estimator
Definition 1
Geometrical ergodic process
Denote by \(Q^n(x,.)\) the transition kernel at step \(n\) of a (discrete-time) stationary Markov chain \((X_n)_n\) started at \(x\) at time \(0\), that is, \(Q^n(x,F) = \mathbb P (X_n \in F | X_0 =x)\). Let \(\pi \) denote the stationary law of \(X_n\) and let \(f\) be any measurable function. We call mixing coefficients \((\beta _n)_n\) the coefficients defined, for each \(n\), by:
where \(\pi (f) = \int f(y) \pi (dy)\). We say that the process is geometrically ergodic if the sequence of mixing coefficients \((\beta _n)_n\) decreases geometrically, that is:
The following results are the main tools for the proof of Theorem 1.
Consider the following quantities:
where \(h_{\theta }(y)\) is a real-valued function from \(\varTheta \times \fancyscript{Y}\) to \(\mathbb{R }\).
Lemma 1
Uniform Law of Large Numbers (ULLN) (see Newey and McFadden 1994 for the proof).
Let \((Y_i)\) be an ergodic stationary process and suppose that:
-
1.
\(h_{\theta }(y)\) is continuous in \(\theta \) for all \(y\) and measurable in \(y\) for all \(\theta \) in the compact subset \(\varTheta \).
-
2.
There exists a function \(s(y)\) (called the dominating function) such that \(\left| h_{\theta }(y)\right| \le s(y)\) for all \(\theta \in \varTheta \) and \(\mathbb{E }[s(Y_1)]<\infty \). Then:
Moreover, \(\mathbf P h_{\theta }\) is a continuous function of \(\theta \).
Proposition 1
(Proposition 7.8 p. 472 in Hayashi (2000). The proof is in Newey (1987) Theorem 4.1.5.)
Suppose that:
-
1.
\(\theta _0\) is in the interior of \(\varTheta \).
-
2.
\(h_{\theta }(y)\) is twice continuously differentiable in \(\theta \) for any \(y\).
-
3.
The Hessian matrix of the application \(\theta \mapsto \mathbf P h_{\theta }\) is non-singular.
-
4.
\(\sqrt{n}\mathbf P _{n}S_{\theta } \rightarrow \fancyscript{N}(0, \varOmega (\theta _0))\) in law as \(n \rightarrow \infty \), with \(\varOmega (\theta _0)\) a positive definite matrix.
-
5.
Local dominance on the Hessian: for some neighbourhood \(\fancyscript{U}\) of \(\theta _0\):
$$\begin{aligned} \mathbb{E }\left[ \sup _{\theta \in \fancyscript{U} }\left\| \nabla _{\theta }^{2}h_{\theta }(Y_{1})\right\| \right] <\infty , \end{aligned}$$
so that, for any consistent estimator \(\hat{\theta }\) of \(\theta _0\), we have \(\mathbf P _{n}H_{\hat{\theta }} \rightarrow \mathbb{E }[\nabla ^{2}_{\theta }h_{\theta }(Y_1)]\) in probability as \(n \rightarrow \infty \).
Then, \(\hat{\theta }\) is asymptotically normal with asymptotic covariance matrix given by:
where the differential \(\nabla ^{2}_{\theta }h_{\theta }(Y_1)\) is taken at point \(\theta =\theta _0\).
Proposition 2
(The proof is in Jones 2004)
Let \((Y_i)\) be an ergodic stationary Markov chain and let \(g: \fancyscript{Y}\, \rightarrow \, \mathbb R \) be a Borel function. Suppose that \((Y_i)\) is geometrically ergodic and \(\mathbb{E }\left[ |g(Y_1)|^{2+\delta }\right] <\infty \) for some \(\delta >0\). Then, as \(n \rightarrow \infty \),
where \(\sigma ^{2}_{g}:=Var\left[ g(Y_{1})\right] +2\sum _{j=1}^{\infty }Cov \left( g(Y_{1}), g(Y_{1+j})\right) <\infty \).
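In practice \(\sigma ^{2}_{g}\) can be estimated by truncating the autocovariance series at some lag \(q\) (cf. Remark 6, where the matrix analogue is truncated at \(q_{trunc}=100\)). A minimal sketch, assuming a scalar sequence \(g(Y_1),\ldots ,g(Y_n)\) stored in an array:

```python
import numpy as np

def long_run_variance(g_vals, q):
    """Truncated estimate of Var[g(Y_1)] + 2 * sum_{j=1}^{q} Cov(g(Y_1), g(Y_{1+j}))."""
    x = np.asarray(g_vals, dtype=float)
    x = x - x.mean()
    n = len(x)
    s = np.dot(x, x) / n                      # lag-0 term: empirical variance
    for j in range(1, q + 1):
        s += 2.0 * np.dot(x[:-j], x[j:]) / n  # lag-j empirical autocovariance
    return s
```

For i.i.d. data the lag terms vanish in expectation and the estimate reduces to the sample variance.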
Appendix B: Proofs of theorem 1
For the reader's convenience we split the proof of Theorem 1 into several parts: in Sect. B.1, we give the proof of the existence of our contrast estimator defined in (1.1). In Sect. B.2, we prove the consistency, that is, the first part of Theorem 1. Then, we prove the asymptotic normality of our estimator in Sect. B.3, that is, the second part of Theorem 1. Sect. B.4 is devoted to Corollary 1. Finally, in Appendix C we prove that Theorem 1 applies to the AR(1) and SV models.
1.1 B.1 Proof of the existence and measurability of the M-estimator
By assumption, the function \(\theta \mapsto \left\| l_{\theta }\right\| _{2}^2\) is continuous. Moreover, \(l^{*}_{\theta }\), and then \(u^{*}_{l_{\theta }}(x)=\frac{1}{2\pi }\int e^{ixy}\frac{l^{*}_{\theta }(-y)}{f^{*}_{\varepsilon }(y)}dy\), are continuous w.r.t \(\theta \). In particular, the function \(m_{\theta }(\mathbf{y }_{i})=\left\| l_{\theta }\right\| _{2}^2-2y_{i+1}u^{*}_{l_{\theta }}(y_{i})\) is continuous w.r.t \(\theta \). Hence, the function \(\mathbf P _{n}m_{\theta }=\frac{1}{n}\sum _{i=1}^{n}m_{\theta }(\mathbf{Y }_i)\) is continuous w.r.t \(\theta \) on the compact subset \(\varTheta \). So, there exists \(\tilde{\theta }\) belonging to \(\varTheta \) such that:
1.2 B.2 Proof of the consistency
By assumption \(l_{\theta }\) is continuous w.r.t \(\theta \) for any \(x\) and measurable w.r.t \(x\) for all \(\theta \) which implies the continuity and the measurability of the function \(\mathbf P _{n}m_{\theta }\) on the compact subset \(\varTheta \). Furthermore, the local dominance assumption (C) implies that \(\mathbb{E }\left[ \sup _{\theta \in \varTheta }\left| m_{\theta }(\mathbf{Y }_i)\right| \right] \) is finite. Indeed,
As \(\left\| l_{\theta }\right\| _{2}^2\) is continuous on the compact subset \(\varTheta \), \(\sup _{\theta \in \varTheta }\left\| l_{\theta }\right\| _{2}^2\) is finite. Therefore, \(\mathbb{E }\left[ \sup _{\theta \in \varTheta }\left| m_{\theta }(\mathbf{Y }_i)\right| \right] \) is finite if \(\mathbb{E }\left[ \sup _{\theta \in \varTheta }\left| Y_{i+1}u^{*}_{l_{\theta }}(Y_i)\right| \right] \) is finite. The ULLN (Lemma 1) then gives the uniform convergence in probability of the contrast function: for any \(\varepsilon >0\):
Combining the uniform convergence with Theorem 2.1, p. 2121, Chapter 36, of Newey and McFadden (1994) yields the weak consistency (convergence in probability) of the estimator.\(\square \)
Remark 5
In most applications, we do not know bounds for the true parameter, so the compactness assumption is sometimes restrictive. One can replace it by: \(\theta _0\) is an element of the interior of a convex parameter space \(\varTheta \subset \mathbb R ^{r}\). Then, under our assumptions, compactness excepted, the estimator is still consistent. The proof is the same and the existence is proved by using convex optimization arguments. One can refer to Hayashi (2000) for this discussion.
1.3 B.3 Proof of the asymptotic normality
The proof is based on the following Lemma:
Lemma 2
Suppose that the conditions of the consistency hold. Suppose further that:
-
1.
\(\mathbf{Y }_i\) is geometrically ergodic.
-
2.
(Moment condition): for some \(\delta >0\) and for each \(j\in \left\{ 1,\ldots ,r\right\} \!:\)
$$\begin{aligned} \mathbb{E }\left[ \left| \frac{{\partial }m_{\theta }(\mathbf{Y }_{1})}{{\partial }\theta _j}\right| ^{2+\delta }\right] <\infty . \end{aligned}$$ -
3.
(Hessian Local condition): For some neighbourhood \(\fancyscript{U}\) of \(\theta _0\) and for \(j,k\in \left\{ 1,\ldots , r\right\} \!:\)
$$\begin{aligned} \mathbb{E }\left[ \sup _{\theta \in \fancyscript{U} }\left| \frac{{\partial }^2m_{\theta }(\mathbf{Y }_{1})}{{\partial }\theta _j {\partial }\theta _k}\right| \right] < \infty . \end{aligned}$$Then, \(\widehat{\theta }_{n}\) defined in Eq. (9) is asymptotically normal with asymptotic covariance matrix given by:
where \(V_{\theta _0}\) is the Hessian of the application \(\mathbf P m_{\theta }\) given in Eq. (7).
Proof
The proof follows from Proposition 1 and Proposition 2 and by using the fact that by assumption we have \(\mathbb{E }[\nabla _{\theta }^{2}m_{\theta }(\mathbf{Y }_1)]=\nabla _{\theta }^{2}\mathbb{E }[m_{\theta }(\mathbf{Y }_1)]\). \(\square \)
It just remains to check that the conditions (2) and (3) of Lemma 2 hold under our assumptions (T).
Moment condition: As the function \(l_{\theta }\) is twice continuously differentiable w.r.t \(\theta \), for all \(\mathbf{y }_{i} \, \in \, \mathbb{R }^2\), the application \(m_{\theta }(\mathbf{y }_{i}):\ \theta \in \varTheta \mapsto m_{\theta }(\mathbf{y }_{i})=||l_{\theta }||_{2}^2 - 2y_{i+1}u^*_{l_{\theta }}(y_{i})\) is twice continuously differentiable for all \(\theta \, \in \, \varTheta \) and its first derivatives are given by:
By assumption, for each \(j\in \left\{ 1,\ldots ,r\right\} ,\, \frac{{\partial }l_{\theta }}{{\partial }\theta _j} \in \mathbb L _{1}(\mathbb{R })\), therefore one can apply the Lebesgue Derivation Theorem and Fubini’s Theorem to obtain:
Then, for some \(\delta >0\):
where \(C_{1}\) and \(C_{2}\) are two positive constants. By assumption, the function \(||l_{\theta }||_{2}^2\) is twice continuously differentiable w.r.t \(\theta \). Hence, \(\nabla _{\theta }||l_{\theta }||_{2}^2\) is continuous on the compact subset \(\varTheta \) and the first term of Eq. (17) is finite. The second term is finite by the moment assumption (T).
Hessian Local dominance: For \(j,k \in \left\{ 1,\ldots ,r\right\} ,\, \frac{{\partial }^2 l_{\theta }}{{\partial }\theta _j {\partial }\theta _k} \in \mathbb L _{1}(\mathbb{R })\), the Lebesgue Derivation Theorem gives:
and, for some neighbourhood \(\fancyscript{U}\) of \(\theta _0\):
The first term of the above equation is finite by continuity and compactness arguments, and the second term is finite by the Hessian local dominance assumption (T). \(\square \)
1.4 B.4 Proof of corollary 1
By replacing \(\nabla _{\theta }m_{\theta }(\mathbf{Y }_{1})\) by its expression (16), we have:
Furthermore, by Eq. (1) and by independence of the centered noise \((\varepsilon _2)\) and \((\eta _2)\), we have:
Using Fubini’s Theorem and Eq. (1) we obtain:
Hence,
where
Calculus of the covariance matrix of Corollary (1): By replacing \((\nabla _{\theta }m_{\theta }(Y_{1}))\) by its expression (16) we have:
By using Eq. (18) and the stationary property of the \(Y_i\), one can replace the second term of the above equation by:
Furthermore, by using Eq. (1) we obtain:
By independence of the centered noises, the terms (19), (20) and (21) are equal to zero. Now, using Fubini's Theorem, we have:
Hence, the covariance matrix is given by:
Finally, we obtain: \(\varOmega (\theta )=\varOmega _{0}(\theta )+2\sum _{j=2}^{\infty }\varOmega _{j-1}(\theta )\) with \(\varOmega _{0}(\theta )=4\left( P_2-P_{1}\right) \) and \(\varOmega _{j-1}(\theta )=4\bigl (\tilde{C}_{j-1}-P_{1}\bigr )\).
Expression of the Hessian matrix \(V_{\theta }\) : We have:
For all \(\theta \) in \(\varTheta \), the application \(\theta \mapsto \mathbf P m_{\theta }\) is twice differentiable w.r.t \(\theta \) on the compact subset \(\varTheta \). And for \(j\in \left\{ 1,\ldots ,r\right\} \):
and for \(j,k \in \left\{ 1,\ldots ,r\right\} \):
Appendix C: Proof of the applications
1.1 C.1 The Gaussian AR(1) model with measurement noise
1.1.1 C.1.1 Contrast function
We have:
So that:
and the Fourier Transform of \(l_{\theta }\) is given by:
As \(\varepsilon _i\) is a centered Gaussian noise with variance \(\sigma _{\varepsilon }^2\), we have:
Define:
Then:
where \(G\sim \fancyscript{N}\left( 0,\frac{1}{(\gamma ^2-\sigma _{\varepsilon }^2)} \right) \). We deduce that the function \(m_{\theta }(\mathbf{y }_{i})\) is given by:
Then, the contrast estimator defined in (1.1) is given by:
1.1.2 C.1.2 Checking assumptions of Theorem 1
Mixing properties. If \(|\phi |<1\), the process \(\mathbf{Y }_i\) is geometrically ergodic. For further details, we refer to Dedecker et al. (2007).
Regularity conditions: It remains to prove that the assumptions of Theorem 1 hold. It is easy to see that the only difficulty is to check the moment condition and the local dominance (C)-(T) and the uniqueness assumption (CT). The other assumptions are easy to verify since the function \(l_{\theta }(x)\) is regular in \(\theta \) on \(\varTheta \).
(CT): The limit contrast function \(\displaystyle \mathbf P m_{\theta } : \theta \in \varTheta \mapsto \mathbf P m_{\theta }\) given by:
is differentiable for all \(\theta \) in \(\varTheta \) and \(\nabla _{\theta }\mathbf P m_{\theta }=0_{\mathbb{R }^2}\) if and only if \(\theta \) is equal to \(\theta _0\). More precisely, its first derivatives are given by:
and
The partial derivatives of \(l_{\theta }\) w.r.t \(\theta \) are given by:
For the reader's convenience, let us introduce the following notations:
We rewrite:
where the function \(g_{0,\gamma ^2}\) defines the normal probability density of a centered random variable with variance \(\gamma ^2\). Now, we can use Corollary 1 to compute the Hessian matrix \(V_{\theta _0}\):
with \(X \sim \fancyscript{N}\bigl (0,\frac{\gamma _{0}^2}{2}\bigr )\). By replacing the terms \(a_{1}, a_{2}, b_{1}\) and \(b_{2}\) at the point \(\theta _0\) we obtain:
which has a positive determinant equal to \(0.0956\) at the true value \(\theta _{0}=(0.7,0.3)\). Hence, \(V_{\theta _{0}}\) is non-singular. Furthermore, the strict convexity of the function \(\mathbf P m_{\theta }\) gives that \(\theta _0\) is a minimum.
(C): (Local dominance): We have:
The multivariate normal density of the pair \(\mathbf{Y }_{1}=(Y_{1},Y_{2})\) denoted \(g_{(0, \fancyscript{J}_{\theta _0})}\) is given by:
with:
By definition of the parameter space \(\varTheta \) and as all moments of the pair \(\mathbf{Y }_{1}\) exist, the quantity \(\mathbb{E }\left[ \sup _{\theta \in \varTheta }\left| Y_{2}u^{*}_{l_{\theta }}(Y_{1})\right| \right] \) is finite.
Moment condition (T): We recall that:
The Fourier transforms of the first derivatives are:
and
We can compute the function \(u_{\nabla _{\theta }l_{\theta }}(x)\):
with \(\overline{C}=\frac{1}{\sqrt{2\pi }}\frac{1}{(\gamma ^{2}-\sigma _{\varepsilon }^2)^{1/2}}\) and \(A_1=a_{1}\gamma ^{2}+3a_{2}\gamma ^{4}=\gamma ^2\frac{(1+\phi ^2)}{(1-\phi ^2)}\) and \(A_2=a_{2}\gamma ^{6}=\gamma ^4\frac{\phi ^2}{(1-\phi ^2)}.\) The Fourier transform of the function \(u_{\frac{\partial l_{\theta }}{\partial \phi }}(x)\) is given by:
with \(\varPsi ^{\phi _0}_{1}=\overline{C}\left( \frac{A_1}{(\gamma ^{2}-\sigma ^{2}_{\varepsilon })}-\frac{3A_2}{(\gamma ^{2}-\sigma _{\varepsilon }^2)^{2}}\right) \) and \(\varPsi ^{\phi _0}_{2}=\overline{C}\left( \frac{A_2}{(\gamma ^{2}-\sigma _{\varepsilon }^2)^{3}}\right) .\) By the same arguments, we obtain:
with \(\varPsi ^{\sigma _0^{2}}_{1}=\overline{C}\left( \frac{B_1}{(\gamma ^{2}-\sigma _{\varepsilon }^2)}-\frac{3B_2}{(\gamma ^{2}-\sigma _{\varepsilon }^2)^{2}}\right) , \varPsi ^{\sigma _0^{2}}_{2}=\overline{C}\left( \frac{B_2}{(\gamma ^{2}-\sigma _{\varepsilon }^2)^{3}}\right) , B_1=b_{1}\gamma ^{2}+3b_{2}\gamma ^{4}=\frac{\phi }{(1-\phi ^2)}\) and \(B_2=b_{2}\gamma ^{6}=\gamma ^2\frac{\phi }{2(1-\phi ^2)}.\)
Hence, for some \(\delta >0,\, \mathbb{E }\left[ \left| Y_{2}u^*_{\nabla _{\theta }l_{\theta }}(Y_{1}) \right| ^{2+\delta }\right] \) is finite if:
which is satisfied by the existence of all moments of the pair \(\mathbf{Y }_{1}\). One can check that the Hessian local assumption (T) is also satisfied by the same arguments.
1.1.3 C.1.3 Explicit form of the covariance matrix
Lemma 3
The matrix \(\varSigma (\theta _0)\) in the Gaussian AR(1) model is given by:
with
and
where:
and \(P_2\) is the \(2\times 2\) symmetric matrix multiplied by a factor \(\frac{1}{\sqrt{\pi (\gamma _0^{2}-\sigma ^{2}_{\varepsilon })}}\) and its coefficients \((P^2_{lm})_{1\le l,m \le 2}\) are given by:
with \(\fancyscript{F}\!=\!\frac{1}{(\sigma _{\varepsilon }^2\!+\!\gamma _{0}^{2})^{2}\!-\!\gamma _{0}^{4}\phi _{0}^{2}}\tilde{V}_{1}^{1/2}\tilde{V}_{2}^{1/2},\, \tilde{V}_{1}^{-1}\!=\!\frac{2}{(\gamma _{0}^2-\sigma _{\varepsilon }^2)}\!+\!\left( \frac{\gamma _{0}^2\!+\!\sigma _{\varepsilon }^2}{(\sigma _{\varepsilon }^2+\gamma _{0}^2)^{2}\!-\!\gamma _{0}^4\phi _{0}^{2}}\right) \left( 1\!-\!\frac{\phi _{0}^{2}\gamma _{0}^{4}}{(\gamma _0^2+\sigma _{\varepsilon }^2)^2}\right) , \tilde{V}_{2}=\frac{(\gamma _{0}^2+\sigma _{\varepsilon }^2)^{2}-\phi _{0}^{2}\gamma _{0}^{4}}{(\gamma _{0}^2+\sigma _{\varepsilon }^2)}\), and:
The covariance terms are given by:
with:
where:
Moreover \(\lim _{j\rightarrow \infty }\varOmega _{j-1}(\theta _0)=0_{\fancyscript{M}_{2\times 2}}\).
Remark 6
In practice, for the computation of the covariance matrix \(\varOmega _{j-1}(\theta )\) that appears in Corollary 1, we truncate the infinite sum at \(q_{trunc}=100\).
Proof
Computation of \(\nabla m\)
For all \(x \in \mathbb{R }\), the function \(l_{\theta }(x)\) is twice differentiable w.r.t. \(\theta \) on the compact subset \(\varTheta \). More precisely, note that since \(\gamma ^2 = \sigma ^2/(1-\phi ^2)\), it follows from the definition of the subset \(\varTheta \) that \((\gamma ^2-\sigma _{\varepsilon }^2)>0\). Hence, for all \(\mathbf{y }_{i}\) in \(\mathbb{R }^{2}\), the function \(m_{\theta }(\mathbf{y }_{i}): \theta \in \varTheta \mapsto m_{\theta }(\mathbf{y }_{i})\) is differentiable and:
with:
The functions \(u^{*}_{\frac{{\partial }l_{\theta }}{{\partial }\phi }}(x)\) and \(u^{*}_{\frac{{\partial }l_{\theta }}{{\partial }\sigma ^{2}}}(x)\) are given in Eqs. (28)–(29). Therefore,
Computation of \(P_{1}\): Recall that we have:
The moments \((\mu _{2k})_{k \in \mathbb N }\) of a centered Gaussian random variable with variance \(\sigma ^2\) are given by:
Let \(P(x)\) be a polynomial function of arbitrary degree. We are interested in computing \(\displaystyle \mathbb{E }\left[ P(X) g_{0,\gamma ^{2}}(X)\right] ,\) where \(X \sim \fancyscript{N}(0,\gamma ^2)\). We have:
where \(\displaystyle \bar{X} \sim \fancyscript{N}\left( 0,\frac{\gamma ^2}{2}\right) \).
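The identity above, which absorbs the squared Gaussian density into a Gaussian of halved variance, can be checked by Monte Carlo. The sketch below uses the test polynomial \(P(x)=x^{2}\) and \(\gamma =1\), both arbitrary choices, and compares \(\mathbb{E }[P(X)g_{0,\gamma ^2}(X)]\) to \(\frac{1}{2\sqrt{\pi }\gamma }\mathbb{E }[P(\bar{X})]\):

```python
import math, random

random.seed(0)
gamma = 1.0
P = lambda x: x ** 2  # arbitrary test polynomial

def g(x, var):
    # centered Gaussian density with variance `var`
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

n = 200_000
samples = [random.gauss(0.0, gamma) for _ in range(n)]
# Left-hand side: E[P(X) g_{0,gamma^2}(X)], X ~ N(0, gamma^2), by Monte Carlo.
lhs = sum(P(x) * g(x, gamma ** 2) for x in samples) / n
# Right-hand side: (1 / (2 sqrt(pi) gamma)) * E[P(Xbar)], Xbar ~ N(0, gamma^2/2);
# for P(x) = x^2, E[Xbar^2] = gamma^2 / 2 exactly.
rhs = (gamma ** 2 / 2.0) / (2.0 * math.sqrt(math.pi) * gamma)
```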
Denote by \(B_1\) the constant \(\frac{1}{2\sqrt{\pi }\gamma _0}\). We obtain:
where \(\displaystyle \bar{X} \sim \fancyscript{N}\bigl (0,\frac{\gamma _0^2}{2} \bigr )\). The polynomials \(\left( H_{ij}(x)\right) _{1 \le i,j \le 2}\) are given by:
Lastly, by replacing the terms \(B_1,\, a_1\), and \(a_2\) by their expressions given in Eq. (24) at the point \(\theta _{0}\), we obtain:
Computation of \(P_2\):
We have:
The density of \(\mathbf{Y }_{1}\) is \(g_{(0, \fancyscript{J}_{\theta _0})}\). Then, the product \(g_{(0,\fancyscript{J}_{\theta _0})}\exp \left( -\frac{y^{2}_{1}}{(\gamma _{0}^2-\sigma _{\varepsilon }^2)}\right) \) is equal to:
with \(\tilde{V}_{1}^{-1}=\frac{2}{(\gamma _{0}^2-\sigma _{\varepsilon }^2)}+\left( \frac{\gamma _{0}^2+\sigma _{\varepsilon }^2}{(\sigma _{\varepsilon }^2+\gamma _{0}^2)^{2}-\gamma _{0}^4\phi _{0}^{2}}\right) \left( 1-\frac{\phi _{0}^{2}\gamma _{0}^{4}}{(\gamma _0^2 + \sigma _{\varepsilon }^2)^2}\right) \) and \(\tilde{V}_{2}=\frac{(\gamma _{0}^2+\sigma _{\varepsilon }^2)^{2}-\phi _{0}^{2}\gamma _{0}^{4}}{(\gamma _{0}^2+\sigma _{\varepsilon }^2)}\).
Then, we obtain:
In the following, we set \(\fancyscript{F}=\frac{1}{(\sigma _{\varepsilon }^2+\gamma _{0}^2 )^{2}-\gamma _{0}^4\phi _{0}^{2}}\tilde{V}_{1}^{1/2}\tilde{V}_{2}^{1/2}\). Now, we can compute the moments:
In a similar manner, we have:
and
By replacing all the terms of Eq. (31) we obtain:
and
and
Computation of \({\mathbb{C }ov}\left( \nabla _{\theta }m_{\theta }(Y_{1}),\nabla _{\theta }m_{\theta }(Y_{j}) \right) \): We want to compute:
Since we have already computed the terms of the matrix \(P_1\), it remains to compute the terms of the covariance matrix \(\tilde{C}_{j-1}\) given by:
For all \(j>1\), the pair \((X_1,X_j)\) has a multivariate normal density \(g_{(0, \fancyscript{W})}\) where \(\fancyscript{W}\) is given by:
The density of the pair \((X_1,X_j)\) is:
We start by computing:
We have:
For all \(j>1\), we define:
We can rewrite:
So, by Fubini’s Theorem, we obtain:
where \(\displaystyle G \sim \fancyscript{N}\left( \frac{\phi _{0}^jx_j}{(2-\phi _{0}^{2j})},\fancyscript{V} \right) \). Thus, \(\mathbb{E }[G^2] = \fancyscript{V} + \left( \frac{\phi _{0}^jx_j}{(2-\phi _{0}^{2j})}\right) ^2\). We obtain:
where \(G_j \sim \fancyscript{N}\left( 0,V_j \right) \). Additionally, we have:
Now, we are interested in \(\mathbb{E }\left[ X_1^4 X_j^4 \exp \left( -\frac{1}{2\gamma _{0}^2}(X_1^2+X_j^2) \right) \right] \). In a similar manner, we obtain:
where \(G \sim \fancyscript{N}\left( \frac{\phi _{0}^jx_j}{(2-\phi _{0}^{2j})},\fancyscript{V} \right) \). We use the fact that the moments of a random variable \(X\sim \fancyscript{N}(\mu ,v)\) are:
By replacing \(\mathbb{E }[G^4]\) in Eq. (37), we have:
For all \(j>1\), the matrix \(\tilde{C}_{j-1}\) is given by:
where the coefficients \(\tilde{c}_1(j),\, \tilde{c}_2(j)\), and \(\tilde{c}_3(j)\) are given by (35), (36) and (38).
Finally, by replacing the terms \(a_1,\, a_2,\, b_1\) and \(b_2\), the matrix \(\tilde{C}_{j-1}\) is equal to:
where \(A=\frac{\phi _{0}^2}{2\pi \gamma _{0}^2(1-\phi _0^2)^2}\).
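The Gaussian moment formulas used in the derivation above can be verified numerically; for instance, the standard fact \(\mathbb{E }[X^{4}]=\mu ^{4}+6\mu ^{2}v+3v^{2}\) for \(X\sim \fancyscript{N}(\mu ,v)\). The values \(\mu =0.3\) and \(v=2\) below are arbitrary test values:

```python
import math, random

random.seed(1)
mu, v = 0.3, 2.0  # hypothetical mean and variance for the check
n = 300_000
# Monte Carlo estimate of E[X^4] for X ~ N(mu, v)
m4 = sum(random.gauss(mu, math.sqrt(v)) ** 4 for _ in range(n)) / n
# Closed form: E[X^4] = mu^4 + 6 mu^2 v + 3 v^2
closed_form = mu ** 4 + 6 * mu ** 2 * v + 3 * v ** 2
```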
Asymptotic behaviour of the covariance matrix \(\varOmega _{j-1}(\theta _0)\): By the stationarity assumption \(|\phi _{0}|<1\), the limits of the following terms are:
and
Therefore,
We obtain:
We conclude that the covariance between the two vectors \(\nabla _{\theta }m_{\theta _0}(Y_{1})\) and \(\nabla _{\theta }m_{\theta _0}(Y_{j})\) vanishes as the lag between the two observations \(Y_{1}\) and \(Y_{j}\) goes to infinity.
Computation of \(V_{\theta _{0}}\): The Hessian matrix \(V_{\theta _0}\) is given in Eq. (27).
C.2 The SV model
C.2.1 Contrast function
The \(\mathbb L _2\)-norm and the Fourier transform of the function \(l_{\theta }\) are the same as in the Gaussian AR(1) model. The only difference is the law of the measurement noise, which is a log-chi-square for the log-transformed SV model.
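For reference, the mean and variance of \(\log (X^{2})\) for standard Gaussian \(X\) have classical closed forms, \(-\gamma _{E}-\log 2\approx -1.2704\) and \(\pi ^{2}/2\approx 4.9348\) (standard facts about the log-chi-square law, not taken from this paper). A quick Monte Carlo check:

```python
import math, random

random.seed(2)
n = 300_000
# Draws of log(X^2) with X standard Gaussian, i.e. log chi-square(1)
draws = [math.log(random.gauss(0.0, 1.0) ** 2) for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
# Known closed forms: E[log chi2_1] = -gamma_E - log 2, Var[log chi2_1] = pi^2 / 2
e_exact = -0.5772156649015329 - math.log(2.0)  # ~ -1.2704
v_exact = math.pi ** 2 / 2.0                   # ~  4.9348
```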
Consider the random variable \(\varepsilon =\beta \log (X^2)-\tilde{\fancyscript{E}}\), where \(\tilde{\fancyscript{E}}=\beta \mathbb{E }[\log (X^{2})]\), so that \(\varepsilon \) is centered, and where \(X\) is a standard Gaussian random variable. The Fourier transform of \(\varepsilon \) is given by:
By a change of variable \(z=\frac{x^2}{2}\), one has:
and the expression (14) of the contrast function follows with \(u_{l_{\theta }}(y)=\frac{1}{2\sqrt{\pi }} \left( \frac{-i\phi y\gamma ^2\exp \left( \frac{-y^2}{2}\gamma ^2\right) }{\exp \left( -i\tilde{\fancyscript{E}}y\right) 2^{i\beta y}\varGamma \left( \frac{1}{2}+i\beta y\right) }\right) \).
C.2.2 Checking the assumptions of Theorem 1
Regularity conditions: The proof is essentially the same as in the Gaussian case since the functions \(l_{\theta }(x)\) and \(\mathbf P m_{\theta }\) are the same. We only need to check assumptions (C) and (T). These are satisfied since Fan (1991) showed that the noises \(\varepsilon _i\) have a Fourier transform \(f^{*}_{\varepsilon }\) which satisfies:
which means that \(f_{\varepsilon }\) is super-smooth in Fan's terminology. Furthermore, by the compactness of the parameter space \(\varTheta \), and since the functions \(l^{*}_{\theta }\) and, for \(j,k \in \left\{ 1, 2\right\} \), \((\frac{{\partial }l_{\theta }}{{\partial }\theta _j })^*\) and \((\frac{{\partial }^{2} l_{\theta }}{{\partial }\theta _j {\partial }\theta _k})^{*}\) are all of the form \(C_1(\theta )P(x) \exp \left( -C_2(\theta )x^{2}\right) \), where \(C_1(\theta )\) and \(C_2(\theta )\) are two constants well defined on the parameter space \(\varTheta \) with \(C_2(\theta )>0\), we obtain:
C.2.3 Expression of the covariance matrix
As the functions \(l_{\theta }(x)\) and \(\mathbf P m_{\theta }\) are the same for the two models, the expressions of the matrices \(V_{\theta _0}\) and \(\varOmega _{j}(\theta _0)\) are given in Lemma 3. We only need an estimator of \(P_{2}=\mathbb{E }[Y^{2}_{2}(u^{*}_{\nabla l_{\theta }}(Y_{1}))^2]\), since we can only approximate \(u^{*}_{\nabla l_{\theta }}(y)\). A natural and consistent estimator of \(P_2\) is given by:
Remark 7
In some models, the covariance matrix \(\varOmega _{j}(\hat{\theta }_n)\) cannot be computed explicitly. We refer the reader to Hayashi (2000), Chapter 6, Section 6.6, p. 408 for this case.
Appendix D: EM algorithm
We first refer to Dempster et al. (1977) for general details on the EM algorithm. The EM algorithm is an iterative procedure for maximizing the log-likelihood \(l(\theta )=\log (f_{\theta }(Y_{1:n}))\). Suppose that after the \(k\)th iteration the estimate of \(\theta \) is \(\theta _k\). Since the objective is to maximize \(l(\theta )\), we want to compute an updated estimate \(\theta \) such that:
Hidden variables can be introduced to make the ML estimation tractable. Denote the hidden random variables by \(U_{1:n}\) and a given realization by \(u_{1:n}\). The total probability \(f_{\theta }(Y_{1:n})\) can be written as:
Hence,
In going from Eq. (40) to (41) we use Jensen's inequality: \(\log \sum _{i=1}^{n} \lambda _i x_i \ge \sum _{i=1}^{n} \lambda _i \log (x_i )\) for constants \(\lambda _i \ge 0\) with \(\sum _{i=1}^{n} \lambda _i=1\). In going from Eq. (41) to (42) we use the fact that \(\sum _{u_{1:n}}^{} f_{\theta _k}(u_{1:n}\vert Y_{1:n})=1\). Hence,
The function \(\fancyscript{L}(\theta , \theta _k)\) is bounded above by the log-likelihood function \(l(\theta )\), and the two are equal when \(\theta =\theta _k\). Consequently, any \(\theta \) which increases \(\fancyscript{L}(\theta , \theta _k)\) also increases \(l(\theta )\). The EM algorithm selects the \(\theta \) that maximizes \(\fancyscript{L}(\theta , \theta _k)\); we denote this updated value by \(\theta _{k+1}\). Thus,
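The monotonicity property just described, that each EM update can only increase \(l(\theta )\), can be illustrated on a toy model. The sketch below runs EM on a hypothetical two-component Gaussian mixture (not the SV model) and records the log-likelihood at each iteration:

```python
import math, random

random.seed(3)
# Synthetic data from a hypothetical two-component Gaussian mixture.
data = ([random.gauss(-2.0, 1.0) for _ in range(150)]
        + [random.gauss(2.0, 1.0) for _ in range(150)])

def logpdf(x, m, v):
    # log-density of N(m, v)
    return -0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)

def loglik(w, m1, m2, v):
    return sum(math.log(w * math.exp(logpdf(x, m1, v))
                        + (1.0 - w) * math.exp(logpdf(x, m2, v))) for x in data)

w, m1, m2, v = 0.5, -1.0, 1.0, 1.0   # deliberately poor starting point
lls = [loglik(w, m1, m2, v)]
for _ in range(20):
    # E-step: responsibilities r_i = P(component 1 | x_i, theta_k)
    r = [w * math.exp(logpdf(x, m1, v))
         / (w * math.exp(logpdf(x, m1, v)) + (1.0 - w) * math.exp(logpdf(x, m2, v)))
         for x in data]
    # M-step: maximize the lower bound L(theta, theta_k) in closed form
    n1 = sum(r)
    w = n1 / len(data)
    m1 = sum(ri * x for ri, x in zip(r, data)) / n1
    m2 = sum((1.0 - ri) * x for ri, x in zip(r, data)) / (len(data) - n1)
    v = sum(ri * (x - m1) ** 2 + (1.0 - ri) * (x - m2) ** 2
            for ri, x in zip(r, data)) / len(data)
    lls.append(loglik(w, m1, m2, v))
# By the EM argument above, lls is (numerically) nondecreasing.
```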
D.1 Simulated expectation maximization estimator
Here, we describe the SIEMLE proposed by Kim et al. (1994) for the SV model. These authors retain the linear log-transform model given in (13); however, instead of approximating the log-chi-square distribution of \(\varepsilon _i\) with a single Gaussian distribution, they approximate \(\varepsilon _i\) by a mixture of seven Gaussians. The distribution of the noise is given by:
where \(g_{(m,v)}(x)\) denotes the Gaussian density with mean \(m\) and variance \(v\), \(f_{\varepsilon _i|s_i=j}(x)\) is a Gaussian density conditional on an indicator variable \(s_i\) at time \(i\), and the \(q_j,\, j=1,\ldots , 7\), are given weights attached to each component, with \(\sum _{j=1}^{7}q_j=1\). Note that, most importantly, given the indicator variable \(s_i\) at each time \(i\), the log-transform model is Gaussian. That is:
Then, conditionally on the indicator variables, the SV model becomes a Gaussian state-space model and the Kalman filter can be used in the SIEMLE to compute the log-likelihood function given by:
with \(\nu _i=(Y_i-\hat{Y}_i^{-}-m_{s_i})\) and \(F_i=\mathbb{V }_{\theta }[\nu _i]=P_i^{-}+v^{2}_{s_i}\). The quantities \(\hat{Y}_i^{-}=\mathbb{E }_{\theta }[Y_i| Y_{1:i-1}]\) and \(P_i^{-}=\mathbb{E }_{\theta }[(X_i-\hat{X}_{i}^{-})^{2}]\) are computed by the Kalman filter.
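A minimal sketch of this conditional Kalman recursion is given below. The AR coefficient, variances, observations, indicator path, and component offsets \(m_{s_i}\), \(v^{2}_{s_i}\) are hypothetical placeholders (in particular, they are not the seven-component constants of Kim et al.):

```python
import math

def kalman_loglik(Y, s, phi, sigma2, m, v2):
    """Prediction-error decomposition of the log-likelihood for
    X_i = phi * X_{i-1} + eta_i,  eta_i ~ N(0, sigma2),
    Y_i = X_i + m[s_i] + eps_i,  eps_i ~ N(0, v2[s_i]),
    conditionally on a given indicator path s."""
    P_pred = sigma2 / (1.0 - phi ** 2)   # stationary variance: X_1 ~ N(0, gamma^2)
    x_pred = 0.0
    ll = 0.0
    for y, si in zip(Y, s):
        nu = y - (x_pred + m[si])        # innovation nu_i = Y_i - Yhat_i^- - m_{s_i}
        F = P_pred + v2[si]              # F_i = P_i^- + v_{s_i}^2
        ll += -0.5 * (math.log(2.0 * math.pi * F) + nu ** 2 / F)
        K = P_pred / F                   # Kalman gain
        x_filt = x_pred + K * nu         # filtered state mean
        P_filt = (1.0 - K) * P_pred      # filtered state variance
        x_pred = phi * x_filt            # one-step-ahead prediction
        P_pred = phi ** 2 * P_filt + sigma2
    return ll

# Hypothetical two-component toy example (placeholder constants).
ll = kalman_loglik(Y=[0.1, -0.4, 0.3, 0.0, -0.2], s=[0, 1, 0, 1, 0],
                   phi=0.9, sigma2=0.1, m=[-0.5, 0.5], v2=[1.0, 2.0])
```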
Hence, if we consider that the missing data \(u_{1:n}\) for the EM correspond to the indicator variables \(s_{1:n}\), then, according to Eq. (43) and since \(f(s_{1:n})\) does not depend on \(\theta \), the Maximization step is:
where the expectation is according to \(f_{\theta _k}(s_{1:n}\vert Y_{1:n})\). Nevertheless, for the SV model, the problem with the EM algorithm is that the density \(f_{\theta }(s_{1:n}|Y_{1:n})\) is unknown. The main idea consists in introducing a Gibbs algorithm to obtain \(\tilde{M}\) draws \(s^{(1)}_{1:n},\ldots , s^{(\tilde{M})}_{1:n}\) from the law \(f_{\theta }(s_{1:n}|Y_{1:n})\). Hence, the objective function \(Q(\theta ,\theta _{k})\) is approximated by:
Then, the simulated EM algorithm for the SV model is as follows. Let \(C>0\) be a threshold to stop the algorithm and \(\theta _{k}\) a given arbitrary value of the parameter. While \(|\theta _{k}-\theta _{k-1}|>C\):
1. Apply the Gibbs sampler as follows. Choose arbitrary starting values \(X_{1:n}^{(0)}\) and let \(l=0\).
   (a) Sample \(s_{1:n}^{(l+1)}\sim f_{\theta _{k}}(s_{1:n}|Y_{1:n},X_{1:n}^{(l)})\).
   (b) Sample \(X_{1:n}^{(l+1)}\sim f_{\theta _{k}}(X_{1:n} |Y_{1:n} ,s_{1:n}^{(l+1)})\).
   (c) Set \(l=l+1\) and go to (a).
2. \(\theta _{k+1}=\arg \max _{\theta } \tilde{Q}(\theta ,\theta _{k})\).
In step (a), to sample the vector \(s_{1:n}\) from its full conditional density, we sample each \(s_i\) independently. We have:
and \(f_{\theta _{k}}(Y_{r}|s_{r}=j,X_{r})\propto g_{(X_{r}+m_{j}, v_{j}^2)}\) for \(j=1,\ldots ,7\). Step (b) of the Gibbs sampler is carried out by the Kalman filter since, conditionally on \(s_{1:n}\), the model is Gaussian.
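Step (a) therefore reduces, for each \(i\), to drawing from a discrete distribution with weights proportional to \(q_j\, g_{(X_i+m_j,v_j^2)}(Y_i)\). A sketch with placeholder mixture constants (three components for brevity, not the actual seven of Kim et al.):

```python
import math, random

random.seed(4)

def sample_indicator(y, x, q, m, v2):
    """Draw s_i from P(s_i = j | Y_i = y, X_i = x),
    proportional to q_j * g_{(x + m_j, v_j^2)}(y)."""
    w = [qj * math.exp(-(y - x - mj) ** 2 / (2.0 * vj)) / math.sqrt(2.0 * math.pi * vj)
         for qj, mj, vj in zip(q, m, v2)]
    # Inverse-CDF draw from the normalized discrete distribution
    u = random.random() * sum(w)
    acc = 0.0
    for j, wj in enumerate(w):
        acc += wj
        if u <= acc:
            return j
    return len(w) - 1

# Placeholder three-component mixture (hypothetical weights and moments).
q = [0.2, 0.5, 0.3]
m = [-1.0, 0.0, 1.0]
v2 = [0.5, 1.0, 2.0]
draws = [sample_indicator(0.4, 0.1, q, m, v2) for _ in range(1000)]
```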
El Kolei, S. Parametric estimation of hidden stochastic model by contrast minimization and deconvolution. Metrika 76, 1031–1081 (2013). https://doi.org/10.1007/s00184-013-0430-3