Abstract
The aim of this paper is to introduce a new type of test statistic for a simple null hypothesis on one-dimensional ergodic diffusion processes sampled at discrete times. We adopt a quasi-likelihood approach for stochastic differential equations (i.e. a local Gaussian approximation of the transition functions) and define a test statistic by means of the empirical \(L^2\)-distance between quasi-likelihoods. We prove that the introduced test statistic is asymptotically distribution free; namely, it weakly converges to a \(\chi ^2\) random variable. Furthermore, we study the power of the parametric test under local alternatives. A Monte Carlo analysis shows that, in the small sample case, the introduced test seems to perform better than other tests proposed in the literature.
1 Introduction
Let \((\Omega ,\mathcal F,\mathbf{F}=(\mathcal F_t)_{t\ge 0},P)\) be a filtered complete probability space. Let us consider a one-dimensional process \(X:=(X_t)_{t\ge 0}\) solution to the following stochastic differential equation
where \(x_0\) is a deterministic initial value. We assume that \(b: \Theta _{\alpha }\times \mathbb {R}\rightarrow \mathbb {R}\) and \(\sigma :\Theta _\beta \times \mathbb {R} \rightarrow \mathbb {R}\) are known Borel functions (up to the parameters \(\alpha \) and \(\beta \)) and \((W_t)_{t\ge 0}\) is a one-dimensional standard \(\mathcal F_t\)-Brownian motion. Furthermore, \(\alpha \in \Theta _\alpha \subset \mathbb {R}^{m_1},\beta \in \Theta _\beta \subset \mathbb {R}^{m_2},m_1, m_2\in {\mathbb {N}},\) are unknown parameters and \(\theta =(\alpha ,\beta )\in \Theta :=\Theta _\alpha \times \Theta _\beta ,\) where \(\Theta _{\alpha }\) and \(\Theta _{\beta }\) are compact convex sets. We denote by \(\theta _0:=(\alpha _0,\beta _0)\) the true value of \(\theta \) and assume that \(\theta _0\in \) Int\((\Theta )\).
The sample path of X is observed only at \(n+1\) equidistant discrete times \(t_i^n\), such that \(t_i^n-t_{i-1}^n=\Delta _n<\infty \) for \(i=1,\ldots ,n,\) (with \(t_0^n=0\)). Therefore the data, denoted by \((X_{t_i^n})_{0\le i\le n},\) are the discrete observations of the sample path of X. Let p be an integer with \(p\ge 2.\) The asymptotic scheme adopted in this paper is the following: \(n\Delta _n\rightarrow \infty \), \(\Delta _n\rightarrow 0\) and \(n\Delta _n^p\rightarrow 0\) as \(n\rightarrow \infty \). This scheme is called a rapidly increasing design; i.e. the number of observations grows over time, but not too fast.
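As a concrete illustration (not from the paper), the choice \(\Delta _n = n^{-2/3}\), which gives the time horizon \(T = n\Delta _n = n^{1/3}\) used later in the numerical study of Sect. 5, satisfies all three conditions of the rapidly increasing design when \(p=2\):

```python
# Numeric check of the rapidly increasing design with Delta_n = n**(-2/3):
# Delta_n -> 0, n*Delta_n = n**(1/3) -> infinity, n*Delta_n**2 = n**(-1/3) -> 0.
def scheme(n, p=2):
    delta = n ** (-2.0 / 3.0)
    return delta, n * delta, n * delta ** p

for n in (10**2, 10**4, 10**6):
    delta, horizon, rate = scheme(n)
    print(f"n={n:>8}  Delta_n={delta:.6f}  n*Delta_n={horizon:.2f}  n*Delta_n^p={rate:.4f}")
```

The time mesh shrinks while the observation window grows, but slowly enough that the discretization bias term \(n\Delta _n^p\) vanishes.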
This setting is useful, for instance, in the analysis of financial time series. In mathematical finance and econometric theory, diffusion processes described by the stochastic differential equations (1.1) play a central role. Indeed, they have been used to model the behavior of stock prices, exchange rates and interest rates. The underlying stochastic evolution of the financial assets can be thought of as continuous in time, although the data are always recorded at discrete instants (e.g. weekly, daily or each minute). For these reasons, the estimation problems for discretely observed stochastic differential equations have been tackled by many authors with different approaches (see, for instance, Florens-Zmirou 1989; Yoshida 1992; Genon-Catalot and Jacod 1993; Bibby and Sørensen 1995; Kessler 1997; Kessler and Sørensen 1999; Aït-Sahalia 2002; Gobet 2002; Jacod 2006; Aït-Sahalia 2008; De Gregorio and Iacus 2008; Phillips and Yu 2009; Yoshida 2011; Uchida and Yoshida 2012; Li 2013; Uchida and Yoshida 2014; Kamatani and Uchida 2015). For clustering time series arising from discrete observations of diffusion processes, De Gregorio and Iacus (2010) proposed a new dissimilarity measure based on the \(L^1\) distance between the Markov operators. The change-point problem in the diffusion term of a stochastic differential equation has been considered in De Gregorio and Iacus (2008) and Iacus and Yoshida (2012). In Iacus et al. (2009), the authors faced the estimation problem for hidden diffusion processes observed at discrete times. An adaptive Lasso-type estimator is proposed in De Gregorio and Iacus (2012). For the simulation and the practical implementation of statistical inference for stochastic differential equations see Iacus (2008, 2011) and Iacus and Yoshida (2017).
We also recall that the statistical inference for continuously observed ergodic diffusions is a well-developed research topic; on this point the reader can consult Kutoyants (2004).
The main object of interest of the present paper is the problem of testing parametric hypotheses for diffusion processes from discrete observations. This research topic is less developed in the literature. It is well-known that for testing two simple alternative hypotheses, the Neyman-Pearson lemma provides a procedure based on the likelihood ratio which leads to the uniformly most powerful test. In the other cases uniformly most powerful tests do not exist, and for this reason the search for new criteria is justified.
For discretely observed stochastic differential equations, Kitagawa and Uchida (2014) introduced and studied the asymptotic behavior of three kinds of test statistics: a likelihood ratio type test statistic, a Wald type test statistic and Rao's score type test statistic.
Another possible approach is based on divergences. Indeed, several statistical divergence measures (which are not necessarily metrics) and distances have been introduced to decide whether two probability distributions are close or far apart. The main feature of these measures is that it should be "easier to distinguish" between a pair of distributions which are far from each other than between those which are closer. These tools have been used for testing hypotheses in parametric models. The reader can consult on this point, for example, Morales et al. (1997) and Pardo (2006). For stochastic differential equations sampled at discrete times, De Gregorio and Iacus (2013) introduced a family of test statistics (for \(p=2\) and \(n\Delta _n^2\rightarrow 0\)) based on empirical \(\phi \)-divergences.
We consider the following hypotheses testing problem concerning the vector parameter \(\theta \)
and assume that X is observed at discrete times; that is, the data \((X_{t_i^n})_{0\le i\le n}\) are available. In this work we study different test statistics with respect to those used in De Gregorio and Iacus (2013) and Kitagawa and Uchida (2014). Indeed, the purpose of this paper is to propose a methodology based on a suitable "distance" between the approximated transition functions. This idea follows from the observation that for continuously observed sample paths of (1.1), we could define the \(L^2\)-distance between the continuous log-likelihoods. Clearly this approach is not useful in our framework and then, similarly to the aforementioned papers, we consider the local Gaussian approximation of the transition density of the process X from \(X_{t_{i-1}}\) to \(X_{t_i}.\) In other words, we resort to the quasi-likelihood function introduced in Kessler (1997), which is defined by means of an approximation with higher order correction terms, so as to relax the condition on the convergence of \(\Delta _n\) to zero. Therefore, let \(\texttt {l}_{p,i}(\theta ),\theta \in \Theta ,\) be the approximated log-transition function from \(X_{t_{i-1}}\) to \(X_{t_i}\) representing the parametric model (1.1). We deal with
which can be interpreted as the empirical \(L^2\)-distance between two log-likelihoods. If \({{\hat{\theta }}}_{p,n}\) is the maximum quasi-likelihood estimator introduced in Kessler (1997), we are able to prove that, under \(H_0,\) the test statistic
is asymptotically distribution free; i.e. it converges in distribution to a chi-squared random variable. Furthermore, we study the power function of the test under local alternatives.
The paper is organized as follows. Section 2 contains the notations and the assumptions of the paper. The contrast function arising from the quasi-likelihood approach is briefly discussed in Sect. 3. In the same section we define the maximum quasi-likelihood estimator and recall its main asymptotic properties. In Sect. 4 we introduce and study a test statistic for the hypotheses problem \(H_0:\theta =\theta _0\) vs \(H_1: \theta \ne \theta _0\). The proposed new test statistic shares the same asymptotic properties as the other test statistics presented in the literature. Therefore, to justify its use in practice among its competitors, a numerical study is included in Sect. 5, which contains a comparison of several test statistics in the "small sample" case, i.e., when the asymptotic conditions are not met. Our numerical analysis shows that, at least for \(p=2,\) the performance of \(T_{2,n}\) is very good. The proofs are collected in Sect. 6.
It is worth pointing out that, for the sake of simplicity, a one-dimensional diffusion is treated in this paper. Nevertheless, it is possible to extend our methodology to the multidimensional stochastic differential equation setting.
2 Notations and assumptions
Throughout this paper, we will use the following notation.
-
\(\theta :=(\alpha ,\beta )\) and \(\alpha _0,\beta _0\) and \(\theta _0\) denote the true values of \(\alpha ,\beta \) and \(\theta \) respectively.
-
\(c(\beta ,x)=\sigma ^2(\beta ,x).\)
-
C is a positive constant. If C depends on a fixed quantity, for instance an integer k, we may write \(C_k.\)
-
\(\partial _{\alpha _h}:=\frac{\partial }{\partial \alpha _h},\partial _{\beta _k}:=\frac{\partial }{\partial \beta _k}, \partial ^2_{\alpha _h\alpha _k}:=\frac{\partial ^2}{\partial \alpha _h\partial \alpha _k}, h,k=1,\ldots , m_1, \partial ^2_{\beta _h\beta _k}:=\frac{\partial ^2}{\partial \beta _h\partial \beta _k}, h,k=1,\ldots ,m_2,\partial ^2_{\alpha _h\beta _k}:=\frac{\partial ^2}{\partial \alpha _h\partial \beta _k}, h=1,\ldots ,m_1, k=1,\ldots ,m_2,\)\(\partial _\theta :=(\partial _\alpha ,\partial _\beta )',\) where \(\partial _\alpha :=(\partial _{\alpha _1},\ldots ,\partial _{\alpha _{m_1}})'\) and \(\partial _\beta :=(\partial _{\beta _1},\ldots ,\partial _{\beta _{m_2}})',\)\(\partial _\theta ^2:=[\partial _{\alpha _h\beta _k}^2]_{h=1,\ldots ,m_1, k=1,\ldots ,m_2}.\)
-
If \(f:\Theta \times {\mathbb {R}}\rightarrow {\mathbb {R}},\) we denote by \(f_{i-1}(\theta )\) the value \(f(\theta , X_{t_{i-1}^n})\); for instance \(c(\beta , X_{t_{i-1}^n})=c_{i-1}(\beta )\).
-
For \(0\le i\le n, t_i^n:=i\Delta _n\) and \({\mathcal {G}}_i^n:=\sigma (W_s,s\le t_i^n).\)
-
The random sample is given by \(\mathbf{X}_n:=(X_{t_i^n})_{0\le i\le n}\) and \(X_i:=X_{t_i^n}.\)
-
The probability law of (1.1) is denoted by \(P_\theta \) and \(E_\theta ^{i-1}[\cdot ]:=E_\theta [\cdot |{\mathcal {G}}_{i-1}^n].\) We set \(P_0:=P_{\theta _0}\) and \(E_0^{i-1}[\cdot ]:= E_{\theta _0}^{i-1}[\cdot ].\)
-
\(\overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} \) and \(\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} \) stand for the convergence in probability and in distribution, respectively.
-
Let \(F_n:\Theta \times {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) and \(F:\Theta \rightarrow {\mathbb {R}};\)\(``F_n(\theta , \mathbf{X}_n)\overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} F(\theta )\) uniformly in \(\theta ''\) stands for
$$\begin{aligned} \sup _{\theta \in \Theta }\left| F_n(\theta , \mathbf{X}_n)-F(\theta )\right| \overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} 0. \end{aligned}$$Furthermore, if \(F_n(\theta , \mathbf{X}_n)\overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} 0\) uniformly in \(\theta \) we set
$$\begin{aligned} F_n(\theta , \mathbf{X}_n)=\mathbf {o}_{P_\theta }(1). \end{aligned}$$ -
Let \(u_n\) be an \(\mathbb {R}\)-valued sequence. We indicate by R a function \(\Theta \times \mathbb {R}^2\rightarrow \mathbb {R}\) for which there exists a constant C such that
$$\begin{aligned} |R(\theta ,u_n,x)|\le u_nC(1+|x|)^C,\quad \text {for all}\,\theta \in \Theta , x\in \mathbb {R}, n\in \mathbb {N}. \end{aligned}$$ -
For a \(m\times n\) matrix A, \(||A||^2=\text {tr}(AA')=\sum _{i=1}^m\sum _{j=1}^n |A_{ij}|^2\) and \(I_m\) stands for the identity matrix of size m.
Let \(C_{\uparrow }^{k,h}({\mathbb {R}}\times \Theta ; {\mathbb {R}})\) be the space of all functions f such that:
-
(i)
\(f(\theta ,x)\) is a \({\mathbb {R}}\)-valued function on \( \Theta \times {\mathbb {R}};\)
-
(ii)
\(f(\theta ,x)\) is continuously differentiable with respect to x up to order \(k\ge 1\) for all \(\theta ;\) these x-derivatives up to order k are of polynomial growth in x, uniformly in \(\theta \);
-
(iii)
\(f(\theta ,x)\) and all x-derivatives up to order \(k\ge 1,\) are \(h\ge 1\) times continuously differentiable with respect to \(\theta \) for all \(x\in {\mathbb {R}}.\) Moreover, these derivatives up to the h-th order with respect to \(\theta \) are of polynomial growth in x, uniformly in \(\theta \).
We need some standard assumptions on the regularity of the process X.
- \( A_1\) :
-
(Existence and uniqueness) There exists a constant C such that
$$\begin{aligned} \sup _{\alpha \in \Theta _\alpha }|b(\alpha ,x)-b(\alpha ,y)|+\sup _{\beta \in \Theta _\beta }|\sigma (\beta ,x)-\sigma (\beta ,y)|\le C|x-y|. \end{aligned}$$ - \( A_2\) :
-
(Ergodicity) The process X is ergodic for \(\theta =\theta _0\) with invariant probability measure \(\pi _0(\mathrm {d}x)\). Thus
$$\begin{aligned} \frac{1}{T}\int _0^Tf(X_t)\mathrm {d}t\overset{P_\theta }{\underset{T\rightarrow \infty }{\longrightarrow }} \int f(x)\pi _0(\mathrm {d}x), \end{aligned}$$where \(f\in L^1(\pi _0)\). Furthermore, we assume that \(\pi _0\) has finite moments of any order.
- \( A_3\) :
-
\(\inf _{x,\beta }\sigma (\beta ,x)>0.\)
- \( A_4\) :
-
(Moments) For all \(q\ge 0\), \(\sup _t E|X_t|^q<\infty \).
- \(A_5\) :
-
[k] (Smoothness) \(b\in C_{\uparrow }^{k,3}(\Theta _\alpha \times {\mathbb {R}},{\mathbb {R}})\) and \(\sigma \in C_{\uparrow }^{k,3}(\Theta _\beta \times {\mathbb {R}},{\mathbb {R}}).\)
- \( A_6\) :
-
(Identifiability) If \(b(\alpha ,x)=b(\alpha _0,x)\) and \(\sigma (\beta ,x)=\sigma (\beta _0,x)\) for all x (\(\pi _{0}\)-almost surely), then \(\alpha =\alpha _0\) and \(\beta =\beta _0\).
Let \(L_\theta \) be the infinitesimal generator of X with domain \(C^2({\mathbb {R}})\) (the space of twice continuously differentiable functions on \({\mathbb {R}}\)); that is, if \(f\in C^2({\mathbb {R}})\)
Under the assumption \(A_5[2(j-1)]\) we can define \(L_\theta ^j:=L_\theta \circ L_\theta ^{j-1}\) with domain \(C^{2j}({\mathbb {R}})\) and \(L_\theta ^0=\)Id.
We conclude this section with some well-known examples of ergodic diffusion processes belonging to the class (1.1):
-
the Ornstein–Uhlenbeck or Vasicek model is the unique solution to
$$\begin{aligned} \mathrm {d}X_t=\alpha _1(\alpha _2-X_t)\mathrm {d}t+\beta _1 \mathrm {d}W_t,\quad X_0=x_0, \end{aligned}$$(2.1)where \(b(\alpha _1,\alpha _2,x)=\alpha _1(\alpha _2-x)\) and \(\sigma (\beta _1,x)=\beta _1\) with \(\alpha _1,\alpha _2\in {\mathbb {R}}\) and \(\beta _1>0.\) This stochastic process is Gaussian and is often used in finance, where \(\beta _1\) is the volatility, \(\alpha _2\) is the long-run equilibrium of the model and \(\alpha _1\) is the speed of mean reversion. For \(\alpha _1>0\) the Vasicek process is ergodic with invariant law \(\pi _0\) given by the Gaussian law with mean \(\alpha _2\) and variance \(\frac{\beta _1^2}{2\alpha _1}.\) It is easy to check that all the conditions \(A_1-A_6\) are fulfilled;
-
the Cox–Ingersoll–Ross (CIR) process is the solution to
$$\begin{aligned} \mathrm {d}X_t=\alpha _1(\alpha _2-X_t)\mathrm {d}t+\beta _1 \sqrt{X_t}\mathrm {d}W_t,\quad X_0=x_0>0, \end{aligned}$$(2.2)where \(b(\alpha _1,\alpha _2,x)=\alpha _1( \alpha _2-x)\) and \(\sigma (\beta _1,x)=\beta _1\sqrt{x}\) with \(\alpha _1,\alpha _2,\beta _1>0.\) If \(2\alpha _1\alpha _2>\beta _1^2\) the process is strictly positive, otherwise it is nonnegative. This model has a conditional density given by a noncentral \(\chi ^2\) distribution. The CIR process is useful in the description of short-term interest rates and admits invariant law \(\pi _0\) given by a Gamma distribution with shape parameter \(\frac{2\alpha _1\alpha _2}{\beta _1^2}\) and scale parameter \(\frac{\beta _1^2}{2\alpha _1}.\) If (2.2) is strictly positive, one can prove that the above assumptions hold true.
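As a quick sanity check (a sketch, not part of the paper), the stationary moments of both examples follow in closed form from the parameters; here they are evaluated at the values \(\theta _0\) later used in the numerical study of Sect. 5:

```python
def ou_stationary(alpha1, alpha2, beta1):
    """Stationary law of the Vasicek model (alpha1 > 0): N(alpha2, beta1^2 / (2 alpha1))."""
    return alpha2, beta1**2 / (2 * alpha1)

def cir_stationary(alpha1, alpha2, beta1):
    """Stationary law of the CIR model: Gamma with shape = 2 a1 a2 / b1^2, scale = b1^2 / (2 a1).
    Returns (mean, variance) = (shape*scale, shape*scale^2)."""
    shape = 2 * alpha1 * alpha2 / beta1**2
    scale = beta1**2 / (2 * alpha1)
    return shape * scale, shape * scale**2

mean_ou, var_ou = ou_stationary(0.5, 0.5, 0.25)      # OU parameters of Sect. 5
mean_cir, var_cir = cir_stationary(0.5, 0.5, 0.125)  # CIR parameters of Sect. 5
# Feller condition 2*alpha1*alpha2 > beta1^2 guarantees strict positivity of CIR:
assert 2 * 0.5 * 0.5 > 0.125**2
print(mean_ou, var_ou)    # stationary mean alpha2 and variance beta1^2/(2 alpha1)
print(mean_cir, var_cir)  # the Gamma mean also equals alpha2
```

Note that in both models the stationary mean is the long-run level \(\alpha _2\), which is consistent with the mean-reverting form of the drift.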
3 Preliminaries on the quasi-likelihood function
We briefly recall the quasi-likelihood function introduced by Kessler (1997), based on the Itô-Taylor expansion. The main problem in the statistical analysis of the diffusion process X is that its transition density is in general unknown, and then the likelihood function is unknown as well. To overcome this difficulty one can discretize the sample path of X by means of the Euler-Maruyama scheme; namely
Hence (3.1) leads to consider a local-Gaussian approximation to the transition density; that is
and the approximated log-likelihood function of the random sample \(\mathbf{X}_n,\) called (negative) quasi-log-likelihood function, becomes
This approach suggests considering the mean and the variance of the transition density of X; that is
and assume
Thus we can consider as contrast function the following one
Nevertheless, (3.4) does not have a closed form because \(\mathrm {m}(\theta ,X_{i-1})\) and \(\mathrm {m}_2(\theta ,X_{i-1})\) are unknown. Therefore we substitute in (3.4) closed-form approximations of \(\mathrm {m}\) and \(\mathrm {m}_2\) based on the Itô-Taylor expansion.
Let \(f(y):=y.\) For \(l\ge 0,\) under the assumption \(A_5[2(l - 1)]\), we have the following approximation (see Lemma 1, Kessler 1997)
where
Now let us consider the function \((y-r_l(\Delta _n,X_{i-1},\theta ))^2,\) which, for fixed x, y and \(\theta ,\) is a polynomial in \(\Delta _n\) of degree 2l. We indicate by \({\overline{g}}_{\Delta _n,x,\theta ,l}(y)\) the sum of its first terms up to degree l; that is \(\overline{g}_{\Delta _n,x,\theta ,l}(y)=\sum _{j=0}^l \Delta _n^j \overline{g}_{x,\theta }^j(y)\) where
Under the assumption \(A_5[2(l-1)]\), we have that \(L_\theta ^r{\overline{g}}_{x,\theta }^j(y)\) is well-defined for \(r+j=l\) and we set
where \(\gamma _j(\theta ,x)\) are the coefficients of \(\Delta _n^j\). Therefore by (3.6)–(3.9), we obtain, for instance,
Let
where \(\overline{\Gamma }_l(\Delta _n,x,\theta ):=\frac{\sum _{j=2}^l\Delta _n^j\gamma _j(\theta ,x)}{\Delta _n c(\beta ,x)}.\) For \(l\ge 0,\) under the assumption \(A_5[2l]\)(i), we have that (see Lemma 2, Kessler 1997)
It seems quite natural at this point to substitute (3.5) and (3.10) into the expression (3.4). Nevertheless, in order to avoid technical difficulties related to the control of the denominator and of the logarithm, we consider a further expansion in \(\Delta _n\) of \((1+{{\overline{\Gamma }}}_l)^{-1}\) and \(\log (1+{{\overline{\Gamma }}}_l)\).
Let \(k_0=[p/2].\) Under the assumption \(A_5[2k_0]\), we define the quasi-log-likelihood function of \(\mathbf{X}_n\) as
where
and \(\mathrm {d}_j,\) resp. \(\mathrm {e}_j,\) is the coefficient of \(\Delta _n^j\) in the Taylor expansion of \((1+{{\overline{\Gamma }}}_{k_0+1}(\Delta _n,x,\theta ))^{-1},\) resp. \(\log (1+{{\overline{\Gamma }}}_{k_0+1}(\Delta _n,x,\theta )).\) It is not hard to show that, for example,
Remark 3.1
It is worth pointing out that from assumptions \(A_3\) and \(A_5\) it emerges that \(\mathrm {d}_j\) and \(\mathrm {e}_j,\) for all \(j\le k_0,\) are three times differentiable with respect to \(\theta .\) Furthermore, all their derivatives with respect to \(\theta \) are of polynomial growth in x, uniformly in \(\theta .\)
The contrast function (3.11) yields the maximum quasi-likelihood estimator \({{\hat{\theta }}}_{p,n}:=({{\hat{\alpha }}}_{p,n},{{\hat{\beta }}}_{p,n})\) defined as
Let \(I(\theta _0)\) be the Fisher information matrix at \(\theta _0\) defined as follows
where
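The minimization (3.13) can be sketched in code. The following is a minimal illustrative example, under assumptions flagged in the comments: it uses the plain Euler contrast, i.e. the case \(p=2\) without Kessler's higher-order correction terms, on a simulated Ornstein-Uhlenbeck path; all names and parameter values are hypothetical choices for the sketch, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# --- simulate an OU path exactly (its Gaussian transition is known in closed form) ---
alpha1, alpha2, beta1 = 0.5, 0.5, 0.25        # illustrative true theta_0
n, delta = 5000, 0.05                          # n observations, time mesh Delta_n
x = np.empty(n + 1)
x[0] = 1.0
a = np.exp(-alpha1 * delta)
s = beta1 * np.sqrt((1 - a**2) / (2 * alpha1))
for i in range(n):
    x[i + 1] = alpha2 + (x[i] - alpha2) * a + s * rng.standard_normal()

# --- Euler (p = 2) contrast: local-Gaussian quasi-log-likelihood, up to constants ---
def contrast(theta):
    a1, a2, b1 = theta
    drift = a1 * (a2 - x[:-1])                 # b(alpha, X_{i-1})
    c = b1**2                                  # c(beta, X_{i-1}), constant for OU
    incr = x[1:] - x[:-1] - delta * drift
    return np.sum(incr**2 / (2 * delta * c) + 0.5 * np.log(c))

# --- maximum quasi-likelihood estimator: minimize the contrast over Theta ---
res = minimize(contrast, x0=[1.0, 0.0, 0.5], method="L-BFGS-B",
               bounds=[(0.01, 5), (-5, 5), (0.01, 5)])
print(res.x)  # estimates of (alpha1, alpha2, beta1), close to (0.5, 0.5, 0.25)
```

Consistently with Theorem 1, the drift parameters are estimated at rate \(\sqrt{n\Delta _n}\) and the diffusion parameter at the faster rate \(\sqrt{n}\), so \(\beta _1\) is recovered much more precisely than \(\alpha _1\).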
We recall an important asymptotic result which will be useful in the proof of our main theorem.
Theorem 1
(Kessler 1997) Let p be an integer and \(k_0=[p/2].\) Under assumptions \(A_1\) to \(A_{4}, A_5[2k_0]\) and \(A_6,\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty ,\) as \(n\rightarrow \infty ,\) the estimator \({{\hat{\theta }}}_{p,n}\) is consistent; i.e.
If in addition \(n\Delta _n^p\rightarrow 0\) and \(\theta _0\in Int(\Theta )\) then
where
Remark 3.2
We observe that \(l_{2,n}\) does not coincide with (3.2), because (3.11) contains the terms \(\mathrm {d}_1\) and \(\mathrm {e}_1.\) Nevertheless, \(l_n\) also yields an asymptotically efficient estimator for \(\theta \), and then we refer to it when \(p=2.\)
Remark 3.3
Under the same framework adopted in this paper, alternatively to \({{\hat{\theta }}}_{p,n}\), Kessler (1995) and Uchida and Yoshida (2012) proposed different types of adaptive maximum quasi-likelihood estimators. For instance, in Uchida and Yoshida (2012), the first type of adaptive estimator is introduced starting from the initial estimator \(\tilde{\beta }_{0,n}\) given by \({\mathbb {U}}_n(\tilde{\beta }_{0,n})=\inf _{\beta \in \Theta _\beta }{\mathbb {U}}_n(\beta ),\) where
For \(p\ge 2, k_0=[p/2]\) and \(l_0=[(p-1)/2],\) the first type adaptive estimator \({{\tilde{\theta }}}_{p,n}=(\tilde{\alpha }_{k_0,n},{{\tilde{\beta }}}_{l_0,n})\) is defined for \(k=1,2,\ldots ,k_0,\) as follows
The maximum quasi-likelihood estimator \({{\hat{\theta }}}_{p,n}\) and its adaptive versions, like \({{\tilde{\theta }}}_{p,n},\) are asymptotically equivalent (under a minor change of the initial assumptions); i.e. they enjoy the same properties (3.15) and (3.16) (see Uchida and Yoshida 2012). In what follows we will develop a test based on \({{\hat{\theta }}}_{p,n};\) nevertheless, in light of the previous discussion, it would be possible to replace \({{\hat{\theta }}}_{p,n}\) with \({{\tilde{\theta }}}_{p,n}.\)
4 Test statistics
The goal of this section is to introduce a new type of test statistics for the following parametric hypotheses problem
concerning the stochastic differential equation (1.1). X is observed at discrete times, and therefore the available data are represented by \(\mathbf{X}_n\). The motivation of this research is the fact that, under non-simple alternative hypotheses, uniformly most powerful parametric tests do not exist. Therefore, we need proper procedures for making decisions concerning statistical hypotheses.
The first step consists in the introduction of a suitable measure of the "discrepancy", or the "distance", between diffusions belonging to the parametric class (1.1). Furthermore, as recalled in the previous section, for a general stochastic differential equation X, the true transition densities from \(X_{i-1}\) to \(X_i\) do not exist in closed form, and neither does the likelihood function. Suppose that the parameter \(\beta \) is known and that the sample path is observable up to time \(T=n\Delta _n.\) Let \(Q_\beta \) be the probability law of the process solution to \(\mathrm {d}Y_t=\sigma (\beta ,Y_t)\mathrm {d}W_t.\) The continuous log-likelihood of X is given by
Thus we can consider the (squared) \(L^2(Q_\beta )\)-distance between the log-likelihoods \(\log \frac{\mathrm {d}P_{\theta _1}}{\mathrm {d}Q_\beta }\) and \(\log \frac{\mathrm {d}P_{\theta _2}}{\mathrm {d}Q_\beta }\) with \(\theta _1,\theta _2\in \Theta \); that is
Clearly, for testing the hypotheses (4.1) in the framework of discretely observed stochastic differential equations, the distance (4.2) is not useful. Nevertheless, the above \(L^2\)-metric for continuous observations suggests considering
which can be interpreted as the empirical version of (4.2), where the theoretical log-likelihood is replaced with the quasi-log-likelihood defined by (3.11). The following theorem provides the convergence in probability of \({\mathbb {D}}_{p,n}.\)
Theorem 2
Let p be an integer and \(k_0=[p/2].\) Assume \( A_1- A_{4}, A_5[2k_0]\) and \(A_6.\) Under \(H_0,\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty ,\) as \(n\rightarrow \infty \), we have that
uniformly in \(\theta ,\) where
The above result shows that \({\mathbb {D}}_{p,n}(\theta ,\theta _0)\) is not a true approximation of \(D_{p,n}(\theta ,\theta _0)\) because it does not converge to \(\int \left[ \log (\pi _\theta (\mathrm {d}x)/\pi _0(\mathrm {d}x))\right] ^2\pi _0(\mathrm {d}x).\) Nevertheless, the function (4.3) allows us to construct the main object of interest of the paper. Let \({{\hat{\theta }}}_{p,n}\) be the maximum quasi-likelihood estimator defined by (3.13); for testing the hypotheses (4.1) we introduce the following class of test statistics
The first result concerns the weak convergence of \(T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0).\) We prove that \(T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\) is asymptotically distribution free under \(H_0;\) namely it weakly converges to a chi-squared random variable with \(m_1 + m_2\) degrees of freedom.
Theorem 3
Let p be an integer and \(k_0=[p/2].\) Assume \( A_1- A_{4}, A_5[2k_0]\) and \(A_6.\) Under \(H_0,\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty , n\Delta _n^p\rightarrow 0,\) as \(n\rightarrow \infty \), we have that
Given the level \(\alpha \in (0,1)\), our criterion suggests to
where \(\chi ^2_{m_1+m_2,{\alpha }}\) is the \(1-\alpha \) quantile of the limiting random variable \(\chi _{m_1+m_2}^2\); that is under \(H_0\)
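In code, the rejection rule above reads as follows (a sketch: `reject` is a hypothetical helper name, and SciPy's chi-square quantile function plays the role of \(\chi ^2_{m_1+m_2,\alpha }\)):

```python
from scipy.stats import chi2

def reject(t_obs, m1, m2, alpha=0.05):
    """Reject H0 at level alpha when the observed statistic exceeds
    the 1 - alpha quantile of the chi^2 law with m1 + m2 degrees of freedom."""
    return t_obs > chi2.ppf(1 - alpha, df=m1 + m2)

# e.g. three parameters in total (m1 + m2 = 3), as in the models of Sect. 5:
threshold = chi2.ppf(0.95, df=3)
print(round(threshold, 3))  # 7.815
```

Since the limiting law does not depend on the model, the same threshold applies to every diffusion in the class (1.1) with the same number of parameters.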
Under \(H_1,\) the power function of the proposed test is given by the following map
Often, a way to judge the quality of a sequence of tests is provided by the powers at alternatives that become closer and closer to the null hypothesis. This justifies the study of the local limiting power. Indeed, usually the power function of the test statistic (4.4) cannot be calculated explicitly. Nevertheless, \( P_\theta \left( T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)>\chi ^2_{m_1+m_2,{\alpha }}\right) \) can be studied and approximated under contiguous alternatives written as
where \(h\in \mathbb {R}^{m_1+m_2}\) such that \(\theta _0+\varphi (n)^{1/2}h \in \Theta .\) In order to get a reasonable approximation of the power function, we analyze the asymptotic law of the test statistics under the local alternatives \(H_{1,n}.\) We need the following assumption on the contiguity of probability measures (see Van der Vaart 1998):
- \(B_1\) :
-
\(P_{\theta _0+\varphi (n)^{1/2}h}\) is a sequence of probability measures contiguous with respect to \(P_0;\) i.e. \( \lim _{n\rightarrow \infty } P_{0}(A_n)=0\) implies \(\lim _{n\rightarrow \infty } P_{\theta _0+\varphi (n)^{1/2}h}(A_n)=0\) for every sequence of measurable sets \(A_n\).
Remark 4.1
The assumption \(B_1\) holds if we assume \(A_1- A_{4}, A_5[2k_0]\) and the conditions:
-
(i)
there exists a constant \(C>0\) such that the following estimates hold
$$\begin{aligned} |b(\alpha ,x)|\le C(1+|x|),\quad \left| \frac{\partial }{\partial x}b(\alpha ,x)\right| +|\sigma (\beta ,x)|+\left| \frac{\partial }{\partial x}\sigma (\beta ,x)\right| \le C \end{aligned}$$for all \((\alpha ,\beta )\in \Theta \) and \(x\in {\mathbb {R}};\)
-
(ii)
there exists \(C_0>0\) and \(K>0\) such that
$$\begin{aligned} b(\alpha ,x)x\le -C_0|x|^2+K \end{aligned}$$for all \((\alpha ,x)\in \Theta _\alpha \times {\mathbb {R}};\)
-
(iii)
there exists a constant \(C_1>1\) such that
$$\begin{aligned} \frac{1}{C_1}\le \sigma (\beta ,x)\le C_1, \text { for all } (\beta ,x)\in \Theta _\beta \times \mathbb {R}. \end{aligned}$$
Under the above assumptions, Gobet (2002) proved the Local Asymptotic Normality (LAN) for the likelihood of the ergodic diffusions (1.1); i.e.
By means of Le Cam's first lemma (see Van der Vaart 1998), the LAN property implies the contiguity of \(P_{\theta _0+\varphi (n)^{1/2}h}\) with respect to \(P_0.\)
Now, we are able to study the asymptotic probability distribution of \(T_{p,n}\) under \(H_{1,n}.\)
Theorem 4
Let p be an integer and \(k_0=[p/2].\) Assume that \(A_1- A_{4}, A_5[2k_0], A_6\) and \(B_1\) hold. Under the local alternative hypothesis \(H_{1,n},\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty , n\Delta _n^p\rightarrow 0\) as \(n\rightarrow \infty \), the following weak convergence holds
where \(\chi ^2_{m_1+m_2}(h'I(\theta _0)h)\) is a non-central chi-square random variable with \(m_1+m_2\) degrees of freedom and non-centrality parameter \(h'I(\theta _0)h\).
Remark 4.2
If we deal with \(H_0 : \theta =\theta _0\) and the local alternative hypothesis \(H_{1,n},\) Theorem 4 leads to the following approximation of the power function
where \(\mathbf {F}(\cdot )\) is the cumulative distribution function of the random variable \(\chi ^2_{m_1+m_2}(h'I(\theta _0)h)\).
Remark 4.3
The Generalized Quasi-Likelihood Ratio, Wald and Rao type test statistics have been studied by Kitagawa and Uchida (2014). These test statistics are defined, respectively, as follows
where
and \(R_{p,n}\) is well-defined if \( I_{p,n}(\theta )\) is nonsingular. The above test statistics are asymptotically equivalent to \(T_{p,n};\) i.e. under \(H_0,\)\(L_{p,n},W_{p,n}\) and \(R_{p,n}\) weakly converge to a \(\chi ^2\) random variable.
Remark 4.4
In De Gregorio and Iacus (2013), the authors dealt with (for \(p=2\)) test statistics based on an empirical version of the true \(\phi \)-divergences; i.e.
where \(\phi \) is a suitable convex function and \(l_n\) is given by (3.2). In the present paper, the starting point is the \(L^2\)-distance between two diffusion parametric models. Somehow, the approach developed in this work is close to that of Aït-Sahalia (1996), where a test based on the \(L^2\)-distance between the density function and its nonparametric estimator is introduced.
Remark 4.5
From a practical point of view, since the hypotheses \(\alpha = \alpha _0\) and \(\beta = \beta _0\) sometimes have different meanings, it is possible to adopt a stepwise procedure. For instance, for \(p=2,\) first we test \(\beta = \beta _0\) by means of
and then, in the second step, we test \(\alpha = \alpha _0\) by taking into account
where \({{\tilde{\alpha }}}_{1,n}\) and \({{\tilde{\beta }}}_{0,n}\) are the adaptive estimators defined in Remark 3.3.
5 Numerical analysis
Although all the test statistics presented above and in the literature satisfy the same asymptotic results, for small sample sizes the performance of each test statistic is determined by the statistical model generating the data and by the quality of the approximation of the quasi-likelihood function. To highlight these effects we consider the two stochastic models presented in Sect. 2, namely the Ornstein-Uhlenbeck model (OU in the tables) of Eq. (2.1) and the CIR model of Eq. (2.2). In this numerical study we consider the power of the test under local alternatives for different test statistics:
-
the \(\phi \)-divergence of Eq. (4.12) with \(\phi (x) = 1-x+x \log (x)\), which is equivalent to the approximated Kullback-Leibler divergence (see De Gregorio and Iacus 2013). We use the label AKL in the tables for this approximate Kullback-Leibler measure;
-
the \(\phi \)-divergence with \(\phi (x) = \left( \frac{x-1}{x+1}\right) ^2\): this was proposed in Balakrishnan and Sanghvi (1968); we denote it by BS in the tables;
-
the Generalized Quasi-Likelihood Ratio test with \(p=2\), see e.g., (4.9), denoted as GQLRT in the tables;
-
the Rao test statistic \(R({{\hat{\theta }}}_{p,n},\theta _0)\) of Eq. (4.11), denoted as RAO in the tables;
-
and the statistic \(T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\) proposed in this paper and defined in Eq. (4.4), with \(p=2\), denoted as \(T_{2,n}\) in the tables.
The sample sizes have been chosen equal to \(n=50, 100, 250, 500, 1000\) observations and the time horizon is set to \(T=n^\frac{1}{3}\), in order to satisfy the asymptotic theory. For testing \(\theta _0\) against the local alternatives \(\theta _0 + \frac{h}{\sqrt{n\Delta _n}}\) for the parameters in the drift coefficient and \(\theta _0 + \frac{h}{\sqrt{n}}\) for the parameters in the diffusion coefficient, h is taken on a grid from 0 to 1, with \(h=0\) corresponding to the null hypothesis \(H_0\). For the data generating process, we consider the following statistical models:
-
OU
the one-dimensional Ornstein–Uhlenbeck model solution to \(\mathrm {d}X_t = \alpha _1(\alpha _2- X_t)\mathrm {d}t + \beta _1 \mathrm {d}W_t\), \(X_0=1\), with \(\theta _0=(\alpha _1, \alpha _2, \beta _1) = (0.5, 0.5, 0.25)\);
-
CIR
the one-dimensional CIR model solution to \(\mathrm {d}X_t = \alpha _1(\alpha _2- X_t)\mathrm {d}t + \beta _1 \sqrt{X_t} \mathrm {d}W_t\), \(X_0=1\), with \(\theta _0=(\alpha _1, \alpha _2,\beta _1) = (0.5, 0.5, 0.125)\).
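The simulations in the paper were carried out with the R package yuima. As an illustration only, the same data-generating step (Euler-Maruyama on a fine grid, then resampling to \(n\) observations) can be sketched in Python; the function names and the fine-grid factor below are our own choices, not taken from the paper:

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, T, n_steps, rng):
    """Simulate one path on [0, T] with n_steps Euler-Maruyama steps."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        x[i + 1] = x[i] + drift(x[i]) * dt + diffusion(x[i]) * dw
    return x

rng = np.random.default_rng(42)
n = 1000                  # number of retained observations
T = n ** (1.0 / 3.0)      # time horizon T = n^(1/3)
sub = 20                  # subsampling factor (illustrative choice)

# OU: dX_t = a1*(a2 - X_t) dt + b1 dW_t, theta0 = (0.5, 0.5, 0.25), X_0 = 1
path_ou = euler_maruyama(lambda x: 0.5 * (0.5 - x),
                         lambda x: 0.25, 1.0, T, sub * n, rng)

# CIR: dX_t = a1*(a2 - X_t) dt + b1*sqrt(X_t) dW_t, theta0 = (0.5, 0.5, 0.125)
# (the sqrt argument is clamped at 0 to keep the Euler scheme well defined)
path_cir = euler_maruyama(lambda x: 0.5 * (0.5 - x),
                          lambda x: 0.125 * np.sqrt(max(x, 0.0)),
                          1.0, T, sub * n, rng)

obs_ou = path_ou[::sub]    # n + 1 equidistant observations
obs_cir = path_cir[::sub]
```

The clamping of the CIR diffusion coefficient is a standard numerical safeguard; yuima handles this differently internally.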
In each experiment the process has been simulated at high frequency using the Euler-Maruyama scheme and resampled to obtain \(n=50, 100, 250, 500, 1000\) observations. Note that, even if the Ornstein-Uhlenbeck process has a Gaussian transition density, this density differs from the Euler-Maruyama Gaussian density for a non-negligible time mesh \(\Delta _n\) (see Iacus 2008). For the simulations we used the R package yuima (see Iacus and Yoshida 2017). Each experiment is replicated 1000 times and, from the empirical distribution of each test statistic, say \(S_n\), we define the rejection threshold of the test as \(\tilde{\chi }^2_{3,0.05}\), i.e. \({{\tilde{\chi }}}^2_{3,0.05}\) is the 95% quantile of the empirical distribution of \(S_n,\) that is
Similarly, we define the empirical power function of the test as
where \({{\hat{\theta }}}_n\) is the maximum quasi-likelihood estimator defined in (3.13). We use the empirical threshold \({{\tilde{\chi }}}^2_{3,0.05}\) instead of the theoretical threshold \( \chi ^2_{3,0.05}\) from the \(\chi ^2_3\) distribution because otherwise the tests would not be comparable. Indeed, the empirical level of the test is not 0.05 for small sample sizes when \(\chi ^2_{3,0.05}\) is used as the rejection threshold: for example, when \(h=0\) different choices of the test statistic produce different empirical levels of the test. Tables 1 and 2 contain the empirical power function of each test. In these tables, bold face is used to highlight the test statistic with the highest empirical power function \(\mathrm{EPow}(h)\) for a given local alternative \(h>0\).
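The empirical threshold and the empirical power function are computed directly from the Monte Carlo replications. A minimal sketch of this step follows; the chi-square draws are toy stand-ins for the simulated test statistics, used here only to make the snippet self-contained:

```python
import numpy as np

def empirical_threshold(stats_null, level=0.05):
    # rejection threshold: the (1 - level) quantile of the empirical
    # null distribution of the statistic S_n
    return np.quantile(stats_null, 1.0 - level)

def empirical_power(stats, threshold):
    # fraction of replications in which the statistic exceeds the threshold
    return np.mean(stats > threshold)

rng = np.random.default_rng(0)
M = 1000  # Monte Carlo replications

# toy stand-ins: under H0 a chi^2_3 draw, under a local alternative a
# noncentral chi^2_3 draw (illustration only, not the paper's statistics)
s_null = rng.chisquare(df=3, size=M)
s_alt = rng.noncentral_chisquare(df=3, nonc=5.0, size=M)

thr = empirical_threshold(s_null)
print(thr)                            # near chi^2_{3,0.05} for large M
print(empirical_power(s_alt, thr))    # empirical power under H_{1,n}
print(empirical_power(s_null, thr))   # empirical level, ~0.05 by construction
```

Using the empirical quantile rather than the theoretical \(\chi ^2_{3,0.05}\) value fixes the level of every test at 0.05, which is what makes the power curves comparable across statistics.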
From this numerical analysis we can see several facts:
-
the test statistic based on the AKL does not perform as well as the GQLR test, despite the two being related to the same divergence; the latter is sometimes better;
-
the \(T_{2,n}\) seems to be (almost) uniformly more powerful in this experiment;
-
all tests but RAO seem to behave well when the alternative is sufficiently large;
-
for the CIR model, the RAO test does not perform well under the alternative hypothesis, probably because it requires a very large T which, in our case, is at most \(T=10\). For the Gaussian OU case, the performance is better and in line with that presented in Kitagawa and Uchida (2014) for similar sample sizes.
Therefore, we can conclude that, although all the test statistics share the same asymptotic properties, the proposed \(T_{p,n}\) performs very well in the small sample cases examined in the above Monte Carlo experiments, at least for \(p=2\).
6 Proofs
In order to prove the theorems appearing in the paper, we need some preliminary results. Let us start with the following lemmas.
Lemma 1
For \(k\ge 1\) and \(t_{i-1}^n\le t\le t_i^n\)
If \(f:\Theta \times {\mathbb {R}}\rightarrow {\mathbb {R}}\) is of polynomial growth in x uniformly in \(\theta \) then
Proof
See the proof of Lemma 6 in Kessler (1997). \(\square \)
Lemma 2
For \(l\ge 1\)
Proof
The equalities (6.3)–(6.6) represent the statement of Lemma 7 in Kessler (1997). By using the same approach adopted for the proof of the aforementioned lemma, we observe that, given (6.3)–(6.6), the results (6.7) and (6.8) hold if we are able to show that
We only prove (6.12), because (6.11) follows by means of similar arguments. By applying the Itô-Taylor formula [see Lemma 1, in Florens-Zmirou (1989)] to the function \(f_x(y)=(y-x)^6\) we obtain
By applying (6.2), we obtain
Furthermore, by means of long and cumbersome calculations, we can show that \(f_x(x)=L_0f_{x}(x)=L_0^{2}f_{x}(x)=0\), while \(L_0^{3}f_x(x)=5\cdot 3\cdot 3! c_{i-1}^3(\beta _0).\)
Analogously to what was done for (6.3)–(6.8), the equalities (6.9) and (6.10) hold if we are able to show that
We only prove (6.14), because (6.13) follows by means of similar arguments. The application of the Itô-Taylor formula to the function \(f_x(y)=(y-x)^8\) yields
By applying (6.2), we get
Furthermore, by means of long and cumbersome calculations, we can show that \(f_x(x)=L_0f_x(x)=L_0^2f_x(x)=L_0^3f_x(x)=0\) while \(L_0^4f_x(x)=7\cdot 5\cdot 3\cdot 4! c^4_{i-1}(\beta _0).\)\(\square \)
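The "long and cumbersome calculations" behind the constants \(L_0^{3}f_x(x)=5\cdot 3\cdot 3!\, c^3\) and \(L_0^{4}f_x(x)=7\cdot 5\cdot 3\cdot 4!\, c^4\) can be checked symbolically. The sketch below assumes the generator acts as \(L_0 g = b\,\partial _y g+\frac{c}{2}\,\partial _y^{2} g\) with \(c=\sigma ^2\) (the standard form for this diffusion); the helper names are ours:

```python
import sympy as sp

y, x = sp.symbols('y x')
b = sp.Function('b')(y)   # drift coefficient
c = sp.Function('c')(y)   # squared diffusion coefficient

def L0(g):
    # generator of the diffusion: L0 g = b g' + (c/2) g''
    return b * sp.diff(g, y) + sp.Rational(1, 2) * c * sp.diff(g, y, 2)

def iterated_at_x(f, k):
    # apply L0 k times, then evaluate at y = x
    g = f
    for _ in range(k):
        g = L0(g)
    return sp.simplify(g.subs(y, x))

cx = sp.Function('c')(x)

f6 = (y - x)**6
assert all(iterated_at_x(f6, k) == 0 for k in range(3))          # f, L0 f, L0^2 f vanish
assert sp.simplify(iterated_at_x(f6, 3) - 5*3*sp.factorial(3)*cx**3) == 0

f8 = (y - x)**8
assert all(iterated_at_x(f8, k) == 0 for k in range(4))          # f, ..., L0^3 f vanish
assert sp.simplify(iterated_at_x(f8, 4) - 7*5*3*sp.factorial(4)*cx**4) == 0
```

Evaluating at \(y=x\) kills every term in which a derivative falls on \(b\) or \(c\) rather than on the power of \((y-x)\), which is why only the pure second-derivative path survives and the constants factor as stated.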
Lemma 3
(Triangular arrays convergence) Let \(U_i^n\) and U be random variables, with \(U_i^n\) being \(\mathcal {G}_{i}^n\)-measurable. The two following conditions imply \(\sum _{i=1}^nU_i^n\overset{P}{\underset{n\rightarrow \infty }{\longrightarrow }} U\):
Proof
See the proof of Lemma 9 in Genon-Catalot and Jacod (1993). \(\square \)
Lemma 4
Let \(f:\Theta \times \mathbb {R}\rightarrow \mathbb {R}\) be such that \(f(\theta ,x)\in C_{\uparrow }^{1,1}(\Theta \times \mathbb {R},\mathbb {R}).\) Let us assume \(A_1-A_6\), if \(\Delta _n\rightarrow 0\) and \(n\Delta _n\rightarrow \infty \) we have that
uniformly in \(\theta \).
Proof
See the proof of Lemma 8 in Kessler (1997). \(\square \)
Lemma 5
Let \(f:\Theta \times \mathbb {R}\rightarrow \mathbb {R}\) be such that \(f(\theta ,x)\in C_{\uparrow }^{1,1}(\Theta \times \mathbb {R},\mathbb {R}).\) Let us assume \(A_1-A_6\), if \(\Delta _n\rightarrow 0\) and \(n\Delta _n\rightarrow \infty ,\) as \(n\rightarrow \infty ,\) we have that
uniformly in \(\theta \).
Proof
The cases \(j=1,k=1\) and \(j=1,k=2\) coincide with Lemma 9 and Lemma 10 in Kessler (1997); we then use the same approach to show that the remaining convergences hold true.
By setting
we prove that the convergence holds for all \(\theta .\) By taking into account Lemma 2 and Lemma 4
Therefore by Lemma 3 we can conclude that
for all \(\theta .\) For the uniformity it is sufficient to prove the tightness of the sequence of random elements
taking values in the Banach space \(C(\Theta )\) endowed with the sup-norm \(||\cdot ||_\infty .\) From the assumptions of the lemma it follows that \(\sup _nE_0[\sup _{\theta \in \Theta }|\partial _\theta Y_n(\theta )|]<\infty \), which implies the tightness of \(Y_n(\theta )\) by the criterion given by Theorem 16.5 in Kallenberg (2001).
By setting
we prove that the convergence holds for all \(\theta .\) By taking into account Lemmas 2 and 4
Therefore by Lemma 3 we get the pointwise convergence. For the uniformity of the convergence we proceed as done above. \(\square \)
Before proceeding with the proofs of the main theorems of the paper, we introduce some useful quantities coinciding with (4.2)−(4.8) in Kessler (1997). We can write
where
Furthermore
where
and
where
From (6.15) it is possible to derive
where
where
and
where
Proof of Theorem 2
We observe that
Under \(H_0,\) from Lemmas 2 and 5, we derive
uniformly in \(\theta .\) Thus the statement of the theorem immediately follows. \(\square \)
Let
where
The following proposition concerning the asymptotic behavior of \(C_{p,n}(\theta ,\theta _0)\) plays a crucial role in the proof of Theorem 3.
Proposition 1
Under \(H_0,\) assume \( A_1- A_6\) and \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty ,\) as \(n\rightarrow \infty \). The following convergences hold
and
Proof of Proposition 1
We study the uniform convergence in probability of \(C_{p,n}(\theta ,\theta _0).\) Thus we prove that uniformly in \(\theta \)
where
Let us start with the analysis of the quantity \(\frac{1}{n\Delta _n}\partial _{\alpha _h\alpha _k}^2 T_{p,n}(\theta ,\theta _0)\) given by (6.22), which can be split into two terms. From (6.16) it follows that
for each \(\theta \in \Theta .\) Since \(\partial _{\alpha _h} r_{k_0}(\Delta _n,X_{i-1},\theta )=\Delta _n\partial _{\alpha _h} b_{i-1}(\alpha )+R(\theta ,\Delta _n^2, X_{i-1}),\) by taking into account Lemma 5, we get
uniformly in \(\theta .\) Now, by resorting to (6.15) and (6.18), we rewrite the second term appearing in (6.22) as follows
By applying Lemmas 1 and 5, the following convergence results hold
uniformly in \(\theta .\) Finally, since \(\mathrm {d}_1(\theta ,x)=-\mathrm {e}_1(\theta ,x),\) we get
uniformly in \(\theta .\) Hence, by (6.28) and (6.29), we immediately derive
uniformly in \(\theta .\)
Now, we consider the elements of the matrix \(C_{p,n}(\theta ,\theta _0)\) given by (6.23). First, we study the convergence in probability of
Since \(\partial _{\beta _h} r_{k_0}(\Delta _n,X_{i-1},\theta )=R(\theta ,\Delta _n^2, X_{i-1}),\) from Lemmas 5 and 1 we derive
uniformly in \(\theta .\) Now, by resorting to (6.15) and (6.19), we rewrite the second term appearing in (6.23) as follows
By taking into account again Lemmas 1 and 5, we obtain the following results
uniformly in \(\theta .\) Finally
uniformly in \(\theta .\) Therefore, by (6.31) and (6.32), we get
uniformly in \(\theta .\)
Recalling the expressions (6.16), (6.17), (6.20) and (6.15), by means of arguments similar to those adopted above, it is not hard to prove that
and
uniformly in \(\theta .\) This implies that
uniformly in \(\theta .\)
In conclusion the results (6.30), (6.33) and (6.34) lead to the convergence (6.27). Moreover, (6.27) implies (6.25) since \(K(\theta _0,\theta _0)=I(\theta _0)\). From the inequality
(6.26) follows. Indeed, (6.25) leads to \(||2I(\theta _0)-C_{p,n}(\theta _0,\theta _0)||\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0,\varepsilon _n\rightarrow 0,\) while \(\sup _{||\theta ||\le \varepsilon _n}||C_{p,n}(\theta _0+\theta ,\theta _0)-2K(\theta _0+\theta ,\theta _0)||\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0,\varepsilon _n\rightarrow 0,\) by the uniformity of the convergence (i.e. by the result (6.27)). Furthermore, \(\sup _{||\theta ||\le \varepsilon _n}||K(\theta _0+\theta ,\theta _0)-I(\theta _0)||\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0,\varepsilon _n\rightarrow 0,\) because the assumptions \(A_3\) and \(A_5\) imply that \(K(\theta ,\theta _0)\) is a continuous function with respect to \(\theta .\)\(\square \)
Now, we are able to prove Theorem 3.
Proof of Theorem 3
We adopt classical arguments. By Taylor’s formula, we have that
where in the last step we denoted by
Proposition 1 implies
By taking into account (6.35), (3.16) and (6.36), Slutsky's theorem allows us to conclude the proof. \(\square \)
Proof of Theorem 4
Under \(H_{1,n}\) we have that [see Lemma 2 in Kitagawa and Uchida (2014)]
Therefore, under the hypothesis \(H_{1,n}\)
and
References
Aït-Sahalia Y (1996) Testing continuous-time models of the spot interest rate. Rev Financial Stud 9:385–426
Aït-Sahalia Y (2002) Maximum-likelihood estimation of discretely-sampled diffusions: a closed-form approximation approach. Econometrica 70:223–262
Aït-Sahalia Y (2008) Closed-form likelihood expansions for multivariate diffusions. Ann Stat 36:906–937
Balakrishnan V, Sanghvi LD (1968) Distance between populations on the basis of attribute data. Biometrics 24:859–865
Bibby BM, Sørensen M (1995) Martingale estimating functions for discretely observed diffusion processes. Bernoulli 1:17–39
De Gregorio A, Iacus SM (2008) Least squares volatility change point estimation for partially observed diffusion processes. Commun Stat Theory Methods 37:2342–2357
De Gregorio A, Iacus SM (2010) Clustering of discretely observed diffusion processes. Comput Stat Data Anal 54:598–606
De Gregorio A, Iacus SM (2012) Adaptive LASSO-type estimation for multivariate diffusion processes. Econ Theory 28:838–860
De Gregorio A, Iacus SM (2013) On a family of test statistics for discretely observed diffusion processes. J Multivar Anal 122:292–316
Florens-Zmirou D (1989) Approximate discrete-time schemes for statistics of diffusion processes. Statistics 20:547–557
Genon-Catalot V, Jacod J (1993) On the estimation of the diffusion coefficient for multidimensional diffusion processes. Ann Inst Henri Poincaré 29:119–151
Gobet E (2002) LAN property for ergodic diffusions with discrete observations. Ann I H Poincaré-PR 38:711–737
Iacus SM (2008) Simulation and inference for stochastic differential equations: with R examples. Springer series in statistics. Springer, New York
Iacus SM (2011) Option pricing and estimation of financial models with R. Wiley, New York
Iacus SM, Yoshida N (2012) Estimation for the change point of volatility in a stochastic differential equation. Stoch Process Appl 122:1068–1092
Iacus SM, Yoshida N (2017) Simulation and inference for stochastic processes with YUIMA. Springer series in statistics. Springer, New York
Iacus SM, Uchida M, Yoshida N (2009) Parametric estimation for partially hidden diffusion processes sampled at discrete times. Stoch Process Appl 119:1580–1600
Jacod J (2006) Parametric inference for discretely observed non-ergodic diffusions. Bernoulli 12:383–401
Kallenberg O (2001) Foundations of modern probability. Springer, London
Kamatani K, Uchida M (2015) Hybrid multi-step estimators for stochastic differential equations based on sampled data. Stat Inference Stoch Process 18:177–204
Kitagawa H, Uchida M (2014) Adaptive test statistics for ergodic diffusion processes sampled at discrete times. J Stat Plan Inference 150:84–110
Kessler M (1995) Estimation des parametres d’une diffusion par des contrastes corrigés. C R Acad Sci Paris Ser I Math 320:359–362
Kessler M (1997) Estimation of an ergodic diffusion from discrete observations. Scand J Stat 24:211–229
Kessler M, Sørensen M (1999) Estimating equations based on eigenfunctions for a discretely observed diffusion process. Bernoulli 5:299–314
Kutoyants YA (2004) Statistical inference for ergodic diffusion processes. Springer, London
Li C (2013) Maximum-likelihood estimation for diffusion processes via closed-form density expansions. Ann Stat 41:1350–1380
Morales D, Pardo L, Vajda I (1997) Some new statistics for testing hypotheses in parametric models. J Multivar Anal 67:137–168
Pardo L (2006) Statistical inference based on divergence measures. Chapman & Hall/CRC, London
Phillips PCB, Yu J (2009) A two-stage realized volatility approach to estimation of diffusion processes with discrete data. J Econ 150:139–150
Uchida M, Yoshida N (2012) Adaptive estimation of an ergodic diffusion process based on sampled data. Stoch Process Appl 122:2885–2924
Uchida M, Yoshida N (2014) Adaptive Bayes type estimators of ergodic diffusion processes from discrete observations. Stat Inference Stoch Process 17:181–219
Van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Yoshida N (1992) Estimation for diffusion processes from discrete observation. J Multivar Anal 41:220–242
Yoshida N (2011) Polynomial type large deviation inequalities and quasi-likelihood analysis for stochastic differential equations. Ann Inst Stat Math 63:431–479
Acknowledgements
We would like to thank both the referees for their comments which have greatly improved the first version of the manuscript.
Ethics declarations
Conflicts of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
De Gregorio, A., Iacus, S.M. Empirical \(L^2\)-distance test statistics for ergodic diffusions. Stat Inference Stoch Process 22, 233–261 (2019). https://doi.org/10.1007/s11203-018-9176-x