1 Introduction

Let \((\Omega ,\mathcal F,\mathbf{F}=(\mathcal F_t)_{t\ge 0},P)\) be a filtered complete probability space. Let us consider a one-dimensional process \(X:=(X_t)_{t\ge 0}\) which is the solution to the following stochastic differential equation

$$\begin{aligned} \mathrm {d}X_t=b(\alpha ,X_{t})\mathrm {d}t+\sigma (\beta ,X_{t})\mathrm {d}W_t,\quad X_0=x_0, \end{aligned}$$
(1.1)

where \(x_0\) is a deterministic initial value. We assume that \(b: \Theta _{\alpha }\times \mathbb {R}\rightarrow \mathbb {R}\) and \(\sigma :\Theta _\beta \times \mathbb {R} \rightarrow \mathbb {R}\) are known Borel functions (up to the parameters \(\alpha \) and \(\beta \)) and that \((W_t)_{t\ge 0}\) is a one-dimensional standard \(\mathcal F_t\)-Brownian motion. Furthermore, \(\alpha \in \Theta _\alpha \subset \mathbb {R}^{m_1},\beta \in \Theta _\beta \subset \mathbb {R}^{m_2},m_1, m_2\in {\mathbb {N}},\) are unknown parameters and \(\theta =(\alpha ,\beta )\in \Theta :=\Theta _\alpha \times \Theta _\beta ,\) where \(\Theta _{\alpha }\) and \(\Theta _{\beta }\) are compact convex sets. We denote by \(\theta _0:=(\alpha _0,\beta _0)\) the true value of \(\theta \) and assume that \(\theta _0\in \text {Int}(\Theta )\).

The sample path of X is observed only at \(n+1\) equidistant discrete times \(t_i^n\), such that \(t_i^n-t_{i-1}^n=\Delta _n<\infty \) for \(i=1,\ldots ,n\) (with \(t_0^n=0\)). Therefore the data, denoted by \((X_{t_i^n})_{0\le i\le n},\) are the discrete observations of the sample path of X. Let p be an integer with \(p\ge 2.\) The asymptotic scheme adopted in this paper is the following: \(n\Delta _n\rightarrow \infty \), \(\Delta _n\rightarrow 0\) and \(n\Delta _n^p\rightarrow 0\) as \(n\rightarrow \infty \). This scheme is called rapidly increasing design; i.e. the number of observations grows over time, but not too fast.
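For instance, taking \(\Delta _n=n^{-2/3}\) gives \(n\Delta _n=n^{1/3}\rightarrow \infty \) and \(n\Delta _n^2=n^{-1/3}\rightarrow 0\), so the conditions hold with \(p=2\); this corresponds to the sampling design of the numerical study in Sect. 5, where the time horizon is \(T=n\Delta _n=n^{1/3}\).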

This setting is useful, for instance, in the analysis of financial time series. In mathematical finance and econometric theory, diffusion processes described by stochastic differential equations of the form (1.1) play a central role. Indeed, they have been used to model the behavior of stock prices, exchange rates and interest rates. The underlying stochastic evolution of the financial assets can be thought of as continuous in time, although the data are always recorded at discrete instants (e.g. weekly, daily or each minute). For these reasons, the estimation problems for discretely observed stochastic differential equations have been tackled by many authors with different approaches (see, for instance, Florens-Zmirou 1989; Yoshida 1992; Genon-Catalot and Jacod 1993; Bibby and Sørensen 1995; Kessler 1997; Kessler and Sørensen 1999; Aït-Sahalia 2002; Gobet 2002; Jacod 2006; Aït-Sahalia 2008; De Gregorio and Iacus 2008; Phillips and Yu 2009; Yoshida 2011; Uchida and Yoshida 2012; Li 2013; Uchida and Yoshida 2014; Kamatani and Uchida 2015). For clustering time series arising from discrete observations of diffusion processes, De Gregorio and Iacus (2010) proposed a new dissimilarity measure based on the \(L^1\) distance between the Markov operators. The change-point problem in the diffusion term of a stochastic differential equation has been considered in De Gregorio and Iacus (2008) and Iacus and Yoshida (2012). In Iacus et al. (2009), the authors faced the estimation problem for hidden diffusion processes observed at discrete times. An adaptive Lasso-type estimator is proposed in De Gregorio and Iacus (2012). For the simulation and the practical implementation of statistical inference for stochastic differential equations see Iacus (2008, 2011) and Iacus and Yoshida (2017).

We also recall that the statistical inference for continuously observed ergodic diffusions is a well-developed research topic; on this point the reader can consult Kutoyants (2004).

The main object of interest of the present paper is the problem of testing parametric hypotheses for diffusion processes from discrete observations. This research topic is less developed in the literature. It is well-known that for testing two simple hypotheses, the Neyman-Pearson lemma provides a procedure based on the likelihood ratio which leads to the uniformly most powerful test. In all other cases uniformly most powerful tests do not exist, and for this reason the search for new criteria is justified.

For discretely observed stochastic differential equations, Kitagawa and Uchida (2014) introduced and studied the asymptotic behavior of three kinds of test statistics: likelihood ratio type test statistic, Wald type test statistic and Rao’s score type test statistic.

Another possible approach is based on divergences. Indeed, several statistical divergence measures (which are not necessarily metrics) and distances have been introduced to decide whether two probability distributions are close or far apart. The main goal of these measures is to make it “easier to distinguish” between a pair of distributions which are far from each other than between those which are closer. These tools have been used for testing hypotheses in parametric models. The reader can consult on this point, for example, Morales et al. (1997) and Pardo (2006). For stochastic differential equations sampled at discrete times, De Gregorio and Iacus (2013) introduced a family of test statistics (for \(p=2\) and \(n\Delta _n^2\rightarrow 0\)) based on empirical \(\phi \)-divergences.

We consider the following hypothesis testing problem concerning the vector parameter \(\theta \)

$$\begin{aligned} H_0:\theta =\theta _0,\quad \text {vs}\quad H_1:\theta \ne \theta _0, \end{aligned}$$

and assume that X is observed at discrete times; that is, the data \((X_{t_i^n})_{0\le i\le n}\) are available. In this work we study test statistics different from those used in De Gregorio and Iacus (2013) and Kitagawa and Uchida (2014). Indeed, the purpose of this paper is to propose a methodology based on a suitable “distance” between the approximated transition functions. This idea follows from the observation that, for continuously observed sample paths of (1.1), we could define the \(L^2\)-distance between the continuous log-likelihoods. Clearly this approach is not useful in our framework and therefore, similarly to the aforementioned papers, we consider the local Gaussian approximation of the transition density of the process X from \(X_{t_{i-1}}\) to \(X_{t_i}.\) In other words, we resort to the quasi-likelihood function introduced in Kessler (1997), which is defined by means of an approximation with higher order correction terms that relax the condition of convergence of \(\Delta _n\) to zero. Therefore, let \(\texttt {l}_{p,i}(\theta ),\theta \in \Theta ,\) be the approximated log-transition function from \(X_{t_{i-1}}\) to \(X_{t_i}\) representing the parametric model (1.1). We deal with

$$\begin{aligned} \mathbb {D}_{p,n}(\theta _1,\theta _2):=\frac{1}{n}\sum _{i=1}^n[\texttt {l}_{p,i}(\theta _1)-\texttt {l}_{p,i}(\theta _2)]^2,\quad \theta _1,\theta _2\in \Theta , \end{aligned}$$

which can be interpreted as the empirical \(L^2\)-distance between two log-likelihoods. If \({{\hat{\theta }}}_{p,n}\) is the maximum quasi-likelihood estimator introduced in Kessler (1997), we are able to prove that, under \(H_0,\) the test statistic

$$\begin{aligned} T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0):=n\mathbb {D}_{p,n}(\hat{\theta }_{p,n},\theta _0) \end{aligned}$$

is asymptotically distribution free; i.e. it converges in distribution to a chi-squared random variable. Furthermore, we study the power function of the test under local alternatives.

The paper is organized as follows. Section 2 contains the notations and the assumptions of the paper. The contrast function arising from the quasi-likelihood approach is briefly discussed in Sect. 3. In the same section we define the maximum quasi-likelihood estimator and recall its main asymptotic properties. In Sect. 4 we introduce and study a test statistic for the hypotheses problem \(H_0:\theta =\theta _0\) vs \(H_1: \theta \ne \theta _0\). The proposed new test statistic shares the same asymptotic properties as the other test statistics presented in the literature. Therefore, to justify its use in practice among its competitors, a numerical study is included in Sect. 5, which contains a comparison of several test statistics in the “small sample” case, i.e., when the asymptotic conditions are not met. Our numerical analysis shows that, at least for \(p=2,\) the performance of \(T_{2,n}\) is very good. The proofs are collected in Sect. 6.

It is worth pointing out that, for the sake of simplicity, a one-dimensional diffusion is treated in this paper. Nevertheless, it is possible to extend our methodology to the multidimensional stochastic differential equation setting.

2 Notations and assumptions

Throughout this paper, we will use the following notation.

  • \(\theta :=(\alpha ,\beta )\); \(\alpha _0,\beta _0\) and \(\theta _0\) denote the true values of \(\alpha ,\beta \) and \(\theta ,\) respectively.

  • \(c(\beta ,x)=\sigma ^2(\beta ,x).\)

  • C is a positive constant. If C depends on a fixed quantity, for instance an integer k,  we may write \(C_k.\)

  • \(\partial _{\alpha _h}:=\frac{\partial }{\partial \alpha _h},\partial _{\beta _k}:=\frac{\partial }{\partial \beta _k}, \partial ^2_{\alpha _h\alpha _k}:=\frac{\partial ^2}{\partial \alpha _h\partial \alpha _k}, h,k=1,\ldots , m_1, \partial ^2_{\beta _h\beta _k}:=\frac{\partial ^2}{\partial \beta _h\partial \beta _k}, h,k=1,\ldots ,m_2,\partial ^2_{\alpha _h\beta _k}:=\frac{\partial ^2}{\partial \alpha _h\partial \beta _k}, h=1,\ldots ,m_1, k=1,\ldots ,m_2,\)\(\partial _\theta :=(\partial _\alpha ,\partial _\beta )',\) where \(\partial _\alpha :=(\partial _{\alpha _1},\ldots ,\partial _{\alpha _{m_1}})'\) and \(\partial _\beta :=(\partial _{\beta _1},\ldots ,\partial _{\beta _{m_2}})',\)\(\partial _\theta ^2:=[\partial ^2_{\alpha _h\beta _k}]_{h=1,\ldots ,m_1, k=1,\ldots ,m_2}.\)

  • If \(f:\Theta \times {\mathbb {R}}\rightarrow {\mathbb {R}},\) we denote by \(f_{i-1}(\theta )\) the value \(f(\theta , X_{t_{i-1}^n})\); for instance \(c(\beta , X_{t_{i-1}^n})=c_{i-1}(\beta )\).

  • For \(0\le i\le n, t_i^n:=i\Delta _n\) and \({\mathcal {G}}_i^n:=\sigma (W_s,s\le t_i^n).\)

  • The random sample is given by \(\mathbf{X}_n:=(X_{t_i^n})_{0\le i\le n}\) and \(X_i:=X_{t_i^n}.\)

  • The probability law of (1.1) is denoted by \(P_\theta \) and \(E_\theta ^{i-1}[\cdot ]:=E_\theta [\cdot |{\mathcal {G}}_{i-1}^n].\) We set \(P_0:=P_{\theta _0}\) and \(E_0^{i-1}[\cdot ]:= E_{\theta _0}^{i-1}[\cdot ].\)

  • \(\overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} \) and \(\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} \) stand for the convergence in probability and in distribution, respectively.

  • Let \(F_n:\Theta \times {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) and \(F:\Theta \rightarrow {\mathbb {R}};\)\(``F_n(\theta , \mathbf{X}_n)\overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} F(\theta )\) uniformly in \(\theta ''\) stands for

    $$\begin{aligned} \sup _{\theta \in \Theta }\left| F_n(\theta , \mathbf{X}_n)-F(\theta )\right| \overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} 0. \end{aligned}$$

    Furthermore, if \(F_n(\theta , \mathbf{X}_n)\overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }} 0\) uniformly in \(\theta \) we set

    $$\begin{aligned} F_n(\theta , \mathbf{X}_n)=\mathbf {o}_{P_\theta }(1). \end{aligned}$$
  • Let \(u_n\) be an \(\mathbb {R}\)-valued sequence. We denote by R a function \(\Theta \times \mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\) for which there exists a constant C such that

    $$\begin{aligned} |R(\theta ,u_n,x)|\le C u_n(1+|x|)^C,\quad \text {for all}\,\theta \in \Theta , x\in \mathbb {R}, n\in \mathbb {N}. \end{aligned}$$
  • For an \(m\times n\) matrix A, \(||A||^2=\text {tr}(AA')=\sum _{i=1}^m\sum _{j=1}^n |A_{ij}|^2\) and \(I_m\) stands for the identity matrix of size m.

Let \(C_{\uparrow }^{k,h}(\Theta \times {\mathbb {R}}; {\mathbb {R}})\) be the space of all functions f such that:

  (i) \(f(\theta ,x)\) is an \({\mathbb {R}}\)-valued function on \( \Theta \times {\mathbb {R}};\)

  (ii) \(f(\theta ,x)\) is continuously differentiable with respect to x up to order \(k\ge 1\) for all \(\theta ;\) these x-derivatives up to order k are of polynomial growth in x, uniformly in \(\theta \);

  (iii) \(f(\theta ,x)\) and all its x-derivatives up to order \(k\ge 1\) are \(h\ge 1\) times continuously differentiable with respect to \(\theta \) for all \(x\in {\mathbb {R}}.\) Moreover, these derivatives up to the h-th order with respect to \(\theta \) are of polynomial growth in x, uniformly in \(\theta \).

We need some standard assumptions on the regularity of the process X.

\( A_1\) :

(Existence and uniqueness) There exists a constant C such that

$$\begin{aligned} \sup _{\alpha \in \Theta _\alpha }|b(\alpha ,x)-b(\alpha ,y)|+\sup _{\beta \in \Theta _\beta }|\sigma (\beta ,x)-\sigma (\beta ,y)|\le C|x-y|. \end{aligned}$$
\( A_2\) :

(Ergodicity) The process X is ergodic for \(\theta =\theta _0\) with invariant probability measure \(\pi _0(\mathrm {d}x)\). Thus

$$\begin{aligned} \frac{1}{T}\int _0^Tf(X_t)\mathrm {d}t\overset{P_\theta }{\underset{T\rightarrow \infty }{\longrightarrow }} \int f(x)\pi _0(\mathrm {d}x), \end{aligned}$$

where \(f\in L^1(\pi _0)\). Furthermore, we assume that all the moments of \(\pi _0\) are finite.

\( A_3\) :

\(\inf _{x,\beta }\sigma (\beta ,x)>0.\)

\( A_4\) :

(Moments) For all \(q\ge 0\), \(\sup _t E|X_t|^q<\infty \).

\(A_5\) :

[k] (Smoothness) \(b\in C_{\uparrow }^{k,3}(\Theta _\alpha \times {\mathbb {R}},{\mathbb {R}})\) and \(\sigma \in C_{\uparrow }^{k,3}(\Theta _\beta \times {\mathbb {R}},{\mathbb {R}}).\)

\( A_6\) :

(Identifiability) If \(b(\alpha ,x)=b(\alpha _0,x)\) and \(\sigma (\beta ,x)=\sigma (\beta _0,x)\) for all x (\(\pi _{0}\)-almost surely), then \(\alpha =\alpha _0\) and \(\beta =\beta _0\).

Let \(L_\theta \) be the infinitesimal generator of X with domain \(C^2({\mathbb {R}})\) (the space of twice continuously differentiable functions on \({\mathbb {R}}\)); that is, for \(f\in C^2({\mathbb {R}})\)

$$\begin{aligned} L_\theta f(x):=b(\alpha ,x)\frac{\partial f}{\partial x}(x)+\frac{c(\beta ,x)}{2}\frac{\partial ^2 f}{\partial x^2}(x),\quad L_0:=L_{\theta _0}. \end{aligned}$$

Under the assumption \(A_5[2(j-1)]\) we can define \(L_\theta ^j:=L_\theta \circ L_\theta ^{j-1}\) with domain \(C^{2j}({\mathbb {R}})\) and \(L_\theta ^0:=\text {Id}\).

We conclude this section with some well-known examples of ergodic diffusion processes belonging to the class (1.1):

  • the Ornstein–Uhlenbeck or Vasicek model is the unique solution to

    $$\begin{aligned} \mathrm {d}X_t=\alpha _1(\alpha _2-X_t)\mathrm {d}t+\beta _1 \mathrm {d}W_t,\quad X_0=x_0, \end{aligned}$$
    (2.1)

    where \(b(\alpha _1,\alpha _2,x)=\alpha _1(\alpha _2-x)\) and \(\sigma (\beta _1,x)=\beta _1\) with \(\alpha _1,\alpha _2\in {\mathbb {R}}\) and \(\beta _1>0.\) This stochastic process is a Gaussian process and it is often used in finance, where \(\beta _1\) is the volatility, \(\alpha _2\) is the long-run equilibrium of the model and \(\alpha _1\) is the speed of mean reversion. For \(\alpha _1>0\) the Vasicek process is ergodic with invariant law \(\pi _0\) given by a Gaussian law with mean \(\alpha _2\) and variance \(\frac{\beta _1^2}{2\alpha _1}.\) It is easy to check that all the conditions \(A_1-A_6\) are fulfilled;

  • the Cox–Ingersoll–Ross (CIR) process is the solution to

    $$\begin{aligned} \mathrm {d}X_t=\alpha _1(\alpha _2-X_t)\mathrm {d}t+\beta _1 \sqrt{X_t}\mathrm {d}W_t,\quad X_0=x_0>0, \end{aligned}$$
    (2.2)

    where \(b(\alpha _1,\alpha _2,x)=\alpha _1( \alpha _2-x)\) and \(\sigma (\beta _1,x)=\beta _1\sqrt{x}\) with \(\alpha _1,\alpha _2,\beta _1>0.\) If \(2\alpha _1\alpha _2>\beta _1^2\) the process is strictly positive, otherwise it is non-negative. This model has a conditional density given by a non-central \(\chi ^2\) distribution. The CIR process is useful in the description of short-term interest rates and admits an invariant law \(\pi _0\) given by a Gamma distribution with shape parameter \(\frac{2\alpha _1\alpha _2}{\beta _1^2}\) and scale parameter \(\frac{\beta _1^2}{2\alpha _1}.\) If the solution to (2.2) is strictly positive, one can prove that the above assumptions hold true (a simulation sketch for both models is given right after this list).
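As a minimal illustration of these two models, the following base R sketch simulates their sample paths by the Euler-Maruyama scheme; the time mesh and the helper function euler_path are our own illustrative choices, while the parameter values are those used later in Sect. 5.

```r
# Euler-Maruyama simulation of the Vasicek and CIR models (illustrative sketch).
set.seed(123)

euler_path <- function(b, sigma, x0, Delta, n) {
  # b, sigma: drift and diffusion as functions of x; x0: initial value;
  # Delta: time mesh; n: number of steps.
  x <- numeric(n + 1)
  x[1] <- x0
  for (i in 1:n) {
    dW <- rnorm(1, mean = 0, sd = sqrt(Delta))
    x[i + 1] <- x[i] + b(x[i]) * Delta + sigma(x[i]) * dW
  }
  x
}

# Vasicek (2.1): dX = alpha1*(alpha2 - X) dt + beta1 dW
ou <- euler_path(function(x) 0.5 * (0.5 - x), function(x) 0.25,
                 x0 = 1, Delta = 0.01, n = 1000)

# CIR (2.2): dX = alpha1*(alpha2 - X) dt + beta1*sqrt(X) dW
# (sqrt(max(x, 0)) keeps the discretized path well-defined near zero)
cir <- euler_path(function(x) 0.5 * (0.5 - x), function(x) 0.125 * sqrt(max(x, 0)),
                  x0 = 1, Delta = 0.01, n = 1000)
```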

3 Preliminaries on the quasi-likelihood function

We briefly recall the quasi-likelihood function introduced by Kessler (1997), based on the Itô-Taylor expansion. The main problem in the statistical analysis of the diffusion process X is that its transition density is in general unknown and then the likelihood function is unknown as well. To overcome this difficulty one can discretize the sample path of X by means of the Euler-Maruyama scheme; namely

$$\begin{aligned} X_{i}{-}X_{i-1}=\int _{t_{i-1}^n}^{t_{i}^n}b(\alpha ,X_s)\mathrm {d}s{+}\int _{t_{i-1}^n}^{t_{i}^n}\sigma (\beta ,X_s)\mathrm {d}W_s\simeq b_{i-1}(\alpha )\Delta _n+\sigma _{i-1}(\beta )(W_{t_i^n}-W_{t_{i-1}^n}). \end{aligned}$$
(3.1)

Hence (3.1) leads us to consider a local Gaussian approximation to the transition density; that is

$$\begin{aligned} {\mathcal {L}}(X_i|X_{i-1})\simeq N(X_{i-1}+b_{i-1}(\alpha )\Delta _n,c_{i-1}(\beta )\Delta _n ) \end{aligned}$$

and the approximated log-likelihood function of the random sample \(\mathbf{X}_n,\) called (negative) quasi-log-likelihood function, becomes

$$\begin{aligned} l_n(\theta ):=\frac{1}{2}\sum _{i=1}^n\left\{ \frac{(X_i-X_{i-1}-b_{i-1}(\alpha )\Delta _n)^2}{c_{i-1}(\beta )\Delta _n}+\log c_{i-1}(\beta ) \right\} . \end{aligned}$$
(3.2)
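As an illustration (not part of the original derivation), the contrast (3.2) can be evaluated directly from the data; the sketch below does this in base R for the Vasicek model (2.1), where the function name and the vectorized form are our own choices.

```r
# Negative quasi-log-likelihood (3.2) for the Vasicek model:
# b(alpha, x) = alpha1*(alpha2 - x), c(beta, x) = beta1^2 (constant).
quasi_loglik_euler <- function(theta, X, Delta) {
  alpha1 <- theta[1]; alpha2 <- theta[2]; beta1 <- theta[3]
  x_prev <- X[-length(X)]               # X_{i-1}
  incr   <- diff(X)                     # X_i - X_{i-1}
  b      <- alpha1 * (alpha2 - x_prev)  # drift evaluated at X_{i-1}
  cc     <- beta1^2                     # squared diffusion coefficient
  0.5 * sum((incr - b * Delta)^2 / (cc * Delta) + log(cc))
}
```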

This approach suggests considering the mean and the variance of the transition density of X; that is

$$\begin{aligned} \mathrm {m}(\theta ,X_{i-1}):=E_{\theta }[X_{i}|X_{i-1}],\quad \mathrm {m}_2(\theta ,X_{i-1}):=E_{\theta }[(X_{i}-\mathrm {m}(\theta ,X_{i-1}))^2|X_{i-1}], \end{aligned}$$
(3.3)

and assume

$$\begin{aligned} {\mathcal {L}}(X_i|X_{i-1})\simeq N(\mathrm {m}(\theta ,X_{i-1}),\mathrm {m}_2(\theta ,X_{i-1})). \end{aligned}$$

Thus we can consider the following contrast function

$$\begin{aligned} \frac{1}{2}\sum _{i=1}^n\left\{ \frac{(X_i-\mathrm {m}(\theta ,X_{i-1}))^2}{\mathrm {m}_2(\theta ,X_{i-1})}+\log \mathrm {m}_2(\theta ,X_{i-1})\right\} . \end{aligned}$$
(3.4)

Nevertheless, (3.4) does not have a closed form because \(\mathrm {m}(\theta ,X_{i-1})\) and \(\mathrm {m}_2(\theta ,X_{i-1})\) are unknown. Therefore we substitute in (3.4) closed-form approximations of \(\mathrm {m}\) and \(\mathrm {m}_2\) based on the Itô-Taylor expansion.

Let \(f(y):=y.\) For \(l\ge 0,\) under the assumption \(A_5[2(l - 1)]\), we have the following approximation (see Lemma 1 in Kessler 1997)

$$\begin{aligned} \mathrm {m}(\theta ,X_{i-1})=r_l(\Delta _n,X_{i-1},\theta )+R(\theta ,\Delta _n^{l+1},X_{i-1}) \end{aligned}$$
(3.5)

where

$$\begin{aligned} r_l(\Delta _n,x,\theta ):=\sum _{j=0}^l \frac{\Delta _n^j}{j!}L_\theta ^j f(x). \end{aligned}$$

Now let us consider the function \((y-r_l(\Delta _n,x,\theta ))^2,\) which is, for fixed x, y and \(\theta ,\) a polynomial in \(\Delta _n\) of degree 2l. We indicate by \({\overline{g}}_{\Delta _n,x,\theta ,l}(y)\) the sum of its terms up to degree l; that is \(\overline{g}_{\Delta _n,x,\theta ,l}(y)=\sum _{j=0}^l \Delta _n^j \overline{g}_{x,\theta }^j(y),\) where

$$\begin{aligned}&\displaystyle {\overline{g}}_{x,\theta }^0(y)=(y-x)^2\end{aligned}$$
(3.6)
$$\begin{aligned}&\displaystyle {\overline{g}}_{x,\theta }^1(y)=-2(y-x)L_\theta f(x)\end{aligned}$$
(3.7)
$$\begin{aligned}&\displaystyle {\overline{g}}_{x,\theta }^j(y)=-2(y-x)\frac{L_\theta ^j f(x)}{j!}+\sum _{r,s\ge 1,r+s=j}\frac{L_\theta ^r f(x)}{r!}\frac{L_\theta ^s f(x)}{s!},\quad 2\le j\le l. \end{aligned}$$
(3.8)

Under the assumption \(A_5[2(l-1)]\), we have that \(L_\theta ^r{\overline{g}}_{x,\theta }^j(y)\) is well-defined for \(r+j=l\) and we set

$$\begin{aligned} \Gamma _l(\Delta _n,x,\theta ):=\sum _{j=0}^l\Delta _n^j\sum _{r=0}^{l-j}\frac{\Delta _n^r}{r!}L_\theta ^r{\overline{g}}_{x,\theta }^j(x)=:\sum _{j=0}^l\Delta _n^j\gamma _j(\theta ,x), \end{aligned}$$
(3.9)

where \(\gamma _j(\theta ,x)\) are the coefficients of \(\Delta _n^j\). Therefore by (3.6)–(3.9), we obtain, for instance,

$$\begin{aligned} \gamma _0(\theta ,x)&=L_\theta ^0 {\overline{g}}_{x,\theta }^0(x)=0\\ \gamma _1(\theta ,x)&=L_\theta {\overline{g}}_{x,\theta }^0(x)=c(\beta ,x)\\ \gamma _2(\theta ,x)&=\frac{L_\theta ^2 {\overline{g}}_{x,\theta }^0(x)}{2}+L_\theta {\overline{g}}_{x,\theta }^1(x)+L_\theta ^0 {\overline{g}}_{x,\theta }^2(x)\\&=\frac{1}{2}\left[ b(\alpha ,x)\frac{\partial }{\partial x}c(\beta ,x)+2c(\beta ,x)\frac{\partial }{\partial x}b(\alpha ,x)\right] +\frac{c(\beta ,x)}{4}\frac{\partial ^2 }{\partial x^2} c(\beta ,x). \end{aligned}$$
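As a quick sanity check of these expressions (a worked example added here for illustration), consider the Vasicek model (2.1): \(c(\beta _1,x)=\beta _1^2\) does not depend on x and \(\frac{\partial }{\partial x}b(\alpha ,x)=-\alpha _1\), so that

$$\begin{aligned} \gamma _1(\theta ,x)=\beta _1^2,\qquad \gamma _2(\theta ,x)=c(\beta _1,x)\frac{\partial }{\partial x}b(\alpha ,x)=-\alpha _1\beta _1^2. \end{aligned}$$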

Let

$$\begin{aligned} \Gamma _l(\Delta _n,x,\theta ):=\Delta _n c(\beta ,x)[1+{{\overline{\Gamma }}}_l(\Delta _n,x,\theta )] \end{aligned}$$

where \(\overline{\Gamma }_l(\Delta _n,x,\theta ):=\frac{\sum _{j=2}^l\Delta _n^j\gamma _j(\theta ,x)}{\Delta _n c(\beta ,x)}.\) For \(l\ge 0,\) under the assumption \(A_5[2l]\)(i), we have that (see Lemma 2, Kessler 1997)

$$\begin{aligned} \mathrm {m}_2(\theta ,X_{i-1})=\Delta _n c_{i-1}(\beta )[1+\overline{\Gamma }_l(\Delta _n,X_{i-1},\theta )]+R(\theta ,\Delta _n^{l+1},X_{i-1}). \end{aligned}$$
(3.10)

It seems quite natural at this point to substitute (3.5) and (3.10) into the expression (3.4). Nevertheless, in order to avoid the technical difficulties related to the control of the denominator and of the logarithm, we consider a further expansion in \(\Delta _n\) of \((1+{{\overline{\Gamma }}}_l)^{-1}\) and \(\log (1+{{\overline{\Gamma }}}_l)\).

Let \(k_0=[p/2].\) Under the assumption \(A_5[2k_0]\), we define the quasi-log-likelihood function of \(\mathbf{X}_n\) as

$$\begin{aligned} l_{p,n}(\theta ):=l_{p,n}(\theta ,\mathbf{X}_n):=\sum _{i=1}^n\texttt {l}_{p,i}(\theta ) \end{aligned}$$
(3.11)

where

$$\begin{aligned} \texttt {l}_{p,i}(\theta )&:=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2}{2\Delta _nc_{i-1}(\beta )}\left\{ 1+\sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})\right\} \nonumber \\&\quad \quad +\frac{1}{2}\left\{ \log c_{i-1}(\beta )+\sum _{j=1}^{k_0}\Delta _n^j \mathrm {e}_j(\theta ,X_{i-1})\right\} \end{aligned}$$
(3.12)

and \(\mathrm {d}_j,\) resp. \(\mathrm {e}_j,\) is the coefficient of \(\Delta _n^j\) in the Taylor expansion of \((1+{{\overline{\Gamma }}}_{k_0+1}(\Delta _n,x,\theta ))^{-1},\) resp. \(\log (1+{{\overline{\Gamma }}}_{k_0+1}(\Delta _n,x,\theta )).\) It is not hard to show that, for example,

$$\begin{aligned}&\mathrm {d}_1(\theta ,x)=-\,\mathrm {e}_1(\theta ,x)=-\frac{\gamma _2(\theta ,x)}{c(\beta ,x)},\\&\mathrm {d}_2(\theta ,x)=\frac{1}{c(\beta ,x)}\left[ \frac{\gamma _2^2(\theta ,x)}{c(\beta ,x)}-\gamma _3(\theta ,x)\right] ,\quad \mathrm {e}_2(\theta ,x)=\frac{1}{c(\beta ,x)}\left[ \gamma _3(\theta ,x)-\frac{\gamma _2^2(\theta ,x)}{2c(\beta ,x)}\right] . \end{aligned}$$
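Continuing the Vasicek example above (again a worked illustration added here), for \(p=2\) we have \(k_0=1\), \(r_1(\Delta _n,x,\theta )=x+\alpha _1(\alpha _2-x)\Delta _n\), \(\mathrm {d}_1(\theta ,x)=-\gamma _2/c=\alpha _1\) and \(\mathrm {e}_1(\theta ,x)=-\alpha _1\), so that (3.12) reduces to

$$\begin{aligned} \texttt {l}_{2,i}(\theta )=\frac{\big (X_i-X_{i-1}-\alpha _1(\alpha _2-X_{i-1})\Delta _n\big )^2}{2\Delta _n\beta _1^2}\,\big (1+\alpha _1\Delta _n\big )+\frac{1}{2}\big (\log \beta _1^2-\alpha _1\Delta _n\big ). \end{aligned}$$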

Remark 3.1

It is worth pointing out that from assumptions \(A_3\) and \(A_5\) it follows that \(\mathrm {d}_j\) and \(\mathrm {e}_j,\) for all \(j\le k_0,\) are three times differentiable with respect to \(\theta .\) Furthermore, all their derivatives with respect to \(\theta \) are of polynomial growth in x uniformly in \(\theta .\)

The contrast function (3.11) yields the maximum quasi-likelihood estimator \({{\hat{\theta }}}_{p,n}:=({{\hat{\alpha }}}_{p,n},{{\hat{\beta }}}_{p,n})\) defined by

$$\begin{aligned} l_{p,n}({{\hat{\theta }}}_{p,n})=\inf _{\theta \in \Theta }l_{p,n}(\theta ). \end{aligned}$$
(3.13)
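In practice, the minimization (3.13) is carried out numerically. The base R sketch below does this for the Vasicek model using the simpler contrast (3.2) (the function quasi_loglik_euler sketched earlier in this section) in place of the full expansion (3.11); the starting values and box constraints are illustrative choices, not the paper's.

```r
# Maximum quasi-likelihood estimation for the Vasicek model by direct
# minimization of the (negative) quasi-log-likelihood defined above.
fit_qmle <- function(X, Delta, start = c(0.3, 0.3, 0.2)) {
  optim(par = start,
        fn = function(theta) quasi_loglik_euler(theta, X, Delta),
        method = "L-BFGS-B",
        lower = c(0.01, -5, 0.01), upper = c(5, 5, 5))
}

# Example: estimate (alpha1, alpha2, beta1) from the simulated path 'ou' of Sect. 2
# theta_hat <- fit_qmle(ou, Delta = 0.01)$par
```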

Let \(I(\theta _0)\) be the Fisher information matrix at \(\theta _0\) defined as follows

$$\begin{aligned} I(\theta _0):=\left( \begin{matrix} [I_b^{h,k}(\theta _0)]_{h,k=1,\ldots ,m_1} &{} 0 \\ 0 &{} [I_\sigma ^{h,k}(\theta _0)]_{h,k=1,\ldots ,m_2}\ \\ \end{matrix}\right) , \end{aligned}$$
(3.14)

where

$$\begin{aligned} I_b^{h,k}(\theta _0)&:=\int \left( \frac{\partial _{\alpha _h} b\,\partial _{\alpha _k} b}{c}\right) (\theta _0,x)\pi _0(\mathrm {d}x),\\ I_\sigma ^{h,k}(\theta _0)&:=\frac{1}{2} \int \left( \frac{\partial _{\beta _h} c\,\partial _{\beta _k}c}{c^2}\right) (\beta _0,x)\pi _0(\mathrm {d}x). \end{aligned}$$

We recall an important asymptotic result which will be useful in the proof of our main theorem.

Theorem 1

(Kessler 1997) Let p be an integer and \(k_0=[p/2].\) Under assumptions \(A_1\) to \(A_{4}, A_5[2k_0]\) and \(A_6,\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty ,\) as \(n\rightarrow \infty ,\) the estimator \({{\hat{\theta }}}_{p,n}\) is consistent; i.e.

$$\begin{aligned} {{\hat{\theta }}}_{p,n}\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\theta _0. \end{aligned}$$
(3.15)

If in addition \(n\Delta _n^p\rightarrow 0\) and \(\theta _0\in Int(\Theta )\) then

$$\begin{aligned} \varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0)= \left( \begin{array}{c} \sqrt{n\Delta _n}({{\hat{\alpha }}}_{p,n}-\alpha _0)\\ \sqrt{n}({{\hat{\beta }}}_{p,n}-\beta _0) \\ \end{array}\right) \overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }}N_{m_1+m_2}(0,I^{-1}(\theta _0)), \end{aligned}$$
(3.16)

where

$$\begin{aligned} \varphi (n):=\left( \begin{array}{cc} \frac{1}{n\Delta _n}I_{m_1}&{} 0 \\ 0&{}\frac{1}{n}I_{m_2} \\ \end{array}\right) . \end{aligned}$$

Remark 3.2

We observe that \(l_{2,n}\) does not coincide with (3.2), because (3.11) contains the terms \(\mathrm {d}_1\) and \(\mathrm {e}_1.\) Nevertheless, \(l_n\) also yields an asymptotically efficient estimator for \(\theta \), and therefore we refer to it when \(p=2.\)

Remark 3.3

Within the same framework adopted in this paper, as an alternative to \({{\hat{\theta }}}_{p,n}\), Kessler (1995) and Uchida and Yoshida (2012) proposed different types of adaptive maximum quasi-likelihood estimators. For instance, in Uchida and Yoshida (2012), the first type of adaptive estimator is introduced starting from the initial estimator \(\tilde{\beta }_{0,n}\) given by \({\mathbb {U}}_n(\tilde{\beta }_{0,n})=\inf _{\beta \in \Theta _\beta }{\mathbb {U}}_n(\beta ),\) where

$$\begin{aligned} \mathbb {U}_n(\beta ):=\frac{1}{2}\sum _{i=1}^n\left\{ \frac{(X_i-X_{i-1})^2}{\Delta _n c_{i-1}(\beta )}+\log c_{i-1}(\beta )\right\} . \end{aligned}$$

For \(p\ge 2, k_0=[p/2]\) and \(l_0=[(p-1)/2],\) the first type adaptive estimator \({{\tilde{\theta }}}_{p,n}=(\tilde{\alpha }_{k_0,n},{{\tilde{\beta }}}_{l_0,n})\) is defined for \(k=1,2,\ldots ,k_0,\) as follows

$$\begin{aligned}&l_{p,n}({{\tilde{\alpha }}}_{k,n},{{\tilde{\beta }}}_{k-1,n})=\inf _{\alpha \in \Theta _\alpha }l_{p,n}(\alpha ,{{\tilde{\beta }}}_{k-1,n}),\\&l_{p,n}({{\tilde{\alpha }}}_{k,n},\tilde{\beta }_{k,n})=\inf _{\beta \in \Theta _\beta }l_{p,n}(\tilde{\alpha }_{k,n},\beta ). \end{aligned}$$

The maximum quasi-likelihood estimator \({{\hat{\theta }}}_{p,n}\) and its adaptive versions, like \({{\tilde{\theta }}}_{p,n},\) are asymptotically equivalent (under a minor change of the initial assumptions); i.e. they share the properties (3.15) and (3.16) (see Uchida and Yoshida 2012). In what follows we will develop a test based on \({{\hat{\theta }}}_{p,n};\) nevertheless, in light of the previous discussion, it would be possible to replace \({{\hat{\theta }}}_{p,n}\) with \({{\tilde{\theta }}}_{p,n}.\)

4 Test statistics

The goal of this section is to introduce a new type of test statistic for the following parametric hypothesis testing problem

$$\begin{aligned} H_0:\theta =\theta _0,\quad \text {vs} \quad H_1:\theta \ne \theta _0, \end{aligned}$$
(4.1)

concerning the stochastic differential equation (1.1). X is observed only at discrete times, and therefore the available data are represented by \(\mathbf{X}_n\). The motivation for this research is that, for non-simple alternative hypotheses, uniformly most powerful parametric tests do not exist. Therefore, we need suitable procedures for deciding between statistical hypotheses.

The first step consists in the introduction of a suitable measure of the “discrepancy”, or the “distance”, between diffusions belonging to the parametric class (1.1). Furthermore, as recalled in the previous section, for a general stochastic differential equation X, the true transition densities from \(X_{i-1}\) to \(X_i\) are not available in closed form, and neither is the likelihood function. Suppose that the parameter \(\beta \) is known and that the sample path is observable up to time \(T=n\Delta _n.\) Let \(Q_\beta \) be the probability law of the process solution to \(\mathrm {d}Y_t=\sigma (\beta ,Y_t)\mathrm {d}W_t.\) The continuous log-likelihood of X is given by

$$\begin{aligned} \log \frac{\mathrm {d}P_\theta }{\mathrm {d}Q_\beta }=\int _0^T\frac{b(\alpha ,X_t)}{c(\beta ,X_t)}\mathrm {d}X_t-\frac{1}{2}\int _0^T\frac{b^2(\alpha ,X_t)}{c(\beta ,X_t)}\mathrm {d}t. \end{aligned}$$

Thus we can consider the (squared) \(L^2(Q_\beta )\)-distance between the log-likelihoods \(\log \frac{\mathrm {d}P_{\theta _1}}{\mathrm {d}Q_\beta }\) and \(\log \frac{\mathrm {d}P_{\theta _2}}{\mathrm {d}Q_\beta }\) with \(\theta _1,\theta _2\in \Theta \); that is

$$\begin{aligned} D(\theta _1,\theta _2):=\left| \left| \log \frac{\mathrm {d}P_{\theta _1}}{\mathrm {d}Q_\beta }-\log \frac{\mathrm {d}P_{\theta _2}}{\mathrm {d}Q_\beta }\right| \right| _{L^2(Q_\beta )}^2=\int \left[ \log \frac{\mathrm {d}P_{\theta _1}}{\mathrm {d}Q_\beta }-\log \frac{\mathrm {d}P_{\theta _2}}{\mathrm {d}Q_\beta }\right] ^2 \mathrm {d}Q_\beta .\qquad \end{aligned}$$
(4.2)

Clearly, for testing the hypotheses (4.1) in the framework of discretely observed stochastic differential equations, the distance (4.2) is not useful. Nevertheless, the above \(L^2\)-metric for continuous observations suggests considering

$$\begin{aligned} {\mathbb {D}}_{p,n}(\theta _1,\theta _2):=\frac{1}{n}\sum _{i=1}^n[\texttt {l}_{p,i}(\theta _1)-\texttt {l}_{p,i}(\theta _2)]^2,\quad \theta _1,\theta _2\in \Theta , \end{aligned}$$
(4.3)

which can be interpreted as the empirical version of (4.2), where the theoretical log-likelihood is replaced with the quasi-log-likelihood defined by (3.11). The following theorem provides the convergence in probability of \({\mathbb {D}}_{p,n}.\)

Theorem 2

Let p be an integer and \(k_0=[p/2].\) Assume \( A_1- A_{4}, A_5[2k_0]\) and \(A_6.\) Under \(H_0,\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty ,\) as \(n\rightarrow \infty \), we have that

$$\begin{aligned} \mathbb D_{p,n}(\theta ,\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} U(\beta ,\beta _0) \end{aligned}$$

uniformly in \(\theta ,\) where

$$\begin{aligned} U(\beta ,\beta _0)&:=\frac{1}{4}\int \left\{ 3\left[ \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right] ^2+\left[ \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \right] ^2\right. \\&\left. \quad \quad +\,\left[ \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right] \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \right\} \pi _0(\mathrm {d}x). \end{aligned}$$

The above result shows that \({\mathbb {D}}_{p,n}(\theta ,\theta _0)\) is not a true approximation of \(D(\theta ,\theta _0)\) because it does not converge to \(\int \left[ \log (\pi _\theta (\mathrm {d}x)/\pi _0(\mathrm {d}x))\right] ^2\pi _0(\mathrm {d}x).\) Nevertheless, the function (4.3) allows us to construct the main object of interest of the paper. Let \({{\hat{\theta }}}_{p,n}\) be the maximum quasi-likelihood estimator defined by (3.13); for testing the hypotheses (4.1) we introduce the following class of test statistics

$$\begin{aligned} T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0):=n\mathbb D_{p,n}({{\hat{\theta }}}_{p,n},\theta _0). \end{aligned}$$
(4.4)

The first result concerns the weak convergence of \(T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0).\) We prove that \(T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\) is asymptotically distribution free under \(H_0;\) namely it weakly converges to a chi-squared random variable with \(m_1 + m_2\) degrees of freedom.

Theorem 3

Let p be an integer and \(k_0=[p/2].\) Assume \( A_1- A_{4}, A_5[2k_0]\) and \(A_6.\) Under \(H_0,\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty , n\Delta _n^p\rightarrow 0,\) as \(n\rightarrow \infty \), we have that

$$\begin{aligned} T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} \chi _{m_1+m_2}^2. \end{aligned}$$
(4.5)

Given the level \(\alpha \in (0,1)\), our criterion is to

$$\begin{aligned} \text {reject}\, H_0\, \text {if }\, T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)>\chi ^2_{m_1+m_2,{\alpha }}, \end{aligned}$$

where \(\chi ^2_{m_1+m_2,{\alpha }}\) is the \(1-\alpha \) quantile of the limiting random variable \(\chi _{m_1+m_2}^2\); that is under \(H_0\)

$$\begin{aligned} \lim _{n\rightarrow \infty }P_\theta (T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)>\chi ^2_{m_1+m_2,{\alpha }})=\alpha . \end{aligned}$$
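A minimal sketch of this decision rule in base R follows; lp_terms is a placeholder (our own naming, not from the paper), assumed to return the vector of terms \((\texttt {l}_{p,i}(\theta ))_{1\le i\le n}\).

```r
# Test statistic T_{p,n} of (4.4) and the chi-squared rejection rule.
# lp_terms(theta, X, Delta) is assumed to return (l_{p,i}(theta))_{i=1,...,n}.
T_pn <- function(theta_hat, theta0, X, Delta, lp_terms) {
  d <- lp_terms(theta_hat, X, Delta) - lp_terms(theta0, X, Delta)
  sum(d^2)                            # equals n * D_{p,n}(theta_hat, theta0)
}

reject_H0 <- function(Tstat, df, level = 0.05) {
  Tstat > qchisq(1 - level, df = df)  # compare with the 1 - level quantile of chi^2_df
}
```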

Under \(H_1,\) the power function of the proposed test is given by the following map

$$\begin{aligned} \theta \mapsto P_\theta \left( T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)>\chi ^2_{m_1+m_2,{\alpha }}\right) \end{aligned}$$

Often a way to judge the quality of sequences of tests is provided by their powers at alternatives that become closer and closer to the null hypothesis. This justifies the study of the local limiting power. Indeed, the power function of the test statistic (4.4) usually cannot be calculated explicitly. Nevertheless, \( P_\theta \left( T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)>\chi ^2_{m_1+m_2,{\alpha }}\right) \) can be studied and approximated under contiguous alternatives written as

$$\begin{aligned} H_{1,n}:\theta =\theta _0+\varphi (n)^{1/2}h, \end{aligned}$$
(4.6)

where \(h\in \mathbb {R}^{m_1+m_2}\) is such that \(\theta _0+\varphi (n)^{1/2}h \in \Theta .\) In order to get a reasonable approximation of the power function, we analyze the asymptotic law of the test statistic under the local alternatives \(H_{1,n}.\) We need the following assumption on the contiguity of probability measures (see Van der Vaart 1998):

\(B_1\) :

\(P_{\theta _0+\varphi (n)^{1/2}h}\) is a sequence of probability measures contiguous with respect to \(P_0;\) i.e. \( \lim _{n\rightarrow \infty } P_{0}(A_n)=0\) implies \(\lim _{n\rightarrow \infty } P_{\theta _0+\varphi (n)^{1/2}h}(A_n)=0\) for every sequence of measurable sets \(A_n\).

Remark 4.1

The assumption \(B_1\) holds if we assume \(A_1- A_{4}, A_5[2k_0]\) and the conditions:

  (i) there exists a constant \(C>0\) such that the following estimates hold

    $$\begin{aligned} |b(\alpha ,x)|\le C(1+|x|),\quad \left| \frac{\partial }{\partial x}b(\alpha ,x)\right| +|\sigma (\beta ,x)|+\left| \frac{\partial }{\partial x}\sigma (\beta ,x)\right| \le C \end{aligned}$$

    for all \((\alpha ,\beta )\in \Theta \) and \(x\in {\mathbb {R}};\)

  (ii) there exist \(C_0>0\) and \(K>0\) such that

    $$\begin{aligned} b(\alpha ,x)x\le -C_0|x|^2+K \end{aligned}$$

    for all \((\alpha ,x)\in \Theta _\alpha \times {\mathbb {R}};\)

  (iii) there exists a constant \(C_1>1\) such that

    $$\begin{aligned} \frac{1}{C_1}\le \sigma (\beta ,x)\le C_1, \text { for all } (\beta ,x)\in \Theta _\beta \times \mathbb {R}. \end{aligned}$$

Under the above assumptions, Gobet (2002) proved the Local Asymptotic Normality (LAN) for the likelihood of the ergodic diffusions (1.1); i.e.

$$\begin{aligned} \log \left( \frac{\mathrm {d}P_{\theta _0+\varphi (n)^{1/2}h}}{\mathrm {d}P_0}(\mathbf{X}_n)\right) \overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} h' N_{m_1+m_2}(0, I(\theta _0))-\frac{1}{2} h' I(\theta _0)h. \end{aligned}$$

By means of Le Cam’s first lemma (see Van der Vaart 1998), the LAN property implies the contiguity of \(P_{\theta _0+\varphi (n)^{1/2}h}\) with respect to \(P_0.\)

Now, we are able to study the asymptotic probability distribution of \(T_{p,n}\) under \(H_{1,n}.\)

Theorem 4

Let p be an integer and \(k_0=[p/2].\) Assume that \(A_1- A_{4}, A_5[2k_0], A_6\) and \(B_1\) hold. Under the local alternative hypothesis \(H_{1,n},\) if \(\Delta _n\rightarrow 0,n\Delta _n\rightarrow \infty , n\Delta _n^p\rightarrow 0\) as \(n\rightarrow \infty \), the following weak convergence holds

$$\begin{aligned} T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} \chi _{m_1+m_2}^2(h'I(\theta _0)h), \end{aligned}$$
(4.7)

where the random variable \(\chi ^2_{m_1+m_2}(h'I(\theta _0)h)\) is a non-central chi-squared random variable with \(m_1+m_2\) degrees of freedom and non-centrality parameter \(h'I(\theta _0)h\).

Remark 4.2

If we deal with \(H_0 : \theta =\theta _0\) and the local alternative hypothesis \(H_{1,n},\) Theorem 4 leads to the following approximation of the power function

$$\begin{aligned} P_\theta \left( T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)>\chi ^2_{m_1+m_2,{\alpha }}\right) \cong 1-\mathbf {F}\left( \chi ^2_{m_1+m_2,\alpha }\right) ,\quad n\gg 1, \end{aligned}$$
(4.8)

where \(\mathbf {F}(\cdot )\) is the cumulative distribution function of the random variable \(\chi ^2_{m_1+m_2}(h'I(\theta _0)h)\).
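In R, the right-hand side of (4.8) can be evaluated through the non-central chi-squared distribution; a sketch (with the Fisher information I0, the local direction h and the dimensions m1, m2 supplied by the user) is

```r
# Approximate local power (4.8): 1 - F(chi^2_{m1+m2, alpha}), where F is the cdf
# of a non-central chi-squared with m1+m2 df and non-centrality h' I(theta_0) h.
approx_power <- function(h, I0, m1, m2, level = 0.05) {
  ncp <- as.numeric(t(h) %*% I0 %*% h)
  1 - pchisq(qchisq(1 - level, df = m1 + m2), df = m1 + m2, ncp = ncp)
}
```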

Remark 4.3

The Generalized Quasi-Likelihood Ratio, Wald, Rao type test statistics have been studied by Kitagawa and Uchida (2014). These test statistics are, respectively, defined as follows

$$\begin{aligned}&L_{p,n}({{\hat{\theta }}}_{p,n},\theta _0):= 2(l_{p,n}({{\hat{\theta }}}_{p,n})-l_{p,n}(\theta _0)) \end{aligned}$$
(4.9)
$$\begin{aligned}&W_{p,n}({{\hat{\theta }}}_{p,n},\theta _0):= (\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0))' I_{p,n}({{\hat{\theta }}}_{p,n})\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0) \end{aligned}$$
(4.10)
$$\begin{aligned}&R_{p,n}({{\hat{\theta }}}_{p,n},\theta _0):= (\varphi (n)^{1/2}\partial _\theta l_{p,n}(\theta _0))' I_{p,n}^{-1}({{\hat{\theta }}}_{p,n})\varphi (n)^{1/2}\partial _\theta l_{p,n}(\theta _0), \end{aligned}$$
(4.11)

where

$$\begin{aligned} I_{p,n}(\theta )=\left( \begin{array}{cc } \frac{1}{n\Delta _n}\partial _\alpha ^2l_{p,n}(\theta ) &{} \frac{1}{n\sqrt{\Delta _n}}\partial _{\alpha }\partial _{\beta } l_{p,n}(\theta ) \\ \frac{1}{n\sqrt{\Delta _n}}\partial _{\beta }\partial _{\alpha } l_{p,n}(\theta ) &{} \frac{1}{n}\partial _\beta ^2l_{p,n}(\theta ) \end{array}\right) \end{aligned}$$

and \(R_{p,n}\) is well-defined if \( I_{p,n}(\theta )\) is nonsingular. The above test statistics are asymptotically equivalent to \(T_{p,n};\) i.e. under \(H_0,\)\(L_{p,n},W_{p,n}\) and \(R_{p,n}\) weakly converge to a \(\chi ^2_{m_1+m_2}\) random variable.
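For comparison purposes, a base R sketch of the Wald type statistic (4.10) is given below; the normalized matrix Ipn (standing for \(I_{p,n}({{\hat{\theta }}}_{p,n})\)) is assumed to be supplied by the user, and the function name is our own.

```r
# Wald type statistic (4.10):
# (phi(n)^{-1/2}(theta_hat - theta0))' Ipn (phi(n)^{-1/2}(theta_hat - theta0)).
wald_stat <- function(theta_hat, theta0, Ipn, m1, m2, n, Delta) {
  scale <- c(rep(sqrt(n * Delta), m1), rep(sqrt(n), m2))  # diagonal of phi(n)^{-1/2}
  z <- scale * (theta_hat - theta0)
  as.numeric(t(z) %*% Ipn %*% z)
}
```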

Remark 4.4

In De Gregorio and Iacus (2013), the authors dealt with (for \(p=2\)) test statistics based on an empirical version of the true \(\phi \)-divergences; i.e.

$$\begin{aligned} 2\sum _{i=1}^n\phi \left( \frac{\exp l_n(\theta )}{\exp l_n(\theta _0)}\right) \end{aligned}$$
(4.12)

where \(\phi \) represents a suitable convex function and \(l_n\) is given by (3.2). In the present paper, the starting point is represented by the \(L^2\)-distance between two diffusion parametric models. In some sense, the approach developed in this work is close to that of Aït-Sahalia (1996), where a test based on the \(L^2\)-distance between the density function and its nonparametric estimator is introduced.

Remark 4.5

From a practical point of view, since the hypotheses \(\alpha = \alpha _0\) and \(\beta = \beta _0\) sometimes have different meanings, it is possible to adopt a stepwise procedure. For instance, for \(p=2,\) we first test \(\beta = \beta _0\) by means of

$$\begin{aligned} T_n^\beta ({{\tilde{\beta }}}_{0,n},\beta _0):=\sum _{i=1}^n\left[ \frac{(X_i-X_{i-1})^2}{\Delta _n}\left( \frac{1}{ c_{i-1}({{\tilde{\beta }}}_{0,n})}-\frac{1}{ c_{i-1}(\beta _0)}\right) +\log \left( \frac{c_{i-1}({{\tilde{\beta }}}_{0,n})}{c_{i-1}(\beta _0)}\right) \right] ^2 \end{aligned}$$

and then, in the second step, we test \(\alpha = \alpha _0\) by taking into account

$$\begin{aligned} T_n^\alpha ({{\tilde{\alpha }}}_{1,n},\alpha _0,{{\tilde{\beta }}}_{0,n}):=\sum _{i=1}^n[\texttt {l}_{2,i}({{\tilde{\alpha }}}_{1,n}, {{\tilde{\beta }}}_{0,n})-\texttt {l}_{2,i}(\alpha _0,{{\tilde{\beta }}}_{0,n})]^2, \end{aligned}$$

where \({{\tilde{\alpha }}}_{1,n}\) and \({{\tilde{\beta }}}_{0,n}\) are the adaptive estimators defined in the Remark 3.3.

5 Numerical analysis

Although all the test statistics presented above and in the literature satisfy the same asymptotic results, for small sample sizes the performance of each test statistic is determined by the statistical model generating the data and by the quality of the approximation of the quasi-likelihood function. To highlight these effects we consider the two stochastic models presented in Sect. 2, namely the Ornstein-Uhlenbeck model (OU in the tables) of Eq. (2.1) and the CIR model of Eq. (2.2). In this numerical study we consider the power of the test under local alternatives for different test statistics:

  • the \(\phi \) divergence of Eq. (4.12) with \(\phi (x) = 1-x+x \log (x)\), which is equivalent to the approximated Kullback–Leibler divergence (see De Gregorio and Iacus 2013). We use the label AKL in the tables for this approximate Kullback-Leibler measure;

  • the \(\phi \) divergence with \(\phi (x) = \left( \frac{x-1}{x+1}\right) ^2\): this was proposed in Balakrishnan and Sanghvi (1968); we name it BS in the tables;

  • the Generalized Quasi-Likelihood Ratio test with \(p=2\), see e.g., (4.9), denoted as GQLRT in the tables;

  • the Rao test statistic \(R_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\) of Eq. (4.11), denoted as RAO in the tables;

  • and the statistic \(T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\) proposed in this paper and defined in Eq. (4.4), with \(p=2\), denoted as \(T_{2,n}\) in the tables.

The sample sizes have been chosen to be equal to \(n=50, 100, 250, 500, 1000\) observations and the time horizon is set to \(T=n^\frac{1}{3}\), in order to satisfy the asymptotic theory. For testing \(\theta _0\) against the local alternatives \(\theta _0 + \frac{h}{\sqrt{n\Delta _n}}\) for the parameters in the drift coefficient and \(\theta _0 + \frac{h}{\sqrt{n}}\) for the parameters in the diffusion coefficient, h is taken on a grid from 0 to 1, where \(h=0\) corresponds to the null hypothesis \(H_0\). For the data generating process, we consider the following statistical models

  1. OU

    the one-dimensional Ornstein–Uhlenbeck model solution to \(\mathrm {d}X_t = \alpha _1(\alpha _2- X_t)\mathrm {d}t + \beta _1 \mathrm {d}W_t\), \(X_0=1\), with \(\theta _0=(\alpha _1, \alpha _2, \beta _1) = (0.5, 0.5, 0.25)\);

  2. CIR

    the one-dimensional CIR model solution to \(\mathrm {d}X_t = \alpha _1(\alpha _2- X_t)\mathrm {d}t + \beta _1 \sqrt{X_t} \mathrm {d}W_t\), \(X_0=1\), with \(\theta _0=(\alpha _1, \alpha _2,\beta _1) = (0.5, 0.5, 0.125)\).

In each experiment the process has been simulated at high frequency using the Euler-Maruyama scheme and resampled to obtain \(n=50, 100, 250, 500, 1000\) observations. Note that, even though the Ornstein-Uhlenbeck process has a Gaussian transition density, this density is different from the Euler-Maruyama Gaussian density for a non-negligible time mesh \(\Delta _n\) (see Iacus 2008). For the simulation we used the R package yuima (see Iacus and Yoshida 2017). Each experiment is replicated 1000 times and from the empirical distribution of each test statistic, say \(S_n\), we define the rejection threshold of the test as \(\tilde{\chi }^2_{3,0.05}\), i.e. \({{\tilde{\chi }}}^2_{3,0.05}\) is the 95% quantile of the empirical distribution of \(S_n,\) that is

$$\begin{aligned} 0.05 = \text {Freq}\left( S_n({\hat{\theta }_{n}}, \theta _0) > \tilde{\chi }^2_{3,0.05}\right) . \end{aligned}$$

Similarly, we define the empirical power function of the test as

$$\begin{aligned} \mathrm{EPow}(h) =\text {Freq}\left( S_n({\hat{\theta }_{n}}, \theta _0+\varphi (n)^{1/2}h) > {{\tilde{\chi }}}^2_{3,0.05}\right) , \end{aligned}$$

where \({{\hat{\theta }}}_n\) is the maximum quasi-likelihood estimator defined in (3.13). The choice of the empirical threshold \({{\tilde{\chi }}}^2_{3,0.05}\) instead of the theoretical threshold \( \chi ^2_{3,0.05}\) from the \(\chi ^2_3\) distribution is due to the fact that otherwise the tests are not comparable. Indeed, the empirical level of the test is not 0.05 for small sample sizes when \(\chi ^2_{3,0.05}\) is used as rejection threshold and, for example, when \(h=0\) different choices of the test statistic produce different empirical levels of the test. Tables 1 and 2 contain the empirical power function of each test. In these tables a bold face font is used to highlight the test statistic with the highest empirical power function \(\mathrm{EPow}(h)\) for a given local alternative \(h>0\).
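A schematic base R version of this Monte Carlo design is sketched below; simulate_path and test_stat are placeholders for the data-generating routine and for the chosen test statistic (which internally estimates \(\theta \) by quasi-likelihood), and are not part of the yuima code actually used for the experiments.

```r
# Empirical rejection threshold and empirical power, as described above.
# simulate_path(theta, n, Delta): returns a discretely observed path;
# test_stat(X, theta0): estimates theta from X and returns the test statistic.
empirical_power <- function(theta0, theta_alt, n, Delta,
                            simulate_path, test_stat, B = 1000) {
  stat_H0 <- replicate(B, test_stat(simulate_path(theta0,    n, Delta), theta0))
  stat_H1 <- replicate(B, test_stat(simulate_path(theta_alt, n, Delta), theta0))
  thr <- quantile(stat_H0, 0.95)  # empirical 95% threshold under H_0
  mean(stat_H1 > thr)             # empirical power at the local alternative
}
```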

From this numerical analysis we can see several facts:

  • the test statistic based on the AKL does not perform as well as the GQLRT, despite the fact that they are related to the same divergence; the latter is sometimes better;

  • the \(T_{2,n}\) seems to be (almost) uniformly more powerful in this experiment;

  • all tests but RAO seem to behave well when the alternative is sufficiently far from the null;

  • for the CIR model, the RAO test does not perform well under the alternative hypothesis, and this is probably because it requires a very large T, which in our case is at most \(T=10\). For the Gaussian OU case, the performance is better and in line with that presented in Kitagawa and Uchida (2014) for similar sample sizes.

Therefore, we can conclude that, although all the test statistics share the same asymptotic properties, the proposed \(T_{p,n}\) seems to perform very well in the small sample case examined in the above Monte Carlo experiments, at least for \(p=2\).

Table 1 Empirical power function \(\mathrm{EPow}(h)\), for different sample sizes n and local alternatives h
Table 2 Empirical power function \(\mathrm{EPow}(h)\), for different sample sizes n and local alternatives h

6 Proofs

In order to prove the theorems appearing in the paper, we need some preliminary results. Let us start with the following lemmas.

Lemma 1

For \(k\ge 1\) and \(t_{i-1}^n\le t\le t_i^n\)

$$\begin{aligned} E_0^{i-1}[|X_t-X_{i-1}|^k]\le C_k|t-t_{i-1}^n|^{k/2}(1+|X_{i-1}|)^{C_k}. \end{aligned}$$
(6.1)

If \(f:\Theta \times {\mathbb {R}}\rightarrow {\mathbb {R}}\) is of polynomial growth in x uniformly in \(\theta \) then

$$\begin{aligned} E_0^{i-1}[f(\theta ,X_{t})]\le C_{t-t_{i-1}^n}(1+|X_{i-1}|)^{C},\quad t_{i-1}^n\le t\le t_i^n. \end{aligned}$$
(6.2)

Proof

See the proof of Lemma 6 in Kessler (1997). \(\square \)

Lemma 2

For \(l\ge 1\)

$$\begin{aligned}&r_{l}(\Delta _n,X_{i-1},\theta )=X_{i-1}+\Delta _n b_{i-1}(\alpha )+R(\theta ,\Delta _n^2, X_{i-1})\end{aligned}$$
(6.3)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta ))^2]=\Delta _n c_{i-1}(\beta _0)+R(\theta ,\Delta _n^2, X_{i-1})\end{aligned}$$
(6.4)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta ))^3]=R(\theta ,\Delta _n^2, X_{i-1})\end{aligned}$$
(6.5)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta ))^4]=3\Delta _n^2 c_{i-1}^2(\beta _0)+R(\theta ,\Delta _n^3, X_{i-1})\end{aligned}$$
(6.6)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta ))^5]=R(\theta ,\Delta _n^3, X_{i-1})\end{aligned}$$
(6.7)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta ))^6]=5\cdot 3\Delta _n^3 c_{i-1}^3(\beta _0)+R(\theta ,\Delta _n^4, X_{i-1})\end{aligned}$$
(6.8)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta ))^7]=R(\theta ,\Delta _n^4, X_{i-1})\end{aligned}$$
(6.9)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta ))^8]=7\cdot 5\cdot 3\Delta _n^4 c_{i-1}^4(\beta _0)+R(\theta ,\Delta _n^5, X_{i-1}) \end{aligned}$$
(6.10)

Proof

The equalities from (6.3) to (6.6) represent the statement of Lemma 7 in Kessler (1997). By using the same approach adopted for the proof of the aforementioned lemma, we observe that, given (6.3)–(6.6), the results (6.7) and (6.8) hold if we are able to show that

$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-X_{i-1})^5]=R(\theta ,\Delta _n^3, X_{i-1})\end{aligned}$$
(6.11)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-X_{i-1})^6]=5\cdot 3\Delta _n^3 c_{i-1}^3(\beta _0)+R(\theta ,\Delta _n^4, X_{i-1}) \end{aligned}$$
(6.12)

We only prove (6.12), because (6.11) follows by means of similar arguments. By applying the Itô-Taylor formula [see Lemma 1 in Florens-Zmirou (1989)] to the function \(f_x(y)=(y-x)^6\) we obtain

$$\begin{aligned} E_0^{i-1}\big [(X_i-X_{i-1})^6\big ]&=f_{X_{i-1}}(X_{i-1})+\Delta _n L_0f_{X_{i-1}}(X_{i-1})\\&\quad +\frac{\Delta _n^2}{2}L_0^2f_{X_{i-1}}(X_{i-1})+\frac{\Delta _n^3}{3!}L_0^3f_{X_{i-1}}(X_{i-1})\\&\quad + \int _0^{\Delta _n}\int _0^{u_1}\int _0^{u_2}\int _0^{u_3}E_0^{i-1}\big [L_0^4 f_{X_{i-1}}(X_{t_{i-1}^n+u_4})\big ]\mathrm {d}u_1\mathrm {d}u_2\mathrm {d}u_3\mathrm {d}u_4. \end{aligned}$$

By applying (6.2), we obtain

$$\begin{aligned} \int _0^{\Delta _n}\int _0^{u_1}\int _0^{u_2}\int _0^{u_3}E_0^{i-1}\big [L_0^4 f_{X_{i-1}}(X_{t_{i-1}^n+u_4})\big ]\mathrm {d}u_1\mathrm {d}u_2\mathrm {d}u_3\mathrm {d}u_4=R(\theta ,\Delta _n^4, X_{i-1}). \end{aligned}$$

Furthermore, by means of long and cumbersome calculations, we can show that \(f_x(x)=L_0f_{x}(x)=L_0^{2}f_{x}(x)=0\), while \(L_0^{3}f_x(x)=5\cdot 3\cdot 3! c_{i-1}^3(\beta _0).\)

Analogously to what was done above, given (6.3)–(6.8), the equalities (6.9) and (6.10) hold if we are able to show that

$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-X_{i-1})^7]=R(\theta ,\Delta _n^4, X_{i-1}), \end{aligned}$$
(6.13)
$$\begin{aligned}&E_{0}^{i-1}[(X_{i}-X_{i-1})^8]=7\cdot 5\cdot 3\Delta _n^4 c_{i-1}^4(\beta _0)+R(\theta ,\Delta _n^5, X_{i-1}). \end{aligned}$$
(6.14)

We only prove (6.14), because (6.13) follows by means of similar arguments. The application of the Itô-Taylor formula to the function \(f_x(y)=(y-x)^8\) yields

$$\begin{aligned} E_0^{i-1}\big [(X_i-X_{i-1})^8\big ]&=f_{X_{i-1}}(X_{i-1})+\Delta _n L_0f_{X_{i-1}}(X_{i-1})+\frac{\Delta _n^2}{2}L_0^2f_{X_{i-1}}(X_{i-1})\\&\quad +\frac{\Delta _n^3}{3!}L_0^3f_{X_{i-1}}(X_{i-1})+\frac{\Delta _n^4}{4!}L_0^4f_{X_{i-1}}(X_{i-1})\\&\quad + \int _0^{\Delta _n}\int _0^{u_1}\int _0^{u_2}\int _0^{u_3}\int _0^{u_4}E_0^{i-1}\big [L_0^5 f_{X_{i-1}}(X_{t_{i-1}^n+u_5})\big ]\mathrm {d}u_1\mathrm {d}u_2\mathrm {d}u_3\mathrm {d}u_4\mathrm {d}u_5 \end{aligned}$$

By applying (6.2), we get

$$\begin{aligned} \int _0^{\Delta _n}\int _0^{u_1}\int _0^{u_2}\int _0^{u_3}\int _0^{u_4}E_0^{i-1}\big [L_0^5 f_{X_{i-1}}(X_{t_{i-1}^n+u_5})\big ]\mathrm {d}u_1\mathrm {d}u_2\mathrm {d}u_3\mathrm {d}u_4\mathrm {d}u_5=R(\theta ,\Delta _n^5, X_{i-1}). \end{aligned}$$

Furthermore, by means of long and cumbersome calculations, we can show that \(f_x(x)=L_0f_x(x)=L_0^2f_x(x)=L_0^3f_x(x)=0\) while \(L_0^4f_x(x)=7\cdot 5\cdot 3\cdot 4! c^4_{i-1}(\beta _0).\)\(\square \)

Lemma 3

(Triangular arrays convergence) Let \(U_i^n\) and U be random variables, with \(U_i^n\) being \(\mathcal {G}_{i}^n\)-measurable. The two following conditions imply \(\sum _{i=1}^nU_i^n\overset{P}{\underset{n\rightarrow \infty }{\longrightarrow }} U\):

$$\begin{aligned}&\sum _{i=1}^nE[U_i^n|\mathcal {G}_{i-1}^n]\overset{P}{\underset{n\rightarrow \infty }{\longrightarrow }} U,\quad \sum _{i=1}^nE[(U_i^n)^2|\mathcal {G}_{i-1}^n]\overset{P}{\underset{n\rightarrow \infty }{\longrightarrow }} 0 \end{aligned}$$

Proof

See the proof of Lemma 9 in Genon-Catalot and Jacod (1993). \(\square \)

Lemma 4

Let \(f:\Theta \times \mathbb {R}\rightarrow \mathbb {R}\) be such that \(f(\theta ,x)\in C_{\uparrow }^{1,1}(\Theta \times \mathbb {R},\mathbb {R}).\) Let us assume \(A_1-A_6\). If \(\Delta _n\rightarrow 0\) and \(n\Delta _n\rightarrow \infty \), we have that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n f_{i-1}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} \int f(\theta ,x)\pi _{0}(\mathrm {d}x) \end{aligned}$$

uniformly in \(\theta \).

Proof

See the proof of Lemma 8 in Kessler (1997). \(\square \)

Lemma 5

Let \(f:\Theta \times \mathbb {R}\rightarrow \mathbb {R}\) be such that \(f(\theta ,x)\in C_{\uparrow }^{1,1}(\Theta \times \mathbb {R},\mathbb {R}).\) Let us assume \(A_1-A_6\). If \(\Delta _n\rightarrow 0\) and \(n\Delta _n\rightarrow \infty ,\) as \(n\rightarrow \infty ,\) we have that

$$\begin{aligned} \frac{1}{n\Delta _n^j}\sum _{i=1}^n f_{i-1}(\theta )(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta _0))^k\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} {\left\{ \begin{array}{ll}0,&{} j=1,k=1,\\ \int f(\theta ,x)c(\beta _0,x)\pi _{0}(\mathrm {d}x),&{} j=1,k=2,\\ \int f(\theta ,x)R(\theta ,1,x)\pi _0(\mathrm {d}x),&{} j=2,k=3,\\ 0,&{} j=1,k=4,\\ 3\int f(\theta ,x)c^2(\beta _0,x)\pi _0(\mathrm {d}x),&{} j=2,k=4, \end{array}\right. } \end{aligned}$$

uniformly in \(\theta \).

Proof

The cases \(j=1,k=1\) and \(j=1,k=2\) coincide with Lemma 9 and Lemma 10 in Kessler (1997), and we use the same approach to show that the remaining convergences hold true.

By setting

$$\begin{aligned} \zeta _i^n(\theta ):=\frac{1}{n\Delta _n^2}f_{i-1}(\theta )(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta _0))^3, \end{aligned}$$

we prove that the convergence holds for all \(\theta .\) By taking into account Lemma 2 and Lemma 4

$$\begin{aligned}&\sum _{i=1}^nE_0^{i-1}\big [\zeta _i^n(\theta )\big ]=\frac{1}{n}\sum _{i=1}^nf_{i-1}(\theta )R(\theta ,1, X_{i-1})\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\int f(\theta ,x)R(\theta ,1,x)\pi _0(\mathrm {d}x),\\&\sum _{i=1}^nE_0^{i-1}\big [\big (\zeta _i^n(\theta )\big )^2\big ]=\frac{1}{n^2\Delta _n}\sum _{i=1}^nf_{i-1}^2(\theta )[5\cdot 3 c_{i-1}^3(\beta _0)+R(\theta ,1, X_{i-1})]\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0. \end{aligned}$$

Therefore by Lemma 3 we can conclude that

$$\begin{aligned} \sum _{i=1}^n\zeta _i^n(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\int f(\theta ,x)R(\theta ,1,x)\pi _0(\mathrm {d}x), \end{aligned}$$

for all \(\theta .\) For the uniformity it is sufficient to prove the tightness of the sequence of random elements

$$\begin{aligned} Y_n(\theta ):=\frac{1}{n}\sum _{i=1}^n\frac{f_{i-1}(\theta )(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta _0))^3}{\Delta _n^2} \end{aligned}$$

taking values in the Banach space \(C(\Theta )\) endowed with the sup-norm \(||\cdot ||_\infty .\) From the assumptions of the lemma it follows that \(\sup _nE_0[\sup _{\theta \in \Theta }|\partial _\theta Y_n(\theta )|]<\infty ,\) which implies the tightness of \(Y_n(\theta )\) by the criterion given by Theorem 16.5 in Kallenberg (2001).

By setting

$$\begin{aligned} \zeta _i^n(\theta ):=\frac{1}{n\Delta _n^2}f_{i-1}(\theta )(X_{i}-r_{l}(\Delta _n,X_{i-1},\theta _0))^4, \end{aligned}$$

we prove that the convergence holds for all \(\theta .\) By taking into account Lemmas 2 and 4

$$\begin{aligned}&\sum _{i=1}^nE_0^{i-1}\big [\zeta _i^n(\theta )\big ]=\frac{1}{n}\sum _{i=1}^nf_{i-1}(\theta )\big [3 c_{i-1}^2(\beta _0)+R(\theta ,\Delta _n, X_{i-1})\big ]\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\\&\qquad \qquad \qquad \qquad 3\int f(\theta ,x)c^2(\beta _0,x)\pi _0(\mathrm {d}x),\\&\sum _{i=1}^nE_0^{i-1}\big [\big (\zeta _i^n(\theta )\big )^2\big ]=\frac{1}{n^2}\sum _{i=1}^nf_{i-1}^2(\theta )\big [7\cdot 5\cdot 3 c_{i-1}^4(\beta _0)+R(\theta ,\Delta _n, X_{i-1})\big ]\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0. \end{aligned}$$

Therefore by Lemma 3 we get the pointwise convergence. For the uniformity of the convergence we proceed as done above. \(\square \)

Before proceeding with the proofs of the main theorems of the paper, we introduce some useful quantities coinciding with (4.2)–(4.8) in Kessler (1997). We can write

$$\begin{aligned} \texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)=\varphi _{i,1}(\theta ,\theta _0)+\varphi _{i,2}(\theta ,\theta _0)+\varphi _{i,3}(\theta ,\theta _0)+\varphi _{i,4}(\theta ,\theta _0), \end{aligned}$$
(6.15)

where

$$\begin{aligned} \varphi _{i,1}(\theta ,\theta _0)&:=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^2}{2\Delta _n}\left\{ \frac{1+ \sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})}{c_{i-1}(\beta )}\right. \\&\left. \quad -\frac{1+ \sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta _0,X_{i-1})}{c_{i-1}(\beta _0)}\right\} ,\\ \varphi _{i,2}(\theta ,\theta _0)&:=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))(r_{k_0}(\Delta _n,X_{i-1},\theta _0)-r_{k_0}(\Delta _n,X_{i-1},\theta ))}{\Delta _nc_{i-1}(\beta )}\\&\quad \times \left\{ 1+ \sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})\right\} ,\\ \varphi _{i,3}(\theta ,\theta _0)&:=\frac{(r_{k_0}(\Delta _n,X_{i-1},\theta _0)-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2}{2\Delta _nc_{i-1}(\beta )}\left\{ 1+ \sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})\right\} ,\\ \varphi _{i,4}(\theta ,\theta _0)&:=\frac{1}{2}\log \left( \frac{c_{i-1}(\beta )}{c_{i-1}(\beta _0)}\right) + \frac{1}{2}\sum _{j=1}^{k_0}\Delta _n^j (\mathrm {e}_j(\theta ,X_{i-1})-\mathrm {e}_j(\theta _0,X_{i-1})). \end{aligned}$$

Furthermore

$$\begin{aligned} \partial _{\alpha _h} \texttt {l}_{p,i}(\theta )=\eta _{i,1}^h(\theta )+\eta _{i,2}^h(\theta ), \quad h=1,2,\ldots ,m_1, \end{aligned}$$
(6.16)

where

$$\begin{aligned}&\eta _{i,1}^h(\theta ):=-(\partial _{\alpha _h} r_{k_0}(\Delta _n,X_{i-1},\theta ))(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))\frac{\left\{ 1+\sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})\right\} }{\Delta _nc_{i-1}(\beta )},\\&\eta _{i,2}^h(\theta ):=(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2\frac{\sum _{j=1}^{k_0}\Delta _n^j \partial _{\alpha _h}\mathrm {d}_j(\theta ,X_{i-1})}{2\Delta _nc_{i-1}(\beta )}+\frac{1}{2}\sum _{j=1}^{k_0}\Delta _n^j \partial _{\alpha _h}\mathrm {e}_j(\theta ,X_{i-1}), \end{aligned}$$

and

$$\begin{aligned} \partial _{\beta _k}{} \texttt {l}_{p,i}(\theta )=\xi _{i,1}^k(\theta )+\xi _{i,2}^k(\theta )+\xi _{i,3}^k(\theta ), \quad k=1,2,\ldots ,m_2, \end{aligned}$$
(6.17)

where

$$\begin{aligned}&\xi _{i,1}^k(\theta ):=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2}{2\Delta _nc_{i-1}(\beta )}\left\{ \sum _{j=1}^{k_0}\Delta _n^j \partial _{\beta _k}\mathrm {d}_j(\theta ,X_{i-1})\right\} +\frac{1}{2}\sum _{j=1}^{k_0}\Delta _n^j \partial _{\beta _k}\mathrm {e}_j(\theta ,X_{i-1}),\\&\xi _{i,2}^k(\theta ):=-\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2\partial _{\beta _k} c_{i-1}(\beta )}{2\Delta _nc_{i-1}^2(\beta )}\left\{ 1+\sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})\right\} +\frac{\partial _{\beta _k} c_{i-1}(\beta )}{2c_{i-1}(\beta )},\\&\xi _{i,3}^k(\theta ):=-(\partial _{\beta _k} r_{k_0}(\Delta _n,X_{i-1},\theta ))(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))\frac{\left\{ 1+\sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})\right\} }{\Delta _nc_{i-1}(\beta )}. \end{aligned}$$
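
All of the expressions above arise by direct differentiation. Indeed, the decompositions (6.15)–(6.17) are consistent with taking, up to additive terms not depending on \(\theta ,\)

$$\begin{aligned} \texttt {l}_{p,i}(\theta )=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2}{2\Delta _nc_{i-1}(\beta )}\left\{ 1+ \sum _{j=1}^{k_0}\Delta _n^j \mathrm {d}_j(\theta ,X_{i-1})\right\} +\frac{1}{2}\log c_{i-1}(\beta )+\frac{1}{2}\sum _{j=1}^{k_0}\Delta _n^j \mathrm {e}_j(\theta ,X_{i-1}); \end{aligned}$$

for instance, \(\eta _{i,1}^h(\theta )\) comes from differentiating the factor \((X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2\) with respect to \(\alpha _h,\) while \(\eta _{i,2}^h(\theta )\) collects the derivatives of the remaining \(\theta \)-dependent terms.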

From (6.15) it is possible to derive

$$\begin{aligned} \partial _{\alpha _h\alpha _k}^2\texttt {l}_{p,i}(\theta ):=\delta _{i,1}^{h,k}(\theta )+\delta _{i,2}^{h,k}(\theta )+\delta _{i,3}^{h,k}(\theta )+\delta _{i,4}^{h,k}(\theta ),\quad h,k=1,2,\ldots ,m_1,\end{aligned}$$
(6.18)

where

$$\begin{aligned} \delta _{i,1}^{h,k}(\theta )&:=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^2}{2c_{i-1}(\beta )}\Big \{\Big (\partial _{\alpha _h\alpha _k}^2 \mathrm {d}_1\Big )_{i-1}(\theta )+R(\theta ,\Delta _n, X_{i-1})\Big \},\nonumber \\ \delta _{i,2}^{h,k}(\theta )&:= \frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))}{c_{i-1}(\beta )}\Big \{-\partial _{\alpha _h\alpha _k}^2 b_{i-1}(\alpha )+R(\theta ,\Delta _n, X_{i-1})\Big \},\nonumber \\ \delta _{i,3}^{h,k}(\theta )&:=\frac{1}{2}\Delta _n\partial _{\alpha _h\alpha _k}^2\mathrm {e}_1(\theta ,X_{i-1}),\nonumber \\ \delta _{i,4}^{h,k}(\theta )&:=\Delta _n\left\{ \frac{\partial _{\alpha _h\alpha _k}^2 b_{i-1}(\alpha )(b_{i-1}(\alpha )-b_{i-1}(\alpha _0))+\partial _{\alpha _h} b_{i-1}(\alpha )\partial _{\alpha _k} b_{i-1}(\alpha )}{c_{i-1}(\beta )}\right. \nonumber \\&\left. \quad +R(\theta ,\Delta _n, X_{i-1})\right\} ,\nonumber \\ \partial _{\beta _h\beta _k}^2\texttt {l}_{p,i}(\theta )&:=\nu _{i,1}^{h,k}(\theta )+\nu _{i,2}^{h,k}(\theta )+\nu _{i,3}^{h,k}(\theta ), \quad h,k=1,2,\ldots ,m_2, \end{aligned}$$
(6.19)

where

$$\begin{aligned} \nu _{i,1}^{h,k}(\theta )&:=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^2}{2\Delta _n}\Big \{\Big (\partial _{\beta _h\beta _k}^2 c^{-1}\Big )_{i-1}(\beta )+R(\theta ,\Delta _n, X_{i-1})\Big \},\\ \nu _{i,2}^{h,k}(\theta )&:= \frac{1}{2}(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))R(\theta ,1, X_{i-1}),\\ \nu _{i,3}^{h,k}(\theta )&:=\frac{1}{2}(\partial _{\beta _h\beta _k}^2\log c)_{i-1}(\beta )+R(\theta ,\Delta _n, X_{i-1}), \end{aligned}$$

and

$$\begin{aligned} \partial _{\alpha _h\beta _k}^2\texttt {l}_{p,i}(\theta ):=\mu _{i,1}(\theta )+\mu _{i,2}(\theta ),\quad h=1,2,\ldots ,m_1, k=1,2,\ldots ,m_2, \end{aligned}$$
(6.20)

where

$$\begin{aligned} \mu _{i,1}(\theta )&:=\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^2}{2\Delta _n}R(\theta ,\Delta _n, X_{i-1}),\\ \mu _{i,2}(\theta )&:= \frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))}{\Delta _n}R(\theta ,\Delta _n, X_{i-1})+R(\theta ,\Delta _n, X_{i-1}). \end{aligned}$$

Proof of Theorem 2

We observe that

$$\begin{aligned} {\mathbb {D}}_{p,n}(\theta ,\theta _0)=\frac{1}{n}\sum _{i=1}^n\left\{ \sum _{k=1}^4(\varphi _{i,k}(\theta ,\theta _0))^2+2\sum _{j<k}\varphi _{i,j}(\theta ,\theta _0)\varphi _{i,k}(\theta ,\theta _0)\right\} . \end{aligned}$$

Under \(H_0,\) from Lemmas 2 and 5, we derive

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n(\varphi _{i,1}(\theta ,\theta _0))^2=\frac{1}{n}\sum _{i=1}^n\left[ \frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^4}{4\Delta _n^2}\right. \\&\left. \qquad \qquad \qquad \qquad \qquad \quad \left\{ \frac{1}{c_{i-1}(\beta )}-\frac{1}{c_{i-1}(\beta _0)}+R(\theta ,\Delta _n,X_{i-1})\right\} ^2\right] \\&\qquad \qquad \qquad \qquad \qquad =\frac{1}{n}\sum _{i=1}^n\left[ \frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^4}{4\Delta _n^2}\left\{ \frac{1}{c_{i-1}(\beta )}-\frac{1}{c_{i-1}(\beta _0)}\right\} ^2\right] \\&\qquad \qquad \qquad \qquad \qquad \quad +\mathbf {o}_{P_0}(1)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{3}{4}\int c^2(\beta _0,x)\left\{ \frac{1}{c(\beta ,x)}-\frac{1}{c(\beta _0,x)}\right\} ^2\pi _0(\mathrm {d}x)\\&\frac{1}{n}\sum _{i=1}^n(\varphi _{i,2}(\theta ,\theta _0))^2=\frac{1}{n}\sum _{i=1}^n\left[ \frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^2}{c_{i-1}^2(\beta _0)}[b_{i-1}(\alpha _0)-b_{i-1}(\alpha )]^2\right] +\mathbf {o}_{P_0}(1)\\&\qquad \qquad \qquad \qquad \qquad \quad \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0\\&\frac{1}{n}\sum _{i=1}^n(\varphi _{i,3}(\theta ,\theta _0))^2=\frac{1}{n}\sum _{i=1}^n\left[ \frac{\Delta _n^2[b_{i-1}(\alpha _0)-b_{i-1}(\alpha )]^4}{4c_{i-1}^2(\beta )}\right] +\mathbf {o}_{P_0}(1)\\&\qquad \qquad \qquad \qquad \qquad \quad \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0\\&\frac{1}{n}\sum _{i=1}^n(\varphi _{i,4}(\theta ,\theta _0))^2=\frac{1}{n}\sum _{i=1}^n\frac{1}{4}\left[ \log \left( \frac{c_{i-1}(\beta )}{c_{i-1}(\beta _0)}\right) \right] ^2+\mathbf {o}_{P_0}(1)\\&\qquad \qquad \qquad \qquad \qquad \quad \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{4}\int \left[ \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \right] ^2\pi _0(\mathrm {d}x)\\&\frac{1}{n}\sum _{i=1}^n\varphi _{i,1}(\theta ,\theta _0)\varphi _{i,4}(\theta ,\theta _0)=\frac{1}{n}\sum _{i=1}^n\frac{(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta _0))^2}{4\Delta _n}\left\{ \frac{1}{c_{i-1}(\beta )}-\frac{1}{c_{i-1}(\beta _0)}\right\} \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \times \log \left( \frac{c_{i-1}(\beta )}{c_{i-1}(\beta _0)}\right) +\mathbf {o}_{P_0}(1)\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{4}\int c(\beta _0,x)\left\{ \frac{1}{c(\beta ,x)}-\frac{1}{c(\beta _0,x)}\right\} \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \pi _0(\mathrm {d}x)\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \frac{1}{n}\sum _{i=1}^n\varphi _{i,1}(\theta ,\theta _0)\varphi _{i,j}(\theta ,\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0,\quad j=2,3,\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \frac{1}{n}\sum _{i=1}^n\varphi _{i,2}(\theta ,\theta _0)\varphi _{i,j}(\theta ,\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0,\quad j=3,4,\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \frac{1}{n}\sum _{i=1}^n\varphi _{i,3}(\theta ,\theta _0)\varphi _{i,4}(\theta ,\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0, \end{aligned}$$

uniformly in \(\theta .\) Thus the statement of the theorem immediately follows. \(\square \)
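
For the reader's convenience, collecting the non-vanishing limits above (the cross term \(\varphi _{i,1}\varphi _{i,4}\) enters with the factor 2), the limit reads, uniformly in \(\theta ,\)

$$\begin{aligned} {\mathbb {D}}_{p,n}(\theta ,\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\int \left[ \frac{3}{4}\left( \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right) ^2-\frac{1}{2}\left( \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right) \log \left( \frac{c(\beta _0,x)}{c(\beta ,x)}\right) +\frac{1}{4}\left[ \log \left( \frac{c(\beta _0,x)}{c(\beta ,x)}\right) \right] ^2\right] \pi _0(\mathrm {d}x); \end{aligned}$$

the integrand is a positive definite quadratic form in \(\big (c(\beta _0,x)/c(\beta ,x)-1,\log (c(\beta _0,x)/c(\beta ,x))\big ),\) hence it is non-negative and vanishes if and only if \(c(\beta ,x)=c(\beta _0,x)\) for \(\pi _0\)-almost every \(x.\)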

Let

$$\begin{aligned} C_{p,n}(\theta ,\theta _0):= \left( \begin{array}{cc} \frac{1}{n\Delta _n}[\partial ^2_{\alpha _h\alpha _k} T_{p,n}(\theta ,\theta _0) ]_{\begin{array}{c} h=1,\ldots ,m_1\\ k=1,\ldots ,m_1 \end{array}} &{}\frac{1}{n\sqrt{\Delta _n}}[\partial ^2_{\alpha _h\beta _k} T_{p,n}(\theta ,\theta _0) ]_{\begin{array}{c} h=1,\ldots ,m_1 \\ k=1,\ldots ,m_2 \end{array}} \\ \frac{1}{n\sqrt{\Delta _n}}[\partial ^2_{\alpha _h\beta _k} T_{p,n}(\theta ,\theta _0)]_{\begin{array}{c} h=1,\ldots ,m_1 \\ k=1,\ldots ,m_2 \end{array}} &{} \frac{1}{n} [\partial ^2_{\beta _h\beta _k} T_{p,n}(\theta ,\theta _0) ]_{\begin{array}{c} h=1,\ldots ,m_2 \\ k=1,\ldots ,m_2 \end{array}} \\ \end{array}\right) \end{aligned}$$
(6.21)

where

$$\begin{aligned} \partial ^2_{\alpha _h\alpha _k} T_{p,n}(\theta ,\theta _0)&=2\sum _{i=1}^n\left\{ \partial _{\alpha _h}{} \texttt {l}_{p,i}(\theta )\partial _{\alpha _k}{} \texttt {l}_{p,i}(\theta )+[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\alpha _h\alpha _k}^2\texttt {l}_{p,i}(\theta )\right\} ,\end{aligned}$$
(6.22)
$$\begin{aligned} \partial ^2_{\beta _h\beta _k} T_{p,n}(\theta ,\theta _0)&=2\sum _{i=1}^n\left\{ \partial _{\beta _h}{} \texttt {l}_{p,i}(\theta )\partial _{\beta _k}{} \texttt {l}_{p,i}(\theta )+[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\beta _h\beta _k}^2\texttt {l}_{p,i}(\theta )\right\} ,\end{aligned}$$
(6.23)
$$\begin{aligned} \partial ^2_{\alpha _h\beta _k} T_{p,n}(\theta ,\theta _0)&=2\sum _{i=1}^n\left\{ \partial _{\alpha _h}{} \texttt {l}_{p,i}(\theta )\partial _{\beta _k}{} \texttt {l}_{p,i}(\theta )+[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\alpha _h\beta _k}^2\texttt {l}_{p,i}(\theta )\right\} . \end{aligned}$$
(6.24)
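
Formulas (6.22)–(6.24) are nothing but the second derivatives of the statistic \(T_{p,n}(\theta ,\theta _0)\) with respect to \(\theta \); indeed, writing \(T_{p,n}(\theta ,\theta _0)=\sum _{i=1}^n(\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0))^2\) (which is consistent with (6.22)–(6.24)), one has

$$\begin{aligned} \partial _{\alpha _h}T_{p,n}(\theta ,\theta _0)=2\sum _{i=1}^n[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\alpha _h}{} \texttt {l}_{p,i}(\theta ), \end{aligned}$$

and a further differentiation with respect to \(\alpha _k\) (respectively \(\beta _k\)) yields (6.22) (respectively (6.24)); the same argument gives (6.23). Moreover, provided that \(\varphi (n)\) denotes the rate matrix \(\mathrm {diag}((n\Delta _n)^{-1}\mathrm {I}_{m_1},n^{-1}\mathrm {I}_{m_2}),\) as is customary in this framework, (6.21) can be written compactly as \(C_{p,n}(\theta ,\theta _0)=\varphi (n)^{1/2}\partial ^2_\theta T_{p,n}(\theta ,\theta _0)\varphi (n)^{1/2},\) a form which will be used in the proof of Theorem 3 below.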

The following proposition concerning the asymptotic behavior of \(C_{p,n}(\theta ,\theta _0)\) plays a crucial role in the proof of Theorem 3.

Proposition 1

Under \(H_0,\) assume \(A_1\)–\(A_6\) and that \(\Delta _n\rightarrow 0\), \(n\Delta _n\rightarrow \infty \) as \(n\rightarrow \infty \). Then the following convergences hold:

$$\begin{aligned} C_{p,n}(\theta _0,\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 2I(\theta _0) \end{aligned}$$
(6.25)

and

$$\begin{aligned} \sup _{||\theta ||\le \varepsilon _n}||C_{p,n}(\theta _0+\theta ,\theta _0)-C_{p,n}(\theta _0,\theta _0)||\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0,\quad \varepsilon _n\rightarrow 0. \end{aligned}$$
(6.26)

Proof of Proposition 1

We study the uniform convergence in probability of \(C_{p,n}(\theta ,\theta _0).\) That is, we prove that, uniformly in \(\theta ,\)

$$\begin{aligned} C_{p,n}(\theta ,\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 2K(\theta ,\theta _0):= 2\left( \begin{matrix} K_{1}(\theta ,\theta _0)+K_{2}(\theta ,\theta _0)&{} 0\\ 0&{} K_{3}(\theta ,\theta _0)+K_{4}(\theta ,\theta _0) \end{matrix} \right) \nonumber \\ \end{aligned}$$
(6.27)

where

$$\begin{aligned} K_1(\theta ,\theta _0)&:=\int \frac{\partial _{\alpha _h}b(\alpha ,x)\partial _{\alpha _k} b(\alpha ,x)}{c^2(\beta ,x)}c(\beta _0,x)\pi _0(\mathrm {d}x), \\ K_2(\theta ,\theta _0)&:=\frac{1}{4}\int \partial _{\alpha _h\alpha _k}^2\mathrm {d}_1(x,\theta )\left[ \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right] \left[ 3\frac{c(\beta _0,x)}{c(\beta ,x)}+\log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) -1\right] \pi _0(\mathrm {d}x)\\&\quad \quad + \frac{1}{2}\int \left[ \frac{\partial _{\alpha _h\alpha _k}^2b(\alpha ,x)(b(\alpha ,x)-b(\alpha _0,x))+\partial _{\alpha _h} b(\alpha ,x)\partial _{\alpha _k} b(\alpha ,x)}{c(\beta ,x)}\right] \\&\quad \quad \times \left[ \frac{c(\beta _0,x)}{c(\beta ,x)}-1+\log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \right] \pi _0(\mathrm {d}x)\\&\quad \quad +\int \frac{-\partial _{\alpha _h\alpha _k}^2 b(\alpha ,x)}{c(\beta ,x)}\\&\quad \quad \times \left[ \frac{1}{2}\left( \frac{1}{c(\beta _0,x)}-\frac{1}{c(\beta ,x)}\right) R(\theta ,1,x)+\frac{c(\beta _0,x)}{c^2(\beta ,x)}(b(\alpha ,x)-b(\alpha _0,x))\right] \pi _0(\mathrm {d}x)\\ K_3(\theta ,\theta _0)&:=\frac{1}{2}\int \left\{ \frac{c(\beta _0,x)\partial _{\beta _h} c(\beta ,x)\partial _{\beta _k} c(\beta ,x)}{c^3(\beta ,x)}\left[ \frac{3}{2} \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right] \right. \\&\left. \quad \quad +\frac{1}{2} \frac{\partial _{\beta _h} c(\beta ,x)\partial _{\beta _k} c(\beta ,x)}{c^2(\beta ,x)}\right\} \pi _0(\mathrm {d}x), \\ K_4(\theta ,\theta _0)&:=\frac{1}{4}\int c(\beta _0,x)\partial _{\beta _h\beta _k}^2\log c(\beta ,x)\left[ \frac{1}{c(\beta ,x)}-\frac{1}{c(\beta _0,x)}\right] \pi _0(\mathrm {d}x)\\&\quad \quad +\frac{1}{4} \int \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) {c(\beta _0,x)}\partial _{\beta _h\beta _k}^2 c^{-1}(\beta ,x)\pi _0(\mathrm {d}x)\\&\quad \quad +\frac{1}{4} \int \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \partial _{\beta _h\beta _k}^2 \log c(\beta ,x)\pi _0(\mathrm {d}x). \end{aligned}$$

Let us start with the analysis of the quantity \(\frac{1}{n\Delta _n}\partial _{\alpha _h\alpha _k}^2 T_{p,n}(\theta ,\theta _0)\) given by (6.22), which can be split into two terms. From (6.16) it follows that

$$\begin{aligned} \frac{1}{n\Delta _n}\sum _{i=1}^n\partial _{\alpha _h}{} \texttt {l}_{p,i}(\theta )\partial _{\alpha _k}{} \texttt {l}_{p,i}(\theta )=\frac{1}{n\Delta _n}\sum _{i=1}^n(\eta _{i,1}^h(\theta )+\eta _{i,2}^h(\theta ))(\eta _{i,1}^k(\theta )+\eta _{i,2}^k(\theta )) \end{aligned}$$

for each \(\theta \in \Theta .\) Since \(\partial _{\alpha _h} r_{k_0}(\Delta _n,X_{i-1},\theta )=\Delta _n\partial _{\alpha _h} b_{i-1}(\alpha )+R(\theta ,\Delta _n^2, X_{i-1}),\) by taking into account Lemma 5, we get

$$\begin{aligned} \frac{1}{n\Delta _n}\sum _{i=1}^n\partial _{\alpha _h}{} \texttt {l}_{p,i}(\theta )\partial _{\alpha _k}{} \texttt {l}_{p,i}(\theta )&=\frac{1}{n\Delta _n}\sum _{i=1}^n\eta _{i,1}^h(\theta )\eta _{i,1}^k(\theta )+\mathbf {o}_{P_0}(1)\nonumber \\&=\frac{1}{n\Delta _n}\sum _{i=1}^n\frac{\partial _{\alpha _h} b_{i-1}(\alpha )\partial _{\alpha _k} b_{i-1}(\alpha )}{c_{i-1}^2(\beta )}\nonumber \\&\quad (X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2+\mathbf {o}_{P_0}(1)\nonumber \\&\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}K_1(\theta ,\theta _0) \end{aligned}$$
(6.28)

uniformly in \(\theta .\) Now, by resorting to (6.15) and (6.18), we rewrite the second term appearing in (6.22) as follows

$$\begin{aligned} \frac{1}{n\Delta _n}\sum _{i=1}^n[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\alpha _h\alpha _k}^2\texttt {l}_{p,i}(\theta )=\frac{1}{n\Delta _n}\sum _{i=1}^n\left[ \sum _{l=1}^{4}\sum _{j=1}^{4} \varphi _{i,l}(\theta ,\theta _0)\delta _{i,j}^{h,k}(\theta )\right] . \end{aligned}$$

By applying Lemmas 1 and 5, we obtain the following convergence results

$$\begin{aligned}&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,1}(\theta ,\theta _0)\delta _{i,1}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} \frac{3}{4}\int \partial _{\alpha _h\alpha _k}^2\mathrm {d}_1(\theta ,x)\frac{c^2(\beta _0,x)}{c(\beta ,x)}\\&\quad \left[ \frac{1}{c(\beta ,x)}-\frac{1}{c(\beta _0,x)}\right] \pi _0(\mathrm {d}x), \\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,1}(\theta ,\theta _0)\delta _{i,2}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} \frac{1}{2}\int \frac{-\partial _{\alpha _h\alpha _k}^2b(\alpha ,x)}{c(\beta ,x)}\\&\quad \left[ \frac{1}{c(\beta ,x)}-\frac{1}{c(\beta _0,x)}\right] R(\theta ,1,x)\pi _0(\mathrm {d}x), \\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,1}(\theta ,\theta _0)\delta _{i,3}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} \frac{1}{4}\int \partial _{\alpha _h\alpha _k}^2\mathrm {e}_1(\theta ,x)\left[ \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right] \pi _0(\mathrm {d}x), \\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,1}(\theta ,\theta _0)\delta _{i,4}^{h,k}(\theta )\\&\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} \frac{1}{2}\int \left[ \frac{c(\beta _0,x)}{c(\beta ,x)}-1\right] \\&\quad \left[ \frac{\partial _{\alpha _h\alpha _k}^2b(\alpha ,x)(b(\alpha ,x)-b(\alpha _0,x))+\partial _{\alpha _h} b(\alpha ,x)\partial _{\alpha _k} b(\alpha ,x)}{c(\beta ,x)}\right] \pi _0(\mathrm {d}x), \\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,2}(\theta ,\theta _0)\delta _{i,2}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} \int \frac{c(\beta _0,x)}{c^2(\beta ,x)}(-\partial _{\alpha _h\alpha _k}^2 b(\alpha ,x))(b(\alpha ,x)-b(\alpha _0,x))\pi _0(\mathrm {d}x), \\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,4}(\theta ,\theta _0)\delta _{i,1}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{4} \int \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \frac{c(\beta _0,x)}{c(\beta ,x)}\partial _{\alpha _h\alpha _k}^2 \mathrm {d}_1(\theta ,x)\pi _0(\mathrm {d}x),\\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,4}(\theta ,\theta _0)\delta _{i,3}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{4}\int \partial _{\alpha _h\alpha _k}^2\mathrm {e}_1(\theta ,x)\log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \pi _0(\mathrm {d}x), \\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,4}(\theta ,\theta _0)\delta _{i,4}^{h,k}(\theta )\\ {}&\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{2}\int \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \left\{ \frac{\partial _{\alpha _h\alpha _k}^2 b(\alpha ,x)(b(\alpha ,x)-b(\alpha _0,x))+\partial _{\alpha _k} b(\alpha ,x)\partial _{\alpha _h} b(\alpha ,x)}{c(\beta ,x)}\right\} \pi _0(\mathrm {d}x),\\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,2}(\theta ,\theta _0)\delta _{i,j}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0,\quad j=1,3,4, \\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,3}(\theta ,\theta _0)\delta _{i,j}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0,\quad j=1,2,3,4,\\&\frac{1}{n\Delta _n}\sum _{i=1}^n\varphi _{i,4}(\theta ,\theta _0)\delta _{i,2}^{h,k}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0, \end{aligned}$$

uniformly in \(\theta .\) Finally, since \(\mathrm {d}_1(\theta ,x)=-\mathrm {e}_1(\theta ,x),\) we get

$$\begin{aligned} \frac{1}{n\Delta _n}\sum _{i=1}^n[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\alpha _h\alpha _k}^2\texttt {l}_{p,i}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} K_2(\theta ,\theta _0) \end{aligned}$$
(6.29)

uniformly in \(\theta .\) Hence, by (6.28) and (6.29), we immediately derive

$$\begin{aligned} \frac{1}{n\Delta _n}\partial _{\alpha _h\alpha _k}^2 T_{p,n}(\theta ,\theta _0) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 2(K_1(\theta ,\theta _0)+K_2(\theta ,\theta _0)) \end{aligned}$$
(6.30)

uniformly in \(\theta .\)

Now, we consider the elements of the matrix \(C_{p,n}(\theta ,\theta _0)\) given by (6.23). First, we study the convergence in probability of

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n \partial _{\beta _h} \texttt {l}_{p,i}(\theta ) \partial _{\beta _k} \texttt {l}_{p,i}(\theta )= \frac{1}{n}\sum _{i=1}^n (\xi _{i,1}^h(\theta )+\xi _{i,2}^h(\theta )+\xi _{i,3}^h(\theta ))(\xi _{i,1}^k(\theta )+\xi _{i,2}^k(\theta )+\xi _{i,3}^k(\theta )). \end{aligned}$$

Since \(\partial _{\beta _h} r_{k_0}(\Delta _n,X_{i-1},\theta )=R(\theta ,\Delta _n^2, X_{i-1}),\) from Lemmas 5 and 1 we derive

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n \partial _{\beta _h} \texttt {l}_{p,i}(\theta ) \partial _{\beta _k} \texttt {l}_{p,i}(\theta )&=\frac{1}{n}\sum _{i=1}^n \xi _{i,2}^h(\theta )\xi _{i,2}^k(\theta )+\mathbf {o}_{P_0}(1) \nonumber \\&=\frac{1}{n}\sum _{i=1}^n\frac{\partial _{\beta _h} c_{i-1}(\beta )\partial _{\beta _k} c_{i-1}(\beta )}{4\Delta _n^2c_{i-1}^4(\beta )}(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^4\nonumber \\&\quad -\frac{1}{n}\sum _{i=1}^n\frac{\partial _{\beta _h} c_{i-1}(\beta )\partial _{\beta _k} c_{i-1}(\beta )}{2\Delta _nc_{i-1}^3(\beta )}(X_i-r_{k_0}(\Delta _n,X_{i-1},\theta ))^2\nonumber \\&\quad +\frac{1}{n}\sum _{i=1}^n\frac{\partial _{\beta _h} c_{i-1}(\beta )\partial _{\beta _k} c_{i-1}(\beta )}{4c_{i-1}^2(\beta )}+\mathbf {o}_{P_0}(1)\nonumber \\&\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}K_3(\theta ,\theta _0) \end{aligned}$$
(6.31)

uniformly in \(\theta .\) Now, by resorting to (6.15) and (6.19), we rewrite the second term appearing in (6.23) as follows

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\beta _h\beta _k}^2\texttt {l}_{p,i}(\theta )=\frac{1}{n}\sum _{i=1}^n\left[ \sum _{l=1}^{4}\sum _{j=1}^{3} \varphi _{i,l}(\theta ,\theta _0)\nu _{i,j}^{h,k}(\theta )\right] . \end{aligned}$$

By taking into account Lemmas 1 and 5 again, we obtain the following results

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n \varphi _{i,1}(\theta ,\theta _0)\nu _{i,3}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} \frac{1}{4}\int c(\beta _0,x)\partial _{\beta _h\beta _k}^2\log c(\beta ,x)\left[ \frac{1}{c(\beta ,x)}-\frac{1}{c(\beta _0,x)}\right] \pi _0(\mathrm {d}x) \\&\frac{1}{n}\sum _{i=1}^n \varphi _{i,4}(\theta ,\theta _0)\nu _{i,1}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{4} \int \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) {c(\beta _0,x)}\partial _{\beta _h\beta _k}^2 c^{-1}(\beta ,x)\pi _0(\mathrm {d}x) \\&\frac{1}{n}\sum _{i=1}^n \varphi _{i,4}(\theta ,\theta _0)\nu _{i,3}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{4} \int \log \left( \frac{c(\beta ,x)}{c(\beta _0,x)}\right) \partial _{\beta _h\beta _k}^2 \log c(\beta ,x)\pi _0(\mathrm {d}x)\\&\frac{1}{n}\sum _{i=1}^n \varphi _{i,1}(\theta ,\theta _0)\nu _{i,j}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0,\quad j=1,2,\\&\frac{1}{n}\sum _{i=1}^n \varphi _{i,k}(\theta ,\theta _0)\nu _{i,j}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0,\quad k=2,3,j=1,2,3,\\&\frac{1}{n}\sum _{i=1}^n \varphi _{i,4}(\theta ,\theta _0)\nu _{i,2}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}0, \end{aligned}$$

uniformly in \(\theta .\) Finally

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n[\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\beta _h\beta _k}^2\texttt {l}_{p,i}(\theta )\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}K_4(\theta ,\theta _0) \end{aligned}$$
(6.32)

uniformly in \(\theta .\) Therefore, by (6.31) and (6.32), we get

$$\begin{aligned} \frac{1}{n}\partial _{\beta _h\beta _k}^2 T_{p,n}(\theta ,\theta _0) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 2(K_3(\theta ,\theta _0)+K_4(\theta ,\theta _0)) \end{aligned}$$
(6.33)

uniformly in \(\theta .\)

Recalling the expressions (6.15)–(6.17) and (6.20), by means of arguments similar to those adopted above, it is not hard to prove that

$$\begin{aligned} \frac{1}{n\sqrt{\Delta _n}}\sum _{i=1}^n \partial _{\alpha _h} \texttt {l}_{p,i}(\theta )\partial _{\beta _k} \texttt {l}_{p,i}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0 \end{aligned}$$

and

$$\begin{aligned} \frac{1}{n\sqrt{\Delta _n}}\sum _{i=1}^n [\texttt {l}_{p,i}(\theta )-\texttt {l}_{p,i}(\theta _0)]\partial _{\alpha _h\beta _k}^2 \texttt {l}_{p,i}(\theta ) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0 \end{aligned}$$

uniformly in \(\theta .\) This implies that

$$\begin{aligned} \frac{1}{n\sqrt{\Delta _n}}\partial _{\alpha _h\beta _k}^2 T_{p,n}(\theta ,\theta _0) \overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0 \end{aligned}$$
(6.34)

uniformly in \(\theta .\)

In conclusion, the results (6.30), (6.33) and (6.34) lead to the convergence (6.27). Moreover, (6.27) implies (6.25), since \(K(\theta _0,\theta _0)=I(\theta _0)\). From the inequality

$$\begin{aligned}&\sup _{||\theta ||\le \varepsilon _n}||C_{p,n}(\theta _0+\theta ,\theta _0)-C_{p,n}(\theta _0,\theta _0)||\\&\le \sup _{||\theta ||\le \varepsilon _n}||C_{p,n}(\theta _0+\theta ,\theta _0)-2K(\theta _0+\theta ,\theta _0)||+\sup _{||\theta ||\le \varepsilon _n}||2K(\theta _0+\theta ,\theta _0)-2I(\theta _0)||\\&\quad +||2I(\theta _0)-C_{p,n}(\theta _0,\theta _0)|| \end{aligned}$$

the convergence (6.26) follows. Indeed, (6.25) yields \(||2I(\theta _0)-C_{p,n}(\theta _0,\theta _0)||\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0,\) while \(\sup _{||\theta ||\le \varepsilon _n}||C_{p,n}(\theta _0+\theta ,\theta _0)-2K(\theta _0+\theta ,\theta _0)||\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }} 0,\) as \(\varepsilon _n\rightarrow 0,\) by the uniformity of the convergence (i.e. by the result (6.27)). Furthermore, \(\sup _{||\theta ||\le \varepsilon _n}||K(\theta _0+\theta ,\theta _0)-I(\theta _0)||{\underset{n\rightarrow \infty }{\longrightarrow }} 0\) (recall that \(\varepsilon _n\rightarrow 0\)), because assumptions \(A_3\) and \(A_5\) imply that \(K(\theta ,\theta _0)\) is a continuous function with respect to \(\theta .\)\(\square \)

Now, we are able to prove Theorem 3.

Proof of Theorem 3

We adopt classical arguments. By Taylor’s formula, we have that

$$\begin{aligned} T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)&=T_{p,n}(\theta _0,\theta _0)+\partial _\theta T_{p,n}(\theta _0,\theta _0)({{\hat{\theta }}}_{p,n}-\theta _0) \nonumber \\&\quad +\frac{1}{2}(\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0))'\Lambda _{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0)\nonumber \\&=\frac{1}{2}(\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0))'\Lambda _{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0) \end{aligned}$$
(6.35)

where the last equality follows from \(T_{p,n}(\theta _0,\theta _0)=0\) and \(\partial _\theta T_{p,n}(\theta _0,\theta _0)=0,\) and where we set

$$\begin{aligned} \Lambda _{p,n}({{\hat{\theta }}}_{p,n},\theta _0)&:=2\varphi (n)^{1/2}\int _0^1(1-u)\partial _\theta ^2 T_{p,n}(\theta _0+u({{\hat{\theta }}}_{p,n}-\theta _0),\theta _0)\mathrm {d}u\,\varphi (n)^{1/2}\\&=2\int _0^1(1-u)[C_{p,n}(\theta _0+u({{\hat{\theta }}}_{p,n}-\theta _0),\theta _0)-C_{p,n}(\theta _0,\theta _0)]\mathrm {d}u+C_{p,n}(\theta _0,\theta _0). \end{aligned}$$

Proposition 1, together with the consistency of \({{\hat{\theta }}}_{p,n}\) and the identity \(2\int _0^1(1-u)\mathrm {d}u=1,\) implies

$$\begin{aligned} \Lambda _{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\overset{P_0}{\underset{n\rightarrow \infty }{\longrightarrow }}2I(\theta _0). \end{aligned}$$
(6.36)

By taking into account (6.35), (3.16) and (6.36), Slutsky’s theorem allows us to conclude the proof; the last step is made explicit in the display below. \(\square \)
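
More precisely, set \(Z_n:=\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0)\) and assume that (3.16) provides the asymptotic normality \(Z_n\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} N(0,I(\theta _0)^{-1})\) under \(H_0\) (the standard rate result for \({{\hat{\theta }}}_{p,n}\)). Then (6.35), (6.36) and Slutsky’s theorem give

$$\begin{aligned} T_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)=\frac{1}{2}Z_n'\Lambda _{p,n}({{\hat{\theta }}}_{p,n},\theta _0)Z_n\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }}\frac{1}{2}Z'\,2I(\theta _0)\,Z=Z'I(\theta _0)Z,\quad Z\sim N(0,I(\theta _0)^{-1}), \end{aligned}$$

and the limiting random variable is chi-square distributed with \(m_1+m_2\) degrees of freedom.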

Proof of Theorem 4

Under \(H_{1,n}\) we have that [see Lemma 2 in Kitagawa and Uchida (2014)]

$$\begin{aligned} \varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-(\theta _0+\varphi (n)^{1/2}h))\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} N(0,I(\theta _0)^{-1}). \end{aligned}$$

Therefore, under the hypothesis \(H_{1,n}\)

$$\begin{aligned} \varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta _0)=\varphi (n)^{-1/2}({{\hat{\theta }}}_{p,n}-\theta )+h\overset{d}{\underset{n\rightarrow \infty }{\longrightarrow }} N(h,I(\theta _0)^{-1}) \end{aligned}$$

and

$$\begin{aligned} C_{p,n}({{\hat{\theta }}}_{p,n},\theta _0)\overset{P_\theta }{\underset{n\rightarrow \infty }{\longrightarrow }}2I(\theta _0) \quad (\text {under}\, H_{1,n}). \end{aligned}$$

Hence, from (6.35) we obtain the result (4.7). \(\square \)