Abstract
We consider the nonparametric robust estimation problem for regression models in continuous time with semi-Markov noises. An adaptive model selection procedure is proposed. Under general moment conditions on the noise distribution a sharp non-asymptotic oracle inequality for the robust risks is obtained and the robust efficiency is shown. It turns out that for semi-Markov models the robust minimax convergence rate may be faster or slower than the classical one.
1 Introduction
Let us consider a regression model in continuous time
where \(S(\cdot )\) is an unknown 1-periodic function from \(\mathbf{L}_{2}[0,1]\) defined on \({{\mathbb {R}}}\) with values in \({{\mathbb {R}}}\), the noise process \((\xi _{t})_{t\ge \, 0}\) is defined as
where \(\varrho _{1}\), \(\varrho _{2}\) and \(\varrho _{3}\) are unknown coefficients, \((w_{t})_{t\ge \,0}\) is a standard Brownian motion, \((L_{t})_{t\ge \,0}\) is a jump Lévy process (with \(\mathbf{E}L^2_{t}=t\), cf. Eq. (2.3)) and the pure jump process \((z_{t})_{t\ge \,0},\) defined in (2.5), is assumed to be a semi-Markov process (see, for example, Barbu and Limnios 2008).
The problem is to estimate the unknown function S in the model (1.1) on the basis of observations \((y_{t})_{0\le t\le n}\). This problem was first considered in the framework of “signal+white noise” models (see, for example, Ibragimov and Khasminskii 1981 or Pinsker 1981). Later, in order to study dependent observations in continuous time, “signal+color noise” regressions based on Ornstein-Uhlenbeck processes were introduced (cf. Höpfner and Kutoyants 2009, 2010; Konev and Pergamenshchikov 2003, 2010).
Moreover, to include jumps in such models, the papers Konev and Pergamenshchikov (2012) and Konev and Pergamenshchikov (2015) used non-Gaussian Ornstein-Uhlenbeck processes, introduced in Barndorff-Nielsen and Shephard (2001) for modeling risky assets in stochastic volatility financial markets. Unfortunately, the dependence of the stable Ornstein-Uhlenbeck type decreases at a geometric rate, so that, as the duration of observations goes to infinity, we very quickly recover the same “signal+white noise” model.
The main goal of this paper is to consider continuous time regression models with dependent observations for which the dependence does not disappear for sufficiently large durations of observation. To this end we define the noise in the model (1.1) through a semi-Markov process which keeps the dependence for any duration n. This type of model allows one, for example, to estimate signals observed under long impulse noise with a memory or in the presence of “against signals”.
In this paper we use the robust estimation approach introduced in Konev and Pergamenshchikov (2012) for such problems. To this end, we denote by Q the distribution of \((\xi _{t})_{0\le t\le n}\) in the Skorokhod space \({{\mathcal {D}}}[0,n]\). We assume that Q is unknown and belongs to some distribution family \({{\mathcal {Q}}}_{n}\) specified in Sect. 4. In this paper we use the quadratic risk
where \(\Vert f\Vert ^{2}=\int ^{1}_{0}\,f^{2}(s)\mathrm {d}s\) and \(\mathbf{E}_{Q,S}\) is the expectation with respect to the distribution \(\mathbf{P}_{Q,S}\) of the process (1.1) corresponding to the noise distribution Q. Since the noise distribution Q is unknown, it seems reasonable to introduce the robust risk of the form
which enables us to take into account the information that \(Q\in {{\mathcal {Q}}}_{n}\) and ensures the quality of an estimate \(\widetilde{S}_{n}\) for all distributions in the family \({{\mathcal {Q}}}_{n}\).
To summarize, the goal of this paper is to develop robust efficient model selection methods for the model (1.1) with semi-Markov noise of unknown distribution, based on the approach proposed by Konev and Pergamenshchikov (2012, 2015) for continuous time regression models with semimartingale noises. Unfortunately, we cannot use this method directly for semi-Markov regression models, since their tools rely essentially on the fact that the Ornstein-Uhlenbeck dependence decreases at a geometric rate, so that the “white noise” case is reached sufficiently quickly.
Thus in the present paper we propose new analytical tools based on renewal methods to obtain the sharp non-asymptotic oracle inequalities. As a consequence, we obtain the robust efficiency for the proposed model selection procedures in the adaptive setting.
The rest of the paper is organized as follows. We start by introducing the main conditions in the next section. Then, in Sect. 3 we construct the model selection procedure on the basis of the weighted least squares estimates. The main results are stated in Sect. 4; here we also specify the set of admissible weight sequences in the model selection procedure. In Sect. 5 we derive some renewal results used for obtaining the other results of the paper. In Sect. 6 we develop stochastic calculus for semi-Markov processes. In Sect. 7 we study some properties of the model (1.1). A numerical example is presented in Sect. 8. Most of the results of the paper are proved in Sect. 9. In the Appendix some auxiliary propositions are given.
2 Main conditions
In the model (1.2) we assume that the jump Lévy process \(L_{t}\) is defined as
where \(\mu (\mathrm {d}s,\mathrm {d}x)\) is the jump measure with the deterministic compensator \(\widetilde{\mu }(\mathrm {d}s\,\mathrm {d}x)=\mathrm {d}s\Pi (\mathrm {d}x)\), where \(\Pi (\cdot )\) is the Lévy measure on \({{\mathbb {R}}}_{*}={{\mathbb {R}}}\setminus \{0\}\) (see, for example, Jacod and Shiryaev 2002; Cont and Tankov 2004 for details), for which we assume that
where we use the usual notation \(\Pi (\vert x\vert ^{m})=\int _{{{\mathbb {R}}}_{*}}\,\vert z \vert ^{m}\,\Pi (\mathrm {d}z)\) for any \(m>0\). Note that, using the Ito formula for the martingales (see, for example, Liptser and Shiryaev 1986, p.185) we can obtain directly that
where \(\Delta L_{s}=L_{s}- L_{s-}\) and \(L_{s-}\) is the left limit of \(L\) at s. Moreover, the last condition in (2.2) and the inequality (A.1) imply that for some positive constant \(C^{*}\) the expectation
Note that \(\Pi ({{\mathbb {R}}}_{*})\) may be equal to \(+\infty \). Moreover, we assume that the pure jump process \((z_{t})_{t\ge \, 0}\) in (1.2) is a semi-Markov process with the following form
where \((Y_{i})_{i\ge \, 1}\) is an i.i.d. sequence of random variables with
Here \(N_{t}\) is a general counting process (see, for example, Mikosch 2004) defined as
where \((\tau _{l})_{l\ge \,1}\) is an i.i.d. sequence of positive integrable random variables with distribution \(\eta \) and mean \({\check{\tau }}=\mathbf{E}_{Q}\tau _{1}>0\). We assume that the processes \((N_{t})_{t\ge 0}\) and \((Y_{i})_{i\ge \, 1}\) are mutually independent and independent of \((L_{t})_{t\ge 0}\).
Note that the process \((z_{t})_{t\ge \, 0}\) is a special case of a semi-Markov process (see, e.g., Barbu and Limnios 2008; Limnios and Oprisan 2001).
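Since the definitions (2.5)–(2.6) are fully constructive, the process \((z_{t})_{t\ge 0}\) is straightforward to simulate. The following minimal Python sketch (function names and samplers are ours, for illustration only; it uses the \(\chi ^2_{3}\) waiting times and Gaussian marks of the numerical experiment in Sect. 8) generates one trajectory of the pure jump semi-Markov process.

```python
import numpy as np

def simulate_semi_markov(T, tau_sampler, y_sampler, rng):
    """Simulate z_t = sum_{i <= N_t} Y_i on [0, T], where N_t is the
    renewal counting process built from i.i.d. inter-jump times tau_l."""
    jump_times, t = [], 0.0
    while True:
        t += tau_sampler(rng)           # renewal epoch T_l = tau_1 + ... + tau_l
        if t > T:
            break
        jump_times.append(t)
    marks = y_sampler(rng, len(jump_times))   # i.i.d. jump sizes (Y_i)
    return np.array(jump_times), np.cumsum(marks)

rng = np.random.default_rng(0)
times, z_values = simulate_semi_markov(
    T=100.0,
    tau_sampler=lambda r: r.chisquare(3),          # waiting times tau_l ~ chi^2_3
    y_sampler=lambda r, k: r.standard_normal(k),   # marks Y_i ~ N(0, 1)
    rng=rng,
)
# z_t equals z_values[i] for times[i] <= t < times[i + 1].
```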
Remark 2.1
It should be noted that if the \(\tau _{j}\) are exponential random variables, then \((N_{t})_{t\ge 0}\) is a Poisson process and, in this case, \((\xi _{t})_{t\ge 0}\) is a Lévy process; this model has been studied in Konev and Pergamenshchikov (2009a, b) and Konev and Pergamenshchikov (2012). But, in the general case, when the process (2.5) is not a Lévy process, it has a memory and cannot be treated in the framework of semimartingales with independent increments. In this case, we need to develop new tools based on renewal theory arguments, which we do in Sect. 5. These tools will be used intensively in the proofs of the main results of this paper.
Note that for any function f from \(\mathbf{L}_{2}[0,n],\)\(f: [0,n] \rightarrow {{\mathbb {R}}},\) for the noise process \((\xi _{t})_{t\ge \, 0}\) defined in (1.2), with \((z_{t})_{t\ge \, 0}\) given in (2.5), the integral
is well defined with \(\mathbf{E}_{Q}\,I_n(f)=0\). Moreover, as is shown in Corollary 6.2,
where \(\Vert f\Vert ^{2}_{t}= \int _{0}^{t} f^2(s) \mathrm {d}\,s\), \({\bar{\varrho }}=\varrho _{1}^{2}+\varrho _{2}^{2}\) and \(\vert \rho \vert _{*}=\sup _{t\ge 0}\vert \rho (t)\vert <\infty \). Here \(\rho \) is the density of the renewal measure \({\check{\eta }}\) defined as
where \(\eta ^{(l)}\) is the lth convolution power for \(\eta \).
Remark 2.2
In Proposition 5.2 we will prove that, under Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\), the renewal measure \({\check{\eta }}\) has a density \(\rho \).
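Proposition 5.2 can also be illustrated numerically: a histogram of the renewal epochs \(T_{l}=\tau _{1}+\dots +\tau _{l}\) collected over many simulated paths estimates the renewal density \(\rho \), which should stabilize near \(1/{\check{\tau }}\) away from the origin. A minimal sketch of such a check (ours, with \(\chi ^2_{3}\) waiting times, so that \({\check{\tau }}=3\)):

```python
import numpy as np

def renewal_epochs(T, rng):
    """All renewal epochs T_l = tau_1 + ... + tau_l falling in (0, T]."""
    epochs, t = [], 0.0
    while True:
        t += rng.chisquare(3)       # tau_l ~ chi^2_3, E tau = 3
        if t > T:
            return np.array(epochs)
        epochs.append(t)

rng = np.random.default_rng(1)
T, n_paths = 50.0, 20000
all_epochs = np.concatenate([renewal_epochs(T, rng) for _ in range(n_paths)])

# Histogram estimate of the renewal density rho on (0, T]:
hist, edges = np.histogram(all_epochs, bins=200, range=(0.0, T))
rho_hat = hist / (n_paths * (edges[1] - edges[0]))
print(rho_hat[-20:].mean(), 1.0 / 3.0)   # tail of rho_hat is close to 1 / E tau
```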
To study the series (2.9) we assume that the measure \(\eta \) has a density g which satisfies the following conditions.
\((\mathbf{H}_{1})\) Assume that, for any \(x\in {{\mathbb {R}}},\) there exist the finite limits
and, for any \(K>0,\) there exists \(\delta =\delta (K)>0\) for which
\((\mathbf{H}_{2})\) For any \(\gamma >0,\)
\((\mathbf{H}_{3})\) There exists \(\beta >0\) such that \(\int _{{{\mathbb {R}}}}\,e^{\beta x}\,g(x)\,\mathrm {d}x<\infty .\)
Remark 2.3
It should be noted that Condition \((\mathbf{H}_{3})\) means that the random variables \((\tau _{j})_{j\ge 1}\) possess an exponential moment, i.e. they are not too large. This is a natural constraint, since these random variables define the intervals between jumps, i.e., the frequency of the jumps. So, to study the influence of the jumps in the model (1.1) one needs to consider the noise process (1.2) with “small” intervals between jumps, i.e. a large jump frequency.
For the next condition we need to introduce the Fourier transform of any function f from \(\mathbf{L}_{1}({{\mathbb {R}}}),\)\(f : {{\mathbb {R}}}\rightarrow {{\mathbb {R}}},\) defined as
\((\mathbf{H}_{4})\) There exists \(t^{*}>0\) such that the function \(\widehat{g}(\theta -it)\) belongs to \(\mathbf{L}_{1}({{\mathbb {R}}})\) for any \(0\le t\le t^{*}\).
It is clear that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true for any continuously differentiable function g, for example for the exponential density.
Now we define the family of noise distributions for the model (1.1) which is used in the robust risk (1.4). In our case the distribution family \({{\mathcal {Q}}}_{n}\) consists of all distributions on the Skorokhod space \({{\mathcal {D}}}[0,n]\) of the process (1.2) with the parameters satisfying the conditions (2.11) and (2.12). Note that any distribution Q from \({{\mathcal {Q}}}_{n}\) is defined by the unknown parameters in (1.2) and (2.1). We assume that
where \(\sigma _{Q}=\varrho _{1}^{2}+\varrho _{2}^{2}+ \varrho _{3}^{2}/{\check{\tau }}\), the unknown bounds \(0<\varsigma _{*}\le \varsigma ^{*}\) are functions of n, i.e. \(\varsigma _{*}=\varsigma _{*}(n)\) and \(\varsigma ^{*}=\varsigma ^{*}(n)\), such that for any \({\check{\epsilon }}>0,\)
Remark 2.4
As we will see later, the parameter \(\sigma _{Q}\) is the limit of the Fourier transform of the noise process (1.2). Such a limit is called the variance proxy (see Konev and Pergamenshchikov 2012).
Remark 2.5
Note that, in general (though it is not necessary), the parameters \(\varrho _{1}\), \(\varrho _{2}\) and \(\varrho _{3}\) may depend on n. Condition (2.12) means that we consider all possible cases, i.e. these parameters may tend to infinity, remain constant, or tend to zero. See, for example, the conditions (3.32) in Konev and Pergamenshchikov (2015).
3 Model selection
Let \((\phi _{j})_{j\ge \, 1}\) be an orthonormal uniformly bounded basis in \(\mathbf{L}_{2}[0,1]\), i.e., for some constant \(\phi _{*}\ge 1\), which may depend on n,
We extend the functions \(\phi _{j}(t)\) by periodicity, i.e., we set \(\phi _{j}(t):=\phi _{j}(\{t\})\), where \(\{t\}\) is the fractional part of \(t\ge 0\). For example, we can take the trigonometric basis defined as \(\text{ Tr }_{1}\equiv 1\) and, for \(j\ge 2,\)
where [x] denotes the integer part of x.
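For reference, a short implementation of this basis (assuming the usual convention for (3.2): even indices carry cosines and odd indices sines, both with frequency [j/2]):

```python
import numpy as np

def trig_basis(j, t):
    """Trigonometric basis Tr_j on [0, 1], extended 1-periodically;
    the convention for even/odd indices is our reading of Eq. (3.2)."""
    t = np.asarray(t) % 1.0                  # 1-periodic extension via {t}
    if j == 1:
        return np.ones_like(t)
    k = j // 2                               # integer part [j / 2]
    f = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2.0) * f(2.0 * np.pi * k * t)
```

One can check numerically that \(\int ^{1}_{0}\text{ Tr }_{i}(t)\text{ Tr }_{j}(t)\mathrm {d}t=\mathbf{1}_{\{i=j\}}\), so this family is indeed orthonormal and uniformly bounded with \(\phi _{*}=\sqrt{2}\).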
To estimate the function S we use the model selection procedure for continuous time regression models from Konev and Pergamenshchikov (2012), based on the Fourier expansion. We recall that for any function S from \(\mathbf{L}_{2}[0,1]\) we can write
So, to estimate the function S it suffices to estimate the coefficients \(\theta _{j}\) and to replace them in this representation by their estimators. Using the fact that the functions S and \(\phi _{j}\) are 1-periodic, we can write that
If we replace here the differential \(S(t)\mathrm {d}t\) by the stochastic observed differential \(\mathrm {d}y_{t}\) we obtain the natural estimate for \(\theta _{j}\) on the time interval [0, n]
which can be represented, in view of the model (1.1), as
Now (see, for example, Ibragimov and Khasminskii 1981) we can estimate the function S by the projection estimators, i.e.
for some number \(m\rightarrow \infty \) as \(n\rightarrow \infty \). It should be noted that Pinsker (1981) showed that projection estimators of the form (3.6) are not efficient. To obtain efficient estimation one needs to use weighted least squares estimators defined as
where the coefficients \(\lambda =(\lambda (j))_{1\le j\le n}\) belong to some finite set \(\Lambda \) from \([0,1]^n\). As is shown in Pinsker (1981), in order to obtain efficient estimators, the coefficients \(\lambda (j)\) in (3.7) need to be chosen depending on the regularity of the unknown function S. In this paper we consider the adaptive case, i.e. we assume that the regularity of the function S is unknown. In this case we choose the weight coefficients on the basis of the model selection procedure proposed in Konev and Pergamenshchikov (2012) for the general semimartingale regression model in continuous time. These coefficients will be specified later in (3.19). To this end, first we set
where \(\#(\Lambda )\) is the cardinality of \(\Lambda \) and \(\check{L}(\lambda )=\sum ^{n}_{j=1}\lambda (j)\). Now, to choose a weight sequence \(\lambda \) in the set \(\Lambda \), we use the empirical quadratic risk, defined as
which in our case is equal to
Since the Fourier coefficients \((\theta _{j})_{j\ge \,1}\) are unknown, we replace the terms \(\widehat{\theta }_{j,n}\theta _{j}\) by
where \(\widehat{\sigma }_{n}\) is an estimate for the variance proxy \(\sigma _{Q}\) defined in (2.11). If it is known, we take \(\widehat{\sigma }_{n}=\sigma _{Q}\); otherwise, we can choose it, for example, as in Konev and Pergamenshchikov (2012), i.e.
where \(\widehat{t}_{j,n}\) are the estimators for the Fourier coefficients with respect to the trigonometric basis (3.2), i.e.
Finally, in order to choose the weights, we will minimize the following cost function
where \(\delta >0\) is some threshold which will be specified later and the penalty term is
We define the model selection procedure as
We recall that the set \(\Lambda \) is finite so \({\hat{\lambda }}\) exists. In the case when \({\hat{\lambda }}\) is not unique, we take one of them.
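The whole selection step is a finite minimization, which is easy to code. The sketch below is schematic: the explicit forms of the cost \(J_{n}(\lambda )\) and the penalty \(P_{n}(\lambda )\) used here, namely \(J_{n}(\lambda )=\sum _{j}\lambda ^{2}(j)\widehat{\theta }^{2}_{j,n}-2\sum _{j}\lambda (j)\widetilde{\theta }_{j,n}+\delta P_{n}(\lambda )\) with \(\widetilde{\theta }_{j,n}=\widehat{\theta }^{2}_{j,n}-\widehat{\sigma }_{n}/n\) and \(P_{n}(\lambda )=\widehat{\sigma }_{n}\vert \lambda \vert ^{2}/n\), are our assumptions about (3.10), (3.13) and (3.14), patterned on Konev and Pergamenshchikov (2012).

```python
import numpy as np

def model_selection(theta_hat, sigma_hat, weight_family, n, delta):
    """Schematic version of the procedure (3.15): minimize the penalized
    cost over the finite weight family Lambda (the forms of J_n and P_n
    are assumed, following Konev and Pergamenshchikov (2012))."""
    theta_tilde = theta_hat ** 2 - sigma_hat / n   # surrogate for theta_j * hat-theta_j
    best_lam, best_cost = None, np.inf
    for lam in weight_family:                      # each lam = (lambda(j))_{1 <= j <= n}
        cost = (np.sum(lam ** 2 * theta_hat ** 2)
                - 2.0 * np.sum(lam * theta_tilde)
                + delta * sigma_hat * np.sum(lam ** 2) / n)
        if cost < best_cost:
            best_lam, best_cost = lam, cost
    return best_lam
```

The selected \(\widehat{\lambda }\) is then plugged into (3.7), i.e. one uses the estimator \(\widehat{S}_{\widehat{\lambda }}\).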
Let us now specify the weight coefficients \((\lambda (j))_{1\le j\le n}\). Consider, for some fixed \(0<\varepsilon <1,\) a numerical grid of the form
where \(m=[1/\varepsilon ^2]\). We assume that both parameters \(k^*\ge 1\) and \(\varepsilon \) are functions of n, i.e. \(k^*=k^*(n)\) and \(\varepsilon =\varepsilon (n)\), such that
for any \({\check{\delta }}>0\). One can take, for example, for \(n\ge 2\)
where \(k^{*}_{0}\ge 0\) is some fixed constant. For each \(\alpha =(\beta , \mathbf{l})\in {{\mathcal {A}}}\), we introduce the weight sequence
with the elements
where \(j_{*}=1+\left[ \ln \upsilon _{n}\right] \), \(\omega _{\alpha }=(\mathrm {d}_{\beta }\,\mathbf{l}\upsilon _{n})^{1/(2\beta +1)}\),
and the threshold \(\varsigma ^{*}(n)\) is introduced in (2.11). Now we define the set \(\Lambda \) as
It should be noted that in this case the cardinality of the set \(\Lambda \) is
Moreover, taking into account that \(\mathrm {d}_{\beta }<1\) for \(\beta \ge 1\) we obtain for the set (3.20)
Remark 3.1
Note that the form (3.19) for the weight coefficients in (3.7) was proposed by Pinsker in Pinsker (1981) for the efficient estimation in the nonadaptive case, i.e. when the regularity parameters of the function S are known. In the adaptive case these weight coefficients are used in Konev and Pergamenshchikov (2012, 2015) to show the asymptotic efficiency for model selection procedures.
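For completeness, here is one way to generate the family (3.20) in code. The closed-form shape of \(\lambda _{\alpha }\), the constant \(\mathrm {d}_{\beta }\) and the grid \({{\mathcal {A}}}=\{1,\ldots ,k^{*}\}\times \{\varepsilon ,2\varepsilon ,\ldots ,m\varepsilon \}\) used below are the Pinsker weights of Konev and Pergamenshchikov (2012) and should be read as our assumptions about (3.16) and (3.19).

```python
import numpy as np

def pinsker_weights(beta, l, upsilon_n, n):
    """Assumed form of lambda_alpha for alpha = (beta, l): equal to 1 for
    j < j_*, then (1 - (j / omega_alpha)^beta)_+ ,
    with d_beta = (beta + 1)(2 beta + 1) / (pi^(2 beta) beta)."""
    d_beta = (beta + 1.0) * (2.0 * beta + 1.0) / (np.pi ** (2.0 * beta) * beta)
    j_star = 1 + int(np.log(upsilon_n))
    omega = (d_beta * l * upsilon_n) ** (1.0 / (2.0 * beta + 1.0))
    j = np.arange(1, n + 1, dtype=float)
    return np.where(j < j_star, 1.0, np.maximum(0.0, 1.0 - (j / omega) ** beta))

def weight_family(k_star, eps, upsilon_n, n):
    """The finite family Lambda = {lambda_alpha : alpha in A}."""
    m = int(1.0 / eps ** 2)
    return [pinsker_weights(beta, i * eps, upsilon_n, n)
            for beta in range(1, k_star + 1) for i in range(1, m + 1)]
```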
4 Main results
In this section we obtain in Theorem 4.3 the non-asymptotic oracle inequality for the quadratic risk (1.3) for the model selection procedure (3.15), and in Theorem 4.4 the non-asymptotic oracle inequality for the robust risk (1.4) for the same procedure, considered with the coefficients (3.19). We give the lower and upper bounds for the robust risk in Theorems 4.5 and 4.7, and the optimal convergence rate in Corollary 4.8.
Before stating the non-asymptotic oracle inequality, let us first introduce the following parameters which will be used for describing the rest term in the oracle inequalities. For the renewal density \(\rho \) defined in (2.9) we set
where \({\check{\tau }}=\mathbf{E}_{Q}\tau _{1}\). In Proposition 5.2 we show that \(\vert \rho \vert _{*}=\sup _{t\ge 0}\vert \rho (t)\vert <\infty \) and \(\vert \Upsilon \vert _{1}<\infty \). So, using this, we can introduce the following parameters
and
where \( {\check{\mathbf{l}}}=5(1+{\check{\tau }})^{2}(1+\vert \rho \vert ^{2}_{*}) \left( 2+\vert \Upsilon \vert _{1}+\mathbf{E}Y^{4}_{1}+\Pi (x^{4})\right) \). We recall that \({\check{\iota }}_n\) is the cardinality of \(\Lambda \), the noise variance \(\sigma _{Q}\) is defined in (2.11) and the parameter \(\varkappa _{Q}\) is given in (2.8). First, let us state the non-asymptotic oracle inequality for the quadratic risk for the model selection procedure (3.15), where the risk is introduced in (1.3) by \({{\mathcal {R}}}_{Q}(\widetilde{S}_{n},S)= \mathbf{E}_{Q,S}\,\Vert \widetilde{S}_{n}-S\Vert ^{2}.\)
Theorem 4.1
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold. Then, for any \(n\ge \,1\) and \(0<\delta < 1/6\), the estimator of S given in (3.15) satisfies the following oracle inequality
Now we study the estimate (3.11).
Proposition 4.2
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold and that the function \(S(\cdot )\) is continuously differentiable. Then, for any \(n\ge 2\),
where \(\dot{S}\) is the derivative of S.
Theorem 4.1 and Proposition 4.2 imply the following result.
Theorem 4.3
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold and that the function S is continuously differentiable. Then, for any \(n\ge \, 1 \) and \( 0 <\delta \le 1/6\), the procedure (3.15), (3.11) satisfies the following oracle inequality
where \(\widetilde{\Psi }_{Q,n}=12 \widetilde{\Lambda }_{n}\mathbf{c}^{*}_{Q}+\Psi _{Q}\) and \(\widetilde{\Lambda }_{n}=\vert \Lambda \vert _{*}/\sqrt{n}.\)
Remark 4.1
Note that the coefficient \(\varkappa _{Q}\) can be estimated as \(\varkappa _{Q}\le (1+{\check{\tau }}\vert \rho \vert _{*})\sigma _{Q}\). Therefore, taking into account that \(\phi ^{4}_{max}\ge 1\), the remainder term in (4.6) can be estimated as
where \(\mathbf{C}_{*}>0\) is some constant which is independent of the distribution Q.
Furthermore, let us study the robust risk (1.4) for the procedure (3.15). In this case, the distribution family \({{\mathcal {Q}}}_{n}\) consists of all distributions on the Skorokhod space \({{\mathcal {D}}}[0,n]\) of the process (1.2) with the parameters satisfying the conditions (2.11) and (2.12).
Moreover, we assume also that the upper bound for the basis functions in (3.1) may depend on \(n\ge 1\), i.e. \(\phi _{*}=\phi _{*}(n)\), such that for any \({\check{\epsilon }}>0\)
The next result presents the non-asymptotic oracle inequality for the robust risk (1.4) for the model selection procedure (3.15), considered with the coefficients (3.19).
Theorem 4.4
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold and that the unknown function S is continuously differentiable. Then, for the robust risk defined by \({{\mathcal {R}}}^{*}_{n}(\widetilde{S}_{n},S)=\sup _{Q\in {{\mathcal {Q}}}_{n}}\,{{\mathcal {R}}}_{Q}(\widetilde{S}_{n},S)\) over the distribution family (2.11)–(2.12), the procedure (3.15) with the coefficients (3.19) satisfies, for any \(n\ge \, 1 \) and \( 0<\delta <1/6\), the following oracle inequality
where the sequence \(\mathbf{U}^{*}_{n}(S)>0\) is such that, under the conditions (2.12), (3.17) and (4.8), for any \(r>0\) and \({\check{\delta }}>0,\)
Now we study the asymptotic efficiency for the procedure (3.15) with the coefficients (3.19), with respect to the robust risk (1.4) defined by the distribution family (2.11–2.12). To this end, we assume that the unknown function S in the model (1.1) belongs to the Sobolev ball
where \(\mathbf{r}>0\) and \(k\ge 1\) are some unknown parameters, \({{\mathcal {C}}}^{k}_{per}[0,1]\) is the set of k times continuously differentiable functions \(f\,:\,[0,1]\rightarrow {{\mathbb {R}}}\) such that \(f^{(i)}(0)=f^{(i)}(1)\) for all \(0\le i \le k\). The function class \(W^{k}_{r}\) can be written as an ellipsoid in \(\mathbf{L}_{2}[0,1]\), i.e.,
where \(a_{j}=\sum ^k_{i=0}\left( 2\pi [j/2]\right) ^{2i}\) and \(\theta _{j}=\int ^{1}_{0}\,f(v)\text{ Tr }_{j}(v)\mathrm {d}v\). We recall that the trigonometric basis \((\text{ Tr }_{j})_{j\ge 1}\) is defined in (3.2).
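Explicitly, with these coefficients the ellipsoid takes the standard form (as in Konev and Pergamenshchikov 2012; we reproduce it here for the reader's convenience):
$$ W^{k}_{\mathbf{r}}=\Big \{f\in \mathbf{L}_{2}[0,1]\,:\,\sum _{j\ge 1}\,a_{j}\,\theta _{j}^{2}\,\le \mathbf{r}\Big \}. $$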
Similarly to Konev and Pergamenshchikov (2012, 2015) we will show here that the asymptotic sharp lower bound for the robust risk (1.4) is given by
Note that this is the well-known Pinsker constant obtained for the nonadaptive filtration problem in the “signal + small white noise” model (see, for example, Pinsker 1981). Let \(\Pi _{n}\) be the set of all estimators \(\widehat{S}_{n}\) measurable with respect to the \(\sigma \)-field \(\sigma \{y_{t}\,,\,0\le t\le n\}\) generated by the process (1.1).
The following two results give the lower and upper bound for the robust risk (1.4) defined for the distribution family (2.11–2.12).
Theorem 4.5
Under Conditions (2.11) and (2.12),
where \(\upsilon _{n}=n/\varsigma ^{*}\).
Note that if the parameters \(\mathbf{r}\) and k are known, i.e. in the nonadaptive estimation case, then, to obtain efficient estimation in the “signal+white noise” model, Pinsker (1981) proposed to use the estimator \(\widehat{S}_{\lambda _{0}}\) defined in (3.7) with the weights
where \(\alpha _{0}=(k,\mathbf{l}_{0})\) and \(\mathbf{l}_{0}=[\mathbf{r}/\varepsilon ]\varepsilon \). For the model (1.1)–(1.2) we show the same result.
Proposition 4.6
The estimator \(\widehat{S}_{\lambda _{0}}\) satisfies the following asymptotic upper bound
Remark 4.2
Note that the inequalities (4.14) and (4.16) imply that the estimator \(\widehat{S}_{\lambda _{0}}\) is efficient. But we cannot use the weights (4.15) directly, because the parameters k and \(\mathbf{r}\) are unknown. For this reason, to obtain an efficient estimate in the adaptive setting, we use the model selection procedure (3.15) over the estimator family (3.20), which includes the estimator with the weights (4.15). Then, using the oracle inequality (4.9) and the upper bound (4.16), we can establish the efficiency of this model selection procedure.
For the adaptive estimation we use the model selection procedure (3.15) with the parameter \(\delta \) defined as a function of n satisfying
for any \({\check{\delta }}>0\). For example, we can take \(\delta _{n}=(6+\ln n)^{-1}\).
Let \(\widehat{S}_{*}\) be the procedure (3.15) based on the trigonometric basis (3.2) with the coefficients (3.19) and the parameter \(\delta =\delta _{n}\) satisfying (4.17).
Theorem 4.7
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true. Then
Theorems 4.5 and 4.7 allow us to compute the optimal convergence rate.
Corollary 4.8
Under the assumptions of Theorem 4.7 the procedure \(\widehat{S}_{*}\) is efficient, i.e.
and
Remark 4.3
It is well known that the optimal (minimax) risk convergence rate for the Sobolev ball \(W^{k}_{r}\) is \(n^{2k/(2k+1)}\) (see, for example, Pinsker 1981; Nussbaum 1985). We see here that the efficient robust rate is \(\upsilon ^{2k/(2k+1)}_{n}\), i.e., if the distribution upper bound \(\varsigma ^{*}\rightarrow 0\) as \(n\rightarrow \infty ,\) we obtain a faster rate than \(n^{2k/(2k+1)}\), and, if \(\varsigma ^{*}\rightarrow \infty \) as \(n\rightarrow \infty ,\) we obtain a slower rate. In the case when \(\varsigma ^{*}\) is constant, the robust rate coincides with the classical non-robust convergence rate. The same properties of the robust risks are obtained in Konev and Pergamenshchikov (2010) and Konev and Pergamenshchikov (2012) for the regression model with an Ornstein-Uhlenbeck noise process. So, this is a typical situation when we take the supremum over all noise distributions in (1.4). It is natural that we do not obtain the same convergence rate as for the usual risks (1.3), and the difference is given by the coefficient \(\varsigma ^{*}\), which satisfies the “slowly changing” properties (2.12).
5 Renewal density
This section is concerned with results related to the renewal measure \( {\check{\eta }} =\sum ^{\infty }_{l=1}\,\eta ^{(l)} \,.\) We start with the following lemma.
Lemma 5.1
Let \(\tau \) be a positive random variable with a density g, such that \(\mathbf{E}e^{\beta \tau } <\infty \) for some \(\beta > 0\). Then there exists a constant \(\beta _1\), \(0< \beta _1 <\beta \), for which
Proof
We prove this lemma by contradiction: assume that there exist a sequence of positive numbers \((\gamma _{k} )_{k\ge 1}\) going to zero and a sequence \((w_{k})_{k\ge 1}\) such that
for any \(k\ge 1\). First, assume that \(\limsup _{k\rightarrow \infty }\,w_{k} = +\infty \), i.e. there exists \((l_{k})_{k\ge 1}\) for which \(\lim _{k\rightarrow \infty }\,w_{l_{k}} = +\infty \). Note that in this case, for any \(N\ge 1,\)
i.e., in view of Lemma A.5, for any fixed \(N\ge 1\)
Since for some \(\beta >0\) the integral \( \int _{0}^{+\infty }\,e^{\beta t}\,g(t) \mathrm {d}t<\infty \), we get
Let now \( \limsup _{k\rightarrow \infty } w_{k}=\omega _{\infty }\) and \(\vert \omega _{\infty }\vert <\infty \). In this case there exists a sequence \((l_k)_{k\ge 1}\) such that \( \lim _{k\rightarrow \infty } w_{l_k}=\omega _{\infty }\), i.e.
It is clear that, for random variables having a density, the last equality is possible if and only if \(\omega _{\infty }=0\). In this case, i.e. when \(\lim _{k\rightarrow \infty } w_{l_k} = 0\), Eq. (5.1) implies
But \(\mathbf{E}\tau >0\) under our conditions. These contradictions imply the desired result. \(\square \)
Proposition 5.2
Let \(\tau \) be a positive random variable with the distribution \(\eta \) having a density g which satisfies Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\). Then the renewal measure (2.9) is absolutely continuous with density \(\rho \), for which
where \({\check{\tau }}=\mathbf{E}\tau _{1}\) and \(\Upsilon (\cdot )\) is some function defined on \({{\mathbb {R}}}_{+}\) with values in \({{\mathbb {R}}}\) such that
Proof
First note that we can represent the renewal measure \({\check{\eta }}\) as \({\check{\eta }}=\eta *\eta _{0}\), where \(\eta _{0}=\sum _{j=0}^{\infty } \eta ^{(j)}\). It is clear that in this case the density \(\rho \) of \({\check{\eta }}\) can be written as
Now we use the arguments proposed in the proof of Lemma 9.5 from Goldie (1991). For any \(0<\epsilon <1\) we set
where \( g_0(y)= e^{-\epsilon y/{\check{\tau }}}1_{\{y>0\}}.\) It is easy to deduce that for any \(x\in {{\mathbb {R}}}\)
Moreover, in view of Condition \((\mathbf{H}_{1})\) we obtain that the function \(\rho _{\epsilon }(x)\) satisfies Condition \((\mathbf{D})\) from Section A.3. So, through Proposition A.6 we get
where \(\widehat{\rho }_{\epsilon }(\theta ) =\int _{{{\mathbb {R}}}}\,e^{i\theta x}\rho _{\epsilon }(x)\mathrm {d}x\). Note that by the Bunyakovskii–Cauchy–Schwarz inequality
It should be noted that this inequality becomes an equality if and only if \(\theta =0\). Therefore, for any \(0<\epsilon <1\) the modulus \(\vert (1-\epsilon )\widehat{g}(\theta )\vert <1\) and
So, taking into account that
we obtain
where
i.e.
In Section A.5 we show that
Therefore, using Condition \((\mathbf{H}_{4})\) and Lebesgue’s dominated convergence theorem, we can pass to the limit as \(\epsilon \rightarrow 0\) in (5.7), i.e., we obtain that
where
Using here again Proposition A.6 we deduce that
and
Note now that we can represent the density (5.3) as
and the function \( \rho _{c}(x)\) is continuous for all \(x\in {{\mathbb {R}}}\). This means that
and, therefore, Condition \((\mathbf{H}_{2})\) implies that, for any \(\gamma >0,\)
Now we can rewrite (5.9) as
Taking into account that \(\mathbf{E}_{Q}e^{\beta \tau }<\infty \) for some \(\beta >0\) we can obtain that
To study the second term in (5.10) we will apply Proposition A.4 to the function \(\widehat{g}(\theta ) \check{G}(\theta ).\) First, note that the function \(\widehat{g}(\theta )=\mathbf{E}_{Q}e^{i\tau \theta }\) is holomorphic for any \(\theta \in {{\mathbb {C}}}\) with \(\text{ Im }(\theta )>-\beta ,\) due to Condition \((\mathbf{H}_{3});\) in view of Lemma A.8, there exists \(0<\beta _{*}<\beta \) for which the function \(\check{G}(\theta )\) is holomorphic. Second, Condition \((\mathbf{H}_{4})\) applied to the function \(\widehat{g}(\theta ) \check{G}(\theta )\) implies the first condition in Eq. (A.2). The second condition of (A.2) follows directly from Lemma A.5.
Therefore, the conditions of Proposition A.4 hold with \(\beta _{2}=+\infty \). Thus Proposition A.4 implies that for some \(0<\beta _{0}<\beta _{*}\)
Taking into account here Condition \((\mathbf{H}_{4}\)) and the bound for \(\check{G}\) given in (A.8), we obtain
Hence Proposition 5.2. \(\square \)
Using this proposition we can study the renewal process \((N_{t})_{t\ge 0}\) introduced in (2.6).
Corollary 5.3
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true. Then, for any \(t>0,\)
and, moreover, \(\mathbf{E}N^{m}_{t}<\infty \) for any \(m\ge 3\).
Proof
First, by means of Proposition 5.2, we get
To estimate the second moment of \(N_{t}\) note that,
i.e. we obtain that
where \(\Theta (v)=\mathbf{E}\sum _{j \ge k+1}\,\mathbf{1}_{\{\sum _{i=k+1}^j\, \tau _{i}\le v\}}\). Taking into account that \((\tau _{k})_{k\ge 1}\) is an i.i.d. sequence, this term can be represented as
This implies the second inequality in (5.11). Similarly, for \(m\ge 3\) we obtain that for some constant \(\mathbf{C}_{m}>0\)
Therefore, by induction we obtain that \(\mathbf{E}\,N^{m}_{t}<\infty \) for any \(m\ge 3\). Hence Corollary 5.3. \(\square \)
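Both bounds in (5.11) are consistent with the elementary renewal theorem, by which \(\mathbf{E}N_{t}\) grows like \(t/{\check{\tau }}\); this is easy to confirm by simulation. A small sketch (ours), again with \(\chi ^2_{3}\) waiting times:

```python
import numpy as np

def count_renewals(t, rng):
    """N_t = max{k : tau_1 + ... + tau_k <= t} for one path."""
    s, k = 0.0, 0
    while True:
        s += rng.chisquare(3)     # tau_i ~ chi^2_3, E tau = 3
        if s > t:
            return k
        k += 1

rng = np.random.default_rng(2)
t = 200.0
counts = [count_renewals(t, rng) for _ in range(5000)]
print(np.mean(counts) / t, 1.0 / 3.0)   # E N_t / t is close to 1 / E tau_1
```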
6 Stochastic calculus for semi-Markov processes
In this section we give some results of stochastic calculus for the process \((\xi _{t})_{t\ge \, 0}\) given in (1.2), needed throughout this paper. As this process is a combination of a Lévy process and a semi-Markov process, these results are not standard and need to be established.
Lemma 6.1
Let f and g be any non-random functions from \(\mathbf{L}_{2}[0,n]\) and \((I_{t}(f))_{t\ge \,0}\) be the process defined in (2.7). Then, for any \(0 \le t \le n\),
where \((f,g)_{t}=\int _{0}^{t} f(s)\,g(s) \mathrm {d}s\) and \(\rho \) is the density of the renewal measure \({\check{\eta }}=\sum ^{\infty }_{l=1}\,\eta ^{(l)}.\)
Proof
First, note that the noise process (1.2) is a square integrable martingale which can be represented as
where \(\xi ^{c}_{t}=\varrho _{1} w_{t}\) and \(\xi ^{d}_{t}=\varrho _{2}\,L_{t}+ \varrho _{3}\,z_{t}\). Note that the process \((\xi ^{c}_{t})_{t\ge 0}\) is a square integrable continuous martingale with the quadratic characteristic \(<\xi ^{c}>_{t}=\varrho _{1}^2\,t\). Therefore, the quadratic variation \([\xi ]_{t}\) is given by
where \(\Delta \xi _{s}=\xi _{s}-\xi _{s-}\) (see, for example, Liptser and Shiryaev 1986). Recalling that the processes \((L_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent, we obtain that \(\Delta L_{t} \Delta z_{t}=0\) for any \(t>0\), i.e.
Moreover, note that we can represent the stochastic integral \(I_{t}(f)\) as
where the stochastic integrals \(I_{t}^{w}(f)=\int _{0}^{t}\, f(s) \mathrm {d}w_{s}\), \(I_{t}^{L}(f)=\int _{0}^{t}\, f(s) \mathrm {d}L_{s}\) and \(I_{t}^z(f)=\int _{0}^{t}\, f(s) \mathrm {d}z_{s}\) are independent square integrable martingales. Therefore,
Taking into account that \(\mathbf{E}\,I^{w}_{t}(f)I^{w}_{t}(g)=(f,g)_{t}\) and that the expectation of the product of square integrable martingales equals the expectation of their mutual covariation, i.e. \(\mathbf{E}\,I_{t}^L(f)\,I_{t}^L(g)=\mathbf{E}\,[I^L(f),I^L(g)]_{t}\) and \(\mathbf{E}\,I_{t}^z(f)\,I_{t}^z(g)=\mathbf{E}\,[I^z(f),I^z(g)]_{t}\), we obtain that
In view of (6.3) the mutual covariations may be calculated as
and
Taking into account that \(\Pi (x^{2})=1\) and that the sequences \((Y_{k})_{k\ge 1}\) and \((T_{k})_{k\ge 1}\) are independent, we find
and
Hence the conclusion follows. \(\square \)
Corollary 6.2
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true. Then, for any \(n\ge 1\) and for any non random function f from \(\mathbf{L}_{2}[0,n],\) the stochastic integral (2.7) exists and satisfies the inequality (2.8).
Proof
This corollary follows directly from Lemma 6.1 with \(f=g\) and Proposition 5.2, which ensures that \(\sup _{t\ge 0}\rho (t)<\infty \). \(\square \)
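The covariance formula of Lemma 6.1 is also convenient to check numerically. Taking \(f=g\equiv 1\) and \(\varrho _{2}=0\) for simplicity, it reduces to \(\mathbf{E}\,I^{2}_{n}(1)=\varrho ^{2}_{1}n+\varrho ^{2}_{3}\int ^{n}_{0}\rho (s)\mathrm {d}s=\varrho ^{2}_{1}n+\varrho ^{2}_{3}\mathbf{E}N_{n}\), assuming, as in Sect. 8, centered unit-variance marks \(Y_{i}\). A Monte Carlo sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20.0, 20000
rho1, rho3 = 0.5, 0.5                        # take rho2 = 0 for simplicity

def sample_xi_n(rng):
    """One draw of xi_n = rho1 * w_n + rho3 * z_n together with N_n."""
    w_n = rng.normal(0.0, np.sqrt(n))        # w_n ~ N(0, n)
    s, marks = 0.0, []
    while True:
        s += rng.chisquare(3)                # waiting times tau_i
        if s > n:
            break
        marks.append(rng.standard_normal())  # marks Y_i ~ N(0, 1)
    return rho1 * w_n + rho3 * sum(marks), len(marks)

draws, counts = zip(*(sample_xi_n(rng) for _ in range(reps)))
# Lemma 6.1 with f = g = 1 predicts Var xi_n = rho1^2 n + rho3^2 E N_n:
print(np.var(draws), rho1 ** 2 * n + rho3 ** 2 * np.mean(counts))
```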
Lemma 6.3
Let f and g be bounded functions defined on \([0,\infty ) \times {{\mathbb {R}}}.\) Then, for any \(k\ge 1,\)
where \({{\mathcal {G}}}\) is the \(\sigma \)-field generated by the sequence \((T_{l})_{l\ge 1}\), i.e., \({{\mathcal {G}}}=\sigma \{T_{l}\,,\,l\ge 1\}\).
Proof
Using (6.5) and taking into account that the process \((L_{t})_{t\ge 0}\) is independent of \({{\mathcal {G}}}\), we obtain
Moreover,
Then we obtain the desired result. \(\square \)
Lemma 6.4
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true. Then, for any measurable bounded non-random functions f and g, we have
where \(m_{t} = \sum _{0\le s \le t}(\Delta z_{s})^2 - \int _{0}^{t} \rho (s) \mathrm {d}s\) and the norm \(\vert \Upsilon \vert _{1}\) is given in (4.1).
Proof
Using the definition of the process \((m_{t})_{t\ge 0}\) we can represent this integral as
Note now that
Now, using Lemma 6.3 we can represent the last expectation as
where
The term \(\mathbf{E}\,V^{'}_{n}\) can be represented as
We recall that the norm \(\Vert \cdot \Vert _{t}\) is defined in (2.8). To estimate \(\mathbf{E}\,V^{''}_{n}\), note that in view of Fubini’s theorem
where, similarly to (5.12), the function \({\bar{g}}(\cdot )\) can be represented as
Moreover, using Lemma 6.1, we calculate the expectation of the last term in (6.6), i.e.
This implies that
where \( \delta (t)= \int _{0}^{t}\, f^{2}(v)\, \left( \rho (t-v) - \rho (t) \right) \, \rho (v)\, \mathrm {d}v\). Note here that, in view of Proposition 5.2 the function \(\delta (t)\) can be estimated for any \(0\le t\le n\) as
with \(\Upsilon (x)=\rho (x)-1/{\check{\tau }}\). So,
and, therefore,
and this finishes the proof. \(\square \)
Lemma 6.5
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true. Then, for any measurable bounded non-random functions f and g, one has
Proof
Putting \(\check{h}_{t}=I^2_{t-}(f) I_{t-}(g) g(t)\), we can represent the integral in (6.8) as
where \(\check{L}_{t}= \check{\varrho }_{1} w_t + \check{\varrho }_{2} L_t\), \(\check{\varrho }_{1}=\varrho _{1}/\sqrt{{\bar{\varrho }}}\) and \(\check{\varrho }_{2}=\varrho _{2}/\sqrt{{\bar{\varrho }}}\). First, we will show that
Using the notations (6.4), we set
we obtain that
Taking into account that, for any non-random square integrable function g, the integral \(\int _{0}^{t}\, g(s) \mathrm {d}w_{s}\) is Gaussian with parameters \(\left( 0, \int ^{t}_{0}\,g^{2}(s)\mathrm {d}s\right) \), we obtain
Applying inequality (A.1) to the non-random function \(h(s,x)=g(s)x\) and recalling that \(\Pi (x^{8})<\infty \), we obtain
Therefore, we obtain that
Finally, using the Cauchy inequality, for any \(0<t\le n\) we can estimate the following expectation as
i.e.,
Moreover, taking into account that the processes \((\check{L}_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent, we obtain that
One can check directly here that, for \(t>0,\)
where \(\vert f\vert _{*}= \sup _{0 \le t\le n} \,|f(t)|\). Note that Corollary 5.3 yields \(\mathbf{E}\,N_{t}^2\,< \infty \), therefore \( \sup _{0 \le t\le n}\, \mathbf{E}_{Q} (I_{t}^z(f))^4 < \infty \) and we obtain,
Taking into account that the process \((\check{L}_{t})_{t\ge 0}\) is a square integrable martingale with the quadratic characteristic \(<\check{L}>_{t}=t\), we obtain that \(\mathbf{E}J_{1}=0\). As to the last term in (6.11), note that, similarly to the previous reasoning, we obtain that
Therefore, to show (6.10) one needs to check that
where \(\check{h}^{z}_{t}=(I^{z}_{t-}(f))^{2} I^{z}_{t-}(g) g(t)\). To this end, note that, for any \(0<t\le n\)
i.e.,
where \( I_{klj}= \int _{0}^{n} \mathbf{1}_{\{T_{k}\le t\}} \mathbf{1}_{\{T_{l}\le t\}} \mathbf{1}_{\{T_{j}\le t\}} g(t) \mathrm {d}\check{L}_{t}\). Taking into account that the process \((\check{L}_{t})_{t\ge 0}\) is independent of the field \({{\mathcal {G}}}_{z}=\sigma \{z_{t}\,,t\ge 0\},\) we obtain that \(\mathbf{E}_{Q}\left( I_{klj}\vert {{\mathcal {G}}}_{z} \right) =0\) and
Moreover,
Corollary 5.3 implies that \(\mathbf{E}N^{3}_{n}<\infty \), i.e. \(\mathbf{E}_{Q} \vert I^{\check{L}}_{n}(\check{h}^{z}) \vert <\infty \), and, therefore,
So, we obtain (6.13). Furthermore, to study the last term in (6.9), note that \((z_{t})_{t\ge 0}\) is a martingale with bounded variation. Moreover, in view of the definition of \(\check{h}\) we get
Recalling that the processes \((\check{L}_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent and using (6.12), we obtain that, for \(1\le m\le 2\) and any bounded function f,
Moreover, from (6.14) through the Hölder inequality we obtain that
Taking into account that the sequences \((Y_{j})_{j\ge 1}\) and \((T_{k})_{k\ge 1}\) are independent, we obtain through Corollary 5.3 that
Thus \(\mathbf{E}_{Q} \int ^{n}_{0} \vert \check{h}_{t}\vert \mathrm {d}[z]_{t}<\infty \), therefore, \(\mathbf{E}_{Q}I_{n}^z(\check{h})=0\) and we get the equality (6.8). \(\square \)
7 Properties of the regression model (1.1)
In order to prove the non-asymptotic sharp oracle inequalities we use the method proposed in Konev and Pergamenshchikov (2009a) and Konev and Pergamenshchikov (2012) for the general semi-martingale model (1.1). To this end we need to study the following functions of \(x\in {{\mathbb {R}}}^{n}\)
where \(\sigma _{Q}\) is defined in (2.11), \(\widetilde{\xi }_{j,n}=\xi ^2_{j,n}- \mathbf{E}_{Q}\xi ^2_{j,n}\) and \(\xi _{j,n}\) is given in (3.5). These functions describe, respectively, the behavior of the total noise intensity and variance for the chosen Fourier coefficients in the estimators (3.7).
Remark 7.1
Propositions 7.1 and 7.2 below are used to obtain the oracle inequalities in Sect. 4 (see, for example, Konev and Pergamenshchikov 2012).
Proposition 7.1
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold. Then
where \(\mathbf{C}_{1,Q,n}= \sigma _{Q}\,{\check{\tau }}\,\phi ^{2}_{max}\,\vert \Upsilon \vert _{1}\), \(\sigma _{Q}={\bar{\varrho }}+ \varrho _{3}^2/{\check{\tau }}\) and \(\Upsilon (x)=\rho (x)-1/{\check{\tau }}\).
Proof
First, taking into account that \(\xi _{j,n}=n^{-1/2}I_{n}(\phi _{j})\) and \(\Vert \phi _{j}\Vert ^{2}_{n}=n\), we obtain through Lemma 6.1 that
So, in view of Condition (3.1) and the Eq. (4.1), we obtain
Bounding here \( \varrho _{3}^2\) by \(\sigma _{Q}{\check{\tau }}\), we obtain the inequality (7.2), and hence the conclusion follows. \(\square \)
Proposition 7.2
Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold. Then
where \(\vert x\vert ^{2}=\sum ^{n}_{j=1}x^{2}_{j}\), \(\mathbf{C}_{2,Q,n}=\phi ^{4}_{max} (1+\sigma ^{2}_{Q})^{2}\,{\check{\mathbf{l}}} \) and \({\check{\mathbf{l}}}\) is given in (4.3).
Proof
By Ito’s formula one gets
Using the representations (6.2) and (6.3), we can rewrite this differential as
From Lemma 6.1 it follows that \( \mathbf{E}_{Q}\, I^2_{t}(f) = {\bar{\varrho }}\, \Vert f\Vert ^{2}_{t} + \varrho ^{2}_{3} \Vert f\sqrt{\rho }\Vert ^{2}_{t}\). Therefore, putting
we obtain
where \( \check{m}_{t}= \sum _{0\le s \le t}(\Delta L_{s})^2 - t\) and \( m_{t} = \sum _{0\le s \le t}(\Delta z_{s})^2 - \int _{0}^{t} \rho (s) \mathrm {d}s\). For any non-random vector \(x= (x_{j})_{1\le j\le n}\) with \(\sum ^{n}_{j= 1} x^{2}_{j} \le 1\), we set
Denoting
we get the following stochastic differential equation for (7.8)
Now, similarly to (7.4) the Ito formula and representation (6.2) yield
Taking into account that the processes \((L_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent, we obtain that \(\Delta L_{t} \Delta z_{t}=0\), therefore, for any \(t\ge 0\)
This implies
where \(\check{D}_{n}=\sum _{0\le t\le n}\left( 2A_{t-} \Delta L_{t}+\varrho _{2} B_{t} (\Delta L_{t})^{2}\right) ^{2}\) and
It should be noted here that
Let us now show that
To this end, note that by (7.7)
Using here Lemma 6.5, we get \( \mathbf{E}_{Q} \int _{0}^{n}\, \widetilde{I}_{t-}(\phi _{j})\, I_{t-}(\phi _{i}) \phi _{i}(t)\mathrm {d}\xi _{t} =0\). Moreover, the process \((\check{m}_{t})_{t\ge 0}\) is a martingale, i.e. \(\mathbf{E}_{Q} \int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) B_{t} \mathrm {d}\check{m}_{t}=0\). Therefore,
Taking into account here that for any non-random bounded function f
we obtain \(\mathbf{E}_{Q}\int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) \,B_{t} \, \mathrm {d}m_{t} = \mathbf{E}_{Q}\int _{0}^{n} I^2_{t-} (\phi _{j})\,B_{t} \, \mathrm {d}m_{t}\). So, Lemma 6.4 yields
Therefore,
Taking into account here that \( \left( \sum ^{n}_{l=1}\, \vert x_{l} \vert \right) ^{2} \le \, n \sum _{l\ge \,1}\, x^{2}_{l}\le n\), we obtain (7.12). Recall that \(\Pi (x^{2})=1\). Using the definition (2.1) and the properties of the jump measures (see, for example, Liptser and Shiryaev 1986, chapter 3) we obtain that
and
From this it follows directly that
Note that, thanks to Lemma 6.1, we obtain
where \(A_{1,n}=\sum _{i,j} x_{i} x_{j} \int _{0}^{n} \phi _{i}(t)\phi _{j}(t) \,\left( \int _{0}^{t}\, \phi _{i}(v)\phi _{j}(v)\,\rho (v) \mathrm {d}v\right) \mathrm {d}t\). This term can be estimated through Proposition 5.2 as
So, recalling that \(\sigma _{Q}={\bar{\varrho }}+\varrho _{3}^2/{\check{\tau }}\) and that \(\phi _{max}\ge 1\), we obtain that
Taking into account that
that \(\phi _{max}\ge 1,\) and that \({\bar{\varrho }}^2 \le \sigma ^{2}_{Q}\) we estimate the expectation in (7.13) as
Moreover, taking into account that the random variable \(Y_{k}\) is independent of \(A_{T_{k^-}}\) and of the field \({{\mathcal {G}}}=\sigma \{T_{j}\,,\,j\ge 1\}\) and that \(\mathbf{E}_{Q}\left( A_{T_{k^-}}\,\vert {{\mathcal {G}}}\right) =0\), we get
Therefore,
where
Using the bound (7.15) we can estimate the term \(D_{1,n}\) as \( D_{1,n} \le \phi ^{4}_{max} n \mathbf{E}\,N_{n}\). Using here Corollary 5.3, we obtain
Now, to estimate the last term in (7.17), note that the process \(A_{t} \) can be rewritten as
Applying Lemma 6.3 again, we obtain for any \(k\ge 1\)
So, we can represent the last term in (7.17) as
where
Thanks to Proposition 5.2 we obtain
In view of the definition of \(\mathbf{K}(\cdot ,\cdot )\) in (7.19), we can rewrite the last integral as
Since \(\sum ^{n}_{j=1}\,x^{2}_{j}\le 1\), we obtain that,
Let us estimate now the last term in (7.20). This term can be represented as
where
In view of the inequality (7.21) we obtain \( {\bar{\mathbf{K}}}(t)\le \vert \rho \vert _{*}\,\phi ^{2}_{max}\,n\) and, therefore, \( D^{(2)}_{2,n}\le \vert \rho \vert ^{2}_{*}\,\phi ^{2}_{max}\, n^{2}\). So, bounding in (7.20) \(\varrho ^{2}_{3}\) by \({\check{\tau }}\sigma _{Q}\) we obtain that
Therefore, taking into account in (7.17) that \(\mathbf{E}Y^{4}_{1}\ge 1\),
Using all these bounds in (7.10) and taking into account that
we obtain (7.3) and thus the conclusion follows. \(\square \)
8 Simulation
In this section we report the results of a Monte Carlo experiment assessing the performance of the proposed model selection procedure (3.15). In (1.1) we choose a 1-periodic function defined, for \(0\le t\le 1,\) as
We simulate the model
where \(\xi _t= 0.5 w_{t}+ 0.5 z_{t}\).
Here \(z_{t}\) is the semi-Markov process defined in (2.5) with an i.i.d. Gaussian \(\mathcal {N}(0,1)\) sequence \((Y_{j})_{j\ge 1}\), and the waiting times \((\tau _k)_{k\ge 1}\) in (2.6) are taken as \(\tau _k \sim \chi _{3}^2.\)
We use the model selection procedure (3.15) with the weights (3.19) in which \(k^*= 100+\sqrt{\ln (n)}\), \(t_{i}=i/ \ln (n)\), \(m=[\ln ^2 (n)]\) and \(\delta =(3+\ln (n))^{-2}\). We define the empirical risk as
where the observation frequency \(p=100001\) and the expectation was taken as an average over \(N= 10000\) replications, i.e.,
We set the relative quadratic risk as
In our case \(||S||^2_p = 0.1883601\).
Table 1 gives the values for the sample risks (8.2) and (8.3) for different numbers of observations n.
Figures 1, 2, 3 and 4 show the behaviour of the regression function and its estimates by the model selection procedure (3.15) depending on the values of observation periods n. The black full line is the regression function (8.1) and the red dotted line is the associated estimator.
Remark 8.1
From numerical simulations of the procedure (3.15) with various observation numbers n we may conclude that the quality of the proposed procedure: (i) is good for practical needs, i.e. for reasonable (not large) numbers of observations; (ii) improves as the number of observations increases.
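For readers who wish to reproduce a scaled-down version of this experiment, the following self-contained sketch simulates the model of Sect. 8, computes the estimates \(\widehat{\theta }_{j,n}\) over the trigonometric basis and evaluates the empirical quadratic risk. The regression function S below is a placeholder (the function (8.1) is not reproduced here), the observation frequency is reduced, and for brevity plain projection weights are used instead of the full family (3.19).

```python
import numpy as np

rng = np.random.default_rng(8)

def S(t):
    # Placeholder 1-periodic regression function; the actual function
    # (8.1) of the paper is not reproduced here.
    return np.sin(2.0 * np.pi * (t % 1.0))

n, p = 100, 20001                  # horizon and (reduced) observation frequency
t = np.linspace(0.0, n, p)
dt = t[1] - t[0]

# Noise xi_t = 0.5 w_t + 0.5 z_t with chi^2_3 waiting times, N(0,1) marks
w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), p - 1))])
z = np.zeros(p)
s = 0.0
while True:
    s += rng.chisquare(3)
    if s > n:
        break
    z[t >= s] += rng.standard_normal()
xi = 0.5 * w + 0.5 * z

y = np.concatenate([[0.0], np.cumsum(S(t[:-1]) * dt)]) + xi   # dy_t = S(t) dt + d xi_t

def Tr(j, u):
    # Trigonometric basis (3.2), under the usual even/odd convention
    u = u % 1.0
    if j == 1:
        return np.ones_like(u)
    f = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2.0) * f(2.0 * np.pi * (j // 2) * u)

dy = np.diff(y)
theta_hat = np.array([Tr(j, t[:-1]) @ dy / n for j in range(1, 51)])
S_hat = sum(th * Tr(j, t) for j, th in enumerate(theta_hat, start=1))  # projection estimate
print(np.mean((S_hat - S(t)) ** 2))   # empirical quadratic risk, cf. (8.2)
```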
9 Proofs
We will prove here most of the results of this paper.
9.1 Proof of Theorem 4.1
First, note that we can rewrite the empirical squared error in (3.9) as follows
where the cost function \(J_n(\lambda )\) and the penalty terms are defined in (3.13) and (3.14) respectively, \({\check{\theta }}_{j,n}=\widetilde{\theta }_{j,n}-\theta _{j}\widehat{\theta }_{j,n}\),
Using the definition of \(\widetilde{\theta }_{j,n}\) in (3.10) we obtain that
where \(\varsigma _{j,n}=\mathbf{E}_{Q}\xi ^{2}_{j,n}-\sigma _{Q}\) and \(\widetilde{\xi }_{j,n}=\xi ^{2}_{j,n}-\mathbf{E}_{Q}\xi ^{2}_{j,n}\). Putting
we can rewrite (9.1) as
where \(e(\lambda )=\lambda /|\lambda |\), \(\check{L}(\lambda )=\sum ^{n}_{j=1}\lambda (j)\) and the functions \( B_{1,Q,n}(\cdot )\) and \(B_{2,Q,n}(\cdot )\) are given in (7.1).
Let \(\lambda _0= (\lambda _0(j))_{1\le j\le \,n}\) be a fixed sequence in \(\Lambda \) and \(\widehat{\lambda }= \text{ argmin }_{\lambda \in \Lambda } J_n(\lambda )\). Substituting \(\lambda _0\) and \(\widehat{\lambda }\) in Eq. (9.3), we obtain
where \(\varpi = \widehat{\lambda } - \lambda _{0}\), \(\widehat{e} = e(\widehat{\lambda })\) and \(e_0 = e(\lambda _0)\). Note that, by (3.8),
Using the inequality
and taking into account that \(P^{0}_n(\lambda )\ge 0\) we obtain that for any \(\lambda \in \Lambda \)
Taking into account the bound (7.2) and that \(J(\widehat{\lambda })\le J(\lambda _0)\), we get
where \(B^*_{2,Q,n} = \sup _{\lambda \in \Lambda } B^2_{2,Q,n}(e(\lambda ))\). Moreover, noting that in view of (3.8) \(\sup _{\lambda \in \Lambda } |\lambda |^2 \le \vert \Lambda \vert _{*}\), we can rewrite the previous bound as
To estimate the second term in the right side of this inequality we set
Taking into account that \(M(x)=n^{-1}I_{n}(S_{x})\), we can estimate this term through the inequality (2.8) shown in Corollary 6.2 for any \(x\in {{\mathbb {R}}}^{n}\) as
where, taking into account that the functions \((\phi _{j})_{j\ge 1}\) are orthogonal, the norm \(||S_x||^2=\int ^{1}_{0}S^{2}_x(t)\mathrm {d}t=\sum _{j=1}^{n} x^2(j) \theta ^2_{j}\). To estimate this function for a random vector \(x\in {{\mathbb {R}}}^{n}\) we set
So, as we did for proving (9.6) and (9.7), through Inequality (9.5), we get
It is clear that the last term here can be estimated as
where \({\check{\iota }} = \text{ card }(\Lambda )\). Using the equality (3.5), we obtain that for any \(x\in \Lambda _{1}\),
where \(M_{1}(x) = n^{-1/2}\,\sum _{j=1}^{n}\, x^2(j)\theta _{j} \xi _{j,n}\). Taking into account that, for any \(x \in \Lambda _1\) the components \(|x(j)|\le 1\), we can estimate this term as in (9.7), i.e.,
Similarly to the previous reasoning we set
and we get
Using the same type of arguments as in (9.8), we can derive
From here and (9.10), we get
for any \(0<\delta <1\). Using this bound in (9.8) yields
Taking into account that
we obtain
Using this bound in (9.6) we obtain
Moreover, for \(0<\delta <1/6,\) we can rewrite this inequality as
In view of Proposition 7.2 we bound the expectation of the term \(B^*_{2,Q,n}\) in (9.6) as
Taking into account that \(\vert \Lambda \vert _{*}\ge 1\), we get
Using the upper bound for \( P^{0}_n(\lambda _0)\) in Lemma A.2, one obtains (4.4), which finishes the proof. \(\square \)
9.2 Proof of Proposition 4.2
We use here the same method as in Konev and Pergamenshchikov (2009a). First of all note that the definition (3.12) implies that
where
So, we have
where
Note that, for continuously differentiable functions (see, for example, Lemma A.6 in Konev and Pergamenshchikov 2009a), the Fourier coefficients \((t_{j})\) satisfy the following inequality, for any \(n\ge 1,\)
In the same way as in (9.7) we estimate the term \(M_{n}\), i.e.,
while the absolute value of this term for \(n\ge 1\) can be estimated as
Moreover, using the functions (7.1) for the trigonometric basis (3.2), i.e. with \(\xi _{j,n}=\eta _{j,n}\) and \(\widetilde{\xi }_{j,n}=\eta ^{2}_{j,n}-\mathbf{E}\eta ^{2}_{j,n}\) we can represent the last term in (9.15) as
with \( x'_{j}=\mathbf{1}_{\{\sqrt{n}<j\le n\}}\) and \(x''_{j}=\mathbf{1}_{\{\sqrt{n}<j\le n\}}/\sqrt{n}\). Therefore, using Propositions 7.1 and 7.2, we obtain
From here we obtain the bound (4.5) and hence the desired result. \(\square \)
9.3 Proof of Theorem 4.4
Note that Theorem 4.1 directly implies the oracle inequality (4.9) with
Using the bound (4.7) and the conditions (2.11) we obtain that for some positive constant \(\mathbf{C}_{*}\)
Moreover, note that, in view of (3.21) and (3.17),
where \(m=[1/\varepsilon ^2]\). Furthermore, the bound (3.22) and Conditions (2.12) and (3.17) yield
where \(\vert \Lambda \vert _{*}=1+ \max _{\lambda \in \Lambda }\,\check{L}(\lambda )\), i.e.
So, taking into account in (9.17) the condition (4.8) we obtain the convergence (4.10). \(\square \)
9.4 Proof of Theorem 4.5
First, we denote by \(Q_{0}\) the distribution of the noise (1.2) and (2.1) with the parameters \(\varrho ^{2}_{1}=\varsigma ^{*}\) and \(\varrho _{2}=\varrho _{3}=0\), i.e. the distribution of the “signal + white noise” model. For any \(n\ge 1\) this distribution belongs to the family \({{\mathcal {Q}}}_{n}\) defined in (2.11)–(2.12) and, therefore, for any \(n\ge 1\)
Now, taking into account the conditions (2.12), Theorem A.9 yields the lower bound (4.14), which finishes the proof. \(\square \)
9.5 Proof of Proposition 4.6
Putting \(\lambda _{0}(j)=0\) for \(j\ge n\) we can represent the quadratic risk for the estimator (3.7) as
where \(H_n= n^{-1/2}\,\sum _{j=1}^{n} (1-\lambda _0(j)) \lambda _0(j) \theta _{j} \xi _{j,n}\). Note that \(\mathbf{E}_{Q} H_{n}=0\) for any \( Q \in {{\mathcal {Q}}}_n\), therefore,
Proposition 7.1 and the last inequality in (2.11) imply that for any \(Q\in {{\mathcal {Q}}}_{n}\)
where \(\mathbf{C}^{*}_{1,n}=\phi ^{2}_{max}\varsigma ^{*}{\check{\tau }}\vert \Upsilon \vert _{1}\). Therefore,
where \(j_{*}\) and \(\upsilon _{n}\) are defined in (3.19). Setting
we rewrite the last inequality as
where \( {\check{\mathbf{C}}}_{n}= \upsilon ^{2k/(2k+1)}_{n}\mathbf{C}^{*}_{1,n}/n\). Note that the conditions (2.12) and (4.8) imply that \(\mathbf{C}^{*}_{1,n}=\text{ o }(n^{{\check{\delta }}})\) as \(n\rightarrow \infty \) for any \({\check{\delta }}>0\); therefore, \({\check{\mathbf{C}}}_{n}\rightarrow 0\) as \(n\rightarrow \infty \). Putting
with \(a_{j}=\sum ^k_{i=0}\left( 2\pi [j/2]\right) ^{2i}\), we estimate the first term in (9.18) as
We remind that \(\omega _{\alpha _{0}}\) is defined in (3.19) with \(\alpha _{0}=(k,\mathbf{l}_{0})\) and \(\mathbf{l}_{0}=[\mathbf{r}/\varepsilon ]\varepsilon \). So, taking into account that \(a_{j}/(\pi ^{2k}j^{2k})\rightarrow 1\) as \(j\rightarrow \infty \) and \(\mathbf{l}_{0}\rightarrow \mathbf{r}\) as \(\varepsilon \rightarrow 0\) we obtain that
where
Therefore,
As to the second term in (9.18), note that, in view of the definition (3.19) and taking into account that \(j_{*}=\text{ o }(\omega _{\alpha _{0}})\) as \(n\rightarrow \infty \), we deduce that
So, taking into account that \(\omega _{\alpha _{0}}/\upsilon ^{1/(2k+1)}_{n}\rightarrow ( \mathrm {d}_{k} \mathbf{r})^{1/(2k+1)}\) as \(n\rightarrow \infty \), the limit of \(\Upsilon _{2,n}\) can be calculated as
Moreover, since \(\Upsilon ^*_1+ \Upsilon ^*_2 =:\mathbf{r}^*_{k}\), we obtain
and get the desired result. \(\square \)
9.6 Proof of Theorem 4.7
Combining Proposition 4.6 and Theorem 4.4 yields Theorem 4.7. \(\square \)
References
Barbu VS, Limnios N (2008) Semi-Markov Chains and Hidden Semi-Markov Models toward Applications - Their use in Reliability and DNA Analysis. Lecture Notes in Statistics, 191, Springer, New York
Barndorff-Nielsen OE, Shephard N (2001) Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial mathematics. J R Stat Soc B 63:167–241
Bichteler K, Jacod J (1983) Calcul de Malliavin pour les diffusions avec sauts: existence d’une densité dans le cas unidimensionnel. Séminaire de probabilité, vol XVII, lecture notes in Math., 986, Springer, Berlin, pp 132–157
Cont R, Tankov P (2004) Financial modelling with jump processes. Chapman & Hall, London
Goldie CM (1991) Implicit renewal theory and tails of solutions of random equations. Ann Appl Probab 1(1):126–166
Höpfner R, Kutoyants YuA (2009) On LAN for parametrized continuous periodic signals in a time inhomogeneous diffusion. Stat Decis 27(4):309–326
Höpfner R, Kutoyants YuA (2010) Estimating discontinuous periodic signals in a time inhomogeneous diffusion. Stat Infer Stoch Process 13(3):193–230
Ibragimov IA, Khasminskii RZ (1981) Statistical estimation: asymptotic theory. Springer, Berlin
Jacod J, Shiryaev AN (2002) Limit theorems for stochastic processes, 2nd edn. Springer, Berlin
Konev VV, Pergamenshchikov SM (2003) Sequential estimation of the parameters in a trigonometric regression model with the Gaussian coloured noise. Stat Infer Stoch Process 6:215–235
Konev VV, Pergamenshchikov SM (2009) Nonparametric estimation in a semimartingale regression model. Part 1. Oracle inequalities. J Math Mech Tomsk State Univ 3:23–41
Konev VV, Pergamenshchikov SM (2009) Nonparametric estimation in a semimartingale regression model. Part 2. Robust asymptotic efficiency. J Math Mech Tomsk State Univ 4:31–45
Konev VV, Pergamenshchikov SM (2010) General model selection estimation of a periodic regression with a Gaussian noise. Ann Inst Stat Math 62:1083–1111
Konev VV, Pergamenshchikov SM (2012) Efficient robust nonparametric estimation in a semimartingale regression model. Ann Inst Henri Poincaré Probab Stat 48(4):1217–1244
Konev VV, Pergamenshchikov SM (2015) Robust model selection for a semimartingale continuous time regression from discrete data. Stoch Processes Their Appl 125:294–326
Limnios N, Oprisan G (2001) Semi-Markov processes and reliability. Birkhäuser, Boston
Liptser R Sh, Shiryaev AN (1986) Theory of martingales. Springer, Berlin
Marinelli C, Röckner M (2014) On maximal inequalities for purely discontinuous martingales in infinite dimensions. Séminaire de Probabilités, Lect. Notes Math., vol XLVI, pp 293–315
Mikosch T (2004) Non-life insurance mathematics. An introduction with stochastic processes. Springer, Berlin
Novikov AA (1975) On discontinuous martingales. Theory Probab Appl 20(1):11–26
Nussbaum M (1985) Spline smoothing in regression models and asymptotic efficiency in \({{ L}}_2\). Ann Statist 13:984–997
Pinsker MS (1981) Optimal filtration of square integrable signals in Gaussian white noise. Probl Transm Inf 17:120–133
Sveshnikov AG, Tikhonov AN (1978) The theory of functions of a complex variable, Translated from the Russian, English translation, Mir Publishers
Acknowledgements
The last author is partially supported by RFBR Grant 16-01-00121, by the Ministry of Education and Science of the Russian Federation in the framework of the research Project No 2.3208.2017/4.6 and by the Russian Federal Professor program (Project No 1.472.2016/1.4, Ministry of Education and Science of the Russian Federation).
Additional information
This work was done under financial support of the RSF Grant Number 14-49-00079 (National Research University “MPEI” 14 Krasnokazarmennaya, 111250 Moscow, Russia) and by the project XterM - Feder, University of Rouen, France.
Appendix
A.1 Inequalities for purely discontinuous martingales
Let us now recall the Novikov inequalities (Novikov 1975), also referred to as the Bichteler–Jacod inequalities (see Bichteler and Jacod 1983; Marinelli and Röckner 2014).
Lemma A.1
Let \(\mu \) be a jump measure and \(\widetilde{\mu }\) its compensator on \({{\mathbb {R}}}_{*}={{\mathbb {R}}}\setminus \{0\}\). Then, for any \(T>0\), any predictable function h and any \(p\ge 2\),
where \(C^{*}_{p}\) is some positive constant and
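For orientation, a commonly cited form of this bound is the following (a sketch under assumed notation; the display in the journal version may be normalized differently):
$$ \mathbf{E}\,\sup_{0\le t\le T}\,\Big\vert \int^{t}_{0}\int_{{{\mathbb {R}}}_{*}}\,h\,\mathrm {d}(\mu -\widetilde{\mu })\Big\vert ^{p} \le C^{*}_{p}\,\left( \mathbf{E}\,\Big(\int^{T}_{0}\int_{{{\mathbb {R}}}_{*}}\,h^{2}\,\mathrm {d}\widetilde{\mu }\Big)^{p/2} +\mathbf{E}\,\int^{T}_{0}\int_{{{\mathbb {R}}}_{*}}\,\vert h\vert ^{p}\,\mathrm {d}\widetilde{\mu }\right) . $$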
A.2 Property of the penalty term
Lemma A.2
For any \(n\ge \,1\) and \(\lambda \in \Lambda \),
where the coefficient \(P^{0}_n(\lambda )\) was defined in (9.2).
Proof
By the definition of \(\text{ Err }_n(\lambda )\) one has
So, denoting \(\lambda ^{2}=(\lambda ^{2}(j))_{1\le j\le n}\), we obtain that
Now Proposition 7.1 implies the desired result, i.e.
Hence Lemma A.2. \(\square \)
A.3 Properties of the Fourier transform
Theorem A.3
(Cauchy 1825) Let U be a simply connected open subset of \({{\mathbb {C}}}\), let \(g : U \rightarrow {{\mathbb {C}}}\) be a holomorphic function, and let \(\gamma \) be a rectifiable path in U whose start point is equal to its end point. Then
$$ \int_{\gamma }\,g(z)\,\mathrm {d}z=0. $$
Proposition A.4
Let \(g : {{\mathbb {C}}}\rightarrow {{\mathbb {C}}}\) be a holomorphic function in \(U=\left\{ z\in {{\mathbb {C}}}\,:\,-\beta _{1}< \text{ Im }\,z< \beta _{2}\right\} \) for some \(0<\beta _{1}\le \infty \) and \(0<\beta _{2}\le \infty \). Assume that, for any \(-\beta _{1}< t\le 0\),
Then, for any \(x\in {{\mathbb {R}}}\) and for any \(0<\beta <\beta _{1},\)
Proof
First, note that the conditions of this proposition imply that
Now we fix \(0<\beta <\beta _{1}\) and, for any \(N\ge 1\), we set
Now, in view of the Cauchy theorem, we obtain that for any \(N\ge 1\)
The conditions (A.2) provide that
Therefore, letting \(N\rightarrow \infty \) in (A.4) we obtain (A.3). Hence we get the desired result. \(\square \)
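To visualize the previous two steps, here is a minimal sketch of the rectangle argument, assuming the integrand in (A.3) has the form \(f(z)=e^{ixz}\,g(z)\) (the sign convention may differ from the main text). Applying Theorem A.3 to the boundary of the rectangle with vertices \(-N\), \(N\), \(N-i\beta \) and \(-N-i\beta \) gives
$$ \int^{N}_{-N}\,f(t)\,\mathrm {d}t-\int^{N}_{-N}\,f(t-i\beta )\,\mathrm {d}t =\int_{[N-i\beta ,\,N]}\,f(z)\,\mathrm {d}z-\int_{[-N-i\beta ,\,-N]}\,f(z)\,\mathrm {d}z, $$
and the decay conditions (A.2) force both vertical-side integrals to vanish as \(N\rightarrow \infty \), which shifts the integration contour from \({{\mathbb {R}}}\) to \({{\mathbb {R}}}-i\beta \).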
The following technical lemma is also needed in this paper.
Lemma A.5
Let \(-\infty \le a<b\le +\infty \) be fixed and let \(g : [a,b]\rightarrow {{\mathbb {R}}}\) be a function from \(\mathbf{L}_{1}[a,b]\). Then
$$ \lim_{N\rightarrow \infty }\,\int^{b}_{a}\,g(t)\,\sin (Nt)\,\mathrm {d}t=0 \quad \text{and}\quad \lim_{N\rightarrow \infty }\,\int^{b}_{a}\,g(t)\,\cos (Nt)\,\mathrm {d}t=0. \qquad \mathrm{(A.5)} $$
Proof
First, let \(-\infty< a<b< +\infty \) and assume that g is continuously differentiable, i.e. \(g\in \mathbf{C}^{1}[a,b]\). Then integration by parts gives
So, from this we obtain that
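A worked form of this step (a sketch for the \(\sin \) case; the \(\cos \) case is identical): for \(g\in \mathbf{C}^{1}[a,b]\),
$$ \int^{b}_{a}\,g(t)\,\sin (Nt)\,\mathrm {d}t =\frac{g(a)\cos (Na)-g(b)\cos (Nb)}{N} +\frac{1}{N}\,\int^{b}_{a}\,g^{\prime }(t)\,\cos (Nt)\,\mathrm {d}t, $$
so that \(\big\vert \int^{b}_{a}\,g(t)\sin (Nt)\,\mathrm {d}t\big\vert \le N^{-1}\big(2\max_{[a,b]}\vert g\vert +\int^{b}_{a}\vert g^{\prime }(t)\vert \,\mathrm {d}t\big)\rightarrow 0\) as \(N\rightarrow \infty \).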
This implies the first limit in (A.5) in this case; the second one is obtained similarly. Now let g be any absolutely integrable function on [a, b], i.e. \(g\in \mathbf{L}_{1}[a,b]\). Then there exists a sequence \(g_{n}\in \mathbf{C}^{1}[a,b]\) such that
Therefore, taking into account that for any \(n\ge 1\)
we obtain that
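The comparison behind this step is standard (a sketch): for every \(n\ge 1\),
$$ \Big\vert \int^{b}_{a}\,g(t)\,\sin (Nt)\,\mathrm {d}t\Big\vert \le \Big\vert \int^{b}_{a}\,g_{n}(t)\,\sin (Nt)\,\mathrm {d}t\Big\vert +\int^{b}_{a}\,\vert g(t)-g_{n}(t)\vert \,\mathrm {d}t, $$
so that \(\limsup_{N\rightarrow \infty }\,\big\vert \int^{b}_{a}\,g(t)\sin (Nt)\,\mathrm {d}t\big\vert \le \Vert g-g_{n}\Vert _{\mathbf{L}_{1}[a,b]}\rightarrow 0\) as \(n\rightarrow \infty \).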
Letting \(n\rightarrow \infty \) in this inequality, we obtain the first limit in (A.5); the second one follows similarly. Finally, let \(a=-\infty \) and \(b=+\infty \). In this case, for any finite \(-\infty<a<b<+\infty \), we have
Using the previous results, we obtain that, for any finite \(-\infty<a<b<+\infty \),
Letting \(b\rightarrow +\infty \) and \(a\rightarrow -\infty \), we obtain the first limit in (A.5); the second one is obtained similarly. \(\square \)
Let us now study the inverse Fourier transform. To this end, we need the following local Dini condition.
(\(\mathbf{D}\)) Assume that, for some fixed \(x\in {{\mathbb {R}}}\), there exist the finite limits
$$ g(x-)=\lim_{t\rightarrow x-}\,g(t) \quad \text{and}\quad g(x+)=\lim_{t\rightarrow x+}\,g(t), $$
and there exists \(\delta =\delta (x)>0\) for which
$$ \int^{\delta }_{0}\,\frac{\vert \widetilde{g}(x,t)-\widetilde{g}_{0}(x)\vert }{t}\,\mathrm {d}t<\infty , $$
where \(\widetilde{g}(x,t)=(g(x+t)+g(x-t))/2\) and \(\widetilde{g}_{0}(x)=(g(x+)+g(x-))/2\).
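A simple example: the indicator \(g=\mathbf{1}_{[0,1]}\) satisfies Condition \((\mathbf{D})\) at every \(x\in {{\mathbb {R}}}\). Indeed, at the jump point \(x=0\) one has \(g(0+)=1\) and \(g(0-)=0\), so \(\widetilde{g}_{0}(0)=1/2\), while \(\widetilde{g}(0,t)=1/2\) for all \(0<t<1\); hence the Dini integral vanishes. The point \(x=1\) and the continuity points are treated in the same way.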
Remark 10.1
Note that Condition \((\mathbf{H}_{1})\) is the “uniform” version of Condition \((\mathbf{D})\).
Proposition A.6
Let \(g : {{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\) be a function from \(\mathbf{L}_{1}({{\mathbb {R}}})\). If, for some \(x\in {{\mathbb {R}}}\), this function satisfies Condition \((\mathbf{D})\), then
where \(\widehat{g}(\theta )=\int _{{{\mathbb {R}}}}\,e^{i\theta t}\,g(t)\,\mathrm {d}t\).
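With this convention for \(\widehat{g}\), the asserted limit is the classical pointwise inversion at a Dini point; its standard form (a sketch; the display in the main text may be normalized differently) is
$$ \lim_{N\rightarrow \infty }\,\frac{1}{2\pi }\,\int^{N}_{-N}\,e^{-i\theta x}\,\widehat{g}(\theta )\,\mathrm {d}\theta =\widetilde{g}_{0}(x). $$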
Proof
First, for any fixed \(N>0\) we set
Here, the inner integral may be represented as
Taking into account that for any \(N>0\) the integral
we obtain that
Now we represent the last integral as
where \(I_{1,N}=\int ^{\delta }_{0}\,(\widetilde{g}(x,t)-\widetilde{g}_{0}(x))\,\sin (N t)\,t^{-1}\,\mathrm {d}t\),
Condition \((\mathbf{D})\) and Lemma A.5 imply that \(I_{1,N}\rightarrow 0\) as \(N\rightarrow \infty \). Now note that, since \(g\in \mathbf{L}_{1}({{\mathbb {R}}})\), the function \(t^{-1}\,\widetilde{g}(x,t)\) is absolutely integrable. Therefore, in view of Lemma A.5, \(I_{2,N}\rightarrow 0\) as \(N\rightarrow \infty \). As for the last integral, we use property (A.7), i.e., changing variables gives
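Assuming that property (A.7) refers to the Dirichlet integral \(\int^{\infty }_{0}\,u^{-1}\sin u\,\mathrm {d}u=\pi /2\), a sketch of this computation with the substitution \(u=Nt\) is
$$ \int^{\delta }_{0}\,\frac{\sin (Nt)}{t}\,\mathrm {d}t =\int^{N\delta }_{0}\,\frac{\sin u}{u}\,\mathrm {d}u \rightarrow \frac{\pi }{2} \quad \text{as}\ N\rightarrow \infty , $$
so the remaining term converges to \((\pi /2)\,\widetilde{g}_{0}(x)\).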
Hence we have the desired result. \(\square \)
A.4 Properties of analytic functions
We need the following identity theorem (see, for example, Sveshnikov and Tikhonov 1978, Corollary 1, p. 78).
Theorem A.7
Let the function \(f(z)\not \equiv 0\) be analytic in the domain D. Then in any closed bounded subdomain \(D'\subset D\) it has only finitely many zeros.
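Both the closedness and the boundedness of \(D'\) are essential here. A standard illustration: the function \(f(z)=\sin \left( 1/(1-z)\right) \) is analytic and \(\not \equiv 0\) in the open unit disk D, yet it has infinitely many zeros \(z_{k}=1-1/(\pi k)\), \(k\ge 1\), accumulating at the boundary point \(z=1\); any closed bounded subdomain \(D'\subset D\) stays at a positive distance from \(z=1\) and therefore contains only finitely many of them.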
Lemma A.8
Assume that Conditions \((\mathbf{H}_{3})\)–\((\mathbf{H}_{4})\) hold. Then there exists some \(0<\beta _{*}<\beta \) such that the function \(\check{G}\) defined in (5.9) is holomorphic in
and for any \(x\ge -\beta _{*}\)
Proof
First, we recall that, thanks to Condition \((\mathbf{H}_{3})\), the function \(\widehat{g}(\theta )\) is holomorphic in \(D=\{\theta \in {{\mathbb {C}}}\,:\,\text{ Im }(\theta )>-\beta \}\). Therefore, the function \(\check{G}(\theta )\) is holomorphic at any \(\theta \) from \(D\cap \{\theta \in {{\mathbb {C}}}\,:\,\widehat{g}(\theta )\ne 1\}\). In this case its derivative can be calculated as
where \(\widehat{g}^{\prime }(\theta )\) is the derivative of the function \(\widehat{g}(\theta )\). Moreover, using the Taylor expansion for \(e^{z}\), the function \(\widehat{g}(\theta )\) and its derivative \(\widehat{g}^{\prime }(\theta )\) can be represented as
where \({\check{\tau }}_{1}=\mathbf{E}_{Q}\tau ^{2}_{1}\) and the terms \(A_{1}(\theta )\) and \(A_{2}(\theta )\) are such that for any \(L>0\)
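Assuming, as the notation \({\check{\tau }}_{1}=\mathbf{E}_{Q}\tau ^{2}_{1}\) suggests, that \(\widehat{g}\) is the characteristic function of the inter-jump time \(\tau _{1}\), the expansion in question is the usual second-order one; a sketch, with \(\bar{\tau }=\mathbf{E}_{Q}\tau _{1}\) a symbol introduced only for this sketch:
$$ \widehat{g}(\theta )=\mathbf{E}_{Q}\,e^{i\theta \tau _{1}} =1+i\theta \,\bar{\tau }-\frac{\theta ^{2}}{2}\,{\check{\tau }}_{1}+\theta ^{2}A_{1}(\theta ) \quad \text{and}\quad \widehat{g}^{\prime }(\theta )=i\,\bar{\tau }-\theta \,{\check{\tau }}_{1}+\theta \,A_{2}(\theta ), $$
with \(A_{1}(\theta )\rightarrow 0\) and \(A_{2}(\theta )\rightarrow 0\) as \(\theta \rightarrow 0\).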
Therefore, there exists \(0<\delta <1\) for which
So, the expansion (A.9) implies that the function \(\check{G}(\theta )\) is holomorphic at the point \(\theta =0\) and, for such \(\delta >0\),
Moreover, note that, in view of Lemma A.5, for any \(0<r<\beta \)
Taking into account here that, thanks to (5.6), the modulus \(\vert \widehat{g}(\theta )\vert <1\) for any \(\theta \in {{\mathbb {R}}}\setminus \{0\}\), we obtain
Therefore, the function \(\check{G}(\theta )\) is holomorphic on the line \(\{z\in {{\mathbb {C}}}\,:\,\text{ Im }(z)=0\}\) and
In the general case, note that, due to Lemma 5.1, the function \(1-\widehat{g}(\theta )\) has no zeros on the line \(\left\{ z\in {{\mathbb {C}}}\,:\,\text{ Im }(z)=-\beta _{1}\right\} \) for some \(0<\beta _{1}<\beta \). Moreover, in view of Theorem A.7, for any \(N>1\) there can be only finitely many zeros of the function \(1-\widehat{g}(\theta )\) in
The limiting equality (A.11) implies that there exists \(N>0\) such that \(1-\widehat{g}(\theta )\ne 0\) on the set
So, there can be only finitely many zeros of the function \(1-\widehat{g}(\theta )\) in
Moreover, taking into account the Taylor expansion (A.9), one can check that \(\theta =0\) is an isolated zero of \(1-\widehat{g}(\theta )\). Therefore, there exists some \(\beta _{*}>0\) for which the function \(1-\widehat{g}(\theta )\) has no zeros in \(\left\{ z\in {{\mathbb {C}}}\,:\,-\beta _{*}\le \text{ Im }(z)<0\right\} \). Note also that the function \(1-\widehat{g}(\theta )\) has no zeros in \(\left\{ z\in {{\mathbb {C}}}\,:\,\text{ Im }(z)>0\right\} \). Since we have already shown that the function \(\check{G}(\theta )\) is holomorphic on the line \(\{z\in {{\mathbb {C}}}\,:\,\text{ Im }(z)=0\}\), we obtain that it is holomorphic for any \(\theta \in {{\mathbb {C}}}\) with \(\text{ Im }(\theta )\ge -\beta _{*}\), and in view of (A.13) we obtain the upper bound (A.8). Hence Lemma A.8. \(\square \)
A.5 Proof of (5.8)
First, note that for any \(\vert \theta \vert >\delta \) the inequality (A.12) implies
i.e.
Moreover, from (A.9) we can obtain that
where \(\epsilon _{1}=1-\epsilon \) and \(A_{0}(\theta )=-{\check{\tau }}_{1}/2+\theta A_{1}(\theta )\). Note that \(G_{\epsilon }(0)=0\) for any \(\epsilon >0\). Now let \(0<\vert \theta \vert \le \delta \), where \(\delta \) is chosen so that the inequality (A.10) holds. In this case, for some \(0<C_{*}<\infty \),
Moreover, taking into account here the lower bound (A.10) and that
we obtain that
Hence the inequality (5.8). \(\square \)
A.6 Lower bound for the robust risks
In this section we state the lower bound obtained in Konev and Pergamenshchikov (2009b, Theorem 6.1) for the robust risks (1.3) in the “signal+white noise” model, i.e. the model (1.1) in which the noise process \((\xi _{t})_{0\le t\le n}\) is defined as \(\xi _{t}=\varsigma ^{*}w_{t}\) for \(0\le t\le n\). To study the efficiency properties we need a lower bound on the mean square accuracy over the class \(\Pi _{n}\) of all possible estimators \(\widehat{S}_{n}\).
Theorem A.9
Under Conditions (2.12)
where \(Q_{0}\) is the distribution of the noise process \((\varsigma ^{*}w_{t})_{0\le t\le n}\) and \(\mathbf{r}^{*}_{k}\) is given in (4.13).
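In the usual Pinsker form such a bound reads as follows (a sketch under assumed notation: \(W^{k}_{\mathbf{r}}\) stands for the Sobolev-type class of Sect. 4, \(\mathcal {R}_{Q_{0}}\) for the quadratic risk under the noise law \(Q_{0}\), and \(n^{2k/(2k+1)}\) is the classical normalization):
$$ \liminf_{n\rightarrow \infty }\,n^{2k/(2k+1)}\, \inf_{\widehat{S}_{n}\in \Pi _{n}}\,\sup_{S\in W^{k}_{\mathbf{r}}}\, \mathcal {R}_{Q_{0}}(\widehat{S}_{n},S)\ \ge \ \mathbf{r}^{*}_{k}. $$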