1 Introduction

Let us consider a regression model in continuous time

$$\begin{aligned} \mathrm {d}\,y_{t} = S(t)\mathrm {d}\,t + \mathrm {d}\,\xi _{t}\,,\quad 0\le t \le n\,, \end{aligned}$$
(1.1)

where \(S(\cdot )\) is an unknown 1-periodic function from \(\mathbf{L}_{2}[0,1]\) defined on \({{\mathbb {R}}}\) with values in \({{\mathbb {R}}}\), the noise process \((\xi _{t})_{t\ge \, 0}\) is defined as

$$\begin{aligned} \xi _{t} =\varrho _{1} w_{t}+ \varrho _{2} L_{t} + \varrho _{3} z_{t}\,, \end{aligned}$$
(1.2)

where \(\varrho _{1}\), \(\varrho _{2}\) and \(\varrho _{3}\) are unknown coefficients, \((w_{t})_{t\ge \,0}\) is a standard Brownian motion, \((L_{t})_{t\ge \,0}\) is a jump Lévy process (with \(\mathbf{E}L^2_{t}=t\), cf. Eq. (2.3)) and the pure jump process \((z_{t})_{t\ge \,0}\), defined in (2.5), is assumed to be a semi-Markov process (see, for example, Barbu and Limnios 2008).

The problem is to estimate the unknown function S in the model (1.1) on the basis of the observations \((y_{t})_{0\le t\le n}\). This problem was first considered in the framework of the “signal+white noise” models (see, for example, Ibragimov and Khasminskii 1981 or Pinsker 1981). Later, in order to study dependent observations in continuous time, “signal+color noise” regressions based on Ornstein-Uhlenbeck processes were introduced (cf. Höpfner and Kutoyants 2009, 2010; Konev and Pergamenshchikov 2003, 2010).

Moreover, to include jumps in such models, the papers Konev and Pergamenshchikov (2012, 2015) used non-Gaussian Ornstein-Uhlenbeck processes, introduced in Barndorff-Nielsen and Shephard (2001) for modeling risky assets in stochastic volatility financial markets. Unfortunately, the dependence of stable Ornstein-Uhlenbeck-type processes decreases at a geometric rate, so that asymptotically, as the duration of observations goes to infinity, we arrive very quickly at the same “signal+white noise” model.

The main goal of this paper is to consider continuous time regression models with dependent observations for which the dependence does not disappear for a sufficiently large duration of observations. To this end we define the noise in the model (1.1) through a semi-Markov process which keeps the dependence for any duration n. Such models allow one, for example, to estimate signals observed under the impact of long impulse noise with memory or in the presence of “against signals”.

In this paper we use the robust estimation approach introduced in Konev and Pergamenshchikov (2012) for such problems. To this end, we denote by Q the distribution of \((\xi _{t})_{0\le t\le n}\) in the Skorokhod space \({{\mathcal {D}}}[0,n]\). We assume that Q is unknown and belongs to some distribution family \({{\mathcal {Q}}}_{n}\) specified in Sect. 4. We use the quadratic risk

$$\begin{aligned} {{\mathcal {R}}}_{Q}(\widetilde{S}_{n},S)= \mathbf{E}_{Q,S}\,\Vert \widetilde{S}_{n}-S\Vert ^{2}\,, \end{aligned}$$
(1.3)

where \(\Vert f\Vert ^{2}=\int ^{1}_{0}\,f^{2}(s)\mathrm {d}s\) and \(\mathbf{E}_{Q,S}\) is the expectation with respect to the distribution \(\mathbf{P}_{Q,S}\) of the process (1.1) corresponding to the noise distribution Q. Since the noise distribution Q is unknown, it seems reasonable to introduce the robust risk of the form

$$\begin{aligned} {{\mathcal {R}}}^{*}_{n}(\widetilde{S}_{n},S)=\sup _{Q\in {{\mathcal {Q}}}_{n}}\, {{\mathcal {R}}}_{Q}(\widetilde{S}_{n},S)\,, \end{aligned}$$
(1.4)

which enables us to take into account the information that \(Q\in {{\mathcal {Q}}}_{n}\) and ensures the quality of an estimate \(\widetilde{S}_{n}\) for all distributions in the family \({{\mathcal {Q}}}_{n}\).

To summarize, the goal of this paper is to develop robust efficient model selection methods for the model (1.1) with semi-Markov noise of unknown distribution, based on the approach proposed by Konev and Pergamenshchikov (2012, 2015) for continuous time regression models with semimartingale noise. Unfortunately, we cannot apply this method directly to semi-Markov regression models, since their tools essentially use the fact that the Ornstein-Uhlenbeck dependence decreases at a geometric rate, so that the “white noise” case is reached sufficiently quickly.

Thus in the present paper we propose new analytical tools based on renewal methods to obtain the sharp non-asymptotic oracle inequalities. As a consequence, we obtain the robust efficiency for the proposed model selection procedures in the adaptive setting.

The rest of the paper is organized as follows. We start by introducing the main conditions in the next section. Then, in Sect. 3 we construct the model selection procedure on the basis of weighted least squares estimates. The main results are stated in Sect. 4; there we also specify the set of admissible weight sequences in the model selection procedure. In Sect. 5 we derive some renewal results used to obtain the other results of the paper. In Sect. 6 we develop stochastic calculus for semi-Markov processes. In Sect. 7 we study some properties of the model (1.1). A numerical example is presented in Sect. 8. Most of the results of the paper are proved in Sect. 9. In the “Appendix” some auxiliary propositions are given.

2 Main conditions

In the model (1.2) we assume that the jump Lévy process \(L_{t}\) is defined as

$$\begin{aligned} L_{t}= \int ^{t}_{0}\int _{{{\mathbb {R}}}_{*}}x (\mu (\mathrm {d}s,\mathrm {d}x) -\widetilde{\mu }(\mathrm {d}s,\mathrm {d}x)) \,, \end{aligned}$$
(2.1)

where \(\mu (\mathrm {d}s,\mathrm {d}x)\) is the jump measure with deterministic compensator \(\widetilde{\mu }(\mathrm {d}s\,\mathrm {d}x)=\mathrm {d}s\Pi (\mathrm {d}x)\), where \(\Pi (\cdot )\) is the Lévy measure on \({{\mathbb {R}}}_{*}={{\mathbb {R}}}\setminus \{0\}\) (see, for example, Jacod and Shiryaev 2002; Cont and Tankov 2004 for details), for which we assume that

$$\begin{aligned} \Pi \left( x^{2}\right) =1 \quad \text{ and }\quad \Pi \left( x^{8}\right) \,<\,\infty \,, \end{aligned}$$
(2.2)

where we use the usual notation \(\Pi (\vert x\vert ^{m})=\int _{{{\mathbb {R}}}_{*}}\,\vert z \vert ^{m}\,\Pi (\mathrm {d}z)\) for any \(m>0\). Note that, using the Itô formula for martingales (see, for example, Liptser and Shiryaev 1986, p. 185), we can obtain directly that

$$\begin{aligned} \mathbf{E}_{Q}L^2_{t} = \mathbf{E}\sum _{0\le s\le t}\, (\Delta L_{s})^2 = \mathbf{E}_{Q} \int ^{t}_{0}\int _{{{\mathbb {R}}}_{*}}x^{2}\mu (\mathrm {d}s,\mathrm {d}x) =\Pi (x^{2})t =t\,, \end{aligned}$$
(2.3)

where \(\Delta L_{s}=L_{s}- L_{s-}\) and \(L_{s-}\) is the left-hand limit of L at s. Moreover, the last condition in (2.2) and the inequality (A.1) imply that, for some positive constant \(C^{*}\),

$$\begin{aligned} \mathbf{E}\,L^8_{t}\le C^{*}\left( 1+\Pi (x^{8})\right) \,t<\,\infty \,. \end{aligned}$$
(2.4)

Note that \(\Pi ({{\mathbb {R}}}_{*})\) may be equal to \(+\infty \). Moreover, we assume that the pure jump process \((z_{t})_{t\ge \, 0}\) in (1.2) is a semi-Markov process of the following form

$$\begin{aligned} z_{t} = \sum _{i=1}^{N_{t}} Y_{i}, \end{aligned}$$
(2.5)

where \((Y_{i})_{i\ge \, 1}\) is an i.i.d. sequence of random variables with

$$\begin{aligned} \mathbf{E}_{Q}Y_{i}=0\,,\quad \mathbf{E}_{Q}Y^2_{i}=1 \quad \text{ and }\quad \mathbf{E}_{Q}Y^4_{i}<\infty \,. \end{aligned}$$

Here \(N_{t}\) is a general counting process (see, for example, Mikosch 2004) defined as

$$\begin{aligned} N_{t} = \sum _{k=1}^{\infty } \mathbb {1}_{\{T_{k} \le t\}} \quad \text{ and }\quad T_{k}=\sum _{l=1}^k\, \tau _{l}\,, \end{aligned}$$
(2.6)

where \((\tau _{l})_{l\ge \,1}\) is an i.i.d. sequence of positive integrable random variables with distribution \(\eta \) and mean \({\check{\tau }}=\mathbf{E}_{Q}\tau _{1}>0\). We assume that the processes \((N_{t})_{t\ge 0}\) and \((Y_{i})_{i\ge \, 1}\) are independent of each other and also independent of \((L_{t})_{t\ge 0}\).

Note that the process \((z_{t})_{t\ge \, 0}\) is a special case of a semi-Markov process (see, e.g., Barbu and Limnios 2008; Limnios and Oprisan 2001).
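
To fix ideas, the following minimal Python sketch (ours, purely illustrative and not part of the paper) simulates the pure jump component (2.5)–(2.6) on a time grid. The inter-jump law, the Gaussian jump sizes and the grid are arbitrary choices consistent with the moment assumptions above; all function names are hypothetical.

```python
import numpy as np

def simulate_jumps(T, tau_sampler, rng):
    """Draw the renewal times T_k = tau_1 + ... + tau_k falling in [0, T] and the
    i.i.d. jump sizes Y_k (centered, unit variance), cf. (2.5)-(2.6)."""
    jump_times, t = [], 0.0
    while True:
        t += tau_sampler(rng)                          # tau_l ~ eta
        if t > T:
            break
        jump_times.append(t)
    jumps = rng.standard_normal(len(jump_times))       # E Y = 0, E Y^2 = 1, E Y^4 < infinity
    return np.array(jump_times), jumps

def z_path(jump_times, jumps, grid):
    """Evaluate z_t = sum_{i <= N_t} Y_i on a time grid (piecewise constant)."""
    z = np.zeros_like(grid)
    for T_k, Y_k in zip(jump_times, jumps):
        z[grid >= T_k] += Y_k
    return z

rng = np.random.default_rng(0)
n = 100.0
grid = np.linspace(0.0, n, 10001)
# Illustrative inter-jump law: Gamma(2, 1/2) has mean 1 and an exponential moment.
T_k, Y_k = simulate_jumps(n, lambda r: r.gamma(2.0, 0.5), rng)
z = z_path(T_k, Y_k, grid)
```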

Remark 2.1

It should be noted that if the \(\tau _{j}\) are exponential random variables, then \((N_{t})_{t\ge 0}\) is a Poisson process and, in this case, \((\xi _{t})_{t\ge 0}\) is a Lévy process; this model has been studied in Konev and Pergamenshchikov (2009a, b) and Konev and Pergamenshchikov (2012). But, in the general case, when the process (2.5) is not a Lévy process, it has a memory and cannot be treated in the framework of semimartingales with independent increments. In this case we need to develop new tools based on renewal theory arguments, which we do in Sect. 5. These tools will be used intensively in the proofs of the main results of this paper.

Note that, for any function f from \(\mathbf{L}_{2}[0,n]\), \(f: [0,n] \rightarrow {{\mathbb {R}}}\), and for the noise process \((\xi _{t})_{t\ge \, 0}\) defined in (1.2), with \((z_{t})_{t\ge \, 0}\) given in (2.5), the integral

$$\begin{aligned} I_{n}(f)=\int _{0}^{n} f(s) \mathrm {d}\xi _{s} \end{aligned}$$
(2.7)

is well defined with \(\mathbf{E}_{Q}\,I_n(f)=0\). Moreover, as is shown in Corollary 6.2,

$$\begin{aligned} \mathbf{E}_{Q}\,I^{2}_n(f) \le \varkappa _{Q}\, \Vert f\Vert ^{2}_{n} \quad \text{ and }\quad \varkappa _{Q}={\bar{\varrho }}+\varrho _{3}^2\,\vert \rho \vert _{*} \,, \end{aligned}$$
(2.8)

where \(\Vert f\Vert ^{2}_{t}= \int _{0}^{t} f^2(s) \mathrm {d}\,s\), \({\bar{\varrho }}=\varrho _{1}^{2}+\varrho _{2}^{2}\) and \(\vert \rho \vert _{*}=\sup _{t\ge 0}\vert \rho (t)\vert <\infty \). Here \(\rho \) is the density of the renewal measure \({\check{\eta }}\) defined as

$$\begin{aligned} {\check{\eta }} =\sum ^{\infty }_{l=1}\,\eta ^{(l)} \,, \end{aligned}$$
(2.9)

where \(\eta ^{(l)}\) is the lth convolution power for \(\eta \).
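
As a quick numerical illustration (ours, not used anywhere in the arguments of the paper), the series (2.9) can be truncated and the convolutions discretized. For an exponential inter-jump density the renewal density is constant, \(\rho (x)=1/{\check{\tau }}\), which gives a simple sanity check; the truncation level, grid and mean below are arbitrary choices.

```python
import numpy as np

def renewal_density(g_vals, dx, n_terms=50):
    """Approximate the renewal density rho = sum_{l >= 1} g^{*l} on a grid by
    truncating the series (2.9) and using Riemann sums for the convolutions."""
    rho = np.zeros_like(g_vals)
    conv = g_vals.copy()                                      # g^{*1}
    for _ in range(n_terms):
        rho += conv
        conv = np.convolve(conv, g_vals)[:len(g_vals)] * dx   # g^{*(l+1)}
    return rho

dx = 0.01
x = np.arange(0.0, 20.0, dx)
check_tau = 2.0                                               # mean inter-jump time
g = np.exp(-x / check_tau) / check_tau                        # exponential density
rho = renewal_density(g, dx)
print(rho[500], 1.0 / check_tau)                              # both close to 0.5: here rho is constant
```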

Remark 2.2

In Proposition 5.2 we will prove that, under Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\), the renewal measure \({\check{\eta }}\) has a density \(\rho \).

To study the series (2.9) we assume that the measure \(\eta \) has a density g which satisfies the following conditions.

\((\mathbf{H}_{1})\) Assume that, for any \(x\in {{\mathbb {R}}}\), there exist the finite limits

$$\begin{aligned} g(x-)=\lim _{z\rightarrow x-}g(z) \quad \text{ and }\quad g(x+)=\lim _{z\rightarrow x+}g(z) \end{aligned}$$

and, for any \(K>0\), there exists \(\delta =\delta (K)>0\) for which

$$\begin{aligned} \sup _{\vert x\vert \le K}\, \int ^{\delta }_{0}\, \frac{ \vert g(x+t)+g(x-t)-g(x+)-g(x-) \vert }{t} \mathrm {d}t \,<\,\infty . \end{aligned}$$

\((\mathbf{H}_{2})\) For any \(\gamma >0\),

$$\begin{aligned} \sup _{z\ge 0}\,z^{\gamma }\vert 2g(z) -g(z-)-g(z+) \vert \,<\,\infty . \end{aligned}$$

\((\mathbf{H}_{3})\) There exists \(\beta >0\) such that \(\int _{{{\mathbb {R}}}}\,e^{\beta x}\,g(x)\,\mathrm {d}x<\infty \).

Remark 2.3

It should be noted that Condition \((\mathbf{H}_{3})\) means that the random variables \((\tau _{j})_{j\ge 1}\) have an exponential moment, i.e. these random variables are not too large. This is a natural constraint since these random variables define the intervals between jumps, i.e., the frequency of the jumps. So, to study the influence of the jumps in the model (1.1) one needs to consider the noise process (1.2) with “small” intervals between jumps, i.e. a large jump frequency.

For the next condition we need to introduce the Fourier transform of any function f from \(\mathbf{L}_{1}({{\mathbb {R}}})\), \(f : {{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\), defined as

$$\begin{aligned} \widehat{f}(\theta )=\frac{1}{2\pi }\,\int _{{{\mathbb {R}}}}\,e^{i\theta x}\,f(x)\,\mathrm {d}x. \end{aligned}$$
(2.10)

\((\mathbf{H}_{4})\) There exists \(t^{*}>0\) such that the function \(\widehat{g}(\theta -it)\) belongs to \(\mathbf{L}_{1}({{\mathbb {R}}})\) for any \(0\le t\le t^{*}\).

It is clear that Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\) hold true for any continuously differentiable function g, for example for the exponential density.

Now we define the family of noise distributions for the model (1.1) which is used in the robust risk (1.4). In our case the distribution family \({{\mathcal {Q}}}_{n}\) consists of all distributions on the Skorokhod space \({{\mathcal {D}}}[0,n]\) of the process (1.2) with parameters satisfying the conditions (2.11) and (2.12). Note that any distribution Q from \({{\mathcal {Q}}}_{n}\) is determined by the unknown parameters in (1.2) and (2.1). We assume that

$$\begin{aligned} \varsigma _{*}\le \sigma _{Q} \le \varsigma ^{*}\,, \end{aligned}$$
(2.11)

where \(\sigma _{Q}=\varrho _{1}^{2}+\varrho _{2}^{2}+ \varrho _{3}^{2}/{\check{\tau }}\), the unknown bounds \(0<\varsigma _{*}\le \varsigma ^{*}\) are functions of n, i.e. \(\varsigma _{*}=\varsigma _{*}(n)\) and \(\varsigma ^{*}=\varsigma ^{*}(n)\), such that for any \({\check{\epsilon }}>0,\)

$$\begin{aligned} \lim _{n\rightarrow \infty }n^{{\check{\epsilon }}}\,\varsigma _{*}(n)=+\infty \quad \text{ and }\quad \lim _{n\rightarrow \infty }\,\frac{\varsigma ^{*}(n)}{n^{{\check{\epsilon }}}}=0\,. \end{aligned}$$
(2.12)

Remark 2.4

As we will see later, the parameter \(\sigma _{Q}\) is the limit of the Fourier transform of the noise process (1.2). Such a limit is called the variance proxy (see Konev and Pergamenshchikov 2012).

Remark 2.5

Note that, in general (though it is not necessary), the parameters \(\varrho _{1}\), \(\varrho _{2}\) and \(\varrho _{3}\) may depend on n. Condition (2.12) means that we consider all possible cases, i.e. these parameters may tend to infinity, remain constant or tend to zero as n grows. See, for example, the conditions (3.32) in Konev and Pergamenshchikov (2015).

3 Model selection

Let \((\phi _{j})_{j\ge \, 1}\) be an orthonormal uniformly bounded basis in \(\mathbf{L}_{2}[0,1]\), i.e., for some constant \(\phi _{*}\ge 1\), which may depend on n,

$$\begin{aligned} \sup _{0\le j\le n}\,\sup _{0\le t\le 1}\vert \phi _{j}(t)\vert \, \le \, \phi _{*} <\infty \,. \end{aligned}$$
(3.1)

We extend the functions \(\phi _{j}(t)\) by periodicity, i.e., we set \(\phi _{j}(t):=\phi _{j}(\{t\})\), where \(\{t\}\) is the fractional part of \(t\ge 0\). For example, we can take the trigonometric basis defined as \(\text{ Tr }_{1}\equiv 1\) and, for \(j\ge 2,\)

$$\begin{aligned} \text{ Tr }_{j}(x)= \sqrt{2} \left\{ \begin{array}{ll} \cos (2\pi [j/2] x)&{}\quad \text{ for } \text{ even }\quad j;\\ \sin (2\pi [j/2] x)&{}\quad \text{ for } \text{ odd }\quad j, \end{array} \right. \end{aligned}$$
(3.2)

where [x] denotes the integer part of x.
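
For instance, a short Python sketch of the basis (3.2), together with a discretized orthonormality check; the grid size and the number of checked functions are arbitrary choices of ours.

```python
import numpy as np

def trig_basis(j, x):
    """Trigonometric basis (3.2): Tr_1 = 1 and Tr_j = sqrt(2) cos/sin(2 pi [j/2] x)."""
    x = np.asarray(x, dtype=float)
    if j == 1:
        return np.ones_like(x)
    freq = 2.0 * np.pi * (j // 2)
    return np.sqrt(2.0) * (np.cos(freq * x) if j % 2 == 0 else np.sin(freq * x))

# Midpoint-rule check that (Tr_i, Tr_j) is close to delta_{ij} on [0, 1].
x = (np.arange(100000) + 0.5) / 100000
gram = np.array([[np.mean(trig_basis(i, x) * trig_basis(j, x))
                  for j in range(1, 6)] for i in range(1, 6)])
print(np.round(gram, 3))   # approximately the 5 x 5 identity matrix
```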

To estimate the function S we use here the model selection procedure for continuous time regression models from Konev and Pergamenshchikov (2012), based on the Fourier expansion. We recall that for any function S from \(\mathbf{L}_{2}[0,1]\) we can write

$$\begin{aligned} S(t)=\sum ^{\infty }_{j=1}\,\theta _{j}\,\phi _{j}(t) \quad \text{ and }\quad \theta _{j}= (S,\phi _{j}) = \int _{0}^{1} S(t) \phi _{j}(t)\mathrm {d}t \,. \end{aligned}$$
(3.3)

So, to estimate the function S it suffices to estimate the coefficients \(\theta _{j}\) and to replace them in this representation by their estimators. Using the fact that the functions S and \(\phi _{j}\) are 1-periodic, we can write

$$\begin{aligned} \theta _{j}=\frac{1}{n} \int _{0}^{n}\, \phi _{j}(t)\,S(t) \mathrm {d}t \,. \end{aligned}$$

If we replace here the differential \(S(t)\mathrm {d}t\) by the observed stochastic differential \(\mathrm {d}y_{t}\), we obtain the natural estimate of \(\theta _{j}\) on the time interval [0, n]

$$\begin{aligned} \widehat{\theta }_{j,n}= \frac{1}{n} \int _{0}^{n} \phi _{j}(t) \mathrm {d}\,y_{t}\,, \end{aligned}$$
(3.4)

which can be represented, in view of the model (1.1), as

$$\begin{aligned} \widehat{\theta }_{j,n}= \theta _{j} + \frac{1}{\sqrt{n}}\xi _{j,n}\,, \quad \xi _{j,n}= \frac{1}{\sqrt{n}} I_{n}(\phi _{j})\,. \end{aligned}$$
(3.5)
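
On discretely sampled observations, the stochastic integral in (3.4) can be approximated by a Riemann–Stieltjes sum over the increments of y. The sketch below is our own discretization (not a construction from the paper) and takes the basis function as a callable, e.g. the `trig_basis` helper above.

```python
import numpy as np

def hat_theta(phi_j, t, y, n):
    """Approximate hat{theta}_{j,n} = (1/n) int_0^n phi_j(t) dy_t in (3.4) by a
    Riemann-Stieltjes sum over the increments of the sampled path (t_k, y_{t_k})."""
    phi = phi_j(t[:-1] % 1.0)            # phi_j is extended by 1-periodicity
    return np.sum(phi * np.diff(y)) / n

# e.g. hat_theta(lambda x: trig_basis(2, x), t_grid, y_samples, n)
```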

Now (see, for example, Ibragimov and Khasminskii 1981) we can estimate the function S by the projection estimators, i.e.

$$\begin{aligned} \widehat{S}_{m}(t)=\sum ^{m}_{j=1}\,\widehat{\theta }_{j,n}\,\phi _{j}(t)\,,\quad 0\le t\le 1\,, \end{aligned}$$
(3.6)

for some number \(m\rightarrow \infty \) as \(n\rightarrow \infty \). It should be noted that Pinsker (1981) showed that the projection estimators of the form (3.6) are not efficient. To obtain efficient estimation one needs to use weighted least squares estimators defined as

$$\begin{aligned} \widehat{S}_\lambda (t) = \sum _{j=1}^{n} \lambda (j) \widehat{\theta }_{j,n} \phi _{j}(t)\,, \end{aligned}$$
(3.7)

where the coefficients \(\lambda =(\lambda (j))_{1\le j\le n}\) belong to some finite set \(\Lambda \subset [0,1]^n\). As shown in Pinsker (1981), in order to obtain efficient estimators, the coefficients \(\lambda (j)\) in (3.7) need to be chosen depending on the regularity of the unknown function S. In this paper we consider the adaptive case, i.e. we assume that the regularity of the function S is unknown. In this case we choose the weight coefficients on the basis of the model selection procedure proposed in Konev and Pergamenshchikov (2012) for the general semimartingale regression model in continuous time. These coefficients will be specified later in (3.19). To this end, we first set

$$\begin{aligned} {\check{\iota }}_n=\#(\Lambda ) \quad \text{ and }\quad \vert \Lambda \vert _{*}=1+ \max _{\lambda \in \Lambda }\,\check{L}(\lambda ) \,, \end{aligned}$$
(3.8)

where \(\#(\Lambda )\) is the cardinality of \(\Lambda \) and \(\check{L}(\lambda )=\sum ^{n}_{j=1}\lambda (j)\). Now, to choose a weight sequence \(\lambda \) in the set \(\Lambda \), we use the empirical quadratic risk, defined as

$$\begin{aligned} \text{ Err }_n(\lambda ) = \parallel \widehat{S}_\lambda -S\parallel ^2, \end{aligned}$$

which in our case is equal to

$$\begin{aligned} \text{ Err }_n(\lambda ) = \sum _{j=1}^{n} \lambda ^2(j) \widehat{\theta }^2_{j,n} -2 \sum _{j=1}^{n} \lambda (j) \widehat{\theta }_{j,n}\theta _{j}+ \sum _{j=1}^{\infty } \theta ^2_{j}. \end{aligned}$$
(3.9)

Since the Fourier coefficients \((\theta _{j})_{j\ge \,1}\) are unknown, we replace the terms \(\widehat{\theta }_{j,n}\theta _{j}\) by

$$\begin{aligned} \widetilde{\theta }_{j,n} = \widehat{\theta }^2_{j,n} - \frac{ \widehat{\sigma }_{n}}{n}\,, \end{aligned}$$
(3.10)

where \(\widehat{\sigma }_{n}\) is an estimate for the variance proxy \(\sigma _{Q}\) defined in (2.11). If it is known, we take \(\widehat{\sigma }_{n}=\sigma _{Q}\); otherwise, we can choose it, for example, as in Konev and Pergamenshchikov (2012), i.e.

$$\begin{aligned} \widehat{\sigma }_{n}= \sum ^n_{j=[\sqrt{n}]+1}\,\widehat{t}\,^2_{j,n}\,, \end{aligned}$$
(3.11)

where \(\widehat{t}_{j,n}\) are the estimators for the Fourier coefficients with respect to the trigonometric basis (3.2), i.e.

$$\begin{aligned} \widehat{t}_{j,n}=\frac{1}{n} \int ^{n}_{0}\,Tr_{j}(t)\mathrm {d}y_{t}\,. \end{aligned}$$
(3.12)
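
A sketch of the estimator (3.11)–(3.12) on sampled data, in the same spirit as the coefficient estimator above; the discretization and the argument `basis` (e.g. the `trig_basis` helper) are our own assumptions.

```python
import numpy as np

def hat_sigma(t, y, n, basis):
    """Variance proxy estimate (3.11)-(3.12): squared trigonometric coefficient
    estimates summed over sqrt(n) < j <= n, with hat{t}_{j,n} approximated by a
    Riemann-Stieltjes sum; basis(j, x) should implement the basis (3.2)."""
    dy = np.diff(y)
    s = 0.0
    for j in range(int(np.sqrt(n)) + 1, int(n) + 1):
        t_hat = np.sum(basis(j, t[:-1] % 1.0) * dy) / n
        s += t_hat ** 2
    return s
```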

Finally, in order to choose the weights, we will minimize the following cost function

$$\begin{aligned} J_n(\lambda )=\sum _{j=1}^{n} \lambda ^2(j) \widehat{\theta }^2_{j,n} -2 \sum _{j=1}^{n} \lambda (j)\widetilde{\theta }_{j,n} + \delta \,P_{n}(\lambda ), \end{aligned}$$
(3.13)

where \(\delta >0\) is some threshold which will be specified later and the penalty term is

$$\begin{aligned} P_{n}(\lambda )= \frac{ \widehat{\sigma }_{n} |\lambda |^2}{n}. \end{aligned}$$
(3.14)

We define the model selection procedure as

$$\begin{aligned} \widehat{S}_{*} = \widehat{S}_{{\hat{\lambda }}} \quad \text{ and }\quad \widehat{\lambda }= \text{ argmin }_{\lambda \in \Lambda } J_n(\lambda )\,. \end{aligned}$$
(3.15)

We recall that the set \(\Lambda \) is finite so \({\hat{\lambda }}\) exists. In the case when \({\hat{\lambda }}\) is not unique, we take one of them.
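
Since \(\Lambda \) is finite, the selection step (3.13)–(3.15) is a plain finite search. The following minimal sketch (our own notation) stores the weight sequences as rows of an array `Lambda`, reads \(|\lambda |^2\) as the squared Euclidean norm of the weight sequence, and uses any basis callable such as `trig_basis`; a family `Lambda` of the form (3.20) is constructed in the sketch after (3.22) below.

```python
import numpy as np

def select_weights(theta_hat, Lambda, sigma_hat, n, delta):
    """Minimize the cost J_n(lambda) of (3.13)-(3.14) over the finite family Lambda
    (each row is a weight sequence lambda(1), ..., lambda(n)); return the best row."""
    theta_tilde = theta_hat ** 2 - sigma_hat / n                         # (3.10)
    J = (Lambda ** 2 @ theta_hat ** 2                                    # sum_j lambda^2(j) hat{theta}^2_j
         - 2.0 * Lambda @ theta_tilde                                    # -2 sum_j lambda(j) tilde{theta}_j
         + delta * sigma_hat * np.sum(Lambda ** 2, axis=1) / n)          # delta * P_n(lambda)
    return Lambda[np.argmin(J)]

def weighted_estimator(weights, theta_hat, x, basis):
    """Weighted least squares estimator (3.7) evaluated at points x in [0, 1]."""
    return sum(w * th * basis(j, x)
               for j, (w, th) in enumerate(zip(weights, theta_hat), start=1))
```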

Let us now specify the weight coefficients \((\lambda (j))_{1\le j\le n}\). Consider, for some fixed \(0<\varepsilon <1,\) a numerical grid of the form

$$\begin{aligned} {{\mathcal {A}}}=\{1,\ldots ,k^*\}\times \{\varepsilon ,\ldots ,m\varepsilon \}\,, \end{aligned}$$
(3.16)

where \(m=[1/\varepsilon ^2]\). We assume that both parameters \(k^*\ge 1\) and \(\varepsilon \) are functions of n, i.e. \(k^*=k^*(n)\) and \(\varepsilon =\varepsilon (n)\), such that

$$\begin{aligned} \left\{ \begin{array}{ll} &{}\lim _{n\rightarrow \infty }\,k^*(n)=+\infty \,, \quad \lim _{n\rightarrow \infty }\,\dfrac{k^*(n)}{\ln n}=0\,,\\ &{} \lim _{n\rightarrow \infty }\,\varepsilon (n)=0 \quad \text{ and }\quad \lim _{n\rightarrow \infty }\,n^{{\check{\delta }}}\varepsilon (n)\,=+\infty \end{array} \right. \end{aligned}$$
(3.17)

for any \({\check{\delta }}>0\). One can take, for example, for \(n\ge 2\)

$$\begin{aligned} \varepsilon (n)=\frac{1}{ \ln n } \quad \text{ and }\quad k^*(n)=k^{*}_{0}+\sqrt{\ln n}\,, \end{aligned}$$
(3.18)

where \(k^{*}_{0}\ge 0\) is some fixed constant. For each \(\alpha =(\beta , \mathbf{l})\in {{\mathcal {A}}}\), we introduce the weight sequence

$$\begin{aligned} \lambda _{\alpha }=(\lambda _{\alpha }(j))_{1\le j\le n} \end{aligned}$$

with the elements

$$\begin{aligned} \lambda _{\alpha }(j)=\mathbf{1}_{\{1\le j<j_{*}\}}+ \left( 1-(j/\omega _\alpha )^\beta \right) \, \mathbf{1}_{\{ j_{*}\le j\le \omega _{\alpha }\}}, \end{aligned}$$
(3.19)

where \(j_{*}=1+\left[ \ln \upsilon _{n}\right] \), \(\omega _{\alpha }=(\mathrm {d}_{\beta }\,\mathbf{l}\upsilon _{n})^{1/(2\beta +1)}\),

$$\begin{aligned} \mathrm {d}_{\beta }=\frac{(\beta +1)(2\beta +1)}{\pi ^{2\beta }\beta } \quad \text{ and }\quad \upsilon _{n}=n/\varsigma ^{*} \,. \end{aligned}$$

and the threshold \(\varsigma ^{*}(n)\) is introduced in (2.11). Now we define the set \(\Lambda \) as

$$\begin{aligned} \Lambda \,=\,\{\lambda _{\alpha }\,,\,\alpha \in {{\mathcal {A}}}\}\,. \end{aligned}$$
(3.20)

It should be noted that in this case the cardinality of the set \(\Lambda \) is

$$\begin{aligned} {\check{\iota }}_{n}=k^{*} m\,. \end{aligned}$$
(3.21)

Moreover, taking into account that \(\mathrm {d}_{\beta }<1\) for \(\beta \ge 1\) we obtain for the set (3.20)

$$\begin{aligned} \vert \Lambda \vert _{*}\, \le \,1+ \sup _{\alpha \in {{\mathcal {A}}}} \omega _{\alpha } \le 1+(\upsilon _{n}/\varepsilon )^{1/3}\,. \end{aligned}$$
(3.22)
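
A sketch of the construction (3.16)–(3.20), with \(\varepsilon (n)\) and \(k^{*}(n)\) taken as in the example (3.18); the default value of \(k^{*}_{0}\) and the function name are our own illustrative choices.

```python
import numpy as np

def weight_family(n, varsigma_star, k0=1):
    """Grid (3.16) with the choices (3.18) and the Pinsker-type weights (3.19);
    returns the family Lambda of (3.20) as an array with one weight sequence per row."""
    eps = 1.0 / np.log(n)                                    # epsilon(n), cf. (3.18)
    k_star = int(k0 + np.sqrt(np.log(n)))                    # k^*(n),  cf. (3.18)
    m = int(1.0 / eps ** 2)
    upsilon = n / varsigma_star                              # upsilon_n = n / varsigma^*
    j_star = 1 + int(np.log(upsilon))
    j = np.arange(1, n + 1)
    rows = []
    for beta in range(1, k_star + 1):
        d_beta = (beta + 1) * (2 * beta + 1) / (np.pi ** (2 * beta) * beta)
        for l in eps * np.arange(1, m + 1):
            omega = (d_beta * l * upsilon) ** (1.0 / (2 * beta + 1))     # omega_alpha
            rows.append(np.where(j < j_star, 1.0,
                        np.where(j <= omega, 1.0 - (j / omega) ** beta, 0.0)))
    return np.array(rows)

Lambda = weight_family(1000, varsigma_star=1.0)              # k^* * m rows, cf. (3.21)
```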

Remark 3.1

Note that the form (3.19) of the weight coefficients in (3.7) was proposed by Pinsker (1981) for efficient estimation in the nonadaptive case, i.e. when the regularity parameters of the function S are known. In the adaptive case these weight coefficients were used in Konev and Pergamenshchikov (2012, 2015) to show the asymptotic efficiency of model selection procedures.

4 Main results

In this section we obtain in Theorem 4.3 the non-asymptotic oracle inequality for the quadratic risk (1.3) of the model selection procedure (3.15), and in Theorem 4.4 the non-asymptotic oracle inequality for the robust risk (1.4) of the same procedure considered with the coefficients (3.19). We give the lower and upper bounds for the robust risk in Theorems 4.5 and 4.7, and the optimal convergence rate in Corollary 4.8.

Before stating the non-asymptotic oracle inequalities, let us first introduce the parameters which will be used to describe the remainder terms in these inequalities. For the density \(\rho \) of the renewal measure (2.9) we set

$$\begin{aligned} \Upsilon (x)=\rho (x)-\frac{1}{{\check{\tau }}} \quad \text{ and }\quad \vert \Upsilon \vert _{1}=\int ^{+\infty }_{0}\,\vert \Upsilon (x)\vert \,\mathrm {d}x \,, \end{aligned}$$
(4.1)

where \({\check{\tau }}=\mathbf{E}_{Q}\tau _{1}\). In Proposition 5.2 we show that \(\vert \rho \vert _{*}=\sup _{t\ge 0}\vert \rho (t)\vert <\infty \) and \(\vert \Upsilon \vert _{1}<\infty \). So, using this, we can introduce the following parameters

$$\begin{aligned} \Psi _{Q}=4\varkappa _{Q}{\check{\iota }}_n+ 5\sigma _{Q}\,{\check{\tau }}\,\phi ^{2}_{max}\,\vert \Upsilon \vert _{1} + \frac{4 {\check{\iota }}}{\sigma _{Q}}\, \phi ^{4}_{max} (1+\sigma ^{2}_{Q})^{2}\,{\check{\mathbf{l}}} \end{aligned}$$
(4.2)

and

$$\begin{aligned} \mathbf{c}^{*}_{Q}=\varkappa _{Q}+\sigma _{Q}\,(1+{\check{\tau }}\,\phi ^{2}_{max}\,\vert \Upsilon \vert _{1}) + \phi ^{2}_{max} (1+\sigma ^{2}_{Q})\,\sqrt{{\check{\mathbf{l}}}} \,, \end{aligned}$$
(4.3)

where \( {\check{\mathbf{l}}}=5(1+{\check{\tau }})^{2}(1+\vert \rho \vert ^{2}_{*}) \left( 2+\vert \Upsilon \vert _{1}+\mathbf{E}Y^{4}_{1}+\Pi (x^{4})\right) \). We recall that \({\check{\iota }}_n\) is the cardinality of \(\Lambda \), the noise variance \(\sigma _{Q}\) is defined in (2.11) and the parameter \(\varkappa _{Q}\) is given in (2.8). First, let us state the non-asymptotic oracle inequality for the quadratic risk (1.3), \({{\mathcal {R}}}_{Q}(\widetilde{S}_{n},S)= \mathbf{E}_{Q,S}\,\Vert \widetilde{S}_{n}-S\Vert ^{2}\), of the model selection procedure (3.15).

Theorem 4.1

Assume that Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\) hold. Then, for any \(n\ge \,1\) and \(0<\delta < 1/6\), the estimator of S given in (3.15) satisfies the following oracle inequality

$$\begin{aligned} \mathcal {R}_{Q}(\widehat{S}_*,S)\le \frac{1+3\delta }{1-3\delta } \min _{\lambda \in \Lambda } \mathcal {R}_{Q}(\widehat{S}_\lambda ,S)+ \frac{\Psi _{Q} + 12 \vert \Lambda \vert _{*}\, \mathbf{E}_{S} | \widehat{\sigma }_{n} -\sigma _{Q} |}{n\delta }\,. \end{aligned}$$
(4.4)

Now we study the estimate (3.11).

Proposition 4.2

Assume that Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\) hold and that the function \(S(\cdot )\) is continuously differentiable. Then, for any \(n\ge 2\),

$$\begin{aligned} \mathbf{E}_{Q,S}|\widehat{\sigma }_{n}-\sigma _{Q}| \le \frac{ 5\Vert \dot{S}\Vert ^2 +\mathbf{c}^{*}_{Q}}{\sqrt{n}} \,, \end{aligned}$$
(4.5)

where \(\dot{S}\) is the derivative of S.

Theorem 4.1 and Proposition 4.2 imply the following result.

Theorem 4.3

Assume that Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\) hold and that the function S is continuously differentiable. Then, for any \(n\ge \, 1 \) and \( 0 <\delta \le 1/6\), the procedure (3.15), (3.11) satisfies the following oracle inequality

$$\begin{aligned} \mathcal {R}_{Q}(\widehat{S}_*,S)\le \frac{1+3\delta }{1-3\delta } \min _{\lambda \in \Lambda } \mathcal {R}_{Q}(\widehat{S}_\lambda ,S)+ \frac{60\widetilde{\Lambda }_{n}\, \Vert \dot{S}\Vert ^2 +\widetilde{\Psi }_{Q,n}}{n\delta } \,, \end{aligned}$$
(4.6)

where \(\widetilde{\Psi }_{Q,n}=12 \widetilde{\Lambda }_{n}\mathbf{c}^{*}_{Q}+\Psi _{Q}\) and \(\widetilde{\Lambda }_{n}=\vert \Lambda \vert _{*}/\sqrt{n}.\)

Remark 4.1

Note that the coefficient \(\varkappa _{Q}\) can be estimated as \(\varkappa _{Q}\le (1+{\check{\tau }}\vert \rho \vert _{*})\sigma _{Q}\). Therefore, taking into account that \(\phi ^{4}_{max}\ge 1\), the remainder term in (4.6) can be estimated as

$$\begin{aligned} \widetilde{\Psi }_{Q,n}\le \mathbf{C}_{*} \left( 1+\sigma ^{4}_{Q}+\frac{1}{\sigma _{Q}} \right) (1+\widetilde{\Lambda }_{n}){\check{\iota }}_{n}\phi ^{4}_{max}\,, \end{aligned}$$
(4.7)

where \(\mathbf{C}_{*}>0\) is some constant which is independent of the distribution Q.

Furthermore, let us study the robust risk (1.4) of the procedure (3.15). In this case the distribution family \({{\mathcal {Q}}}_{n}\) consists of all distributions on the Skorokhod space \({{\mathcal {D}}}[0,n]\) of the process (1.2) with parameters satisfying the conditions (2.11) and (2.12).

Moreover, we also assume that the upper bound for the basis functions in (3.1) may depend on \(n\ge 1\), i.e. \(\phi _{*}=\phi _{*}(n)\), in such a way that for any \({\check{\epsilon }}>0\)

$$\begin{aligned} \lim _{n\rightarrow \infty }\,\frac{\phi _{*}(n)}{n^{{\check{\epsilon }}}}= 0\,. \end{aligned}$$
(4.8)

The next result presents the non-asymptotic oracle inequality for the robust risk (1.4) for the model selection procedure (3.15), considered with the coefficients (3.19).

Theorem 4.4

Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold and that the unknown function S is continuously differentiable. Then, for the robust risk defined by \({{\mathcal {R}}}^{*}_{n}(\widetilde{S}_{n},S)=\sup _{Q\in {{\mathcal {Q}}}_{n}}\,{{\mathcal {R}}}_{Q}(\widetilde{S}_{n},S)\) over the distribution family (2.11)–(2.12), the procedure (3.15) with the coefficients (3.19), for any \(n\ge \, 1 \) and \( 0<\delta <1/6\), satisfies the following oracle inequality

$$\begin{aligned} {{\mathcal {R}}}^{*}_n(\widehat{S}_*,S)\le \frac{1+3\delta }{1-3\delta } \min _{\lambda \in \Lambda } {{\mathcal {R}}}^{*}_n(\widehat{S}_\lambda ,S)+ \frac{\mathbf{U}^{*}_{n}(S)}{n\delta }, \end{aligned}$$
(4.9)

where the sequence \(\mathbf{U}^{*}_{n}(S)>0\) is such that, under the conditions (2.12), (3.17) and (4.8), for any \(r>0\) and \({\check{\delta }}>0,\)

$$\begin{aligned} \lim _{n\rightarrow \infty }\, \sup _{\Vert \dot{S}\Vert \le r} \, \frac{\mathbf{U}^{*}_{n}(S)}{n^{{\check{\delta }}}} =0. \end{aligned}$$
(4.10)

Now we study the asymptotic efficiency of the procedure (3.15) with the coefficients (3.19) with respect to the robust risk (1.4) defined by the distribution family (2.11)–(2.12). To this end, we assume that the unknown function S in the model (1.1) belongs to the Sobolev ball

$$\begin{aligned} W^{k}_{r}=\left\{ f\in \,{{\mathcal {C}}}^{k}_{per}[0,1] \,:\,\sum _{j=0}^k\,\Vert f^{(j)}\Vert ^2\le \mathbf{r}\right\} \,, \end{aligned}$$
(4.11)

where \(\mathbf{r}>0\) and \(k\ge 1\) are some unknown parameters, \({{\mathcal {C}}}^{k}_{per}[0,1]\) is the set of k times continuously differentiable functions \(f\,:\,[0,1]\rightarrow {{\mathbb {R}}}\) such that \(f^{(i)}(0)=f^{(i)}(1)\) for all \(0\le i \le k\). The function class \(W^{k}_{r}\) can be written as an ellipsoid in \(\mathbf{L}_{2}[0,1]\), i.e.,

$$\begin{aligned} W^{k}_{r}=\left\{ f\in \,{{\mathcal {C}}}^{k}_{per}[0,1]\,:\, \sum _{j=1}^{\infty }\,a_{j}\,\theta ^2_{j}\,\le \mathbf{r}\right\} , \end{aligned}$$
(4.12)

where \(a_{j}=\sum ^k_{i=0}\left( 2\pi [j/2]\right) ^{2i}\) and \(\theta _{j}=\int ^{1}_{0}\,f(v)\text{ Tr }_{j}(v)\mathrm {d}v\). We recall that the trigonometric basis \((\text{ Tr }_{j})_{j\ge 1}\) is defined in (3.2).

Similarly to Konev and Pergamenshchikov (2012, 2015) we will show here that the asymptotic sharp lower bound for the robust risk (1.4) is given by

$$\begin{aligned} \mathbf{r}^{*}_{k}= \, \left( (2k+1)\mathbf{r}\right) ^{1/(2k+1)}\, \left( \frac{k}{(k+1)\pi } \right) ^{2k/(2k+1)}\,. \end{aligned}$$
(4.13)

Note that this is the well-known Pinsker constant obtained for the nonadaptive filtration problem in the “signal + small white noise” model (see, for example, Pinsker 1981). Let \(\Pi _{n}\) be the set of all estimators \(\widehat{S}_{n}\) measurable with respect to the \(\sigma \)-field \(\sigma \{y_{t}\,,\,0\le t\le n\}\) generated by the process (1.1).
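
For concreteness, a one-line evaluation of the constant (4.13); the sample values k = 1 and \(\mathbf{r}=1\) are purely illustrative.

```python
import math

def pinsker_constant(k, r):
    """The Pinsker constant r*_k of (4.13)."""
    return (((2 * k + 1) * r) ** (1 / (2 * k + 1))
            * (k / ((k + 1) * math.pi)) ** (2 * k / (2 * k + 1)))

print(round(pinsker_constant(1, 1.0), 2))   # 0.42 for k = 1, r = 1
```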

The following two results give the lower and upper bounds for the robust risk (1.4) defined over the distribution family (2.11)–(2.12).

Theorem 4.5

Under Conditions (2.11) and (2.12),

$$\begin{aligned} \liminf _{n\rightarrow \infty }\, \upsilon ^{2k/(2k+1)}_{n} \inf _{\widehat{S}_{n}\in \Pi _{n}}\,\, \sup _{S\in W^{k}_{\mathbf{r}}} \,{{\mathcal {R}}}^{*}_{n}(\widehat{S}_{n},S) \ge \mathbf{r}^{*}_{k}\,, \end{aligned}$$
(4.14)

where \(\upsilon _{n}=n/\varsigma ^{*}\).

Note that if the parameters \(\mathbf{r}\) and k are known, i.e. in the nonadaptive estimation case, then to obtain efficient estimation for the “signal+white noise” model Pinsker (1981) proposed to use the estimator \(\widehat{S}_{\lambda _{0}}\) defined in (3.7) with the weights

$$\begin{aligned} \lambda _{0}(j)=\mathbf{1}_{\{1\le j<j_{*}\}}+ \left( 1-(j/\omega _{\alpha _{0}})^{k}\right) \, \mathbf{1}_{\{ j_{*}\le j\le \omega _{\alpha _{0}}\}}, \end{aligned}$$
(4.15)

where \(\alpha _{0}=(k,\mathbf{l}_{0})\) and \(\mathbf{l}_{0}=[\mathbf{r}/\varepsilon ]\varepsilon \). For the model (1.1)–(1.2) we show the same result.

Proposition 4.6

The estimator \(\widehat{S}_{\lambda _{0}}\) satisfies the following asymptotic upper bound

$$\begin{aligned} \lim _{n \rightarrow \infty } \upsilon ^{2k /(2k+1)}_{n}\, \sup _{S\in W^{k}_{\mathbf{r}}} {{\mathcal {R}}}^*_n (\widehat{S}_{\lambda _{0}},S) \le \mathbf{r}^*_{k}\,. \end{aligned}$$
(4.16)

Remark 4.2

Note that the inequalities (4.14) and (4.16) imply that the estimator \(\widehat{S}_{\lambda _{0}}\) is efficient. But we cannot use the weights (4.15) directly because the parameters k and \(\mathbf{r}\) are unknown. For this reason, to obtain an efficient estimate in the adaptive setting, we use the model selection procedure (3.15) over the estimator family (3.20), which includes the estimator (4.15). Then, using the oracle inequality (4.9) and the upper bound (4.16), we can obtain the efficiency property for this model selection procedure.

For the adaptive estimation we use the model selection procedure (3.15) with the parameter \(\delta \) defined as a function of n satisfying

$$\begin{aligned} \lim _{n}\,\delta _{n}=0 \quad \text{ and }\quad \lim _{n}\,n^{{\check{\delta }}}\,\delta _{n}=+\infty \end{aligned}$$
(4.17)

for any \({\check{\delta }}>0\). For example, we can take \(\delta _{n}=(6+\ln n)^{-1}\).

Let \(\widehat{S}_{*}\) be the procedure (3.15) based on the trigonometric basis (3.2) with the coefficients (3.19) and the parameter \(\delta =\delta _{n}\) satisfying (4.17).

Theorem 4.7

Assume that Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\) hold true. Then

$$\begin{aligned} \limsup _{n\rightarrow \infty }\, \upsilon ^{2k/(2k+1)}_{n}\, \sup _{S\in W^{k}_{\mathbf{r}}}\, {{\mathcal {R}}}^{*}_{n}(\widehat{S}_{*},S) \le \mathbf{r}^{*}_{k} \,. \end{aligned}$$
(4.18)

Theorems 4.5 and 4.7 allow us to compute the optimal convergence rate.

Corollary 4.8

Under the assumptions of Theorem 4.7 the procedure \(\widehat{S}_{*}\) is efficient, i.e.

$$\begin{aligned} \lim _{n\rightarrow \infty }\, \upsilon ^{2k/(2k+1)}_{n}\, \inf _{\widehat{S}_{n}\in \Pi _{n}}\,\, \sup _{S\in W^{k}_{r}} \,{{\mathcal {R}}}^{*}_{n}(\widehat{S}_{n},S) = \mathbf{r}^{*}_{k} \end{aligned}$$
(4.19)

and

$$\begin{aligned} \lim _{n\rightarrow \infty }\, \frac{ \inf _{\widehat{S}_{n}\in \Pi _{n}}\,\, \sup _{S\in W^{k}_{r}} \,{{\mathcal {R}}}^{*}_{n}(\widehat{S}_{n},S)}{\sup _{S\in W^{k}_{\mathbf{r}}}\, {{\mathcal {R}}}^{*}_{n}(\widehat{S}_{*},S)} = 1\,. \end{aligned}$$

Remark 4.3

It is well known that the optimal (minimax) risk convergence rate for the Sobolev ball \(W^{k}_{r}\) is \(n^{2k/(2k+1)}\) (see, for example, Pinsker 1981; Nussbaum 1985). We see here that the efficient robust rate is \(\upsilon ^{2k/(2k+1)}_{n}\), i.e., if the distribution upper bound \(\varsigma ^{*}\rightarrow 0\) as \(n\rightarrow \infty ,\) we obtain a faster rate than \(n^{2k/(2k+1)}\), and, if \(\varsigma ^{*}\rightarrow \infty \) as \(n\rightarrow \infty ,\) we obtain a slower rate. In the case when \(\varsigma ^{*}\) is constant, the robust rate is the same as the classical nonrobust convergence rate. The same properties of the robust risks are obtained in Konev and Pergamenshchikov (2010) and Konev and Pergamenshchikov (2012) for the regression model with an Ornstein-Uhlenbeck noise process. So, this is the typical situation when we take the supremum over all noise distributions in (1.4): it is natural that we do not obtain the same convergence rate as for the usual risk (1.3), and the difference is given by the coefficient \(\varsigma ^{*}\), which satisfies the “slowly changing” conditions (2.12).

5 Renewal density

This section is concerned with results related to the renewal measure \( {\check{\eta }} =\sum ^{\infty }_{l=1}\,\eta ^{(l)} \,.\) We start with the following lemma.

Lemma 5.1

Let \(\tau \) be a positive random variable with a density g, such that \(\mathbf{E}e^{\beta \tau } <\infty \) for some \(\beta > 0\). Then there exists a constant \(\beta _1,\)\(0< \beta _1 <\beta \) for which,

$$\begin{aligned} \mathbf{E}e^{(\beta _1+i \omega )\tau } \ne 1 \qquad \forall \omega \in {{\mathbb {R}}}\,. \end{aligned}$$

Proof

We will show this lemma by contradiction, i.e. assume that there exist a sequence of positive numbers \((\gamma _{k} )_{k\ge 1}\) going to zero and a sequence \((w_{k})_{k\ge 1}\) such that

$$\begin{aligned} \mathbf{E}e^{(\gamma _k+i \omega _k)\tau } = 1 \end{aligned}$$
(5.1)

for any \(k\ge 1\). First, assume that \(\limsup _{k\rightarrow \infty }\,w_{k} = +\infty \), i.e. there exists a subsequence of indices \((l_{k})_{k\ge 1}\) for which \(\lim _{k\rightarrow \infty }\,w_{l_{k}} = +\infty \). Note that in this case, for any \(N\ge 1,\)

$$\begin{aligned} \left| \int _{0}^{N}\,e^{\gamma _{l_{k}}t} \cos (w_{l_{k}}t) \,g(t) \mathrm {d}t \right|&\le \left| \int _{0}^{N}\, \cos (w_{l_{k}}t) \,g(t) \mathrm {d}t\right| \\&\quad + \left| \int _{0}^{N}\,(e^{\gamma _{l_{k}}t}-1) \cos (w_{l_{k}}t) \,g(t) \mathrm {d}t \right| \,, \end{aligned}$$

i.e., in view of Lemma A.5, for any fixed \(N\ge 1\)

$$\begin{aligned} \lim _{k\rightarrow \infty } \int _{0}^{N}\,e^{\gamma _{l_{k}}t} \cos (w_{l_{k}}t) \,g(t) \mathrm {d}t= 0\,. \end{aligned}$$

Since for some \(\beta >0\) the integral \( \int _{0}^{+\infty }\,e^{\beta t}\,g(t) \mathrm {d}t<\infty \), we get

$$\begin{aligned} \lim _{k\rightarrow \infty } \int _{0}^{+\infty }\,e^{\gamma _{l_{k}}t} \cos (w_{l_{k}}t) \,g(t) \mathrm {d}t= 0\,. \end{aligned}$$

This contradicts (5.1), whose real part equals 1. Let now \( \limsup _{k\rightarrow \infty } w_{k}=\omega _{\infty }\) with \(\vert \omega _{\infty }\vert <\infty \). In this case there exists a sequence \((l_k)_{k\ge 1}\) such that \( \lim _{k\rightarrow \infty } w_{l_k}=\omega _{\infty }\), i.e.

$$\begin{aligned} 1=\lim _{k\rightarrow \infty } \mathbf{E}e^{\gamma _{l_k} \tau } \cos (\tau w_{l_k}) = \mathbf{E}_{Q} \cos (\tau w_{\infty })\,. \end{aligned}$$

It is clear that, for random variables having a density, the last equality is possible if and only if \(w_{\infty }=0\). In this case, i.e. when \(\lim _{k\rightarrow \infty } w_{l_k} = 0\), Eq. (5.1) implies

$$\begin{aligned} \lim _{k\rightarrow \infty } \mathbf{E}_{Q} e^{\gamma _{l_k} \tau } \frac{\sin (\tau w_{l_k})}{w_{l_k}} = \mathbf{E}\,\tau =0\,. \end{aligned}$$

But, \(\mathbf{E}\tau >0\), under our conditions. These contradictions imply the desired result. \(\square \)

Proposition 5.2

Let \(\tau \) be a positive random variable with the distribution \(\eta \) having a density g which satisfies Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\). Then the renewal measure (2.9) is absolutely continuous with density \(\rho \), for which

$$\begin{aligned} \rho (x)= \frac{1}{{\check{\tau }}} + \Upsilon (x)\,, \end{aligned}$$
(5.2)

where \({\check{\tau }}=\mathbf{E}\tau _{1}\) and \(\Upsilon (\cdot )\) is some function defined on \({{\mathbb {R}}}_{+}\) with values in \({{\mathbb {R}}}\) such that

$$\begin{aligned} \sup _{x\ge 0}\,x^\gamma \vert \Upsilon (x)\vert <\infty \quad \text{ for } \text{ all }\quad \gamma >0\,. \end{aligned}$$

Proof

First note that we can represent the renewal measure \({\check{\eta }}\) as \({\check{\eta }}=\eta *\eta _{0}\), where \(\eta _{0}=\sum _{j=0}^{\infty } \eta ^{(j)}\). It is clear that in this case the density \(\rho \) of \({\check{\eta }}\) can be written as

$$\begin{aligned} \rho (x)=\int ^{x}_{0}\, g(x-y)\, \sum _{n\ge 0}\,g^{(n)}(y) \mathrm {d}y\,. \end{aligned}$$
(5.3)

Now we use the arguments proposed in the proof of Lemma 9.5 from Goldie (1991). For any \(0<\epsilon <1\) we set

$$\begin{aligned} \rho _{\epsilon }(x)=\int ^{x}_{0}\, g(x-y)\left( \sum _{n\ge 0} (1-\epsilon )^{n}\,g^{(n)}(y)- \frac{(1-\epsilon )}{{\check{\tau }}}\, g_0(y) \right) \mathrm {d}y -g(x) \,, \end{aligned}$$
(5.4)

where \( g_0(y)= e^{-\epsilon y/{\check{\tau }}}1_{\{y>0\}}.\) It is easy to deduce that for any \(x\in {{\mathbb {R}}}\)

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\,\rho _{\epsilon }(x)\,=\, \rho (x)-\frac{1}{{\check{\tau }}}\,\int ^{x}_{0}\,g(z)\,\mathrm {d}z -g(x)\,. \end{aligned}$$
(5.5)

Moreover, in view of Condition \((\mathbf{H}_{1})\) we obtain that the function \(\rho _{\epsilon }(x)\) satisfies Condition \((\mathbf{D})\) from Section A.3. So, through Proposition A.6 we get

$$\begin{aligned} \rho _{\epsilon }(x+)+\rho _{\epsilon }(x-)= \frac{1}{\pi }\,\int _{{{\mathbb {R}}}}\,e^{-ix\theta }\, \widehat{\rho }_{\epsilon }(\theta )\,\mathrm {d}\theta \,, \end{aligned}$$

where \(\widehat{\rho }_{\epsilon }(\theta ) =\int _{{{\mathbb {R}}}}\,e^{i\theta x}\rho _{\epsilon }(x)\mathrm {d}x\). Note that, since \(\vert e^{i\theta x}\vert =1\),

$$\begin{aligned} \vert \widehat{g}(\theta )\vert = \left| \int _{{{\mathbb {R}}}}\,e^{i\theta x} g(x) \mathrm {d}x \right| \le \int _{{{\mathbb {R}}}}\,g(x) \mathrm {d}x =1\,. \end{aligned}$$
(5.6)

It should be noted that this inequality becomes an equality if and only if \(\theta =0\). Therefore, for any \(0<\epsilon <1\) we have \(\vert (1-\epsilon )\widehat{g}(\theta )\vert <1\) and

$$\begin{aligned} \sum _{n=0}^{\infty } (1-\epsilon )^n (\widehat{g}(\theta ))^n = \frac{1}{1-(1-\epsilon )\widehat{g}(\theta )}\,. \end{aligned}$$

So, taking into account that

$$\begin{aligned} \widehat{g}_0(\theta )= \int _{{{\mathbb {R}}}}\,e^{i\theta x} g_0(x)\mathrm {d}x= \frac{{\check{\tau }}}{\epsilon -i{\check{\tau }}\theta }\,, \end{aligned}$$

we obtain

$$\begin{aligned} \widehat{\rho }_{\epsilon }(\theta )= \widehat{g}(\theta ) \sum _{n=0}^{\infty } (1-\epsilon )^n (\widehat{g}(\theta ))^n - \left( \frac{1-\epsilon }{{\check{\tau }}}\right) \widehat{g}(\theta ) \widehat{g}_0(\theta )- \widehat{g}(\theta )=\widehat{g}(\theta ) G_{\epsilon }(\theta ) \end{aligned}$$

where

$$\begin{aligned} G_{\epsilon }(\theta ) =\frac{1}{1-(1-\epsilon )\widehat{g}(\theta )} -\frac{1-i{\check{\tau }}\theta }{\epsilon -i{\check{\tau }}\theta }\,, \end{aligned}$$

i.e.

$$\begin{aligned} \rho _{\epsilon }(x-) + \rho _{\epsilon }(x+) = \frac{1}{\pi }\, \int _{{{\mathbb {R}}}}\,e^{-ix\theta }\, \widehat{g}(\theta ) G_{\epsilon }(\theta )\,\mathrm {d}\theta \,. \end{aligned}$$
(5.7)

In section A.5 we show that

$$\begin{aligned} \sup _{0<\epsilon<1,\theta \in {{\mathbb {R}}}}\vert G_{\epsilon }(\theta )\vert \,<\,\infty \,. \end{aligned}$$
(5.8)

Therefore, using Condition \((\mathbf{H}_{4})\) and Lebesgue’s dominated convergence theorem, we can pass to the limit as \(\epsilon \rightarrow 0\) in (5.7), i.e., we obtain that

$$\begin{aligned} \rho (x+) + \rho (x-) -\frac{2}{{\check{\tau }}}\,\int ^{x}_{0}\,g(z)\,\mathrm {d}z -g(x+) -g(x-) =\frac{1}{\pi }\, \int _{{{\mathbb {R}}}}\,e^{-ix\theta }\, \widehat{g}(\theta ) G_{0}(\theta )\,\mathrm {d}\theta \,, \end{aligned}$$

where

$$\begin{aligned} G_{0}(\theta ) =\frac{1}{1-\widehat{g}(\theta )} + \frac{1-i{\check{\tau }}\theta }{i{\check{\tau }}\theta }\,. \end{aligned}$$

Using here again Proposition A.6 we deduce that

$$\begin{aligned} \rho (x+) + \rho (x-) = \frac{2}{{\check{\tau }}}\,\int ^{x}_{0}\,g(z)\,\mathrm {d}z + \frac{1}{\pi }\, \int _{{{\mathbb {R}}}}\,e^{-ix\theta }\, \widehat{g}(\theta ) \, \check{G}(\theta )\,\mathrm {d}\theta \end{aligned}$$
(5.9)

and

$$\begin{aligned} \check{G}(\theta ) =\frac{1}{1-\widehat{g}(\theta )} + \frac{1}{i{\check{\tau }}\theta } \,. \end{aligned}$$

Note now that we can represent the density (5.3) as

$$\begin{aligned} \rho (x)=g*\sum _{n\ge 0}\,g^{(n)} =\sum _{n\ge 1}\,g^{(n)}(x) =g(x)+\sum _{n\ge 2}\,g^{(n)}(x) =:g(x)+\rho _{c}(x) \end{aligned}$$

and the function \( \rho _{c}(x)\) is continuous for all \(x\in {{\mathbb {R}}}\). This means that

$$\begin{aligned} \widetilde{\rho }(x)= \frac{\rho (x+)+\rho (x-)}{2}-\rho (x) = \frac{g(x+)+g(x-)}{2}-g(x) \end{aligned}$$

and, therefore, Condition \((\mathbf{H}_{2})\) implies that, for any \(\gamma >0,\)

$$\begin{aligned} \sup _{x\ge 0}\,x^{\gamma }\,\vert \widetilde{\rho }(x)\vert \,<\infty . \end{aligned}$$

Now we can rewrite (5.9) as

$$\begin{aligned} \rho (x) - \frac{1}{{\check{\tau }}}\, = \frac{1}{{\check{\tau }}}\,\int ^{+\infty }_{x}\,g(z)\,\mathrm {d}z + \frac{1}{2\pi }\, \int _{{{\mathbb {R}}}}\,e^{-ix\theta }\, \widehat{g}(\theta ) \, \check{G}(\theta )\,\mathrm {d}\theta -\widetilde{\rho }(x). \end{aligned}$$
(5.10)

Taking into account that \(\mathbf{E}_{Q}e^{\beta \tau }<\infty \) for some \(\beta >0\) we can obtain that

$$\begin{aligned} \sup _{x\ge 0}\,x^{\gamma }\,\int ^{+\infty }_{x}\,g(z)\,\mathrm {d}z <\infty \,. \end{aligned}$$

To study the second term in (5.10) we will apply Proposition A.4 to the function \(\widehat{g}(\theta ) \check{G}(\theta ).\) First, note that the function \(\widehat{g}(\theta )=\mathbf{E}_{Q}e^{i\tau \theta }\) is holomorphic for any \(\theta \in {{\mathbb {C}}}\) with \(\text{ Im }(\theta )>-\beta ,\) due to Condition \((\mathbf{H}_{3});\) in view of Lemma A.8, there exists \(0<\beta _{*}<\beta \) for which the function \(\check{G}(\theta )\) is holomorphic. Second, Condition \((\mathbf{H}_{4})\) applied to the function \(\widehat{g}(\theta ) \check{G}(\theta )\) implies the first condition in Eq. (A.2). The second condition of (A.2) follows directly from Lemma A.5.

Therefore, the conditions of Proposition A.4 hold with \(\beta _{2}=+\infty \). Thus Proposition A.4 implies that for some \(0<\beta _{0}<\beta _{*}\)

$$\begin{aligned} \int _{{{\mathbb {R}}}}\,e^{-ix\theta }\,\widehat{g}(\theta ) \, \check{G}(\theta )\,\mathrm {d}\theta = e^{-\beta _{0}x}\, \int _{{{\mathbb {R}}}}\,e^{-ix\theta }\,\widehat{g}(\theta -i\beta _{0}) \, \check{G}(\theta -i\beta _{0})\,\mathrm {d}\theta \,. \end{aligned}$$

Taking into account here Condition \((\mathbf{H}_{4}\)) and the bound for \(\check{G}\) given in (A.8), we obtain

$$\begin{aligned} \sup _{x\ge 0}\,e^{\beta _{0}x}\, \left| \, \int _{{{\mathbb {R}}}}\,e^{-ix\theta }\, \widehat{g}(\theta )\, \check{G}(\theta )\,\mathrm {d}\theta \right| <\infty \,. \end{aligned}$$

Hence Proposition 5.2. \(\square \)

Using this proposition we can study the renewal process \((N_{t})_{t\ge 0}\) introduced in (2.6).

Corollary 5.3

Assume that Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\) hold true. Then, for any \(t>0,\)

$$\begin{aligned} \mathbf{E}\,N_{t}\le \vert \rho \vert _{*}\,t\,, \qquad \mathbf{E}\,N^{2}_{t}\le \, \vert \rho \vert _{*}\,t + \,\vert \rho \vert ^{2}_{*}\,t^{2} \end{aligned}$$
(5.11)

and, moreover, \(\mathbf{E}N^{m}_{t}<\infty \) for any \(m\ge 3\).

Proof

First, by means of Proposition 5.2, note that we get

$$\begin{aligned} \mathbf{E}\,N_{t}=\mathbf{E}\,\sum _{k\ge 1}\,\mathbf{1}_{\{T_{k}\le t\}}=\int ^{t}_{0}\,\rho (v)\,\mathrm {d}v\le \vert \rho \vert _{*}\,t\,. \end{aligned}$$

To estimate the second moment of \(N_{t}\) note that,

$$\begin{aligned} \mathbf{E}\,N^{2}_{t}&=\mathbf{E}\,\sum _{k\ge 1}\,\mathbf{1}_{\{T_{k}\le t\}}+2\mathbf{E}\,\sum _{k\ge 1}\,\mathbf{1}_{\{T_{k}\le t\}}\,\sum _{j \ge k+1}\,\mathbf{1}_{\{T_{j}\le t\}}\\&=\mathbf{E}\,N_{t}+2\mathbf{E}\,\sum _{k\ge 1}\,\mathbf{1}_{\{T_{k}\le t\}}\,\mathbf{E}\left( \sum _{j \ge k+1}\,\mathbf{1}_{\{T_{j}\le t\}}\,\vert \, T_{k} \right) \\&=\mathbf{E}\,N_{t}+2\mathbf{E}\,\sum _{k\ge 1}\,\mathbf{1}_{\{T_{k}\le t\}}\,\Theta (t-T_{k})=\mathbf{E}\,N_{t} +2\int ^{t}_{0}\,\Theta (t-v)\,\rho (v)\,\mathrm {d}v \,, \end{aligned}$$

i.e. we obtain that

$$\begin{aligned} \mathbf{E}\,N^{2}_{t}\le \vert \rho \vert _{*} t+ 2\vert \rho \vert _{*} \int ^{t}_{0}\,\Theta (t-v)\,\mathrm {d}v\,, \end{aligned}$$

where \(\Theta (v)=\mathbf{E}\sum _{j \ge k+1}\,\mathbf{1}_{\{\sum _{i=k+1}^j\, \tau _{i}\le v\}}\). Taking into account that \((\tau _{k})_{k\ge 1}\) is an i.i.d. sequence, this term can be represented as

$$\begin{aligned} \Theta (v) =\mathbf{E}\sum _{m \ge 1}\,\mathbf{1}_{\{T_{m}\le v\}} =\mathbf{E}\,N_{v}\le \vert \rho \vert _{*} v\,. \end{aligned}$$
(5.12)

This implies the second inequality in (5.11). Similarly, for \(m\ge 3\) we obtain that for some constant \(\mathbf{C}_{m}>0\)

$$\begin{aligned} \mathbf{E}N^{m}_{t}&\le \mathbf{C}_{m}\,\mathbf{E}\, \sum _{k_{1}<\dots<k_{m}}\prod ^{m}_{j=1}\mathbf{1}_{\{T_{k_{j}}\le t\}}\\&= \mathbf{C}_{m}\,\mathbf{E}\, \sum _{k_{1}<\dots<k_{m-1}}\prod ^{m-1}_{j=1}\mathbf{1}_{\{T_{k_{j}}\le t\}} \mathbf{E}\left( \sum _{l= k_{m-1}+1}\mathbf{1}_{\{T_{l}\le t\}}\vert T_{k_{m-1}}\right) \\&= \mathbf{C}_{m}\,\mathbf{E}\, \sum _{k_{1}<\dots <k_{m-1}}\prod ^{m-1}_{j=1}\mathbf{1}_{\{T_{k_{j}}\le t\}} \Theta (t-T_{k_{m-1}}) \le \mathbf{C}_{m}\,\vert \rho \vert _{*}\,t\,\mathbf{E}\,N^{m-1}_{t}\,. \end{aligned}$$

Therefore, by induction we obtain that \(\mathbf{E}\,N^{m}_{t}<\infty \) for any \(m\ge 3\). Hence Corollary 5.3. \(\square \)
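
As a quick numerical illustration (ours, not part of the proof) of the bounds (5.11): in the exponential special case the counting process is Poisson, \(\rho \equiv 1/{\check{\tau }}\), and both bounds hold with equality; the parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
check_tau, t, n_paths = 2.0, 50.0, 20000
# Exponential inter-jump times: N_t is Poisson(t / check_tau) and rho = 1 / check_tau.
counts = rng.poisson(t / check_tau, size=n_paths)
rho_star = 1.0 / check_tau
print(counts.mean(), rho_star * t)                               # E N_t <= |rho|_* t
print((counts ** 2).mean(), rho_star * t + (rho_star * t) ** 2)  # E N_t^2 bound in (5.11)
```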

6 Stochastic calculus for semi-Markov processes

In this section we give some results of stochastic calculus for the process \((\xi _{t})_{t\ge \, 0}\) given in (1.2), needed throughout this paper. As this process is a combination of a Lévy process and a semi-Markov process, these results are not standard and need to be provided.

Lemma 6.1

Let f and g be any non-random functions from \(\mathbf{L}_{2}[0,n]\) and \((I_{t}(f))_{t\ge \,0}\) be the process defined in (2.7). Then, for any \(0 \le t \le n\),

$$\begin{aligned} \mathbf{E}_{Q}I_{t}(f) I_{t}(g) = {\bar{\varrho }} \, (f,g)_{t}\, + \varrho _{3}^2 \, (f,g\rho )_{t} \,, \end{aligned}$$
(6.1)

where \((f,g)_{t}=\int _{0}^{t} f(s)\,g(s) \mathrm {d}s\) and \(\rho \) is the density of the renewal measure \({\check{\eta }}=\sum ^{\infty }_{l=1}\,\eta ^{(l)}.\)

Proof

First, note that the noise process (1.2) is a square integrable martingale which can be represented as

$$\begin{aligned} \xi _{t}=\xi ^{c}_{t}+\xi ^{d}_{t}\,, \end{aligned}$$
(6.2)

where \(\xi ^{c}_{t}=\varrho _{1} w_{t}\) and \(\xi ^{d}_{t}=\varrho _{2}\,L_{t}+ \varrho _{3}\,z_{t}\). Note that the process \((\xi ^{c}_{t})_{t\ge 0}\) is a continuous square integrable martingale with the quadratic characteristic \(<\xi ^{c}>_{t}=\varrho _{1}^2\,t\). Therefore, the quadratic variation \([\xi ]_{t}\) is the following:

$$\begin{aligned}{}[\xi ]_{t}=<\xi ^{c}>_{t}+ \sum _{0\le s\le t}\,\left( \Delta \xi ^{d}_{s}\right) ^{2} =\varrho _{1}^2\,t + \sum _{0\le s\le t}\,\left( \Delta \xi ^{d}_{s}\right) ^{2} \,, \end{aligned}$$

where \(\Delta \xi _{s}=\xi _{s}-\xi _{s-}\) (see, for example, Liptser and Shiryaev 1986). Recalling that the processes \((L_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent, we obtain that \(\Delta L_{t} \Delta z_{t}=0\) for any \(t>0\), i.e.

$$\begin{aligned}{}[\xi ]_{t}=\varrho _{1}^2 t +\varrho _{2}^2 \sum _{0\le s\le t}\, \left( \Delta L_{s} \right) ^{2} +\varrho _{3}^2 \sum _{0\le s\le t}\, \left( \Delta z_{s}\right) ^{2} \,. \end{aligned}$$
(6.3)

Moreover, note that we can represent the stochastic integral \(I_{t}(f)\) as

$$\begin{aligned} I_{t}(f) = \varrho _{1} I_{t}^w(f) + \varrho _{2} I_{t}^L(f)+ \varrho _{3} I_{t}^z(f) \,, \end{aligned}$$
(6.4)

where the stochastic integrals \(I_{t}^{w}(f)=\int _{0}^{t}\, f(s) \mathrm {d}w_{s}\), \(I_{t}^{L}(f)=\int _{0}^{t}\, f(s) \mathrm {d}L_{s}\) and \(I_{t}^z(f)=\int _{0}^{t}\, f(s) \mathrm {d}z_{s}\) are independent square integrable martingales. Therefore,

$$\begin{aligned} \mathbf{E}_{Q}I_{t}(f)I_{t}(g)= \varrho _{1}^{2} \mathbf{E}\,I^{w}_{t}(f)I^{w}_{t}(g) + \varrho _{2}^{2} \mathbf{E}\,I^{L}_{t}(f)I^{L}_{t}(g) + \varrho _{3}^2 \mathbf{E}\,I^{z}_{t}(f)I^{z}_{t}(g) \,. \end{aligned}$$

Taking into account that \(\mathbf{E}\,I^{w}_{t}(f)I^{w}_{t}(g)=(f,g)_{t}\) and that the expectation of the product of square integrable martingales equals the expectation of their mutual covariation, i.e. \(\mathbf{E}\,I_{t}^L(f)\,I_{t}^L(g)=\mathbf{E}\,[I^L(f),I^L(g)]_{t}\) and \(\mathbf{E}\,I_{t}^z(f)\,I_{t}^z(g)=\mathbf{E}\,[I^z(f),I^z(g)]_{t}\), we obtain that

$$\begin{aligned} \mathbf{E}_{Q}\, I_{t}(f)I_{t}(g) = \varrho _{1}^{2} (f,g)_{t} + \varrho _{2}^{2} \mathbf{E}\, [I^L(f),I^L(g)]_{t} + \varrho _{3}^2 \mathbf{E}\, [I^z(f),I^z(g)]_{t} \,. \end{aligned}$$

In view of (6.3) the mutual covariations may be calculated as

$$\begin{aligned} \left[ I^L(f),I^L(g)\right] _{t} = \sum _{0\le s\le t}f(s)g(s) \left( \Delta L_{s} \right) ^{2} \end{aligned}$$

and

$$\begin{aligned}{}[I^z(f),I^z(g)]_{t}=\sum _{0\le s\le t}f(s)g(s) \left( \Delta z_{s} \right) ^{2} =\sum _{l=1}^{\infty } f(T_{l}) g(T_{l}) Y^2_{l} 1_{\{T_{l} \le t\}}\,. \end{aligned}$$

Taking into account that \(\Pi (x^{2})=1\) and that the sequences \((Y_{k})_{k\ge 1}\) and \((T_{k})_{k\ge 1}\) are independent, we find

$$\begin{aligned} \mathbf{E}\, \left[ I^L(f)\,I^L(g)\right] _{t} = \, \Pi \left( x^{2}\right) \, (f,g)_{t} =(f,g)_{t} \end{aligned}$$
(6.5)

and

$$\begin{aligned} \mathbf{E}\, [I^z(f)\,I^z(g)]_{t}= \mathbf{E}\, \sum _{l=1}^{\infty } f(T_{l}) g(T_{l}) 1_{\{T_{l} \le t\}} = \int _{0}^{t} f(s) g(s) \rho (s) \mathrm {d}s =(f,g\rho )_{t} \,. \end{aligned}$$

Hence the conclusion follows. \(\square \)

Corollary 6.2

Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true. Then, for any \(n\ge 1\) and any non-random function f from \(\mathbf{L}_{2}[0,n]\), the stochastic integral (2.7) exists and satisfies the inequality (2.8).

Proof

This corollary follows directly from Lemma 6.1 with \(f=g\) and Proposition 5.2, which ensures that \(\sup _{t\ge 0}\rho (t)<\infty \). \(\square \)

Lemma 6.3

Let f and g be bounded non-random functions defined on \([0,\infty )\) with values in \({{\mathbb {R}}}\). Then, for any \(k\ge 1,\)

$$\begin{aligned} \mathbf{E}_{Q} \left( I_{T_{k^-}} (f)\,I_{T_{k^-}} (g) \mid {{\mathcal {G}}}\right) = {\bar{\varrho }} (f\,,\,g)_{T_{k}}+ \varrho _{3}^2 \sum _{l=1}^{k-1}\, f(T_{l})\,g(T_{l}), \end{aligned}$$

where \({{\mathcal {G}}}\) is the \(\sigma \)-field generated by the sequence \((T_{l})_{l\ge 1}\), i.e., \({{\mathcal {G}}}=\sigma \{T_{l}\,,\,l\ge 1\}\).

Proof

Using (6.5) and taking into account that the process \((L_{t})_{t\ge 0}\) is independent of \({{\mathcal {G}}}\), we obtain

$$\begin{aligned} \mathbf{E}_{Q} \left( I_{T_{k^-}} (f)\,I_{T_{k^-}} (g) \mid {{\mathcal {G}}}\right) ={\bar{\varrho }} (f\,,\,g)_{T_{k}} + \varrho _{3}^2 \mathbf{E}\, \left( I^{z}_{T_{k^-}} (f)\,I^{z}_{T_{k^-}} (g) \mid {{\mathcal {G}}}\right) \,. \end{aligned}$$

Moreover,

$$\begin{aligned} \mathbf{E}\, \left( I^{z}_{T_{k^-}} (f)\,I^{z}_{T_{k^-}} (g) \mid {{\mathcal {G}}}\right)&= \mathbf{E}\, \left( \left( \sum _{l=1}^{k-1}\,f(T_{l})Y_{l}\right) \left( \sum _{l=1}^{k-1}\,g(T_{l})Y_{l}\right) \mid {{\mathcal {G}}}\right) \\&= \sum _{l=1}^{k-1}\, f(T_{l})\,g(T_{l}) \,. \end{aligned}$$

Then we obtain the desired result. \(\square \)

Lemma 6.4

Assume that Conditions \((\mathbf{H}_{1})\)\((\mathbf{H}_{4})\) hold true. Then, for any measurable bounded non-random functions f and g,  we have

$$\begin{aligned} \left| \mathbf{E}_{Q} \int _{0}^{n} I^2_{t-}(f)\, g(t)\, \mathrm {d}m_{t} \right| \le 2 \varrho _{3}^2 \vert g\vert _{*} \vert f\vert ^{2}_{*}\, \vert \Upsilon \vert _{1} \, n, \end{aligned}$$

where \(m_{t} = \sum _{0\le s \le t}(\Delta z_{s})^2 - \int _{0}^{t} \rho (s) \mathrm {d}s\) and the norm \(\vert \Upsilon \vert _{1}\) is given in (4.1).

Proof

Using the definition of the process \((m_{t})_{t\ge 0}\) we can represent this integral as

$$\begin{aligned} \int _{0}^{n}\, I^2_{t-}(f)\, g(t)\, \mathrm {d}m_{t}&= \sum _{k\ge 1}\,I^2_{T_{k}-}(f)\, g(T_{k})\,Y^{2}_{k}\, \mathbf{1}_{\{T_{k}\le n\}}\nonumber \\&\quad - \int _{0}^{n}\, I^2_{t}(f)\, g(t)\,\rho (t)\, \mathrm {d}t =: V_{n}-U_{n} \,. \end{aligned}$$
(6.6)

Note now that

$$\begin{aligned} \mathbf{E}_{Q}V_{n}\, = \mathbf{E}_{Q} \sum _{k\ge 1}\,g(T_{k})\,\mathbf{E}\left( I^2_{T_{k^-}}(f)\,\mid {{\mathcal {G}}}\right) \,\mathbf{1}_{\{T_{k}\le n\}}\,. \end{aligned}$$

Now, using Lemma 6.3 we can represent the last expectation as

$$\begin{aligned} \mathbf{E}_{Q}V_{n}\,= {\bar{\varrho }}\,\mathbf{E}\,V'_{n} + \varrho ^{2}_{3}\,\mathbf{E}\,V''_{n}, \end{aligned}$$
(6.7)

where

$$\begin{aligned} V^{'}_{n}= \sum _{k\ge 1}\,g(T_{k})\, \Vert f\Vert ^{2}_{T_{k}} \,\mathbf{1}_{\{T_{k}\le n\}} \quad \text{ and }\quad V^{''}_{n}= \sum _{k\ge 2}\,g(T_{k})\, \mathbf{1}_{\{T_{k}\le n\}} \, \sum ^{k-1}_{l=1} \, f^{2}(T_{l}) \,. \end{aligned}$$

The term \(\mathbf{E}\,V^{'}_{n}\) can be represented as

$$\begin{aligned} \mathbf{E}\,V^{'}_{n} = \int ^{n}_{0}\, g(t)\, \Vert f\Vert ^{2}_{t} \rho (t)\mathrm {d}t\,. \end{aligned}$$

We recall that the norm \(\Vert \cdot \Vert _{t}\) is defined in (2.8). To estimate \(\mathbf{E}\,V^{''}_{n}\), note that in view of Fubini’s theorem

$$\begin{aligned} \mathbf{E}\,V^{''}_{n}&=\mathbf{E}\,\sum _{l\ge 1}\, f^{2}(T_{l})\,\sum _{k\ge l+1}\,g(T_{k})\,\mathbf{1}_{\{T_{k}\le n\}}\, \mathbf{1}_{\{T_{l}\le n\}}\\&=\mathbf{E}\,\sum _{l\ge 1}\, f^{2}(T_{l})\,\mathbf{E}_{Q}\left( \sum _{k\ge l+1}\,g(T_{k})\,\mathbf{1}_{\{T_{k}\le n\}} \vert T_{l} \,\right) \mathbf{1}_{\{T_{l}\le n\}}\\&=\sum _{l\ge 1}\,\mathbf{E}\, f^{2}(T_{l})\,{\bar{g}}(T_{l}) \mathbf{1}_{\{T_{l}\le n\}} =\int ^{n}_{0}\,f^{2}(v)\,{\bar{g}}(v)\,\rho (v)\mathrm {d}v \,, \end{aligned}$$

where, similarly to (5.12), the function \({\bar{g}}(\cdot )\) can be represented as

$$\begin{aligned} {\bar{g}}(v)&=\mathbf{E}\, \sum _{k\ge l+1}\,g\left( v+\sum _{j=l+1}^k\, \tau _{j}\right) \,\mathbf{1}_{\{ \sum _{j=l+1}^k\, \tau _{j} \le n-v\}}\\&=\mathbf{E}\,\sum _{m\ge 1}\,g(v+T_{m})\, \mathbf{1}_{\{T_{m}\le n-v\}} =\int ^{n-v}_{0}\,g(v+s)\,\rho (s)\mathrm {d}s =\int ^{n}_{v}\,g(t)\,\rho (t-v)\mathrm {d}t \,. \end{aligned}$$

Moreover, using Lemma 6.1, we calculate the expectation of the last term in (6.6), i.e.

$$\begin{aligned} \mathbf{E}_{Q}\,U_{n}= {\bar{\varrho }} \int _{0}^{n}\, \Vert f\Vert ^2_{t}\, g(t)\,\rho (t)\, \mathrm {d}t + \varrho _{3}^2 \int _{0}^{n}\, \Vert f\sqrt{\rho }\Vert ^2_{t} \, g(t)\,\rho (t)\, \mathrm {d}t \,. \end{aligned}$$

This implies that

$$\begin{aligned} \vert \mathbf{E}_{Q} \int _{0}^{n} I^2_{t-}(f)\, g(t)\, \mathrm {d}m_{t} \vert = \varrho _{3}^2 \vert \int _{0}^{n}\, g(t)\,\delta (t) \mathrm {d}t \vert \le \varrho _{3}^2 \vert g\vert _{*} \int _{0}^{n}\, \vert \delta (t)\vert \mathrm {d}t \,, \end{aligned}$$

where \( \delta (t)= \int _{0}^{t}\, f^{2}(v)\, \left( \rho (t-v) - \rho (t) \right) \, \rho (v)\, \mathrm {d}v\). Note here that, in view of Proposition 5.2, the function \(\delta (t)\) can be estimated for any \(0\le t\le n\) as

$$\begin{aligned} \vert \delta (t)\vert \le \vert f\vert ^{2}_{*}\, \vert \rho \vert _{*}\, \int _{0}^{t}\, \left| \Upsilon (t-v) - \Upsilon (t) \right| \,\mathrm {d}v \le \vert f\vert ^{2}_{*}\, \vert \rho \vert _{*}\, \left( \vert \Upsilon \vert _{1} + n \vert \Upsilon (t) \vert \right) \,, \end{aligned}$$

with \(\Upsilon (x)=\rho (x)-1/{\check{\tau }}\). So,

$$\begin{aligned} \int ^{n}_{0}\vert \delta (t)\vert \mathrm {d}t\le n \vert f\vert ^{2}_{*}\, \vert \rho \vert _{*}\, \left( \vert \Upsilon \vert _{1} + \int ^{n}_{0} \vert \Upsilon (t) \vert \mathrm {d}t \right) \le 2 n \vert f\vert ^{2}_{*}\, \vert \rho \vert _{*}\, \vert \Upsilon \vert _{1} \,, \end{aligned}$$

and, therefore,

$$\begin{aligned} \left| \mathbf{E}_{Q} \int _{0}^{n} I^2_{t-}(f)\, g(t)\, \mathrm {d}m_{t} \right| \le 2 \varrho ^{2}_{3} \vert g\vert _{*} \vert f\vert ^{2}_{*}\, \vert \Upsilon \vert _{1} \, n \end{aligned}$$

and this finishes the proof. \(\square \)

Lemma 6.5

Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold true. Then, for any measurable bounded non-random functions f and g, one has

$$\begin{aligned} \mathbf{E}_{Q} \int _{0}^{n} I^2_{t-}(f) I_{t-}(g) g(t) d\xi _{t} = 0\,. \end{aligned}$$
(6.8)

Proof

Putting \(\check{h}_{t}=I^2_{t-}(f) I_{t-}(g) g(t)\), we can represent the integral in (6.8) as

$$\begin{aligned} I_{n}(\check{h})= \sqrt{{\bar{\varrho }}} I_{n}^{\check{L}}(\check{h}) + \varrho _{3} I_{n}^z(\check{h}) \,, \end{aligned}$$
(6.9)

where \(\check{L}= \check{\varrho _{1}} w_t + \check{\varrho _{2}} L_t\), \(\check{\varrho _{1}}=\varrho _{1}/\sqrt{{\bar{\varrho }}}\) and \(\check{\varrho _{2}}=\varrho _{2}/\sqrt{{\bar{\varrho }}}\). First, we will show that

$$\begin{aligned} \mathbf{E}_{Q}I_{n}^{\check{L}}(\check{h})\,=0 \,. \end{aligned}$$
(6.10)

Using the notations (6.4), we set

$$\begin{aligned} J_{1}=\int _{0}^{n} I^2_{t-}(f) I_{t-}^{\check{L}}(g) g(t) \mathrm {d}\check{L}_{t} \quad \text{ and }\quad J_{2}=\int _{0}^{n} I^2_{t-}(f) I_{t-}^{z}(g) g(t) \mathrm {d}\check{L}_{t}, \end{aligned}$$

so that

$$\begin{aligned} \int _{0}^{n} \, I^2_{t-}(f) I_{t-}(g) g(t) \mathrm {d}\check{L}_{t}= \sqrt{{\bar{\varrho }}} \, \,J_{1} + \varrho _{3}\,J_{2}\,. \end{aligned}$$
(6.11)

Taking into account that, for any non-random square integrable function g, the integral \(\int _{0}^{t}\, g(s) \mathrm {d}w_{s}\) is Gaussian with parameters \(\left( 0, \int ^{t}_{0}\,g^{2}(s)\mathrm {d}s\right) \), we obtain

$$\begin{aligned} \sup _{0 \le t\le n} \mathbf{E}\, \left( I_{t}^{w}(g) \right) ^{8} < \infty \,. \end{aligned}$$
(6.12)

By applying the inequality (A.1) to the non-random function \(h(s,x)=g(s)x\) and recalling that \(\Pi (x^{8})<\infty \), we obtain

$$\begin{aligned} \sup _{0 \le t\le n} \mathbf{E}\, \left( I_{t}^{L}(g) \right) ^{8} < \infty \,. \end{aligned}$$

Therefore, we obtain that

$$\begin{aligned} \sup _{0 \le t\le n} \mathbf{E}_{Q} \left( I_{t}^{\check{L}}(g) \right) ^{8} < \infty \,. \end{aligned}$$

Finally, by the Cauchy–Schwarz inequality, we can estimate, for any \(0<t\le n\), the following expectation as

$$\begin{aligned} \mathbf{E}_{Q}(I^{\check{L}}_{t}(f))^{4} (I_{t}^{\check{L}}(g))^{2} \le \sqrt{\mathbf{E}_{Q}(I^{\check{L}}_{t}(f))^{8}} \sqrt{\mathbf{E}_{Q}(I^{\check{L}}_{t}(g))^{4}} \end{aligned}$$

i.e.,

$$\begin{aligned} \sup _{0\le t\le n}\, \mathbf{E}_{Q}\left( I^{\check{L}}_{t}(f)\right) ^{4} \left( I_{t}^{\check{L}}(g)\right) ^{2} < \infty \,. \end{aligned}$$

Moreover, taking into account that the processes \((\check{L}_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent, we obtain that

$$\begin{aligned} \mathbf{E}_{Q}(I^{z}_{t}(f))^{4} \left( I_{t}^{\check{L}}(g)\right) ^{\,2} = \mathbf{E}_{Q}(I^{z}_{t}(f))^{4} \mathbf{E}_{Q}(I_{t}^{\check{L}}(g))^{2} = \int ^{t}_{0}\,g^{2}(s)\mathrm {d}s \, \mathbf{E}\, (I^{z}_{t}(f))^{4}\,. \end{aligned}$$

One can check directly here that, for \(t>0,\)

$$\begin{aligned} \mathbf{E}\, |I_{t}^z(f)|^4 \le \vert f\vert _{*}^{4}\, \mathbf{E}\, Y_{1}^4\, \mathbf{E}\, N_{t}^2\,, \end{aligned}$$

where \(\vert f\vert _{*}= \sup _{0 \le t\le n} \,|f(t)|\). Note that Corollary 5.3 yields \(\mathbf{E}\,N_{t}^2\,< \infty \), therefore \( \sup _{0 \le t\le n}\, \mathbf{E}_{Q} (I_{t}^z(f))^4 < \infty \) and we obtain,

$$\begin{aligned} \sup _{0 \le t\le n}\, \mathbf{E}_{Q} (I_{t}(f))^4 (I_{t}^{\check{L}}(g))^{2} < \infty \,. \end{aligned}$$

Taking into account that the process \((\check{L}_{t})_{t\ge 0}\) is a square integrable martingale with the quadratic characteristic \(<\check{L}>_{t}=t\), we obtain that \(\mathbf{E}J_{1}=0\). As to the last term in (6.11), note that, similarly to the previous reasoning, we obtain that

$$\begin{aligned} \mathbf{E}_{Q} \int _{0}^{n} \left( I_{t-}^{\check{L}}(f)\right) ^2 I_{t-}^z(g) g(t) \mathrm {d}\check{L}_{t} = 0 \quad \text{ and }\quad \mathbf{E}_{Q} \int _{0}^{n} I_{t-}^{\check{L}}(f) I_{t-}^z(f) I_{t-}^z(g) g(t) \mathrm {d}\check{L}_{t}=0 \,. \end{aligned}$$

Therefore, to show (6.10) one needs to check that

$$\begin{aligned} \mathbf{E}_{Q} I^{\check{L}}_{n}(\check{h}^{z})=0\,, \end{aligned}$$
(6.13)

where \(\check{h}^{z}_{t}=(I^{z}_{t-}(f))^{2} I^{z}_{t-}(g) g(t)\). To this end, note that, for any \(0<t\le n\)

$$\begin{aligned} I_{t}^z(f)=\sum ^{\infty }_{k=1}\,f(T_{k})\,Y_{k}\,\mathbf{1}_{\{T_{k}\le t\}} = \sum ^{N_{n}}_{k=1}\,f(T_{k})\,Y_{k}\,\mathbf{1}_{\{T_{k}\le t\}}\,, \end{aligned}$$
(6.14)

i.e.,

$$\begin{aligned} I^{\check{L}}_{n}(\check{h}^{z}) = \sum ^{N_{n}}_{k=1}\, \sum ^{N_{n}}_{l=1}\, \sum ^{N_{n}}_{j=1}\, f(T_{k})\, f(T_{l})\, g(T_{j})\,Y_{j} Y_{l}\,Y_{k}\, I_{klj} \,, \end{aligned}$$

where \( I_{klj}= \int _{0}^{n} \mathbf{1}_{\{T_{k}\le t\}} \mathbf{1}_{\{T_{l}\le t\}} \mathbf{1}_{\{T_{j}\le t\}} g(t) \mathrm {d}\check{L}_{t}\). Taking into account that the process \((\check{L}_{t})_{t\ge 0}\) is independent of the field \({{\mathcal {G}}}_{z}=\sigma \{z_{t}\,,t\ge 0\},\) we obtain that \(\mathbf{E}_{Q}\left( I_{klj}\vert {{\mathcal {G}}}_{z} \right) =0\) and

$$\begin{aligned} \mathbf{E}_{Q}\left( I^{2}_{klj}\vert {{\mathcal {G}}}_{z} \right) =\int _{0}^{n} \mathbf{1}_{\{T_{k}\le t\}} \mathbf{1}_{\{T_{l}\le t\}} \mathbf{1}_{\{T_{j}\le t\}} g^{2}(t) \mathrm {d}t \le \Vert g\Vert ^{2}_{n}<\infty \,. \end{aligned}$$

Moreover,

$$\begin{aligned} \mathbf{E}_{Q} \vert I^{\check{L}}_{n}(\check{h}^{z}) \vert&\le \vert g\vert _{*} \vert f\vert ^{2}_{*}\Vert g\Vert _{n}\, \mathbf{E}\, \sum ^{N_{n}}_{k=1}\, \sum ^{N_{n}}_{l=1}\, \sum ^{N_{n}}_{j=1}\, \vert Y_{j}\vert \vert Y_{l}\vert \,\vert Y_{k}\vert \\&\le \mathbf{E}\,\vert Y_{1}\vert ^{3} \vert g\vert _{*} \vert f\vert ^{2}_{*}\Vert g\Vert _{n}\, \mathbf{E}N^{3}_{n} \,. \end{aligned}$$

Corollary 5.3 implies that \(\mathbf{E}N^{3}_{n}<\infty \), i.e. \(\mathbf{E}_{Q} \vert I^{\check{L}}_{n}(\check{h}^{z}) \vert <\infty \), and, therefore,

$$\begin{aligned} \mathbf{E}_{Q} I^{\check{L}}_{n}(\check{h}^{z}) =\mathbf{E}_{Q} \sum ^{N_{n}}_{k=1}\, \sum ^{N_{n}}_{l=1}\, \sum ^{N_{n}}_{j=1}\, f(T_{k})\, f(T_{l})\, g(T_{j})\,Y_{j} Y_{l}\,Y_{k}\, \mathbf{E}_{Q}\left( I_{klj}\vert {{\mathcal {G}}}_{z} \right) =0. \end{aligned}$$

So, we obtain (6.13). Furthermore, to study the last term in (6.9), note that \((z_{t})_{t\ge 0}\) is a martingale of bounded variation. Moreover, in view of the definition of \(\check{h}\) we get

$$\begin{aligned} \mathbf{E}_{Q} \int ^{n}_{0} \vert \check{h}_{t}\vert \mathrm {d}[z]_{t}&= \mathbf{E}_{Q}\sum ^{N_{n}}_{k=1} \vert \check{h}_{T_{k}}\vert Y^{2}_{k} = \mathbf{E}_{Q}\sum ^{N_{n}}_{k=1} \vert \check{h}_{T_{k}}\vert \\&\le \frac{\vert g\vert _{*}}{2} \left( \sum _{k\ge 1}\mathbf{E}_{Q}\, I^{4}_{T_{k^-}}(f) \mathbf{1}_{\{T_{k}\le n\}} + \sum _{k\ge 1}\mathbf{E}_{Q}\,I^{2}_{T_{k^-}}(g)\mathbf{1}_{\{T_{k}\le n\}} \right) \,. \end{aligned}$$

Recalling here that the processes \((\check{L}_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent and using (6.12), we obtain that, for \(1\le m\le 2\) and any bounded function f,

$$\begin{aligned} \sum _{k\ge 1}\, \mathbf{E}_{Q}\left( I^{\check{L}}_{T_{k^-}}(f)\right) ^{2m} \mathbf{1}_{\{T_{k}\le n\}}&\le \sup _{0\le t\le n}\, \mathbf{E}_{Q}\left( I^{\check{L}}_{t}(f)\right) ^{2m} \mathbf{E}\, \sum _{k\ge 1}\, \mathbf{1}_{\{T_{k}\le n\}}\\&= \sup _{0\le t\le n}\, \mathbf{E}_{Q}\left( I^{\check{L}}_{t}(f)\right) ^{2m} \mathbf{E}N_{n} <\infty \,. \end{aligned}$$

Moreover, from (6.14) through the Hölder inequality we obtain that

$$\begin{aligned} \left( I^{z}_{T_{k^-}}(f)\right) ^{2m}= \left( \sum ^{k-1}_{j=1}\,f(T_{j})\,Y_{j} \right) ^{2m} \le \vert f\vert ^{2m}_{*}(k-1)^{2m-1}\sum ^{k-1}_{j=1}\,Y^{2m}_{j} \,. \end{aligned}$$

Taking into account that the sequences \((Y_{j})_{j\ge 1}\) and \((T_{k})_{k\ge 1}\) are independent, we obtain through Corollary 5.3 that

$$\begin{aligned}&\sum _{k\ge 1}\, \mathbf{E}\, \left( I^{z}_{T_{k^-}}(f)\right) ^{2m} \mathbf{1}_{\{T_{k} \le n\}} \le \vert f\vert ^{2m}_{*}\, \mathbf{E}\, Y^{2m}_{1}\, \mathbf{E}\, \sum _{k\ge 1}\,k^{2m} \mathbf{1}_{\{T_{k}\le n\}}\\&\quad = \vert f\vert ^{2m}_{*}\, \mathbf{E}\, Y^{2m}_{1}\, \mathbf{E}\, \sum ^{N_{n}}_{k=1}\,k^{2m} \le \vert f\vert ^{2m}_{*}\, \mathbf{E}\, Y^{2m}_{1}\, \mathbf{E}\, N^{2m+1}_{n} <\infty \,. \end{aligned}$$

Thus \(\mathbf{E}_{Q} \int ^{n}_{0} \vert \check{h}_{t}\vert \mathrm {d}[z]_{t}<\infty \); therefore, \(\mathbf{E}_{Q}I_{n}^z(\check{h})=0\) and we obtain the equality (6.8). \(\square \)

7 Properties of the regression model (1.1)

In order to prove the non-asymptotic sharp oracle inequalities we use the method proposed in Konev and Pergamenshchikov (2009a) and Konev and Pergamenshchikov (2012) for the general semi-martingale model (1.1). To this end we need to study the following functions of \(x\in {{\mathbb {R}}}^{n}\)

$$\begin{aligned} B_{1,Q,n}(x)= \sum _{j=1}^{n} x_{j} \, \left( \mathbf{E}_{Q}\xi ^2_{j,n} - \sigma _{Q}\right) \quad \text{ and }\quad B_{2,Q,n}(x)= \sum _{j=1}^{n}\,x_{j}\,\widetilde{\xi }_{j,n} \,, \end{aligned}$$
(7.1)

where \(\sigma _{Q}\) is defined in (2.11), \(\widetilde{\xi }_{j,n}=\xi ^2_{j,n}- \mathbf{E}_{Q}\xi ^2_{j,n}\) and \(\xi _{j,n}\) is given in (3.5). These functions describe, respectively, the deviation of the total noise intensity from \(\sigma _{Q}\) and the stochastic fluctuations of the noise in the Fourier coefficients used in the estimators (3.7).

Remark 7.1

Propositions 7.1–7.2 given below are used to obtain the oracle inequalities in Sect. 4 (see, for example, Konev and Pergamenshchikov 2012).

Proposition 7.1

Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold. Then

$$\begin{aligned} \sup _{x\in [-1,1]^{n}}\,\vert B_{1,Q,n}(x)\vert \le \mathbf{C}_{1,Q,n} \,, \end{aligned}$$
(7.2)

where \(\mathbf{C}_{1,Q,n}= \sigma _{Q}\,{\check{\tau }}\,\phi ^{2}_{max}\,\vert \Upsilon \vert _{1}\), \(\sigma _{Q}={\bar{\varrho }}+ \varrho _{3}^2/{\check{\tau }}\) and \(\Upsilon (x)=\rho (x)-1/{\check{\tau }}\).

Proof

First, taking into account that \(\xi _{j,n}=n^{-1/2}I_{n}(\phi _{j})\) and \(\Vert \phi _{j}\Vert ^{2}_{n}=n\), we obtain through Lemma 6.1 that

$$\begin{aligned} \mathbf{E}_{Q}{\xi ^2_{j,n}} = {\bar{\varrho }} + \frac{ \varrho _{3}^2}{ n}\int ^{n}_{0}\,\phi ^2_{j}(x)\,\rho (x)\mathrm {d}x =\sigma _{Q} + \frac{ \varrho _{3}^2}{ n}\int ^{n}_{0}\,\phi ^2_{j}(x)\,\Upsilon (x)\mathrm {d}x \,. \end{aligned}$$

So, in view of Condition (3.1) and the Eq. (4.1), we obtain

$$\begin{aligned} \left| \mathbf{E}_{Q}\xi ^2_{j,n} - \sigma _{Q} \right| =\frac{ \varrho _{3}^2}{n}\, \left| \int _{0}^{n} \phi ^2_{j}(x)\Upsilon (x) \mathrm {d}\,x \right| \le \, \frac{ \varrho _{3}^2}{n}\, \phi ^2_{max}\, \vert \Upsilon \vert _{1}\,. \end{aligned}$$

Bounding here \( \varrho _{3}^2\) by \(\sigma _{Q}{\check{\tau }}\) and summing this bound over \(1\le j\le n\) (the components of x satisfy \(\vert x_{j}\vert \le 1\)), we obtain the inequality (7.2), and hence the conclusion follows. \(\square \)

Proposition 7.2

Assume that Conditions \((\mathbf{H}_{1})\)–\((\mathbf{H}_{4})\) hold. Then

$$\begin{aligned} \sup _{\vert x\vert \le 1}\, \mathbf{E}_{Q}\, B^{2}_{2,Q,n}(x) \, \le \,\mathbf{C}_{2,Q,n}, \end{aligned}$$
(7.3)

where \(\vert x\vert ^{2}=\sum ^{n}_{j=1}x^{2}_{j}\), \(\mathbf{C}_{2,Q,n}=\phi ^{4}_{max} (1+\sigma ^{2}_{Q})^{2}\,{\check{\mathbf{l}}} \) and \({\check{\mathbf{l}}}\) is given in (4.3).

Proof

By Ito’s formula one gets

$$\begin{aligned} \mathrm {d}I^2_{t}(f)= 2 I_{t-}(f) \mathrm {d}I_{t}(f)+ \varrho _{1}^2\, f^2(t) \mathrm {d}t + \mathrm {d}\sum _{0\le s \le t}f^2(s) (\Delta \xi ^{d}_{s})^2\,. \end{aligned}$$
(7.4)

Using the representations (6.2) and (6.3), we can rewrite this differential as

$$\begin{aligned} \mathrm {d}I^2_{t}(f)&= 2 I_{t-}(f) \mathrm {d}I_{t}(f)+ \varrho ^2_{1}\, f^2(t) \mathrm {d}\,t\nonumber \\&\quad + \varrho ^2_{2}\mathrm {d}\sum _{0\le s \le t}f^2(s) (\Delta L_{s})^2 + \varrho ^{2}_{3}\mathrm {d}\sum _{0\le s \le t}f^2(s) (\Delta z_{s})^2 \,. \end{aligned}$$
(7.5)

From Lemma 6.1 it follows that \( \mathbf{E}_{Q}\, I^2_{t}(f) = {\bar{\varrho }}\, \Vert f\Vert ^{2}_{t} + \varrho ^{2}_{3} \Vert f\sqrt{\rho }\Vert ^{2}_{t}\). Therefore, putting

$$\begin{aligned} \widetilde{I}_{t}(f) = I_{t}^2(f) -\mathbf{E}_{Q} I^2_{t}(f)\,, \end{aligned}$$
(7.6)

we obtain

$$\begin{aligned} \mathrm {d}\widetilde{I}_{t}(f)= 2 I_{t-}(f) f(t) d\xi _{t} + f^2(t) \mathrm {d}\widetilde{m}_{t}\,,\quad \widetilde{m}_{t}=\varrho _{2}^2\check{m}_{t}+\varrho _{3}^2m_{t}\,, \end{aligned}$$
(7.7)

where \( \check{m}_{t}= \sum _{0\le s \le t}(\Delta L_{s})^2 - t\) and \( m_{t} = \sum _{0\le s \le t}(\Delta z_{s})^2 - \int _{0}^{t} \rho (s) \mathrm {d}s\). For any non-random vector \(x= (x_{j})_{1\le j\le n}\) with \(\sum ^{n}_{j= 1} x^{2}_{j} \le 1\), we set

$$\begin{aligned} {\bar{I}}_{t} = {\bar{I}}_{t}(x) = \sum _{j=1}^{n} x_{j} \widetilde{I}_{t}(\phi _{j}). \end{aligned}$$
(7.8)

Denoting

$$\begin{aligned} A_{t} = \sum ^{n}_{j=1} x_{j} I_{t}(\phi _{j}) \phi _{j}(t) \quad \text{ and }\quad B_{t} = \sum ^{n}_{j=1} x_{j} \phi ^2_{j}(t)\,, \end{aligned}$$
(7.9)

we get the following stochastic differential equation for (7.8)

$$\begin{aligned} \mathrm {d}{\bar{I}}_{t} = 2 A_{t-} \mathrm {d}\xi _{t} + B_{t} \mathrm {d}\widetilde{m}_{t} \,, \quad {\bar{I}}_{0}(x)=0\,. \end{aligned}$$

Now, similarly to (7.4) the Ito formula and representation (6.2) yield

$$\begin{aligned} \mathrm {d}\,{\bar{I}}^{2}_{t}&=2 {\bar{I}}_{t-} \mathrm {d}{\bar{I}}_{t}+ \mathrm {d}<{\bar{I}}^{c}>_{t} +\mathrm {d}\sum _{0\le s\le t}\left( \Delta {\bar{I}}_{s}\right) ^{2} \\&=2 {\bar{I}}_{t-} \mathrm {d}{\bar{I}}_{t}+4\varrho ^{2}_{1}\,A^{2}_{t} \mathrm {d}t +\mathrm {d}\sum _{0\le s\le t}\left( 2 A_{s-} \Delta \xi ^{d}_{s} + B_{s}\,\Delta \widetilde{m}_{s} \right) ^{2} \,. \end{aligned}$$

Taking into account that the processes \((L_{t})_{t\ge 0}\) and \((z_{t})_{t\ge 0}\) are independent, we obtain that \(\Delta L_{t} \Delta z_{t}=0\), therefore, for any \(t\ge 0\)

$$\begin{aligned} \left( 2 A_{t-} \Delta \xi ^{d}_{t} + B_{t}\,\Delta \widetilde{m}_{t} \right) ^{2}&= \left( 2\varrho _{2}A_{t-} \Delta L_{t}+\varrho ^{2}_{2} B_{t} (\Delta L_{t})^{2}\right) ^{2}\\&\quad + \left( 2\varrho _{3}A_{t-} \Delta z_{t} + \varrho ^{2}_{3} B_{t}(\Delta z_{t})^{2} \right) ^{2} \,. \end{aligned}$$

This implies

$$\begin{aligned} \mathbf{E}_{Q}{\bar{I}}_n^2&= 2\mathbf{E}_{Q} \int _{0}^{n} {\bar{I}}_{t-} \mathrm {d}{\bar{I}}_{t} + 4 \varrho _{1}^2 \mathbf{E}_{Q}\int _{0}^{n} A_{t}^2 \mathrm {d}\,t \nonumber \\&\quad +\varrho _{2}^2\ \mathbf{E}_{Q} \check{D}_{n} + \varrho _{3}^2\,\mathbf{E}_{Q}D_{n}\,, \end{aligned}$$
(7.10)

where \(\check{D}_{n}=\sum _{0\le t\le n}\left( 2A_{t-} \Delta L_{t}+\varrho _{2} B_{t} (\Delta L_{t})^{2}\right) ^{2}\) and

$$\begin{aligned} D_{n}= \sum _{0\le t\le n} \left( 2A_{t-} \Delta z_{t} + \varrho _{3} B_{t}(\Delta z_{t})^{2} \right) ^{2} \,. \end{aligned}$$

It should be noted here that

$$\begin{aligned} \mathbf{E}_{Q}B^{2}_{2,Q,n}(x) =\frac{1}{n^{2}}\mathbf{E}_{Q}{\bar{I}}^{2}_n(x) \,. \end{aligned}$$
(7.11)

Let us now show that

$$\begin{aligned} \left| \mathbf{E}_{Q} \int _{0}^{n} {\bar{I}}_{t-} d{\bar{I}}_{t} \right| \le 2\, \varrho _{3}^4 \phi ^{3}_{max}\,\vert \Upsilon \vert _{1}\, n^{2}\,. \end{aligned}$$
(7.12)

To this end, note that by (7.7)

$$\begin{aligned} \int _{0}^{n} {\bar{I}}_{t-} \mathrm {d}{\bar{I}}_{t}&= 2 \sum _{1\le j,l\le \,n} x_{j} x_{l} \int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) \,I_{t-}(\phi _{l})\phi _{l}(t) \mathrm {d}\xi _{t} \\&\quad + \sum _{j=1}^{n} x_{j} \int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) B_{t} \mathrm {d}\widetilde{m}_{t} \,. \end{aligned}$$

Using here Lemma 6.5, we get \( \mathbf{E}_{Q} \int _{0}^{n}\, \widetilde{I}_{t-}(\phi _{j})\, I_{t-}(\phi _{i}) \phi _{i}(t)\mathrm {d}\xi _{t} =0\). Moreover, the process \((\check{m}_{t})_{t\ge 0}\) is a martingale, i.e. \(\mathbf{E}_{Q} \int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) B_{t} \mathrm {d}\check{m}_{t}=0\). Therefore,

$$\begin{aligned} \mathbf{E}_{Q} \int _{0}^{n} {\bar{I}}_{t-} d{\bar{I}}_{t} = \varrho ^{2}_{3} \sum _{j=1}^{n} x_{j}\mathbf{E}_{Q} \int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) B_{t} \mathrm {d}m_{t}\,. \end{aligned}$$

Taking into account here that for any non-random bounded function f

$$\begin{aligned} \mathbf{E}_{Q}\int _{0}^{n} f(t) \mathrm {d}m_{t}=0, \end{aligned}$$

we obtain \(\mathbf{E}_{Q}\int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) \,B_{t} \, \mathrm {d}m_{t} = \mathbf{E}_{Q}\int _{0}^{n} I^2_{t-} (\phi _{j})\,B_{t} \, \mathrm {d}m_{t}\). So, Lemma 6.4 yields

$$\begin{aligned} \left| \mathbf{E}_{Q}\int _{0}^{n} \widetilde{I}_{t-}(\phi _{j}) \,B_{t} \mathrm {d}m_{t}\right|&= \left| \sum _{l=1}^{n} x_{l} \mathbf{E}_{Q}\int _{0}^{n} I^2_{t-} (\phi _{j}) \phi ^2_{l}(t) \mathrm {d}m_{t} \right| \\&\le \,2\,\varrho ^{2}_{3}\phi ^{3}_{max}\,\vert \Upsilon \vert _{1}\, \sum _{l=1}^{n} \vert x_{l}\vert n\,. \end{aligned}$$

Therefore,

$$\begin{aligned} \left| \mathbf{E}_{Q} \int _{0}^{n} {\bar{I}}_{t-} d{\bar{I}}_{t} \right|&\le 2\, \varrho _{3}^4 \phi ^{3}_{max}\,\vert \Upsilon \vert _{1}\, n \sum _{1\le l,j\le \,n}\, \vert x_{l} \vert \, \vert x_{j} \vert \\&= 2\, \varrho _{3}^4 \phi ^{3}_{max}\,\vert \Upsilon \vert _{1}\, n \left( \sum _{l=1}^{n}\, \vert x_{l} \vert \right) ^{2}. \end{aligned}$$

Taking into account here that \( \left( \sum ^{n}_{l=1}\, \vert x_{l} \vert \right) ^{2} \le \, n \sum _{l\ge \,1}\, x^{2}_{l}\le n\), we obtain (7.12). Recall that \(\Pi (x^{2})=1\). Using the definition (2.1) and the properties of the jump measures (see, for example, Liptser and Shiryaev 1986, chapter 3) we obtain that

$$\begin{aligned} \mathbf{E}_{Q}\sum _{0\le t\le n} (\Delta L_{t})^{4}= \mathbf{E}_{Q} \int ^{n}_{0}\int _{{{\mathbb {R}}}_{*}}x^{4}\,\mu (\mathrm {d}t,\mathrm {d}x)= \mathbf{E}_{Q} \int ^{n}_{0}\int _{{{\mathbb {R}}}_{*}}x^{4}\,\widetilde{\mu }(\mathrm {d}t,\mathrm {d}x) = n \Pi (x^{4}) \end{aligned}$$

and

$$\begin{aligned} \mathbf{E}_{Q}\sum _{0\le t\le n} B_{t}A_{t-} (\Delta L_{t})^{3}&= \mathbf{E}_{Q} \int ^{n}_{0}\int _{{{\mathbb {R}}}_{*}} B_{t}A_{t-}x^{3}\,\mu (\mathrm {d}t,\mathrm {d}x)\\&= \mathbf{E}_{Q} \int ^{n}_{0}\int _{{{\mathbb {R}}}_{*}} B_{t}A_{t}x^{3}\,\widetilde{\mu }(\mathrm {d}t,\mathrm {d}x) =\Pi (x^{3})\,\int ^n_0\,B_{t} \, \mathbf{E}_{Q} A_{t}\, \mathrm {d}t = 0\,. \end{aligned}$$

From this it follows directly that

$$\begin{aligned} \mathbf{E}_{Q}\check{D}_{n} =4\,\mathbf{E}_{Q}\int ^{n}_{0}\,A^{2}_{t}\mathrm {d}t + \varrho _{2}^4 \,\Pi (x^{4}) \int ^{n}_{0}\,B^{2}_{t} \mathrm {d}t \,. \end{aligned}$$
(7.13)

Note that, thanks to Lemma 6.1, we obtain

$$\begin{aligned} \mathbf{E}_{Q} \int _{0}^{n} A^{2}_{t} \mathrm {d}\,t&= \sum _{i,j} x_{i} x_{j} \int _{0}^{n} \phi _{i}(t)\phi _{j}(t)\, \mathbf{E}_{Q}\, I_{t}(\phi _{i})\,I_{t}(\phi _{j})\, \mathrm {d}\,t \\&= \sum _{i,j} x_{i} x_{j} \int _{0}^{n} \phi _{i}(t)\phi _{j}(t) \,\int _{0}^{t}\, \phi _{i}(v)\phi _{j}(v)({\bar{\varrho }}+\varrho _{3}^2\rho (v)) \mathrm {d}v\, \mathrm {d}t\\&= \frac{1}{2}{\bar{\varrho }}\, \sum _{i,j} x_{i} x_{j}\left( \int _{0}^{n} \phi _{i}(t)\phi _{j}(t)\mathrm {d}t\right) ^{2} +\varrho _{3}^2\,A_{1,n} \le \frac{n^{2}}{2}{\bar{\varrho }} + \varrho _{3}^2\,A_{1,n}\,, \end{aligned}$$

where \(A_{1,n}=\sum _{i,j} x_{i} x_{j} \int _{0}^{n} \phi _{i}(t)\phi _{j}(t) \,\left( \int _{0}^{t}\, \phi _{i}(v)\phi _{j}(v)\,\rho (v) \mathrm {d}v\right) \mathrm {d}t\). This term can be estimated through Proposition 5.2 as

$$\begin{aligned} \left| A_{1,n}\right|&= \left| \frac{n^{2}}{2{\check{\tau }}}\,+\, \sum _{i,j} x_{i} x_{j} \int _{0}^{n} \phi _{i}(t)\phi _{j}(t) \,\left( \int _{0}^{t}\, \phi _{i}(v)\phi _{j}(v)\,\Upsilon (v) \mathrm {d}v\right) \mathrm {d}t \right| \\&\le \frac{n^{2}}{2{\check{\tau }}}\,+n \,\phi ^{4}_{max}\, \vert \Upsilon \vert _{1}\, \sum _{i,j} \vert x_{i}\vert \vert x_{j}\vert \le \, \left( \frac{1}{2{\check{\tau }}}\,+\,\phi ^{4}_{max}\, \vert \Upsilon \vert _{1} \right) n^{2}\,. \end{aligned}$$

So, recalling that \(\sigma _{Q}={\bar{\varrho }}+\varrho _{3}^2/{\check{\tau }}\) and that \(\phi _{max}\ge 1\), we obtain that

$$\begin{aligned} \mathbf{E}_{Q}\int _{0}^{n} A^{2}_{t} \mathrm {d}\,t&\le \left( \frac{\sigma _{Q}}{2} +\varrho _{3}^2\phi ^{4}_{max}\, \vert \Upsilon \vert _{1} \right) n^{2}\nonumber \\&\le \left( \frac{1}{2} +{\check{\tau }} \vert \Upsilon \vert _{1} \right) \phi ^{4}_{max}\, \sigma _{Q} \, n^{2} \,. \end{aligned}$$
(7.14)

Taking into account that

$$\begin{aligned} \sup _{t\ge 0} B^2_{t}\,\le \,\phi ^{4}_{max}\, \left( \sum ^{n}_{j=1} \vert x_{j}\vert \right) ^{2} \le \,\phi ^{4}_{max}\, n\,, \end{aligned}$$
(7.15)

that \(\phi _{max}\ge 1,\) and that \({\bar{\varrho }}^2 \le \sigma ^{2}_{Q}\) we estimate the expectation in (7.13) as

$$\begin{aligned} \mathbf{E}_{Q}\check{D}_{n}\le \phi ^{4}_{max}(1+\sigma ^{2}_{Q}) \left( 1+2{\check{\tau }}\vert \Upsilon \vert _{1}+\Pi (x^{4}) \right) \,n^{2} \,. \end{aligned}$$
(7.16)

Moreover, taking into account that the random variable \(Y_{k}\) is independent of \(A_{T_{k^-}}\) and of the field \({{\mathcal {G}}}=\sigma \{T_{j}\,,\,j\ge 1\}\) and that \(\mathbf{E}_{Q}\left( A_{T_{k^-}}\,\vert {{\mathcal {G}}}\right) =0\), we get

$$\begin{aligned}&\mathbf{E}_{Q}\sum _{k=1}^{+\infty } B_{T_{k}}\,A_{T_{k^-}} Y_{k}^3 1_{\{T_{k} \le n\}} = \sum _{k=1}^{+\infty }\mathbf{E}_{Q}\left( \mathbf{E}_{Q} \left( B_{T_{k}}\,A_{T_{k^-}} Y_{k}^3 1_{\{T_{k} \le n\}}\vert {{\mathcal {G}}}\right) \right) \\&\quad = \mathbf{E}_{Q}Y_{1}^3\,\mathbf{E}_{Q}\sum _{k=1}^{+\infty } B_{T_{k}} 1_{\{T_{k} \le n\}}\, \mathbf{E}_{Q} (A_{T_{k^-}}\vert {{\mathcal {G}}}) =0\,. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathbf{E}_{Q} \,D_{n} = \varrho _{3}^2 D_{1,n}\,\mathbf{E}\, Y_1^4 + 4 D_{2,n}\,, \end{aligned}$$
(7.17)

where

$$\begin{aligned} D_{1,n} = \sum _{k=1}^{+\infty } \mathbf{E}\, B^2_{T_{k}} \mathbf{1}_{\{T_{k} \le n\}} \quad \text{ and }\quad D_{2,n}= \sum _{k=1}^{+\infty } \mathbf{E}_{Q} A^2_{T_{k^-}}\mathbf{1}_{\{T_{k} \le n\}}\,. \end{aligned}$$

Using the bound (7.15) we can estimate the term \(D_{1,n}\) as \( D_{1,n} \le \phi ^{4}_{max} n \mathbf{E}\,N_{n}\). Using here Corollary 5.3, we obtain

$$\begin{aligned} D_{1,n} \le \,\vert \rho \vert _{*} \phi ^{4}_{max} \,n^{2}\,. \end{aligned}$$
(7.18)

Now, to estimate the last term in (7.17), note that the process \(A_{t} \) can be rewritten as

$$\begin{aligned} A_{t} = \int _{0}^{t} \mathbf{K}(t,s) \mathrm {d}\xi _{s}, \qquad \mathbf{K}(t,s) = \sum ^{n}_{j=1} x_{j} \phi _{j}(s) \phi _{j}(t). \end{aligned}$$
(7.19)

Applying Lemma 6.3 again, we obtain for any \(k\ge 1\)

$$\begin{aligned} \mathbf{E}_{Q}\left( A^2_{T_{k^{-}}}\vert {{\mathcal {G}}}\right) = {\bar{\varrho }}\, \int _{0}^{T_{k}}\,\mathbf{K}^{2}(T_{k},s) \mathrm {d}s + \varrho ^{2}_{3}\, \sum ^{k-1}_{j=1}\, \mathbf{K}^{2}(T_{k},T_{j}) \,. \end{aligned}$$

So, we can represent the last term in (7.17) as

$$\begin{aligned} D_{2,n}= {\bar{\varrho }}\,D^{(1)}_{2,n} + \varrho ^{2}_{3}\,D^{(2)}_{2,n} \,, \end{aligned}$$
(7.20)

where

$$\begin{aligned} D^{(1)}_{2,n}=\sum _{k=1}^{+\infty } \mathbf{E}\mathbf{1}_{\{T_{k} \le n\}}\, \int _{0}^{T_{k}}\,\mathbf{K}^{2}(T_{k},s) \mathrm {d}s \quad \text{ and }\quad D^{(2)}_{2,n}=\sum _{k=1}^{+\infty } \mathbf{E}\mathbf{1}_{\{T_{k} \le n\}}\, \sum ^{k-1}_{j=1}\, \mathbf{K}^{2}(T_{k},T_{j})\,. \end{aligned}$$

Thanks to Proposition 5.2 we obtain

$$\begin{aligned} D^{(1)}_{2,n}= \int ^{n}_{0}\, \int _{0}^{t}\,\mathbf{K}^{2}(t,s) \mathrm {d}s\, \rho (t) \,\mathrm {d}t \,\le \, \vert \rho \vert _{*}\, \int ^{n}_{0}\, \int _{0}^{n}\,\mathbf{K}^{2}(t,s) \mathrm {d}s \,\mathrm {d}t \,. \end{aligned}$$

In view of the definition of \(\mathbf{K}(\cdot ,\cdot )\) in (7.19), we can rewrite the last integral as

$$\begin{aligned} \int _{0}^{n}\,\mathbf{K}^{2}(t,s) \mathrm {d}s&= \sum _{1\le i,j\le n}\,x_{i}\,x_{j}\, \phi _{i}(t)\,\phi _{j}(t)\, \int ^{n}_{0}\, \phi _{i}(s)\,\phi _{j}(s)\,\mathrm {d}s\\&= \sum ^{n}_{i=1}\,x^{2}_{i}\, \phi ^{2}_{i}(t)\, \int ^{n}_{0}\, \phi ^{2}_{i}(s)\,\mathrm {d}s =\,n\, \sum ^{n}_{i=1}\,x^{2}_{i}\, \phi ^{2}_{i}(t) \,. \end{aligned}$$

Since \(\sum ^{n}_{j=1}\,x^{2}_{j}\le 1\), we obtain that,

$$\begin{aligned} \int _{0}^{n}\,\mathbf{K}^{2}(t,s) \mathrm {d}s\, \le \, \phi ^{2}_{max}\, n \quad \text{ and }\quad D^{(1)}_{2,n}\, \le \phi ^{2}_{max}\, \vert \rho \vert _{*}\,n^{2} \,. \end{aligned}$$
(7.21)

Let us estimate now the last term in (7.20). This term can be represented as

$$\begin{aligned} D^{(2)}_{2,n}&=\sum ^{\infty }_{j=1} \,\mathbf{E}\mathbf{1}_{\{T_{j} \le n\}}\,\mathbf{E}\, \left( \sum ^{\infty }_{k=j+1}\mathbf{K}^{2}(T_{k},T_{j})\mathbf{1}_{\{T_{k} \le n\}}\vert T_{j}\right) \\&= \sum ^{\infty }_{j=1}\mathbf{E}\,\mathbf{1}_{\{T_{j} \le n\}}\,{\bar{\mathbf{K}}}(T_{j}) =\int ^{n}_{0}\,{\bar{\mathbf{K}}}(t)\,\rho (t)\mathrm {d}t\,, \end{aligned}$$

where

$$\begin{aligned} {\bar{\mathbf{K}}}(t)&=\mathbf{E}\left( \sum ^{\infty }_{k=j+1}\mathbf{K}^{2}(T_{k},T_{j}) \mathbf{1}_{\{T_{k} \le n\}} \vert T_{j}=t\right) = \sum _{l=1}^{+\infty } \mathbf{E}\mathbf{1}_{\{T_{l}+t \le n\}}\, \mathbf{K}^{2}(t+T_{l},t)\\&= \int ^{n-t}_{0}\, \mathbf{K}^{2}(t+v,t) \, \rho (v)\mathrm {d}v = \int ^{n}_{t}\, \mathbf{K}^{2}(u,t) \, \rho (u-t)\mathrm {d}u \le \vert \rho \vert _{*}\, \int ^{n}_{0}\, \mathbf{K}^{2}(v,t) \,\mathrm {d}v\,. \end{aligned}$$

In view of the inequality (7.21) we obtain \( {\bar{\mathbf{K}}}(t)\le \vert \rho \vert _{*}\,\phi ^{2}_{max}\,n\) and, therefore, \( D^{(2)}_{2,n}\le \vert \rho \vert ^{2}_{*}\,\phi ^{2}_{max}\, n^{2}\). So, bounding in (7.20) \(\varrho ^{2}_{3}\) by \({\check{\tau }}\sigma _{Q}\) we obtain that

$$\begin{aligned} D_{2,n}\le n^2 \sigma _{Q}(1+{\check{\tau }}) \vert \rho \vert ^{2}_{*}\,\phi ^{2}_{max}\,. \end{aligned}$$

Therefore, using in (7.17) that \(\mathbf{E}Y^{4}_{1}\ge 1\), we obtain

$$\begin{aligned} \mathbf{E}_{Q} \,D_{n} \le 5\,(1+{\check{\tau }}) \phi ^{4}_{max}\, \, \mathbf{E}Y_1^4 (1+\vert \rho \vert ^{2}_{*})\, n^{2} \sigma _{Q} \,. \end{aligned}$$
(7.22)

Using all these bounds in (7.10) and taking into account that

$$\begin{aligned} \mathbf{E}_{Q}B^{2}_{2,Q,n}(x)=\frac{1}{n^{2}}\mathbf{E}_{Q}{\bar{I}}^{2}_n(x) \,, \end{aligned}$$

we obtain (7.3) and thus the conclusion follows. \(\square \)

8 Simulation

In this section we report the results of a Monte Carlo experiment in order to assess the performance of the proposed model selection procedure (3.15). In (1.1) we choose the 1-periodic function defined, for \(0\le t\le 1\), as

$$\begin{aligned} S(t)=t \sin (2\pi t)+t^2(1-t) \cos (4\pi t)\,. \end{aligned}$$
(8.1)

We simulate the model

$$\begin{aligned} \mathrm {d}y_{t} = S(t) \mathrm {d}t + \mathrm {d}\xi _{t}\,, \end{aligned}$$

where \(\xi _t= 0.5 w_{t}+ 0.5 z_{t}\).

Here \(z_{t}\) is the semi-Markov process defined in (2.5) with a Gaussian \(\mathcal {N}(0,1)\) sequence \((Y_{j})_{j\ge 1}\), and the inter-jump times \((\tau _k)_{k\ge 1}\) used in (2.6) are taken as \(\tau _k \sim \chi _{3}^2\).
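
For illustration, one possible way to generate a trajectory of these observations is sketched below; the Euler-type discretisation step and the placement of the jumps on the grid are implementation choices of this sketch and are not prescribed by the paper.

```python
import numpy as np

# Illustrative simulation of the observation model of Sect. 8:
# dy_t = S(t) dt + 0.5 dw_t + 0.5 dz_t, where z is the semi-Markov pure-jump
# process with chi^2_3 waiting times and N(0,1) marks.  The step `dt` and the
# way jumps are placed on the grid are choices of this sketch only.
rng = np.random.default_rng(1)

def S(t):
    u = t % 1.0                                      # S is 1-periodic
    return u * np.sin(2 * np.pi * u) + u ** 2 * (1 - u) * np.cos(4 * np.pi * u)

def simulate_observations(n, dt=1e-3):
    t = np.arange(0.0, n, dt)
    dw = rng.normal(0.0, np.sqrt(dt), size=t.size)   # Brownian increments
    # jump times of the semi-Markov part and their Gaussian marks
    T, marks = [], []
    s = rng.chisquare(3)
    while s < n:
        T.append(s)
        marks.append(rng.normal())
        s += rng.chisquare(3)
    dz = np.zeros(t.size)
    if T:
        idx = np.minimum((np.asarray(T) / dt).astype(int), t.size - 1)
        np.add.at(dz, idx, marks)                    # place each mark Y_k at time T_k
    dy = S(t) * dt + 0.5 * dw + 0.5 * dz
    return t, np.cumsum(dy)

t, y = simulate_observations(n=100)                  # one trajectory on [0, 100]
```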

We use the model selection procedure (3.15) with the weights (3.19) in which \(k^*= 100+\sqrt{\ln (n)}\), \(t_{i}=i/ \ln (n)\), \(m=[\ln ^2 (n)]\) and \(\delta =(3+\ln (n))^{-2}\). We define the empirical risk as

$$\begin{aligned} {{\overline{\mathbf{R}}}}= \frac{1}{p} \sum _{j=1}^{p} {{\hat{\mathbf{E}}}} \left( \widehat{S}_n(t_j)-S(t_j)\right) ^2, \end{aligned}$$
(8.2)

where the observation frequency is \(p=100001\) and the expectation is taken as an average over \(N= 10000\) replications, i.e.,

$$\begin{aligned} {{\hat{\mathbf{E}}}} \left( \widehat{S}_n(.)-S(.)\right) ^2 = \frac{1}{N} \sum _{l=1}^{N} \left( \widehat{S}^l_n(.)-S(.) \right) ^2. \end{aligned}$$

We set the relative quadratic risk as

$$\begin{aligned} {\overline{\mathbf{R}_*}}={{\overline{\mathbf{R}}}}/ ||S||^2_{p}, \quad \text{ with }\quad ||S||^2_p = \frac{1}{p} \sum _{j=0}^p S^2(t_j)\,. \end{aligned}$$
(8.3)

In our case \(||S||^2_p = 0.1883601\).
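
Given the estimates produced by the procedure (3.15) on the grid \(t_{j}=j/p\) for each replication, the risks (8.2) and (8.3) can be computed as in the following sketch; the array layout is an assumption of this illustration, and the model selection procedure itself is not implemented here.

```python
import numpy as np

# Sketch of the empirical risk (8.2) and the relative risk (8.3).  `S_true` is
# S evaluated on the grid t_j = j/p; `S_hat_reps` is assumed to be an (N, p+1)
# array containing, for each of the N replications, the estimate of S produced
# by the model selection procedure (3.15) on the same grid.
def empirical_risks(S_true, S_hat_reps):
    mse = np.mean((S_hat_reps - S_true[None, :]) ** 2, axis=0)  # hat E over the N replications
    R_bar = np.mean(mse)                                        # empirical risk (8.2)
    R_rel = R_bar / np.mean(S_true ** 2)                        # relative risk (8.3)
    return R_bar, R_rel
```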

Table 1 gives the values for the sample risks (8.2) and (8.3) for different numbers of observations n.

Table 1 Empirical risks

Figures 1, 2, 3 and 4 show the behaviour of the regression function and of its estimates obtained by the model selection procedure (3.15) for different observation periods n. The solid black line is the regression function (8.1) and the dotted red line is the associated estimator.

Fig. 1 Estimator of S for \(n=20\)

Fig. 2 Estimator of S for \(n=100\)

Fig. 3 Estimator of S for \(n=200\)

Fig. 4 Estimator of S for \(n=1000\)

Remark 8.1

From numerical simulations of the procedure (3.15) with various observation numbers n we may conclude that the quality of the proposed procedure: (i) is good for practical needs, i.e. for a reasonable (moderate) number of observations; (ii) improves as the number of observations increases.

9 Proofs

We will prove here most of the results of this paper.

9.1 Proof of Theorem 4.1

First, note that we can rewrite the empirical squared error in (3.9) as follows

$$\begin{aligned} \text{ Err }_n(\lambda ) = J_n(\lambda ) + 2 \sum _{j=1}^{\infty } \lambda (j) {\check{\theta }}_{j,n}+ ||S||^2-\delta P_n(\lambda )\,, \end{aligned}$$
(9.1)

where the cost function \(J_n(\lambda )\) and the penalty terms are defined in (3.13) and (3.14) respectively, \({\check{\theta }}_{j,n}=\widetilde{\theta }_{j,n}-\theta _{j}\widehat{\theta }_{j,n}\),

$$\begin{aligned} \widehat{\theta }_{j,n}= \theta _{j} + \frac{1}{\sqrt{n}}\xi _{j,n} \quad \text{ and }\quad \widetilde{\theta }_{j,n} = \widehat{\theta }^2_{j,n} - \frac{ \widehat{\sigma }_{n}}{n} \,. \end{aligned}$$

Using the definition of \(\widetilde{\theta }_{j,n}\) in (3.10) we obtain that

$$\begin{aligned} {\check{\theta }}_{j,n}&= \widehat{\theta }^2_{j,n} - \frac{ \widehat{\sigma }_{n}}{n}-\theta _{j} \left( \theta _{j} + \frac{1}{\sqrt{n}}\xi _{j,n}\right) = \frac{1}{\sqrt{n}}\theta _{j}\xi _{j,n}+\frac{1}{n} \xi ^2_{j,n}-\frac{ \widehat{\sigma }_{n}}{n}\\&=\frac{1}{\sqrt{n}}\theta _{j}\xi _{j,n} +\frac{1}{n}\widetilde{\xi }_{j,n} + \frac{1}{n} \varsigma _{j,n} + \frac{\sigma _{Q} - \widehat{\sigma }_{n} }{n}\,, \end{aligned}$$

where \(\varsigma _{j,n}=\mathbf{E}_{Q}\xi ^{2}_{j,n}-\sigma _{Q}\) and \(\widetilde{\xi }_{j,n}=\xi ^{2}_{j,n}-\mathbf{E}_{Q}\xi ^{2}_{j,n}\). Putting

$$\begin{aligned} M(\lambda ) = \frac{1}{\sqrt{n}}\sum _{j=1}^{n} \lambda (j)\theta _{j} \xi _{j,n} \quad \text{ and }\quad P^{0}_{n}=\frac{\sigma _{Q}\vert \lambda \vert ^{2}}{n}\,, \end{aligned}$$
(9.2)

we can rewrite (9.1) as

$$\begin{aligned} \text{ Err }_n(\lambda )&= J_n(\lambda ) + 2 \frac{\sigma _{Q}- \widehat{\sigma }_{n} }{n}\,\check{L}(\lambda )+ 2 M(\lambda )+\frac{2}{n} B_{1,Q,n}(\lambda )\nonumber \\&\quad + 2 \sqrt{P^{0}_n(\lambda )} \frac{B_{2,Q,n}(e(\lambda ))}{\sqrt{\sigma _{Q} n}} + \Vert S\Vert ^2-\delta P_n(\lambda ), \end{aligned}$$
(9.3)

where \(e(\lambda )=\lambda /|\lambda |\), \(\check{L}(\lambda )=\sum ^{n}_{j=1}\lambda (j)\) and the functions \( B_{1,Q,n}(\cdot )\) and \(B_{2,Q,n}(\cdot )\) are given in (7.1).

Let \(\lambda _0= (\lambda _0(j))_{1\le j\le \,n}\) be a fixed sequence in \(\Lambda \) and \(\widehat{\lambda }= \text{ argmin }_{\lambda \in \Lambda } J_n(\lambda )\). Substituting \(\lambda _0\) and \(\widehat{\lambda }\) in Eq. (9.3), we obtain

$$\begin{aligned} \text{ Err }_n(\widehat{\lambda })-\text{ Err }_n(\lambda _0)&= J(\widehat{\lambda })-J(\lambda _0)+ 2 \frac{\sigma _{Q}-\widehat{\sigma }_{n}}{n}\,\check{L}(\varpi ) + \frac{2}{n} B_{1,Q,n}(\varpi )+2 M(\varpi )\nonumber \\&\quad + 2 \sqrt{P^{0}_{n}(\widehat{\lambda })} \frac{B_{2,Q,n}(\widehat{e})}{\sqrt{\sigma _{Q} n}}-2 \sqrt{P^{0}_{n}(\lambda _0)} \frac{B_{2,Q,n}(e_0)}{\sqrt{\sigma _{Q} n}}\nonumber \\&\quad - \delta P_n(\widehat{\lambda })+\delta P_n(\lambda _0), \end{aligned}$$
(9.4)

where \(\varpi = \widehat{\lambda } - \lambda _{0}\), \(\widehat{e} = e(\widehat{\lambda })\) and \(e_0 = e(\lambda _0)\). Note that, by (3.8),

$$\begin{aligned} |\check{L}(\varpi )| \le \,\check{L}({\hat{\lambda }}) + \check{L}(\lambda _0) \le 2\vert \Lambda \vert _{*}. \end{aligned}$$

Using the inequality

$$\begin{aligned} 2|ab| \le \delta a^2 + \delta ^{-1} b^2 \end{aligned}$$
(9.5)

and taking into account that \(P^{0}_n(\lambda )\ge 0\) we obtain that for any \(\lambda \in \Lambda \)

$$\begin{aligned} 2 \sqrt{P^{0}_n(\lambda )} \frac{|B_{2,Q,n}(e(\lambda ))|}{\sqrt{\sigma _{Q} n}} \le \, \delta P^{0}_{n}(\lambda ) + \frac{B^2_{2,Q,n}(e(\lambda ))}{\delta \sigma _{Q}\,n}. \end{aligned}$$

Taking into account the bound (7.2) and that \(J(\widehat{\lambda })\le J(\lambda _0)\), we get

$$\begin{aligned} \text{ Err }_n({\hat{\lambda }})&\le \,\text{ Err }_n(\lambda _0) + 4 \frac{\vert \widehat{\sigma }_{n}-\sigma _{Q}\vert }{n}\,\vert \Lambda \vert _{*} +2 M(\varpi )+ \frac{2 \mathbf{C}_{1,Q,n}}{n}+ \frac{2 B^*_{2,Q,n}}{\delta \sigma _{Q}\,n} \\&\quad + \frac{1}{n} |\widehat{\sigma }_n -\sigma _{Q}| ( |\widehat{\lambda }|^2 + |\lambda _0|^2)+ 2 \delta P^{0}_n(\lambda _0)\,, \end{aligned}$$

where \(B^*_{2,Q,n} = \sup _{\lambda \in \Lambda } B^2_{2,Q,n}(e(\lambda ))\). Moreover, noting that in view of (3.8) \(\sup _{\lambda \in \Lambda } |\lambda |^2 \le \vert \Lambda \vert _{*}\), we can rewrite the previous bound as

$$\begin{aligned} \text{ Err }_n(\widehat{\lambda })&\le \text{ Err }_n(\lambda _0) +2 M(\varpi ) + \frac{2 \mathbf{C}_{1,Q,n}}{n} + \frac{2 B^*_{2,Q,n}}{\delta \sigma _{Q} n} \nonumber \\&\quad + \frac{6\vert \Lambda \vert _{*}}{n} |\widehat{\sigma }_n -\sigma _{Q}| + 2 \delta P^{0}_n(\lambda _0). \end{aligned}$$
(9.6)

To estimate the second term on the right-hand side of this inequality we set

$$\begin{aligned} S_{x} = \sum _{j=1}^{n} x(j) \theta _{j} \phi _{j}\,, \quad x=(x(j))_{1\le j\le n}\in {{\mathbb {R}}}^{n}\,. \end{aligned}$$

Taking into account that \(M(x)=n^{-1}I_{n}(S_{x})\), we can estimate this term through the inequality (2.8) shown in Corollary 6.2 for any \(x\in {{\mathbb {R}}}^{n}\) as

$$\begin{aligned} \mathbf{E}_{Q} M^2 (x) \le \varkappa _{Q}\frac{||S_{x}||^2}{n} \,, \end{aligned}$$
(9.7)

where, taking into account that the functions \((\phi _{j})_{j\ge 1}\) are orthonormal, the norm \(||S_x||^2=\int ^{1}_{0}\,S^{2}_x(t)\mathrm {d}t=\sum _{j=1}^{n} x^2(j) \theta ^2_{j}\). To estimate this function for a random vector \(x\in {{\mathbb {R}}}^{n}\) we set

$$\begin{aligned} Z^* = \sup _{x \in \Lambda _1} \frac{n M^2 (x)}{||S_x||^2}\,, \quad \Lambda _1 = \Lambda - \{\lambda _0\}=\{\lambda -\lambda _{0}\,,\quad \lambda \in \Lambda \}\,. \end{aligned}$$

So, arguing as in the proofs of (9.6) and (9.7) and using Inequality (9.5), we get

$$\begin{aligned} 2 |M(x)|\le \delta ||S_x||^2 + \frac{Z^*}{n\delta }. \end{aligned}$$
(9.8)

It is clear that the last term here can be estimated as

$$\begin{aligned} \mathbf{E}_{Q} Z^* \le \sum _{x \in \Lambda _1} \frac{n \mathbf{E}_{Q} M^2 (x)}{||S_x||^2} \le \sum _{x \in \Lambda _1} \varkappa _{Q}= \varkappa _{Q}{\check{\iota }}\,, \end{aligned}$$
(9.9)

where \({\check{\iota }} = \text{ card }(\Lambda )\). Using the equality (3.5), we obtain that for any \(x\in \Lambda _{1}\),

$$\begin{aligned} ||S_x||^2&-||\widehat{S}_x||^2 = \sum _{j=1}^{n} x^2(j) (\theta ^2_{j}-\widehat{\theta }^2_{j})\nonumber \\&= - \sum _{j=1}^{n} x^2(j) \left( 2 \frac{1}{\sqrt{n}}\theta _{j}\xi _{j,n} + \frac{\xi ^{2}_{j,n}}{n} \right) \le -2 M_1(x), \end{aligned}$$
(9.10)

where \(M_{1}(x) = n^{-1/2}\,\sum _{j=1}^{n}\, x^2(j)\theta _{j} \xi _{j,n}\). Taking into account that, for any \(x \in \Lambda _1\) the components \(|x(j)|\le 1\), we can estimate this term as in (9.7), i.e.,

$$\begin{aligned} \mathbf{E}_{Q}\, M^2_{1}(x) \le \varkappa _{Q}\, \frac{||S_x||^2}{n}\,. \end{aligned}$$

Similarly to the previous reasoning we set

$$\begin{aligned} Z^*_{1} = \sup _{x \in \Lambda _1} \frac{n M^2_1 (x)}{||S_x||^2} \end{aligned}$$

and we get

$$\begin{aligned} \mathbf{E}_{Q}\, Z^*_1 \le \varkappa _{Q}\,{\check{\iota }}\,. \end{aligned}$$
(9.11)

Using the same type of arguments as in (9.8), we can derive

$$\begin{aligned} 2 |M_1(x)|\le \delta ||S_x||^2 + \frac{Z^*_1}{n\delta }. \end{aligned}$$
(9.12)

From here and (9.10), we get

$$\begin{aligned} ||S_x||^2 \le \frac{||\widehat{S}_x||^2}{1-\delta } + \frac{Z^*_1}{n \delta (1-\delta )} \end{aligned}$$
(9.13)

for any \(0<\delta <1\). Using this bound in (9.8) yields

$$\begin{aligned} 2 M(x) \le \frac{\delta ||\widehat{S}_x||^2}{1-\delta } + \frac{Z^*+Z^*_1}{n \delta (1-\delta )} \,. \end{aligned}$$

Taking into account that

$$\begin{aligned} \Vert \widehat{S}_{\varpi }\Vert ^{2}= \Vert \widehat{S}_{\widehat{\lambda }} - \widehat{S}_{\lambda _{0}} \Vert ^{2} = \Vert (\widehat{S}_{\widehat{\lambda }}-S) - (\widehat{S}_{\lambda _{0}}-S) \Vert ^{2} \le 2\,(\text{ Err }_n(\widehat{\lambda })+\text{ Err }_n(\lambda _0)) \,, \end{aligned}$$

we obtain

$$\begin{aligned} 2 M(\varpi ) \le \frac{2\delta (\text{ Err }_n(\widehat{\lambda })+\text{ Err }_n(\lambda _0))}{1-\delta } + \frac{Z^*+Z^*_1}{n \delta (1-\delta )}. \end{aligned}$$

Using this bound in (9.6) we obtain

$$\begin{aligned} \text{ Err }_n(\widehat{\lambda })&\le \frac{1+\delta }{1-3\delta } \text{ Err }_n(\lambda _0) + \frac{Z^*+Z^*_1}{n \delta (1-3\delta )} + \frac{2 \mathbf{C}_{1,Q,n}}{n(1-3\delta )} + \frac{2 B^*_{2,Q,n}}{\delta (1-3\delta )\sigma _{Q} n} \\&\quad + \frac{6\vert \Lambda \vert _{*}}{n(1-3\delta )} |\widehat{\sigma } -\sigma _{Q}| + \frac{2\delta }{(1-3\delta )} P^{0}_n(\lambda _0). \end{aligned}$$

Moreover, for \(0<\delta <1/6,\) we can rewrite this inequality as

$$\begin{aligned} \text{ Err }_n(\widehat{\lambda })&\le \frac{1+\delta }{1-3\delta } \text{ Err }_n(\lambda _0) + \frac{2(Z^*+Z^*_1)}{n \delta } + \frac{4 \mathbf{C}_{1,Q,n}}{n} + \frac{4 B^*_{2,Q,n}}{\delta \sigma _{Q} n} \\&\quad + \frac{12\vert \Lambda \vert _{*}}{n} |\widehat{\sigma }_{n} -\sigma _{Q}| + \frac{2\delta }{(1-3\delta )}\, P^{0}_n(\lambda _0). \end{aligned}$$

In view of Proposition 7.2 we bound the expectation of the term \(B^*_{2,Q,n}\) in (9.6) as

$$\begin{aligned} \mathbf{E}_{Q}\, B^*_{2,Q,n} \le \sum _{\lambda \in \Lambda }\mathbf{E}_{Q} B^2_{2,Q,n} (e(\lambda )) \le {\check{\iota }} \mathbf{C}_{2,Q,n}\,. \end{aligned}$$

Taking into account that \(\vert \Lambda \vert _{*}\ge 1\), we get

$$\begin{aligned} {{\mathcal {R}}}(\widehat{S}_*,S)&\le \frac{1+\delta }{1-3\delta } {{\mathcal {R}}}(\widehat{S}_{\lambda _0},S) + \frac{4\varkappa _{Q} {\check{\iota }}}{n \delta } + \frac{4 \mathbf{C}_{1,Q,n}}{n} + \frac{4 {\check{\iota }} \mathbf{C}_{2,Q,n}}{\delta \sigma _{Q} n} \\&\quad + \frac{12\vert \Lambda \vert _{*}}{n} \,\mathbf{E}_{Q}\,|\widehat{\sigma }_n -\sigma _{Q}| + \frac{2\delta }{(1-3\delta )} P^{0}_n(\lambda _0). \end{aligned}$$

Using the upper bound for \( P^{0}_n(\lambda _0)\) in Lemma A.2, one obtains (4.1), which finishes the proof. \(\square \)

9.2 Proof of Proposition 4.2

We use here the same method as in Konev and Pergamenshchikov (2009a). First of all note that the definition (3.12) implies that

$$\begin{aligned} \widehat{t}_{j,n}= t_{j}+ \frac{1}{\sqrt{n}}\, \eta _{j,n}\,, \end{aligned}$$
(9.14)

where

$$\begin{aligned} t_{j}= \int ^{1}_{0}\,S(t)\,Tr_{j}(t)\mathrm {d}t \quad \text{ and }\quad \eta _{j,n}= \frac{1}{\sqrt{n}}\, \int ^{n}_{0}\,\text{ Tr }_{j}(t)\,\mathrm {d}\xi _{t} \,. \end{aligned}$$

So, we have

$$\begin{aligned} \widehat{\sigma }_{n}= \sum ^n_{j=[\sqrt{n}]+1}\,t^2_{j} + 2M_{n} + \frac{1}{n}\, \sum ^n_{j=[\sqrt{n}]+1}\,\eta ^2_{j,n} \,, \end{aligned}$$
(9.15)

where

$$\begin{aligned} M_{n}= \frac{1}{\sqrt{n}} \sum ^n_{j=[\sqrt{n}]+1}\,t_{j}\,\eta _{j,n}\,. \end{aligned}$$
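
By (9.14), the decomposition (9.15) is simply the expansion of \(\widehat{\sigma }_{n}= \sum ^n_{j=[\sqrt{n}]+1}\,\widehat{t}^{\,2}_{j,n}\). For orientation, a discretised version of this estimator can be computed as in the following sketch; the explicit form \(\widehat{t}_{j,n}=n^{-1}\int ^{n}_{0}\,\text{Tr}_{j}(t)\,\mathrm {d}y_{t}\) and the ordering of the trigonometric basis (3.2) used below are assumptions of the illustration, chosen only to be consistent with (9.14).

```python
import numpy as np

# Sketch of a discretised variance estimator built from the high-frequency
# trigonometric coefficients, as in (9.15).  Assumptions of this sketch (not
# taken verbatim from the paper): hat t_{j,n} = (1/n) int_0^n Tr_j(t) dy_t,
# consistent with (9.14), and the basis ordering
# 1, sqrt(2)cos(2*pi*t), sqrt(2)sin(2*pi*t), sqrt(2)cos(4*pi*t), ...
def trig(j, t):
    if j == 1:
        return np.ones_like(t)
    k = j // 2
    angle = 2 * np.pi * k * t
    return np.sqrt(2.0) * (np.cos(angle) if j % 2 == 0 else np.sin(angle))

def sigma_hat(t, y, n):
    dy = np.diff(y)                        # increments of the observed path
    tm = t[:-1]                            # left end-points of the grid cells
    j_min = int(np.sqrt(n)) + 1
    t_hat = np.array([np.sum(trig(j, tm) * dy) / n for j in range(j_min, n + 1)])
    return np.sum(t_hat ** 2)              # sum of squared high-frequency coefficients
```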

Note that, for continuously differentiable functions (see, for example, Lemma A.6 in Konev and Pergamenshchikov 2009a), the Fourier coefficients \((t_{j})\) satisfy the following inequality, for any \(n\ge 1,\)

$$\begin{aligned} \sum ^{\infty }_{j=[\sqrt{n}]+1}\,t^2_{j} \le \frac{4\left( \int ^{1}_{0}\vert \dot{S}(t)\vert \mathrm {d}t\right) ^{2}}{\sqrt{n}} \le \frac{4\Vert \dot{S}\Vert ^{2}}{\sqrt{n}} \,. \end{aligned}$$
(9.16)

In the same way as in (9.7) we estimate the term \(M_{n}\), i.e.,

$$\begin{aligned} \mathbf{E}_{Q}\,M^{2}_{n}\le \frac{\varkappa _{Q}}{n}\, \sum ^{n}_{j=[\sqrt{n}]+1}\,t^{2}_{j} \le \frac{4\varkappa _{Q}\Vert \dot{S}\Vert ^{2}}{n\sqrt{n}}, \end{aligned}$$

while the absolute value of this term for \(n\ge 1\) can be estimated as

$$\begin{aligned} \mathbf{E}_{Q}\, \vert M_{n}\vert \le \frac{\varkappa _{Q}+\Vert \dot{S}\Vert ^{2}}{\sqrt{n}}\,. \end{aligned}$$

Moreover, using the functions (7.1) for the trigonometric basis (3.2), i.e. with \(\xi _{j,n}=\eta _{j,n}\) and \(\widetilde{\xi }_{j,n}=\eta ^{2}_{j,n}-\mathbf{E}\eta ^{2}_{j,n}\) we can represent the last term in (9.15) as

$$\begin{aligned} \frac{1}{n} \sum ^n_{j=[\sqrt{n}]+1}\,\eta ^2_{j,n}&= \frac{\sigma _{Q}(n-\sqrt{n})}{n} + \frac{\sum ^n_{j=[\sqrt{n}]+1}\,(\mathbf{E}\eta ^2_{j,n}-\sigma _{Q}+\widetilde{\xi }_{j,n})}{n} \\&=\frac{\sigma _{Q}(n-\sqrt{n})}{n} +\frac{B_{1,Q,n}(x')}{n} +\frac{B_{2,Q,n}(x'')}{\sqrt{n}}, \end{aligned}$$

with \( x'_{j}=\mathbf{1}_{\{\sqrt{n}<j\le n\}}\) and \(x''_{j}=\mathbf{1}_{\{\sqrt{n}<j\le n\}}/\sqrt{n}\). Therefore, using Propositions 7.1 and 7.2, we obtain

$$\begin{aligned} \mathbf{E}_{Q} \left| \frac{1}{n} \sum ^n_{j=[\sqrt{n}]+1}\,\eta ^2_{j,n} -\sigma _{Q} \right| \le \frac{\sigma _{Q}}{\sqrt{n}} +\frac{\mathbf{C}_{1,Q,n}}{n} +\frac{\sqrt{\mathbf{C}_{2,Q,n}}}{\sqrt{n}}. \end{aligned}$$

From here we obtain the bound (4.5) and hence the desired result. \(\square \)

9.3 Proof of Theorem 4.4

Note that Theorem 4.1 directly implies the oracle inequality (4.9) with

$$\begin{aligned} \mathbf{U}^{*}_{n}(S) = 60\,\widetilde{\Lambda }_{n}\, \Vert \dot{S}\Vert ^2 + \widetilde{\Psi }^{*}_{n} \quad \text{ and }\quad \widetilde{\Psi }^{*}_{n} = \sup _{Q\in {{\mathcal {Q}}}_{n}} \widetilde{\Psi }_{Q,n} \,. \end{aligned}$$

Using the bound (4.7) and the conditions (2.11) we obtain that for some positive constant \(\mathbf{C}_{*}\)

$$\begin{aligned} \widetilde{\Psi }^{*}_{n} \le \mathbf{C}_{*} \left( 1+(\varsigma ^{*})^{4}+\frac{1}{\varsigma _{*}} \right) \left( 1+ \widetilde{\Lambda }_{n} \right) {\check{\iota }}_{n}\phi ^{4}_{max} \,. \end{aligned}$$
(9.17)

Moreover, note that, in view of (3.21) and (3.17),

$$\begin{aligned} \lim _{n\rightarrow \infty }\,\frac{{\check{\iota }}}{n^{{\check{\epsilon }}}}= \lim _{n\rightarrow \infty }\,\frac{k^{*} m}{n^{{\check{\epsilon }}}}=0 \qquad \text{ for } \text{ any }\quad {\check{\epsilon }}>0 \,, \end{aligned}$$

where \(m=[1/\varepsilon ^2]\). Furthermore, the bound (3.22) and Conditions (2.12) and (3.17) yield

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\vert \Lambda \vert _{*}}{n^{1/3+{\check{\epsilon }}}}=0 \quad \text{ for } \text{ any }\quad {\check{\epsilon }}>0 \,, \end{aligned}$$

where \(\vert \Lambda \vert _{*}=1+ \max _{\lambda \in \Lambda }\,\check{L}(\lambda )\), i.e.

$$\begin{aligned} \widetilde{\Lambda }_{n}= \frac{\vert \Lambda \vert _{*}}{n^{1/2}} \rightarrow 0 \quad \text{ as }\quad n\rightarrow \infty \,. \end{aligned}$$

So, taking into account in (9.17) the condition (4.8) we obtain the convergence (4.10). \(\square \)

9.4 Proof of Theorem 4.5

First, we denote by \(Q_{0}\) the distribution of the noise (1.2) and (2.1) with the parameters \(\varrho _{1}=\varsigma ^{*}\) and \(\varrho _{2}=\varrho _{3}=0\), i.e. the distribution for the “signal + white noise” model. For any \(n\ge 1\) this distribution belongs to the family \({{\mathcal {Q}}}_{n}\) defined in (2.11)–(2.12) and, therefore, for any \(n\ge 1\)

$$\begin{aligned} {{\mathcal {R}}}^{*}_{n}(\widetilde{S}_{n},S) =\sup _{Q\in {{\mathcal {Q}}}_{n}}\,\mathbf{E}_{S,Q}\Vert \widetilde{S}_{n}-S\Vert ^{2} \ge \mathbf{E}_{S,Q_{0}}\Vert \widetilde{S}_{n}-S\Vert ^{2} \,. \end{aligned}$$

Now, taking into account the conditions (2.12), Theorem A.9 yields the lower bound (4.14), which finishes the proof. \(\square \)

9.5 Proof of Proposition 4.6

Putting \(\lambda _{0}(j)=0\) for \(j\ge n\) we can represent the quadratic risk for the estimator (3.7) as

$$\begin{aligned} \parallel \widehat{S}_{\lambda _0}-S\parallel ^2=\sum _{j=1}^{\infty } (1-\lambda _0(j))^2 \theta ^2_{j}-2 H_{n} + \frac{1}{n} \sum _{j=1}^{n} \lambda _0^2(j) \xi ^2_{j,n}\,, \end{aligned}$$

where \(H_n= n^{-1/2}\,\sum _{j=1}^{n} (1-\lambda _0(j)) \lambda _0(j) \theta _{j} \xi _{j,n}\). Note that \(\mathbf{E}_{Q} H_{n}=0\) for any \( Q \in {{\mathcal {Q}}}_{n}\), therefore,

$$\begin{aligned} \mathbf{E}_{Q}\parallel \widehat{S}_{\lambda _0}-S\parallel ^2=\sum _{j=1}^{\infty } (1-\lambda _0(j))^2 \theta ^2_{j} + \frac{1}{n} \mathbf{E}_{Q}\sum _{j=1}^{n} \lambda _0^2(j) \xi ^2_{j,n}\,. \end{aligned}$$

Proposition 7.1 and the last inequality in (2.11) imply that for any \(Q\in {{\mathcal {Q}}}_{n}\)

$$\begin{aligned} \mathbf{E}_{Q} \sum _{j=1}^{n} \lambda _0^2(j) \xi ^2_{j,n} \le \varsigma ^* \sum _{j=1}^{n} \lambda _0^2(j) + \mathbf{C}^{*}_{1,n}\,, \end{aligned}$$

where \(\mathbf{C}^{*}_{1,n}=\phi ^{2}_{max}\varsigma ^{*}{\check{\tau }}\vert \Upsilon \vert _{1}\). Therefore,

$$\begin{aligned} {{\mathcal {R}}}^*_{n} (\widehat{S}_{\lambda _{0}},S) \le \sum _{j=j_{*}}^{\infty } (1-\lambda _0(j))^2 \theta ^2_{j}+ \frac{1}{\upsilon _{n}} \sum _{j=1}^{n} \lambda _0^2(j) +\frac{\mathbf{C}^{*}_{1,n}}{n}, \end{aligned}$$

where \(j_{*}\) and \(\upsilon _{n}\) are defined in (3.19). Setting

$$\begin{aligned} \Upsilon _{1,n}(S) = \upsilon ^{2k/(2k+1)}_{n} \sum _{j=j_{*}}^{\infty } (1-\lambda _0(j))^2 \theta ^2_{j} \quad \text{ and }\quad \Upsilon _{2,n}= \frac{1}{\upsilon ^{1/(2k+1)}_{n}} \sum _{j=1}^{n} \lambda _0^2(j) \,, \end{aligned}$$

we rewrite the last inequality as

$$\begin{aligned} \upsilon ^{2k/(2k+1)}_{n}\, {{\mathcal {R}}}^*_{n} (\widehat{S}_{\lambda _{0}},S) \le \Upsilon _{1,n}(S) + \Upsilon _{2,n} + {\check{\mathbf{C}}}_{n} \,, \end{aligned}$$
(9.18)

where \( {\check{\mathbf{C}}}_{n}= \upsilon ^{2k/(2k+1)}_{n}\mathbf{C}^{*}_{1,n}/n\). Note that the conditions (2.12) and (4.8) imply that \(\mathbf{C}^{*}_{1,n}=\text{ o }(n^{{\check{\delta }}})\) as \(n\rightarrow \infty \) for any \({\check{\delta }}>0\); therefore, \({\check{\mathbf{C}}}_{n}\rightarrow 0\) as \(n\rightarrow \infty \). Putting

$$\begin{aligned} u_{n}= \upsilon ^{2k/(2k+1)}_{n} \sup _{j\ge j_{*}} (1-\lambda _0(j))^2/a_j \,, \end{aligned}$$

with \(a_{j}=\sum ^k_{i=0}\left( 2\pi [j/2]\right) ^{2i}\), we estimate the first term in (9.18) as

$$\begin{aligned} \sup _{S\in W_{\mathbf{r}}^k}\, \Upsilon _{1,n}(S) \le \sup _{S\in W_{\mathbf{r}}^k}\, u_{n}\,\sum _{j\ge 1} a_j \theta ^{2}_{j} \le u_{n} \mathbf{r}\,. \end{aligned}$$

We recall that \(\omega _{\alpha _{0}}\) is defined in (3.19) with \(\alpha _{0}=(k,\mathbf{l}_{0})\) and \(\mathbf{l}_{0}=[\mathbf{r}/\varepsilon ]\varepsilon \). So, taking into account that \(a_{j}/(\pi ^{2k}j^{2k})\rightarrow 1\) as \(j\rightarrow \infty \) and \(\mathbf{l}_{0}\rightarrow \mathbf{r}\) as \(\varepsilon \rightarrow 0\), we obtain that

$$\begin{aligned} \limsup _{n\rightarrow \infty } u_{n}&\le \lim _{n\rightarrow \infty } \, \upsilon ^{2k/(2k+1)}_{n} \sup _{j\ge j_{*}} \frac{(1-\lambda _0(j))^2}{(\pi \,j)^{2k}}\\&= \lim _{n\rightarrow \infty } \, \frac{\upsilon ^{2k/(2k+1)}_{n}}{\pi ^{2k} \omega ^{2k}_{\alpha _{0}}} = \frac{1}{\pi ^{2k}\,(\mathrm {d}_{k}\mathbf{r})^{2k/(2k+1)}}\,, \end{aligned}$$

where

$$\begin{aligned} \mathrm {d}_{k}=\frac{(k+1)(2k+1)}{\pi ^{2k}k}\,. \end{aligned}$$

Therefore,

$$\begin{aligned} \limsup _{n\rightarrow \infty } \sup _{S\in W^{k}_{\mathbf{r}}} \Upsilon _{1,n}(S) \le \frac{\mathbf{r}^{1/(2k+1)}}{\pi ^{2k} (\mathrm {d}_{k})^{2k/(2k+1)}}=: \Upsilon ^*_{1}\,. \end{aligned}$$
(9.19)

As to the second term in (9.18), note that, in view of the definition (3.19) and taking into account that \(j_{*}=\text{ o }(\omega _{\alpha _{0}})\) as \(n\rightarrow \infty \), we deduce that

$$\begin{aligned}&\lim _{n\rightarrow \infty } \frac{1}{\omega _{\alpha _{0}}}\,\sum _{j=1}^{n} \lambda _0^2(j) = \lim _{\omega _{\alpha _{0}}\rightarrow \infty } \frac{1}{\omega _{\alpha _{0}}}\,\sum _{1\le j\le \omega _{\alpha _{0}}}\, \left( 1-\left( \frac{j}{\omega _{\alpha _{0}}}\right) ^{k}\right) ^{2} \\&\quad = \int ^{1}_{0}(1-t^{k})^{2}\mathrm {d}t = \frac{2k^{2}}{(k+1)(2k+1)} \,. \end{aligned}$$
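
The last equality follows from the elementary computation

$$\begin{aligned} \int ^{1}_{0}(1-t^{k})^{2}\mathrm {d}t = 1-\frac{2}{k+1}+\frac{1}{2k+1} = \frac{(k+1)(2k+1)-2(2k+1)+(k+1)}{(k+1)(2k+1)} = \frac{2k^{2}}{(k+1)(2k+1)}\,. \end{aligned}$$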

So, taking into account that \(\omega _{\alpha _{0}}/\upsilon ^{1/(2k+1)}_{n}\rightarrow ( \mathrm {d}_{k} \mathbf{r})^{1/(2k+1)}\) as \(n\rightarrow \infty \), the limit of \(\Upsilon _{2,n}\) can be calculated as

$$\begin{aligned} \lim _{n\rightarrow \infty } \,\Upsilon _{2,n}= \frac{2( \mathrm {d}_{k} \mathbf{r})^{1/(2k+1)}\,k^{2}}{(k+1)(2k+1)}=: \Upsilon ^*_{2}\,. \end{aligned}$$

Moreover, setting \(\mathbf{r}^*_{k}:=\Upsilon ^*_1+ \Upsilon ^*_2\), we obtain

$$\begin{aligned} \lim _{n \rightarrow \infty } \upsilon ^{2k /(2k+1)}_{n}\, \sup _{S\in W^{k}_{\mathbf{r}}} {{\mathcal {R}}}^*_n (\widehat{S}_{\lambda _{0}},S) \le \mathbf{r}^*_{k} \end{aligned}$$

and get the desired result. \(\square \)

9.6 Proof of Theorem 4.7

Combining Proposition 4.6 and Theorem 4.4 yields Theorem 4.7. \(\square \)