Adaptive efficient estimation for generalized semi-Markov big data models

Barbu, Vlad Stefan; Beltaief, Slim; Pergamenchtchikov, Serguei

doi:10.1007/s10463-022-00820-y


Published: 05 March 2022

Volume 74, pages 925–955, (2022)

Annals of the Institute of Statistical Mathematics




Abstract

In this paper we study generalized semi-Markov high dimension regression models in continuous time, observed at fixed discrete time moments. The generalized semi-Markov process has dependent jumps and, therefore, it is an extension of the semi-Markov regression introduced in Barbu et al. (Stat Inference Stoch Process 22:187–231, 2019a). For such models we consider estimation problems in a nonparametric setting. To this end, we develop model selection procedures for which sharp non-asymptotic oracle inequalities for the robust risks are obtained. Moreover, we give constructive sufficient conditions which provide, through the obtained oracle inequalities, the adaptive robust efficiency property in the minimax sense. It should be noted also that, for these results, we use neither sparsity conditions nor the parameter dimension in the model. As examples, regression models constructed through spherically symmetric noise impulses and truncated fractional Poisson processes are considered. Numerical Monte-Carlo simulations confirming the theoretical results are given in the supplementary materials.


1 Introduction

1.1 Motivations

In this paper we study the following linear regression model in continuous time

$$\begin{aligned} \mathrm {d}y_{{t}}=\,\left( \sum ^{q}_{{j=1}}\beta _{{j}}\mathbf{u}_{{j}}(t)\,\right) \mathrm {d}t +\mathrm {d}\xi _{{t}}\,, \quad 0\le t\le T\,, \end{aligned}$$

(1)

where the functions $(\mathbf{u}_{{j}})_{{1\le j\le q}}$ are known linearly independent 1-periodic ${{\mathbb {R}}}\rightarrow {{\mathbb {R}}}$ functions, the duration of observations T is an integer and $(\xi _{{t}})_{{t\ge 0}}$ is an unobservable noise process defined in Sect. 2. The process (1) is observed only at the fixed time moments

$$\begin{aligned} (y_{{t_{{j}}}})_{{0\le j\le n}}\,, \quad t_{{j}}=\frac{j}{p} \quad \text{ and }\quad n= p T \,, \end{aligned}$$

(2)

where the observation frequency p is some fixed integer. We consider the model (1) in the case when the parameter dimension is greater than the number of observations, i.e., $q>n$. Such models are called big data or high dimension regression models in continuous time (see, for example, Fujimori, 2019 for diffusion processes). The problem is to estimate the unknown parameters $(\beta _{{j}})_{{1\le j\le q}}$ on the basis of the observations (2). Usually, for such problems one uses either the Lasso algorithm or the Dantzig selector method. It should be emphasized that to apply these methods one needs to assume sparsity conditions which ensure that the number of nonzero unknown parameters is not too large (“reasonable”) and, moreover, the parameter dimension q must be known (see, for example, Hastie et al., 2008). It should be noted also that the case of unknown parameter dimension q is one of the crucial points in important practical problems such as, for example, statistical signal and image processing (see, for example, Beltaief et al., 2020 and the references therein).

For the model (1) we consider a nonparametric setting, i.e., this is the setting for the estimation problem of the function

$$\begin{aligned} S(t)=\sum ^{q}_{{j=1}}\beta _{{j}}\mathbf{u}_{{j}}(t)\,, \end{aligned}$$

i.e.,

$$\begin{aligned} \mathrm {d}y_t=S(t)\mathrm {d}t+\mathrm {d}\xi _{{t}}\,, \quad 0\le t\le T\,, \end{aligned}$$

(3)

where S is an unknown 1-periodic function, $S: {{\mathbb {R}}}\rightarrow {{\mathbb {R}}},$ such that the restriction of S to the interval [0, 1] belongs to $\mathcal{L}_{{2}}[0,1].$ This model means that we observe T times, i.e. on the interval [0, T], the same function S defined on [0, 1], with values in ${{\mathbb {R}}}.$ Here we assume neither sparsity conditions nor that the parameter dimension is known; in particular, we can assume that $q=+\infty .$ Now the problem is to estimate the unknown function S in the model (3) on the basis of observations (2). Originally, such problems were considered in the framework of “signal+white noise” models (see, for example, Ibragimov and Khasminskii, 1981; Kutoyants, 1994; Pinsker, 1981). Later, these models were extended to the “colour noise” models defined through non-Gaussian Ornstein-Uhlenbeck processes (see Barndorff-Nielsen and Shephard, 2001; Konev and Pergamenshchikov, 2012, 2015). The problem here is that the dependence defined on the basis of the Ornstein-Uhlenbeck processes disappears very fast, i.e., at a geometric rate. This means that such models are asymptotically equivalent to models with independent observations. To keep the dependence in the observations for large time periods for the estimation problem on the complete data, Barbu et al. (2019a) proposed to define the model (3) through semi-Markov processes with jumps. Such models considerably extend the potential applications of statistical results in many important practical fields such as finance, insurance, signal and image processing, reliability, biology (see, for example, Barbu et al., 2019a; Barbu and Limnios, 2008 and the references therein). In this paper we extend the semi-Markov regression models to the generalized semi-Markov processes by introducing an additional dependence in the jump sizes of $(\xi _{{t}})_{{t\ge 0}}$.

1.2 Methods

In order to estimate the 1-periodic function S defined on the interval [0, 1], we develop model selection methods using the quadratic risks defined as

$$\begin{aligned} \mathcal{R}_{{Q}}(\widehat{S}_{{T}},S)= \mathbf{E}_{{Q,S}}\,\Vert \widehat{S}_{{T}}-S\Vert ^{2}\,, \quad \Vert f\Vert ^{2}=\int ^{1}_{{0}}\,f^{2}(s)\mathrm {d}s\,, \end{aligned}$$

(4)

where $\widehat{S}_{{T}}(\cdot )$ is some estimate based on T periods of the observations of the model (3) (i.e. any 1-periodic function measurable with respect to the $\sigma $-field $\sigma \{y_{{t_{{0}}}},\ldots ,y_{{t_{{n}}}}\}$ generated by the observations (2)) and $\mathbf{E}_{{Q,S}}$ is the expectation with respect to the distribution $\mathbf{P}_{{Q,S}}$ of the process (3) corresponding to the unknown noise distribution Q in the Skorokhod space $\mathcal{D}[0,T]$ and to the function S. We assume that this distribution belongs to some distribution family $\mathcal{Q}_{{T}}$ specified in Sect. 2. To study the properties of the estimators uniformly over the noise distribution (which is what is needed in practice), we use the robust risk defined as

$$\begin{aligned} \mathcal{R}^{*}_{{T}}(\widehat{S}_{{T}},S)=\sup _{{Q\in \mathcal{Q}_{{T}}}}\, \mathcal{R}_{{Q}}(\widehat{S}_{{T}},S)\,. \end{aligned}$$

(5)

It should be noted that statistical procedures that are optimal in the sense of this risk possess stable mean square accuracy uniformly over all possible admissible noise distributions in the model (3). This means that the corresponding statistical optimal algorithms have high noise immunity and, therefore, significantly improve the quality and reliability of statistical inference obtained on their basis.

To construct model selection procedures on the basis of the discrete data (2) we use the approach proposed in Konev and Pergamenshchikov (2015). It should be noted that the main analytic tool in that paper is based on the exponential decrease rate of the dependence in Ornstein-Uhlenbeck models and, therefore, we cannot apply those methods to the semi-Markov models of the current work, since these models can retain a dependence in the noise for a long time. So, in this paper, to study the estimation problem based on the discrete observations (2) for the model (3) with noise defined through semi-Markov processes, we develop new methods based on the special renewal theory from Barbu et al. (2019a). With these techniques we can analyse the approximation errors in the discrete observations and obtain non-asymptotic sharp oracle inequalities. Moreover, as a consequence, we find constructive sufficient conditions on the observation frequency that provide the robust efficiency of the proposed model selection procedures in an adaptive setting, i.e., in the case when the regularity properties of the function S are unknown.

1.3 Main contributions of this paper

As previously mentioned, in this paper we use for the first time nonparametric adaptive methods for estimation problems in the framework of the big data generalized semi-Markov regression models. To this end, we develop model selection procedures and corresponding analytical tools providing, under some constructive sufficient conditions, the optimality in the sharp oracle inequality sense and the robust adaptive efficiency in the minimax sense for the proposed estimators. It turns out that these conditions hold true for important practical cases such as, for example, regression models constructed through truncated fractional Poisson processes introduced in Barbu et al. (2019b). Moreover, in this paper, we extend for the first time the model from Barbu et al. (2019a) using the generalized semi-Markov models obtained by introducing a dependence structure in the sizes of the jumps. As an example, we use spherically symmetric random variables, which play a very important role in many practical applications (see, for example, Fourdrinier and Pergamenshchikov, 2007 and the references therein).

1.4 Organization of the paper

The rest of the paper is organized as follows. In Sect. 2 we state the main conditions under which we consider the model (3). In Sect. 3 we present the truncated fractional Poisson process and its main properties. In Sect. 4 we construct model selection procedures on the basis of weighted least squares estimates. In Sect. 5 we state the main results. In Sect. 6 we develop the stochastic calculus for the generalized semi-Markov processes. Section 7 gives the proofs of the main results. Some auxiliary tools are given in an Appendix.

2 Main conditions

First, we assume that the noise process $(\xi _{{t}})_{{t\ge \, 0}}$ in the model (3) is defined as

$$\begin{aligned} \xi _{{t}} =\varrho _{1} w_{{t}}+ \varrho _{2} L_{{t}} + \varrho _{3} z_{{t}}\,, \end{aligned}$$

(6)

where $\varrho _{{1}}$, $\varrho _{{2}}$ and $\varrho _{{3}}$ are unknown coefficients, $(w_{{t}})_{{t\ge \,0}}$ is a standard Brownian motion, $ L_{{t}}= \int ^{t}_{{0}}\int _{{{{\mathbb {R}}}_{{*}}}}x (\mu (\mathrm {d}s,\mathrm {d}x) -\widetilde{\mu }(\mathrm {d}s,\mathrm {d}x))$, $\mu (\mathrm {d}s\,\mathrm {d}x)$ is the jump measure with deterministic compensator $\widetilde{\mu }(\mathrm {d}s\,\mathrm {d}x)=\mathrm {d}s\varPi (\mathrm {d}x)$, $\varPi (\cdot )$ is the Lévy measure on ${{\mathbb {R}}}_{{*}}={{\mathbb {R}}}\setminus \{0\}$ (see, for example, Liptser and Shiryaev, 1989 for details), with

$$\begin{aligned} \varPi (x^{2})=1 \quad \text{ and }\quad \varPi (x^{8}) \,<\,\infty \,. \end{aligned}$$

(7)

Here we use the usual notation $\varPi (\vert x\vert ^{m})=\int _{{{{\mathbb {R}}}}}\,\vert z\vert ^{m}\,\varPi (\mathrm {d}z)$. Note that $\varPi (\vert x\vert )$ may be equal to $+\infty $. In this paper we assume that the “dependent part” in the noise (6) is modelled by the generalized semi-Markov process $(z_{{t}})_{{t\ge \, 0}} $ defined as

$$\begin{aligned} z_{{t}} = \sum _{{i=1}}^{N_{{t}}} \zeta _{{i}}, \end{aligned}$$

(8)

where $(\zeta _{{i}})_{{i\ge \, 1}}$ are random variables satisfying the following conditions:

$(\mathbf{C}_{{1}})$ For any $i\ge 1,$ we have $ \mathbf{E}\,\zeta _{{i}}=0$, $\mathbf{E}\,\zeta _{{i}}^2=1$ and $\zeta _{{*}}=\sup _{{l \ge 1}}\mathbf{E}\,\zeta ^4_{{l}}<\infty $;

$(\mathbf{C}_{{2}})$ $\mathbf{E}\,\zeta _{{i}}\, \zeta _{{j}} =0$ for any $i\ne j$;

$(\mathbf{C}_{{3}})$ For any $1\le k_{{1}}< k_{{2}}< k_{{3}}< k_{{4}},$ the random variables $(\zeta _{{k_{{i}}}})_{{1\le i\le 4}}$ are such that $ \mathbf{E}\,\zeta ^{\iota _{{1}}}_{{k_{{1}}}}\, \zeta ^{\iota _{{2}}}_{{k_{{2}}}}\, \zeta ^{\iota _{{3}}}_{{k_{{3}}}}\, \zeta ^{\iota _{{4}}}_{{k_{{4}}}}=0$ for any $\iota _{{1}},\ldots , \iota _{{4}} \in \{0,1,2,3\}$ for which $3\le \sum _{i=1}^{4} \iota _{{i}} \le 4$ and at least one among them is equal to one.

Now we give some examples for the correlation conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$. To this end, we first recall the definition of a spherically symmetric distribution (see, for example, Fourdrinier and Pergamenshchikov, 2007). A random vector $\zeta =(\zeta _{{1}},\ldots ,\zeta _{{d}})'$ is called spherically symmetric if its density in ${{\mathbb {R}}}^{d}$ has the form $\mathbf{g}(\vert \cdot \vert ^{2})$ for some nonnegative function $\mathbf{g}$. Here the prime denotes the transposition. Note that a very important particular case of spherically symmetric vectors is given by Gaussian mixture distributions. The vector $\zeta =(\zeta _{{1}},\ldots ,\zeta _{{d}})'$ is called a Gaussian mixture in ${{\mathbb {R}}}^{d}$ if it has the spherically symmetric distribution with

$$\begin{aligned} \mathbf{g}(t)=\mathbf{E}\,\frac{1}{(2\pi \mathbf{s}^{2})^{d/2}}\,e^{-\frac{t}{2\mathbf{s}^{2}}} \,, \end{aligned}$$

(9)

where $\mathbf{s}$ is a non-negative random variable. It should be emphasized that in radio-physics such distributions are very popular for statistical signal processing (see, for example, Middleton, 1979; Kassam, 1988). Using these definitions it is easy to see that the following random variables satisfy the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$:

(i) $(\zeta _{{j}})_{{j\ge \,1}}$ that are i.i.d. random variables satisfying condition $(\mathbf{C}_{{1}})$;

(ii) For some $d>1,$ a random vector $(\zeta _{{1}},\ldots ,\zeta _{{d}})'$ that has a spherically symmetric distribution in ${{\mathbb {R}}}^{d},$ with $\mathbf{E}\,\zeta _{{1}}^2=1$, $\mathbf{E}\,\zeta _{{1}}^4 < \infty $ and such that the random variables $(\zeta _{{j}})_{{j>\,d}}$ are independent and satisfy condition $(\mathbf{C}_{{1}})$;

(iii) For some $d\ge 1,$ a random vector $(\zeta _{{1}},\ldots ,\zeta _{{d}})'$ that is a Gaussian mixture with mixture variable $\mathbf{s}$ for which $\mathbf{E}\,\mathbf{s}^{2}=1$ and $\mathbf{E}\,\mathbf{s}^{4}<\infty .$
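
The Gaussian mixture example can be checked numerically. The following is a minimal Monte-Carlo sketch, not taken from the paper: it assumes the representation $\zeta =\mathbf{s}\,Z$ with $Z$ standard Gaussian in ${{\mathbb {R}}}^{d}$ independent of $\mathbf{s}$, and a hypothetical two-point law for $\mathbf{s}^{2}$ chosen so that $\mathbf{E}\,\mathbf{s}^{2}=1$. The function name and all numerical values are illustrative assumptions. The sample moments should be close to those required in $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{2}})$, while the mixed moment $\mathbf{E}\,\zeta ^{2}_{{1}}\zeta ^{2}_{{2}}=\mathbf{E}\,\mathbf{s}^{4}$ exceeds one, which shows that the components are uncorrelated but dependent.

```python
import numpy as np


def gaussian_mixture_jumps(n_samples, d, rng):
    """Draw n_samples vectors zeta = s * Z in R^d with Z ~ N(0, I_d).

    Illustrative assumption: s^2 takes the values 0.5 and 1.5 with
    probability 1/2 each, so that E s^2 = 1 and E s^4 = 1.25.
    """
    s2 = rng.choice([0.5, 1.5], size=(n_samples, 1))  # one mixing value per vector
    z = rng.standard_normal((n_samples, d))
    return np.sqrt(s2) * z


rng = np.random.default_rng(0)
zeta = gaussian_mixture_jumps(200_000, d=2, rng=rng)

mean_1 = zeta[:, 0].mean()                              # estimates E zeta_1 (target 0)
var_1 = (zeta[:, 0] ** 2).mean()                        # estimates E zeta_1^2 (target 1)
cross = (zeta[:, 0] * zeta[:, 1]).mean()                # estimates E zeta_1 zeta_2 (target 0)
cross_sq = (zeta[:, 0] ** 2 * zeta[:, 1] ** 2).mean()   # estimates E s^4 = 1.25

print(mean_1, var_1, cross, cross_sq)
```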

Note that the process $N_{{t}}$ in (8) is a general counting process defined as

$$\begin{aligned} N_{{t}} = \sum _{{k=1}}^{\infty } \mathbf{1}_{{\left\{ \sum _{{l=1}}^k\, \tau _{{l}} \le t\right\} }}, \end{aligned}$$

(10)

with $(\tau _{{l}})_{{l\ge \,1}}$ an i.i.d. sequence of positive integrable random variables with distribution $\eta $ and mean $\overline{\tau }=\mathbf{E}_{{Q}}\,\tau _{{1}}>0.$ We assume that the processes $(N_{{t}})_{{t\ge 0}}$, $(\zeta _{{i}})_{i\ge \, 1}$ and $(L_{{t}})_{{t\ge 0}}$ are independent. In the sequel we will use the renewal measure defined as

$$\begin{aligned} \overline{\eta } =\sum ^{\infty }_{{l=1}}\,\eta ^{(l)} \,, \end{aligned}$$

(11)

where $\eta ^{(l)}$ is the lth convolution power of the measure $\eta $.
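
To make the definitions (8) and (10) concrete, here is a minimal simulation sketch of the process $z_{{t}}$ on the observation grid $t_{{j}}=j/p$. It is an illustration under assumptions that are not prescribed by the paper: the inter-arrival times are gamma distributed with shape 2 and rate `a`, the first `d` jumps form a Gaussian-mixture block sharing one mixing variable, and the remaining jumps are i.i.d. standard Gaussian. The function name `simulate_z` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_z(T, p, a, d, rng):
    """Simulate z_t = sum_{i <= N_t} zeta_i at the grid points t_j = j / p, 0 <= j <= p*T."""
    # Arrival times T_k = tau_1 + ... + tau_k, extended until they pass T.
    arrivals = np.empty(0)
    while arrivals.size == 0 or arrivals[-1] <= T:
        block = rng.gamma(shape=2.0, scale=1.0 / a, size=max(16, int(2 * a * T)))
        start = arrivals[-1] if arrivals.size else 0.0
        arrivals = np.concatenate([arrivals, start + np.cumsum(block)])

    # Jump sizes: a dependent (Gaussian mixture) block followed by i.i.d. N(0, 1).
    s = np.sqrt(rng.choice([0.5, 1.5]))  # assumed mixing variable with E s^2 = 1
    zeta = rng.standard_normal(arrivals.size)
    zeta[:d] *= s

    grid = np.arange(p * T + 1) / p
    # N_t = number of arrival times <= t, as in (10).
    counts = np.searchsorted(arrivals, grid, side="right")
    partial_sums = np.concatenate([[0.0], np.cumsum(zeta)])
    return grid, counts, partial_sums[counts]


grid, counts, z = simulate_z(T=5, p=11, a=3.0, d=4, rng=np.random.default_rng(1))
print(counts[-1], z[-1])
```

The counting process is obtained with `np.searchsorted` rather than an explicit loop over the indicator sum in (10); both give the number of arrival times not exceeding each grid point.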

Remark 1

Note that in the case when the random variables $(\zeta _{{j}})_{{j\ge 1}}$ are i.i.d., then (8) is the semi-Markov process used in Barbu et al. (2019a).

To use the renewal methods from Barbu et al. (2019a) we assume that the distribution $\eta $ has a density g for which the following conditions hold true.

$(\mathbf{H}_{{1}}$) Assume that, for any $x\in {{\mathbb {R}}},$ there exist the finite limits $ g(x-)=\lim _{{z\rightarrow x-}}g(z)$ and $g(x+)=\lim _{{z\rightarrow x+}}g(z)$ and, $\forall \, K>0$, $ \exists \delta =\delta (K)>0$ for which

$$\begin{aligned} \sup _{{\vert x\vert \le K}}\, \int ^{\delta }_{{0}}\, \frac{ \vert g(x+t)+g(x-t)-g(x+)-g(x-) \vert }{t} \mathrm {d}t \,<\,\infty . \end{aligned}$$

$(\mathbf{H}_{{2}}$) $\forall \gamma >0,$ the upper bound $ \sup _{{z\ge 0}}\,z^{\gamma }\vert 2g(z) -g(z-)-g(z+) \vert \,<\,\infty $.

$(\mathbf{H}_{{3}}$) There exists $\beta >0$ such that $\int _{{{{\mathbb {R}}}_{{+}}}}\,e^{\beta x}\,g(x)\,\mathrm {d}x<\infty .$

$(\mathbf{H}_{{4}}$) There exists $t^{*}>0$ such that the Fourier transform $\widehat{g}(\theta -it)$ belongs to $\mathcal{L}_{{1}}({{\mathbb {R}}})$ for any $0\le t\le t^{*}$, where $ \widehat{g}(z)= (2\pi )^{-1}\, \int _{{{{\mathbb {R}}}}}\,e^{i z v} g(v)\mathrm {d}v$.

Moreover, to check these conditions, we will use the following assumption.

$(\mathbf{H}^{*}_{{4}}$) The density g is two times continuously differentiable with $g(0)=0$ and there exists $\beta >0$ such that $ \int ^{+\infty }_{{0}}\,e^{\beta x}\left( g(x) +\vert g^{'}(x)\vert + \vert g^{''}(x)\vert \right) \mathrm {d}x<\infty $ and $\lim _{{x\rightarrow \infty }}\,e^{\beta x}\left( g(x)+ \vert g^{'}(x)\vert \right) =0$.

It is clear that the conditions $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{3}})$ hold true in this case. To obtain the condition $(\mathbf{H}_{{4}})$ it suffices to calculate the integral in $\widehat{g},$ integrating by parts two times. For example, one can take the gamma distribution of order $m\ge 2$

$$\begin{aligned} g(x)=\frac{\mathbf{a}^{m}x^{m-1}}{(m-1) !}\,e^{-\mathbf{a}x}\mathbf{1}_{{\{x\ge 0\}}} \quad \text{ with } \quad \mathbf{a}>0 \,. \end{aligned}$$

(12)

It should be noted that, in view of Proposition 5.2 from Barbu et al. (2019a), Conditions $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ imply that the renewal measure (11) has a continuous density $\rho $ such that for any $\mathbf{b}>0$

$$\begin{aligned} \sup _{{x\ge 0}}x^{\mathbf{b}} \vert \varUpsilon (x)\vert \,<\infty \quad \text{ and }\quad \varUpsilon (x)=\rho (x)-\frac{1}{\overline{\tau }} \,. \end{aligned}$$

Note that this implies

$$\begin{aligned} \vert \rho \vert _{{*}}=\sup _{{t\ge 0}}\vert \rho (t)\vert<\infty \quad \text{ and }\quad \Vert \varUpsilon \Vert _{{1}}=\int ^{+\infty }_{{0}}\,\vert \varUpsilon (x)\vert \,\mathrm {d}x \,<\infty \,. \end{aligned}$$

(13)
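
As an illustration of (13), consider a case where the renewal density is explicit. The computation below is a sketch added for illustration, not part of the original argument. It assumes the gamma density with shape 2 and rate $\mathbf{a}$, normalised to integrate to one, i.e. $g(x)=\mathbf{a}^{2}x\,e^{-\mathbf{a}x}\mathbf{1}_{{\{x\ge 0\}}}$, whose Laplace transform is $\mathbf{a}^{2}/(\mathbf{a}+s)^{2}$ and whose mean is $\overline{\tau }=2/\mathbf{a}$. Summing the geometric series of transforms corresponding to (11) gives

$$\begin{aligned} \int ^{+\infty }_{{0}}e^{-sx}\rho (x)\,\mathrm {d}x&=\sum ^{\infty }_{{l=1}}\left( \frac{\mathbf{a}^{2}}{(\mathbf{a}+s)^{2}}\right) ^{l} =\frac{\mathbf{a}^{2}}{s(s+2\mathbf{a})} =\frac{\mathbf{a}}{2}\left( \frac{1}{s}-\frac{1}{s+2\mathbf{a}}\right) \,,\\ \rho (x)&=\frac{\mathbf{a}}{2}\left( 1-e^{-2\mathbf{a}x}\right) \,, \qquad \varUpsilon (x)=\rho (x)-\frac{1}{\overline{\tau }}=-\frac{\mathbf{a}}{2}\,e^{-2\mathbf{a}x}\,. \end{aligned}$$

In this case $\vert \rho \vert _{{*}}=\mathbf{a}/2$ and $\Vert \varUpsilon \Vert _{{1}}=1/4$, and $\varUpsilon $ decays exponentially, so that $\sup _{{x\ge 0}}x^{\mathbf{b}}\vert \varUpsilon (x)\vert <\infty $ for any $\mathbf{b}>0$.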

Remark 2

It should be noted that Condition $(\mathbf{H}_{{4}})$ does not hold for exponential random variables $(\tau _{{j}})_{{j\ge 1}}$ since their density is not continuous at zero. But for exponential random variables, i.e. in the case when $(N_{{t}})_{{t\ge 0}}$ is a Poisson process, the renewal density can be calculated directly, i.e. $\rho (x)\equiv 1/\overline{\tau }$ and $\varUpsilon \equiv 0$.

Now we describe the class of possible admissible noise distributions used in the robust risk (5). To this end we set

$$\begin{aligned} \sigma _{{Q}}=\varrho _{1}^{2}+\varrho _{2}^{2} + \frac{\varrho _{3}^{2}}{\overline{\tau }}\,. \end{aligned}$$

(14)

As to the parameters in (6), we assume that

$$\begin{aligned} \varsigma _{{*}}\le \sigma _{{Q}} \le \varsigma ^{*}\,, \end{aligned}$$

(15)

where the unknown bounds $0<\varsigma _{{*}}\le \varsigma ^{*}$ can be functions of T, i.e. $\varsigma _{{*}}=\varsigma _{{*}}(T)$ and $\varsigma ^{*}=\varsigma ^{*}(T)$, such that for any $\mathbf{b}>0$

$$\begin{aligned} \lim _{{T\rightarrow \infty }}T^{\mathbf{b}}\,\varsigma _{{*}}(T)=+\infty \quad \text{ and }\quad \lim _{{T\rightarrow \infty }}\,\frac{\varsigma ^{*}(T)}{T^{\mathbf{b}}}=0\,. \end{aligned}$$

(16)

We denote by $\mathcal{Q}_{{T}}$ the family of all distributions of the process (6) in $\mathcal{D}[0,T]$ satisfying the properties (15)–(16).

Remark 3

As we will see later, the parameter (14) is the limit of the Fourier transform of the noise process (6). This limit is called the variance proxy (see Konev and Pergamenshchikov, 2012).

3 Truncated fractional Poisson processes

As an example of the process (10) that satisfies Conditions $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}}),$ we give the truncated fractional Poisson process introduced in Barbu et al. (2019b). To this end, we recall the definition of the fractional Poisson process (see, for example, Biard and Saussereau, 2014; Laskin, 2003). The process (10) is called a fractional Poisson process if the i.i.d. random variables $(\tau _{{j}})$ have the Mittag-Leffler distribution which, for some $\mathbf{a}>0,$ is defined as

$$\begin{aligned} \mathbf{P}(\tau _{{1}}> t) = \mathcal{E}_{{H}}(- \mathbf{a}t^H)\,, \end{aligned}$$

(17)

where $0<H\le 1$ is called the Hurst index,

$$\begin{aligned} \mathcal{E}_{{H}}(z) = \sum _{k=0}^{\infty } \frac{z^k}{\Gamma (1+H k)} \quad \text{ and }\quad \Gamma (x)=\int ^{+\infty }_{{0}}\,t^{x-1}\,e^{-t}\,\mathrm {d}t \,. \end{aligned}$$

Note that, if $H=1$, then we obtain the exponential distribution with parameter $\mathbf{a}>0$ and, therefore, the process (10) is a Poisson process. If $0<H<1$, then the density of the distribution (17) (see, for example, Repin and Saichev, 2000) can be represented as

$$\begin{aligned} f_{{H}}(t)=\frac{\mathbf{a}\,\sin (\pi H)}{\pi }\,\int ^{+\infty }_{{0}}\frac{x^{H}\,e^{-tx}}{x^{2H}+\mathbf{a}^{2}+2\mathbf{a}x^{H}\cos (\pi H)} \mathrm {d}x\,. \end{aligned}$$

(18)

From here we can directly obtain that

$$\begin{aligned} f_{{H}}(t)\sim t^{H-1}\,,\quad f^{'}_{{H}}(t)\sim t^{H-2}\,,\quad f^{''}_{{H}}(t)\sim t^{H-3} \quad \text{ as }\quad t\rightarrow 0 \end{aligned}$$

(19)

and

$$\begin{aligned} f_{{H}}(t)\sim t^{-H-1}\,,\quad f^{'}_{{H}}(t)\sim t^{-H-2}\,, \quad f^{''}_{{H}}(t)\sim t^{-H-3} \quad \text{ as }\quad t\rightarrow \infty \,. \end{aligned}$$

(20)

In particular, this implies that the Mittag-Leffler distribution has a heavy tail, i.e.

$$\begin{aligned} \mathbf{P}(\tau _{{1}}> t)\quad \sim \quad t^{-H} \quad \text{ as }\quad t\rightarrow \infty \,, \end{aligned}$$

(21)

i.e. $\mathbf{E}\tau _{{1}}=+\infty $. Therefore, the condition $(\mathbf{H}_{{3}})$ does not hold for the distribution (17). To correct this effect, in Barbu et al. (2019b) it is proposed to replace the Mittag-Leffler random variables in (10) with i.i.d. random variables distributed as $\tau ^{*}_{{1}}=\min (X^{\mathbf{b}}_{{*}},X^{*})$, where $X_{{*}}$ is a Mittag-Leffler random variable with $0<H<1$, $0<\mathbf{b}\le H/3$ and $X^{*}$ is a positive random variable satisfying the condition $(\mathbf{H}^{*}_{{4}})$. Such processes are called truncated fractional Poisson processes. Using the asymptotic properties (19) and (20) one can check directly that the random variable $\tau ^{*}_{{1}}$ satisfies the condition $(\mathbf{H}^{*}_{{4}})$ and, therefore, the conditions $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true for this case.
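
The construction of $\tau ^{*}_{{1}}$ can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation. The survival function (17) is evaluated by truncating the series for $\mathcal{E}_{{H}}$, which is adequate only for moderate arguments. The Mittag-Leffler sampler uses a two-uniform transformation formula known from the simulation literature; it is an assumption here and does not appear in the paper. The upper variable $X^{*}$ is taken, hypothetically, to be gamma distributed with shape 2 and unit rate. All function names are illustrative.

```python
import math

import numpy as np


def mittag_leffler(z, H, terms=80):
    """Truncated power series for E_H(z); intended for moderate |z| only."""
    return sum(z ** k / math.gamma(1.0 + H * k) for k in range(terms))


def sample_mittag_leffler(n, H, a, rng):
    """Sample tau with P(tau > t) = E_H(-a t^H), 0 < H <= 1 (assumed formula)."""
    u = rng.random(n)            # uniform on [0, 1)
    v = 1.0 - rng.random(n)      # uniform on (0, 1], avoids tan(0)
    w = np.sin(H * np.pi) / np.tan(H * np.pi * v) - np.cos(H * np.pi)
    return -np.log1p(-u) * (w / a) ** (1.0 / H)


def sample_truncated(n, H, a, b, rng):
    """Sample tau* = min(X_*^b, X^*) with X^* ~ gamma(shape=2, rate=1) (assumption)."""
    x_low = sample_mittag_leffler(n, H, a, rng)
    x_up = rng.gamma(shape=2.0, scale=1.0, size=n)
    return np.minimum(x_low ** b, x_up)


rng = np.random.default_rng(2)
H, a = 0.7, 1.0
x = sample_mittag_leffler(200_000, H, a, rng)
empirical = (x > 1.0).mean()                  # Monte-Carlo estimate of P(tau > 1)
theoretical = mittag_leffler(-a * 1.0 ** H, H)
tau_star = sample_truncated(10_000, H, a, b=H / 3, rng=rng)
print(empirical, theoretical, tau_star.mean())
```

Since $\tau ^{*}_{{1}}\le X^{*}$, the truncated variable inherits the exponential moment of $X^{*}$, whereas the untruncated sample `x` has a heavy tail.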

Remark 4

It should be noted also that the process (10) with the Mittag-Leffler random variables has a “memory” in its increments (see, for example, Maheshwari and Vellaisamy, 2016) in the sense that, for any $\delta >0$ and $s>0,$ the correlation coefficient

$$\begin{aligned} \text{ Corr }\left( \left( N_{{s+\delta }}-N_{{s}} \right) \,,\,\left( N_{{t+\delta }}-N_{{t}} \right) \right) \quad \sim \quad \,t^{-\frac{3-H}{2}} \quad \text{ as }\quad t\rightarrow \infty \,. \end{aligned}$$

It should be noted that this property is very important for many practical problems and it substantially broadens the possible applications of statistical results.

Unfortunately, we cannot directly use the fractional Poisson process in the regression model (3): since the time between jumps is not integrable, i.e. it is very large, the impulse noise of the fractional Poisson process is very rare and, therefore, has an almost negligible influence in the observation models. On the contrary, the truncated process has an exponential moment, i.e. the same property as Poisson processes, and, moreover, it keeps a dependence on large time intervals.

4 Model selection

In this section we construct a model selection procedure for estimating the unknown function S given in (3) starting from the discrete-time observations (2) and we establish the oracle inequality for the associated risk. To this end, note that for any function $f:[0,T] \rightarrow {{\mathbb {R}}}$ from $\mathcal{L}_{{2}}[0,T]$, the integral

$$\begin{aligned} I_{{T}}(f)=\int _{{0}}^{T} f(s) \mathrm {d}\xi _{{s}} \end{aligned}$$

(22)

is well defined, with $\mathbf{E}_{{Q}}\,I_{{T}}(f)=0$. Moreover, as it is shown in Lemma 1 under the conditions $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}}),$

$$\begin{aligned} \mathbf{E}_{{Q}}\,I^{2}_{{T}} (f) \le \varkappa _{{Q}}\,\int _{{0}}^{T} f^2(s) \mathrm {d}\,s \quad \text{ and }\quad \varkappa _{{Q}}=\varrho _{1}^{2}+\varrho _{2}^{2}+\varrho _{3}^2\,\vert \rho \vert _{{*}}\,. \end{aligned}$$

(23)

In this paper we assume that the observation frequency p in (2) is odd and we will use the trigonometric basis $(\phi _{{j}})_{{j\ge \, 1}}$ in $\mathcal{L}_{{2}}[0,1]$ defined as

$$\begin{aligned} \phi _{{1}} = 1\,,\quad \phi _{{j}}(x)= \sqrt{2} \text{ Tr}_{{j}}(2\pi [j/2]x)\,,\quad j\ge \,2\,, \end{aligned}$$

(24)

where the function $\text{ Tr}_{{j}}(x)= \cos (x)$ for even j and $\text{ Tr}_{{j}}(x)= \sin (x)$ for odd j, and [x] denotes the integer part of x. Note that these functions are orthonormal on the points $(t_{{j}})_{{1\le j\le p}}$, i.e. for any $1\le i,j\le p$

$$\begin{aligned} (\phi _{{i}},\phi _{{j}})_{{p}} = \frac{1}{p} \sum _{{l=1}}^{p} \phi _{{i}}(t_{{l}}) \phi _{{j}}(t_{{l}}) =\mathbf{1}_{{\{i=j \}}}\,. \end{aligned}$$

(25)
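
The discrete orthonormality (25), and the role of the assumption that p is odd, can be verified numerically. The sketch below is an illustration only; the function names `phi` and `gram_on_grid` are hypothetical. It builds the basis (24), evaluates it at the points $t_{{l}}=l/p$ and computes the matrix of the inner products $(\phi _{{i}},\phi _{{j}})_{{p}}$.

```python
import numpy as np


def phi(j, x):
    """Trigonometric basis (24): phi_1 = 1, cosine for even j, sine for odd j >= 3."""
    x = np.asarray(x, dtype=float)
    if j == 1:
        return np.ones_like(x)
    k = j // 2  # integer part [j/2]
    trig = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2.0) * trig(2.0 * np.pi * k * x)


def gram_on_grid(p):
    """Matrix of the inner products (phi_i, phi_j)_p for 1 <= i, j <= p."""
    t = np.arange(1, p + 1) / p
    basis = np.stack([phi(j, t) for j in range(1, p + 1)])
    return basis @ basis.T / p


gram_odd = gram_on_grid(7)    # identity matrix up to rounding
gram_even = gram_on_grid(8)   # last diagonal entry equals 2, not 1
print(np.round(gram_odd, 10))
print(np.round(np.diag(gram_even), 10))
```

For even p the last basis function is $\sqrt{2}\cos (\pi l)=\pm \sqrt{2}$ on the grid, so its discrete norm is $\sqrt{2}$ rather than 1, which is why an odd observation frequency is convenient.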

In the sequel we denote by $\Vert x\Vert ^{2}_{{p}}=(x,x)_{{p}}$. Now note that, for any $1\le l\le p,$

$$\begin{aligned} S(t_{{l}})= \sum _{{j=1}}^{p}\,\theta _{{j,p}}\, \phi _{{j}}(t_{{l}}) \quad \text{ and }\quad \theta _{{j,p}}=(S,\phi _{{j}})_{{p}}\,. \end{aligned}$$

(26)

Using the approach from Konev and Pergamenshchikov (2015), we estimate the Fourier coefficients $\theta _{{j,p}}$ on the basis of the observations from the interval [0, T] as

$$\begin{aligned} \widehat{\theta }_{{j,p}}= \frac{1}{T} \int _{{0}}^{T} \psi _{{j,p}}(t) \mathrm {d}\,y_{{t}} \quad \text{ and }\quad \psi _{{j,p}}(t) = \sum _{{l=1}}^{n} \phi _{{j}}(t_l)\mathbf{1}_{{\{t_{l-1} < t \le t_l \}}} \,. \end{aligned}$$

(27)
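
Since $\psi _{{j,p}}$ is piecewise constant, the integral in (27) reduces to a finite sum over increments of the observations, $\widehat{\theta }_{{j,p}}=T^{-1}\sum ^{n}_{{l=1}}\phi _{{j}}(t_{{l}})(y_{{t_{{l}}}}-y_{{t_{{l-1}}}})$, so the estimator depends only on the discrete data (2). The following minimal sketch illustrates this; the function names are hypothetical and the noise-free test signal $y_{{t}}=ct$ (i.e. $S\equiv c$) is an assumption made only so that the result can be checked by hand.

```python
import numpy as np


def phi(j, x):
    """Trigonometric basis (24)."""
    x = np.asarray(x, dtype=float)
    if j == 1:
        return np.ones_like(x)
    trig = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2.0) * trig(2.0 * np.pi * (j // 2) * x)


def fourier_estimates(y, p, T):
    """Estimates (27) from the observations y[l] = y_{t_l}, 0 <= l <= n = p*T."""
    n = p * T
    y = np.asarray(y, dtype=float)
    if y.size != n + 1:
        raise ValueError("expected n + 1 = p*T + 1 observations")
    increments = np.diff(y)          # y_{t_l} - y_{t_{l-1}}, l = 1..n
    t = np.arange(1, n + 1) / p      # right end points t_l
    return np.array([phi(j, t) @ increments for j in range(1, p + 1)]) / T


# Noise-free check: S = c gives theta_hat = (c, 0, ..., 0).
p, T, c = 5, 3, 2.0
grid = np.arange(p * T + 1) / p
theta_hat = fourier_estimates(c * grid, p, T)
print(np.round(theta_hat, 10))
```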

Note here that the functions $(\psi _{{j,p}})_{1 \le j \le p }$ are 1-periodic and orthonormal on the interval [0, 1], i.e. in $\mathcal{L}_{{2}}[0,1]$,

$$\begin{aligned} (\psi _{{j,p}},\psi _{{i,p}})= \int _{{0}}^{1} \psi _{{j,p}}(t) \psi _{{i,p}}(t) \mathrm {d}\,t\ = (\phi _{{j}},\phi _{{i}})_{{p}} =\mathbf{1}_{{\{i=j \}}} \,. \end{aligned}$$

(28)

The Fourier coefficients of S for this basis in $\mathcal{L}_{{2}}[0,1]$ can be represented as

$$\begin{aligned} \overline{\theta }_{{j,p}}= (S,\psi _{{j,p}}) = \int _{{0}}^{1} S(t) \psi _{{j,p}}(t) \mathrm {d}\,t\ = \theta _{{j,p}} + h_{j,p}, \end{aligned}$$

(29)

where $h_{j,p}=h_{j,p}(S)= \sum _{{l=1}}^{p} \int _{{t_{l-1}}}^{t_l} \phi _{{j}}(t_l) (S(t)-S(t_l)) \mathrm {d}\,t$. Taking into account here that the functions $(\psi _{{j,p}})_{1 \le j \le p }$ are 1-periodic and using the observation model (3) in (27) we obtain that

$$\begin{aligned} \widehat{\theta }_{{j,p}}&=\frac{1}{T}\int ^{T}_{{0}}\psi _{{j,p}}(t) S(t)\mathrm {d}\,t +\frac{1}{T}\int ^{T}_{{0}}\psi _{{j,p}}(t)\mathrm {d}\xi _{{t}}\\[2mm]&=\int ^{1}_{{0}}\psi _{{j,p}}(t) S(t)\mathrm {d}\,t +\frac{1}{T}\,\int ^{T}_{{0}}\psi _{{j,p}}(t) \mathrm {d}\,\xi _{{t}} \,. \end{aligned}$$

Therefore,

$$\begin{aligned} \widehat{\theta }_{{j,p}}= \overline{\theta }_{{j,p}} + \frac{1}{\sqrt{T}}\xi _{{j,p}} \quad \text{ and }\quad \xi _{{j,p}}= \frac{1}{\sqrt{T}} I_{{T}}(\psi _{{j,p}}) \,. \end{aligned}$$

(30)

As in Barbu et al. (2019a), we use the model selection procedures based on the following weighted least squares estimators

$$\begin{aligned} \widehat{S}_{{\lambda }} (t) = \sum _{{j=1}}^{p}\, \lambda (j) \widehat{\theta }_{{j,p}} \psi _{{j,p}}(t)\,, \quad 0\le t\le 1\,, \end{aligned}$$

(31)

where the weight vector $\lambda =(\lambda (1),\ldots ,\lambda (p))'$ belongs to some finite set $\varLambda $ from $[0,1]^{p}$. Here the prime $'$ denotes the transposition. Moreover, we set

$$\begin{aligned} \mathbf{m}_{{*}}=\text{ card }(\varLambda ) \quad \text{ and }\quad \varLambda _{{*}}= \max _{{\lambda \in \varLambda }}\,\sum ^{p}_{{j=1}} \mathbf{1}_{{\{ \lambda (j)>0\}}} \,, \end{aligned}$$

(32)

where $\text{card }(\varLambda )$ is the cardinality of the set $\varLambda $. We assume that $\varLambda _{{*}}\le T$. Now we use the same criterion as in Barbu et al. (2019a) to choose a weight vector in $\varLambda $, i.e., we minimize the empirical error

$$\begin{aligned} \text{ Err }(\lambda ) = \Vert \widehat{S}_\lambda -S\Vert ^2\,, \end{aligned}$$

(33)

which can be represented as

$$\begin{aligned} \text{ Err }(\lambda ) = \sum _{{j=1}}^{p} \lambda ^2(j) \widehat{\theta }^2_{{j,p}} -2 \sum _{{j=1}}^{p} \lambda (j) \widehat{\theta }_{{j,p}}\overline{\theta }_{{j,p}}+ \Vert S\Vert ^{2} \,. \end{aligned}$$

(34)

Note that the Fourier coefficients $(\overline{\theta }_{{j,p}})_{{j\ge \,1}}$ are unknown. Therefore, using the approach from Barbu et al. (2019a), to minimize this function we replace the terms $\widehat{\theta }_{{j,p}}\overline{\theta }_{{j,p}}$ by their estimators

$$\begin{aligned} \widetilde{\theta }_{{j,p}} = \widehat{\theta }^2_{{j,p}} - \frac{\sigma _{{Q}}}{T}\,, \end{aligned}$$

where the proxy variance $\sigma _{{Q}}$ is defined in (14). In the case when this variance is unknown we use its estimator, i.e.

$$\begin{aligned} \widetilde{\theta }_{{j,p}} = \widehat{\theta }^2_{{j,p}} - \frac{ \widehat{\sigma }_{{T}}}{T} \quad \text{ and }\quad \widehat{\sigma }_{{T}}=\frac{T}{p} \sum ^{p}_{{j=[\sqrt{T}]}}\,\widehat{\theta }^2_{{j,p}}\,. \end{aligned}$$

(35)

Now, using this estimator we define the penalty term as

$$\begin{aligned} \widehat{P}_{{T}}(\lambda )= \frac{ \widehat{\sigma }_{{T}} |\lambda |^2}{T} \quad \text{ and }\quad |\lambda |^2=\sum ^{p}_{{j=1}}\,\lambda ^{2}(j) \,. \end{aligned}$$

(36)

In the case when the variance $\sigma _{{Q}}$ is known we set

$$\begin{aligned} P_{{T}}(\lambda )= \frac{\sigma _{{Q}} |\lambda |^2}{T}\,. \end{aligned}$$

(37)

Finally, we define the cost function as

$$\begin{aligned} J_{{T}}(\lambda )=\sum _{{j=1}}^{p} \lambda ^2(j) \widehat{\theta }^2_{{j,p}} -2 \sum _{{j=1}}^{p} \lambda (j)\widetilde{\theta }_{{j,p}} + \delta \,\widehat{P}_{{T}}(\lambda ), \end{aligned}$$

(38)

where $\delta >0$ is some threshold which will be specified later. Now we set the model selection procedure as

$$\begin{aligned} \widehat{S}_{{*}} = \widehat{S}_{{\widehat{\lambda }}} \quad \text{ and }\quad \widehat{\lambda } = \text{ argmin}_{{\lambda \in \varLambda }} J_{{T}}(\lambda ) \,. \end{aligned}$$

(39)

In the case when the minimizer $\widehat{\lambda }$ is not unique we take any one of the minimizers.
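
The steps (35), (36), (38) and (39) can be assembled into a short routine. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the function name `select_weights` is hypothetical, the weight family consists of simple projection weights $\lambda (j)=\mathbf{1}_{{\{j\le k\}}}$ chosen only for the example, and the input vector of estimated coefficients is made up so that the result can be verified by hand.

```python
import numpy as np


def select_weights(theta_hat, T, weights, delta):
    """Return the index minimizing the cost (38), all costs, and sigma_hat from (35)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    p = theta_hat.size
    j0 = int(np.floor(np.sqrt(T)))  # lower summation index [sqrt(T)] (1-based)
    sigma_hat = (T / p) * np.sum(theta_hat[j0 - 1:] ** 2)
    theta_tilde = theta_hat ** 2 - sigma_hat / T

    costs = []
    for lam in weights:
        lam = np.asarray(lam, dtype=float)
        penalty = sigma_hat * np.sum(lam ** 2) / T  # penalty term (36)
        cost = (np.sum(lam ** 2 * theta_hat ** 2)
                - 2.0 * np.sum(lam * theta_tilde)
                + delta * penalty)
        costs.append(cost)
    costs = np.array(costs)
    return int(np.argmin(costs)), costs, sigma_hat


# Hypothetical input: p = 9 coefficients, T = 4 periods, two nonzero coefficients.
p, T, delta = 9, 4, 0.1
theta_hat = np.zeros(p)
theta_hat[:2] = [1.0, 0.5]
# Projection weights lambda_k(j) = 1{j <= k}, k = 1..T, so that Lambda_* <= T.
weights = [(np.arange(1, p + 1) <= k).astype(float) for k in range(1, T + 1)]

best, costs, sigma_hat = select_weights(theta_hat, T, weights, delta)
print(best + 1, sigma_hat, costs)
```

In this example $\widehat{\sigma }_{{T}}=1/9$ and the cost for the projection on the first k coefficients equals $-\sum _{{j\le k}}\widehat{\theta }^{2}_{{j,p}}+k(2+\delta )/36$, so the procedure keeps exactly the two nonzero coefficients. `np.argmin` returns the first minimizer when several exist.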

5 Main results

5.1 Oracle inequalities

First, we obtain non-asymptotic oracle inequalities for the procedure (39).

Theorem 1

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true. Then, there exists some constant $\mathbf{c}^{*}>0$ such that for any $T\ge \, 1$ and any noise distribution $Q\in \mathcal{Q}_{{T}}$ and $ 0 <\delta \le 1/6$, the procedure (39) satisfies the following oracle inequality

$$\begin{aligned} \mathcal{R}_{{Q}}(\widehat{S}_{{*}},S)\le&\frac{1+3\delta }{1-3\delta } \min _{{\lambda \in \varLambda }} \mathcal{R}_{{Q}}(\widehat{S}_\lambda ,S)\nonumber \\&+ \mathbf{c}^{*}\frac{\mathbf{m}_{{*}}}{\delta T} \left( \sigma _{{Q}} +\varLambda _{{*}}\, \mathbf{E}_{{Q}}\vert \widehat{\sigma }_{{T}} - \sigma _{{Q}}\vert \right) \,. \end{aligned}$$

(40)

In the case when $\sigma _{{Q}}$ is known, the inequality (40) has the form given in the next result.

Corollary 1

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true and that the proxy variance $\sigma _{{Q}}$ is known. Then there exists some constant $\mathbf{c}^{*}>0$ such that for any $T\ge \,1$ and for any noise distribution $Q\in \mathcal{Q}_{{T}}$ and $ 0 <\delta \le 1/6$, the procedure (39) with $\widehat{\sigma }_{{T}}=\sigma _{{Q}}$ satisfies the following oracle inequality

$$\begin{aligned} \mathcal{R}_{{Q}}(\widehat{S}_{{*}},S)\le \frac{1+3\delta }{1-3\delta } \min _{{\lambda \in \varLambda }} \mathcal{R}_{{Q}}(\widehat{S}_\lambda ,S) + \mathbf{c}^{*}\frac{\sigma _{{Q}} \mathbf{m}_{{*}}}{\delta T} \,. \end{aligned}$$

Now we study the estimator $\widehat{\sigma }_{{T}}$ defined in (35).

Proposition 1

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true for the model (3) and that $S(\cdot )$ is continuously differentiable. Then, there exists a constant $\mathbf{c}^{*}>0$ such that for any $T\ge 4$, $Q\in \mathcal{Q}_{{T}}$ and $p>\sqrt{T}$,

$$\begin{aligned} \mathbf{E}_{{Q,S}}|\widehat{\sigma }_{{T}}-\sigma _{{Q}}| \le \, \mathbf{c}^{*}\, (1+\Vert \dot{S}\Vert ^2)(1+\sigma _{{Q}})\,\mathbf{g}^{*}_{{T,p}} \,, \end{aligned}$$

(41)

where $\mathbf{g}^{*}_{{T,p}}=\sqrt{T}/p+ 1/\sqrt{p}$.

Now Theorem 1 and this proposition imply directly the following result.

Theorem 2

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true. Then there exists some constant $\mathbf{c}^{*}>0$ such that, for any continuously differentiable function S, any $T\ge \, 4 $, any noise distribution $Q\in \mathcal{Q}_{{T}}$, any $p> \sqrt{T}$ and any $ 0 <\delta \le 1/6$,

$$\begin{aligned} \mathcal{R}_{{Q}}(\widehat{S}_{{*}},S) \le&\frac{1+3\delta }{1-3\delta } \min _{{\lambda \in \varLambda }} \mathcal{R}_{{Q}}(\widehat{S}_\lambda ,S)\nonumber \\&+ \mathbf{c}^{*}\frac{\mathbf{m}_{{*}}}{\delta T} (1+\sigma _{{Q}}) \, \left( 1+\Vert \dot{S}\Vert ^{2} \right) \left( 1+\varLambda _{{*}}\mathbf{g}^{*}_{{T,p}} \right) \,. \end{aligned}$$

(42)

Now, to study asymptotic properties of the term $\mathbf{g}^{*}_{{T,p}}$ as $T\rightarrow \infty $ and to provide an efficient estimation for the function S, we have to assume some condition on the frequency of the observations p.

$(\mathbf{H}_{{5}})$ Assume that the frequency p is a function of T, i.e. $p=p_{{T}}$, such that

$$\begin{aligned} \liminf _{{T\rightarrow \infty }}\,\frac{p_{{T}}}{T^{5/6+\epsilon }}>0 \quad \text{ for } \text{ some }\quad \epsilon >0 \,. \end{aligned}$$

(43)

Remark 5

This condition is the same as in Konev and Pergamenshchikov (2015) (see (3.41)). Note that it is not too restrictive since we use only observations of the process (3) at discrete time moments. Note also that, to provide optimal asymptotic properties for the model selection procedure (39), one needs to approximate the model (3) on the basis of the observations (2) in such a way as to minimise the right-hand side of the inequality (42). To do this, we need to find a family of optimal estimators $(\widehat{S}_{{\lambda }})_{{\lambda \in \varLambda }}$ and, moreover, we need to minimise the estimation error for the variance $\sigma _{{Q}}$. For these reasons we have to use the lower bound (43).

Moreover, to study the last term of the right-hand side of the inequality (42), we also need a condition for weights.

$(\mathbf{H}_{{6}})$ The parameters $\mathbf{m}_{{*}}$ and $\varLambda _{{*}}$ defined in (32) can be functions of T, i.e., $\mathbf{m}_{{*}}=\mathbf{m}_{{*}}(T)$ and $\varLambda _{{*}}=\varLambda _{{*}}(T)$, such that, for any $\mathbf{b}> 0$, $ \lim _{{T\rightarrow \infty }} T^{-\mathbf{b}}\mathbf{m}_{{*}}(T) =0$ and $\lim _{{T\rightarrow \infty }} T^{-1/3-\mathbf{b}} \varLambda _{{*}}(T) =0$.
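A short worked bound may help to see how the exponent $5/6$ in (43) and the exponent $1/3$ in the second limit above fit together in the last factor of (42). It is only a sketch under an explicit assumption: the lower bound (43) is used in the form $p_{{T}}\ge c\,T^{5/6+\epsilon }$ for some constant $c>0$ (a name introduced here for illustration) and all sufficiently large T. Then

$$\begin{aligned} \mathbf{g}^{*}_{{T,p}}=\frac{\sqrt{T}}{p_{{T}}}+\frac{1}{\sqrt{p_{{T}}}} \le \frac{T^{-1/3-\epsilon }}{c}+\frac{T^{-5/12-\epsilon /2}}{\sqrt{c}}\,. \end{aligned}$$

Taking $\mathbf{b}=\epsilon /2$ in the second limit, we have $\varLambda _{{*}}(T)\le T^{1/3+\epsilon /2}$ for all sufficiently large T, and therefore

$$\begin{aligned} \varLambda _{{*}}(T)\,\mathbf{g}^{*}_{{T,p}} \le \frac{T^{-\epsilon /2}}{c}+\frac{T^{-1/12}}{\sqrt{c}} \rightarrow 0 \quad \text{ as }\quad T\rightarrow \infty \,. \end{aligned}$$

In particular, a growth of order $T^{1/3}$ in $\varLambda _{{*}}$ is compensated by the term $\sqrt{T}/p_{{T}}$ only when $p_{{T}}$ grows faster than $T^{5/6}$.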

Now, Theorem 2 implies the following oracle inequality.

Theorem 3

Assume that the function S is continuously differentiable and that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{6}})$ hold true. Then, for any $T\ge 4$, $p> \sqrt{T}$ and $ 0<\delta <1/6,$ the procedure (39) satisfies the following oracle inequality

$$\begin{aligned} \mathcal{R}^{*}(\widehat{S}_{{*}},S)\le \frac{1+3\delta }{1-3\delta } \min _{{\lambda \in \varLambda }} \mathcal{R}^{*}(\widehat{S}_\lambda ,S)+ \frac{\mathbf{U}^{*}_{{T}}}{T\delta } \,, \end{aligned}$$

where the term $\mathbf{U}^{*}_{{T}}>0$ is such that, for any $r>0$ and $\mathbf{b}>0,$

$$\begin{aligned} \lim _{{T\rightarrow \infty }}\, \sup _{{\Vert \dot{S}\Vert \le r}} \, T^{-\mathbf{b}}\mathbf{U}^{*}_{{T}} =0 \,. \end{aligned}$$

(44)

To obtain the efficiency property, we specify the weight coefficients in the procedure (39). Consider, for some $0<\varepsilon <1,$ a numerical grid of the form

$$\begin{aligned} {{\mathcal {A}}}=\{1,\ldots ,k^*\}\times \{\varepsilon ,\ldots ,[1/\varepsilon ^2] \varepsilon \}\,, \end{aligned}$$

(45)

where $k^*\ge 1$ and $\varepsilon $ are functions of T, i.e. $k^*=k^*(T)$ and $\varepsilon =\varepsilon (T)$, such that

$$\begin{aligned} \left\{ \begin{array}{ll} &{}\lim _{{T\rightarrow \infty }}\,k^*(T)=+\infty \,, \quad \lim _{{T\rightarrow \infty }}\,\dfrac{k^*(T)}{\ln T}=0\,,\\ &{} \lim _{{T\rightarrow \infty }}\,\varepsilon (T)=0 \quad \text{ and }\quad \lim _{{T\rightarrow \infty }}\,T^{\mathbf{b}}\varepsilon (T)\,=+\infty \end{array} \right. \end{aligned}$$

(46)

for any $\mathbf{b}>0$. One can take, for example, for $T\ge 2$

$$\begin{aligned} \varepsilon (T)=\frac{1}{ \ln T } \quad \text{ and }\quad k^*(T)=k^{*}_{{0}}+\sqrt{\ln T}\,, \end{aligned}$$

where $k^{*}_{{0}}\ge 0$ is a fixed constant. For each $\alpha =(k, r)\in {{\mathcal {A}}}$, we set the vector

$$\begin{aligned} \lambda _{{\alpha }}=(\lambda _{{\alpha }}(j))_{{1\le j\le p}} \end{aligned}$$

through its components which are defined as

$$\begin{aligned} \lambda _{{\alpha }}(j)=\mathbf{1}_{{\{1\le j<\ln T\}}}+ \left( 1-(j/\omega _\alpha )^{k}\right) \, \mathbf{1}_{{\{ \ln T \le j\le \omega _{{\alpha }}\}}}\,, \end{aligned}$$

(47)

where $ \omega _{{\alpha }}= \left( \tau _{{k}} \,r\upsilon _{{T}} \right) ^{1/(2k+1)}$,

$$\begin{aligned} \tau _{{k}}= \frac{(k+1)(2k+1)}{\pi ^{2k}k}\,, \quad \upsilon _{{T}}=T/\varsigma ^{*} \end{aligned}$$

and $\varsigma ^{*}$ is introduced in (15). Now we define the set $\varLambda $ as

$$\begin{aligned} \varLambda \,=\,\{\lambda _{{\alpha }}\,,\,\alpha \in {{\mathcal {A}}}\}\,. \end{aligned}$$

(48)
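The construction (45)–(48) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: it uses the example choices $\varepsilon (T)=1/\ln T$ and $k^*(T)=k^{*}_{{0}}+\sqrt{\ln T}$ given above, takes the integer part of $k^*(T)$ (an assumption made here so that the first factor of the grid is a set of integers), and treats $\varsigma ^{*}$ as a user-supplied number. The names `build_weights`, `varsigma_star` and `k0` are hypothetical.

```python
import math


def build_weights(T, p, varsigma_star=1.0, k0=1.0):
    """Weight vectors lambda_alpha from (47), indexed by grid points of (45).

    Returns a dict mapping (k, i) to the vector for alpha = (k, r) with r = i * eps.
    Requires T > 1 so that ln T > 0.
    """
    log_T = math.log(T)
    eps = 1.0 / log_T                              # epsilon(T) = 1 / ln T
    k_star = max(1, int(k0 + math.sqrt(log_T)))    # integer part: an assumption
    m = int(1.0 / eps ** 2)                        # [1 / eps^2] points for r
    upsilon = T / varsigma_star                    # upsilon_T = T / varsigma^*
    weights = {}
    for k in range(1, k_star + 1):
        tau_k = (k + 1) * (2 * k + 1) / (math.pi ** (2 * k) * k)
        for i in range(1, m + 1):
            r = i * eps
            omega = (tau_k * r * upsilon) ** (1.0 / (2 * k + 1))
            lam = []
            for j in range(1, p + 1):
                if j < log_T:
                    lam.append(1.0)                      # first indicator in (47)
                elif j <= omega:
                    lam.append(1.0 - (j / omega) ** k)   # second indicator in (47)
                else:
                    lam.append(0.0)
            weights[(k, i)] = lam
    return weights
```

The number of vectors returned is the product of the sizes of the two factors of the grid (45), and every component lies in $[0,1]$.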

These weight coefficients are used in Konev and Pergamenshchikov (2009a; 2012; 2015) for continuous time regression models to show the asymptotic efficiency. Note also that in this case the cardinality of the set $\varLambda $ is $\mathbf{m}_{{*}}=k^{*} m$, where $m=[1/\varepsilon ^2]$ is the number of points in the second factor of the grid (45). Moreover, taking into account that for $k\ge 1$ the coefficient $\omega _{{\alpha }}<(r\upsilon _{{T}})^{1/(2k+1)}$, we obtain that the norm of the set $\varLambda $ defined in (32) can be bounded as $\varLambda _{{*}}\, \le \, \sup _{{\alpha \in {{\mathcal {A}}}}} \omega _{{\alpha }} \le (\upsilon _{{T}}/\varepsilon )^{1/3}$. Therefore, the properties (46) imply the condition $(\mathbf{H}_{{6}})$.

5.2 Robust asymptotic efficiency

Now we study the asymptotic efficiency properties for the procedure (39), (48) with respect to the robust risks (5) defined by the distribution family (15)–(16). To this end, we assume that on the interval [0, 1] the unknown function S in the model (3) belongs to the Sobolev ball

$$\begin{aligned} \mathcal{W}_{{\mathbf{r},\mathbf{k}}}=\left\{ f\in \,{{\mathcal {C}}}^{\mathbf{k}}_{{per}}[0,1]\,:\, \sum _{{j=0}}^{\mathbf{k}}\,\Vert f^{(j)}\Vert ^2\le \mathbf{r}\right\} , \end{aligned}$$

(49)

where $\mathbf{r}>0$ and $\mathbf{k}\ge 1$ are some unknown parameters, ${{\mathcal {C}}}^{\mathbf{k}}_{{per}}[0,1]$ is the set of $\mathbf{k}$ times continuously differentiable functions $f\,:\,[0,1]\rightarrow {{\mathbb {R}}}$ such that $f^{(i)}(0)=f^{(i)}(1)$ for all $0\le i \le \mathbf{k}$. Note that the class (49) is an ellipsoid, i.e.

$$\begin{aligned} \mathcal{W}_{{\mathbf{r},\mathbf{k}}}= \left\{ f=\sum _{{j\ge 1}}\theta _{{j}}\phi _{{j}} \,:\, \sum _{{j=1}}^{\infty }\,a_{{j}}\,\theta ^2_{{j}}\,\le \mathbf{r}\right\} , \end{aligned}$$

(50)

where $a_{{j}}=\sum ^{\mathbf{k}}_{{i=0}}\left( 2\pi [j/2]\right) ^{2i}$ and $\theta _{{j}}=(f,\phi _{{j}})=\int ^{1}_{{0}}f(t)\phi _{{j}}(t)\mathrm {d}t$. Similarly to Barbu et al. (2019a) we will show here that the asymptotic sharp lower bound for the normalized robust risk (5) is given by the well-known Pinsker constant defined as

$$\begin{aligned} \mathbf{l}_{{*}}=\mathbf{l}_{{*}}(\mathbf{r})= \, \left( (2\mathbf{k}+1)\mathbf{r}\right) ^{1/(2\mathbf{k}+1)}\, \left( \frac{\mathbf{k}}{(\mathbf{k}+1)\pi } \right) ^{2\mathbf{k}/(2\mathbf{k}+1)}\,. \end{aligned}$$

(51)
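For numerical comparisons it can be convenient to evaluate the constant (51) directly. The following is a minimal sketch; the function name `pinsker_constant` is a hypothetical name chosen here, and the arguments `r` and `k` stand for $\mathbf{r}$ and $\mathbf{k}$.

```python
import math


def pinsker_constant(r, k):
    """Evaluate l_*(r) from (51) for radius r > 0 and regularity k >= 1."""
    first = ((2 * k + 1) * r) ** (1.0 / (2 * k + 1))
    second = (k / ((k + 1) * math.pi)) ** (2.0 * k / (2 * k + 1))
    return first * second
```

For $\mathbf{k}=1$ and $\mathbf{r}=1$ the two factors combine to $(3/(4\pi ^2))^{1/3}$, which gives a simple check of the implementation.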

To study efficiency properties we need to use the set $\Xi _{{T}}$ of all possible estimators $\widehat{S}_{{T}}$ measurable with respect to the sigma-algebra $\sigma \{y_{{t}}\,,\,0\le t\le T\}$.

Theorem 4

For the risk (5) with the coefficient rate $\upsilon _{{T}}=T/\varsigma ^{*}$

$$\begin{aligned} \liminf _{{T\rightarrow \infty }}\, \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}} \inf _{{\widehat{S}_{{T}}\in \Xi _{{T}}}}\,\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}} \,\mathcal{R}^{*}_{{T}}(\widehat{S}_{{T}},S) \ge \mathbf{l}_{{*}}\,. \end{aligned}$$

(52)

Note that, if the radius $\mathbf{r}$ and the regularity $\mathbf{k}$ are known, i.e. for the nonadaptive estimation problem on the continuous observations $(y_{{t}})_{{0\le t\le T}}$, in Barbu et al. (2019a) it is proposed to use the estimate $\widehat{S}_{{\lambda _{{0}}}}$ defined in (31) with the weights (48)

$$\begin{aligned} \lambda _{{0}}=\lambda _{{\alpha _{{0}}}} \,,\quad \alpha _{{0}}=(\mathbf{k},r_{{0}}) \quad \text{ and }\quad r_{{0}}=[\mathbf{r}/\varepsilon ]\varepsilon \,. \end{aligned}$$

(53)

Now, we show the same result for the discrete observations (2).

Proposition 2

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{2}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{5}})$ hold true. Then

$$\begin{aligned} \limsup _{{T \rightarrow \infty }} \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}} \mathcal{R}^*_{{T}} (\widehat{S}_{\lambda _{{0}}},S) \le \mathbf{l}_{{*}} \,. \end{aligned}$$

(54)

For the adaptive estimation we use the model selection procedure (39) with the parameter $\delta $ defined as a function of T, i.e. $\delta =\delta _{{T}}$, such that

$$\begin{aligned} \lim _{{T \longrightarrow \infty }}\,\delta _{{T}}=0 \quad \text{ and }\quad \lim _{{T \longrightarrow \infty }}\,T^{\mathbf{b}}\,\delta _{{T}}=+\infty \end{aligned}$$

(55)

for any $\mathbf{b}>0$. For example, we can take $\delta _{{T}}=(6+\ln T)^{-1}$.
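As a brief check of this example, offered as a sketch rather than as part of the original argument, note that $\delta _{{T}}=(6+\ln T)^{-1}$ satisfies $0<\delta _{{T}}<1/6$ for $T>1$, so that $1-3\delta _{{T}}\ge 1/2$, and the leading factor in the oracle inequalities tends to one:

$$\begin{aligned} \frac{1+3\delta _{{T}}}{1-3\delta _{{T}}} = 1+\frac{6\delta _{{T}}}{1-3\delta _{{T}}} \le 1+12\,\delta _{{T}} \rightarrow 1 \quad \text{ and }\quad T^{\mathbf{b}}\,\delta _{{T}}=\frac{T^{\mathbf{b}}}{6+\ln T}\rightarrow +\infty \end{aligned}$$

as $T\rightarrow \infty $ for any $\mathbf{b}>0$, so both requirements in (55) hold.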

Theorem 5

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{6}})$ hold true. Then the robust risk (5) for the procedure (39) with the coefficients (48) and the parameter $\delta =\delta _{{T}}$ satisfying (55) has the following upper bound

$$\begin{aligned} \limsup _{{T\rightarrow \infty }}\, \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \mathcal{R}^{*}_{{T}}(\widehat{S}_{{*}},S) \le \mathbf{l}_{{*}} \,. \end{aligned}$$

Theorems 4 and 5 imply the following result.

Theorem 6

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{6}})$ hold true. Then the procedure (39) with the weight coefficients (48) and the parameter $\delta =\delta _{{T}}$ satisfying (55) is asymptotically efficient, i.e.

$$\begin{aligned} \lim _{{T\rightarrow \infty }}\, \frac{ \inf _{{\widehat{S}_{{T}}\in \Xi _{{T}}}}\,\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}} \,\mathcal{R}^{*}_{{T}}(\widehat{S}_{{T}},S)}{\sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \mathcal{R}^{*}_{{T}}(\widehat{S}_{{*}},S)}=1 \end{aligned}$$

and

$$\begin{aligned} \lim _{{T\rightarrow \infty }}\, \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}} \,\mathcal{R}^{*}_{{T}}(\widehat{S}_{{*}},S) = \mathbf{l}_{{*}}\,. \end{aligned}$$

Remark 6

It is well known that the optimal (minimax) risk convergence rate for the Sobolev ball $\mathcal{W}_{{\mathbf{r},\mathbf{k}}}$ is $T^{2\mathbf{k}/(2\mathbf{k}+1)}$ (see, for example, Pinsker, 1981; Konev and Pergamenshchikov, 2009b). We see here that the efficient robust rate is $\upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}$, i.e. if the distribution upper bound $\varsigma ^{*}\rightarrow 0$ as $T\rightarrow \infty $ we obtain a faster rate with respect to $T^{2\mathbf{k}/(2\mathbf{k}+1)}$, and if $\varsigma ^{*}\rightarrow \infty $ as $T\rightarrow \infty $ we obtain a slower rate. In the case when $\varsigma ^{*}$ is constant the robust rate is the same as the classical non robust convergence rate.

5.3 Big data analysis for the model (1)

Now we consider the estimation problem for the parameters $(\beta _{{j}})_{{1\le j\le q}}$ in (3) with unknown q. In this case we have to estimate the sequence $\beta =(\beta _{{j}})_{{j\ge 1}}$ in which $\beta _{{j}}=0$ for $j\ge q+1$. To this end we assume that the functions $(\mathbf{u}_{{j}})_{{j\ge 1}}$ are orthonormal in $\mathcal{L}_{{2}}[0,1]$, i.e. $(\mathbf{u}_{{i}},\mathbf{u}_{{j}})=\mathbf{1}_{{\{i= j\}}}$.

Indeed, we can always use the Gram–Schmidt orthogonalization procedure to ensure this property. Thus, in this case we estimate the parameters $\beta =(\beta _{{j}})_{{j\ge 1}}$ through the estimator (39) as $\widehat{\beta }_{{*}}=(\widehat{\beta }_{{*,j}})_{{j\ge 1}}$ and $\widehat{\beta }_{{*,j}}=(\mathbf{u}_{{j}},\widehat{S}_{{*}})$. Similarly, using the weighted estimators (31) we define the basic estimators $(\widehat{\beta }_{{\lambda }})_{{\lambda \in \varLambda }}$ as $\widehat{\beta }_{{\lambda }}=(\widehat{\beta }_{{j,\lambda }})_{{j\ge 1}}$ and $ \widehat{\beta }_{{j,\lambda }}=(\mathbf{u}_{{j}},\widehat{S}_{{\lambda }})$. Taking into account that in this case

$ \vert \widehat{\beta }_{{*}}-\beta \vert ^{2}=\sum ^{\infty }_{{j=1}}\,(\widehat{\beta }_{{*,j}}-\beta _{{j}})^{2} =\Vert \widehat{S}_{{*}}- S\Vert ^{2}$ and $ \vert \widehat{\beta }_{{\lambda }}-\beta \vert ^{2}=\Vert \widehat{S}_{{\lambda }}- S\Vert ^{2}$, Theorem 3 implies the following oracle inequality.

Theorem 7

Assume that the function (3) is continuously differentiable and that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$, $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{5}})$ and (15)–(16) hold true. Then, for any $T\ge \, 4 $ and $ 0<\delta <1/6,$ the following oracle inequality holds true

$$\begin{aligned} \sup _{{Q\in \mathcal{Q}_{{T}}}}\,\mathbf{E}_{{Q,S}}\vert \widehat{\beta }_{{*}}-\beta \vert ^{2} \le \frac{1+3\delta }{1-3\delta } \min _{{\lambda \in \varLambda }} \sup _{{Q\in \mathcal{Q}_{{T}}}}\,\mathbf{E}_{{Q,S}}\vert \widehat{\beta }_{{\lambda }}-\beta \vert ^{2} + \frac{\mathbf{U}^{*}_{{T}}}{T\delta } \,, \end{aligned}$$

where the term $\mathbf{U}^{*}_{{T}}>0$ satisfies the property (44).

Moreover, Theorem 6 implies the following efficiency property.

Theorem 8

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{6}})$ hold true. Then the estimator $\widehat{\beta }_{{*}}$ constructed through the procedure (39) with the weight coefficients (48) and the parameter $\delta =\delta _{{T}}$ satisfying (55) is asymptotically efficient in the minimax sense, i.e.

$$\begin{aligned} \lim _{{T\rightarrow \infty }}\, \frac{ \inf _{{\widehat{\beta }_{{T}}}}\,\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}} \, \sup _{{Q\in \mathcal{Q}_{{T}}}}\,\mathbf{E}_{{Q,S}}\vert \widehat{\beta }_{{T}}-\beta \vert ^{2}}{ \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \sup _{{Q\in \mathcal{Q}_{{T}}}}\,\mathbf{E}_{{Q,S}}\vert \widehat{\beta }_{{*}}-\beta \vert ^{2}}=1 \end{aligned}$$

(56)

and

$$\begin{aligned} \lim _{{T\rightarrow \infty }}\, \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}} \sup _{{Q\in \mathcal{Q}_{{T}}}}\,\mathbf{E}_{{Q,S}}\vert \widehat{\beta }_{{*}}-\beta \vert ^{2} = \mathbf{l}_{{*}}\,, \end{aligned}$$

where the infimum is taken over all possible estimators $\widehat{\beta }_{{T}}$ measurable with respect to the sigma-field $\sigma \{y_{{t}}\,,\,0\le t\le T\}$ and the lower bound $\mathbf{l}_{{*}}$ is defined in (51).

Remark 7

It should be emphasized that the efficiency properties (56) are obtained without sparsity conditions on the number of nonzero parameters $\beta _{{j}}$ in the model (1) (see, for example, Hastie et al., 2008). Moreover, we do not even use the parameter dimension q, which can be equal to $+\infty $.
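The orthonormalization step invoked in this subsection, and the passage from an estimate of S to estimates of the coefficients $\beta _{{j}}$, can be sketched numerically as follows. This is an illustration under assumptions that are not part of the paper: functions are represented by their values at the midpoints of n equal cells of $[0,1]$, the inner product of $\mathcal{L}_{{2}}[0,1]$ is replaced by the midpoint rule, and the names `inner`, `gram_schmidt` and `coefficients` are hypothetical.

```python
import math


def inner(f_vals, g_vals):
    """Midpoint-rule approximation of the L2[0,1] inner product of two sampled functions."""
    n = len(f_vals)
    return sum(a * b for a, b in zip(f_vals, g_vals)) / n


def gram_schmidt(funcs, n=1000):
    """Orthonormalize the given functions on a midpoint grid (modified Gram-Schmidt).

    Returns the grid and a list of sampled functions that are orthonormal
    with respect to the discrete inner product `inner`.
    """
    grid = [(i + 0.5) / n for i in range(n)]
    basis = []
    for u in funcs:
        v = [u(t) for t in grid]
        for e in basis:
            c = inner(v, e)                      # component of v along e
            v = [a - c * b for a, b in zip(v, e)]
        norm = math.sqrt(inner(v, v))
        if norm < 1e-12:
            raise ValueError("functions are numerically linearly dependent")
        basis.append([a / norm for a in v])
    return grid, basis


def coefficients(s_vals, basis):
    """Coefficients (u_j, S) of a sampled function S with respect to the orthonormal basis."""
    return [inner(e, s_vals) for e in basis]
```

Applied to sampled values of an estimate such as $\widehat{S}_{{*}}$, `coefficients` returns the discrete analogue of $\widehat{\beta }_{{*,j}}=(\mathbf{u}_{{j}},\widehat{S}_{{*}})$. For a function in the span of the basis, the sum of the squared coefficients equals the squared discrete norm, which mirrors the identity between $\vert \widehat{\beta }_{{*}}-\beta \vert ^{2}$ and $\Vert \widehat{S}_{{*}}- S\Vert ^{2}$ used above.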

6 Stochastic calculus for generalized semi-Markov processes

In this section we study some properties of the stochastic integrals (22). First, note that using the conditions $(\mathbf{C}_{{1}})$ and $(\mathbf{C}_{{2}})$ and the stochastic calculus developed in Barbu et al. (2019a) for semi-Markov processes we can show the following Lemmas 1 and 2.

Lemma 1

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{2}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true. Then, for any nonrandom functions f and h from $\mathcal{L}_{{2}}[0,T]$

$$\begin{aligned} \mathbf{E}_{{Q}}\,I_{{t}}(f) I_{{t}}(h) = (\varrho ^{2}_{{1}}+\varrho ^{2}_{{2}}) \, (f,h)_{{t}}\, + \varrho _{{3}}^2 \, (f,h\rho )_{{t}} \,, \end{aligned}$$

(57)

where $(f,h)_{{t}}=\int _{{0}}^{t} f(s)\,h(s) \mathrm {d}s$ and $\rho $ is the density of the renewal measure (11).

It should be noted that this lemma implies directly that the stochastic integral (22) satisfies the properties (23).

Lemma 2

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{2}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true. Then, for any bounded $[0,\infty ) \rightarrow {{\mathbb {R}}}$ functions f and h and for any $k\ge 1,$

$$\begin{aligned} \mathbf{E}_{{Q}} \left( I_{{\mathbf{t}_{{k}}-}} (f)\,I_{{\mathbf{t}_{{k}}-}} (h) \mid \mathcal{G}\right) = (\varrho ^{2}_{{1}}+\varrho ^{2}_{{2}}) (f\,,\,h)_{{\mathbf{t}_{{k}}}}+ \varrho _{{3}}^2 \sum _{{l=1}}^{k-1}\, f(\mathbf{t}_{{l}})\,h(\mathbf{t}_{{l}}), \end{aligned}$$

where $\mathbf{t}_{{k}}=\sum ^{k}_{{j=1}}\tau _{{j}}$ and $\mathcal{G}=\sigma \{\mathbf{t}_{{l}}\,,\,l\ge 1\}$.

Lemma 3

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true. Then, for any nonrandom bounded $[0,T]\rightarrow {{\mathbb {R}}}$ functions f and h, the expectation $ \mathbf{E}_{{Q}} \int _{{0}}^{T} I^2_{{t-}}(f) I_{{t-}}(h) h(t) \mathrm {d}\xi _{{t}} = 0$.

Proof

Setting $ \check{L}_{{t}} =\varrho _{1} w_{{t}}+ \varrho _{2} L_{{t}}$ , we can represent the integral (22) as

$$\begin{aligned} I_{{t}}(f)=\check{I}_{{t}}(f)+\varrho _{{3}}I^{z}_{{t}}(f) \,, \end{aligned}$$

(58)

where $\check{I}_{{t}}(f)=\int ^{t}_{{0}} f(u)\mathrm {d}\check{L}_{{u}}$ and $I^{z}_{{t}}(f)=\int ^{t}_{{0}} f(u)\mathrm {d}z_{{u}}$. Note that using the condition (7) and the inequality for martingales from Novikov (1975) we can obtain that $\mathbf{E}_{{Q}}\,\sup _{{0\le t\le T}}\,\check{I}^{8}_{{t}}(f) <\infty $. Since $\check{L}_{{t}}$ and $z_{{t}}$ are independent, we get $\mathbf{E}_{{Q}} \int _{{0}}^{T} I^2_{{t-}}(f) I_{{t-}}(h) h(t) d\check{L}_{{t}} = 0$. Moreover, the conditions $(\mathbf{C}_{{1}})$ – $(\mathbf{C}_{{3}})$ yield that, for any nonrandom $(c_{{i,j}})$ and $k\ge 1,$ $\mathbf{E}\,\left( \sum ^{k-1}_{{j=1}}c_{{1,j}} \zeta _{{j}}\right) ^{2} \zeta _{{k}}=0$ and $\mathbf{E}\,\left( \sum ^{k-1}_{{j=1}}c_{{1,j}} \zeta _{{j}}\right) ^{2}\, \left( \sum ^{k-1}_{{j=1}}c_{{2,j}} \zeta _{{j}}\right) \zeta _{{k}}=0$. Therefore, taking into account that the sequence $(\zeta _{{k}})_{{k\ge 1}}$ does not depend on the moments $(\mathbf{t}_{{k}})_{{k\ge 1}}$ and the process $(\check{L}_{{t}})_{{t\ge 0}}$, and using the same method as in the proof of Lemma 8.4 from Barbu et al. (2019a) we obtain

$$\begin{aligned} \mathbf{E}_{{Q}} \int _{{0}}^{T}\, I^2_{{t-}}(f) I_{{t-}}(h) h(t) \mathrm {d}z_{{t}} =\mathbf{E}_{{Q}}\, \sum _{{k\ge 1}}\,\mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}}\, I^2_{{\mathbf{t}_{{k}}-}}(f) I_{{\mathbf{t}_{{k}}-}}(h) h(\mathbf{t}_{{k}})\zeta _{{k}} = 0\,. \end{aligned}$$

This implies the desired result. $\square $

Now we study the integrals $\widetilde{I}_{{T}}(f)= I^2_{{T}}(f)- \mathbf{E}_{{Q}} I_{{T}}^2(f)$ as functions of f.

Proposition 3

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{3}})$ and $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{4}})$ hold true. Then, for any $[0,\infty ) \rightarrow {{\mathbb {R}}}$ functions f, h such that $ \vert f \vert _{{*}} \le 1$ and $ \vert h \vert _{{*}} \le 1$, one has

$$\begin{aligned} \vert \mathbf{E}_{{Q}} \widetilde{I}_{{T}}(f) \widetilde{I}_{{T}}(h) \vert \le \sigma ^{2}_{{Q}} \left( 4\,(f,h)^{2}_{{T}} +3\,T\widetilde{\mathbf{c}} \right) \,, \end{aligned}$$

(59)

where $ \widetilde{\mathbf{c}}=4\overline{\tau } \Vert \varUpsilon \Vert _{{1}} \left( 2+ \overline{\tau }\vert \rho \vert _{{*}}+\Vert \varUpsilon \Vert _{{1}}\right) +\varPi (x^{4})+\zeta _{{*}}\vert \rho \vert _{{*}}$, $\zeta _{{*}}=\sup _{{j\ge 1}} \mathbf{E}\zeta ^{4}_{{j}}$ and $ \vert f \vert _{{*}}=\sup _{{t\ge 0}}\vert f(t)\vert $.

Proof

First of all, note that in view of the Ito formula and using the fact that for the process (6) the jumps $\varDelta z_{{s}} \varDelta L_{{s}}=0$ a.s. for any $s\ge 0$, we obtain that

$$\begin{aligned} \mathrm {d}I^2_{{t}}(f)&= 2 I_{{t-}}(f) \mathrm {d}I_{{t}}(f)+ \varrho _{{1}}^2\, f^2(t) \mathrm {d}\,t\\&+ \varrho ^{2}_{{2}}\mathrm {d}\sum _{{0\le s \le t}}f^2(s) (\varDelta L_{{s}})^2 + \varrho ^{2}_{{3}}\mathrm {d}\sum _{{0\le s \le t}}f^2(s) (\varDelta z_{{s}})^2 \,. \end{aligned}$$

Note also that Lemma 1 yields $ \mathbf{E}_{{Q}} I^2_{{t}}(f) = (\varrho ^{2}_{{1}}+\varrho ^{2}_{{2}})\Vert f\Vert ^{2}_{{t}} +\varrho ^{2}_{{3}} \Vert f\sqrt{\rho }\Vert ^2_{{t}}$ with $ \Vert f\Vert ^{2}_{{t}}=\int ^{t}_{{0}}\,f^{2}(s)\mathrm {d}s$. Therefore,

$$\begin{aligned} \mathrm {d}\widetilde{I}_{{t}}(f)= 2 I_{{t-}}(f) f(t) d\xi _{{t}} + f^2(t) \mathrm {d}\widetilde{m}_{{t}}\,,\quad \widetilde{m}_{{t}}=\varrho ^{2}_{{2}}\check{m}_{{t}}+\varrho ^{2}_{{3}}m_{{t}}\,, \end{aligned}$$

where $ \check{m}_{{t}}= \sum _{{0\le s \le t}}(\varDelta L_{{s}})^2 - t$ and $ m_{{t}} = \sum _{{0\le s \le t}}(\varDelta z_{{s}})^2 - \int _{{0}}^{t} \rho (s) \mathrm {d}s$. Thus,

$$\begin{aligned} \mathbf{E}_{{Q}} \widetilde{I}_{{T}}(f) \widetilde{I}_{{T}}(h) = \mathbf{E}_{{Q}} \int ^{T}_{{0}} \widetilde{I}_{{t-}}(f) \mathrm {d}\widetilde{I}_{{t}}(h) + \mathbf{E}_{{Q}} \int ^{T}_{{0}} \widetilde{I}_{{t-}}(h) \mathrm {d}\widetilde{I}_{{t}}(f) + \mathbf{E}_{{Q}}\,[ \widetilde{I} (f),\widetilde{I} (h)\,]_{{T}}\,. \end{aligned}$$

Using here Lemma 3 and, taking into account that $(\check{m}_{{t}})_{{t\ge 0}}$ is a square integrable martingale, we get

$$\begin{aligned} \mathbf{E}_{{Q}} \int ^{T}_0 \widetilde{I}_{{t-}}(f) \mathrm {d}\widetilde{I}_{{t}}(h) =\mathbf{E}_{{Q}} \int ^{T}_0 \widetilde{I}_{{t-}}(f) h^2(t) \mathrm {d}\widetilde{m}_{{t}} = \varrho _{{3}}^2\mathbf{E}_{{Q}} \int ^{T}_0 \,I^{2}_{{t-}}(f) h^2(t) \mathrm {d}m_{{t}}\,. \end{aligned}$$

The last integral can be represented as

$$\begin{aligned} \mathbf{E}_{{Q}}\int ^{T}_0 \,I^{2}_{{t-}}(f) h^2(t) \mathrm {d}m_{{t}} = J_{{1}}- J_{{2}} \,, \end{aligned}$$

(60)

where $J_{{1}}=\mathbf{E}_{{Q}}\sum _{{k\ge 1}}\,I^{2}_{{\mathbf{t}_{{k}}-}}(f) h^{2}(\mathbf{t}_{{k}})\mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}}$ and $J_{{2}}=\int ^{T}_0 \,\mathbf{E}_{{Q}}\,I^{2}_{{t}}(f) h^{2}(t)\rho (t) \mathrm {d}t$. By Lemma 2 we get

$$\begin{aligned} J_{{1}}=\mathbf{E}_{{Q}}\sum _{{k\ge 1}}\,\mathbf{E}_{{Q}}\left( I^{2}_{{\mathbf{t}_{{k}}-}}(f)\vert \mathcal{G}\right) h^{2}(\mathbf{t}_{{k}})\mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}} = (\varrho _{{1}}^2+\varrho _{{2}}^2) J_{{1,1}} + \varrho _{{3}}^2J_{{1,2}}\,, \end{aligned}$$

where $ J_{{1,1}}=\mathbf{E}_{{Q}}\sum _{{k\ge 1}}\, \Vert f\Vert ^{2}_{{\mathbf{t}_{{k}}}} h^{2}(\mathbf{t}_{{k}}) \mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}} =\int ^{T}_{{0}}\,\Vert f\Vert ^{2}_{{t}} h^{2}(t)\rho (t)\mathrm {d}t$ and

$$\begin{aligned} J_{{1,2}}&= \mathbf{E}_{{Q}}\sum _{{k\ge 1}}\, \sum _{{l=1}}^{k-1}\, f^{2}(\mathbf{t}_{{l}}) \, h^{2}(\mathbf{t}_{{k}}) \mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}}= \mathbf{E}_{{Q}}\sum _{{l\ge 1}} f^{2}(\mathbf{t}_{{l}}) \sum _{{k\ge l+1}} \, h^{2}(\mathbf{t}_{{k}}) \mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}} \\&=\int ^{T}_{{0}}\,f^{2}(x) \left( \int ^{T-x}_{{0}}\,h^{2}(x+t)\rho (t)\mathrm {d}t \right) \rho (x)\mathrm {d}x\,. \end{aligned}$$

Moreover, using Lemma 1 for the last term in (60), we obtain that

$$\begin{aligned} J_{{2}}=(\varrho ^{2}_{{1}}+\varrho ^{2}_{{2}})\int ^{T}_{{0}}\Vert f\Vert ^{2}_{{t}}h^{2}(t)\rho (t)\mathrm {d}t + \varrho ^{2}_{{3}}\int ^{T}_{{0}}\Vert f \sqrt{\rho }\Vert ^{2}_{{t}}h^{2}(t)\rho (t)\mathrm {d}t \end{aligned}$$

and we can represent the expectation in (60) as

$$\begin{aligned} \mathbf{E}_{{Q}}\int ^{T}_0 \,I^{2}_{{t-}}(f) h^{2}(t) \mathrm {d}m_{{t}} =\varrho ^{2}_{{3}}\int ^{T}_{{0}}f^{2}(x) \left( \int ^{T}_{{x}} h^{2}(t)(\varUpsilon (t-x)-\varUpsilon (t))\mathrm {d}t \right) \rho (x)\mathrm {d}x\,. \end{aligned}$$

Therefore, $ \vert \mathbf{E}_{{Q}}\int ^{T}_0 \,I^{2}_{{t-}}(f) h^{2}(t) \mathrm {d}m_{{t}}\vert \le 2 \varrho ^{2}_{{3}}T\Vert \varUpsilon \Vert _{{1}}\, \vert \rho \vert _{{*}}$ and

$$\begin{aligned} \vert \mathbf{E}_{{Q}} \int ^{T}_0 \widetilde{I}_{{t-}}(f) \mathrm {d}\widetilde{I}_{{t}}(h)\vert \le 2 \varrho ^{4}_{{3}}T\Vert \varUpsilon \Vert _{{1}} \, \vert \rho \vert _{{*}} \,. \end{aligned}$$

(61)

Furthermore, note that

$$\begin{aligned}{}[ \widetilde{I} (f),\widetilde{I} (h) ]_{{T}} = < \widetilde{I}^c (f),\widetilde{I}^c (h)>_{{T}} + \mathbf{D}_{{T}}(f,h)\,, \end{aligned}$$

where $ \widetilde{I}^c_{{t}} (f)=2\varrho _{{1}} \int ^{t}_{{0}}\,I_{{s}}(f) f(s) \mathrm {d}w_{{s}}$ and $\mathbf{D}_{{T}}(f,h)= \sum _{0\le t \le T} \varDelta \widetilde{I}^{d}_{{t}}(f) \varDelta \widetilde{I}^{d}_{{t}}(h)$. In this case $ \widetilde{I}^{d}_{{t}} (f)=2 \int ^{t}_{{0}}\,I_{{s-}}(f) f(s) \mathrm {d}\xi ^{d}_{{s}} + \int ^{t}_{{0}}\,f^2(s) \mathrm {d}\widetilde{m}_{{s}}$ and $\xi ^{d}_{{t}}=\varrho _{{2}}L_{{t}}+\varrho _{{3}}z_{{t}}$. Therefore, in view of Lemma 1,

$$\begin{aligned} \mathbf{E}_{{Q}} < \widetilde{I}^c (f),&\widetilde{I}^c (h) >_{{T}}= 4 \varrho _1^2 \int ^{T}_0 \mathbf{E}_{{Q}} ( I_{{t}}(f) I_{{t}}(h)) f(t) h(t) \mathrm {d}t \\&= 4 \varrho _1^2(\varrho _1^2+\varrho _2^2)\int ^{T}_0 (f,h)_{{t}}\,f(t) h(t) \mathrm {d}t + 4 \varrho _1^2 \varrho _3^2 \int ^{T}_0 (f,h\rho )_{{t}} f(t) h(t) \mathrm {d}t\\&= 2\varrho _1^2 \sigma _{{Q}}\, (f,h)^{2}_{{T}} + 4 \varrho _1^2 \varrho _3^2\int ^{T}_0 (f,h\varUpsilon )_{{t}} f(t) h(t) \mathrm {d}t \,. \end{aligned}$$

Since $\vert f \vert _{{*}} \le 1$ and $ \vert h \vert _{{*}} \le 1,$ we get $\int ^{T}_{{0}} \vert (f,h\varUpsilon )_{{t}} f(t) h(t)\vert \mathrm {d}t\le T \Vert \varUpsilon \Vert _{{1}}$ and

$$\begin{aligned} \left| \mathbf{E}_{{Q}}<\widetilde{I}^c (f),\widetilde{I}^c (h)>_{{T}} \right| \le \sigma ^{2}_{{Q}} \left( 2 (f,h)^{2}_{{T}} + 4 T \overline{\tau } \Vert \varUpsilon \Vert _{{1}} \right) \,. \end{aligned}$$

(62)

To study the process $\mathbf{D}_{{T}}(f,h)$ note that $ \varDelta \xi ^{d}_{{t}} \varDelta \widetilde{m}_{{t}} = \varrho ^{3}_{{2}} (\varDelta L_{{t}})^{3} + \varrho ^{3}_{{3}} (\varDelta z_{{t}})^{3}$. Note also that for any $t\ge 0$ the expectation $\mathbf{E}_{{Q}}I_{{t}}(f)=0$. Therefore, using the definition of the process $L_{{t}},$ we obtain through the Fubini theorem that, for any bounded measurable nonrandom function $V: [0,T]\rightarrow {{\mathbb {R}}},$ we have

$$\begin{aligned} \mathbf{E}_{{Q}}\,\sum _{{0\le t\le T}}\,V(t)\,I_{{t-}}(f) (\varDelta L_{{t}})^{3}=\varPi (x^{3}) \int ^{T}_{{0}}\,V(t)\mathbf{E}_{{Q}}\,I_{{t}}(f)\mathrm {d}t=0\,. \end{aligned}$$

Moreover, since the processes $(\check{L}_{{t}})_{{t\ge 0}}$ and $(z_{{t}})_{{t\ge 0}}$ are independent we get

$$\begin{aligned} \mathbf{E}_{{Q}}\,\sum _{{0\le t\le T}}\,V(t)\check{I}_{{t-}}(f) (\varDelta z_{{t}})^{3}= \mathbf{E}_{{Q}}\,\sum _{{k\ge 1}}\,V(\mathbf{t}_{{ k}}) \zeta ^{3}_{{k}} \mathbf{E}_{{Q}}\left( \check{I}_{{\mathbf{t}_{{k}}-}}(f) \vert \mathcal{G}^{z}\right) =0\,, \end{aligned}$$

where the integral $\check{I}_{{t}}(f)$ is defined in (58) and $\mathcal{G}^{z}=\sigma \{z_{{t}}\,,\,t\ge 0\}$. Note that the condition $(\mathbf{C}_{{3}})$ implies that, for any $k\ge 1$ and nonrandom $(c_{{j}})_{{j\ge 1}}$, $ \mathbf{E}_{{Q}}\left( \sum ^{ k-1}_{{j=1}}c_{{j}} \zeta _{{j}}\right) \,\zeta ^{3}_{{k}} =0$. Therefore, $ \mathbf{E}_{{Q}}\,\sum _{{0\le t\le T}}\,V(t)I^{z}_{{t-}}(f) (\varDelta z_{{t}})^{3}=0$ and

$$\begin{aligned} \mathbf{E}_{{Q}}\,\sum _{{0\le t\le T}}\,I_{{t-}}(f) f(t) h^{2}(t) \varDelta \xi ^{d}_{{t}}\,\varDelta \widetilde{m}_{{t}} = \mathbf{E}_{{Q}}\,\sum _{{0\le t\le T}}\,I_{{t-}}(h) h(t) f^{2}(t) \varDelta \xi ^{d}_{{t}}\,\varDelta \widetilde{m}_{{t}} =0 \,. \end{aligned}$$

So, the expectation of $\mathbf{D}_{{T}}(f,h)$ can be represented as

$$\begin{aligned} \mathbf{E}_{{Q}}\,\mathbf{D}_{{T}}(f,h) =4\varrho ^{2}_{{2}}\mathbf{E}_{{Q}}\,\mathbf{D}_{{1,T}}(f,h) + 4\varrho ^{2}_{{3}}\mathbf{E}_{{Q}}\,\mathbf{D}_{{2,T}}(f,h) + \mathbf{E}_{{Q}}\, \mathbf{D}_{{3,T}}(f,h) \,, \end{aligned}$$

where $\mathbf{D}_{{1,T}}(f,h)=\sum _{{0\le t\le T}}\, I_{{t-}}(f)I_{{t-}}(h) f(t)h(t)(\varDelta L_{{t}})^{2}$,

$$\begin{aligned} \mathbf{D}_{{2,T}}(f,h)=\sum _{{0\le t\le T}}\, I_{{t-}}(f)I_{{t-}}(h) f(t)h(t)(\varDelta z_{{t}})^{2} \end{aligned}$$

and $ \mathbf{D}_{{3,T}}(f,h)=\sum _{{0\le t\le T}}\,f^2(t)\,h^{2}(t) (\varDelta \widetilde{m}_{{t}})^{2}$. First, since $\varPi (x^{2})=1$, we get

$$\begin{aligned} \mathbf{E}_{{Q}}\,&\mathbf{D}_{{1,T}}(f,h)=\int ^{T}_{{0}}\,f(t)h(t)\mathbf{E}_{{Q}}\, I_{{t}}(f)I_{{t}}(h) \,\mathrm {d}t= (\varrho ^{2}_{{1}}+\varrho ^{2}_{{2}})\,\int ^{T}_{{0}}\,f(t) h(t)\,(f,h)_{{t}} \,\mathrm {d}t\\&+\varrho ^{2}_{{3}} \,\int ^{T}_{{0}}\,f(t)h(t)\,(f,h\rho )_{{t}} \,\mathrm {d}t =\sigma _{{Q}}\frac{(f,h)^{2}_{{T}}}{2} +\varrho ^{2}_{{3}} \,\int ^{T}_{{0}}\,f(t)h(t)\,(f,h\varUpsilon )_{{t}} \,\mathrm {d}t \end{aligned}$$

and $\vert \mathbf{E}_{{Q}}\,\mathbf{D}_{{1,T}}(f,h)\vert \le \sigma _{{Q}} \left( (f,h)^{2}_{{T}}/2 +T \overline{\tau }\Vert \varUpsilon \Vert _{{1}} \right) $. Then, taking into account that $\mathbf{E}\,\zeta ^{2}_{{j}}=1$ and using Lemma 2, we find that

$$\begin{aligned} \mathbf{E}_{{Q}}\,&\mathbf{D}_{{2,T}}(f,h)=\mathbf{E}\sum _{{k\ge 1}}\, \mathbf{E}_{{Q}}\left( I_{{\mathbf{t}_{{k}}-}}(f)I_{{\mathbf{t}_{{k}}-}}(h) \vert \mathcal{G}\right) f(\mathbf{t}_{{k}})h(\mathbf{t}_{{k}}) \,\mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}}\\&=(\varrho ^{2}_{{1}}+\varrho ^{2}_{{2}})\mathbf{E}_{{Q}} \sum _{{k\ge 1}}(f\,,h)_{{\mathbf{t}_{{k}}}}f(\mathbf{t}_{{k}})h(\mathbf{t}_{{k}}) \,\mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}} + \varrho _{{3}}^2 \,\mathbf{E}_{{Q}}\,\mathbf{D}^{'}_{{2,T}}(f,h)\\&= (\varrho ^{2}_{{1}}+\varrho ^{2}_{{2}}) \int ^{T}_{{0}}(f,h)_{{t}}\,f(t)h(t)\rho (t)\mathrm {d}t + \varrho _{{3}}^2\mathbf{E}_{{Q}}\,\mathbf{D}^{'}_{{2,T}}(f,h) \,, \end{aligned}$$

where $ \mathbf{D}^{'}_{{2,T}}(f,h)= \sum _{{k\ge 1}} \sum _{{l=1}}^{k-1}\, f(\mathbf{t}_{{l}})\,h(\mathbf{t}_{{l}}) f(\mathbf{t}_{{k}})h(\mathbf{t}_{{k}}) \,\mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}}$. Note that

$$\begin{aligned} \int ^{T}_{{0}}(f,h)_{{t}}\,f(t)h(t) \rho (t)\mathrm {d}t=\frac{1}{2\overline{\tau }}(f,h)^{2}_{{T}} + \int ^{T}_{{0}}(f,h)_{{t}}\,f(t)h(t)\varUpsilon (t)\mathrm {d}t \,, \end{aligned}$$

i.e.

$$\begin{aligned} \left| \int ^{T}_{{0}}(f,h)_{{t}}\,f(t)h(t)\rho (t)\mathrm {d}t \right| \le \frac{1}{2\overline{\tau }}(f,h)^{2}_{{T}} +T\Vert \varUpsilon \Vert _{{1}}\,. \end{aligned}$$

Furthermore, the expectation of $\mathbf{D}^{'}_{{2,T}}(f,h)$ can be represented as

$$\begin{aligned} \mathbf{E}_{{Q}}\,\mathbf{D}^{'}_{{2,T}}(f,h)&= \mathbf{E}_{{Q}}\,\sum _{{l\ge 1}}\, f(\mathbf{t}_{{l}})\,h(\mathbf{t}_{{l}}) \sum _{{k\ge l+1}} f(\mathbf{t}_{{k}})h(\mathbf{t}_{{k}}) \,\mathbf{1}_{{\{\mathbf{t}_{{k}}\le T\}}}\\&=\int ^{T}_{{0}}f(x)h(x)\, \left( \int ^{T-x}_{{0}}\,f(x+t)h(x+t)\rho (t)\mathrm {d}t \right) \rho (x)\mathrm {d}x\\&= \frac{1}{2\overline{\tau }^{2}}(f,h)^{2}_{{T}} +\mathbf{D}^{''}_{{2,T}}(f,h) \,, \end{aligned}$$

where

$$\begin{aligned} \mathbf{D}^{''}_{{2,T}}(f,h)=&\int ^{T}_{{0}}f(x)h(x)\, \left( \int ^{T-x}_{{0}}\,f(x+t)h(x+t)\varUpsilon (t)\mathrm {d}t \right) \rho (x)\mathrm {d}x\\&+ \frac{1}{\overline{\tau }}\, \int ^{T}_{{0}}f(x)h(x)\, \left( \int ^{T-x}_{{0}}\,f(x+t)h(x+t)\varUpsilon (t)\mathrm {d}t \right) \varUpsilon (x)\mathrm {d}x \,. \end{aligned}$$

Since $T\ge 1$, we can obtain that

$$\begin{aligned} \vert \mathbf{D}^{''}_{{2,T}}(f,h)\vert \le \int ^{T}_{{0}}\rho (x)\mathrm {d}x \Vert \varUpsilon \Vert _{{1}} +\frac{1}{\overline{\tau }}\,\Vert \varUpsilon \Vert ^{2}_{{1}} \le T \left( \vert \rho \vert _{{*}} \Vert \varUpsilon \Vert _{{1}} +\frac{1}{\overline{\tau }}\Vert \varUpsilon \Vert ^{2}_{{1}} \right) \end{aligned}$$

and, therefore,

$$\begin{aligned} \vert \mathbf{E}_{{Q}}\,\mathbf{D}_{{2,T}}(f,h)\vert \le \sigma _{{Q}} \left( \frac{(f,h)^{2}_{{T}}}{2\overline{\tau }} +T\left( \Vert \varUpsilon \Vert _{{1}}(1+\overline{\tau }\vert \rho \vert _{{*}})+\Vert \varUpsilon \Vert ^{2}_{{1}}\right) \right) \,. \end{aligned}$$

Moreover, using that $\zeta _{{*}}=\sup _{{j\ge 1}} \mathbf{E}\zeta ^{4}_{{j}}<\infty $, we get directly

$$\begin{aligned} \mathbf{E}_{{Q}}\,\mathbf{D}_{{3,T}}(f,h)&\le \varrho ^{4}_{{2}} \varPi (x^{4})\int ^{T}_{{0}}f^{2}(t)\,h^{2}(t)\mathrm {d}t +\varrho ^{4}_{{3}}\,\zeta _{{*}}\, \int ^{T}_{{0}}f^{2}(t)\,h^{2}(t)\rho (t)\mathrm {d}t\\&\le T \sigma ^{2}_{{Q}}\left( \varPi (x^{4})+ \zeta _{{*}} \overline{\tau }^{2} \vert \rho \vert _{{*}} \right) \,. \end{aligned}$$

Therefore, $ \vert \mathbf{E}_{{Q}}\,\mathbf{D}_{{T}}(f,h)\vert \le \sigma ^{2}_{{Q}} \left( 2(f,h)^{2}_{{T}} +T\widetilde{\mathbf{c}} \right) $, where $\widetilde{\mathbf{c}}$ is given in (59). Now, using (62), we get $ \mathbf{E}_{{Q}}\, [ \widetilde{I} (f),\widetilde{I} (h)]_{{T}} \le \sigma ^{2}_{{Q}} \left( 4 (f,h)^{2}_{{T}} +2T\widetilde{\mathbf{c}} \right) $. This bound with (61) implies (59). Hence the proof is achieved. $\square $

In order to prove the oracle inequalities we need to study the conditions introduced in Konev and Pergamenshchikov (2012) for the general semi-martingale model (3). To this end, we set for any $x\in {{\mathbb {R}}}^{p}$ the functions

$$\begin{aligned} B_{{1,Q}}(x)= \sum _{{j=1}}^{p} x_{{j}}\, \left( \mathbf{E}_{{Q}}\xi ^2_{{j,p}} - \sigma _{{Q}}\right) \quad \text{ and }\quad B_{{2,Q}}(x) = \sum _{{j=1}}^{p}\,x_{{j}}\,\widetilde{\xi }_{{j,p}} \,, \end{aligned}$$

(63)

where $\sigma _{{Q}}$ is defined in (15) and $\widetilde{\xi }_{{j,p}}=\xi ^2_{{j,p}}- \mathbf{E}_{{Q}}\xi ^2_{{j,p}}$.

Proposition 4

Assume that the conditions $(\mathbf{C}_{{1}})$–$(\mathbf{C}_{{2}})$, $(\mathbf{H}_{{1}})$–$(\mathbf{H}_{{5}})$ hold true. Then there exists some constant $\mathbf{c}^{*}>0$ such that for any $Q\in \cup _{{k\ge 1}}\,\mathcal{Q}_{{k}}$

$$\begin{aligned} \mathbf{L}_{{1,Q}} = \sup _{{T\ge 3}}\ \sup _{{{\begin{matrix}x\in [-1,1]^{p} \\ \#(x)\le T\end{matrix}}}} \, \left| B_{{1,Q}}(x) \right| \le \mathbf{c}^{*} \, \sigma _{{Q}} \end{aligned}$$

(64)

and

$$\begin{aligned} \mathbf{L}_{{2,Q}} = \sup _{{T\ge 3}}\ \sup _{{{\begin{matrix} \vert x\vert \le 1 \\ \#(x)\le T\end{matrix}}}} \, \mathbf{E}_{{Q}} \, B^{2}_{{2,Q}}(x) \le \, \mathbf{c}^{*} \, \sigma ^{2}_{{Q}}\,, \end{aligned}$$

(65)

where $|x|^2 = \sum _{{j=1}}^{p} x^2_{{j}}$ and $\#(x)=\sum _{{j=1}}^{p} \mathbf{1}_{{\{x_{{j}}\ne 0\}}}$.

Proof

Firstly, using here Lemma 1, we obtain that

$$\begin{aligned} \mathbf{E}_{{Q}}\,\xi ^2_{{j,p}} = \varrho ^{2}_{{1}}+\varrho ^{2}_{{2}} + \frac{\varrho ^2_3}{T} \int ^{T}_{{0}}\,\psi ^2_{{j,p}}(x)\,\rho (x) \mathrm {d}\,x=\sigma _{{Q}} + \frac{\varrho ^2_3}{T} \int ^{T}_{{0}}\,\psi ^2_{{j,p}}(x)\,\varUpsilon (x) \mathrm {d}\,x \,. \end{aligned}$$

From (13) it follows that $ \left| \mathbf{E}_{{Q}}\xi ^2_{{j,p}}-\sigma _{{Q}} \right| \le 2 \varrho ^2_3 \Vert \varUpsilon \Vert _{{1}}/T$ and, therefore, taking into account that $\#(x)\le T$, we obtain the inequality (64). Next, note that

$$\begin{aligned} \mathbf{E}_{{Q}}\, \left( \sum _{j=2}^{p} x_j \widetilde{\xi }_{j,p}\right) ^2 \le \frac{1}{T^2} \sum _{j=1}^{p}\sum _{l=1}^{p} \vert x_{{j}}\vert \, \vert x_{{l}}\vert \vert \mathbf{E}_{{Q}}\,\widetilde{I}_{{T}}(\psi _{j,p}) \widetilde{I}_{{T}}(\psi _{l,p})\vert \,, \end{aligned}$$

(66)

where $\widetilde{I}_{{T}}(f)= I^2_{{T}}(f)- \mathbf{E}_{{Q}} I_{{T}}^2(f)$. Now Proposition 3 and the property (28) imply, that for some constant $\mathbf{c}^{*}>0$ and for $\vert x\vert \le 1$

$$\begin{aligned} \mathbf{E}_{{Q}}\, \left( \sum _{j=2}^{p} x_j \widetilde{\xi }_{j,p}\right) ^2 \le \mathbf{c}^{*} \left( \vert x\vert ^{2} + \frac{1}{T} \left( \sum _{{j=1}}^{p}\vert x_{{j}}\vert \right) ^{2} \right) \le \mathbf{c}^{*} \left( 1 + \frac{\#(x)}{T}\right) \,. \end{aligned}$$

Since $\#(x)\le T$, we obtain the upper bound (65). $\square $
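The last step above rests on the Cauchy–Schwarz bound $(\sum _{j}\vert x_{j}\vert )^{2}\le \#(x)\,\vert x\vert ^{2}$, so that for $\vert x\vert \le 1$ the middle expression is at most $1+\#(x)/T$. A minimal numerical sanity check of this single step is sketched below in Python, purely for illustration; the helper names and the random test vectors are assumptions introduced here and are not part of the paper.

```python
import math
import random

def support_size(x):
    """#(x): number of nonzero coordinates of x."""
    return sum(1 for v in x if v != 0.0)

def middle_and_final(x, T):
    """Return (|x|^2 + (sum_j |x_j|)^2 / T, 1 + #(x)/T) for a vector with |x| <= 1."""
    l2_sq = sum(v * v for v in x)
    l1 = sum(abs(v) for v in x)
    return l2_sq + l1 * l1 / T, 1.0 + support_size(x) / T

def random_sparse_unit_vector(p, nonzeros, rng):
    """Hypothetical test input: a p-vector with `nonzeros` nonzero entries and |x| = 1."""
    x = [0.0] * p
    for idx in rng.sample(range(p), nonzeros):
        x[idx] = rng.uniform(-1.0, 1.0) or 1.0  # avoid an exact zero entry
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

rng = random.Random(0)
T = 50
for _ in range(1000):
    x = random_sparse_unit_vector(p=200, nonzeros=rng.randint(1, T), rng=rng)
    middle, final = middle_and_final(x, T)
    # Cauchy-Schwarz: (sum |x_j|)^2 <= #(x) |x|^2, hence middle <= final.
    assert middle <= final + 1e-12
```

The check only illustrates the deterministic inequality; it says nothing about the constant $\mathbf{c}^{*}$, which comes from Proposition 3.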

7 Proofs

7.1 Proof of Theorem 1

Using the cost function given in (38), we can rewrite the empirical squared error in (34) as follows

$$\begin{aligned} \text{ Err }(\lambda ) = J_{{T}}(\lambda ) + 2 \sum _{{j=1}}^{p} \lambda (j) \vartheta _{{j,p}}+ \Vert S\Vert ^2-\delta \widehat{P}_{{T}}(\lambda ), \end{aligned}$$

(67)

where

$$\begin{aligned} \vartheta _{{j,p}}=\widetilde{\theta }_{{j,p}}-\overline{\theta }_{{j,p}} \widehat{\theta }_{{j,p}}= \frac{1}{\sqrt{T}}\overline{\theta }_{{j,p}}\xi _{{j,p}} +\frac{1}{T} \widetilde{\xi }_{{j,p}} + \frac{1}{T} \varsigma _{{j,p}} +\frac{\sigma _{{Q}} - \widehat{\sigma }_{{T}} }{T} \,, \end{aligned}$$

with $\varsigma _{{j,p}}=\mathbf{E}_{{Q}}\xi ^{2}_{{j,p}}-\sigma _{{Q}}$ and $\widetilde{\xi }_{{j,p}}=\xi ^{2}_{{j,p}}-\mathbf{E}_{{Q}}\xi ^{2}_{{j,p}}$. Setting

$$\begin{aligned} \mathbf{M}(\lambda ) = \frac{1}{\sqrt{T}}\sum _{{j=1}}^{p} \lambda (j)\,\overline{\theta }_{{j,p}} \xi _{{j,p}} \quad \text{ and }\quad L(\lambda )=\sum ^{p}_{{ j=1}}\lambda (j) \end{aligned}$$

(68)

and using the functions (63) through the penalty term (37), we rewrite (67) as

$$\begin{aligned} \text{ Err }(\lambda )&= J_{{T}}(\lambda ) + 2 \frac{\sigma _{{Q}}- \widehat{\sigma }_{{T}} }{T}\,L(\lambda )+ 2 \mathbf{M}(\lambda )+\frac{2}{T} B_{{1,Q}}(\lambda )\nonumber \\&\quad + 2 \sqrt{P_{{T}}(\lambda )} \frac{ B_{{2,Q}}(\nu (\lambda ))}{\sqrt{\sigma _{{Q}} T}} + \Vert S\Vert ^2-\delta \widehat{P}_{{T}}(\lambda ), \end{aligned}$$

(69)

where $\nu (\lambda )=\lambda /|\lambda |$. Let $\lambda _{{0}}= (\lambda _{{0}}(j))_{{1\le j\le \,p}}$ be a fixed sequence in $\varLambda $ and $\widehat{\lambda }$ be defined as in (39). Substituting $\lambda _{{0}}$ and $\widehat{\lambda }$ in (69), we obtain

$$\begin{aligned} \text{ Err }&(\widehat{\lambda })-\text{ Err }(\lambda _{{0}})= J_{{T}}(\widehat{\lambda })-J_{{T}}(\lambda _{{0}})+ 2 \frac{\sigma _{{Q}}-\widehat{\sigma }_{{T}}}{T}\,L(\varpi ) + \frac{2}{T} B_{{1,Q}}(\varpi )+2 \mathbf{M}(\varpi )\\&+ 2 \sqrt{P_{{T}}(\widehat{\lambda })} \frac{ B_{{2,Q}}(\widehat{\nu })}{\sqrt{\sigma _{{Q}} T}}-2 \sqrt{P_{{T}}(\lambda _{{0}})} \frac{ B_{{2,Q}}(\nu _{{0}})}{\sqrt{\sigma _{{Q}} T}} - \delta \widehat{P}_{{T}}(\widehat{\lambda })+\delta \widehat{P}_{{T}}(\lambda _{{0}}), \end{aligned}$$

where $\varpi = \widehat{\lambda } - \lambda _{{0}}$, $\widehat{\nu } = \nu (\widehat{\lambda })$ and $\nu _{{0}} = \nu (\lambda _{{0}})$. Now, in view of the inequality $2|ab| \le \delta a^2 + \delta ^{-1} b^2$ we get that

$$\begin{aligned} 2 \sqrt{P_{{T}}(\lambda )} \frac{| B_{{2,Q}}(\nu (\lambda ))|}{\sqrt{\sigma _{{Q}} T}} \le \, \delta P_{{T}}(\lambda ) + \frac{B^2_{{2,Q}}(\nu (\lambda ))}{\delta \sigma _{{Q}}\,T}. \end{aligned}$$

Then, taking into account that $|L(\varpi )| \le \,L(\widehat{\lambda }) + L(\lambda _{{0}}) \le 2\varLambda _{{*}}$ and using the definition (64) we get

$$\begin{aligned} \text{ Err }(\widehat{\lambda }) \le \text{ Err }(\lambda _{{0}}) +2 \mathbf{M}(\varpi ) + \frac{2 \mathbf{L}_{{1,Q}}}{T} + \frac{2 B^*_{{2,Q}}}{\delta \sigma _{{Q}} T} + \frac{4\varLambda _{{*}} |\widehat{\sigma }_{{T}} -\sigma _{{Q}}|}{T} + 2 \delta \widehat{P}_{{T}}(\lambda _{{0}})\,, \end{aligned}$$

where $B^*_{{2,Q}} = \sup _{{\lambda \in \varLambda }} B^2_{{2,Q}}(\nu (\lambda ))$. Using now the definitions (36), (37) and the inequality $|\lambda _{{0}}|^{2}=\sum ^{p}_{{j=1}} \lambda ^2_{{0}}(j)\le \varLambda _{{*}}$, we can estimate the penalty term $\widehat{P}_{{T}}(\lambda _{{0}})$ as $\widehat{P}_{{T}}(\lambda _{{0}})\le P_{{T}}(\lambda _{{0}})+\varLambda _{{*}} |\widehat{\sigma }_{{T}} -\sigma _{{Q}}|/T$. Therefore, using this in the last inequality, we get that, for $0< \delta < 1$,

$$\begin{aligned} \text{ Err }(\widehat{\lambda }) \le \text{ Err }(\lambda _{{0}}) +2 \mathbf{M}(\varpi ) + \frac{2 \mathbf{L}_{{1,Q}}}{T} + \frac{2 B^*_{{2,Q}}}{\delta \sigma _{{Q}} T} + \frac{6\varLambda _{{*}} |\widehat{\sigma }_{{T}} -\sigma _{{Q}}|}{T} + 2 \delta P_{{T}}(\lambda _{{0}})\,. \end{aligned}$$
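The four-term decomposition of $\vartheta _{{j,p}}$ used to pass from (67) to (69) is a purely algebraic identity. The Python sketch below checks it numerically under two assumptions that are not restated in this section: that (30) has the form $\widehat{\theta }_{{j,p}}=\overline{\theta }_{{j,p}}+\xi _{{j,p}}/\sqrt{T}$, and that $\widetilde{\theta }_{{j,p}}=\widehat{\theta }^{2}_{{j,p}}-\widehat{\sigma }_{{T}}/T$. All function and variable names are hypothetical.

```python
import math
import random

def vartheta_direct(theta_bar, xi, sigma_hat, T):
    """theta_tilde - theta_bar * theta_hat, under the assumed forms stated in the lead-in."""
    theta_hat = theta_bar + xi / math.sqrt(T)      # assumed form of (30)
    theta_tilde = theta_hat ** 2 - sigma_hat / T   # assumed definition of theta_tilde
    return theta_tilde - theta_bar * theta_hat

def vartheta_decomposed(theta_bar, xi, sigma_hat, T, sigma_q, mean_xi_sq):
    """The four-term right-hand side, with mean_xi_sq standing in for E_Q xi^2."""
    xi_tilde = xi ** 2 - mean_xi_sq                # centred square
    varsigma = mean_xi_sq - sigma_q                # bias of E_Q xi^2 around sigma_Q
    return (theta_bar * xi / math.sqrt(T)
            + xi_tilde / T
            + varsigma / T
            + (sigma_q - sigma_hat) / T)

rng = random.Random(1)
for _ in range(1000):
    theta_bar = rng.uniform(-2.0, 2.0)
    xi = rng.gauss(0.0, 1.0)
    sigma_hat = rng.uniform(0.5, 2.0)
    sigma_q = rng.uniform(0.5, 2.0)
    mean_xi_sq = rng.uniform(0.5, 2.0)
    T = rng.randint(3, 1000)
    a = vartheta_direct(theta_bar, xi, sigma_hat, T)
    b = vartheta_decomposed(theta_bar, xi, sigma_hat, T, sigma_q, mean_xi_sq)
    assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
```

The identity holds for any value substituted for $\mathbf{E}_{{Q}}\xi ^{2}_{{j,p}}$ and $\sigma _{{Q}}$, since these terms are added and subtracted; this is why the check can use arbitrary numbers for them.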

To study here the term $\mathbf{M}(\cdot )$ we define for any $x=(x(j))_{{1\le j\le p}}\in {{\mathbb {R}}}^{p}$ the function $ S_{{x}} = \sum _{{j=1}}^{p} x(j) \overline{\theta }_{{j,p}} \psi _{{j,p}}$. Then, using the definition of $\xi _{{j,p}}$ in (30) we get through (68), that $\mathbf{M}(x)=I_{{T}}(S_{{x}})/\sqrt{T}$ and, therefore, thanks to (23)

$$\begin{aligned} \mathbf{E}_{{Q}} \mathbf{M}^2 (x) \le \varkappa _{{Q}}\frac{\Vert S_{{x}}\Vert ^2}{T}= \varkappa _{{Q}} \frac{1}{T} \sum _{{j=1}}^{p} x^2(j) \overline{\theta }^2_{{j,p}}\,. \end{aligned}$$

(70)

Moreover, setting here $ Z^* = \sup _{{x \in \varLambda _{{1}}}} T M^2 (x)/\Vert S_{{x}}\Vert ^2$ and $\varLambda _{{1}} = \varLambda - \lambda _{{0}}$, we get

$$\begin{aligned} 2 |\mathbf{M}(x)|\le \delta \Vert S_{{x}}\Vert ^2 + \frac{Z^*}{T\delta }. \end{aligned}$$

(71)

The last term here can be estimated from above as

$$\begin{aligned} \mathbf{E}_{{Q}} Z^* \le \sum _{{x \in \varLambda _{{1}}}} \frac{T \mathbf{E}_{{Q}} \mathbf{M}^2 (x)}{\Vert S_{{x}}\Vert ^2} \le \sum _{{x \in \varLambda _{{1}}}} \varkappa _{{Q}}= \varkappa _{{Q}}\mathbf{m}_{{*}}\,, \end{aligned}$$

where $\mathbf{m}_{{*}} = \text{ card }(\varLambda )$. Moreover, note that, for any $x\in \varLambda _{{1}}$,

$$\begin{aligned} \Vert S_{{x}}\Vert ^2-\Vert \widehat{S}_{{x}}\Vert ^2 = \sum _{{j=1}}^{p} x^2(j) (\overline{\theta }^2_{{j,p}}-\widehat{\theta }^2_{{j,p}}) \le -2 \mathbf{M}_{{1}}(x), \end{aligned}$$

(72)

where $\mathbf{M}_{{1}}(x) = T^{-1/2}\,\sum _{{j=1}}^{p}\, x^2(j)\overline{\theta }_{{j,p}} \xi _{{j,p}}$. Taking into account now that, for any $x \in \varLambda _{{1}}$, the components $|x(j)|\le 1$, we can estimate this term as in (70), i.e. $ \mathbf{E}_{{Q}}\, \mathbf{M}^2_{{1}}(x) \le \varkappa _{{Q}}\,\Vert S_{{x}}\Vert ^2/T$. Similarly to the previous reasoning setting

$ Z^*_{{1}} = \sup _{{x \in \varLambda _{{1}}}}T \mathbf{M}^2_1 (x)/\Vert S_{{x}}\Vert ^2$, we get $\mathbf{E}_{{Q}}\, Z^*_1 \le \varkappa _{{Q}}\,\mathbf{m}_{{*}}$. Using the same type of arguments as in (71), we can derive

$$\begin{aligned} 2 |\mathbf{M}_{{1}}(x)|\le \delta \Vert S_{{x}}\Vert ^2 + \frac{Z^*_1}{T\delta }. \end{aligned}$$

(73)

From here and (72), we get

$$\begin{aligned} \Vert S_{{x}}\Vert ^2 \le \frac{\Vert \widehat{S}_{{x}}\Vert ^2}{1-\delta } + \frac{Z^*_1}{T \delta (1-\delta )} \end{aligned}$$

(74)

for any $0<\delta <1$. Using this bound in (71) yields

$$\begin{aligned} 2 \mathbf{M}(x) \le \frac{\delta \Vert \widehat{S}_{{x}}\Vert ^2}{1-\delta } + \frac{Z^*+Z^*_1}{T \delta (1-\delta )} \,. \end{aligned}$$

Taking into account that $\Vert \widehat{S}_{{\varpi }}\Vert ^{2}\le 2\,(\text{ Err }(\widehat{\lambda })+\text{ Err }(\lambda _{{0}}))$, we obtain

$$\begin{aligned} 2 \mathbf{M}(\varpi ) \le \frac{2\delta (\text{ Err }(\widehat{\lambda })+\text{ Err }(\lambda _{{0}}))}{1-\delta } + \frac{Z^*+Z^*_1}{T \delta (1-\delta )} \end{aligned}$$

and, therefore,

$$\begin{aligned} \text{ Err }(\widehat{\lambda }) \le&\frac{1+\delta }{1-3\delta } \text{ Err }(\lambda _{{0}}) + \frac{Z^*+Z^*_1}{T \delta (1-3\delta )} + \frac{2 \mathbf{L}_{{1,Q}}}{T(1-3\delta )} + \frac{2 B^*_{{2,Q}}}{\delta (1-3\delta )\sigma _{{Q}} T} \\[2mm]&+ \frac{6\varLambda _{{*}}|\widehat{\sigma }_{{T}} -\sigma _{{Q}}|}{T(1-3\delta )} + \frac{2\delta }{(1-3\delta )} P_{{T}}(\lambda _{{0}}). \end{aligned}$$

Note here, that (65) implies $ \mathbf{E}_{{Q}}\, B^*_{{2,Q}} \le \sum _{{\lambda \in \varLambda }}\mathbf{E}_{{Q}} B^2_{{2,Q}} (\nu (\lambda )) \le \mathbf{m}_{{*}} \mathbf{L}_{{2,Q}}$ and, therefore, taking into account, that $1-3\delta \ge 1/2$ for $0<\delta <1/6$, we get

$$\begin{aligned} \mathcal{R}_{{Q}}(\widehat{S}_{{*}},S) \le&\frac{1+\delta }{1-3\delta } \mathcal{R}_{{Q}}(\widehat{S}_{{\lambda _{{0}}}},S) + \frac{4\varkappa _{{Q}} \mathbf{m}_{{*}}}{T \delta } + \frac{4 \mathbf{L}_{{1,Q}}}{T} + \frac{4 \mathbf{m}_{{*}} \mathbf{L}_{{2,Q}}}{\delta \sigma _{{Q}} T} \\[2mm]&+ \frac{12\varLambda _{{*}}\mathbf{E}_{{Q}}\,|\widehat{\sigma }_{{T}} -\sigma _{{Q}}|}{T} + \frac{2\delta }{(1-3\delta )} P_{{T}}(\lambda _{{0}}). \end{aligned}$$

Now Lemma 4 yields

$$\begin{aligned} \mathcal{R}_{{Q}}(\widehat{S}_{{*}},S) \le&\frac{1+3\delta }{1-3\delta } \mathcal{R}_{{Q}}(\widehat{S}_{{\lambda _{{0}}}},S) + \frac{4\varkappa _{{Q}} \mathbf{m}_{{*}}}{T \delta } + \frac{4 \mathbf{L}_{{1,Q}}}{T} + \frac{4 \mathbf{m}_{{*}} \mathbf{L}_{{2,Q}}}{\delta \sigma _{{Q}} T} \\&+ \frac{12\varLambda _{{*}}}{T} \,\mathbf{E}_{{Q}}\,|\widehat{\sigma }_{{T}} -\sigma _{{Q}}| + \frac{2\delta \mathbf{L}_{{1,Q}}}{(1-3\delta )T} \,. \end{aligned}$$

Moreover, noting here, that $\varkappa _{{Q}}\le (1+\overline{\tau }\vert \rho \vert _{{*}})\sigma _{{Q}}$ and using the bounds (64) and (65) we obtain the inequality (40). Hence we obtain the desired result. $\square $
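The bookkeeping of the constants in the last steps of this proof can be verified with exact rational arithmetic. The Python sketch below (function names are illustrative assumptions) checks that adding the penalty contribution $2\delta /(1-3\delta )$ from Lemma 4 to the coefficient $(1+\delta )/(1-3\delta )$ gives $(1+3\delta )/(1-3\delta )$, and that $1/(1-3\delta )\le 2$ whenever $0<\delta <1/6$.

```python
from fractions import Fraction

def coefficient_after_lemma4(delta):
    """(1 + delta)/(1 - 3 delta) + 2 delta/(1 - 3 delta): risk coefficient after using Lemma 4."""
    return (1 + delta) / (1 - 3 * delta) + 2 * delta / (1 - 3 * delta)

def stated_coefficient(delta):
    """(1 + 3 delta)/(1 - 3 delta): the coefficient appearing in the final bound."""
    return (1 + 3 * delta) / (1 - 3 * delta)

for delta in (Fraction(1, 100), Fraction(1, 12), Fraction(1, 7)):
    assert coefficient_after_lemma4(delta) == stated_coefficient(delta)
    # 1 - 3 delta >= 1/2 for delta < 1/6, so remainder terms at most double.
    assert 1 / (1 - 3 * delta) <= 2
```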

7.2 Proof of Proposition 1

Let $x^{'}=(x^{'}_{{j}})_{{1\le j\le p}}$ with $x^{'}_{{j}}= \mathbf{1}_{{\{[\sqrt{T}] \leqslant j\leqslant p\}}}$. Then (30) and (35) yield

$$\begin{aligned} \widehat{\sigma }_{{T}}= \frac{T}{p} \sum ^{p}_{{j=[\sqrt{T}]}}\, (\overline{\theta }_{{j,p}})^2 + \frac{2T}{p} \mathbf{M}(x^{'}) + \frac{1}{p}\, \sum ^{p}_{{j=[\sqrt{T}]}}\,\xi ^2_{{j,p}} \,, \end{aligned}$$

(75)

where $\mathbf{M}$ is given in (68). Setting $x^{''}=(x^{''}_{{j}})_{{1\le j\le p}}$ and $ x^{''}_{{j}}= p^{-1/2} \mathbf{1}_{{\{[\sqrt{T}] \leqslant j\leqslant p\}}}$, one can write the last term on the right-hand side of (75) as

$$\begin{aligned} \frac{1}{p}\, \sum ^{p}_{{j=[\sqrt{T}] }}\,\xi ^2_{{j,p}} = \frac{1}{\sqrt{p}} \, B_{{2,Q}}(x^{''}) + \frac{1}{p} B_{{1,Q}}(x^{'}) + \frac{(p- [\sqrt{T}]+1) \sigma _{{Q }}}{p} \,, \end{aligned}$$

where the functions $B_{{1,Q}}$ and $B_{{2,Q}}$ are defined in (63). To estimate the first term in (75) we note, that $\sum _{{j\ge [\sqrt{T}]}}j^{-2}\le 2[\sqrt{T}]^{-1}$ and $\sqrt{T}\ge 2$. Therefore, using Proposition 4 and Lemma 6, we come to the following upper bound

$$\begin{aligned} \mathbf{E}_{{Q}}|\widehat{\sigma }_{{T}}-\sigma _{{Q}}| \le \frac{16 \left( \int ^{1}_{{0}}\vert \dot{S}(t)\vert \mathrm {d}t\right) ^{2} T }{[\sqrt{T}] p} +\frac{2T}{p} \mathbf{E}_{{Q}}\,| \mathbf {M}(x^{'})| +\frac{\mathbf{L}_{{1,Q}}}{p} +\frac{\sqrt{\mathbf{L}_{{2,Q}}}}{\sqrt{p}} +\frac{\sigma _{{Q}}\sqrt{T}}{p}\,. \end{aligned}$$

In the same way as in (70) through Lemma 6, we obtain

$$\begin{aligned} \mathbf{E}_{{Q}}\,|M(x^{'})| \le \left( \frac{\varkappa _{{Q}}}{T}\, \sum ^{p}_{{j=[\sqrt{T}]}} \, \overline{\theta }^2_{{j,p}} \right) ^{1/2} \le \frac{4(\varkappa _{{Q}})^{1/2} \int ^{1}_{{0}}\vert \dot{S}(t)\vert \mathrm {d}t }{\sqrt{T}}\,. \end{aligned}$$

Taking into account that $\int ^{1}_{{0}}\vert \dot{S}(t)\vert \mathrm {d}t \le \Vert \dot{S}\Vert $ and $\varkappa _{{Q}}\le (1+\overline{\tau }\vert \rho \vert _{{*}})\sigma _{{Q}}$ and using the bounds (64) and (65), we obtain the inequality (41). Hence Proposition 1 holds true. $\square $
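The elementary tail bound $\sum _{{j\ge m}}j^{-2}\le 2/m$ used above with $m=[\sqrt{T}]\ge 2$ can be checked numerically. The following Python sketch is only an illustration: it bounds the infinite tail from above by a finite partial sum plus the integral remainder $1/N$, and the function name and truncation length are assumptions.

```python
def tail_upper_bound(m, n_terms=20_000):
    """Upper bound for sum_{j >= m} j^{-2}.

    Adds the partial sum over j = m..N (N = m + n_terms) to the integral
    bound sum_{j > N} j^{-2} <= 1/N, so the returned value dominates the tail.
    """
    last = m + n_terms
    partial = sum(1.0 / (j * j) for j in range(m, last + 1))
    return partial + 1.0 / last

for m in range(2, 60):
    # Even the upper bound on the tail stays below 2/m.
    assert tail_upper_bound(m) <= 2.0 / m
```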

7.3 Proof of Theorem 2

This proof directly follows from Theorem 1 and Proposition 1. $\square $

7.4 Proof of Theorem 4

First, we denote by $Q_{{0}}$ the distribution in $\mathcal{D}[0,n]$ of the noise (6) with the parameter $\varrho _{{1}}=\varsigma ^{*}$, $\varrho _{{2}}=0$ and $\varrho _{{3}}=0$, i.e., the distribution for the “signal + white noise” model. So, we can estimate from below the robust risk $ \mathcal{R}^{*}_{{T}}(\widetilde{S}_{{T}},S)\ge \mathcal{R}_{{Q_{{0}}}}(\widetilde{S}_{{T}},S)$. Now, Theorem 6.1 from Konev and Pergamenshchikov (2009b) yields the bound (52). Hence we obtain the desired result. $\square $

7.5 Proof of Proposition 2

First, we note that, in view of (31), one can represent the quadratic risk for the empirical norm $\Vert \cdot \Vert _{{p}}$ defined in (25) as

$$\begin{aligned} \mathbf{E}_{{Q}}\, \Vert \widehat{S}_{{\lambda _{{0}}}}-S\Vert ^{2}_{{p}} = \frac{1}{T} \sum _{j=1}^{p}\,\lambda _{{0}}^2(j)\,\mathbf{E}_{{Q}}\,\xi ^2_{{j,p}} + \overline{\varTheta }_{{p}}\,, \end{aligned}$$

where $\overline{\varTheta }_{{p}}= \sum _{j=1}^{p}\, \left( \theta _{{j,p}}-\lambda _{{0}}(j)\,\overline{\theta }_{{j,p}} \right) ^2$. First, note that

$$\begin{aligned} \sup _{{Q\in \mathcal{Q}_{{T}}}}\, \mathbf{E}_{{Q}} \sum _{j=1}^{p}\,\lambda _{{0}}^2(j) \,\xi ^{2}_{{j,p}}\le \, \varsigma ^{*} \,\sum _{j=1}^{p}\,\lambda _{{0}}^2(j) +\mathbf{L}^{*}_{{1,T}}\,, \end{aligned}$$

where $\mathbf{L}^{*}_{{1,T}}=\sup _{{Q\in \mathcal{Q}_{{T}}}}\,\mathbf{L}_{{1,Q}}$. Dividing by $T$ and taking into account that $\upsilon _{{T}}=T/\varsigma ^{*}$, we get

$$\begin{aligned} \sup _{{Q\in \mathcal{Q}_{{T}}}}\, \mathbf{E}_{{Q}}\, \Vert \widehat{S}_{{\lambda _{{0}}}}-S\Vert ^{2}_{{p}} \,\le \, \frac{1}{\upsilon _{{T}}} \sum _{{j=1}}^{p}\,\lambda _{{0}}^2(j) +\frac{\mathbf{L}^{*}_{{1,T}}}{T} +\overline{\varTheta }_{{p}} \,. \end{aligned}$$

Recalling here, that $\lambda _{{0}}=\lambda _{{\alpha _{{0}}}}$, we get that

$$\begin{aligned} \lim _{T\rightarrow \infty }\, \frac{\sum _{{j=1}}^{T}\, \lambda ^2_{{0}}(j)}{\upsilon ^{1/(2\mathbf{k}+1)}_{{T}}}= \frac{2(\tau _{{\mathbf{k}}}\,\mathbf{r})^{1/(2\mathbf{k}+1)}\,\mathbf{k}^2}{(\mathbf{k}+1)(2\mathbf{k}+1)} \,, \end{aligned}$$

(76)

where $\omega _{{0}}=\omega _{{\alpha _{{0}}}}= \left( \tau _{{k}} r_{{0}} \upsilon _{{T}}\right) ^{1/(2k+1)}$, $r_{{0}}=\left[ \mathbf{r}/ \varepsilon \right] \varepsilon $ and $\tau _{{\mathbf{k}}}$ is given in (47). Indeed, this follows immediately from the fact that $\lim _{{T\rightarrow \infty }}r_{{0}}=\mathbf{r}$ and

$$\begin{aligned} \lim _{{T\rightarrow \infty }}&\frac{1}{\omega _{{0}}}\sum ^{\omega _{{0}}}_{{j=[\ln T]}} \left( 1- (j/\omega _{{0}})^{\mathbf{k}}\right) ^{2} = \lim _{{T\rightarrow \infty }} \frac{1}{\omega _{{0}}}\sum ^{\omega _{{0}}}_{{j=1}} \left( 1- (j/\omega _{{0}})^{\mathbf{k}}\right) ^{2}\\[2mm]&=\int ^{1}_{{0}}(1-t^{\mathbf{k}})^{2}\mathrm {d}t= \frac{2\,\mathbf{k}^2}{(\mathbf{k}+1)(2\mathbf{k}+1)} \,. \end{aligned}$$
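The Riemann-sum limit and the closed form $\int ^{1}_{{0}}(1-t^{\mathbf{k}})^{2}\mathrm {d}t=2\mathbf{k}^{2}/((\mathbf{k}+1)(2\mathbf{k}+1))$ can be checked numerically. The Python sketch below is an illustration only; it takes an integer value in place of $\omega _{{0}}$, and the function names are assumptions.

```python
from fractions import Fraction

def riemann_sum(k, omega):
    """(1/omega) * sum_{j=1}^{omega} (1 - (j/omega)^k)^2 for an integer omega."""
    return sum((1.0 - (j / omega) ** k) ** 2 for j in range(1, omega + 1)) / omega

def closed_form(k):
    """2 k^2 / ((k + 1)(2k + 1)), the value of the integral of (1 - t^k)^2 over [0, 1]."""
    return Fraction(2 * k * k, (k + 1) * (2 * k + 1))

for k in (1, 2, 3, 5):
    approx = riemann_sum(k, omega=100_000)
    exact = float(closed_form(k))
    # The right-endpoint sum differs from the integral by O(1/omega).
    assert abs(approx - exact) < 1e-4
```

Dropping the first $[\ln T]$ terms changes the normalised sum by at most $\ln T/\omega _{{0}}$, which tends to zero, so both sums in the display share the same limit.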

Now, from (29) we obtain that for any $0<\widetilde{\varepsilon }<1$

$$\begin{aligned} \overline{\varTheta }_{{p}}\le (1+\widetilde{\varepsilon })\,\varTheta _{{p}}+ (1+\widetilde{\varepsilon }^{-1})\,\sum ^{p}_{{j=1}}\,h^{2}_{{j,p}}\,, \end{aligned}$$

(77)

where $\varTheta _{{p}}= \sum ^{p}_{j=1}\,(1-\lambda _{{0}}(j))^2\,\theta ^2_{{j,p}}$. Moreover, in view of the definition (53)

$$\begin{aligned} \varTheta _{{p}}= \sum _{{j=[\ln T]}}^{[\omega _{{0}}]}\,(1-\lambda _{{0}}(j))^2\, \theta ^2_{{j,p}} + \sum _{{j=[\omega _{{0}}]+1}}^{p}\,\theta ^2_{{j,p}} :=\varTheta _{{1,p}}+\varTheta _{{2,p}}\,. \end{aligned}$$

Note here, that $\theta ^2_{{j,p}}\le (1+\widetilde{\varepsilon })\theta ^{2}_{{j}}+(1+\widetilde{\varepsilon }^{-1})(\theta _{{j,p}}-\theta _{{j}})^{2}$ for any $\widetilde{\varepsilon }>0$. Therefore, in view of Lemma 8 using the bound $\sum ^{[\omega _{{0}}]}_{{j=[\ln T]}}j^{2}\le \omega ^{3}_{{0}}$, we get

$$\begin{aligned} \varTheta _{{1,p}}&\le (1+\widetilde{\varepsilon })\, \sum _{{j=[\ln T]}}^{[\omega _{{0}}]}\,(1-\lambda _{{0}}(j))^2\,\theta ^{2}_{{j}} +4\pi ^{2}r (1+\widetilde{\varepsilon }^{-1})\,p^{-2} \sum ^{[\omega _{{0}}]}_{{j=[\ln T]}}j^{2}\\&\le (1+\widetilde{\varepsilon })\, \sum _{{j=[\ln T]}}^{[\omega _{{0}}]}\,(1-\lambda _{{0}}(j))^2\,\theta ^{2}_{{j}} +4\pi ^{2}r (1+\widetilde{\varepsilon }^{-1})\,\omega ^{3}_{{0}}\,p^{-2}\,. \end{aligned}$$

Through Lemma 7 we have $ \varTheta _{{2,p}}\le (1+\widetilde{\varepsilon }) \sum _{{j\ge [\omega _{{0}}]+1}}\,\theta ^{2}_{{j}} +(1+\widetilde{\varepsilon }^{-1})\,\mathbf{r}\,p^{-2}$. Hence, $ \varTheta _{{p}}\, \le (1+\widetilde{\varepsilon })\, \varTheta ^{*} + (1+\widetilde{\varepsilon }^{-1})\, \left( 4\pi ^{2}r\omega ^{3}_{{0}}+r\right) \,p^{-2}$, where the first term $\varTheta ^{*} =\sum _{{j\ge \ln T}}\,(1-\lambda _{{0}}(j))^2\,\theta ^{2}_{{j}}$. Moreover, note that

$$\begin{aligned} \sup _{{S\in \mathcal{W}_{{\mathbf{r},1}}}}\, \max _{{1\le j\le p}}\, h^{2}_{{j,p}}\le \,\Vert \dot{S}\Vert ^{2}\,p^{-2}\,\le \,r\,p^{-2}\,. \end{aligned}$$

Moreover, $\mathcal{W}_{{\mathbf{r},\mathbf{k}}}\subseteq \mathcal{W}_{{\mathbf{r},2}}$ for any $\mathbf{k}\ge 2$. From here and Lemma 9 we get

$$\begin{aligned} \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \sum ^{p}_{{j=1}}\,h^{2}_{{j,p}}\le r \left( p^{-1}\,\mathbf{1}_{{\{\mathbf{k}=1\}}}+3p^{-2}\mathbf{1}_{{\{\mathbf{k}\ge 2\}}}\right) \end{aligned}$$

and, therefore, in view of the condition $(\mathbf{H}_{{5}})$

$$\begin{aligned} \lim _{{T\rightarrow \infty }}\,\upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\,\left( p^{-1}\mathbf{1}_{{\{\mathbf{k}=1\}}} + \omega ^{3}_{{0}}p^{-2} \right) \, =0\,. \end{aligned}$$

This implies, that

$$\begin{aligned} \limsup _{{T\rightarrow \infty }}\,\upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\,\sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \overline{\varTheta }_{{p}}\, \le \, \limsup _{{T\rightarrow \infty }}\,\upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\,\sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \varTheta ^{*}\,. \end{aligned}$$

To estimate the term $\varTheta ^{*}$ we set

$$\begin{aligned} \mathbf{U}_{{T}}= \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}} \sup _{{j\ge \ln T}}(1-\lambda _{{0}}(j))^2/a_{{j}}\,, \end{aligned}$$

where the sequence $(a_{{j}})_{{j\ge 1}}$ is defined in (50). This leads to the inequality

$$\begin{aligned} \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\,\varTheta ^{*} \, \le \mathbf{U}_{{T}}\, \sum _{{j\ge 1}}\,a_{{j}}\,\theta ^{2}_{{j}}\,\le \,\mathbf{U}_{{T}}\mathbf{r}\,. \end{aligned}$$

Using $\lim _{{T\rightarrow \infty }} r_{{0}}=\mathbf{r}$, we get $ \limsup _{T\rightarrow \infty }\, \mathbf{U}_{{T}} \le \, \pi ^{-2\mathbf{k}}\left( \tau _{{\mathbf{k}}}\,r \right) ^{-2\mathbf{k}/(2\mathbf{k}+1)}$, where the coefficient $\tau _{{\mathbf{k}}}$ is given in (47). This implies immediately that

$$\begin{aligned} \limsup _{{T\rightarrow \infty }}\,\upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\, \sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}}\, \overline{\varTheta }_{{p}}\, \le \frac{\mathbf{r}^{1/(2\mathbf{k}+1)}}{\pi ^{2\mathbf{k}}(\tau _{{\mathbf{k}}})^{2\mathbf{k}/(2\mathbf{k}+1)}}\,. \end{aligned}$$

(78)
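The constant in (78) is the product of $\mathbf{r}$ with the bound on $\limsup \mathbf{U}_{{T}}$. The short Python sketch below checks this exponent arithmetic for arbitrary positive inputs; since the definition (47) of $\tau _{{\mathbf{k}}}$ is not reproduced in this section, `tau_k` is treated as a free positive parameter, and all names are hypothetical.

```python
import math

def product_form(r, k, tau_k):
    """r * pi^{-2k} * (tau_k * r)^{-2k/(2k+1)}: r times the bound on limsup U_T."""
    return r * math.pi ** (-2 * k) * (tau_k * r) ** (-2 * k / (2 * k + 1))

def stated_form(r, k, tau_k):
    """r^{1/(2k+1)} / (pi^{2k} * tau_k^{2k/(2k+1)}): the right-hand side of (78)."""
    return r ** (1 / (2 * k + 1)) / (math.pi ** (2 * k) * tau_k ** (2 * k / (2 * k + 1)))

for k in (1, 2, 3):
    for r in (0.5, 1.0, 7.0):
        for tau_k in (0.1, 1.0, 3.0):  # tau_k: free positive parameter (assumption)
            assert math.isclose(product_form(r, k, tau_k),
                                stated_form(r, k, tau_k), rel_tol=1e-9)
```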

Therefore, from (76) and (78) it follows that

$$\begin{aligned} \lim _{{T\rightarrow \infty }} \upsilon ^{2\mathbf{k}/(2\mathbf{k}+1)}_{{T}}\,\sup _{{S\in \mathcal{W}_{{\mathbf{r},\mathbf{k}}}}} \sup _{{Q\in \mathcal{Q}_{{T}}}}\, \mathbf{E}_{{Q}}\, \Vert \widehat{S}_{{\lambda _{{0}}}}-S\Vert ^{2}_{{p}} \le \,\mathbf{l}_{{*}}\,. \end{aligned}$$

(79)

Using now Lemma 5 and the condition $(\mathbf{H}_{{5}})$, we get the upper bound (54). Hence we obtain the desired result. $\square $

References

Barbu, V. S., Beltaief, S., Pergamenshchikov, S. M. (2019a). Robust adaptive efficient estimation for semi-Markov nonparametric regression models. Statistical Inference for Stochastic Processes, 22(2), 187–231.
Barbu, V. S., Beltaief, S., Pergamenshchikov, S. M. (2019b). Robust statistical signal processing in semi-Markov nonparametric regression models. Les Annales de l’I.S.U.P, 63(2–3), 45–56.
Beltaief, S., Chernoyarov, O., Pergamenshchikov, S. M. (2020). Model selection for the robust efficient signal processing observed with small Lévy noise. Annals of the Institute of Statistical Mathematics, 72, 1205–1235.
Barbu, V. S., Limnios, N. (2008). Semi-Markov chains and hidden semi-Markov models toward applications: Their use in reliability and DNA analysis. Lecture Notes in Statistics. New York: Springer.
Barndorff-Nielsen, O. E., Shephard, N. (2001). Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial mathematics. Journal of the Royal Statistical Society Series B (Statistical Methodology), 63, 167–241.
Biard, R., Saussereau, B. (2014). Fractional Poisson processes: Long-range dependence and applications in ruin theory. Journal of Applied Probability, 51, 727–740.
Fourdrinier, D., Pergamenshchikov, S. M. (2007). Improved selection model method for the regression with dependent noise. Annals of the Institute of Statistical Mathematics, 59(3), 435–464.
Fujimori, K. (2019). The Dantzig selector for a linear model of diffusion processes. Statistical Inference for Stochastic Processes, 22, 475–498.
Hastie, T., Friedman, J., Tibshirani, R. (2008). The elements of statistical learning: Data mining, inference and prediction (2nd ed.). Springer Series in Statistics. New York: Springer.
Ibragimov, I. A., Khasminskii, R. Z. (1981). Statistical estimation: Asymptotic theory. New York: Springer.
Kassam, S. A. (1988). Signal detection in non-Gaussian noise. IX. New York: Springer.
Konev, V. V., Pergamenshchikov, S. M. (2009a). Nonparametric estimation in a semimartingale regression model. Part 1. Oracle inequalities. Vestnik Tomskogo Gosudarstvennogo Universiteta. Matematika i Mekhanika, 3(7), 23–41.
Konev, V. V., Pergamenshchikov, S. M. (2009b). Nonparametric estimation in a semimartingale regression model. Part 2. Robust asymptotic efficiency. Vestnik Tomskogo Gosudarstvennogo Universiteta. Matematika i Mekhanika, 4(8), 31–45.
Konev, V. V., Pergamenshchikov, S. M. (2012). Efficient robust nonparametric estimation in a semimartingale regression model. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 48(4), 1217–1244.
Konev, V. V., Pergamenshchikov, S. M. (2015). Robust model selection for a semimartingale continuous time regression from discrete data. Stochastic Processes and their Applications, 125, 294–326.
Kutoyants, Yu. A. (1994). Identification of dynamical systems with small noise. Dordrecht: Kluwer Academic Publishers Group.
Laskin, N. (2003). Fractional Poisson processes. Communications in Nonlinear Science and Numerical Simulation, 8, 201–213.
Liptser, R. S., Shiryaev, A. N. (1989). Theory of martingales. New York: Springer.
Maheshwari, A., Vellaisamy, P. (2016). On the long-range dependence of fractional Poisson processes. Journal of Applied Probability, 53(4), 989–1000.
Middleton, D. (1979). Canonical non-Gaussian noise models: Their implications for measurement and for prediction of receiver performance. IEEE Transactions on Electromagnetic Compatibility, 21, 209–220.
Novikov, A. A. (1975). On discontinuous martingales. Theory of Probability and its Applications, 20(1), 11–26.
Pinsker, M. S. (1981). Optimal filtration of square integrable signals in Gaussian white noise. Problems of Information Transmission, 17, 120–133.
Repin, O. N., Saichev, A. I. (2000). Fractional Poisson law. Radiophysics and Quantum Electronics, 43(9), 738–741.

Author information

Authors and Affiliations

Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen Normandie, Avenue de l’Université, BP.12, 76801, Saint-Etienne-du-Rouvray, France
Vlad Stefan Barbu & Serguei Pergamenchtchikov
ALTEN de Toulouse, 9 Rue Alain Fournier, 31300, Toulouse, France
Slim Beltaief
International Laboratory of Statistics of Stochastic Processes and Quantitative Finance, Tomsk State University, Tomsk, Russia
Serguei Pergamenchtchikov


Corresponding author

Correspondence to Serguei Pergamenchtchikov.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was supported by RSF, Project No 20-61-47043 (National Research Tomsk State University, Russia).

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 448 KB)

Appendix

1.1 Property of the penalty term

Lemma 4

For any $n\ge \,1$ and $\lambda \in \varLambda $,

$$\begin{aligned} P_{{T}}(\lambda ) \le \mathcal{R}_{{Q}}(\widehat{S}_{{\lambda }},S) +\frac{\mathbf{L}_{{1,Q}}}{T}, \end{aligned}$$

where the coefficient $P_{{T}}(\lambda )$ is defined in (37) and $\mathbf{L}_{{1,Q}}$ is defined in (64).

Proof

From (30) and (33) we obtain

$$\begin{aligned} \text{ Err }(\lambda ) \ge \sum _{{j=1}}^{p} \left( \lambda (j) \widehat{\theta }_{{j,p}} - \overline{\theta }_{{j,p}} \right) ^2 = \sum _{{j=1}}^{p} \left( (\lambda (j)-1) \overline{\theta }_{{j,p}}+ \frac{\lambda (j)}{\sqrt{T}}\xi _{{j,p}} \right) ^2 \,. \end{aligned}$$

Now Proposition 4 implies

$$\begin{aligned} \mathcal{R}_{{Q}}(\widehat{S}_{{\lambda }},S)= \mathbf{E}_{{Q}}\, \text{ Err }(\lambda ) \ge \, \frac{1}{T}\sum _{{j=1}}^{p} \lambda ^2(j) \mathbf{E}_{{Q}}\,\xi ^2_{{j,p}} \ge \, P_{{T}}(\lambda )-\frac{\mathbf{L}_{{1,Q}}}{T} \,. \end{aligned}$$

Hence we obtain the result. $\square $

1.2 Properties of the Fourier coefficients

Lemma 5

Let f be an absolutely continuous function, $f: [0,1]\rightarrow {{\mathbb {R}}}$, with $\Vert \dot{f}\Vert <\infty $, and let g be a simple function, $g: [0,1]\rightarrow {{\mathbb {R}}}$, of the form $ g(t)=\sum _{j=1}^p\,c_{{j}}\,\chi _{(t_{j-1},t_{j}]}(t)$, where $c_{{j}}$ are some constants. Then, for any $\widetilde{\varepsilon }>0$, the function $\varDelta =f-g$ satisfies the following inequalities

$$\begin{aligned} \Vert \varDelta \Vert ^{2}\le (1+\widetilde{\varepsilon })\Vert \varDelta \Vert ^{2}_{{p}} + (1+\widetilde{\varepsilon }^{-1})\frac{\Vert \dot{f}\Vert ^{2}}{p^{2}}\,, \quad \Vert \varDelta \Vert ^{2}_{{p}}\le (1+\widetilde{\varepsilon })\Vert \varDelta \Vert ^{2} + (1+\widetilde{\varepsilon }^{-1})\frac{\Vert \dot{f}\Vert ^{2}}{p^{2}} \,. \end{aligned}$$

Lemma 6

Let the function S(t) in (3) be absolutely continuous and have an absolutely integrable derivative. Then the coefficients $(\overline{\theta }_{{j,p}})_{1\leqslant j \leqslant p}$ defined in (29) satisfy the inequalities $\max _{{2\leqslant j \leqslant p}} j \vert \overline{\theta }_{{j,p}} \vert \leqslant 2 \sqrt{2} \int ^{1}_{{0}}\vert \dot{S}(t) \vert \mathrm {d}t$.

Lemma 7

For any $p\ge 2$, $1\le N\le p$ and $r>0$, the coefficients $(\theta _{{j,p}})_{{1\le j\le p}}$ of functions S from the class $\mathcal{W}_{{\mathbf{r},1}}$ satisfy, for any $\widetilde{\varepsilon }>0$, the following inequality $ \sum ^{p}_{{j=N}} \theta ^{2}_{{j,p}} \, \le \,(1+\widetilde{\varepsilon }) \,\sum _{{j\ge N}}\,\theta ^{2}_{{j}} \, +(1+\widetilde{\varepsilon }^{-1})\,r\,p^{-2}$.

Lemma 8

For any $p\ge 2$ and $r>0$, the coefficients $(\theta _{{j,p}})_{{1\le j\le p}}$ of functions S satisfy the inequality $ \max _{{1\le j\le p}} \,\sup _{{S\in \mathcal{W}_{{\mathbf{r},1}}}} \left( |\theta _{{j,p}} - \theta _{{j}}| -2\pi \sqrt{r} \,j\,p^{-1} \right) \, \le 0$.

Lemma 9

For any $p\ge 2$ and $r>0$ the correction coefficients from (29) satisfy the inequality $ \sup _{{S\in \mathcal{W}_{{\mathbf{r},2}}}} \sum ^{p}_{{j=1}} h^{2}_{{j,p}} \le \,3r\,p^{-2}$.

Lemmas 5–9 are proven in Konev and Pergamenshchikov (2015).

About this article

Cite this article

Barbu, V.S., Beltaief, S. & Pergamenchtchikov, S. Adaptive efficient estimation for generalized semi-Markov big data models. Ann Inst Stat Math 74, 925–955 (2022). https://doi.org/10.1007/s10463-022-00820-y

Received: 10 March 2021
Revised: 09 November 2021
Accepted: 04 January 2022
Published: 05 March 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s10463-022-00820-y
