Abstract
We consider a Gaussian continuous time moving average model \(X(t)=\int _0^t a(t-s)dW(s)\) where W is a standard Brownian motion and a(.) a deterministic function locally square integrable on \({{\mathbb {R}}}^+\). Given N i.i.d. continuous time observations \((X_i(t))_{t\in [0,T]}\) on [0, T], for \(i=1, \dots , N\), distributed like \((X(t))_{t\in [0,T]}\), we propose nonparametric projection estimators of \(a^2\) under different sets of assumptions, which do or do not allow for fractional models. We study the asymptotics in T and N (depending on the setup) ensuring their consistency, and provide their nonparametric rates of convergence on functional regularity spaces. Then, we propose a data-driven method, corresponding to each setup, for selecting the dimension of the projection space. The findings are illustrated through a simulation study.
1 Introduction
Samples of infinite dimensional data, especially data recorded continuously over a time interval, are now a commonly encountered type of data due to the possibilities of modern technology. They arise in many fields of application, e.g. in econometrics, where authors rather speak of panel data, and they feed the field of functional data analysis (FDA), whose relevance no longer needs to be demonstrated (see, for general ideas and many examples, Hsiao (2003), Ramsay and Silverman (2007), Wang et al. (2015)). Parametric models are most often proposed to deal with FDA. However, nonparametric approaches allow for more flexibility and robustness.
In the present contribution, we consider i.i.d. observations \((X_i(t), t\in [0,T], i=1, \ldots ,N) \) of the continuous time moving average (CMA) process
where \((W(t), t\ge 0)\) is a Wiener process and \(a: {{\mathbb {R}}}^+ \rightarrow {{\mathbb {R}}}\) is a deterministic square integrable function. Our aim is to study the new and challenging question of the nonparametric estimation of the function \(g=a^2\) from these observations under very general conditions on the function a(t). Our assumptions include in particular the classical CARMA processes (continuous ARMA) but also more complicated processes such as the continuous time fractionally integrated process of order d (see (3)), defined in (Comte and Renault 1996, Definition 2) which is linked with Brownian motion with Hurst index \(H=d+(1/2)\).
CMA processes have been the subject of a huge number of contributions concerned with modelling properties. Estimation procedures rely on the observation of a unique sample path on a time interval [0, T] and usually, the stationary version of (X(t)), namely
is considered. We refer e.g. to (Brockwell (2001)) for a reference book, Brockwell et al. (2012) and the references given therein, where a general Lévy process (L(t)) may replace (W(t)) (see also e.g. Belomestny et al. 2019; Schnurr and Woerner 2011). For what concerns nonparametric estimation, a pointwise estimator of a(t) for mainly Gaussian CARMA(p, q) processes in stationary regime (see formula (2)), is proposed in Brockwell et al. (2012) based on the discrete observation of one sample path. Except for this reference, to our knowledge, the nonparametric estimation of a(t) for general CMA processes has not yet been studied.
In the present paper, stationarity of the process is not required. The asymptotic framework will be that either N tends to infinity with fixed T or both N and T tend to infinity. We assume that g is square integrable. Considering sequences \((S_m, m\in {{\mathbb {N}}})\) of finite dimensional subspaces of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\), we propose two kinds of projection estimators of g built using the observations \((X_i(t), t\in [0,T], i=1, \ldots ,N)\): i.e., we build estimators of the orthogonal projection \(g_m\) of g on \(S_m\) by estimating the coefficients of the projection on an orthonormal basis of \(S_m\). The first method relies on the assumption that a(t) belongs to \(C^1([0, +\infty ))\), which excludes the continuous time fractionally integrated process. In this case, (X(t)) is an Itô process with explicit stochastic differential. The second approach, which is more general, applies without regularity assumptions on a(t). Then, in the general case, we propose a data-driven selection of the dimension leading to an adaptive estimator. For this part, the Gaussian character of the process (X(t)) is especially exploited. Proofs which do not rely on this property are possible though longer.
In Sect. 2, we present assumptions and the collections of models. Two collections are especially investigated. First, we consider for fixed T the collection of spaces generated by the trigonometric basis of \({{\mathbb {L}}}^2([0,T])\), and thus we estimate \(g_T=g{{\mathbf {1}}}_{[0,T]}\). Second, for large T, we consider spaces generated by the Laguerre basis of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\). This basis has been largely investigated and used in recent references for nonparametric estimation by projection methods (see e.g. Comte and Genon-Catalot 2018). The estimators are presented in Sects. 2.2 (first method) and 2.3 (second method, under more general assumptions). Several risk bounds for the projection estimators on a fixed space are obtained and discussed. In Sect. 3, we detail the possible rates of convergence that can be deduced from the risk bounds depending on regularity spaces for the unknown function g. Section 4 is concerned with the data-driven choice of the dimension of the projection space. We prove that our estimators are adaptive in the sense that their risk bounds automatically achieve the best compromise between squared bias and variance terms (Theorems 1 and 2). Section 5 contains a simulation study. Estimators are implemented on simulated data for various examples of functions g. We give tables of risks obtained by Monte-Carlo simulations. In Sect. 6, some concluding remarks are given. Proofs are gathered in Sect. 7, and Sect. 8 contains the necessary definitions and properties of the Laguerre basis.
2 Projection estimators on a fixed space
2.1 Assumptions and collection of models
We estimate the function
Our study will depend on assumptions on the unknown function a(t):
- [H0] The function \(g(t)=a^2(t)\) belongs to \({{\mathbb {L}}}^1({{\mathbb {R}}}^+)\cap {{\mathbb {L}}}^2({{\mathbb {R}}}^+)\).
- [H1] The function a(t) belongs to \(C^1({{\mathbb {R}}}^+)\), is bounded and \(\int _0^{+\infty } (a'(t))^2dt<+\infty \).
Example 1
Consider the following example: \(a(t)=t^d{{\tilde{a}}}(t)/\Gamma (d+1)\) where \(d>-1/2\), \({{\tilde{a}}}\in C^1({{\mathbb {R}}}^+)\) and \({{\tilde{a}}}(0) \ne 0\),
In particular for \({{\tilde{a}}}(x)=1\), this process is the continuous time fractional Brownian motion of order d defined in (Comte and Renault 1996, Definition 1) and the general formulation above corresponds to the continuous time fractionally integrated process of order d (Definition 2 therein). The integrability of \(a^2, a'^2,a^4\) near infinity can be ensured by the rate of decrease of \({{\tilde{a}}}\) near infinity, for instance if \({{\tilde{a}}}(t)=e^{-t}\). The behaviour near 0 depends on d:
- (i) The process X(t) is well defined for any \(d>-1/2\) as a is locally square integrable.
- (ii) For \(-1/2<d<0\), a(0) is not defined.
- (iii) For \(d\ge 1\), a(t) belongs to \(C^1({{\mathbb {R}}}^+)\) and \(a'\) is locally square integrable.
- (iv) As \(a(t) \sim ({{\tilde{a}}}(0)/\Gamma (d+1)) t^d\) at 0, [H0] requires \(d>-1/4\).
In other words, fractional processes can be studied only under [H0].
We denote respectively by \(\Vert .\Vert _T\) (resp. \(\langle .,.\rangle _T\)) the norm (resp. the scalar product) of \({{\mathbb {L}}}^2([0,T])\) and \(\Vert .\Vert \) (resp. \(\langle .,.\rangle \)) the norm (resp. the scalar product) of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\). We set
Note that \({{\mathbb {E}}}(X^2(t))=G(t)\) is what enables us to estimate g, whereas \({{\mathbb {E}}}(Y^2(t))=\Vert a\Vert ^2\) would not. To build estimators of g, we use a projection method and consider two settings.
- In the first case, T is fixed and we estimate \(g_T=g{{\mathbf {1}}}_{[0,T]}\). For this, we consider the collection \((S_{m}^{Trig}, m\ge 0)\) of subspaces of \({{\mathbb {L}}}^2([0,T])\) where \(S_{m}^{Trig}\) has odd dimension m and is generated by the orthonormal trigonometric basis \((\varphi _{j,T})\) where \(\varphi _{0,T}(t)=\sqrt{1/T}\mathbf {1}_{[0,T]}(t)\), \(\varphi _{2j-1,T}(t)=\sqrt{2/T}\cos (2\pi jt/T) \mathbf {1}_{[0,T]}(t)\) and \(\varphi _{2j,T}(t)=\sqrt{2/T}\sin (2\pi j t/T) \mathbf {1}_{[0,T]}(t)\) for \(j=1, \dots , (m-1)/2\). This basis satisfies
$$\begin{aligned} \sum _{j=0}^{m-1} \varphi _{j,T}^2(t)= \frac{m}{T} \quad \text{ and }\quad \int _0^T \varphi _{0,T}(t)dt=\sqrt{T} , \int _0^T \varphi _{j,T}(t)dt= 0 \quad \text{ for }\quad j\ne 0. \end{aligned}$$
- In the second case, we may consider that either T is fixed but large enough, or that T tends to infinity. In this case, we estimate g on \({{\mathbb {R}}}^+\) and we rather consider a collection of subspaces of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\), generated by an orthonormal basis. The basis considered here is the Laguerre basis defined by
$$\begin{aligned} \ell _j(t)= \sqrt{2} L_j(2t) e^{-t}{{\mathbf {1}}}_{t\ge 0}, \quad j\ge 0, \quad L_j(t)=\sum _{k=0}^j (-1)^k \left( {\begin{array}{c}j\\ k\end{array}}\right) \frac{t^k}{k!}. \end{aligned}$$(5)
We set \(S_{m}^{Lag}= \mathrm{span}\{\ell _j, j=0, \ldots , m-1\}\). We have
$$\begin{aligned} \forall t\ge 0, \quad \sum _{j=0}^{m-1} \ell _{j}^2(t)\le 2m, \quad \text{ and } \quad \int _0^{+\infty } \ell _j(t)dt= \sqrt{2}(-1)^j. \end{aligned}$$
The second property is obtained by exact computation and the first one comes from the fact that \(\forall j, |\ell _j(t)|\le \sqrt{2}\). Moreover, \({{\mathcal {L}}}_j(T):=\int _0^T\ell _j(u)du\) can be computed recursively, see (48). All formulae concerning this basis are recalled in Sect. 8.
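As a quick numerical illustration (our sketch, not part of the paper), the Laguerre basis (5) can be evaluated with SciPy's classical Laguerre polynomials; the snippet checks orthonormality on \({{\mathbb {R}}}^+\), the uniform bound \(|\ell _j|\le \sqrt{2}\) and the value of \(\int _0^{+\infty }\ell _j(t)dt\):

```python
# Hedged sketch: evaluating the Laguerre basis ell_j(t) = sqrt(2) L_j(2t) e^{-t}
# of (5), using SciPy's classical Laguerre polynomials.
import numpy as np
from scipy.special import eval_laguerre

def ell(j, t):
    """Laguerre basis function ell_j on R^+."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(2.0) * eval_laguerre(j, 2.0 * t) * np.exp(-t)

def trap(f, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

t = np.linspace(0.0, 60.0, 200001)   # the integrands are negligible beyond 60
for j in range(4):
    # orthonormality: <ell_j, ell_k> = delta_{jk}
    for k in range(4):
        ip = trap(ell(j, t) * ell(k, t), t)
        assert abs(ip - (1.0 if j == k else 0.0)) < 1e-3
    # |ell_j(t)| <= sqrt(2) and int_0^infty ell_j = sqrt(2)(-1)^j
    assert np.max(np.abs(ell(j, t))) <= np.sqrt(2.0) + 1e-9
    assert abs(trap(ell(j, t), t) - np.sqrt(2.0) * (-1.0) ** j) < 1e-3
```

The maximum \(\sqrt{2}\) is attained at \(t=0\), since \(L_j(0)=1\) for all j.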
Remark 1
In the case of fixed T, we could also consider the subspaces \((S_{m}^{Hist})\) of \({{\mathbb {L}}}^2([0,T])\) generated by the histogram basis
where \(\sum _{j=0}^{m-1} \varphi _{j,T}^2(t)=m/T\) and \(\int _0^T \varphi _{j,T}(t)dt=\sqrt{T/m}\). But these basis functions are not differentiable and thus would not be suitable for all our proposals.
For simplicity, in order to use a unique notation, we denote by \(\varphi _j\) either \(\varphi _{j,T}\) or \(\ell _j\) and set \(S_m= \mathrm{span}\{\varphi _j, j=0, \ldots , m-1\}\). In all cases, under [H0], the function g admits a development
We define \(g_m(t)= \sum _{j=0}^{m-1} \theta _j \varphi _j(t)\) the orthogonal projection of g on \(S_m\).
2.2 Estimators under [H0]–[H1]
Under [H1], the stochastic differential of (X(t)) satisfies:
(see Comte and Renault (1996), Eq. (6)).
Remark 2
By Eq. (6), we have, for each trajectory \(X_i\), for \(t_k=kT/n\) with fixed T,
Thus, we can assume that g(0) is known, as we have continuous observation of the sample paths.
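To make Remark 2 concrete, here is a small sketch (ours, not the paper's code) for the Ornstein-Uhlenbeck example of Sect. 5, \(a(t)=\sigma e^{-\theta t}\), for which \(g(0)=a^2(0)=\sigma ^2\): the realized quadratic variation of a finely sampled path, divided by T, recovers g(0). Paths are simulated exactly through their Gaussian covariance.

```python
# Hedged sketch: estimating g(0) from the realized quadratic variation of
# discretely sampled paths, for the OU example a(t) = sigma * exp(-theta * t).
import numpy as np

rng = np.random.default_rng(0)
sigma, th = 0.5, 0.25            # g(0) = a(0)^2 = sigma^2 = 0.25
T, n, N = 10.0, 400, 200
dt = T / n
tk = dt * np.arange(1, n + 1)

# exact covariance of (X(t_1), ..., X(t_n)); Cholesky gives exact samples
tj, tl = tk[:, None], tk[None, :]
cov = sigma**2 / (2 * th) * (np.exp(-th * np.abs(tj - tl)) - np.exp(-th * (tj + tl)))
X = rng.standard_normal((N, n)) @ np.linalg.cholesky(cov + 1e-12 * np.eye(n)).T

# realized quadratic variation per path, averaged over paths, divided by T
dX = np.diff(np.hstack([np.zeros((N, 1)), X]), axis=1)
g0_hat = np.mean(np.sum(dX**2, axis=1)) / T
print(g0_hat)   # close to sigma^2 = 0.25
```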
The construction of our first estimator relies on the following lemma.
Lemma 1
Under [H0]-[H1], denoting by \(\theta _j=\langle g, \varphi _j\rangle \), we have
Obviously, if the basis has support [0, T], integrals are on this interval. Relying on this lemma, we can set:
The projection estimator of g on a fixed space \(S_m\) is given by:
We refer to Remark 2 concerning the fact that g(0) is known. We mention that here, the histogram basis can be used in the fixed-T setting.
Note that, by the Itô formula and (6), we can write \({{\hat{\theta }}}_j\) without a stochastic integral, provided that \(\varphi _j\) is differentiable:
The following proposition gives bounds for the \({{\mathbb {L}}}^2\)-risk of \({{\hat{g}}}_m\) in the case of fixed T and the trigonometric basis.
Proposition 1
Assume [H0]-[H1] and consider that \((\varphi _j=\varphi _{j,T})\) is the trigonometric basis. Then
where \(G_1(T)=\int _0^T(a'(u))^2du\). (Recall that G is defined in (4), that \(g_m\) denotes the orthogonal projection of g on \(S_m^{Trig}\) and that \(\Vert u\Vert _T^2=\int _0^T u^2(s)ds\).)
If \(g(0)=0\),
Let us discuss these bounds for fixed T and large N. The bounds involve a standard squared bias term \(\Vert g-g_m\Vert _T^2\) due to the projection method. For \(g(0)\ne 0\), the variance has order m/N and the last two terms are residual (see (9)). Therefore, in this case, the bias-variance compromise for choosing m can be performed between the first two terms.
The case \(g(0)=0\) is different as the process is differentiable, see (6) with \(a(0)=0\), and the bound (10) shows that m must simply be chosen as large as possible.
Proposition 2
Assume [H0]-[H1].
If \((\varphi _j)\) is an orthonormal basis of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\), for all \(T\ge 1, N\ge 1, m\ge 0\), we have
where \(c_G=4\left( 2 G(T)G_1(T) + g^2(0)\right) \). If in addition \(g(0)=0\),
If \((\varphi _j)\) is the Laguerre basis of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\), g is bounded and \(T\ge 6m-3\), then
where \(C= C'(g(0)^2+\Vert g\Vert _\infty ^2+ \Vert a\Vert ^2 \Vert a'\Vert ^2)\), and \(C'\) and \(\gamma _2\) are positive constants depending on the basis only.
We can discuss these bounds for fixed T or large T. Here again, the bounds involve a standard squared bias term \(\Vert g-g_m\Vert ^2\).
Bounds (11) and (12) may be compared to (9) and (10). In (9), T is fixed, so that the variance has order m/N for \(g(0)\ne 0\) and the term T/N is a negligible residual. If T can be large, the term T/N may no longer be negligible, and (11)–(12) involve an additional bias term \(\int _T^{+\infty } g^2(s)ds\) which is small for large T. But the orders of these terms, which depend on T and cannot be chosen, are difficult to discuss.
Bound (12) implies, as in the trigonometric case, that m must be chosen as large as possible. Bound (13) looks more classical: T does not appear, the variance term has order \(m^3/N\) and \(m \exp {(-12\gamma _2m)}\) is a negligible additional bias term.
Comparing (11) and (13), we see that in the Laguerre case, the variance term is less than
Note that the constants \(G_1(T)\) and C are difficult to estimate, which is a drawback for model selection. In Sect. 5, we propose a practical data-driven choice of m taking this difficulty into account.
2.3 Estimator under [H0]
In this paragraph, to handle more general processes, including fractional processes, we propose another estimation method. We no longer assume that a belongs to \(C^1({{\mathbb {R}}}^+)\). Therefore, the stochastic differential (6), which requires [H1], no longer holds. As a counterpart, we consider basis functions that have to be differentiable on their domain, [0, T] or \({{\mathbb {R}}}^+\).
The construction of the second estimator is based on the following lemma.
Lemma 2
Assume that [H0] holds and that \((\varphi _j)_j\) is differentiable on [0, T], then
Therefore, we can set
Remark 3
We can remark that \({\tilde{\theta }}_j\) can also be written \({{\tilde{\theta }}}_j= - \int _0^T \varphi _{j}'(s) {\widehat{G}}(s) ds + \varphi _{j}(T){\widehat{G}}(T)\) and thus can be seen as an estimator of \(g=G'\) where G defined by (4) is seen as \(G(s)={{\mathbb {E}}}(X^2(s))\) for \(s\in [0,T]\). It corresponds to an empirical and truncated version of the integration by parts formula
Thus the estimator defined below may be considered in a more general context than processes X(t) given by (1). However, our computations are specific to this setting.
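As an illustration (our sketch, using the OU example of Sect. 5 and the trigonometric basis), the coefficients \({{\tilde{\theta }}}_j= - \int _0^T \varphi _{j}'(s) {\widehat{G}}(s) ds + \varphi _{j}(T){\widehat{G}}(T)\) of Remark 3 can be computed from discretized paths. For \(j=0\), \(\varphi _{0,T}'=0\), so \({{\tilde{\theta }}}_0={\widehat{G}}(T)/\sqrt{T}\) should be close to the true coefficient \(\theta _0=G(T)/\sqrt{T}\), which has a closed form for this example:

```python
# Hedged sketch: coefficients tilde theta_j of Remark 3, trigonometric basis,
# for the OU example a(t) = sigma * exp(-theta * t).
import numpy as np

rng = np.random.default_rng(1)
sigma, th = 0.5, 0.25
T, n, N, m = 10.0, 400, 500, 5
dt = T / n
tk = dt * np.arange(1, n + 1)

# exact simulation of N discrete paths through the Gaussian covariance
tj, tl = tk[:, None], tk[None, :]
cov = sigma**2 / (2*th) * (np.exp(-th*np.abs(tj - tl)) - np.exp(-th*(tj + tl)))
X = rng.standard_normal((N, n)) @ np.linalg.cholesky(cov + 1e-12*np.eye(n)).T
Ghat = np.mean(X**2, axis=0)          # \widehat G(t_k) = N^{-1} sum_i X_i^2(t_k)

def dphi(j, t):                       # derivative of the trigonometric basis
    if j == 0:
        return np.zeros_like(t)
    f = 2*np.pi*((j + 1)//2)/T
    s = np.sqrt(2/T)*f
    return -s*np.sin(f*t) if j % 2 == 1 else s*np.cos(f*t)

# phi_j(T) = 1/sqrt(T), sqrt(2/T), 0, sqrt(2/T), 0, ... as recalled in the text
phiT = [np.sqrt(1/T) if j == 0 else (np.sqrt(2/T) if j % 2 == 1 else 0.0)
        for j in range(m)]
ttheta = np.array([-np.sum(dphi(j, tk)*Ghat)*dt + phiT[j]*Ghat[-1]
                   for j in range(m)])

GT = sigma**2 * (1 - np.exp(-2*th*T)) / (2*th)   # G(T) in closed form
print(ttheta[0], GT/np.sqrt(T))                  # the two should be close
```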
Note that under [H0] only, formula (8) no longer holds; this is why we use another notation, \({\tilde{\theta }}_j\) instead of \({\hat{\theta }}_j\). If \(\varphi _j=\varphi _{j,T}\) is the trigonometric basis, then \(\varphi _{0,T}(T)= 1/\sqrt{T}, \varphi _{2j-1,T}(T)=\sqrt{2/T}, \varphi _{2j,T}(T)=0\), \(j\ge 1\). Then we define the estimator by
We introduce the assumption:
- [H2] \(\displaystyle \int _0^1 \frac{G^2(s)}{s} ds = c_0<+\infty \).

Actually, [H2] is rather weak and allows us to consider fractional processes.
Example 1
(continued). If we consider, as in Example 1, \(a(t) =t^d {{\tilde{a}}}(t)/\Gamma (d+1)\), where \(d>-1/2\) and \({{\tilde{a}}} \in C^1({{\mathbb {R}}}^+)\), with \({{\tilde{a}}}(0)\ne 0\), then \(G^2(s)/s \sim _{s \rightarrow 0} s^{4d+1}{{\tilde{a}}}^4(0)/\Gamma ^4(d+1)\) and [H2] holds (\(d>-1/2\)). The constraint is weaker than [H0].
The following risk bounds hold for \({{{\tilde{g}}}}_m\).
Proposition 3
Assume [H0].
- If \((\varphi _j=\varphi _{j,T})\) is the trigonometric basis, then
$$\begin{aligned} {{\mathbb {E}}}(\Vert {{{\tilde{g}}}}_m - g\Vert ^2_T) \le \Vert g_m-g\Vert ^2_T + 6 G^2(T)\frac{4\pi ^2 m^2}{NT}+6G^2(T)\frac{m}{NT}. \end{aligned}$$(15)
- Let \((\varphi _j=\ell _j)\) be the Laguerre basis.
  - Then, for all \(T\ge 1, N\ge 1, m\ge 0\),
$$\begin{aligned}&{{\mathbb {E}}}(\Vert {{{\tilde{g}}}}_m - g\Vert ^2) \le \Vert g_m-g\Vert ^2 + 12\left( G^2(T)+ 2 \int _0^T \frac{G^2(u)}{u} du \right) \, \frac{m}{N} + \nonumber \\&\quad 12G^2(T) \frac{T }{N} + \int _T^{\infty } g^2(s)ds, \end{aligned}$$(16)
where, if [H2] holds,
$$\begin{aligned} \int _0^T \frac{G^2(u)}{u} du \le c_0 + G^2(T) \log (T). \end{aligned}$$
  - If \(T\ge 6(m-1)+3=6m-3\), then
$$\begin{aligned} {{\mathbb {E}}}(\Vert {{{\tilde{g}}}}_m - g\Vert ^2) \le \Vert g_m-g\Vert ^2+ c_1 G^2(T)\frac{m^3}{N}+ c_2\Vert a\Vert ^2 m\; \exp {(-12\gamma _2m)} \end{aligned}$$(17)
where \( c_1, c_2, \gamma _2\) are constants depending on the basis only.

As previously, all the risk bounds involve a squared bias term, \(\Vert g-g_m\Vert _T^2\) or \(\Vert g-g_m\Vert ^2\). The variance term in (15) can be compared to the one in (9), taking into account that \(G(T)\le \Vert a\Vert ^2<+\infty \): the order is now \(m^2/(NT)\), which for fixed T is larger than the order m/N obtained for \({\hat{g}}_m\) with the same basis. Similarly, the variance term in (17) has order \(m^3/N\), which is larger than \(m^2/N\) in (13). This increase is the price of more general assumptions and estimators. As for (16), it is to be compared with (11): the variance order is m/N and there are two additional terms, T/N and \(\int _T^{\infty } g^2(s)ds\), which are difficult to discuss. We develop a data-driven selection method in Sect. 4, based on (15)–(16), which is implemented on simulated data.
3 Rates of convergence
Rates of convergence can be deduced from Propositions 1 and 3 in the asymptotic framework where N tends to infinity. As is always the case in nonparametric estimation, we must link the bias term \(\Vert g-g_m\Vert ^2\) with regularity properties of the function g, and the regularity spaces depend on the projection spaces.
3.1 Rates on periodic Fourier–Sobolev spaces for trigonometric basis
Consider first Inequality (9) and estimators built using the trigonometric basis. Let \(\beta \) be a positive integer, \(L>0\) and define
By Proposition 1.14 of Tsybakov (2009), a function \(f\in W^{per}(\beta , L)\) admits a development \(f=\sum _{j=0}^{\infty } \theta _j\varphi _{j,T}\) such that \(\sum _{j\ge 0} \theta _j^2 \tau _j^2\le C(L,T)\) where \(\tau _j=j^\beta \) for even j, \(\tau _j=(j-1)^\beta \) for odd j and \(C(L,T)=L^2(T/\pi )^{2\beta }\).
Therefore, consider the sets
Now, assume that \(g\in {{\mathcal {W}}}_1^{per}\). As \(g\in W^{per}(\beta , L)\), we have \(\Vert g-g_m\Vert ^2\le C(L,T)m^{-2\beta }\) and Inequality (9) becomes
As \(g(0)\ne 0\), choosing \(m_{\mathrm{opt}}=c_TN^{1/(2\beta +1)}\) yields, for fixed T,
Thus, for fixed (not large) T, the estimator \({\hat{g}}_{m_{\mathrm{opt}}}\) is convergent in MISE when N grows to infinity, with rate \(N^{-2\beta /(2\beta +1)} \) and
If \(g\in {{\mathcal {W}}}_2^{per}\), then \(g(0)=0\), and choosing m as large as possible we can obtain the rate \(N ^{-1}\) for fixed T.
On the other hand, if \(g\in {{\mathcal {W}}}_3^{per}\), then we must consider the estimator \({{\tilde{g}}}_m\). As \(g\in W^{per}(\beta , L)\), Inequality (15) yields, for a choice \({{\tilde{m}}}_{\mathrm{opt}}={{\tilde{c}}}_T \, N^{1/(2\beta +2)}\) a rate for \({{\tilde{g}}}_{{{\tilde{m}}}_{\mathrm{opt}}}\) of order \(N^{-2\beta /(2\beta +2)}\) that is
Clearly \(N^{-2\beta /(2\beta +2)} >N^{-2\beta /(2\beta +1)}\): the rate is slower than on \({{\mathcal {W}}}_1^{per}\), but the constraints are different. All the constants in the rates depend on \(\beta \) and L.
3.2 Rates on Sobolev–Laguerre spaces
Now, look at inequality (13) where \({\hat{g}}_m\) is computed using the (non compactly supported) Laguerre basis. Assume for consistency that \(m^2\lesssim N\) and \(m\le T/6\). The last term is negligible with respect to the variance term \(m^2/N\) and the usual square bias term \(\Vert g-g_m\Vert ^2\). An adequate solution to assess the rate of the bias term is provided by the balls of Sobolev-Laguerre spaces. For \(s\ge 0\), let
where \(\theta _k(h)=\int _0^{+\infty } h(u)\varphi _k(u)du\). We set
for the Sobolev-Laguerre space. The link with regularity properties of functions can be seen for s integer. In this case, if \(h:(0, +\infty ) \rightarrow {{\mathbb {R}}}\) belongs to \(L^{2}((0,+\infty ))\),
is equivalent to the property that h admits derivatives up to order \(s-1\), with \(h^{(s-1)}\) absolutely continuous on \((0,+\infty )\) and for \(m=0, \ldots , s-1\), the functions
belong to \({{\mathbb {L}}}^2((0, +\infty ))\). Moreover, for \(m=0,1, \dots , s-1\),
(see Comte and Genon-Catalot 2018).
Now, consider the classes of functions \( {{\mathcal {W}}}_1=\{ g, \; g\in W^s((0, +\infty ),K), g \text{ satisfies } \text{[H0] } \text{ and } \text{[H1] }\}, \) and \( {{\mathcal {W}}}_2=\{ g, \; g\in W^s((0, +\infty ),K), g \text{ satisfies } \text{[H0] } \text{ but } \text{ not } \text{[H1] }\}.\)
Assume that \(g\in {{\mathcal {W}}}_1\). Then, as g belongs to \(W^s((0, +\infty ),K)\), it holds \(\Vert g-g_m\Vert ^2\le K m^{-s}\). Considering Inequality (13), the minimization of \(m^{-s}+m^2/N\) yields \(m_{opt}= N^{1/(2+s)}\) and a rate of order \(N^{-s/(2+s)}\) for the \({{\mathbb {L}}}^2\)-risk of \({\hat{g}}_m\) on the set \({{\mathcal {W}}}_1\).
The constraint \(m_{opt}= N^{1/(2+s)}\le T/6\) holds for all s as soon as \(T\ge \sqrt{N}\).
Assume that \(g\in {{\mathcal {W}}}_2\). The rate of convergence for the \({{\mathbb {L}}}^2\)-risk must be discussed for \({{\tilde{g}}}_m\), and relies on Inequality (17). Assume that \(m^3\lesssim N\). As g belongs to \(W^s((0, +\infty ),K)\), we still have \(\Vert g-g_m\Vert ^2\le Km^{-s}\). By minimizing \((m^3/N)+m^{-s}\), we find \(m_{opt}= N^{1/(s+3)}\) and a rate of order \(N^{-s/(s+3)}\). The constraint \(T>6m_{opt}\) holds for all s as soon as \(T\ge N^{1/3}\). To sum up this case, for \(T\ge N^{1/3}\),
All the constants in the rates depend on s and K.
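The bias-variance compromise behind \(m_{opt}\) can be checked by direct computation. The following sketch, with illustrative values \(s=2\) and \(N=10^6\) (our choice), minimizes the bound \(m^{-s}+m^3/N\) of (17) over integer dimensions and compares the argmin to \(N^{1/(s+3)}\):

```python
# Hedged sketch: numerical bias-variance compromise for the bound of (17),
# with illustrative values s = 2 and N = 10**6.
import numpy as np

N, s = 10**6, 2
m = np.arange(1, 200)
bound = m**(-float(s)) + m**3 / N       # squared-bias bound + variance order
m_opt = int(m[np.argmin(bound)])
print(m_opt, N**(1 / (s + 3)))          # the argmin is of order N^{1/(s+3)}
```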
Inequalities (11) and (16) are appealing: the variance terms are smaller and they require fewer conditions. However, they contain a term \(\int _T^{+\infty }g^2(s)ds\): this term is hopefully small for large (not too small) T, but rates of convergence are difficult to discuss. Nevertheless, our model selection procedures rely on these inequalities because the constants g(0)G(T) and \(G^2(T)\) are known in theory and possible to estimate in practice.
Example 1
(continued). Consider the function \(a(t)=t^d\exp {(-t)}\) with \(-1/4<d<1/2\), a case where a(0) may not be defined and \(a'\) is not locally square integrable. Then, g belongs to \(W^1((0,+\infty ))\) if, moreover, \(\sqrt{t}(a^2(t)+2a(t)a'(t)) \in {{\mathbb {L}}}^2((0,+\infty ))\), which holds for \(0<d<1/2\). But for these values of d, we can check that g does not belong to \(W^2((0,+\infty ))\), as \(t(a^2(t)+ 2(a')^2(t)+ (a^2)''(t))\) does not belong to \({{\mathbb {L}}}^2((0,+\infty ))\). Therefore, the bias term for such a function is of order smaller than \(m^{-1}\) but larger than \(m^{-2}\), for \(0<d<1/2\).
4 Adaptive procedure under [H0]
As the second estimator can be computed under more general assumptions, we concentrate on this one for finding a data-driven choice of the projection dimension.
The estimator \({{\tilde{g}}}_m\) can be obtained as:
for \((B)=(Lag)\) or \((B)=(Trig)\), and where
We consider the sets \({{\mathcal {M}}}_N^{(Lag)}=\{ m\in {{\mathbb {N}}}, m \le N/\log (T)\}\) and \({{\mathcal {M}}}_N^{(Trig)}=\{ m\in {{\mathbb {N}}}, m^2 \le N\}\). By inequalities (15)–(16), the variance term in the \({{\mathbb {L}}}^2\)-risk of all \({{{\tilde{g}}}}_m\) with \(m \in {{\mathcal {M}}}_N^{(B)}\) is bounded, where the superscript (B) indicates the basis: \((B)=(Trig)\) for the trigonometric basis and \((B)=(Lag)\) for the Laguerre basis. Now, we define, for \(\kappa \) a numerical constant,
where
Note that \(\gamma _{N,T}({{\tilde{g}}}_m)=-\Vert {{\tilde{g}}}_m\Vert ^2\). Thus, as \(\Vert g-g_m\Vert ^2=\Vert g\Vert ^2 -\Vert g_m\Vert ^2\), \(-\Vert {{\tilde{g}}}_m\Vert ^2\) provides an estimation of the squared bias, up to a constant. On the other hand, \(\mathrm{pen}^{(B)}(m)\) has the variance order, up to the \(\log (N)\) factor. We do not know if this factor is structural or only due to technical problems in the proofs. In any case, the choice of \({{\widetilde{m}}}^{(B)}\) mimics the squared bias-variance compromise. The following risk bound holds.
Theorem 1
Assume [H0] and [H2]. Then, there exists a numerical value \(\kappa _0^{(B)}>0\) such that \(\forall \kappa \ge \kappa _0^{(B)}\),
and
where C is a numerical constant.
The term \(G^2(T)\) in the definition of \(\mathrm{pen}^{(Trig)}(m)\) is unknown and must be replaced by an estimator. In practical implementation, we set
Indeed, \({{\mathbb {E}}}(X_1^4(T)) = 3G^2(T)\). From a theoretical point of view, it can be proved that the result of Theorem 1 still holds for the trigonometric basis with this substitution; see Section 4.1.4, Proof of Theorem 4.1, in Comte and Genon-Catalot (2015).
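The substitution relies on the Gaussian moment identity \({{\mathbb {E}}}(X_1^4(T)) = 3G^2(T)\); a minimal sketch (ours, with an illustrative value of G(T)):

```python
# Hedged sketch: estimating G^2(T) from the fourth empirical moment of the
# X_i(T), which are N(0, G(T)); G_T = 0.5 is an illustrative value.
import numpy as np

rng = np.random.default_rng(3)
G_T = 0.5
X_T = rng.normal(0.0, np.sqrt(G_T), 5000)   # X_i(T) ~ N(0, G(T)), i = 1..N

# E[X^4] = 3 G(T)^2 for a centered Gaussian, so the fourth empirical moment
# divided by 3 estimates G^2(T)
G2_hat = np.mean(X_T**4) / 3.0
print(G2_hat)   # close to G_T**2 = 0.25
```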
For the implementation of the procedure, we have to fix the constants \(\kappa \) in the penalties (see (22)). The numerical values of \(\kappa _0^{(B)}\) given in the proofs are too large, and finding the minimal value of \(\kappa \) is a difficult problem. This is why the choice of \(\kappa \) in the penalties is, as is standard, calibrated by preliminary simulations.
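As a purely illustrative sketch of the selection rule (ours, on synthetic coefficients): the criterion \(-\Vert {{\tilde{g}}}_m\Vert ^2+\mathrm{pen}(m)\) is minimized over dimensions, with an assumed penalty of the trigonometric variance order \(\kappa \, \widehat{G^2(T)}\, m^2\log (N)/(NT)\). This penalty shape is our reading of the variance order in (15), and all numerical values below are illustrative, not the paper's.

```python
# Hedged sketch of penalized dimension selection on synthetic coefficients;
# the penalty shape and constants are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(2)
N, T, kappa, G2_hat = 2000, 10.0, 0.6, 0.25
J = 31
# synthetic estimated coefficients: a decaying signal plus sampling noise
theta_t = 1.0 / np.arange(1, J + 1)**2 + rng.normal(0.0, np.sqrt(1.0 / N), J)

def crit(m):
    # gamma_{N,T}(tilde g_m) = -||tilde g_m||^2, plus the penalty term
    pen = kappa * G2_hat * m**2 * np.log(N) / (N * T)
    return -np.sum(theta_t[:m]**2) + pen

dims = np.arange(1, J + 1, 2)        # odd dimensions (trigonometric spaces)
m_sel = int(min(dims, key=crit))
print(m_sel)                          # a moderate dimension is selected
```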
Theorem 1 shows that the estimator \({{\tilde{g}}}_{{{{\widetilde{m}}}}^{(B)}}\) is adaptive in the sense that its \({{\mathbb {L}}}^2\)-risk automatically achieves the best compromise between squared bias and variance terms, up to remainder terms \(C^{(B)}(T,N)\). The term \(C^{(Trig)}(T,N)\) is clearly negligible, as \(T>1\) is fixed. As already noticed, the term \(C^{(Lag)}(T,N)\) contains T/N and \(\int _T^{+\infty } g^2(s)ds\), which are in conflict: T should be large enough for the latter, but not too large for the former. However, our risk bounds are valid for any T, N.
Another strategy is possible for the Laguerre basis, without [H2], which solves the conflict mentioned above. Let \({{\mathcal {M}}}_N^\star =\{ m, m^3\le N\}\), so that, by inequality (17), the variance term of \({{\tilde{g}}}_m\) is bounded, and define
Theorem 2
Assume [H0]. Consider the Laguerre basis, and \(T\ge 6 N^{1/3}\). Then, there exists a numerical value \(\kappa _0^\star >0\) such that \(\forall \kappa \ge \kappa _0^\star \),
where C is a constant depending on the basis.
Theorem 2 also shows that the estimator \({{\tilde{g}}}_{m^\star }\) is adaptive in the sense that its \({{\mathbb {L}}}^2\)-risk automatically achieves the best compromise between the squared bias and the variance term of inequality (17). The comments after Theorem 1 apply also here.
5 Simulation study
In this section, we implement the adaptive estimators of the previous sections on simulated data. To simulate an exact discrete sampling of \((X_i(t))\), \(i=1, \ldots ,N\), with small sampling interval \(\Delta \), we use the property that the vectors \((X_i(k\Delta ), k=1, \ldots ,n)'\) with \(T=n\Delta \) are i.i.d. centered Gaussian vectors with covariance matrix \(A=(A_{j,k})\) where, for \(1\le j\le k\),
$$\begin{aligned} A_{j,k}= {{\mathbb {E}}}(X(j\Delta )X(k\Delta ))=\int _0^{j\Delta } a(j\Delta -u)a(k\Delta -u)du, \end{aligned}$$
which can be computed exactly or numerically according to the examples. Integrals in the estimator formulae are discretized. The following examples of functions a(.) and thus g(.) are considered.
- (1) (Ornstein-Uhlenbeck process) \(a_1(t)= \sigma \exp {(-\theta t)}\),
$$\begin{aligned} A(j,k)= \frac{\sigma ^2\exp {(-\theta k\Delta )}}{2\theta }(\exp {(\theta j\Delta )}-\exp {(-\theta j\Delta )}), \; k\ge j. \end{aligned}$$
We take \(\sigma =0.5, \theta =0.25\).
- (2) \(a_2(t)= (\beta (3,3,t/10)/\omega _2^{1/2})^{1/2}\) where \(\beta (p,q,x)\) is the density of a \(\beta (p,q)\) distribution at point x and \(\omega _2= 14.157\) is such that \(\int _{{{\mathbb {R}}}^+} g_2^2(u)du \approx 1\).
- (3) \(a_3(t)=(\frac{1}{2} \beta (3,3, t/3)+ \frac{1}{2} \beta (3,3,t/3-2))^{1/2}\).
- (4) \(a_4(t)= 10 b(6t)/(\omega _4)^{0.25}\) with \(b(t) =0.3 \Gamma (3,2, t)+0.7 \Gamma (7,4,t)\) where \(\Gamma (p,q,x)\) is the density of a \(\Gamma (p,q)\) distribution at point x and \(\omega _4= 0.03048\) is such that \(\int _{{{\mathbb {R}}}^+} g_4^2(u)du \approx 1\).
- (5) \(a_5(t)= t^{1.25} e^{-t/2}\).
- (6) \(a_6(t)= t^{0.25} e^{-t/3}\).
- (7) \(a_7(t)= t^{-0.125} e^{-t/5}\).
- (8) \(a_8(t)=1/\sqrt{1+t^2}\).
In all cases, recall that \(g_i(t)= a_i^2(t)\). The functions \(a_2\) and \(a_4\) are normalized (constants \(\omega _2, \omega _4\)) so that \(\int g_i^2(u)du\approx 1\), \(i=2,4\), while for the other functions this integral falls between 0.5 and 2.5. In Table 1, we compute the values of the residual terms of formula (16): the values of \(\int _T^{+\infty } g_i^2(u)du\) are always negligible, but the values of \(TG_i^2(T)/N\) are comparable to the risk values obtained in Table 2, and thus not so small.
All functions \(g_i\), \(i=1, \dots , 8\), satisfy [H0]. The functions \(g_2\) to \(g_6\) are null at zero; \(a_6\) and \(a_7\) do not satisfy [H1]. Thus, the first method (valid under [H0]-[H1]) should work for all functions except \(g_6\) and \(g_7\), with parametric rate (and large selected dimension) for \(g_2\) to \(g_5\). Nevertheless, we implemented both methods for all functions. Note that all functions satisfy [H2].
We also experiment with different settings for (N, T): \(T=n\Delta =10\), \(n=400\), \(\Delta =0.025\), with \(N=500, 2000, 8000\).
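The exact sampling scheme described above can be sketched as follows for example (1) (our code, using the stated covariance written symmetrically and a Cholesky factorization; the final check compares the empirical second moment at T with the closed-form \(G(T)\)):

```python
# Hedged sketch of the exact discrete sampling scheme of Sect. 5 for the
# OU example (1): i.i.d. centered Gaussian vectors with covariance A.
import numpy as np

rng = np.random.default_rng(4)
sigma, th = 0.5, 0.25
T, n, N = 10.0, 400, 2000
dt = T / n
tk = dt * np.arange(1, n + 1)

# covariance A(j,k) of example (1), written symmetrically in (j,k)
tj, tl = tk[:, None], tk[None, :]
A = sigma**2 / (2*th) * (np.exp(-th*np.abs(tj - tl)) - np.exp(-th*(tj + tl)))

# exact sampling: rows of X are i.i.d. N(0, A) vectors (X_i(dt),...,X_i(T))
X = rng.standard_normal((N, n)) @ np.linalg.cholesky(A + 1e-12*np.eye(n)).T

GT = sigma**2 * (1 - np.exp(-2*th*T)) / (2*th)   # G(T) = int_0^T g(u) du
print(np.mean(X[:, -1]**2), GT)                  # empirical vs exact moment
```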
The estimators are computed via the formulae given in Sects. 2.2 and 2.3.
More precisely, inspired by Inequalities (9) and (11), we implement a data-driven estimator relying on \({\hat{g}}_m\) given in Sect. 2.2, with dimension selected as follows: for \((B)= (Lag),\, (Trig)\),
Note that no theoretical result is given in this case. We compute \(({\hat{g}}_m^{(Trig)})_{1\le m\le D_{\max }}\) and \(({\hat{g}}_m^{(Lag)})_{1\le m\le D_{\max }}\) the collections of estimators respectively in trigonometric and Laguerre basis, with coefficients given by (7), with \(D_{\max }=45\). Note that the first term in the curly bracket estimates the squared bias and the second estimates the main variance term. Moreover \({\widehat{G}}(T)=(1/N)\sum _{i=1}^N X_i^2(T)\) and \(g^\dag (0)\) is computed using the quadratic variation (see Remark 2)
Next, we implement the estimators of Theorem 1. We compute \(({{\tilde{g}}}_m^{(Trig)})_{1\le m\le D_{\max }}\) and \(({{\tilde{g}}}_m^{(Lag)})_{1\le m\le D_{\max }}\) the collection of estimators in trigonometric and Laguerre basis, with coefficients given by (14). We select
and
where \(\widehat{G^2(T)}\) is defined by (23).
We do not present results using the procedure of Theorem 2, as the method seemed unstable.
Based on preliminary simulations, the constants are calibrated once and for all to the following values \(\kappa _{1}^{(Lag)}=27\), \(\kappa _{1}^{(Trig)}=6\), \(\kappa _{2}^{(Lag)}=0.11\) and \(\kappa _{2}^{(Trig)}= 0.6\).
Table 2 presents the values of the risks of the adaptive estimators computed for the eight functions, following the two methods (method 1: estimators \({\hat{g}}\); method 2: estimators \({{\tilde{g}}}\)) and using two bases, Laguerre (index L) and trigonometric (index T). For each function, the first line gives the MISE multiplied by 100, over 200 repetitions, with standard deviation multiplied by 100 in parentheses on the line below. The line “Or” gives the mean of the path-by-path minimal integrated error (computed using the true function). The fourth line provides the mean of selected dimensions, and “dim Or” the mean of the dimensions associated to the oracle estimators. We can compare lines 1 (MISE) and 3 (Or), and lines 4 (dim) and 5 (dim Or), where MISE and dim should be as close as possible to Or and dim Or.
Naturally, the risk decreases as N increases. Globally, the Laguerre basis performs satisfactorily, and better than the trigonometric one, except for function \(a_2\). Note that the methods are easy to implement and the computations are fast.
To conclude this section, we provide in Fig. 1 plots illustrating the behaviour of our estimators following the two strategies in the Laguerre basis, for three of the functions of the list, namely the mixed-beta function 3) and two functions of type \(t^d e^{-t/b}\), with \(d=0.25, b=3\) (function 6)) and \(d=-0.25, b=5\) (function 7)). Each couple of plots shows 50 estimators computed by the two methods, together with the true function in bold (red). Two values of N are compared, and the MISEs are given, to make the orders of Table 2 concrete; the improvement from \(N=500\) to \(N=8000\) is obvious in most cases. We note that the first method still seems to work for function 6), contrary to what was expected from the theory. But it fails for function 7), as expected: the estimator is biased. Method 2 always gives good results.
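The paths used in such a simulation study can be generated by discretising the Wiener integral in (1). Below is a minimal sketch, assuming a left-endpoint Riemann approximation on a regular grid; the name `simulate_cma`, the grid sizes and the kernel shown (type 6), \(a(t)=t^{0.25}e^{-t/3}\)) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def simulate_cma(a, T, n_steps, rng):
    """Approximate X(t) = int_0^t a(t-s) dW(s) on a regular grid:
    X(t_k) ~ sum_{l<k} a(t_k - s_l) * dW_l, with s_l = l*dt the left
    endpoints, so the kernel is never evaluated at 0 (where it may blow up)."""
    dt = T / n_steps
    dW = rng.standard_normal(n_steps) * np.sqrt(dt)   # Brownian increments
    X = np.zeros(n_steps + 1)
    for k in range(1, n_steps + 1):
        s = np.arange(k) * dt
        X[k] = np.sum(a(k * dt - s) * dW[:k])
    return np.arange(n_steps + 1) * dt, X

rng = np.random.default_rng(1)
a = lambda t: t ** 0.25 * np.exp(-t / 3.0)            # kernel of type 6)
t, X = simulate_cma(a, T=5.0, n_steps=500, rng=rng)
```

A quick sanity check is that the empirical second moment of \(X(T)\) over many paths approaches \(G(T)=\int _0^T a^2(s)ds\), the quantity estimated by \({\widehat{G}}(T)\) above.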
6 Concluding remarks
In this paper, we consider i.i.d. continuous observations of the processes \((X_i(t), t\in [0,T])\), \(i=1, \ldots , N\), distributed as the CMA process (1). We build collections of nonparametric estimators of the unknown function \(g=a^2\) by a projection method on finite-dimensional subspaces of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\). The subspaces are generated by the trigonometric basis of \({{\mathbb {L}}}^2([0,T])\) or the Laguerre basis of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\). After proving various risk bounds for each estimator, we propose a data-driven selection of the dimension of the projection space and prove that it leads to an adaptive estimator. Our methods are implemented on simulated data and show convincing results in terms of risks and plots, with a better performance for the estimators in the Laguerre basis.
The consistency of the estimators is ensured for fixed T as N tends to infinity (case of the trigonometric basis) or when both T and N tend to infinity (case of the Laguerre basis), but with T/N not too large. It would be interesting to clarify this point, which has an impact on the risk bounds, as we noticed in the Monte Carlo simulations.
Our proofs rely on the Gaussian character of (1), especially for the adaptive procedure. The generalization to driving processes other than the Wiener process in (1) is of interest and left for further work. Clearly, the results could be obtained with more general deviation inequalities than the \(\chi ^2\) deviations specifically used here.
The question of taking into account, from the theoretical point of view, the discretization step used in practice may also be worth investigating. Lastly, developments about optimality are possible, but the meaning of optimality in our context would first have to be carefully defined.
7 Proofs
7.1 Proofs of Section 2
Proof of Lemma 1
Using the stochastic differential (6), we write:
Therefore, as \({{\mathbb {E}}}\int _0^{+\infty } \varphi _j^2(s)X_i^2(s)ds =\int _0^{+\infty } \varphi _j^2(s)G(s)ds\le \Vert a\Vert ^2 <+\infty \),
which gives the result. \(\square \)
Proof of Proposition 1
We consider that \((\varphi _j)=(\varphi _{j,T})\) is the trigonometric basis on [0, T]. In this case, \({{\mathbb {E}}}{\hat{\theta }}_j=\theta _j\), so we can write \({{\mathbb {E}}}\Vert {\hat{g}}_m-g\Vert _T^2= {{\mathbb {E}}}\Vert {\hat{g}}_m - {{\mathbb {E}}} {\hat{g}}_m\Vert ^2+ \Vert g_m-g\Vert _T^2.\) We have, setting \(X=X_1\),
(Note that for functions in \(S_{m,T}\), the norms \(\Vert .\Vert _T\) and \(\Vert .\Vert \) coincide.) We have:
where \(Y(s)= \int _0^sa'(s-u)dW(u)\). We have \({{\mathbb {E}}}(X^2(s))=G(s)\le G(T)\le \Vert a\Vert ^2\). Now, using that \((\varphi _j)=(\varphi _{j,T})\) is an orthonormal basis of \({{\mathbb {L}}}^2([0,T])\)
As (X(s), Y(s)) is a Gaussian centered vector, we know that:
with \(G_1(s)=\int _0^s (a'(u))^2du\le \Vert a'\Vert ^2\). Therefore, if \(g(0)\ne 0\)
Thus, we obtain (9).
If \(g(0)=0\), (26) becomes
and thus
which gives inequality (10). \(\square \)
Proof of Proposition 2
Now, we look at the case of a basis of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\). The estimator \({\hat{\theta }}_j\) is no longer unbiased. We write \({\hat{g}}_m-g= {\hat{g}}_m - {{\mathbb {E}}} {\hat{g}}_m + {{\mathbb {E}}} {\hat{g}}_m-g_m + g_m-g\) and
The first term is the usual square bias term. The middle term is a variance term which can be treated as in the previous proposition. The last term is an additional bias term, due to the truncation of the integrals. We have:
and we obtain inequalities (11) and (12).
Now, consider the Laguerre basis. To get (13), we bound differently (26). We write:
By (27) and the assumptions, \({{\mathbb {E}}}(X(s)Y(s))^2\le \frac{1}{2}(g^2(0)+ \Vert g\Vert _\infty ^2)+ \Vert a\Vert ^2\Vert a'\Vert ^2\) is bounded. Therefore, we need to bound \(\int _0^T |\varphi _j(s)|ds\). For this, we split each integral according to the inequalities of Askey and Wainger (1965) recalled in Sect. 8 (we assume without loss of generality that they hold for all j). We have:
and bound each term. Setting \(\nu _j=4j+2\),
Consequently, for \(j=0, \ldots ,m-1\) and \(T\ge 6(m-1)+3=6m-3\),
Finally,
Using again the inequalities of Askey and Wainger (1965) (see Sect. 8), for \(T\ge 6m-3\), for all \(j \in \{0, 1, \ldots , m-1\}\), \(|\varphi _j(x/2) |\le \exp {(-\gamma _2 x)}\) and
The additional bias term (29) is therefore bounded as follows:
We thus obtain (13) by joining (30), (32) and (33).\(\square \)
Proof of Lemma 2
We have
which is the result. \(\square \)
Proof of Proposition 3
Assume that \((\varphi _j=\varphi _{j,T})\) is the trigonometric basis. Then, \({\tilde{\theta }}_j\) is an unbiased estimator of \(\theta _j\). We only need to study the variance term of the risk.
where \({{\mathbb {E}}}X^4(T)= 3\left( \int _0^Ta^2(s)ds\right) ^2=3G^2(T)\) and \(\sum _{j=0}^{m-1}\varphi _j^2(T)=m/T\).
We have
Proceeding as in Proposition 1 (projection argument), we obtain
using that \({{\mathbb {E}}}X^4(s)= 3\left( \int _0^s \, a^2(u)du\right) ^2\le 3 G^2(T)\) for \(s\le T\). This gives (15).
Now, we look at the case of the Laguerre basis on \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\). We start as above by
We get
Using that \({{\mathbb {E}}}X^4(T)= 3\left( \int _0^Ta^2(s)ds\right) ^2= 3 G^2(T)\) and \(|\ell _j|\le \sqrt{2}\), we get
Next, we use that the Laguerre basis satisfies \(\ell '_0(x)=-\ell _0(x)\) and \(\ell '_j(x)= -\ell _j(x) - \sqrt{2j/x} \ell _{j-1}^{(1)}(x)\) for \(j\ge 1\) where \((\ell _{k}^{(1)}(x), k\ge 0)\) is the Laguerre basis with index 1 (see Sect. 8) and we find
Under [H2], we obtain
Finally, the variance term is bounded by
If [H2] does not hold and \(T\ge 6m-3\), we can bound the variance and bias terms differently.
We decompose the integral to obtain
and bound each term. Using (47) and again the inequalities of Askey and Wainger (1965) (see Sect. 8), we get that, for \(s\ge 6m-3\), \(|\varphi '_j(s)|\le 2\sum _{k=0}^{j} |\ell _k(s)|\le 2(j+1)\exp {(-\gamma _2s)}\). Thus, \(\sum _{j=0}^{m-1}(\ell _j'(s))^2\le 4m^3 \exp {(-2\gamma _2s)}\). So,
Now,
Finally, we get
So, we have the two variance bounds.
For the bias term, we have \({{\mathbb {E}}} {\tilde{\theta }}_j= \theta _j -\ell _j(T)G(T)- \int _T^{+\infty } \ell _j(s)g(s)ds\). We have \(G(T)\le G(+\infty )= \Vert a\Vert ^2\). Moreover, inequality (33) still holds. Joining variance and bias terms, we obtain (16) and (17). \(\square \)
7.2 Proof of Theorem 1
Let us state a preliminary lemma:
Lemma 3
Let \(V_N=\sum _{i=1}^N (X_i^2-1)\) where \(X_i\) are i.i.d. standard Gaussian variables. Then for all \(\varepsilon \in (0,1]\),
Proof of Lemma 3
By Lemma 1 and Inequalities (4.3)–(4.4) in Laurent and Massart (2000), we have, for any \(u>0\),
Thus, setting \(u=Nx\), we have, for any \(x>0\), \({{\mathbb {P}}}(|V_N|\ge 2N\sqrt{x} + 2 Nx)\le 2\exp (-Nx)\). Now we set \(N\varepsilon = 2N(x+\sqrt{x})\) and using Birgé and Massart (1998), Lemma 8, Inequality (7.14) with \(v=\sqrt{2}\) and \(c=2\), we find
and the result follows. \(\square \)
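The deviation bound of Lemma 3 is easy to probe numerically. The sketch below assumes the bound takes the standard Laurent–Massart-type form \({{\mathbb {P}}}(|V_N|\ge N\varepsilon )\le 2\exp (-N\varepsilon ^2/8)\) for \(\varepsilon \in (0,1]\) (an assumption here: the exact constant is what the displayed inequality of Lemma 3 specifies); the Monte Carlo tail should then sit below it.

```python
import numpy as np

# Monte Carlo check: V_N = sum_{i<=N} (X_i^2 - 1) with X_i ~ N(0,1),
# so V_N + N is chi-square distributed with N degrees of freedom.
rng = np.random.default_rng(0)
N, eps, n_rep = 100, 0.5, 20_000
V = rng.chisquare(N, size=n_rep) - N
empirical = np.mean(np.abs(V) >= N * eps)      # empirical tail probability
bound = 2.0 * np.exp(-N * eps ** 2 / 8.0)      # assumed form of the bound
```

With these values the empirical tail is far below the bound, which is expected: the exponential inequality is not tight in the moderate-deviation range.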
7.2.1 Case of Laguerre basis
Note that, as \(G(0)=0\) and \(h(+\infty )=0\), \(\langle h,g \rangle = - \langle h',G\rangle \). Therefore,
where \(\nu _{N,T}(h)=\nu _{N,1}(h)+ \nu _{N,2}(h)\),
and
Therefore,
Using the definition of \({{\tilde{m}}} ={{\tilde{m}}}^{(Lag)}\), we have for all \(g_m\in S_m\),
where for simplicity \(\mathrm{pen}=\mathrm{pen}^{(Lag)}\). We deduce
Let \(B_m= \{h \in S_m, \Vert h\Vert \le 1\}\). We use that
We have \(\sup _{h\in B_m} R^2_T(h)\le \int _T^{+\infty } g^2(u)du\) so that
Gathering terms yields
where \(p^{(Lag)}(m,m')=p_1^{(Lag)}(m,m')+p_2^{(Lag)}(m,m')\),
Now we use that \(8p^{(Lag)}(m,m')\le \mathrm{pen}(m)+ \mathrm{pen}(m')\) for \(\kappa \ge \kappa _0^{(Lag)}=8\times 128\), and the result of the following Lemma:
Lemma 4
Under the Assumptions of Theorem 1, for \(\ell =1,2\),
where
and C is a positive numerical constant.
And we obtain
which ends the proof of Theorem 1 in the Laguerre case. \(\square \)
Proof of Lemma 4
Let us define
which is for all u distributed as \((\chi ^2(N)-N)/N\), and set
By Lemma 3, \({{\mathbb {P}}}(A_N(u)^c)\le 2N^{-2}\) provided that \(16\log (N)/N\le 1\) i.e. \(N\ge 68\).
Now we can write \(\nu _{N,1}(h)= - \int _0^T G(u)h'(u)Z_N(u)du\) and split it
Then
With \(B(u):= G(u)Z_N(u)\mathbf{1}_{A_N(u)}\), and by using Formula (46), we have
Now, using the definition of \(A_N(u)\),
As a consequence
Similarly, for \(C(u):= G(u)Z_N(u)\mathbf{1}_{A_N(u)^c}\), we have
Now, by the Rosenthal Inequality (see Hall and Heyde 1980), \({{\mathbb {E}}}(Z_N^4(u))\lesssim N^{-2}\) and thus
As a consequence,
Gathering (41) and (43) implies the result of Lemma 4 for \(\ell =1\) and \((B)=(Lag)\). Now we look at \(\nu _{N,2}(h)\) and write
Therefore
and using (42),
Finally
where C is a numerical constant. Thus, we obtain Lemma 4 for \(\ell =2\) and \((B)=(Lag)\).
\(\square \)
7.2.2 Case of trigonometric basis
We proceed analogously. As now \(R_T(h)=0\), we have, with for simplicity, \({\widetilde{m}}={\widetilde{m}}^{(Trig)}\),
with \(p^{(Trig)}(m,m')=p_1^{(Trig)}(m,m')+p_2^{(Trig)}(m,m')\) and we find
In the same way as Lemma 4, the following lemma determines \(p_1^{(Trig)}(m,m'),p_2^{(Trig)}(m,m')\).
Lemma 5
Under the Assumptions of Theorem 1, for \(\ell =1,2\),
where
and C is a positive numerical constant.
To conclude the proof in the trigonometric basis case, analogously, we use that \(8p^{(Trig)}(m,m')\le \mathrm{pen}^{(Trig)}(m)+ \mathrm{pen}^{(Trig)}(m')\) for \(\kappa \ge \kappa _0^{(Trig)}=8\times 16(8\pi ^2+1)\) and obtain
\(\square \)
Proof of Lemma 5
Using the properties of the derivatives \(\varphi '_{j,T}\) (see (34)) and the definition of \(A_N(u)\), the bound for \(\nu _{N,1,1}^2(h)\) now writes
And, using the definition of \({{\mathcal {M}}}_N^{(Trig)}\) and (42),
The other term is
This implies Lemma 5. \(\square \)
7.3 Proof of Theorem 2
The proof follows the same steps as that of Theorem 1. We only indicate the changes. Here, we have, proceeding as in Proposition 3:
Analogously, for the term with C(u), using that the maximal value in \({{\mathcal {M}}}_N^\star \) is bounded by \(N^{1/3}\),
Thus,
The study of \(\sup _{h\in B_{{\widetilde{m}} \vee m}}\nu _{N,2}^2(h)\) is the same as previously and we can set
Then,
We set \(p^\star (m,m')=p_1^\star (m,m')+p_2^\star (m,m')\) and check that \(8p^\star (m,m')\le \mathrm{pen}^\star (m)+ \mathrm{pen}^\star (m')\) for \(\kappa \ge \kappa _0^\star = 8\times (16(12 + 4\gamma _2^{-2})+32)\).
Lastly, we have from the proof of Inequality (17) in Proposition 3 that, for \(T\ge 6m-3\),
Therefore, \(\sup _{h\in S_{M_N}, \Vert h\Vert \le 1}R^2_T(h)\lesssim \frac{1}{N}\). \(\square \)
8 Appendix
For this section, we refer to Abramowitz and Stegun (1964) and Comte and Genon-Catalot (2018).
The Laguerre polynomial with index \(\delta \), \(\delta >-1\), and degree k is given by
The following holds:
We consider the Laguerre functions with index \(\delta \), given by
The family \((\ell _k^{(\delta )})_{k\ge 0}\) is an orthonormal basis of \({{\mathbb {L}}}^2({{\mathbb {R}}}^+)\).
For \(\delta =0\), we set \(L_k^{(0)}=L_k\), \(\varphi _k^{(0)}=\ell _k\). Using (44), we obtain for \(j\ge 1\):
The following properties hold for the \(\ell _j\)’s. For all \(x\ge 0\),
Then integrating from x to \(+\infty \) formula (47) for \(j\ge 1\), and setting \(\widetilde{{{\mathcal {L}}}}_j(x)=\int _x^{+\infty } \ell _j(u)du\), we obtain \(\ell _j=\widetilde{{{\mathcal {L}}}}_j+2\sum _{k=0}^{j-1}\widetilde{{{\mathcal {L}}}}_k\). Thus, \(\widetilde{ {{\mathcal {L}}}}_j= \ell _j-\ell _{j-1} -\widetilde{ {{\mathcal {L}}}} _{j-1}\). Using that \(\widetilde{ {{\mathcal {L}}}}_0=\ell _0\), we obtain by elementary induction \(\widetilde{ {{\mathcal {L}}}}_j= \ell _j+2 \sum _{k=1}^j (-1)^k \ell _{j-k}\). Moreover, setting \({{\mathcal {L}}}_j(x)=\int _0^x \ell _j(u)du\), we have
Lastly, the following asymptotic formulae can be found in Askey and Wainger (1965). For \(\nu =4k+2\) and \(k\) large enough,
where \(\gamma _1\) and \(\gamma _2\) are positive and fixed constants.
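As a numerical companion to this appendix, the Laguerre functions \(\ell _k\) can be evaluated from the classical three-term recurrence for the Laguerre polynomials, and the stated properties (orthonormality, \(\ell _k(0)=\sqrt{2}\), the antiderivative formula for \(\widetilde{{{\mathcal {L}}}}_j\)) checked on a grid. This is a sketch with arbitrary grid choices, not production code.

```python
import numpy as np

def laguerre_poly(k, x):
    """Laguerre polynomial L_k via the three-term recurrence
    (j+1) L_{j+1}(x) = (2j+1-x) L_j(x) - j L_{j-1}(x)."""
    Lm, L = np.ones_like(x), 1.0 - x
    if k == 0:
        return Lm
    for j in range(1, k):
        Lm, L = L, ((2 * j + 1 - x) * L - j * Lm) / (j + 1)
    return L

def ell(k, x):
    """Laguerre function ell_k(x) = sqrt(2) L_k(2x) exp(-x), x >= 0."""
    return np.sqrt(2.0) * laguerre_poly(k, 2.0 * x) * np.exp(-x)

# Riemann-sum check of orthonormality on a fine grid
x = np.linspace(0.0, 60.0, 120_001)
dx = x[1] - x[0]
inner = lambda j, k: np.sum(ell(j, x) * ell(k, x)) * dx   # ~ delta_{jk}
```

For instance, `inner(3, 3)` and `inner(3, 5)` come out close to 1 and 0 respectively, and the formula \(\widetilde{{{\mathcal {L}}}}_3(x)=\ell _3(x)-2\ell _2(x)+2\ell _1(x)-2\ell _0(x)\) matches the numerical integral \(\int _x^{+\infty }\ell _3(u)du\).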
References
Abramowitz M, Stegun IA (1964) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York
Askey R, Wainger S (1965) Mean convergence of expansions in Laguerre and Hermite series. Am J Math 87:695–708
Belomestny D, Panov V, Woerner JHC (2019) Low-frequency estimation of continuous-time moving average Lévy processes. Bernoulli 25:902–931
Birgé L, Massart P (1998) Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4:329–375
Brockwell PJ (2001) Continuous-time ARMA processes. Stochastic processes: theory and methods, 249–276. In: Handbook of Statistics, 19, C.R. Rao and D.N. Shanbhag (eds), North-Holland, Amsterdam
Brockwell PJ, Ferrazzano V, Klüppelberg C (2013) High-frequency sampling and kernel estimation for continuous-time moving average processes. J Time Ser Anal 34:385–404
Comte F, Genon-Catalot V (2015) Adaptive Laguerre density estimation for mixed Poisson models. Electron J Stat 9:1113–1149
Comte F, Genon-Catalot V (2018) Laguerre and Hermite bases for inverse problems. J Korean Stat Soc 47:273–296
Comte F, Genon-Catalot V (2015) Adaptive estimation for Lévy processes. Lévy matters IV:77–177, Lecture Notes in Math., 2128, Lévy Matters, Springer, Cham
Comte F, Renault E (1996) Long memory continuous time models. J Econom 73:101–149
Hall P, Heyde CC (1980) Martingale limit theory and its application. Probability and mathematical statistics. Academic Press, New York
Hsiao C (2003) Analysis of panel data, 2nd edn. Cambridge University Press, Cambridge
Laurent B, Massart P (2000) Adaptive estimation of a quadratic functional by model selection. Ann Stat 28:1302–1338
Ramsay JO, Silverman BW (2007) Applied functional data analysis: methods and case studies. Springer, Berlin
Schnurr A, Woerner JHC (2011) Well-balanced Lévy driven Ornstein–Uhlenbeck processes. Stat Risk Model 28:343–357
Tsybakov AB (2009) Introduction to nonparametric estimation. Revised and extended from the 2004 French original, translated by Vladimir Zaiats. Springer Series in Statistics. Springer, New York
Wang J-L, Chiou J-M, Mueller H-G (2015) Review of functional data analysis. arXiv preprint arXiv:1507.05135
Cite this article
Comte, F., Genon-Catalot, V. Nonparametric estimation for i.i.d. Gaussian continuous time moving average models. Stat Inference Stoch Process 24, 149–177 (2021). https://doi.org/10.1007/s11203-020-09228-y
Keywords
- Continuous time moving average
- Gaussian processes
- Model selection
- Nonparametric estimation
- Projection estimators