1 Introduction

Stochastic differential models have been intensively studied in the theoretical literature with either continuous observations (e.g. Kutoyants 2004) or discrete observations, both in the parametric setting (e.g. Genon-Catalot and Jacod 1993) and in the nonparametric setting (e.g. Hoffmann 1999; Comte et al. 2007). More recently, stochastic differential equations with random effects have been introduced with various applications such as neuronal modelling or pharmacokinetics (e.g. Picchini et al. 2008; Delattre and Lavielle 2013; Donnet and Samson 2013). Mixed-effects models are used to analyse repeated measurements with similar functional form but with some variability between experiments (see Davidian and Giltinan 1995; Pinheiro and Bates 2000; Diggle et al. 2002). Their advantage is that a single estimation procedure is used to fit all the data simultaneously.

Estimation methods in stochastic differential models with random effects have been proposed, especially in the parametric framework (e.g. Donnet and Samson 2008, 2014; Donnet et al. 2010; Picchini et al. 2010; Picchini and Ditlevsen 2011; Delattre and Lavielle 2013; Genon-Catalot and Larédo 2016; Delattre et al. 2015). All these parametric methods estimate the density of the random effects under a known model for this density, which is often assumed Gaussian. However, one can wonder whether this assumption is reasonable in a given application context. We focus here on the nonparametric estimation of the density of the independent identically distributed random effects. To the best of our knowledge, the only references in this context are Comte et al. (2013) and Dion and Genon-Catalot (2015). The first one provides a nonparametric estimator of the density under restrictive assumptions on the drift and diffusion coefficients. The second one studies the more general case of two linear random effects in the drift and provides a kernel estimator of the bivariate density of the couple of random parameters. Assuming that the process is in its stationary regime, the authors obtain \(\mathbb {L}^2\)-convergence results.

The present work proposes two nonparametric estimation methods in a simpler model, namely an Ornstein–Uhlenbeck stochastic differential model with one additive random effect in the drift, the time scale parameter being assumed known. More precisely, we consider N real valued stochastic processes \((X_j(t), t \in [0, T])\), \(j=1, \ldots ,N\), with dynamics ruled by the following SDEs:

$$\begin{aligned} {\left\{ \begin{array}{ll} dX_j(t)= \left( \phi _j-\frac{X_j(t)}{\alpha }\right) dt +\sigma dW_j(t)\\ X_j(0)= x_j \end{array}\right. } \end{aligned}$$
(1)

where \((W_j)_{1\le j \le N}\) are N independent Wiener processes, and \((\phi _j)_{1\le j \le N}\) are N unobserved independent and identically distributed (i.i.d.) random variables taking values in \(\mathbb {R}\), with a common density f. The sequences \((\phi _j)_{1\le j \le N}\) and \((W_j)_{1\le j \le N}\) are independent. Here \((x_1,\ldots , x_N)\) are known values. The positive constants \(\sigma \) and \(\alpha \) are supposed to be known; in practice they are estimated from experimental data. The estimation of \(\sigma \) can be done using the quadratic variation of the process. The constant \(\alpha \) is a physical quantity. Picchini et al. (2010) give an estimator of \(\alpha \) for the Ornstein–Uhlenbeck model (1), for which the likelihood function is explicit and maximum likelihood estimators can be computed. Each process \((X_j(t),0 \le t \le T)\) represents an individual and the variable \(\phi _j\) is the random effect of individual j. Due to the independence of the \(\phi _j\) and the \(W_j\), the \(X_j(t)\), for \(j=1,\ldots ,N\), are i.i.d. random variables when t is fixed, and the N trajectories \((X_j(t),0\le t \le T)\), \(j=1,\ldots ,N\), are i.i.d. Differences between observations are thus due to the realizations of both \(W_j\) and \(\phi _j\). The Ornstein–Uhlenbeck model is widely used in practice: originally in physics to describe the motion of a particle, later in econometrics, and, for example, in neuroscience to describe the membrane potential of a neuron.

The purpose of the present work is to build nonparametric estimators of the random effect density f, considering that only the processes are observed on [0, T] with \(T>0\) given. In practice we consider discrete observations of the \(X_j\)’s with a very small time step \(\delta \). We are able to evaluate the error made by this discretization. The main difficulty is that we do not observe the \(\phi _j\)’s but only the \(X_j(k\delta )\)’s. Thus the first step is to find an estimator of the random effects \(\phi _j\) and then to estimate f, taking into account the approximation introduced by the estimation of the \(\phi _j\).

In the context of stochastic differential equations with random effects, Comte et al. (2013) propose different nonparametric estimators with good theoretical properties for large T. Here we adopt two different approaches. First we assume that T is large and we propose a direct estimation of the density from the estimators of the \(\phi _j\)'s; this yields the kernel estimator. Then, assuming that T may be small (due to the chosen units for example), but still with high-frequency data, we focus on the deconvolution estimator.

The kernel estimator depends on a bandwidth to be chosen from the data. Several bandwidth selection methods for kernel estimators are known. The originality here is that we use a method, proposed by Goldenshluger and Lepski (2011), which provides an adaptive estimator. This kind of non-asymptotic result is new in this context.

Then we study an estimator built by a deconvolution method (see Butucea and Tsybakov 2007; Comte et al. 2013, for example). The novelty lies in the introduction of an additional tuning parameter to control the variance of the noise. The value of T is then allowed to be small, but we still need high-frequency observations, i.e. a small time step.

We obtain a collection of estimators depending on two parameters. To select the final estimator among this collection, we extend the Goldenshluger and Lepski (2011) method to a two-dimensional model selection. Finally we obtain a consistent estimator satisfying an oracle inequality, for any value of T. This estimator is thus well suited to experimental data with small T.

We illustrate the properties of the proposed estimators with a simulation study. In particular, we compare them with a standard bandwidth selection method of cross-validation type. Then, the estimators are applied to neuronal data. These are intracellular measurements of the neuronal membrane potential between two spikes, which can be modelled with an Ornstein–Uhlenbeck model with one random effect as in (1). The potential being reset at a fixed initial value after a spike, we consider the measurement between two spikes as an independent experimental unit with a different realization of the random effect. This assumption has already been considered with parametric strategies in Picchini et al. (2008, 2010), where it is assumed that the random effect is Gaussian and proven that the Ornstein–Uhlenbeck model with one random effect fits the data better than the model without random effect. Our goal is to estimate nonparametrically the density of the random effect. This estimated density could be used in further works to model this phenomenon (instead of systematically using the Gaussian density).

The paper is organized as follows. Section 2 is dedicated to giving definitions and presenting the estimators investigated in this work. Then in Sect. 3 we set up a method of bandwidth selection for the kernel estimator. In Sect. 4 we define and study the final data-driven estimator built by deconvolution. In Sect. 5 we calibrate the selection methods and illustrate the good performances of both estimators on simulated data. In Sect. 6 we apply the procedures to real data. We conclude this article with a discussion in Sect. 7. All proofs are gathered in Sect. 8, and the computation of the error made by discretization is done in “Appendix 1”.

2 Presentation of the strategies

2.1 Notation and assumptions

Let us introduce some notation. For two functions \(g_1\) and \(g_2\) in \(\mathbb {L}^1(\mathbb {R}) \cap \mathbb {L}^2(\mathbb {R})\), the convolution product of \(g_1\) and \(g_2\), for all \(x \in \mathbb {R}\), is \( g_1{\star } g_2(x)=\int _\mathbb {R} g_1(x-y)g_2(y)dy\) and the scalar product is \(\langle g_1,g_2 \rangle =\int _\mathbb {R} g_1(x){\overline{g_2(x)}}dx\). The Fourier transform of \(g_1\) is \(g_1^*(x)=\int _{\mathbb {R}} e^{iux}g_1(u)du\) for all \(x\in \mathbb {R}\) and the \(\mathbb {L}^2\)-norm is \(\Vert g_1\Vert ^2=\int _{\mathbb {R}}|g_1(x)|^2dx\). Finally we recall the Plancherel–Parseval formula: \( 2\pi \Vert g_1\Vert ^2=\Vert g_1^*\Vert ^2\).

We assume (A) \(f \in \mathbb {L}^2(\mathbb {R})\), \(f^*\in \mathbb {L}^1(\mathbb {R}) \cap \mathbb {L}^2(\mathbb {R})\).

2.2 Initial idea

As previously mentioned, the first step of the procedure is to estimate the random effects \(\phi _j\), which are not observed, in order to recover their density in a second step.

For this purpose, we introduce the following random variables for \(j=1,\ldots ,N\) and \(\tau \in ]0,T]\),

$$\begin{aligned} Z_{j,\tau }:= \frac{X_j(\tau )-X_j(0)-\int _0^{\tau } \left( -\frac{X_j(s)}{\alpha }ds\right) }{\tau }=\phi _j + \frac{\sigma }{\tau }W_j(\tau ). \end{aligned}$$
(2)

The \((Z_{j,\tau })_{\tau }\) are estimators of the \(\phi _j\) based on the trajectory \((X_j(t))\). They correspond to the maximum in \(\varphi \) of the conditional likelihood of (1) given \(\phi _j=\varphi \). Moreover, these random variables satisfy \(\mathbb {E}[Z_{j,\tau }]=\mathbb {E}[\phi _j]\) and, when \(\tau \) goes to infinity, the noise \(\sigma W_j(\tau )/\tau \) goes to zero. This supports the quality of the estimator. Notice that the \((Z_{j,\tau })_{j=1,\ldots ,N}\) are i.i.d. when \(\tau \) is fixed, with density \(f_{Z_{\tau }}\), due to the independence of \((\phi _{j})_{j=1,\ldots ,N}\) and \((W_{j})_{j=1,\ldots ,N}\). These new random variables are computable, depending only on the observations and known parameters. Nevertheless, we only have discrete observations of the process. Thus we discretize the integral: \(\int _0^\tau X_j(s) ds \approx \delta \sum \nolimits _{k=1}^{\lfloor \tau /\delta \rfloor } X_j((k-1)\delta )\). The error due to this approximation is studied in section ‘Discretization’ of “Appendix 2”. At this point, two strategies emerge, which we explain in the following section.
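The following minimal sketch (in Python with NumPy; the array layout and the helper name `compute_Z` are our own choices, not part of the paper) illustrates how the \(Z_{j,\tau }\) can be approximated from discrete observations via the Riemann sum above.

```python
import numpy as np

def compute_Z(X, delta, alpha, tau):
    """Riemann-sum approximation of Z_{j,tau} from discrete observations.

    X     : array of shape (N, J+1) with X[j, k] = X_j(k * delta)
    delta : observation time step
    alpha : known time-scale parameter
    tau   : time in ]0, T], assumed to be (close to) a multiple of delta
    Returns an array of length N containing the Z_{j,tau}'s.
    """
    n = int(np.floor(tau / delta))                 # number of steps up to tau
    integral = delta * X[:, :n].sum(axis=1)        # approximates int_0^tau X_j(s) ds
    # Z_{j,tau} = (X_j(tau) - X_j(0) + (1/alpha) * int_0^tau X_j(s) ds) / tau
    return (X[:, n] - X[:, 0] + integral / alpha) / tau
```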

2.3 Estimation strategies

Let us present the two investigated methods.

Kernel strategy

The first idea is to reduce the noise which appears in formula (2). Indeed, \(\text {Var}(\sigma W_j(\tau )/\tau )= \sigma ^2/\tau \) leads us to choose the largest value of \(\tau \), namely \(\tau =T\). Moreover, when T is large, \(Z_{j,T}\) clearly approximates \(\phi _j\) without needing to remove the noise. Then we build a kernel estimator of the density f of the \(\phi _j\)'s by using the \(Z_{j,T}\) directly as approximations of the unobserved random effects \(\phi _j\). These N random variables are i.i.d. and the resulting kernel estimator is given, for all \( x \in \mathbb {R},\) by

$$\begin{aligned} {\widehat{f}}_h(x)=\frac{1}{N} \sum _{j=1}^{N} K_h(x-Z_{j,T}) \end{aligned}$$
(3)

where \(h>0\) is a bandwidth, and \(K: \mathbb {R} \rightarrow \mathbb {R}\) is a \({\mathcal {C}}^2\) kernel such that

$$\begin{aligned} \int K(u)du=1, \quad \Vert K\Vert ^2= & {} \int K^2(u)du< +\infty ,\quad \int (K''(u))^2du < +\infty ,\nonumber \\ K_h(x)= & {} \frac{1}{h}K\left( \frac{x}{h}\right) . \end{aligned}$$
(4)

This natural estimator is studied in detail in Sect. 3.
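As an illustration, here is a minimal sketch of estimator (3) with a Gaussian kernel (the kernel also used in the simulation study of Sect. 5); the function names and grid handling are assumptions of ours.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel, which is C^2 and satisfies conditions (4)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_estimator(x_grid, Z, h):
    """Kernel estimator f_hat_h of (3) evaluated on x_grid.

    Z : array of length N containing the Z_{j,T}'s
    h : bandwidth, h > 0
    """
    u = (x_grid[:, None] - Z[None, :]) / h         # (x - Z_j) / h for all x, j
    return gaussian_kernel(u).mean(axis=1) / h     # average of K_h(x - Z_j) over j
```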

Deconvolution strategy

The other idea is to build an estimator of f using the variables \(Z_{j,\tau }\) for different \(\tau \in ]0,T]\). Recovering f from the observations \((X_1(t),\ldots ,X_N(t))_{t\in [0,T]}\) is a deconvolution problem because the common density of \((Z_{j,\tau })_{j=1,\ldots ,N}\) is a convolution product of two densities. Indeed, the two terms of the sum (2) are independent when \(\tau \) is fixed, which implies, for all \(j=1, \ldots , N\),

$$\begin{aligned} f_{Z_{\tau }}(u)=f{\star } f_{\frac{\sigma }{\tau }W_1(\tau )}(u). \end{aligned}$$

Then the characteristic function of \(\phi _j\) is recoverable from that of \(Z_\tau \). Taking the Fourier transform under assumption (A) gives the simple product

$$\begin{aligned} f^*_{Z_{\tau }}(u)=f^*(u) f^*_{\frac{\sigma }{\tau }W_1(\tau )}(u), \end{aligned}$$

with \(f^*_{\frac{\sigma }{\tau }W_1(\tau )}(u)= e^{-\frac{u^2\sigma ^2}{2\tau }}\). In this particular case the noise is Gaussian and this deconvolution problem has been investigated in the literature, see Fan (1991), Butucea and Tsybakov (2007) for example. However, it has been proven in Carroll and Hall (1988) that the best rates of convergence obtained in this case are logarithmic. This suggests improving the deconvolution procedure, and this is the reason why we choose not to use existing estimators but to propose a new method, based on repeated observations and carefully chosen parameters.

We have \(f^*(u)=f^*_{Z_{\tau }}(u)e^{{u^2\sigma ^2}/{2\tau }}\). Finally the Fourier inversion gives the closed formula, for all \(x \in \mathbb {R}\),

$$\begin{aligned} {f}(x)= \frac{1}{2\pi } \int _{\mathbb {R}} e^{-iux} f^*_{Z_{\tau }}(u) e^{\frac{u^2\sigma ^2}{2\tau }}du. \end{aligned}$$
(5)

Then, we estimate \(f^*_{Z_{\tau }}(u)\) by its empirical counterpart \({\widehat{f}}^*_{Z_{\tau }}(u)=(1/N) \sum _{j=1}^N e^{iuZ_{j,\tau }}\). However, plugging this into formula (5) raises integrability problems. Indeed, the integrability of \({\widehat{f}}^*_{Z_{\tau }}(u)e^{u^2\sigma ^2/2\tau }\) is not ensured. Therefore, we have to introduce a cut-off. Nonparametric estimation using a deconvolution method in the Gaussian case commonly yields slow rates of convergence. To improve the rates, an idea of Comte and Samson (2012), for linear mixed models, was to link this cut-off with the time horizon of the process. Comte et al. (2013) link the time of the process \(\tau \) and the cut-off as follows:

$$\begin{aligned} {\widehat{f}}_{\tau }(x)=\frac{1}{2\pi } \int _{-\sqrt{\tau }}^{\sqrt{\tau }} e^{-iux} \frac{1}{N} \sum _{j=1}^N e^{iuZ_{j,\tau }} e^{\frac{u^2\sigma ^2}{2\tau }}du. \end{aligned}$$
(6)

Then the time \(\tau \) is chosen by a Goldenshluger and Lepski method and the final estimator is denoted \({\widetilde{f}}_{{\widetilde{\widetilde{\tau }}}}\). Nevertheless, when \(\tau \) is small (which is the case for the real dataset we investigate), the integration domain is not large enough, and the estimators of f are not satisfactory (see the explicit example in Sect. 6). We adapt the estimator of Comte et al. (2013) to this small-T framework. Indeed, to improve the previous estimator, we introduce a new parameter s in the cut-off:

$$\begin{aligned} {\widehat{f}}_{s,\tau }(x)=\frac{1}{2\pi } \int _{-s\sqrt{\tau }}^{s\sqrt{\tau }} e^{-iux} \frac{1}{N} \sum _{j=1}^N e^{iuZ_{j,\tau }} e^{\frac{u^2\sigma ^2}{2\tau }}du. \end{aligned}$$

Then, in order to simplify the theoretical study, we replace \(s\sqrt{\tau }\) in the integral by a new parameter m. The resulting estimator \({\widetilde{f}}_{m,s}\) is defined when \(m^2/s^2 \in ]0,T]\), by

$$\begin{aligned} {\widetilde{f}}_{m,s}(x)=\frac{1}{2\pi } \int _{-m}^{m} e^{-iux} \frac{1}{N} \sum _{j=1}^N e^{iuZ_{j,m^2/s^2}} e^{\frac{u^2\sigma ^2s^2}{2m^2}} du \end{aligned}$$
(7)

with m and s in two finite sets \({\mathcal {M}}\) and \({\mathcal {S}}\) that we will specify later.
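A minimal numerical sketch of estimator (7) is given below (Python with NumPy; the real part is taken since f is real-valued). It relies on the `compute_Z` helper sketched in Sect. 2.2, and the integration grid size `n_u` is an arbitrary choice of ours.

```python
import numpy as np

def deconvolution_estimator(x_grid, X, delta, alpha, sigma, m, s, n_u=400):
    """Deconvolution estimator f_tilde_{m,s} of (7) evaluated on x_grid.

    Requires m**2 / s**2 in ]0, T]; the Z_{j, m^2/s^2} are computed as in (2).
    """
    tau = m ** 2 / s ** 2
    Z = compute_Z(X, delta, alpha, tau)            # estimators of the phi_j at time tau
    u = np.linspace(-m, m, n_u)                    # integration grid on [-m, m]
    du = u[1] - u[0]
    # empirical characteristic function of the Z_{j,tau}'s
    ecf = np.exp(1j * u[None, :] * Z[:, None]).mean(axis=0)
    # inverse of the Gaussian noise characteristic function: exp(u^2 sigma^2 s^2 / (2 m^2))
    integrand = ecf * np.exp(u ** 2 * sigma ** 2 * s ** 2 / (2.0 * m ** 2))
    # Fourier inversion by a Riemann sum; the result is real up to numerical error
    vals = np.exp(-1j * np.outer(x_grid, u)) @ integrand * du
    return vals.real / (2.0 * np.pi)
```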

In the following sections we study the two strategies in detail.

3 Study of the kernel estimator

The kernel estimator given by (3) has been investigated in Comte et al. (2013). First we recall the MISE bound that the kernel estimator \({\widehat{f}}_h\) satisfies. Then we develop the bandwidth selection procedure that we focus on in this work.

3.1 Risk bound

Let us define \(f_h:=K_h {\star } f\), for \(h>0\). We denote, for all \(p \ge 1\), \(\Vert f\Vert _p= (\int |f(x)|^p dx)^{1/p}\) and for \(p=2\) we simply write \(\Vert f\Vert _2=\Vert f\Vert \). Notice that \(\Vert K_h\Vert =\Vert K\Vert /\sqrt{h}\) and \(\Vert K_h\Vert _1=\Vert K\Vert _1\). We recall the result proven in Comte et al. (2013) for the MISE.

Proposition 3.1

Considering estimator \({\widehat{f}}_h\) given by (3), we have

$$\begin{aligned} \mathbb {E}[\Vert {\widehat{f}}_h-f\Vert ^2]\le 2\Vert f-f_h\Vert ^2+ \frac{\Vert K\Vert ^2}{Nh}+ \frac{2\sigma ^4\Vert K''\Vert ^2}{3T^2h^5}. \end{aligned}$$
(8)

The right-hand side of (8) involves three terms, and the middle one is the integrated variance. The integrated bias is \(\Vert \mathbb {E}[{\widehat{f}}_h]-f\Vert ^2 \le 2\Vert f-f_h\Vert ^2+2\Vert \mathbb {E}[{\widehat{f}}_h]-f_h\Vert ^2\), with

$$\begin{aligned} \Vert \mathbb {E}[{\widehat{f}}_h]-f_h\Vert ^2\le \frac{\sigma ^4\Vert K''\Vert ^2}{3T^2h^5}. \end{aligned}$$
(9)

Therefore, the first term \(\Vert f-f_h\Vert ^2\) is a bias term, which decreases when h decreases. The second term is the variance term, which increases when h decreases. Finally, the third term, also given in (9), is an unusual error term due to the approximation of the \(\phi _j\)'s by the \(Z_{j,T}\), which also increases when h decreases. We see from this bound that the quantity \(\sigma ^2/T\) must be small to obtain a small risk.

3.2 Adaptation of the bandwidth

Now that we have at hand a collection of estimators depending on a bandwidth h, we focus on the crucial question of how to choose the bandwidth from the data. The best choice of h is the one which minimizes the sum of these three terms. The selection of the bandwidth can be done for example using cross-validation, see e.g. the commonly used R function density. However, the only theoretical results known for the cross-validation procedure are asymptotic and, to the best of our knowledge, there is no adaptive result on the final estimator. In the present work, we propose to adapt the selection method of Goldenshluger and Lepski (2011) mentioned before, which provides a data-driven bandwidth for which we establish non-asymptotic theoretical results.

We denote \({\mathcal {H}}_{N,T}\) the finite set of bandwidths h, to be defined later. The best theoretical choice of the bandwidth is the h which minimizes the bound on the MISE given by (8). Nevertheless, in practice, the bias term is unknown, and this bound has to be estimated.

To choose h adequately, we use the criterion introduced by Goldenshluger and Lepski (2011). The idea is to estimate \(\Vert f-f_{h}\Vert ^2\) by the \(\mathbb {L}^2\)-distance between two estimators defined in (3). But this induces an error which has to be corrected by the variance term. Then the estimator of the bias term is

$$\begin{aligned} A(h)=\underset{h' \in {\mathcal {H}}_{N,T}}{\sup } \left( \Vert {\widehat{f}}_{h,h'}-{\widehat{f}}_{h'}\Vert ^2-V(h')\right) _+ \end{aligned}$$
(10)

where

$$\begin{aligned} {\widehat{f}}_{h,h'}(x):=K_{h'} {\star } {\widehat{f}}_h(x)=\frac{1}{N}\sum _{j=1}^N K_{h'} {\star } K_h(x-Z_{j,T}) \end{aligned}$$

and V corresponds to the two variance terms

$$\begin{aligned} V(h)=\kappa _1\frac{\Vert K\Vert ^2_1\Vert K\Vert ^2}{Nh}+\kappa _2 \frac{\sigma ^4\Vert K\Vert ^2_1\Vert K''\Vert ^2}{T^2h^5} \end{aligned}$$
(11)

with \(\kappa _1\) and \(\kappa _2\) two numerical positive constants. We will prove that A(h) has the order of the bias term (see Eq. (24)). Finally the bandwidth is selected as follows:

$$\begin{aligned} {\widehat{h}}= \underset{h\in {\mathcal {H}}_{N,T}}{{\text {arg}}\!{\text {min}}}\;{\left\{ A(h)+V(h)\right\} } \end{aligned}$$
(12)

with \({\mathcal {H}}_{N,T}\) a finite discrete set of bandwidths h such that \(h>0\), \(\displaystyle \frac{1}{Nh} \le 1, \frac{1}{h^5T^2} \le 1\) and \(\text {Card}({\mathcal {H}}_{N,T}) \le N\). It must be chosen such that, when N goes to infinity, for any positive constant c, \(\sum _{h\in {\mathcal {H}}_{N,T}} h^{-1/2}e^{-c/\sqrt{h}} \le S(c)\) with S(c) a positive constant depending on c. For example, notice that taking \({\mathcal {H}}_{N,T}=\{1/k^2, k=1,\ldots , \sqrt{N}\}\), the sum \(\sum _{h\in {\mathcal {H}}_{N,T}} h^{-1/2}e^{-c/\sqrt{h}} \le \sum _{k \ge 1}k e^{-ck}\) converges, which is a condition needed in the proof.
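A possible implementation of the selection rule (10)–(12) is sketched below for a Gaussian kernel (so that \(\Vert K\Vert _1=1\) and \(K_{h'}\star K_h\) has the closed form recalled in Sect. 5); the grid-based approximation of the \(\mathbb {L}^2\)-distances and all function names are our own choices.

```python
import numpy as np

def select_bandwidth(Z, sigma, T, bandwidths, kappa1, kappa2, x_grid):
    """Goldenshluger-Lepski bandwidth selection (10)-(12), Gaussian kernel.

    Z          : the Z_{j,T}, j = 1,...,N
    bandwidths : candidate set H_{N,T}
    x_grid     : grid on which the L2-distances are approximated
    """
    N = len(Z)
    dx = x_grid[1] - x_grid[0]
    # Gaussian kernel constants: ||K||_1 = 1, ||K||^2 = 1/(2 sqrt(pi)),
    # ||K''||^2 = (1 + 1/sqrt(2)) / sqrt(2 pi)
    norm_K_sq = 1.0 / (2.0 * np.sqrt(np.pi))
    norm_K2nd_sq = (1.0 + 1.0 / np.sqrt(2.0)) / np.sqrt(2.0 * np.pi)

    def V(h):                                       # variance term (11), with ||K||_1 = 1
        return (kappa1 * norm_K_sq / (N * h)
                + kappa2 * sigma ** 4 * norm_K2nd_sq / (T ** 2 * h ** 5))

    def f_hat(scale):                               # (3); K_{h'} * K_h is Gaussian with
        u = (x_grid[:, None] - Z[None, :]) / scale  # scale sqrt(h^2 + h'^2)
        return np.exp(-0.5 * u ** 2).mean(axis=1) / (scale * np.sqrt(2.0 * np.pi))

    estimates = {h: f_hat(h) for h in bandwidths}

    def A(h):                                       # bias estimate (10)
        diffs = [np.sum((f_hat(np.sqrt(h ** 2 + hp ** 2)) - estimates[hp]) ** 2) * dx
                 - V(hp) for hp in bandwidths]
        return max(max(diffs), 0.0)

    return min(bandwidths, key=lambda h: A(h) + V(h))   # selected bandwidth (12)
```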

Then we can prove the following Theorem.

Theorem 3.1

Consider estimator \({\widehat{f}}_h\) given by (3) with \(h \in {\mathcal {H}}_{N,T}\). Then, there exist two penalty constants \(\kappa _1\), \(\kappa _2\) such that

$$\begin{aligned} \mathbb {E}[\Vert {\widehat{f}}_{{\widehat{h}}}-f\Vert ^2]\le C_1 \underset{h \in {\mathcal {H}}_{N,T}}{\inf } \left\{ \Vert f-f_h\Vert ^2+ V(h) \right\} +\frac{C_2}{N} \end{aligned}$$

where \(C_1,C_2\) are two positive constants such that \(C_1=\max (7, 30 \Vert K\Vert _1^2+6)\) and \(C_2\) depends on \(\Vert f\Vert ,\Vert K\Vert \), \(\Vert K\Vert _1\), \(\Vert K\Vert _{4/3}\).

The theoretical study gives \(\kappa _1 \ge \max (40/ \Vert K\Vert _1^2,40)\) and \(\kappa _2 \ge \max (10/3,10/ (3\Vert K\Vert _1^2))\). But in practice these two constants are calibrated from a simulation study (and the calibrated values are always smaller than the theoretical ones). Theorem 3.1 is an oracle inequality: the bias-variance compromise is achieved automatically, in a data-driven and non-asymptotic way.

This strategy requires a large T, as we assume \(1/h^5 \le T^2\). The error implied by the discrete observations and the use of Riemann sums to compute the \(Z_{j,T}\) is detailed in Comte et al. (2013).

4 Study of the deconvolution estimator

4.1 Risk bound

Let us emphasize that the estimator \({\widetilde{f}}_{m,s}\) given by (7) depends on two parameters which have to be selected from the data. This is not usual in the deconvolution setting, where generally only one cut-off parameter is introduced. The selection of these two parameters (m, s) among the finite sets \({\mathcal {M}}\), \({\mathcal {S}}\) is thus more difficult. It is even more challenging here because the cut-off m appears both in the integration bounds and in the integrand. But this will induce gains in the rates of the estimators. Before proposing a selection method for (m, s), we start by evaluating the quality of the estimator with the mean integrated squared error (MISE):

$$\begin{aligned} \mathbb {E}\left[ \Vert {\widetilde{f}}_{m,s}-f\Vert ^2 \right] =\Vert f-\mathbb {E}[{\widetilde{f}}_{m,s}]\Vert ^2+\mathbb {E} \left[ \Vert {\widetilde{f}}_{m,s}-\mathbb {E}[{\widetilde{f}}_{m,s}]\Vert ^2 \right] . \end{aligned}$$

In Proposition 4.1 we prove that \(\mathbb {E}[{\widetilde{f}}_{m,s}]=f_m\) where \(f_m\) is defined by its Fourier transform

$$\begin{aligned} f^*_{m}:=f^* \mathbf {1}_{[-m,m]}. \end{aligned}$$

It means that the bias does not depend on s. We obtain the following bound on the MISE of \({\widetilde{f}}_{m,s}\).

Proposition 4.1

Under (A), the estimator \({\widetilde{f}}_{m,s}\) given by (7) is an unbiased estimator of \(f_{m}\) and we have

$$\begin{aligned} \mathbb {E}\left[ \Vert {\widetilde{f}}_{m,s}-f\Vert ^2 \right] \le \Vert {f}_{m}-f\Vert ^2+\frac{m}{\pi N} \int _{0}^{1} e^{\sigma ^2s^2v^2}dv. \end{aligned}$$
(13)

The proofs are relegated to Sect. 8. Let us look at the risk bound. The first term of the bound (13) is the bias term. It represents the error resulting from estimating f by \(f_{m}\) and it decreases when m increases, indeed:

$$\begin{aligned} \Vert f_{m}-f\Vert ^2=\frac{1}{2\pi }\int _{|u| \ge m}|f^*(u)|^2 du. \end{aligned}$$

The second term is the variance term, and it increases with m and s. One can notice that it is bounded as soon as s is bounded and \(m \le N\).

We now specify the two sets \({\mathcal {M}}\) and \({\mathcal {S}}\). We notice that the quality of the estimate in the Fourier domain is good on an interval around zero with length related to \(\sigma \). The chosen set for s is

$$\begin{aligned} {\mathcal {S}}:=\left\{ s_l=\frac{1}{2^l} \frac{2}{\sigma }, ~l=0,\ldots ,P\right\} . \end{aligned}$$

Notice that for \(s_l \in {\mathcal {S}}\), \(1/ 2^{P-1} \le \sigma s_l \le 2\). Moreover with this chosen collection \({\mathcal {S}}\), the order of the variance term is m / N. With the idea that \(m^2/s^2\) is homogeneous to a time, we choose m in the finite collection:

$$\begin{aligned} {\mathcal {M}}:=\left\{ m=\frac{\sqrt{k \varDelta }}{\sigma }, ~ k\in \mathbb {N}^*,~0<m\le N\right\} \end{aligned}$$

with \(0<\varDelta <1\) a small step to be fixed. The collection of couples of parameters is

$$\begin{aligned} {\mathcal {C}}:=\{(m,s)\in {\mathcal {M}}\times {\mathcal {S}}, \quad m^2/s^2 \le T\}. \end{aligned}$$
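The following short sketch builds the collections \({\mathcal {S}}\), \({\mathcal {M}}\) and \({\mathcal {C}}\) (Python; the function name and argument order are ours, and P and \(\varDelta \) are inputs to be fixed by the user).

```python
import numpy as np

def build_collections(sigma, T, N, P, Delta):
    """Collections S, M and C used by the deconvolution estimator (sketch)."""
    # S = { s_l = 2 / (2^l sigma), l = 0,...,P }
    S = [2.0 / (2 ** l * sigma) for l in range(P + 1)]
    # M = { m = sqrt(k Delta) / sigma, k in N*, 0 < m <= N }
    M, k = [], 1
    while np.sqrt(k * Delta) / sigma <= N:
        M.append(np.sqrt(k * Delta) / sigma)
        k += 1
    # admissible couples: m^2 / s^2 must be a valid time in ]0, T]
    C = [(m, s) for m in M for s in S if m ** 2 / s ** 2 <= T]
    return S, M, C
```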

The final estimator is the estimator from the collection \({\mathcal {C}}\) which achieves the bias-variance compromise. Choosing the final estimator is not an easy task, unless we know the regularity of f. Indeed, let us assume that f is in a Sobolev ball with regularity parameter b, i.e. f belongs to the set defined by

$$\begin{aligned} {\mathcal {A}}_b(L)=\left\{ f\in \mathbb {L}^1(\mathbb {R}) \cap \mathbb {L}^2(\mathbb {R}) , \int _{\mathbb {R}} |f^*(x)|^2(1+x^2)^b dx \le L \right\} \end{aligned}$$

with \(b > 0\), \(L>0\). For example the standard normal distribution is in a space \({\mathcal {A}}_b(L)\) for some L and for all \(b>0\), an exponential distribution is in some \({\mathcal {A}}_b(L)\) for \(b<1/2\) or more generally a Gamma distribution with shape parameter k is in some \({\mathcal {A}}_b(L)\) for \(b<(k-1/2)\). Thus when \(f\in {\mathcal {A}}_b(L)\), the bias term satisfies:

$$\begin{aligned} \Vert f_{m}-f\Vert ^2=\frac{1}{2\pi }\int _{|u| \ge m}|f^*(u)|^2 du \le \frac{L}{2\pi } m^{-2b}. \end{aligned}$$

Consequently, the \(\mathbb {L}^2\)-risk of \({\widetilde{f}}_{m,s}\) is bounded by

$$\begin{aligned} \mathbb {E}\left[ \Vert {\widetilde{f}}_{m,s}-f\Vert ^2\right] \le \frac{L}{2\pi } m^{-2b} + \frac{m}{\pi N}e^{\sigma ^2s^2}. \end{aligned}$$

Therefore, the best theoretical choice of s is \(s_P\), the smallest s in our collection, and

$$\begin{aligned} m={m}^*=K_b N^{\frac{1}{(2b+1)}} \end{aligned}$$

with \(K_{b}=(bL\exp (-1/2^{2(P-1)}))^{1/(2b+1)}\). Then we obtain the following asymptotic result.

Corollary 4.1

If \(f\in {\mathcal {A}}_b(L)\), and if we choose \(s=s_P\) and \(m=m^*\), there exists a constant K depending on bLP, such that

$$\begin{aligned} \mathbb {E}\left[ \Vert {\widetilde{f}}_{m^*,s_P}-f\Vert ^2\right] \le K N^{-\frac{2b}{2b+1}}. \end{aligned}$$

The order of the risk in this case is \(N^{-2b/(2b+1)}\) for large N, which is the nonparametric rate of convergence obtained when the observations are N direct realizations of the variable of interest. Nevertheless, it is not easy to check that \((m,s) \in {\mathcal {C}}\), and this choice is theoretical because it depends on the regularity b of f, which is unknown. The next section provides a data-driven method to select (m, s).

4.2 Selection of the final estimator

In this section we deal with the choice of the best estimator among the available collection of \({\widetilde{f}}_{m,s}\). In the previous work Comte et al. (2013), s was fixed to \(s=1\) and m was selected. But we saw empirically that this did not work in the setting corresponding to the data. This is why we experimented with different values of s. But then we did not find any reliable criterion to select m for a given s. Conversely, if we look at the bound and try to select s first, we just get \(s=0\), which is not of interest if we are looking to improve the estimator through s in particular. This leads us to select the couple (m, s) minimizing the MISE and realizing the compromise between the two terms, in a data-driven way. This is a crucial issue. Indeed, the roles of the two parameters are not the same. Thus we propose a new criterion adapted from the Goldenshluger and Lepski (2011) method.

The idea is to select the couple which minimizes the MISE: \(\mathbb {E}\left[ \Vert {\widetilde{f}}_{m,s}-f\Vert ^2\right] \). As it is unknown, we have to find a computable approximation of this quantity. We define the best couple (m, s) as the one minimizing a criterion defined as the sum of a squared bias term and a variance term called penalty. We define the penalty function, which has the same order as the bound on the variance term:

$$\begin{aligned} \text {pen}(m,s)=\kappa \frac{m}{N} e^{\sigma ^2s^2}, \end{aligned}$$

where \(\kappa \) is a numerical constant to be calibrated. Note that for \(m\in {\mathcal {M}}\) and \(s\in {\mathcal {S}}\), the penalty function is bounded.

To estimate the bias term, we generalize Goldenshluger and Lepski’s criterion for a two-dimensional index. The method is inspired by the ideas developed for kernel estimators by Goldenshluger and Lepski (2011) and adapted to model selection in one dimension in Comte and Johannes (2012) and in two dimensions by Chagny (2013). The idea is to estimate \(\Vert f-f_{m}\Vert ^2\) by the \(\mathbb {L}^2\)-distance between two estimators defined in (7). But this induces a bias which has to be corrected by the penalty function. We consider the following estimator of the bias, with \((m',s')\wedge (m,s):=(m' \wedge m, s'\wedge s)\),

$$\begin{aligned} \varGamma _{m,s}= & {} \underset{(m',s')\in {\mathcal {C}}}{\max } \left( \Vert {\widetilde{f}}_{m',s'}-{\widetilde{f}}_{(m',s')\wedge (m,s)}\Vert ^2-\text {pen}({m',s'}) \right) _+ \end{aligned}$$
(14)

for \((m,s) \in {\mathcal {C}}\). Finally the selected couple is:

$$\begin{aligned} ({\widetilde{m}},{\widetilde{s}})= \arg \underset{(m,s)\in {\mathcal {C}}}{\min } \{{\varGamma }_{m,s} + \text {pen}(m,s) \}. \end{aligned}$$
(15)
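The two-dimensional selection rule (14)–(15) can be sketched as follows (Python; the sketch reuses the `deconvolution_estimator` helper from Sect. 2.3, takes as input the collection \({\mathcal {C}}\) built above, and approximates the \(\mathbb {L}^2\)-distances on a grid, all of which are our own implementation choices).

```python
import numpy as np

def select_m_s(X, delta, alpha, sigma, C, kappa, x_grid):
    """Two-dimensional Goldenshluger-Lepski selection (14)-(15) (sketch)."""
    N = X.shape[0]
    dx = x_grid[1] - x_grid[0]

    def pen(m, s):                                 # penalty, same order as the variance bound
        return kappa * m * np.exp(sigma ** 2 * s ** 2) / N

    # precompute every estimator of the collection on the grid
    est = {(m, s): deconvolution_estimator(x_grid, X, delta, alpha, sigma, m, s)
           for (m, s) in C}

    def Gamma(m, s):                               # bias estimate (14)
        terms = []
        for (mp, sp) in C:
            # (m', s') ^ (m, s) is taken componentwise
            f_min = deconvolution_estimator(x_grid, X, delta, alpha, sigma,
                                            min(mp, m), min(sp, s))
            terms.append(np.sum((est[(mp, sp)] - f_min) ** 2) * dx - pen(mp, sp))
        return max(max(terms), 0.0)

    # selected couple (15)
    return min(C, key=lambda ms: Gamma(*ms) + pen(*ms))
```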

We are now able to obtain the following result.

Theorem 4.1

Under (A), consider the estimator \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\) given by (7) and (15). There exists a numerical constant \(\kappa _0\) such that, for all penalty constants \(\kappa \ge \kappa _0\),

$$\begin{aligned} \mathbb {E}\left[ \Vert {\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}-f\Vert ^2\right] \le C \underset{(m,s)\in {\mathcal {C}}}{\inf }\left\{ \Vert f-f_{m}\Vert ^2 +\text {pen}(m,s) \right\} + \frac{C'(P+1)}{N} \end{aligned}$$
(16)

where \(C>0\) is a numerical constant and \(C'\) is a constant depending on \(\Vert f\Vert \), \(\sigma \), \(\varDelta \), and \(P+1\) the cardinality of \({\mathcal {S}}\).

The key step of the proof is to show that

$$\begin{aligned} \mathbb {E}[\varGamma _{m,s}] \le 18 \Vert f-f_m\Vert ^2+\frac{C'(P+1)}{N} \end{aligned}$$

(see the proof in Sect. 8.3). Inequality (16) means that \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\) automatically achieves the bias-variance trade-off. Moreover, our result is non-asymptotic with respect to N.

One should notice that this new parameter s generalizes the results of Comte et al. (2013), even when T is large. We choose the two parameters in an adaptive way, which gives more flexibility in the choice of the estimator.

It follows from the proof that \(\kappa _0=24\) would suit. But in practice, values obtained from the theory are generally too large and the constant is calibrated by simulations. Once chosen, it remains fixed for all simulation experiments. Besides, the parameter P defining the set \({\mathcal {S}}\) is chosen small in practice (\(P=3\) or 10 for example).

In the section ‘Discretization’ of “Appendix 2” we investigate the error implied by the discrete observations and thus by the discretization of \(Z_{j,\tau }\) given by (2).

5 Simulation study

In the following section we compare on simulations the two proposed procedures, \({\widehat{f}}_{{\widehat{h}}}\) and \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\), with the estimator \({\widetilde{f}}_{{\widetilde{\widetilde{\tau }}}}\) of Comte et al. (2013); we also compare our bandwidth selection method with the estimator obtained from the R function density with the cross-validation argument bw = ucv, applied to the \(Z_{j,T},j=1,\ldots ,N\), which we denote \({\widehat{f}}_{cv}\).

We simulate data by computing the exact solutions of (1) given by Itô’s formula,

$$\begin{aligned} X_j(t)=X_j(0)e^{-t/\alpha }+ \phi _j \alpha (1-e^{-t/\alpha }) +\sigma e^{-t/\alpha }\int _0^t e^{s/\alpha }dW_j(s) \end{aligned}$$
(17)

at discrete times \(t_k \in {\mathcal {T}}:=\{k\delta ,k\in \{0,\ldots ,J\}, J\delta =T\}\) (a simulation sketch is given after the list of densities below). For the simulation study, we have to fix \(N,\delta , T, \sigma , \alpha ,\) and the density f. We take \(\sigma =0.0135, 0.05, 1\) and, for \(\sigma =0.05\), \(\alpha =0.039, 1, 39\). For the time T, we choose \(T=0.3, 10, 50, 100, 300\) with different values of \(\delta \), the discrete time step at which observations are recorded. The value of J, the number of observations for one trajectory, ranges from 150 to 5000 for Table 1 and is fixed to \(J=2000\) for Table 2. All these parameter values are chosen in relation to the parameters of the real dataset. In this study we hope to highlight the influence of each parameter. For f, we investigate four different distributions:

  • Gaussian distribution \({\mathcal {N}}(0.278,(0.041)^2)\)

  • Gamma distribution \(\varGamma (1.88,0.148)\)

  • mixed Gaussian distribution \(0.3{\mathcal {N}}(0,(0.02)^2)+0.7{\mathcal {N}}(1,(0.02)^2)\) (denoted M-Gaussian)

  • mixed Gamma distribution \(0.4\varGamma (3,0.08)+0.6\varGamma (30,0.035)\) (denoted M-Gamma)

    where we write \(\varGamma (k,\theta )\) with k the shape parameter and \(\theta \) the scale.
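The sketch below simulates such trajectories using the exact Gaussian transition of the Ornstein–Uhlenbeck process conditionally on the random effect, which is equivalent to sampling from (17); the parameter values reproduce the first design (Gaussian f) and the function name is an assumption of ours.

```python
import numpy as np

def simulate_trajectories(N, J, delta, alpha, sigma, phi, x0=0.0, rng=None):
    """Simulate N trajectories of model (1) at times k*delta, k = 0,...,J.

    Uses the exact Gaussian transition of the Ornstein-Uhlenbeck process
    conditionally on phi_j, which is equivalent to sampling from (17).
    phi : array of length N of random effects drawn from the density f.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty((N, J + 1))
    X[:, 0] = x0
    decay = np.exp(-delta / alpha)
    drift = phi * alpha * (1.0 - decay)            # contribution of phi_j over one step
    noise_sd = sigma * np.sqrt(alpha * (1.0 - decay ** 2) / 2.0)
    for k in range(J):
        X[:, k + 1] = X[:, k] * decay + drift + noise_sd * rng.standard_normal(N)
    return X

# example with the Gaussian random-effect density of the first design
rng = np.random.default_rng(0)
phi = rng.normal(0.278, 0.041, size=240)
X = simulate_trajectories(N=240, J=2000, delta=0.00015, alpha=0.039,
                          sigma=0.0135, phi=phi, rng=rng)
```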

First, we implement the two collections of estimators: \({\widehat{f}}_{{\widehat{h}}}\) and \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\). We begin by computing the random variables used by both estimators: \(Z_{j,\tau }\) given by (2), with Riemann sum approximations (see section ‘Discretization’ of “Appendix 2” for details). For the deconvolution estimator given by (7) we also use Riemann sums to compute the integral. For the collection of m, we choose \(\varDelta =0.08\), while \(\delta \) varies with the design. Furthermore, for the kernel estimator given by (3), we choose a Gaussian kernel: \(K(u)=({1}/\sqrt{2\pi }) e^{-u^2/2}\). In this case \(\Vert K\Vert _1=1\), \(\Vert K\Vert _2^2=1/(2\sqrt{\pi })\), \({\Vert K''\Vert _2^2=(1+1/\sqrt{2})/(\sqrt{2\pi })}\).

Then, the selected bandwidth \({\widehat{h}}\) is given by Eq. (12). Note that for all \((h,h')\in {\mathcal {H}}^2\),

$$\begin{aligned} K_{h'} {\star } K_h(x)=\frac{1}{\sqrt{2\pi }\sqrt{h'^2+h^2}} e^{-x^2/[2(h'^2+h^2)]}. \end{aligned}$$

We use this relation to compute the \({\widehat{f}}_{h,h'}\).

Secondly, we have to calibrate the penalty constants: \(\kappa _1,\kappa _2\) for the kernel estimator and \(\kappa \) for the deconvolution estimator. Classically, the constants are fixed thanks to preliminary simulation experiments. Different functions f have been investigated with different parameter values and a large number of replications. Comparing the MISE obtained as functions of the constants \(\kappa _1\), \(\kappa _2\) and \(\kappa \) leads to select values making a good compromise over all experiments. Finally we choose \(\kappa _1=1\), \(\kappa _{2}=0.0001\) and \(\kappa =0.3\). A recent work, Lacour and Massart (2016), proposes to change the calibration constants in the variance term V(h): taking the constant \(\kappa \) inside the bias estimate term (\(\varGamma _{m,s}\) in (14), A(h) in (10)) and \(2\kappa \) for the V(h) in the selection criterion (12). This has been done in practice for the kernel estimator. We notice that this strategy produces very good results in practice, better than choosing the same \(\kappa \) for the two occurrences of the term V(h). In Fig. 1, 25 estimators \({\widehat{f}}_{{\widehat{h}}}\) are plotted, and in Fig. 2, 25 estimators \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\), for the 4 investigated densities f. In each case the batch of estimators is close to the target density.

In order to evaluate the performances of each estimator on the different designs, we compare their empirical MISE computed from 100 simulated data sets.

Table 1 summarises the results for different parameter values. It shows the poor performance of the estimator \({\widetilde{f}}_{{\widetilde{\widetilde{\tau }}}}\) of Comte et al. (2013) compared to \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\) when T is small. It performs clearly better when T increases. Besides, we notice that both kernel estimators give good results. These results are satisfying because it appears that our estimator \({\widehat{f}}_{{\widehat{h}}}\) fits the true density slightly better than \({\widehat{f}}_{cv}\). The computation time is close for both selection methods. We show the results for different values of \(\alpha \), which do not seem to influence the quality of the estimators (while the selected h, m, s are very different). During the simulation study, we noticed that the parameter \(\alpha \) is important in the sense that, when the value of \(\alpha \) does not have the same order as the values of \(\phi \), the estimation is harder. Except when \(T=300\), the signal-to-noise ratio, i.e. the standard deviation of the random effect divided by \(\sigma \), is larger than one, thus the settings are favourable. But for both the Gamma and mixed Gamma cases, the standard deviations are respectively 0.2 and 0.15, which is not small compared to \(\sigma \) or to their mean. The mixed Gamma case is difficult for nonparametric estimation: Fig. 3 illustrates the performances of the estimators for this choice. Finally, it is interesting to note that when the random effect of interest has a larger variance, the density estimation is easier, which is the case of the chosen Gamma density for example compared with the Gaussian case.

In the following, as the two kernel estimators seem very close, we only show the results for \({\widehat{f}}_{{\widehat{h}}}\), which is of interest here. Besides, in light of Table 1, we no longer investigate the previous estimator \({\widetilde{f}}_{{\widetilde{\widetilde{\tau }}}}\) for \(T<N\).

Fig. 1

Simulated data. In plain red (black) 25 estimators \({\widehat{f}}_{{\widehat{h}}}\) with parameters: \(N=240\), \(T=0.3, \delta =0.00015\), \(\sigma =0.0135\), \(\alpha =0.039\) and the true density f in plain bold black line. a f Gaussian, b f mixed Gaussian density, c f Gamma and d f mixed Gamma density

Fig. 2

Simulated data. In plain red (black) 25 estimators \({\widetilde{f}}_{{\widetilde{m}}, {\widetilde{s}}}\) with parameters: \(N=240\), \(T=0.3, \delta =0.00015\), \(\sigma =0.0135\), \(\alpha =0.039\) and the true density f in bold plain black line. a f Gaussian, b f mixed Gaussian, c f Gamma and d f mixed Gamma

Table 1 Empirical MISE computed from 100 simulated data sets, with \(N=200\), with various T, \(\delta \), \(\sigma \) and \(\alpha \), for two kernel estimators \({\widehat{f}}_{cv}\), \({\widehat{f}}_{{\widehat{h}}}\) and two deconvolution estimators \({\widetilde{f}}_{{\widetilde{\widetilde{\tau }}}}\) and \({\widetilde{f}}_{{\widetilde{m}}, {\widetilde{s}}}\)

We further compare \({\widehat{f}}_{{\widehat{h}}}\) and \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\). The two estimators seem close to the true density on the graphs, see Figs. 1 and 2. In Table 2, comparing the MISE, it is clear that the kernel estimator is the best. Furthermore, we can point out some differences. The first row of the Table corresponds to simulations with the parameters of the real dataset. In the first column, the Gaussian case, the MISE are 10 times larger than those for the other cases. This can be easily explained: the values of the estimated density are 10 times larger than the others. Nevertheless, on lines 3 and 4 for the Gaussian case, the MISE are very large. This is due to the bad estimation of the \(\phi _j\) by the \(Z_{j,T}\) with \(\sigma =0.05\) and \(T=0.3\). The quality of the estimation is significantly better when we try a \({\mathcal {N}}(0.278,0.2)\) (0.2 is the variance of the mixed Gaussian density we implement, for example). In general, one can notice that when \(\sigma \) is larger than the standard deviation of the density of the random effects f, the estimation is less precise, which is coherent in terms of signal-to-noise ratio.

Table 2 shows that increasing T improves the results for \(\sigma =0.05\); compare cases 2 and 5 with 4 and 7 for example. If J is large enough, meaning if \(\delta \) is small enough (which is the case even for \(J=150\) when \(T=0.3\)), the deconvolution estimator fits the density well. In practice, when T increases, the selected value of s decreases, which could have been predicted. The results are still satisfying for large T. For the kernel estimator, although the theoretical condition \(1/h^5<T^2\) is not satisfied, the numerical results are good.

Another point is, as expected, that the larger N is, the better the estimators \({\widehat{f}}_{{\widehat{h}}}\) and \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\) perform. We refer to Comte et al. (2013) for a study with different values of N. It highlights the influence of N when the estimated density has two modes; for example with \(N=50\) the estimation is clearly less precise than for \(N=200\).

A main difference between our two estimators \({\widehat{f}}_{{\widehat{h}}}\) and \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\) is the computation time: a few seconds for the first one and ten minutes for the second one. Thus the kernel estimator with the proposed method of bandwidth selection is very efficient, especially in the case of multi-modal densities, and often performs better than the deconvolution one.

Fig. 3

Simulated data. In bold plain black curve, the true density f mixed Gamma; the estimators \({\widehat{f}}_{cv}\) and \({\widehat{f}}_{{\widehat{h}}}\) (superposed) in plain green (grey); estimator \({\widetilde{f}}_{{\widetilde{\widetilde{\tau }}}}\) in dotted black (blue); and estimator \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\) in plain blue (black), with \(N=200\), \(T=50\), \(\delta =0.05\), \(\sigma =0.05\), \(\alpha =1\)

Table 2 Empirical MISE computed from 100 simulated data sets, with \(N=240\), \(\alpha =0.039\) and various T, \(\delta \), \(\sigma \) for the kernel estimator \({\widehat{f}}_{{\widehat{h}}}\) and the deconvolution estimator \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\)

6 Application to neuronal data

6.1 Dataset

We describe the data briefly and refer to Yu et al. (2004) and Lansky et al. (2006), for example, for details on data acquisition. The data are intracellular measurements of the membrane potential, in volts, along time, for one single neuron of a pig, between spikes. This is the depolarization phase. In this neuronal context, between the \((j-1)\)th and the jth spike, the depolarization of the membrane potential receiving a random input can be described by the Ornstein–Uhlenbeck model with one random effect (1). The spikes are not intrinsic to the model but are generated when the voltage reaches a certain threshold S for the first time; the process is then reset to a fixed initial voltage. Thus each trajectory is observed on an interval \([0,T_j]\) where \(T_j=\inf \{t>0, X_j(t) \ge S\}\). The initial voltage (the value following a spike) is assumed to be equal to the resting potential. The present dataset has been normalised to obtain N trajectories which begin at zero: \(x_j=0\).

The positive constant parameter \(\alpha \) is called the time constant of the neuron (the coefficient of decay in the exponential, when there is no noise), which is intrinsic to the neuron and fixed to \(\alpha =0.039 ~[\mathtt {s}]\) (Lansky et al. 2006). The diffusion coefficient \(\sigma ~[\mathtt{V/ \sqrt{s}}]\) has been estimated using the estimator \( {\widehat{\sigma ^2}}=(1/N)\sum _{j=1}^N \left( (1/J)\sum _{k=1}^J {((X_{j}(\delta (k+1))-X_j(\delta k))^2}/{\delta } \right) \). We obtain \({\hat{\sigma }}= 0.0135\), which is the same value as that used in Picchini et al. (2008). The \(\phi _j\) represent the local average input that the neuron receives during the jth inter-spike interval. We assume that \(\phi _j\) changes from one trajectory to another because of other neurons or the influence of the environment, for example. So the parameters \(\phi \) and \(\sigma \) characterize the input, while \(\alpha \), \(x_j\) (the resting potential) and S (the firing threshold) describe the neuron irrespective of the incoming signal (Picchini et al. 2008).
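For illustration, the quadratic-variation estimator of \(\sigma \) above can be sketched as follows (Python with NumPy; the data layout and function name are assumptions of ours).

```python
import numpy as np

def estimate_sigma(X, delta):
    """Quadratic-variation estimator of sigma from discrete observations.

    X : array of shape (N, J+1) with X[j, k] = X_j(k * delta).
    Implements sigma_hat^2 = (1/N) sum_j (1/J) sum_k (X_j((k+1)delta) - X_j(k delta))^2 / delta.
    """
    increments = np.diff(X, axis=1)                        # X_j((k+1)delta) - X_j(k delta)
    sigma2_hat = (increments ** 2 / delta).mean(axis=1).mean()
    return np.sqrt(sigma2_hat)
```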

Data are composed of \(N=312\) inter-spike trajectories. For each interval \([0,T_j]\) the time step is the same: \(\delta =0.00015 ~[\mathtt s]\). We decide to keep only realizations with at least 2000 observations \((T_j/\delta \ge 2000)\). Finally we have \(N=240\) realizations with \(J=2000\) observations and, for \(j=1,\ldots ,N\), \(T=T_j=0.3~[\mathtt s]\). The data are also normalized in order to begin at zero at the initial time. The study of the units of measurement sheds light on the collections given in Sect. 4. One can notice that the unit of measurement of u in the integrand must be \([\mathtt{s/V}]\) (same unit as \(1/Z_{j,\tau }\)) such that the exponential terms are dimensionless. The unit of s is \([\mathtt{\sqrt{s}/V}]\), and the choice of \({\mathcal {M}}\) with the same unit as u seems natural.

It is interesting to note that the normality of the \(Z_{j,T}\) is rejected by the Shapiro–Wilk test (p-value \(10^{-7}\)) and the Kolmogorov–Smirnov test (p-value \(10^{-3}\)). This suggests that the \(\phi _j\)'s are not Gaussian. Thus we want to estimate their density nonparametrically. In the following we compare our results to the estimation obtained in Picchini et al. (2010) under the parametric Gaussian assumption.

6.2 Comparison of estimators

The estimate of the density f obtained by Picchini et al. (2010) under the Gaussian assumption on these data, with \(\alpha =0.039\), is \({\mathcal {N}}(\mu =0.278, \eta ^2=(0.041)^2)\). Using a maximum-likelihood estimator on the \((Z_{j,T})\)'s we obtain 0.270 for the mean and 0.046 for the standard deviation. We notice that these two estimates are close to those of Picchini et al. (2010). We use our two nonparametric estimators to see how close to a Gaussian density they are.

In Fig. 4 we represent both estimators \({\widehat{f}}_{{\widehat{h}}}\) and \({\widetilde{f}}_{{\widetilde{m}},{\widetilde{s}}}\) applied to the real data, together with the density \({\mathcal {N}}(\mu , \eta ^2)\). The two estimates are close to each other, and close to the estimate of Picchini et al. (2010). However, it is also legitimate to consider a Gamma distribution to model the random parameters \(\phi _j\) because, as the data have been normalised, the estimated random effects are positive. It then seems reasonable to use a non-negative random variable to model this local average input. Thus, a Gamma distribution may seem more appropriate than a Gaussian distribution, even if the chosen Gaussian has a small probability of being negative. We look for the Gamma distribution with mean \(\mu =0.278\) and standard deviation \(\eta =0.041\). This corresponds to a Gamma distribution with shape parameter 46.3 and scale parameter 0.006. We notice the similarity between the previous Gaussian curve and the new one. Thus this distribution also seems suitable to fit the distribution of the \(\phi _j\)'s, as Fig. 4 shows.

The Gaussian assumption is strong and leads to tractable parametric models. The present work confirms that this approximation is acceptable. However, the nonparametric estimation gives a density for the \(\phi _j\)'s that can be used to simulate the random effect and could be closer to the true one.

Notice that, as mentioned in the introduction, the estimator of Comte et al. (2013) cannot handle small values of T, while our new proposals are successful in that case. Let us make this point precise. If the number of observation points is large enough, the variable \(Z_{j,T}\) can be approximated by:

$$\begin{aligned} Z_{j,T}\approx \left( X_j(T)-X_j(0) +\frac{\delta }{\alpha } \sum _{l=1}^J X_j((l-1)\delta ) \right) \frac{1}{T}. \end{aligned}$$

It is unchanged when we change the units from (V, s) to (mV, ms), and thus \(T=0.3\) to \(T=300\). However, the deconvolution estimator of Comte et al. (2013) for \(\tau =T\) changes when the value of T changes. In fact, the estimator is

$$\begin{aligned} {\widehat{f}}_{T}(x)= & {} \frac{1}{2\pi } \int _{-\sqrt{T}}^{\sqrt{T}} e^{-iux} \frac{1}{N} \sum _{j=1}^N e^{iuZ_{j,T}} e^{\frac{u^2\sigma ^2}{2T}}du\\\approx & {} \frac{1}{2\pi } \sum _{k=1}^{n_\text {pas}} \left( (u_{k+1}-u_k) e^{-iu_k x} \frac{1}{N} \sum _{j=1}^N e^{iu_k Z_{j,T}} e^{\frac{u_k^2\sigma ^2}{2T}}\right) \end{aligned}$$

and depending on whether \(T=0.3\) or \(T=300\), the values of u, and thus the integration interval and the three exponential terms, change. Hence the estimator \({\widehat{f}}_{T}\) changes with the units, and the integration interval is not large enough in the case \(T=0.3\) to give a good estimation.

To solve this problem, we have proposed the new estimator \({\widetilde{f}}_{{\widetilde{m}}, {\widetilde{s}}}\), which allows the user to work with the data in whatever units they prefer, without being obliged to change them.

One can also wonder whether the new estimators are robust when T increases. Indeed, our method works for larger T. Precisely, changing volts into millivolts and seconds into milliseconds implies \(T=300\), \(\sigma =0.426\), and on simulated data we adequately reconstruct the shape of the density.

Fig. 4

Real data. In bold green (grey) estimator \({\widehat{f}}_{{\widehat{h}}}\), in bold red (black) \({\widetilde{f}}_{{\widetilde{m}}, {\widetilde{s}}}\), the black dotted and bold line the density \({\mathcal {N}}(\mu ,\eta ^2)\) from Picchini et al. (2010) and the black dotted thin line the density \(\varGamma (46.3,0.006)\)

7 Discussion

In this work we study a stochastic differential Ornstein–Uhlenbeck mixed-effects model. We propose two estimators of the density of the random effect. Taken together, the two estimators are not very sensitive to the time of observation T: the kernel strategy corresponds to a context with large T, while the deconvolution estimator was built especially for small values of T. Both are data-driven and satisfy an oracle-type inequality. According to the numerical study, the kernel estimator seems to be the more efficient one: the numerical results are convincing and close to the ones obtained by cross-validation. Besides, we provide non-asymptotic theoretical results. Furthermore, we study neuronal data with a nonparametric estimation strategy. Instead of making any parametric assumption for the random effect distribution, we build an estimator of its density. Future work based on this estimation could be more precise and closer to the real neuronal data. To complete the study, the method could be extended to different times of observation \(T_j\). Besides, some goodness-of-fit tests could be developed; we refer to Bissantz et al. (2007), who construct confidence bands for an estimator of f in the ordinary smooth deconvolution problem.

The model can be completed by adding another random effect: the time constant of the neuron. Picchini and Ditlevsen (2011) have investigated this model in a parametric way and Dion and Genon-Catalot (2015) in a nonparametric way. A recent work by Delattre et al. (2016) assumes that the density of the random effect is a Gaussian mixture and uses a data clustering method, which is an interesting approach for the data described in the present paper. The question of a random effect in the diffusion coefficient is also open (see Delattre et al. 2015). Moreover, a model with a drift \(b(X_j(t))+\phi _j\), where b is a known function, can be treated with the same method. However, dealing with a diffusion coefficient \(\sigma (X_j(t))\), where \(\sigma \) is a known function, is a more complex problem.