1 Introduction

Longitudinal data are widely collected in clinical trials, epidemiology, pharmacokinetic/pharmacodynamic experiments and agriculture. Interest may focus both on population effects across individuals and on individual-specific behaviour. In mixed-effects models, random effects are incorporated to accommodate variability among subjects (inter-individual variability), while the same structural model rules the dynamics of each subject. In stochastic differential equations with mixed effects (SDEMEs), the structural model is a set of stochastic differential equations. The use of SDEMEs is comparatively recent. It was first motivated by pharmacological applications (see Ditlevsen and De Gaetano 2005; Donnet and Samson 2008; Delattre and Lavielle 2013; Leander et al. 2015; Forman and Picchini 2016 and many others) but also by neurobiological ones (as in Dion 2016 for example). The main issue in mixed-effects models is the estimation of the parameters of the distribution of the random effects. This is generally difficult in practice because the likelihood function is intractable. To overcome this problem, many methods based on approximate solutions combined with computationally intensive numerical schemes have been proposed (Picchini et al. 2010; Picchini and Ditlevsen 2011; Delattre and Lavielle 2013). For general mixed models as well as for SDEMEs, these methods, due to their iterative settings, lead to large computation times. Moreover, when the structural model is a stochastic differential equation, an additional problem arises from the intractability of the likelihood associated with discrete observations of one path.

The SDEME framework allows two sources of randomness to be taken into account in the structural SDE model, random effects in the drift and random effects in the diffusion coefficient, and the temptation to incorporate a joint distribution for these two random effects in the SDE model is quite natural. This modelling has been proposed by several authors: Picchini et al. (2010), Picchini and Ditlevsen (2011), Berglund et al. (2001), Forman and Picchini (2016), Whitaker et al. (2017). But to our knowledge, it has never been studied theoretically. However, it is well known that, for discretely observed diffusion processes, estimation of parameters in the drift coefficient and in the diffusion coefficient has different properties (see e.g. Kessler et al. 2012). Our aim here is to investigate the statistical properties of such a model as functions of n, the number of observations per individual (or path), and N, the total number of individuals. The results of Nie and Yang (2005), Nie (2006, 2007) do not provide answers to these questions, and the numerical methods made necessary by the intractability of the likelihood are unable to tackle this problem. Understanding how estimation performs in these kinds of models is an important issue, necessary to clarify what can be expected from the numerical methods widely used in this domain. Since the work of Ditlevsen and De Gaetano (2005), where the special case of a mixed-effects Brownian motion with drift is treated, the main contributions to our knowledge to the theoretical study of parametric inference in SDEMEs are those of Delattre et al. (2013, 2015, 2017) and Grosse Ruse et al. (2017).

In this paper, we consider discrete observations and simultaneous random effects in the drift and diffusion coefficients, linked by means of a joint parametric distribution. The inclusion of random effects in both the drift and the diffusion coefficient raises new problems which were not addressed in our previous works. We focus on specific distributions for the random effects. They derive from a Bayesian choice of distributions and lead to explicit approximate likelihoods (this choice indeed corresponds to explicit posterior and marginal distributions for an n-sample of Gaussian distributions with a specific prior distribution on the parameters).

More precisely, we consider N real valued stochastic processes \((X_i(t), t \ge 0)\), \(i=1, \ldots ,N\), with dynamics ruled by the following random effects stochastic differential equation (SDE):

$$\begin{aligned} d\, X_i(t) = \Phi _i' b(X_i(t))dt+\Psi _i \sigma (X_i(t))\, dW_{i}(t),\quad X_{i}(0) =x,\, i=1, \ldots ,N , \end{aligned}$$
(1)

where \((W_1, \ldots , W_N)\) are N independent Wiener processes, \(((\Phi _i, \Psi _i), i=1, \ldots N)\) are N i.i.d. \({{\mathbb {R}}}^d\times (0,+\infty )\)-valued random variables, \(((\Phi _i, \Psi _i), i=1, \ldots ,N)\) and \((W_1, \ldots , W_N)\) are independent, and x is a known real value. The functions \(\sigma (.):{{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\) and \( b(.)=(b_1(.),\ldots , b_d(.))' :{{\mathbb {R}}}\rightarrow {{\mathbb {R}}}^d\) are known. The notation \(X'\) for a vector or a matrix X denotes the transpose of X. Each process \((X_i(t))\) represents an individual and the \((d+1)\)-dimensional random vector \((\Phi _i, \Psi _i)\) represents the (multivariate) random effect of individual i. In Delattre et al. (2013), the N processes are assumed to be continuously observed throughout a fixed time interval [0, T], \(T>0\), and \(\Psi _i= \gamma ^{-1/2}\) is non-random and known. When \(\Phi _i\) follows a Gaussian distribution \({{\mathcal {N}}}_d({\varvec{\mu }}, \gamma ^{-1}{\varvec{\Omega }})\), the exact likelihood associated with \((X_i(t), t\in [0,T], i=1, \ldots ,N)\) is explicitly computed and asymptotic properties of the exact maximum likelihood estimators are derived under the asymptotic framework \(N\rightarrow +\infty \). In Delattre et al. (2015), the case of \(b(.)=0\) and \(\Psi _i= \Gamma _i^{-1/2}\) with \(\Gamma _i\) following a Gamma distribution \(G(a, \lambda )\) is investigated, and Delattre et al. (2017) is concerned with the estimation of mixed effects either in the drift or in the diffusion coefficient from discrete observations. From now on, each process \((X_i(t))\) is discretely observed on a fixed-length time interval [0, T] at n times \(t_j=jT/n\), and the random effects \((\Phi _i, \Psi _i)\) follow a joint parametric distribution. Our aim is to estimate the unknown parameters from the observations \((X_i(t_j), j=1, \ldots ,n; i=1, \ldots , N)\). We focus on distributions that give rise to explicit approximations of the likelihood functions, so that the construction of estimators is easy and their asymptotic study feasible. To this end, we assume that \((\Phi _i,\Psi _i)\) has the following distribution:

$$\begin{aligned} \Psi _i=\frac{1}{\Gamma _i^{1/2}}, \quad \Gamma _i \sim G(a, \lambda ), \text{ and } \text{ given } \Gamma _i=\gamma , \quad \Phi _i \sim {{\mathcal {N}}}_d({\varvec{\mu }},\gamma ^{-1} {\varvec{\Omega }}). \end{aligned}$$
(2)

Contrary to the common Gaussian assumption for random effects, the marginal distribution of \(\Phi _i\) is not Gaussian: \(\Phi _i -{\varvec{\mu }}= \Gamma _i^{-1/2} \eta _i\), with \(\eta _i \sim {{\mathcal {N}}}_d(0, {\varvec{\Omega }})\) independent of \(\Gamma _i\), follows a (multivariate) Student-type distribution. We propose two distinct approximate likelihood functions which yield asymptotically equivalent estimators as both N and the number n of observations per individual tend to infinity. The first approximate likelihood (Method 1) is natural, but proving its existence raises technical difficulties (Proposition 1, Lemma 1). The second one (Method 2) derives from Method 1 and has the advantage of splitting the estimation of the parameters in the diffusion coefficient from that of the parameters in the drift term. We obtain consistent and \(\sqrt{N}\)-asymptotically Gaussian estimators for all parameters under the condition \(N/n \rightarrow 0\). For the parameters \((\lambda ,a)\) of the random effects in the diffusion coefficient, the weaker constraint \(N/n^2 \rightarrow 0\) is enough (Theorems 1, 2). We prove that the estimators obtained with these two approximate likelihoods are asymptotically equivalent (Theorem 3). We compare these results with the estimation based on N i.i.d. direct observations of the random effects \((\Phi _i, \Psi _i)\).
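As an illustration, the following minimal Python sketch draws an i.i.d. sample from distribution (2); the function name and numerical values are ours and purely illustrative (note that numpy's Gamma generator uses a scale parameter, hence scale \(=1/\lambda \)).

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_random_effects(N, a, lam, mu, Omega, rng=rng):
    """Draw (Gamma_i, Phi_i), i = 1, ..., N, from distribution (2):
    Gamma_i ~ G(a, lambda) and, given Gamma_i = gamma, Phi_i ~ N_d(mu, Omega / gamma)."""
    # G(a, lambda) with rate lambda corresponds to numpy's scale = 1 / lambda
    gamma = rng.gamma(shape=a, scale=1.0 / lam, size=N)
    phi = np.array([rng.multivariate_normal(mu, Omega / g) for g in gamma])
    return gamma, phi

# illustrative values; the marginal of Phi_i - mu is then a (scaled) Student distribution
gamma, phi = draw_random_effects(10_000, a=8.0, lam=2.0,
                                 mu=np.array([1.0]), Omega=np.array([[0.5]]))
```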

The methods of the present paper and of the previous ones (Delattre et al. 2013, 2015, 2017) are now available in the R package MsdeParEst (Delattre and Dion 2017) for mixed Ornstein–Uhlenbeck and CIR processes.

The structure of the paper is the following. Some working assumptions and two approximations of the model likelihood are introduced in Sect. 2. Two estimation methods are derived from these approximate likelihoods in Sect. 3, and their respective asymptotic properties are studied. Section 4 provides numerical simulation results for several examples of SDEMEs and illustrates the performances of the proposed methods in practice. Theoretical proofs are gathered in the “Appendix”. Some technical proofs are given in an electronic supplementary material. Auxiliary results are given in Sect. 7.

2 Approximate likelihoods

2.1 Framework and assumptions

Let \((X_i(t), t \ge 0)\), \(i=1, \ldots ,N\) be N real valued stochastic processes ruled by (1). The processes \((W_1, \ldots , W_N)\) and the r.v.’s \((\Phi _i,\Psi _i), i=1, \ldots ,N\) are defined on a common probability space \((\Omega , {{\mathcal {F}}}, {{\mathbb {P}}})\) and we set \(({{\mathcal {F}}}_t=\sigma ( \Phi _i, \Psi _i, W_i(s), s\le t, i=1, \ldots ,N), t\ge 0)\). We assume that:

  1. (H1)

    The real-valued functions \(x \rightarrow b_j(x)\), \(j=1, \ldots , d\), and \(x \rightarrow \sigma (x)\) are \(C^2\) on \({{\mathbb {R}}}\), with first and second derivatives bounded by L. The function \(\sigma ^2(.)\) is bounded from below: \(\exists \; \sigma _0\ne 0, \forall x\in {{\mathbb {R}}}, \sigma ^2(x)\ge \sigma ^2_0.\)

  2. (H2)

    There exists a constant K such that, \(\forall x \in {{\mathbb {R}}}\), \(\Vert b(x)\Vert +|\sigma (x)|\le K\).

Assumption (H1) corresponds to the usual linear growth condition and regularity assumptions on b(.) and \(\sigma (.)\), which ensure the existence and uniqueness of strong solutions of Eq. (1). However, for the accurate study of discretizations (see Sect. 7.2), we need here that \(X_i(t)\) has finite moments. For stochastic differential equations, the property \(E(X_i^{2p}(t)) <\infty \) holds if the initial condition satisfies \( E(X_i^{2p}(0))<\infty \). This no longer holds here, even if \( E(||\Phi _i||^{2p}+ \Psi _i^{2p})<\infty \) (see Sect. 7.1). We have either to assume that \((\Phi _i,\Psi _i)\) belongs to a bounded set or to assume (H2). To circumvent this last assumption, one could think of applying a localization device (see Jacod and Protter 2012, Chapter 4.4.1). However, while it is straightforward to apply this method to one path, the extension to N paths \(X_i(.)\) simultaneously is complex, especially as \(X_i(t)\) possibly has no finite moments (see Sect. 7.1).

The processes \((X_i(t))\) are discretely observed with sampling interval \(\Delta _n\) on a fixed time interval [0, T] and, for the sake of clarity, we assume regular sampling on [0, T]:

$$\begin{aligned} \Delta _n=\Delta =\frac{T}{n}, \; X_{i,n}=X_i=(X_i(t_{j,n}),\;\;t_{j,n}=t_j=jT/n, \;\; j=1, \ldots ,n). \end{aligned}$$
(3)

Let \(\vartheta \) denote the unknown parameter and \(\Theta \) the parameter set with

$$\begin{aligned} \vartheta =(\lambda ,a, {\varvec{\mu }},{\varvec{\Omega }}). \end{aligned}$$
(4)

The canonical space associated with one trajectory on [0, T] is defined by \((({{\mathbb {R}}}^d\times (0,+\infty )\times C_T), P_\vartheta )\) where \(C_T\) denotes the space of real valued continuous functions on [0, T], \(P_\vartheta \) denotes the distribution of \((\Phi _i,\Psi _i, (X_i(t), t\in [0,T]))\) and \(\vartheta \) the unknown parameter. For the N trajectories, the canonical space is \(\prod _{i=1}^N (({{\mathbb {R}}}^d\times (0,+\infty )\times C_T),{{\mathbb {P}}}_\vartheta = \otimes _{i=1}^N P_\vartheta )\). Below, the true value of the parameter is denoted \(\vartheta _0\).

We introduce the following statistics and assumptions:

$$\begin{aligned} S_{i,n}= & {} \frac{1}{\Delta }\sum _{j=1}^{n} \frac{\left( X_i(t_{j})-X_i(t_{j-1})\right) ^2}{\sigma ^2(X_i(t_{j-1}))},\nonumber \\ V_{i,n}= & {} \left( \sum _{j=1}^{n}\Delta \frac{b_k(X_i(t_{j-1}))b_\ell (X_i(t_{j-1}))}{ \sigma ^2(X_i(t_{j-1}))}\right) _{1\le k,\ell \le d}, \end{aligned}$$
(5)
$$\begin{aligned} U_{i,n}= & {} \left( \sum _{j=1}^{n} \frac{b_k(X_i(t_{j-1}))(X_i(t_{j})-X_i(t_{j-1}))}{ \sigma ^2(X_i(t_{j-1}))}\right) _{1\le k\le d}. \end{aligned}$$
(6)
  1. (H3)

    The matrix \(V_i(T)\) is positive definite a.s., where

    $$\begin{aligned} V_i(T)= \left( \int _0^T \frac{b_k(X_i(s))b_\ell (X_i(s))}{ \sigma ^2(X_i(s))}ds\right) _{1\le k,\ell \le d}. \end{aligned}$$
    (7)
  2. (H4)

    The parameter set \(\Theta \) satisfies that, for constants \(\ell _0, \ell _1, \alpha _0, \alpha _1 ,m,c_0,c_1,\) \(0< \ell _0\le \lambda \le \ell _1,\; 0<\alpha _0 \le a\le \alpha _1,\; \Vert {\varvec{\mu }}\Vert \le m,\; c_0\le \lambda _{max}({\varvec{\Omega }})\le c_1,\) where \(\lambda _{max}({\varvec{\Omega }})\) denotes the maximal eigenvalue of \({\varvec{\Omega }}\).

Assumption (H3) ensures that all the components of \(\Phi _i\) can be estimated. If the functions \((b_k/\sigma ^2)\) are not linearly independent, the dimension of \(\Phi _i\) is not well defined and (H3) is not fulfilled. Note that, as n tends to infinity, the matrix \(V_{i,n}\) defined in (5) converges a.s. to \(V_i(T)\), so that, under (H3), \(V_{i,n}\) is positive definite for n large enough. Assumption (H4) is classical in a parametric setting. Under (H4), the matrix \({\varvec{\Omega }}\) may be non-invertible, which allows the inclusion of non-random effects in the drift term.
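For concreteness, the statistics (5)–(6) can be computed from one discretely observed path as in the following sketch, with user-supplied drift and diffusion functions (the helper name is ours):

```python
import numpy as np

def sufficient_statistics(x, delta, b, sigma):
    """Compute S_{i,n}, V_{i,n}, U_{i,n} of (5)-(6) for one discretely observed
    path x = (X_i(t_0), ..., X_i(t_n)) with sampling step delta = T / n.
    b(x) must return a vector of length d, sigma(x) a scalar."""
    x_prev, dx = x[:-1], np.diff(x)               # X_i(t_{j-1}) and increments
    s2 = np.array([sigma(v) ** 2 for v in x_prev])
    B = np.array([b(v) for v in x_prev])          # shape (n, d)
    S = np.sum(dx ** 2 / s2) / delta              # (5), first statistic
    V = delta * (B.T / s2) @ B                    # (5), d x d matrix
    U = (B.T / s2) @ dx                           # (6), d-vector
    return S, V, U
```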

2.2 First approximation of the likelihood

Let us now compute our first approximate likelihood \({{\mathcal {L}}}_n(X_{i,n},\vartheta )\) of \(X_{i,n}\), with \(\vartheta \) defined in (4). The exact likelihood of the i-th vector of observations is obtained by first computing the conditional likelihood given \(\Phi _i=\varphi , \Psi _i=\psi \), and then integrating the result with respect to the joint distribution of \((\Phi _i, \Psi _i)\). The conditional likelihood given fixed values \((\varphi , \psi )\), i.e. the likelihood of \((X_i^{\varphi , \psi }(t_j), j=1, \ldots ,n)\), is not explicit; it is approximated by the likelihood of the Euler scheme of

$$\begin{aligned} d\, X_i^{\varphi , \psi }(t) = \varphi ' b(X_i^{\varphi ,\psi }(t))dt+\psi \sigma (X_i^{\varphi ,\psi }(t))\, dW(t),\quad X_i^{\varphi ,\psi }(0) =x. \end{aligned}$$
(8)

Setting \(\psi = \gamma ^{-1/2}\), the exact likelihood of \((X_i^{\varphi , \psi }(t_j), j=1, \ldots ,n)\) is replaced by the likelihood of \((Y_{i,j},j=1, \ldots ,n)\):

$$\begin{aligned} Y_{i,j}-Y_{i,j-1}= \Delta \varphi ' b(Y_{i,j-1}) + \sqrt{\Delta }\; \psi \; \sigma (Y_{i,j-1})\; \epsilon _{i,j}, \end{aligned}$$

with \(Y_{i,0}=x\) and \(\epsilon _{i,j}= \frac{W_i(t_j)-W_i(t_{j-1})}{\sqrt{\Delta }}\) i.i.d. \(\mathcal{N}(0,1)\). Therefore, this yields the approximate conditional likelihood:

$$\begin{aligned} L_n(X_{i,n},\gamma , \varphi )= L_n(X_{i},\gamma ,\varphi )= \gamma ^{n/2} \exp {\left[ -\frac{\gamma }{2}(S_{i,n} +\varphi ' V_{i,n} \varphi -2\varphi ' U_{i,n})\right] }, \end{aligned}$$
(9)

where this formula ignores multiplicative functions which do not contain the unknown parameters.
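In terms of the statistics (5)–(6), the approximate conditional log-likelihood deduced from (9) can be evaluated as in the following short sketch (the function name is ours and the additive constants mentioned above are dropped):

```python
import numpy as np

def euler_conditional_loglik(n, S, V, U, gamma, phi):
    """log L_n(X_i, gamma, phi) of (9), up to an additive term that does not
    depend on (gamma, phi); S, V, U are the statistics (5)-(6)."""
    quad = S + phi @ V @ phi - 2.0 * phi @ U
    return 0.5 * n * np.log(gamma) - 0.5 * gamma * quad
```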

The unconditional approximate likelihood is obtained by integrating with respect to the joint distribution \(\nu _{\vartheta }(d\gamma ,d\varphi )\) of the random effects \((\Gamma _i=\Psi _i^{-2},\Phi _i)\).

For this, we first integrate \(L_n(X_i, \gamma ,\varphi )\) with respect to the Gaussian distribution \({{\mathcal {N}}}_d({\varvec{\mu }},\gamma ^{-1}{\varvec{\Omega }})\). Then, we integrate the result w.r.t. the distribution of \(\Gamma _i\). At this point, a difficulty arises. This second integration is only possible on the subset \(E_{i,n}(\vartheta )\) defined in (12).

Let \(I_d\) denote the identity matrix of \({{\mathbb {R}}}^d\) and set, for \(i=1, \ldots ,N\), under (H3)

$$\begin{aligned} R_{i,n}= & {} V_{i,n}^{-1}+ {\varvec{\Omega }}= V_{i,n}^{-1}(I_d + V_{i,n}{\varvec{\Omega }})= (I_d + {\varvec{\Omega }}V_{i,n})V_{i,n}^{-1}, \end{aligned}$$
(10)
$$\begin{aligned} T_{i,n}({\varvec{\mu }}, {\varvec{\Omega }})= & {} ({\varvec{\mu }}- V_{i,n}^{-1}U_{i,n})'R_{i,n}^{-1}({\varvec{\mu }}- V_{i,n}^{-1}U_{i,n})- U_{i,n}'V_{i,n}^{-1}U_{i,n}, \end{aligned}$$
(11)
$$\begin{aligned} E_{i,n}(\vartheta )= & {} \{S_{i,n}+ T_{i,n} ( {\varvec{\mu }}, {\varvec{\Omega }})>0\};\;\; \mathbf E _{N,n}(\vartheta )= \cap _{i=1}^N E_{i,n}(\vartheta ). \end{aligned}$$
(12)

Proposition 1

Assume that, for \(i=1, \ldots ,N\), \((\Phi _i, \Psi _i)\) has distribution (2). Under (H1) and (H3), an explicit approximate likelihood for the observations \((X_{i,n}, i=1, \ldots ,N)\) is given, on the set \(\displaystyle \mathbf E _{N,n}(\vartheta )\) (see 12), by

$$\begin{aligned} {{\mathcal {L}}}_{N,n}(\vartheta )= & {} \prod _{i=1}^N {{\mathcal {L}}}_n(X_{i,n}, \vartheta ),\quad \text{ where } \end{aligned}$$
(13)
$$\begin{aligned} {{\mathcal {L}}}_n(X_{i,n}, \vartheta )= & {} \frac{\lambda ^a \Gamma (a+(n/2)) }{\Gamma (a)\left( \lambda + \frac{S_{i,n}}{2}+ \frac{T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})}{2}\right) ^{a+(n/2)} }\frac{1}{ (det(I_d+ V_{i,n}{\varvec{\Omega }}))^{1/2}} .\quad \end{aligned}$$
(14)
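A direct (non-optimized) sketch of the computation of \(\log {{\mathcal {L}}}_n(X_{i,n},\vartheta )\) from the statistics (5)–(6) and the quantities (10)–(12) could read as follows; the function name is ours and, as above, multiplicative terms free of the parameters are dropped.

```python
import numpy as np
from scipy.special import gammaln

def log_lik_method1(n, S, V, U, a, lam, mu, Omega):
    """log of the approximate likelihood (14) for one path, up to additive
    terms that do not depend on the parameters."""
    d = len(mu)
    Vinv_U = np.linalg.solve(V, U)
    R = np.linalg.inv(V) + Omega                         # (10)
    diff = mu - Vinv_U
    T = diff @ np.linalg.solve(R, diff) - U @ Vinv_U     # (11)
    arg = lam + 0.5 * S + 0.5 * T
    if arg <= 0:   # (14) is only defined when this argument is positive (cf. E_{i,n}(theta) in (12))
        return -np.inf
    return (a * np.log(lam) + gammaln(a + 0.5 * n) - gammaln(a)
            - (a + 0.5 * n) * np.log(arg)
            - 0.5 * np.linalg.slogdet(np.eye(d) + V @ Omega)[1])
```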

Note that this approximate likelihood also ignores multiplicative functions which do not contain the unknown parameters. We must now deal with the set \(\displaystyle \mathbf E _{N,n}(\vartheta )\). For each i, elementary properties of quadratic variations yield that, as n tends to infinity, \(S_{i,n}/n\) tends to \(\Gamma _i^{-1}\) in probability. On the other hand, the random matrix \(V_{i,n}\) tends a.s. to the integral \(V_i(T)\) and the random vector \(U_{i,n}\) tends in probability to the stochastic integral

$$\begin{aligned} U_i(T)= \left( \int _0^T \frac{b_k(X_i(s))}{ \sigma ^2(X_i(s))}dX_i(s)\right) _{1\le k\le d}. \end{aligned}$$
(15)

Therefore \( T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})/n\) tends to 0 (see 11). This implies that, for all \(i=1, \ldots , N\) and for all \((\vartheta _0, \vartheta )\), \({{\mathbb {P}}}_{\vartheta _0}(E_{i,n}(\vartheta ))\rightarrow 1\) as n tends to infinity. However, we need the more precise result

$$\begin{aligned} \forall \vartheta ,\vartheta _0, \quad {{\mathbb {P}}}_{\vartheta _0}( \mathbf E _{N,n}(\vartheta ))\rightarrow 1. \end{aligned}$$

Moreover, the set on which the approximate likelihood is considered should not depend on \(\vartheta \).

To this end, let us define, for \(\alpha >0\), using the notations of (H4),

$$\begin{aligned} M_{i,n}= & {} \max \{ c_1+2,2 m^2 \} ( 1+\Vert U_{i,n}\Vert ^2 );\nonumber \\ F_{i,n}= & {} \{S_{i,n}- M_{i,n} \ge \alpha \sqrt{n}\} ; \;\; \mathbf F _{N,n}= \cap _{i=1}^N F_{i,n}. \end{aligned}$$
(16)

Lemma 1

Assume (H1)–(H4). For all \(\vartheta \) satisfying (H4) and all i, we have \( F_{i,n} \subset E_{i,n}(\vartheta )\). If \(a_0>4\) and if, as \(N,n \rightarrow \infty \), \(N=N(n)\) is such that \(N/n^{2} \rightarrow 0\), then,

$$\begin{aligned} \forall \vartheta _0, \quad {{\mathbb {P}}}_{\vartheta _0} (\mathbf F _{N,n})\rightarrow 1. \end{aligned}$$

For this, we prove that, if \(a_0>4\),

$$\begin{aligned} {{\mathbb {P}}}_{\vartheta _0}( F_{1,n}^c)\lesssim \frac{1}{n^2}, \end{aligned}$$
(17)

where \(\lesssim \) means smaller than or equal to, up to a known multiplicative constant that does not depend on n. This explains the constraint \(N/n^2\rightarrow 0\).

For this proof, the condition \(c_0\le \lambda _{max}({\varvec{\Omega }})\) of (H4) is not required.

The condition \(a_0>4\) is a moment condition on \(\Psi _i\) required by the proof. It is equivalent to \({{\mathbb {E}}} \Psi _i^8={{\mathbb {E}}} \Gamma _i^{-4}= \lambda _0^4/((a_0-1)(a_0-2)(a_0-3)(a_0-4))<+\infty \). Note that \({{\mathbb {E}}} \Psi _i^2={{\mathbb {E}}} \Gamma _i^{-1}=\lambda _0/(a_0-1)\), so that the random effect takes moderate values, which is reasonable.
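These inverse moments of the \(G(a_0,\lambda _0)\) distribution are easily checked numerically, for instance by Monte Carlo with illustrative values of \((a_0,\lambda _0)\):

```python
import numpy as np

rng = np.random.default_rng(1)
a0, lam0 = 8.0, 2.0                       # illustrative values with a0 > 4
g = rng.gamma(shape=a0, scale=1.0 / lam0, size=10**6)

# E[Gamma^{-1}] = lambda0 / (a0 - 1) and
# E[Gamma^{-4}] = lambda0^4 / ((a0-1)(a0-2)(a0-3)(a0-4))
print(np.mean(1.0 / g), lam0 / (a0 - 1))
print(np.mean(g ** -4.0),
      lam0 ** 4 / ((a0 - 1) * (a0 - 2) * (a0 - 3) * (a0 - 4)))
```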

As a consequence of Lemma 1, for all \(\vartheta \), \({{\mathbb {P}}}_{\vartheta _0}(\mathbf E _{N,n}(\vartheta ))\rightarrow 1\) and (13) is well defined on the set \(\mathbf F _{N,n}\), which does not depend on \(\vartheta \) and has probability tending to 1. The proof of Lemma 1 is surprisingly technical and is detailed in the electronic supplementary material.

2.3 Second approximation of the likelihood

Formulae (13)–(14) suggest another, simpler approximation of the likelihood. We give the heuristics for this approximation. We can write:

$$\begin{aligned} {{\mathcal {L}}}_n(X_{i,n}, \vartheta )= & {} {{\mathcal {L}}}_n^{(1)}(X_{i,n}, \vartheta )\times {{\mathcal {L}}}_n^{(2)}(X_{i,n}, \vartheta ) \nonumber \\ {{\mathcal {L}}}_n^{(1)}(X_{i,n}, \vartheta )= & {} {{\mathcal {L}}}_n^{(1)}(X_{i,n}, \lambda ,a)= \frac{\lambda ^a \Gamma (a+(n/2)) }{\Gamma (a)\left( \lambda + \frac{S_{i,n}}{2}\right) ^{a+(n/2)} } \nonumber \\ {{\mathcal {L}}}_n^{(2)}(X_{i,n}, \vartheta )= & {} \frac{\left( \lambda + \frac{S_{i,n}}{2}\right) ^{a+(n/2)}}{\left( \lambda + \frac{S_{i,n}}{2}+ \frac{T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})}{2}\right) ^{a+(n/2)} } \frac{1}{(det(I_d+ V_{i,n}{\varvec{\Omega }}))^{1/2}} . \end{aligned}$$
(18)

The first term \({{\mathcal {L}}}_n^{(1)}(X_{i,n}, \vartheta )\) only depends on \((\lambda ,a)\) and is equal to the approximate likelihood function obtained in Delattre et al. (2015) for \(b\equiv 0\) and \(\Gamma _i\sim G(a,\lambda )\). For the second term, we have:

$$\begin{aligned} \log {{{\mathcal {L}}}_n^{(2)}(X_{i,n}, \vartheta )}= & {} -\left( a+\frac{n}{2}\right) \log {\left( 1+ \frac{1}{n} \frac{T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})}{\left( 2\frac{\lambda }{n} + \frac{S_{i,n}}{n}\right) }\right) }\\&- \frac{1}{2}\log {(det(I_d+ V_{i,n}{\varvec{\Omega }})}). \end{aligned}$$

Since, as n tends to infinity, \(T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})\) tends to a fixed limit and \(S_{i,n}/n\) tends to \(\Gamma _i^{-1}\), this yields:

$$\begin{aligned} \log {{{\mathcal {L}}}_n^{(2)}(X_{i,n}, \vartheta )}\sim & {} - \frac{\left( a+\frac{n}{2}\right) }{n} \frac{T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})}{\left( 2\frac{\lambda }{n} + \frac{S_{i,n}}{n}\right) }- \frac{1}{2}\log {(det(I_d+ V_{i,n}{\varvec{\Omega }})})\\\sim & {} - \frac{n}{2 \,S_{i,n}}T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})- \frac{1}{2}\log {(det(I_d+ V_{i,n}{\varvec{\Omega }})}). \end{aligned}$$

Now, the above expression only depends on \(({\varvec{\mu }}, {\varvec{\Omega }})\). Moreover, as the term \(U_{i,n}'V_{i,n}^{-1}U_{i,n} \) in \(T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})\) does not contain parameters, we can drop it and set:

$$\begin{aligned} {{\varvec{V}}}_n(X_{i,n}, \vartheta )= & {} {{\varvec{V}}}_n^{(1)}(X_{i,n}, \lambda ,a)+{{\varvec{V}}}_n^{(2)}(X_{i,n}, {\varvec{\mu }}, {\varvec{\Omega }}) \quad \text{ with }\nonumber \\ {{\varvec{V}}}_n^{(1)}(X_{i,n}, \lambda ,a)= & {} a\log {\lambda } - \log {\Gamma (a) }- \left( a+\frac{n}{2}\right) \log {\left( \lambda + \frac{S_{i,n}}{2}\right) }\nonumber \\&+\, \log { \Gamma (a+(n/2)) },\nonumber \\ {{\varvec{V}}}_n^{(2)}(X_{i,n}, {\varvec{\mu }}, {\varvec{\Omega }})= & {} - \frac{n}{2 \,S_{i,n}} ({{\varvec{\mu }}}- V_{i,n}^{-1}U_{i,n})'R_{i,n}^{-1}({{\varvec{\mu }}}- V_{i,n}^{-1}U_{i,n})\nonumber \\&-\, \frac{1}{2}\log {(det(I_d+ V_{i,n}{\varvec{\Omega }})}), \end{aligned}$$
(19)

and define the second approximation for the log-likelihood: \({{\varvec{V}}}_{N,n}( \vartheta )= \sum _{i=1}^N {{\varvec{V}}}_n(X_{i,n}, \vartheta ) \). Thus, estimators of \((\lambda ,a)\) and \(({\varvec{\mu }}, {\varvec{\Omega }})\) can be computed separately. It is noteworthy that this second approximation overcomes the difficulties encountered in Lemma 1.
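The two terms of (19) can be evaluated separately from the per-path statistics, for instance as in the sketch below (function names ours):

```python
import numpy as np
from scipy.special import gammaln

def V1(n, S, a, lam):
    """V_n^{(1)}(X_i, lambda, a) of (19), a function of (lambda, a) only."""
    return (a * np.log(lam) - gammaln(a) + gammaln(a + 0.5 * n)
            - (a + 0.5 * n) * np.log(lam + 0.5 * S))

def V2(n, S, V, U, mu, Omega):
    """V_n^{(2)}(X_i, mu, Omega) of (19), a function of (mu, Omega) only."""
    d = len(mu)
    diff = mu - np.linalg.solve(V, U)
    R = np.linalg.inv(V) + Omega                 # R_{i,n} of (10)
    return (-0.5 * (n / S) * (diff @ np.linalg.solve(R, diff))
            - 0.5 * np.linalg.slogdet(np.eye(d) + V @ Omega)[1])
```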

3 Asymptotic properties of estimators

In this section, we study the asymptotic behaviour of the estimators based on the two approximate likelihood functions of the previous section. To serve as a baseline, the estimation when an i.i.d. sample \((\Phi _i,\Gamma _i), i=1, \ldots , N\) is observed is presented in Sect. 7.4.

3.1 Estimation based on the first approximation of the likelihood

We study the estimation of \(\vartheta =(\lambda ,a, {\varvec{\mu }}, {\varvec{\Omega }})\) using the approximate likelihood \({{\mathcal {L}}}_{N,n}( \vartheta )\) given in Proposition 1 on the set \(\mathbf {F}_{N,n}\) studied in Lemma 1 (see 13, 14, 16). Let

$$\begin{aligned} Z_{i,n}(a,\lambda , {\varvec{\mu }}, {\varvec{\Omega }})=Z_i= \frac{ \lambda + (S_{i,n}/2)+(T_{i,n}( {\varvec{\mu }}, {\varvec{\Omega }})/2)}{a+n/2}. \end{aligned}$$
(20)

By (H4), using notations (16), we have \(\lambda + S_{i,n}/2 + T_{i,n}({\varvec{\mu }}, {\varvec{\Omega }})/2 \ge (S_{i,n} -M_{i,n})/2\). We refer to the supplementary material (proof of Lemma 1), where we give a proof of the nontrivial property that, if \(N/n^2\) tends to 0, with probability tending to 1, \(T_{i,n}({\varvec{\mu }}, {\varvec{\Omega }})\ge -M_{i,n}\) for all \(i=1, \ldots , N\).

More precisely, on the set \( F_{i,n}\), \(Z_{i,n}> \alpha / (\sqrt{n}+ 2(\alpha _1/\sqrt{n}))>0\) where \(\alpha _1\) is defined in (H4). Instead of considering \(\log {{{\mathcal {L}}}_n(X_i, \vartheta )}\) on \(F_{i,n}\), to define a contrast, we replace \(\log Z_{i,n} \) by \(1_{F_{i,n}}\log Z_{i,n}\) and set \(\mathbf U _{N,n}(\vartheta )= \sum _{i=1}^N \mathbf U _n(X_i, \vartheta ),\) with

$$\begin{aligned} \mathbf U _n(X_{i,n}, \vartheta )= & {} a\log \lambda -\log \Gamma (a) +\log \Gamma (a+(n/2))-(a+(n/2))\log (a+(n/2))\\&-\,\frac{1}{2} \log {det(I_d+ V_{i,n}{\varvec{\Omega }})} - (a+(n/2))1_{F_{i,n}}\log Z_{i,n} . \end{aligned}$$

Define the pseudo-score function and the associated estimators \({\tilde{\vartheta }}_{N,n}\):

$$\begin{aligned} {{\mathcal {G}}}_{N,n}(\vartheta ) =\nabla _{\vartheta }{} \mathbf U _{N,n}(\vartheta ), \quad {{\mathcal {G}}}_{N,n}({\tilde{\vartheta }}_{N,n})=0. \end{aligned}$$
(21)

To investigate their asymptotic behaviour, we need to prove that \(Z_{i,n}\) defined in (20) is close to \(\Gamma _i^{-1}\) and that \(1_{F_{i,n}}Z_{i,n}^{-1}\) is close to \(\Gamma _i\). For this, we introduce the random variable

$$\begin{aligned} S_{i,n}^{(1)}=S_{i}^{(1)}=\Psi _i^2 \sum _{j=1}^{n}\frac{(W_i(t_{j})-W_i(t_{j-1}))^2}{\Delta }=\frac{1}{ \Gamma _i} C_{i,n}^{(1)} \end{aligned}$$
(22)

which corresponds to \(S_{i,n}\) when \(b(.)=0, \sigma (.) =1\). Then, we split \(Z_{i,n}-\Gamma _i^{-1}\) into \(Z_{i,n} -\frac{S_{i,n}^{(1)}}{n}+ \frac{S_{i,n}^{(1)}}{n} -\Gamma _i^{-1}\) and study the two terms successively. The second term has an explicit distribution since \(C_{i,n}^{(1)}\) has a \(\chi ^2(n)\) distribution and is independent of \(\Gamma _i\). The first term is treated below. We proceed analogously for \(1_{F_{i,n}}Z_{i,n}^{-1}- \Gamma _i\), introducing \(n/ S_{i,n}^{(1)}\).

Lemma 2

Assume (H1)–(H4). For all \(p\ge 1\), we have

$$\begin{aligned} {{\mathbb {E}}}_{\vartheta }\left( \left| Z_{i,n} -\frac{S_{i,n}^{(1)}}{n}\right| ^p |\Psi _i=\psi , \Phi _i=\varphi \right)\lesssim & {} \frac{1}{n^p}(1+\varphi ^{2p}+\psi ^{2p}\nonumber \\&+\,\psi ^{4p}+\varphi ^{2p}\psi ^{2p}), \end{aligned}$$
(23)
$$\begin{aligned} {{\mathbb {E}}}_{\vartheta }\left( \left[ \left| Z_{i,n}^{-1} -\frac{n}{S_{i,n}^{(1)}}\right| ^p 1_{F_{i,n}}\right] | \Psi _i=\psi , \Phi _i=\varphi \right)\lesssim & {} \frac{1}{n^p}(1+(1+\varphi ^{2p})(\psi ^{-2p}+ \psi ^{-4p})\nonumber \\&+\,\varphi ^{4p}+ \psi ^{4p}+\varphi ^{4p}\psi ^{-4p}). \end{aligned}$$
(24)

Using (5), (6), (10), we set

$$\begin{aligned} A_{i,n}= & {} (I_d+ V_{i,n}{\varvec{\Omega }})^{-1}(U_{i,n}-V_{i,n}{\varvec{\mu }})\nonumber \\= & {} B_{i,n}(V_{i,n}^{-1}U_{i,n}-{\varvec{\mu }}) \quad \text{ where } \; B_{i,n}= B_{i,n}({\varvec{\Omega }})=R_{i,n}^{-1}. \end{aligned}$$
(25)

Let \(\psi (u)=\frac{\Gamma '(u)}{\Gamma (u)}\) denote the digamma function and let \(F^c\) denote the complement of a set F. Using (20) and (25), we obtain

$$\begin{aligned} \frac{\partial \mathbf U _{N,n}}{\partial \lambda }(\vartheta )= & {} \sum _{i=1}^N \left( \frac{a}{\lambda } -Z_{i,n}^{-1}\;1_{F_{i,n}}\right) ,\\ \frac{\partial \mathbf U _{N,n}}{\partial a}(\vartheta )= & {} N \left( \psi (a+n/2)-\log {(a+n/2)} -\psi (a)+\log {\lambda }\right) \\&-\,\sum _{i=1}^N \left( 1_{F_{i,n}} \log { Z_{i,n} }-1_{F_{i,n}^c}\right) , \\ \nabla _{{\varvec{\mu }}}{} \mathbf U _{N,n}(\vartheta )= & {} -\frac{1}{2} \sum _{i=1}^N 1_{F_{i,n}} Z_{i,n}^{-1} \nabla _{{\varvec{\mu }}}T_{i,n} ({\varvec{\mu }}, {\varvec{\Omega }})=\sum _{i=1}^N 1_{F_{i,n}} Z_{i,n}^{-1}A_{i,n},\\ \nabla _{{\varvec{\Omega }}} \mathbf U _{N,n}(\vartheta )= & {} -\frac{1}{2}\sum _{i=1}^N \left( B_{i,n} +1_{F_{i,n}} Z_{i,n}^{-1}\;\nabla _{{\varvec{\Omega }}}T_{i,n}({\varvec{\mu }}, {\varvec{\Omega }})\right) \\= & {} \frac{1}{2}\sum _{i=1}^N \left( 1_{F_{i,n}} Z_{i,n}^{-1}\; A_{i,n}A_{i,n}'- B_{i,n}\right) . \end{aligned}$$

Using (7), (15), we set

$$\begin{aligned} B_i(T;{\varvec{\Omega }})= (V_i(T)^{-1}+{\varvec{\Omega }})^{-1} ,\quad A_i(T;{\varvec{\mu }},{\varvec{\Omega }})= B_i(T;{\varvec{\Omega }})(V_{i}(T)^{-1} U_{i}(T)-{\varvec{\mu }}) \end{aligned}$$
(26)

and let \(J(\vartheta )\) denote the covariance matrix of

$$\begin{aligned} \begin{pmatrix} \Gamma _1 A_1(T;{\varvec{\mu }},{\varvec{\Omega }})\\ \frac{1}{2} \Gamma _1 A_1(T;{\varvec{\mu }},{\varvec{\Omega }})A'_1(T;{\varvec{\mu }},{\varvec{\Omega }})- B_1(T;{\varvec{\Omega }}) \end{pmatrix} \end{aligned}$$
(27)

The Fisher information matrix associated with the direct observation \((\Gamma _1, \ldots , \Gamma _N)\) (see Sect. 7.4) is

$$\begin{aligned} I_0(\lambda ,a)=\left( \begin{array}{ll} \frac{a}{\lambda ^2} &{} -\frac{1}{\lambda }\\ -\frac{1}{\lambda } &{} \psi '(a) \end{array}\right) \end{aligned}$$
(28)
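For reference, (28) is immediate to implement, \(\psi '\) being the trigamma function (a sketch):

```python
import numpy as np
from scipy.special import polygamma

def I0(lam, a):
    """Fisher information matrix (28) for the direct observation of the Gamma_i's,
    with parameters ordered as (lambda, a); polygamma(1, a) is the trigamma psi'(a)."""
    return np.array([[a / lam ** 2, -1.0 / lam],
                     [-1.0 / lam, polygamma(1, a)]])
```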

We now state:

Theorem 1

Assume (H1)–(H4), \(a>4\), and that n and \(N=N(n)\) tend to infinity. Then, for all \(\vartheta \),

  • If \(N/n^2\rightarrow 0\), \(N^{-1/2}\left( \frac{\partial \mathbf U _{N,n}}{\partial \lambda }(\vartheta ), \frac{\partial \mathbf U _{N,n}}{\partial a}(\vartheta )\right) '\) converges in distribution under \({{\mathbb {P}}}_{\vartheta }\) to the Gaussian law \({{\mathcal {N}}}_2(0, I_0(\lambda , a))\).

  • If \(N/n\rightarrow 0\), \(N^{-1/2}\nabla _{\vartheta } \mathbf U _{N,n}(\vartheta )\) converges in distribution under \({{\mathbb {P}}}_{\vartheta }\) to \({{\mathcal {N}}}_q(0, {{\mathcal {J}}}(\vartheta ))\) (\(q=2+d+d(d+1)/2\)) where

    $$\begin{aligned} {{\mathcal {J}}}(\vartheta )= \left( \begin{array}{c|c} I_0(\lambda ,a) &{}\quad \mathbf 0 \\ \hline \mathbf 0 &{}\quad J(\vartheta ) \end{array}\right) \end{aligned}$$

    with \(J(\vartheta ),I_0( \lambda ,a)\) defined in (27), (28).

  • Define the opposite of the Hessian of \(\mathbf U _{N,n}(\vartheta )\) by \({{\mathcal {J}}}_{N,n}(\vartheta )=-\nabla ^2_{\vartheta }{} \mathbf U _{N,n}(\vartheta )\). Then, if \(a>6\), as n and \(N=N(n)\) tend to infinity, \(\frac{1}{N} {{\mathcal {J}}}_{N,n}(\vartheta )\) converges in \({{\mathbb {P}}}_{\vartheta }\)-probability to \({{\mathcal {J}}}(\vartheta )\).

Remark 1

The constraint \(a>6\) might seem too stringent. Noting that \({{\mathbb {E}}}(\Gamma _i^{-1})= \frac{\lambda }{a-1}\), it is indeed an assumption on the quadratic variations of \((X_i(t))\) and \(a>6\) corresponds to moderate quadratic variations.

Remark 2

In the case of a univariate random effect \(\Phi _i \sim \mathcal{N}(\mu ,\omega ^2)\), \(J(\vartheta )\) is given by

$$\begin{aligned}&J(\vartheta )\nonumber \\&\quad = \left( \begin{array}{ll}{{\mathbb {E}}}_{\vartheta }\Gamma _1 B_1(T; \omega ^2) &{}\quad {{\mathbb {E}}}_{\vartheta }\Gamma _1 A_1(T;\mu , \omega ^2) B_1(T; \omega ^2)\\ {{\mathbb {E}}}_{\vartheta }\Gamma _1 A_1(T;\mu , \omega ^2) B_1(T; \omega ^2) &{}\quad {{\mathbb {E}}}_{\vartheta }\left( \Gamma _1 A_1^2(T;\mu , \omega ^2)B_1(T; \omega ^2)-\frac{1}{2}B_1^2(T; \omega ^2)\right) \end{array}\right) . \end{aligned}$$
(29)

For the proof, we use that \(A_{i,n}\) (resp. \( B_{i,n}\)) converges to \(A_i(T;{\varvec{\mu }},{\varvec{\Omega }})\) (resp. \(B_i(T;{\varvec{\Omega }})\)) as n tends to infinity and that \(Z_{i,n}^{-1}1_{F_{i,n}}\) is close to \(\Gamma _i\) for large n. Note that \(I_0(\lambda ,a)\) is invertible for all \((\lambda ,a) \in (0,+\infty )^2\) (see Sect. 7.4). We conclude:

Theorem 2

Assume (H1)–(H4), \(a_0>6\), that n and \(N=N(n)\) tend to infinity with \(N/n \rightarrow 0\) and that \(J(\vartheta _0)\) is invertible. Then, with probability tending to 1, a solution \({\tilde{\vartheta }}_{N,n}\) to (21) exists which is consistent and such that \({\sqrt{N}}({\tilde{\vartheta }}_{N,n} - \vartheta _0)\) converges in distribution to \({{\mathcal {N}}}_q(0, {{\mathcal {J}}}^{-1}(\vartheta _0))\) under \({{\mathbb {P}}}_{\vartheta _0}\) for all \(\vartheta _0\).

For the first two components, the constraint \(N/n^2 \rightarrow 0\) is enough.

The proof of Theorem 2 follows from the previous theorem by standard arguments and is omitted.

The estimators of \((\lambda , a)\) are asymptotically equivalent to the exact maximum likelihood estimators of the same parameters based on the observation of \((\Gamma _i, i=1, \ldots , N)\) under the constraint \(N/n^2 \rightarrow 0\). There is no loss of information for parameters coming from the random effects in the diffusion coefficient. For the parameters \(({\varvec{\mu }}, {\varvec{\Omega }})\), which come from the random effects in the drift, the constraint is \(N/n \rightarrow 0\) and there is a loss of information (see 49 in Sect. 7.4) w.r.t. the direct observation of \(((\Phi _i, \Gamma _i), i=1, \ldots ,N)\). For instance, consider a univariate random effect \(\Phi \sim \mathcal{N}(\mu ,\omega ^2)\). Then, if \(\omega ^2\) is known, one can see that

$$\begin{aligned} {{\mathbb {E}}}_{\vartheta }\Gamma _1 B_1(T; \omega ^2)\le {{\mathbb {E}}}_{\vartheta }\Gamma _1 \omega ^{-2}= \frac{a}{\lambda \omega ^2}. \end{aligned}$$

If for all i, \(\Gamma _i=\gamma \) is deterministic, there is no loss of information for the estimation of \((\mu , \omega ^2)\), with respect to the continuous observation of the processes \((X_i(t), t\in [0,T])\) (see Delattre et al. 2013).

3.2 Estimation based on the second approximation of the likelihood

Now, we consider the second approximation of the log-likelihood (19). We set

$$\begin{aligned} \xi _{i,n}= & {} \frac{ \lambda + (S_{i,n}/2)}{a+n/2}, \quad \quad \text{ so } \text{ that }\nonumber \\ \mathbf V _n^{(1)}(X_{i,n}, \lambda ,a)= & {} a\log \lambda -\log \Gamma (a) +\log \Gamma (a+(n/2))\nonumber \\&-\,(a+(n/2))\log (a+(n/2)) - (a+(n/2))\log \xi _{i,n} . \end{aligned}$$
(30)

We do not need to truncate \(\xi _{i,n}\) as it is bounded from below. For the second term \(\mathbf V _n^{(2)}(X_{i,n}, {\varvec{\mu }},{\varvec{\Omega }})\), we need a truncation to deal with \(n/S_{i,n}\), and we make a slight modification. For a given constant k, let

$$\begin{aligned} \mathbf W _{n}(X_{i,n},{\varvec{\mu }}, {\varvec{\Omega }})= & {} - \frac{n}{2 \,S_{i,n}}1_{S_{i,n}\ge k\sqrt{n}} ({{\varvec{\mu }}}- V_{i,n}^{-1}U_{i,n})'R_{i,n}^{-1}({{\varvec{\mu }}}- V_{i,n}^{-1}U_{i,n})\nonumber \\&-\, \frac{1}{2}\log {(det(I_d+ V_{i,n}{\varvec{\Omega }})}), \end{aligned}$$
(31)

and \(\mathbf W _{N,n}({\varvec{\mu }}, {\varvec{\Omega }})= \sum _{i=1}^N \mathbf W _{n}(X_{i,n},{\varvec{\mu }}, {\varvec{\Omega }})\).

We define the estimators \({\vartheta }_{N,n}^*\) by

$$\begin{aligned} {{\mathcal {H}}}_{N,n}(\vartheta )=(\nabla _{\lambda ,a} \mathbf V _{N,n}^{(1)}(\lambda , a), \nabla _{{\varvec{\mu }}, {\varvec{\Omega }}}{} \mathbf W _{N,n}({\varvec{\mu }}, {\varvec{\Omega }})), \quad {{\mathcal {H}}}_{N,n}({\vartheta }_{N,n}^*)=0. \end{aligned}$$
(32)
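In practice, the \((\lambda ,a)\)-part of (32) can be obtained by numerically maximizing \(\mathbf V _{N,n}^{(1)}\) over \((\lambda ,a)\), for instance as in the following sketch; the optimizer, the log-parametrization and the starting values are ours and purely illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

def fit_lambda_a(S_list, n, a_init=2.0, lam_init=1.0):
    """Estimate (lambda, a) by maximizing sum_i V_n^{(1)}(X_i, lambda, a),
    i.e. the first half of the estimating equations (32); minimal sketch."""
    S = np.asarray(S_list)                     # statistics S_{i,n}, i = 1, ..., N

    def neg_V1(theta):
        lam, a = np.exp(theta)                 # log-parametrization keeps lambda, a > 0
        return -np.sum(a * np.log(lam) - gammaln(a) + gammaln(a + 0.5 * n)
                       - (a + 0.5 * n) * np.log(lam + 0.5 * S))

    res = minimize(neg_V1, x0=np.log([lam_init, a_init]), method="Nelder-Mead")
    lam_hat, a_hat = np.exp(res.x)
    return lam_hat, a_hat
```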

We have the following result.

Theorem 3

Assume (H1)–(H4), \(a_0>6\), that n and \(N=N(n)\) tend to infinity with \(N/n \rightarrow 0\) and that \(J(\vartheta _0)\) is invertible. Then, with probability tending to 1, a solution \(\vartheta ^*_{N,n}\) to (32) exists which is consistent and such that \({\sqrt{N}}( \vartheta ^*_{N,n} - \vartheta _0)\) converges in distribution to \({{\mathcal {N}}}_q(0, {{\mathcal {J}}}^{-1}(\vartheta _0))\) under \({{\mathbb {P}}}_{\vartheta _0}\) for all \(\vartheta _0\) (see (29), (28) and the statement of Theorem 2). For the first two components, the constraint \(N/n^2 \rightarrow 0\) is enough.

The estimators \(\vartheta ^*_{N,n}\) and \({\tilde{\vartheta }}_{N,n}\) are asymptotically equivalent.

This equivalence result is quite important: one could expect that Method 1, being a natural approximation of the likelihood, would lead to better asymptotic results. This is not the case. Implementing Method 2 to compute the estimators is numerically easier because it separates the estimation of the random-effect parameters in the diffusion coefficient from that of the parameters in the drift coefficient. On simulated data, both methods are comparable even for reasonably small sample sizes.

4 Simulation study

We illustrate and assess the properties of the parameter estimators. The model parameters are estimated with the two approximations of the likelihood. We found little difference between the results of the two methods, so we only report the results for Method 2.

Several models are simulated:

Model 1 Mixed effects Brownian motion:

$$\begin{aligned} dX_i(t)=&\Phi _i dt + \Gamma _i^{-1/2} dW_i(t), X_i(0)=0\\ \Gamma _i \underset{i.i.d}{\sim }&G(a,\lambda ) \; , \, \Phi _i | \Gamma _i = \gamma \sim \mathcal {N}(\mu ,\gamma ^{-1}\omega ^2) \end{aligned}$$

Model 2 A model satisfying (H1)–(H2) with constant \(\sigma (\cdot )\):

$$\begin{aligned} dX_i(t)=&\frac{\Phi _i X_i(t)}{1+X_i^2(t)} dt + \Gamma _i^{-1/2} dW_i(t), X_i(0)=0\\ \Gamma _i \underset{i.i.d}{\sim }&G(a,\lambda ), \, \Phi _i | \Gamma _i = \gamma \sim \mathcal {N}(\mu ,\gamma ^{-1}\omega ^2) \end{aligned}$$

Model 3 A model satisfying (H1)–(H2) with non constant \(\sigma (\cdot )\):

$$\begin{aligned} dX_i(t) = \left( \frac{\Phi _{i,1} X_i(t)}{1+X_i^2(t)} +\Phi _{i,2}\right) dt + \left( 1+\frac{X_i(t)}{1+X_i^2(t)}\right) \Gamma _i^{-1/2} dW_i(t), X_i(0)=0.1 , \end{aligned}$$

\(\Gamma _i \sim G(a,\lambda )\) , \(\Phi _i = (\Phi _{i,1}, \Phi _{i,2})'| \Gamma _i = \gamma \sim \mathcal {N}_2({\varvec{\mu }},\gamma ^{-1}{\varvec{\Omega }})\)  with \({\varvec{\mu }}=(\mu _1,\mu _2)'\) and \({\varvec{\Omega }}=\mathrm {diag}(\omega _1^2,\omega _2^2)\)

Model 4 Mixed Ornstein–Uhlenbeck process:

$$\begin{aligned} dX_i(t)&= (\Phi _{i,1} X_i(t)+\Phi _{i,2}) dt + \Gamma _i^{-1/2} dW_i(t), X_i(0)=0, \end{aligned}$$

\(\Gamma _i \sim G(a,\lambda )\) , \(\Phi _i = (\Phi _{i,1}, \Phi _{i,2})'| \Gamma _i = \gamma \sim \mathcal {N}_2({\varvec{\mu }},\gamma ^{-1}{\varvec{\Omega }})\)  with \({\varvec{\mu }}=(\mu _1,\mu _2)'\) and \({\varvec{\Omega }}=\mathrm {diag}(\omega _1^2,\omega _2^2)\).

Note that Model 4 does not fulfill assumptions (H1)–(H2), but it is widely used in practice and the estimation results show that the proposed methods still perform well.

Table 1 Empirical mean and (empirical s.d.) of estimators for different values of N and n, truncation \(k=0.1\), \(m=a/\lambda \), \(t=\psi (a)-\log \lambda \)
Table 2 Empirical mean and (empirical s.d.) of estimators for different values of N and n, truncation \(k=0.1\), \(m=a/\lambda \), \(t=\psi (a)-\log \lambda \)
Table 3 Empirical mean and (empirical s.d.) of estimators for different values of N and n, truncation \(k=0.1\), \(m=a/\lambda \), \(t=\psi (a)-\log \lambda \)
Table 4 Empirical mean and (empirical s.d.) of estimators for different values of N and n, truncation \(k=0.1\), \(m=a/\lambda \), \(t=\psi (a)-\log \lambda \)

For each SDE model, 100 data sets are generated with N subjects on the same time interval [0, T], \(T=5\). Each data set is simulated as follows. First, the random effects are drawn; then, the diffusion sample paths are simulated with a very small discretization step-size \(\delta = 0.001\). Exact simulation is performed for Models 1 and 4, whereas an Euler discretization scheme is used for Models 2 and 3. The time interval between consecutive observations is taken equal to \(\Delta = 0.025\) or \(\Delta = 0.005\), yielding \(n= 200\) or \(n=1000\) observations on the fixed time interval [0, 5]. The model parameters are then estimated from each simulated dataset by using the second approximation of the likelihood. The empirical mean and standard deviation of the estimates are computed from the 100 datasets (Tables 1, 2, 3 and 4). For Gamma distributions, the parameters \(m=a/\lambda \) and \(t = \psi (a) - \log (\lambda )\) have unbiased empirical estimators and are easier to estimate. This is why we present the estimates of m and t rather than the estimates of a and \(\lambda \), which are highly biased even for direct observations. The estimation procedure requires a truncation (see 31); we use \(k=0.1\).
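As an illustration of this design, Model 1 can be simulated exactly at the observation times, as in the sketch below; the values of \(\mu \) and \(\omega ^2\) are illustrative and not those used in the tables.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_model1(N, n, T, a, lam, mu, omega2, rng=rng):
    """Exact simulation of Model 1: dX_i = Phi_i dt + Gamma_i^{-1/2} dW_i, X_i(0) = 0,
    observed at t_j = j T / n. Returns an (N, n + 1) array of observations."""
    delta = T / n
    gamma = rng.gamma(shape=a, scale=1.0 / lam, size=N)
    phi = rng.normal(mu, np.sqrt(omega2 / gamma))        # Phi_i | Gamma_i = g ~ N(mu, omega2 / g)
    incr = (phi[:, None] * delta
            + np.sqrt(delta / gamma)[:, None] * rng.normal(size=(N, n)))
    return np.concatenate([np.zeros((N, 1)), np.cumsum(incr, axis=1)], axis=1)

# illustrative values, in the spirit of the simulation design above
X = simulate_model1(N=50, n=200, T=5.0, a=8.0, lam=2.0, mu=1.0, omega2=0.5)
```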

We observe from Tables 1, 2, 3 and 4 that the estimation method performs satisfactorily overall. The parameters are estimated with very little bias whatever the model and the values of n and N. The bias becomes smaller as both N and n increase. The standard deviations of the estimates are small; they become smaller when N increases and do not depend on n. These results are in accordance with the theory, since the asymptotic distribution of the estimators is obtained when \(N/n \rightarrow 0\) or \(N/n^2 \rightarrow 0\) respectively, and the rate of convergence is \(\sqrt{N}\). Although N / n is not very small for \(n=200\) in our simulation design, the results in this case are quite satisfactory, which encourages the use of this estimation method not only for large but also for moderate numbers of observations per path n. In all the examples, the Gamma distribution parameters are \(a=8,\lambda =2\). The associated theoretical s.d. of m and t for direct observations are respectively 0.2 and 0.05 for \(N=50\), and 0.14 and 0.04 for \(N=100\) (see Sect. 7.4 and Table 5). This agrees with the results obtained in Examples 1–4 for \((a,\lambda )\).
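The quoted values are consistent with taking \({\hat{m}}=N^{-1}\sum _i \Gamma _i\) and \({\hat{t}}=N^{-1}\sum _i \log \Gamma _i\) for direct observations, whose variances are \(a/(\lambda ^2 N)\) and \(\psi '(a)/N\); a quick numerical check (sketch):

```python
import numpy as np
from scipy.special import digamma, polygamma

a, lam = 8.0, 2.0
m, t = a / lam, digamma(a) - np.log(lam)      # m = 4, t = psi(8) - log(2)

# theoretical s.d. of the empirical estimators of m and t for direct observations
for N in (50, 100):
    print(N, np.sqrt(a / lam ** 2 / N), np.sqrt(polygamma(1, a) / N))
# N = 50  -> 0.2,   0.052   ;   N = 100 -> 0.141, 0.037
```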

Table 5 Empirical mean and standard deviation from 100 datasets; theoretical s.d. of estimators for different values of N, \(a=8\), \(\lambda =2\), \(m=a/\lambda \), \(t=\psi (a)-\log \lambda \)

5 Concluding comments

In this paper, we have addressed the new problem of estimating parameters from discrete observations when there are simultaneous random effects in the drift and in the diffusion coefficient, linked by means of a joint parametric distribution. We have considered N paths and n observations per path. For linear random effects and a specific joint distribution of these random effects, we have proved that the model parameters in the drift and in the diffusion can be estimated consistently and at rate \(\sqrt{N}\) under the condition \(N/n \rightarrow 0\). For the parameters in the diffusion coefficient, the constraint is weaker (\(N/n^2 \rightarrow 0\)). We have proposed two estimation methods leading to asymptotically equivalent properties. The second one is now implemented in the R package MsdeParEst. Including random effects in both the drift and the diffusion coefficient of an SDE has been proposed in several data applications but, to our knowledge, it had not been studied from a theoretical point of view.

Our results are obtained for regular sampling but could easily be extended to non-regular sampling schemes. They are derived in an asymptotic framework. An important issue would be to study non-asymptotic properties. It would also be interesting to consider more general SDEMEs in which the fixed and random effects are no longer linear, as well as other distributions for the random effects.