1 Introduction

In this introduction, we first motivate our problem and outline our results. We also argue that only part of the question can be dealt with in a single paper; a possible program for the remaining tasks is sketched in the second part of the introduction.

1.1 Motivations and outline of the results

The inference problem for diffusion processes is now fairly well understood. In particular, during the last two decades, several advances have made it possible to tackle inference based on discretely observed diffusions (Durham and Gallant 2002; Pedersen 1995; Sorensen 2009), which is of special practical interest.

More specifically, consider a family of stochastic differential equations of the form

$$\begin{aligned} Y_t=a+\int _{0}^{t}\mu (Y_s;\theta ) \, ds +\sum _{l=1}^{d}\int _{0}^{t}\sigma ^l(Y_s;\theta )\, dB_s^l, \qquad t\in [0,T], \end{aligned}$$
(1)

where \(a\in \mathbb{R }^m,\,\mu (\cdot ;\theta ):\mathbb{R }^m\rightarrow \mathbb{R }^m\) and \(\sigma (\cdot ;\theta ):\mathbb{R }^m\rightarrow \mathbb{R }^{m,d}\) are smooth enough functions, \(B\) is a \(d\)-dimensional fractional Brownian motion with Hurst parameter \(H>1/2\) (the stochastic integral in (1) being understood in the Young sense) and \(\theta \) is a parameter varying in a subset \(\Theta \subset \mathbb{R }^q\). If one wishes to identify \(\theta \) from a set of discrete observations of \(Y\), most of the methods which can be found in the literature are based on (or are closely linked to) the maximum likelihood principle. Indeed, if \(B\) is an ordinary Brownian motion and \(Y\) is observed at equally spaced instants \(t_i=i\tau \) for \(i=0,\ldots ,n\), then the log-likelihood of a sample \((Y_{t_{1}},\ldots ,Y_{t_{n}})\) can be expressed as

$$\begin{aligned} \ell _n(\theta )=\sum _{i=1}^{n} \ln \left( p\left( \tau ,Y_{t_{i-1}},Y_{t_{i}};\, \theta \right) \right) , \end{aligned}$$
(2)

where \(p\) stands for the transition density of the diffusion \(Y\). If \(Y\) enjoys some ergodic properties, with invariant measure \(\nu _{\theta _0}\) under \(\mathbf P _{\theta _0}\), then we get

$$\begin{aligned} \mathrm{a.s.-}\lim _{n\rightarrow \infty }\frac{1}{n}\ell _n(\theta ) =\mathbf E _{\theta _0}\left[ \ln p\left( \tau ,Z_1,Z_2;\, \theta \right) \right] \triangleq J_{\theta _0}(\theta ), \end{aligned}$$
(3)

where \(Z_1\sim \nu _{\theta _0}\) and \(\mathcal{L }(Z_2|\, Z_1)=p(\tau ,Z_1,\cdot \, ;\,\theta _0)\). Furthermore, it can be shown in a general context that \(\theta \mapsto J_{\theta _0}(\theta )\) admits a maximum at \(\theta =\theta _0\). This opens the way to an MLE analysis similar to the one performed in the case of i.i.d. observations, at least theoretically.

However, in many interesting cases, the transition semi-group \(p\) is not amenable to explicit computations, and thus expression (2) has to be approximated in some sense. The most common approach, advocated for instance in Pedersen (1995), is based on a linearization of each \(p( \tau ,Y_{t_{i-1}},Y_{t_{i}};\, \theta )\), which transforms it into a Gaussian density

$$\begin{aligned} \mathcal{N }\left( Y_{t_{i-1}}+\mu (Y_{t_{i-1}};\theta ) \,\tau , \, \sigma \sigma ^*(Y_{t_{i-1}};\theta )\,\tau \right) \!. \end{aligned}$$

This linearization procedure is equivalent to the approximation of Eq. (1) by an Euler (first order) numerical scheme. Refinements of this procedure, based on Milstein type discretizations, are proposed in Durham and Gallant (2002).
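
To make this linearization concrete, here is a minimal numerical sketch of the resulting Euler (Pedersen-type) pseudo log-likelihood, in Python. The linear drift and constant diffusion coefficient below are illustrative placeholders of ours, not tied to any specific model from the references.

```python
import numpy as np

# Minimal sketch: Euler pseudo log-likelihood for a one-dimensional
# diffusion observed at equally spaced times t_i = i*tau.  Each transition
# is replaced by the Gaussian density N(y + mu(y)*tau, sigma(y)^2 * tau).

def mu(y, theta):
    return -theta[0] * y        # hypothetical linear drift

def sigma(y, theta):
    return theta[1]             # hypothetical constant diffusion coefficient

def euler_log_likelihood(obs, tau, theta):
    """Sum over i of log N(y_i; y_{i-1} + mu(y_{i-1})*tau, sigma^2*tau)."""
    y_prev, y_next = obs[:-1], obs[1:]
    mean = y_prev + mu(y_prev, theta) * tau
    var = sigma(y_prev, theta) ** 2 * tau * np.ones_like(y_prev)
    return np.sum(-0.5 * np.log(2.0 * np.pi * var)
                  - 0.5 * (y_next - mean) ** 2 / var)

# Quick usage on data simulated from the same Euler dynamics.
rng = np.random.default_rng(0)
tau, n, theta0 = 0.01, 2000, (1.0, 0.5)
y = np.zeros(n + 1)
for i in range(n):
    y[i + 1] = (y[i] + mu(y[i], theta0) * tau
                + sigma(y[i], theta0) * np.sqrt(tau) * rng.standard_normal())
print(euler_log_likelihood(y, tau, theta0))
```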

Some special situations can be treated differently (and often more efficiently): for instance, in case of a constant diffusion coefficient, the continuous time likelihood can be computed explicitly by means of Girsanov's theorem. When the dimension of the driving Brownian motion \(B\) is \(d=1\), one can also apply Itô's formula in order to reduce to an equation with constant diffusion coefficient, or use the Doss-Sussmann representation of solutions to (1). Let us also mention that statistical inference for SDEs driven by Lévy processes is currently under intensive investigation, with financial motivations in mind.

The current article is concerned with the estimation problem for equations of the form (1), when the driving process \(B\) is a fractional Brownian motion. Let us recall that a fractional Brownian motion \(B\) with Hurst parameter \(H\in (0,1)\), defined on a complete probability space \((\Omega ,{\fancyscript{F}},\mathbf P )\), is a \(d\)-dimensional centered Gaussian process. Its law is thus characterized by its covariance function, which is given by

$$\begin{aligned} \mathbf E \left[ B_t^i B_s^j \right] = \frac{1}{2} \left( t^{2H} + s^{2H} - |t-s|^{2H} \right) \, \mathbf{1}_{(i=j)}, \qquad s,t\in \mathbb{R }_+. \end{aligned}$$

The variance of the increments of \(B\) is then given by

$$\begin{aligned} \mathbf E \left[ \left( B_t^i-B_s^i \right) ^2\right] = |t-s|^{2H}, \qquad s,t\in \mathbb{R }_+, \quad i=1,\ldots , d, \end{aligned}$$

and this implies that almost surely the fBm paths are \(\gamma \)-Hölder continuous for any \(\gamma <H\). Furthermore, for \(H=1/2\), fBm coincides with the usual Brownian motion, making the family \(\{B^H;\, H\in (0,1)\}\) the most natural generalization of this classical process.
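
For readers who wish to experiment, the covariance function above already yields an exact (if costly) simulation method on a finite grid: factor the covariance matrix and multiply by a standard Gaussian vector. A minimal sketch of ours in Python (\(O(n^3)\) cost; circulant-embedding methods scale better):

```python
import numpy as np

# Exact simulation of a one-dimensional fBm on a grid by Cholesky
# factorization of the covariance R_H given above.

def fbm_cholesky_factor(n, T, H):
    t = np.linspace(T / n, T, n)                    # grid without t = 0
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2*H) + u**(2*H) - np.abs(s - u)**(2*H))
    return np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # jitter for stability

rng = np.random.default_rng(1)
H, T, n = 0.7, 1.0, 300
L = fbm_cholesky_factor(n, T, H)
B = np.concatenate(([0.0], L @ rng.standard_normal(n)))  # one fBm path

# Sanity check of E[B_T^2] = T^{2H}, sampling many paths at once.
paths = L @ rng.standard_normal((n, 2000))
print(np.var(paths[-1]))    # close to T^{2H} = 1.0
```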

In the last decade, some important advances have made it possible to solve (Nualart and Rǎşcanu 2002; Zähle 1998) and to understand (Hu and Nualart 2007; Nualart and Saussereau 2009) differential systems driven by fBm for \(H\in (1/2,1)\). The rough paths machinery also allows one to handle fBm with \(H\in (1/4,1/2)\), as nicely explained in (Friz and Victoir 2010; Gubinelli 2004; Lejay 2003; Lyons and Qian 2002). However, the irregular situation \(H\in (1/4,1/2)\) is not amenable to useful moment estimates for the solution \(Y\) to (1) together with its Jacobian (that is, the derivative with respect to the initial condition). This is why we concentrate, in the sequel, on the simpler case \(H>1/2\) for our estimation problem. In any case, many real-world noisy systems are currently modeled by equations like (1) driven by fBm; this is particularly visible in the biophysics literature (Kou and Sunney-Xie 2004; Odde et al. 1996) and in finance-oriented applications (Cheridito 2003; Gubinelli 2004; Hairer and Ohashi 2007; Hu and Nualart 2007; Rogers 1997; Willinger et al. 1999). This leads to a demand for rigorous estimation procedures for SDEs driven by fractional Brownian motion, which is the object of our paper.

Interest in the inference problem for fractional diffusion processes started a decade ago with the analysis of fractional Ornstein-Uhlenbeck processes in Kleptsyna and Le Breton (2002). A more recent representative set of references on the topic includes Papavasiliou and Ladroue (2012) and Tudor and Viens (2007). More specifically, Tudor and Viens (2007) handle the case of a one-dimensional equation of the form

$$\begin{aligned} Y_t=a+ \theta \int _{0}^{t}\mu (Y_s) \, ds + B_t, \qquad t\in [0,T], \end{aligned}$$
(4)

where \(\mu \) is regular enough, and where \(B\) is a fBm with \(H\in (0,1)\). The simple dependence on the parameter \(\theta \) and the fact that an additive noise is considered enables the use of Girsanov’s transform in order to get an exact expression for the MLE. Convergence of the estimator is then obtained through an extensive use of Malliavin calculus.

The article by Papavasiliou and Ladroue (2012) is focused on the case of a polynomial equation, for which the exact moments of the solution can be computed. The estimator then relies on a generalization of the method of moments, which fits empirical moments of the solution to their theoretical values. The range of application of this method is however confined to specific situations, for the following reasons:

  • It assumes that \(N\) independent runs of Eq. (1) can be obtained, which is usually not the case.

  • It hinges on multiple-integral computations, which are time consuming and avoided in most numerical schemes.

As can be seen from this brief review, parameter estimation for rough equations is still in its infancy. We shall also argue that it is a hard problem.

Indeed, if one wishes to transpose the MLE methods used for diffusion processes to the fBm context, an equivalent of the log-likelihood function (2) should first be produced. But the covariance structure of \(B\) is quite complex, and attempts to put the law of \(Y\) defined by (1) into a semigroup setting are cumbersome, as illustrated by Baudoin and Coutin (2007); Hairer and Ohashi (2007); Neuenkirch et al. (2009). We have thus decided to consider a highly simplified version of the log-likelihood. Namely, still assuming that \(Y\) is observed at a discrete set of instants \(0<t_1<\cdots <t_n<\infty \), set

$$\begin{aligned} \ell _n(\theta )=\sum _{i=1}^{n} \ln \left( f(t_i,Y_{t_i};\theta )\right) , \end{aligned}$$
(5)

where we suppose that under \(\mathbf P _\theta \) the random variable \(Y_{t_i}\) admits a density \(z\mapsto f(t_i, z;\theta )\). Notice that in the case of an elliptic diffusion coefficient \(\sigma \) the density \(f(t_i,\cdot ;\theta )\) is strictly positive, and thus expression (5) makes sense by a straightforward application of Proposition 19.6 in Friz and Victoir (2010). However, the successful replication of the strategy implemented for Brownian diffusions (which we have tried to summarize above) relies on some highly nontrivial questions: existence of an invariant measure for Eq. (1), rate of convergence to this invariant measure, convergence of expressions like (5), and characterization of the limit in terms of \(\theta \) as in (3), to mention just a few. We shall come back to these considerations in the next section, but let us insist at this point on the fact that all those questions would fit into a research program spanning several years.

Fig. 1 Empirical distribution of the estimators for \(\alpha \) and \(\beta \)

Our aim in this paper is in a sense simpler: we assume that quantities like (5) are meaningful for estimation purposes. We then implement a method which enables us to compute \(\ell _n(\theta )\) and optimize it in \(\theta \), thus producing a pseudo-MLE. We focus first on this specific aspect of the problem for the following reasons:

  1.

    From a statistical point of view, it is obviously essential to obtain a computationally efficient estimation procedure. This will allow us for instance to evaluate numerically the accuracy of our method.

  2.

    The procedure itself is nontrivial, and requires the use of advanced stochastic analysis tools: probabilistic representation of the density, Malliavin type integration by parts, Stratonovich-Skorohod correction terms, and discretization of systems of pathwise stochastic differential equations.

We have thus decided to tackle the implementation issues first, and we shall produce a practical recursive method in order to approximate our pseudo-likelihood \(\ell _n(\theta )\). If this method turns out to be satisfactory, we shall then proceed to a full justification of our method.

Let us also mention that it might not be clear to the reader that \(\ell _n(\theta )\) can be meaningful in terms of statistical estimation, since it only involves evaluations at single points \(Y_{t_i}\). However our numerical experiments indicate that this quantity behaves nicely for our purposes. Moreover, it will become clear from the forthcoming computations that our methodology can be extended to handle quantities like

$$\begin{aligned} \tilde{\ell }_n(\theta ):=\sum _{i=1}^{n} \ln \left( f(t_i,t_{i+1},Y_{t_i},Y_{t_{i+1}};\theta )\right) , \end{aligned}$$

where \(f(s,t,x,z;\theta )\) stands for the density of the couple \((Y_s,Y_t)\). This kind of pseudo log-likelihood is obviously closer in spirit to the diffusion case. Densities of tuples could also be considered at the price of technical complications.

Let us now try to give a flavor of the kind of result we shall obtain in this article, in a very loose form:

Theorem 1

Consider Eq. (1) driven by a \(d\)-dimensional fractional Brownian motion \(B\) with Hurst parameter \(H>1/2\). Assume \(\mu \) and \(\sigma \) are smooth enough coefficients, and that \(\sigma \sigma ^*\) is strictly elliptic. For a sequence of times \(t_0<\cdots <t_n<\infty \), let \(y_{t_{i}},\,i=1,\ldots ,n\) be the corresponding observations. Then:

  (i)

    The gradient of the log-likelihood function admits the following probabilistic representation: \(\nabla _{l}\ell _n(\theta ) =\sum _{i=1}^{n}\frac{V_i(\theta )}{W_i(\theta )}\), with

    $$\begin{aligned} W_i(\theta )=\mathbf E \biggl [ \mathbf 1 _{(Y_{t_{i}}(\theta )>y_{t_{i}})}\; H_{(1,\ldots ,m)}\Bigl ( Y_{t_{i}}(\theta ) \Bigr ) \biggr ] \end{aligned}$$
    (6)

where \(H_{(1,\ldots ,m)}( Y_{t_{i}}(\theta ))\) is an expression involving Malliavin derivatives and Skorohod integrals of \(Y(\theta )\). A similar expression is also available for \(V_i(\theta )\).

  (ii)

    A computational recursive procedure is constructed in order to obtain \(H_{(1,\ldots ,m)}( Y_{t_{i}}(\theta ))\) in a suitable way.

  (iii)

    When \(Y_t\) is replaced by its Euler scheme approximation with step \(T/M\), and the expected values in (6) are approximated by means of \(N\) Monte Carlo samples, we show that:

  • \(N\) can be chosen as a function of \(M\) in an optimal way (see Proposition 13).

  • The corresponding approximation of \(\nabla _{l}\ell _n(\theta )\) converges to the true one with rate \(M^{-(2\gamma -1)}\) for any \(1/2<\gamma <H\).

All these results are stated more rigorously in the remainder of the article. We shall also apply the computational procedure described in Theorem 1 to explicit examples of equations arising in finance.
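
The two approximation layers in Theorem 1 (iii) can be sketched in a few lines of Python: an Euler scheme with \(M\) steps driven by exact fBm samples, inside an outer Monte Carlo loop with \(N\) samples. The one-dimensional coefficients below are toy choices of ours; only the structure (M Euler steps, N samples) mirrors the theorem.

```python
import numpy as np

rng = np.random.default_rng(2)
H, T, M, N, theta, a = 0.7, 1.0, 200, 5000, (1.0, 0.5), 0.5

# Cholesky factor of the fBm covariance on the Euler grid (computed once).
t = np.linspace(T / M, T, M)
s, u = np.meshgrid(t, t)
L = np.linalg.cholesky(0.5 * (s**(2*H) + u**(2*H) - np.abs(s - u)**(2*H))
                       + 1e-12 * np.eye(M))

def euler(B):
    """One Euler path Y_{k+1} = Y_k + mu(Y_k)*dt + sigma(Y_k)*dB_k for the
    toy coefficients mu(y) = -theta1*y, sigma(y) = theta2 (our choice)."""
    dt, Y = T / M, a
    for k in range(M):
        Y = Y - theta[0] * Y * dt + theta[1] * (B[k+1] - B[k])
    return Y

# Outer Monte Carlo layer: N samples of g(Y_T), with g = identity here.
vals = [euler(np.concatenate(([0.0], L @ rng.standard_normal(M))))
        for _ in range(N)]
print(np.mean(vals), np.std(vals) / np.sqrt(N))   # estimate and its MC error
```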

Here is how our article is structured: we give some preliminaries and notation on Young and Malliavin calculus for fractional Brownian motion in Sect. 2. The probabilistic representation for the log-likelihood is given in Sect. 3. Discretization procedures are designed in Sect. 4, and finally numerical examples are given in Sect. 5.

1.2 Remaining open problems

We emphasized above the fact that only a part of the problem at stake was going to be solved in the current article. We now briefly sketch the remaining tasks to be treated.

The most important obstacle in order to fully justify our methodology is to get a suitable convergence theorem for \(\ell _n(\theta )/n\), where \(\ell _n(\theta )\) is defined by (5). In a natural way, this should be based on some strong ergodicity properties for \(Y_t\). After a glance at the literature on ergodicity for fractional systems, one can distinguish two cases:

  (i)

    When \(\sigma (\cdot ;\theta )\) is constant, the convergence of \(\mathcal{L }(Y_t)\) as \(t\rightarrow \infty \) is established in Hairer (2005), with a (presumably non optimal) rate of convergence \(t^{-1/8}\).

  (ii)

    For a general smooth and elliptic coefficient \(\sigma \), only the uniqueness of the invariant measure is shown in Hairer and Ohashi (2007), with an interesting extension to the hypoelliptic case in Hairer and Pillai (2011). Nothing is known about the convergence of \(\mathcal{L }(Y_t)\), not to mention rates.

This brief review already indicates that the convergence to invariant measures is still quite mysterious for fractional differential equations, at least for a non constant coefficient \(\sigma \). Moreover, recall that if \(\nu (\theta )\) stands for the invariant measure corresponding to the system with coefficients \(\mu (\cdot ;\theta ),\sigma (\cdot ;\theta )\), we also wish to retrieve some information on the dependence \(\theta \mapsto \nu (\theta )\) (see Hairer and Majda (2010) for some partial results in this direction).

Let us mention another concrete problem: even in the case of a constant \(\sigma \), the convergence of \(\mathcal{L }(Y_t)\) to an invariant measure \(\nu (\theta )\) is proven in Hairer (2005) in the total variation sense. In terms of the density \(p(t,x;\theta )\) of \(Y_t\), it means that \(p(t,\cdot ;\theta )\) converges to the density of \(\nu \) in \(L^1\) topology. However, in order to get a limit for \(\ell _n(\theta )/n\) (recall that \(\ell _n(\theta )\) is defined by (5)), one expects to use at least a convergence in some Sobolev space \(W^{\alpha ,p}\) for \(\alpha ,p\) large enough.

One possibility in order to get this sharper convergence is to bound first the density \(p(t,\cdot ;\theta )\) in another Sobolev space \(W^{\alpha ^{\prime },p^{\prime }}\) and then to use interpolation theory. It thus seems sufficient to obtain Gaussian bounds on \(p(t,\cdot ;\theta )\), uniformly in \(t\). In case of Brownian diffusions, these Gaussian bounds are obtained by analytic tools, thanks to the Markov property. This method being obviously not available for systems driven by fBm, a possible inspiration is contained in the upper Gaussian bounds for the stochastic wave equation which can be found in Dalang and Nualart (2004). The latter technical results stem from an intensive use of Malliavin calculus, which should also be invoked in our case; note the recent efforts (Baudoin and Ouyang 2012; Baudoin et al. 2012) in this direction. Let us point out at this stage that Gaussian bounds on densities are also useful for the very definition of the quantity \(\nabla _{l}\ell _n(\theta )\), which requires lower bounds on the density \(p(t,\cdot ;\theta )\).

Finally, let us mention that it seems possible to produce some reasonable convergent parametric estimators for equations driven by fBm in a rather general context. Among the methods which can be adapted from the diffusion case with the current stochastic analysis techniques, let us mention the least square estimator of Kasonga (1990), as well as the local asymptotic normality property shown in Gobet (2001). However, it seems obvious that the road to a complete picture of parameter estimation for stochastic equations driven by fBm is still long and hard. We hope to complete this program in subsequent communications.

2 Preliminaries and notations

As mentioned in the introduction, we are concerned with equations driven by a \(d\)-dimensional fractional Brownian motion \(B\). We recall here some basic facts about the way to solve those equations, and some Malliavin calculus tools which will be needed later on. Let us introduce first some general notation for Hölder type spaces:

Notation 2

We will denote by \({\fancyscript{C}}^{\alpha }(V)\) the set of \(V\)-valued \(\alpha \)-Hölder functions for any \(\alpha \in (0,1)\), and by \({\fancyscript{C}}_{b}^{n}(U;V)\) the set of \(n\) times differentiable functions, bounded together with all their derivatives, from \(U\) to \(V\). In the previous notation, \(U\) and \(V\) stand for two finite dimensional vector spaces. The state space \(V\) may be omitted for notational convenience when its value is unambiguous. When we want to stress the fact that we are working on a finite interval \([0,T]\), we write \({\fancyscript{C}}_{T}^{\alpha }(V)\) for the space of \(\alpha \)-Hölder functions \(f\) from \([0,T]\) to \(V\). The corresponding Hölder norms shall be denoted by \(\Vert f\Vert _{\alpha ,T}\).

2.1 Differential equations driven by fBm

Recall that the equation we are interested in is of the form (1). Before stating the assumptions on our coefficients we need an additional notation:

Notation 3

For \(n,p\ge 1\), a function \(f\in {\fancyscript{C}}^{p}(\mathbb{R }^{n};\mathbb{R })\) and any tuple \((i_1,\ldots, i_p)\in \{1,\ldots ,n\}^{p}\), we set \(\partial _{i_1\ldots i_p} f\) for \(\frac{\partial ^{p} f}{\partial x_{i_1}\ldots \partial x_{i_p}}\). Similarly, consider a function \(g_{\theta }\in {\fancyscript{C}}^{p}(\varTheta ;\mathbb{R }^{n})\) depending on a vector of parameters \(\theta \in \varTheta \subset \mathbb{R }^{q}\). For any tuple \((i_1,\ldots, i_p)\in \{1,\ldots ,q\}^{p}\), we set \(\nabla _{i_1\ldots i_p} g_{\theta }^{i}\) for \(\frac{\partial ^{p} g_{\theta }^{i}}{\partial \theta _{i_1}\ldots \partial \theta _{i_p}}\), where \(i=1,\ldots , n\).

Using this notation, we work under the following set of assumptions:

Hypothesis 4

For any \(\theta \in \varTheta \), we assume that \(\mu (\cdot ;\theta ):\mathbb{R }^m\rightarrow \mathbb{R }^m\) and \(\sigma (\cdot ;\theta ):\mathbb{R }^m\rightarrow \mathbb{R }^{m,d}\) are \({\fancyscript{C}}_b^{2}\) coefficients. Furthermore, we have

$$\begin{aligned} \sup _{\theta \in \varTheta } \sum _{l=0}^{2} \sum _{1\le i_1,\ldots ,i_l \le q} \Vert \nabla _{i_1 \cdots i_l}^{l}\mu (\cdot ;\theta )\Vert _{\infty } + \Vert \nabla _{i_1 \cdots i_l}^{l}\sigma (\cdot ;\theta )\Vert _{\infty } <\infty . \end{aligned}$$

When Eq. (1) is driven by a fBm with Hurst parameter \(H>1/2\), it can be solved thanks to a fixed point argument, with the stochastic integral interpreted in the (pathwise) Young sense (see e.g. Gubinelli 2004). Let us recall that Young's integral can be defined in the following way:

Proposition 1

Let \(f\in {\fancyscript{C}}^\gamma ,\,g\in {\fancyscript{C}}^\kappa \) with \(\gamma +\kappa >1\), and \(0\le s\le t\le 1\). Then the integral \(\int _s^t g_\xi \; df_\xi \) is well-defined as a limit of Riemann sums along partitions of \([s,t]\). Moreover, the following estimate holds:

$$\begin{aligned} \left| \int _s^t g_\xi \; df_\xi \right| \le C \Vert f\Vert _\gamma \Vert g\Vert _\kappa |t-s|^\gamma , \end{aligned}$$
(7)

where the constant \(C\) only depends on \(\gamma \) and \(\kappa \). A sharper estimate is also available:

$$\begin{aligned} \left| \int _s^t g_\xi \; df_\xi \right| \le |g_s| \, \Vert f\Vert _\gamma |t-s|^\gamma + c_{\gamma ,\kappa } \Vert f\Vert _\gamma \Vert g\Vert _\kappa |t-s|^{\gamma +\kappa }. \end{aligned}$$
(8)
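
Since the Young integral is defined through Riemann sums, it is straightforward to approximate numerically. A small sketch of ours in Python, with smooth test functions so that the exact value is available:

```python
import numpy as np

# The Young integral as a limit of left-point Riemann sums (Proposition 1).
# Test case with known value: g_t = t, f_t = t^2 on [0,1], for which
# int_0^1 g df = int_0^1 2 t^2 dt = 2/3.

def young_integral(g, f):
    """Left-point Riemann sum  sum_k g(t_k) (f(t_{k+1}) - f(t_k))."""
    return np.sum(g[:-1] * np.diff(f))

for n in (10, 100, 1000, 10000):
    t = np.linspace(0.0, 1.0, n + 1)
    print(n, young_integral(t, t**2))   # converges to 2/3
```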

With this definition in mind and under Hypothesis 4, we can solve our differential system of interest, and the following moment bounds are proven in Friz and Victoir (2010) and Hu and Nualart (2007):

Proposition 2

Consider a fBm \(B\) with Hurst parameter \(H>1/2\). Then:

  (i)

    Under Hypothesis 4, Eq. (1) driven by \(B\) admits a unique \(\beta \)-Hölder continuous solution \(Y\), for any \(\beta <H\).

  (ii)

    Furthermore,

    $$\begin{aligned} \Vert Y\Vert _{T,\beta } \le |a|+ c_{f,T} \Vert B\Vert _{\beta ,T}^{1/\beta }. \end{aligned}$$
  (iii)

    If we denote by \(Y^a\) the solution to (1) with initial condition \(a\), then

    $$\begin{aligned} \Vert Y^b-Y^a\Vert _{T,\beta } \le |b-a| \, \exp \left( c_{f,T} \Vert B\Vert _{\beta ,T}^{1/\beta }\right) \!. \end{aligned}$$
  (iv)

    If we only assume that the coefficients \(f=(\mu ,\sigma )\) have linear growth, with \(\nabla f,\nabla ^2 f\) bounded, the following estimate holds true:

    $$\begin{aligned} \sup _{t\in [0,T]} |Y_t| \le \left( 1+ |a|\right) \, \exp \left( c_{f,T} \Vert B\Vert _{\beta ,T}^{1/\beta }\right) \!. \end{aligned}$$

Remark 1

The framework of fractional integrals is used in Hu and Nualart (2007) in order to define integrals with respect to \(B\). It is however easily seen to be equivalent to the Young setting we have chosen to work with.

Some differential calculus rules for processes controlled by fBm will also be useful in the sequel:

Proposition 3

Let \(B\) be a \(d\)-dimensional fBm with Hurst parameter \(H>1/2\). Consider \(a,\hat{a}\in \mathbb{R },\,b,\hat{b}\in {\fancyscript{C}}^{\alpha }_T(\mathbb{R }^d)\) with \(\alpha +H>1\), and \(c,\hat{c}\in {\fancyscript{C}}_T(\mathbb{R })\) (all these assumptions are understood in the almost sure sense). Define two processes \(z,\hat{z}\) on \([0,T]\) by

$$\begin{aligned} z_t=a+\sum _{j=1}^{d}\int _0^t b_u^{j}\, dB_u^{j} + \int _0^t c_u \, du, \quad \text{ and}\quad \hat{z}_t= \hat{a}+\sum _{j=1}^{d}\int _0^t \hat{b}_u^{j}\, dB_u^{j} + \int _0^t \hat{c}_u \, du. \end{aligned}$$

Then for \(t\in [0,T]\), one can decompose the product \(z_t\hat{z}_t\) into

$$\begin{aligned} z_t \, \hat{z}_t= a \, \hat{a} + \sum _{j=1}^{d} \int _0^t \left[ \hat{z}_u \, b_u^{j}+ z_u \, \hat{b}_u^{j}\right] \, dB_u^{j} + \int _0^t \left[ z_u \, \hat{c}_u + \hat{z}_u c_u \right] \, du, \end{aligned}$$

where all the integrals with respect to \(B\) are understood in the Young sense.

The proof of this elementary and classical result is omitted here. See Proposition 2.8 in León and Tindel (2012) for the proof of a similar rule.
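As a quick numerical illustration of this product rule (our own toy check, with a deterministic Hölder-continuous path standing in for \(B\)): taking \(z_t=B_t\) (\(b=1,\,c=0\)) and \(\hat{z}_t=t\) (\(\hat{b}=0,\,\hat{c}=1\)), the proposition reduces to the integration-by-parts identity \(t\,B_t=\int _0^t u\, dB_u+\int _0^t B_u\, du\), which Riemann sums reproduce:

```python
import numpy as np

# Check of Proposition 3 in the toy case z_t = B_t, zhat_t = t, i.e.
#   t * B_t = int_0^t u dB_u + int_0^t B_u du,
# with a deterministic Holder driver standing in for B (reproducible).

n, T = 10**5, 1.0
t = np.linspace(0.0, T, n + 1)
B = np.sin(3.0 * t) + t**0.8                            # stand-in driver path
lhs = t[-1] * B[-1]
young = np.sum(t[:-1] * np.diff(B))                     # int u dB_u (left sums)
lebesgue = np.sum(0.5 * (B[:-1] + B[1:]) * np.diff(t))  # int B_u du
print(lhs, young + lebesgue)                            # agree as n grows
```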

2.2 Malliavin calculus techniques

Our representation of the density for the solution to (1) obviously relies on Malliavin calculus tools that we proceed now to recall. As already mentioned in the introduction, on a finite interval \([0,T]\) and for some fixed \(H\in (1/2,1)\), we consider \((\Omega ,{\fancyscript{F}},P)\) the canonical probability space associated with a fractional Brownian motion with Hurst parameter \(H\). That is, \(\Omega ={\fancyscript{C}}_0([0,T];\mathbb{R }^d)\) is the Banach space of continuous functions vanishing at \(0\) equipped with the supremum norm, \({\fancyscript{F}}\) is the Borel sigma-algebra and \(P\) is the unique probability measure on \(\Omega \) such that the canonical process \(B=\{B_t, \; t\in [0,T]\}\) is a \(d\)-dimensional fractional Brownian motion with Hurst parameter \(H\). Recall that this means that \(B\) has \(d\) independent coordinates, each one being a centered Gaussian process with covariance \( R_H(t,s)=\frac{1}{2} (s^{2H}+t^{2H}-|t-s|^{2H}). \)

2.2.1 Functional spaces

Let \({\fancyscript{E}}\) be the space of \(d\)-dimensional elementary functions on \([0,T]\):

$$\begin{aligned} {\fancyscript{E}}=\Big \{ f=(f_1,\ldots ,f_d);\,\,f_j=\sum _{i=0}^{n_j-1} a_i^j \mathbf{1}_{[t_i^j, t_{i+1}^j)}\,, \nonumber \\ 0=t_0^j<t_1^j<\cdots <t_{n_j-1}^j<t_{n_j}^j=T, \ \text{ for }\ j=1,\ldots ,d\Big \}. \end{aligned}$$
(9)

We call \({\fancyscript{H}}\) the completion of \({\fancyscript{E}}\) with respect to the semi-inner product

$$\begin{aligned} \left\langle f,\, g\right\rangle _{{\fancyscript{H}}}=\sum _{i=1}^{d} \left\langle f_{i},\, g_{i}\right\rangle _{{\fancyscript{H}}_{0}}, \quad \text{ where }\quad \langle \mathbf{1}_{[0,t]}, \mathbf{1}_{[0,s]} \rangle _{{\fancyscript{H}}_{0}} := R_H(s,t), \quad s,t \in [0,T]. \end{aligned}$$

Then, one constructs an isometry \(K^*_H: {\fancyscript{H}}\rightarrow L^2([0,T];\mathbb{R }^d)\) such that

$$\begin{aligned} K^*_H\left( \mathbf{1}_{[0,t_{1}]},\ldots ,\mathbf{1}_{[0,t_{d}]}\right) = \left( \mathbf{1}_{[0,t_1]} K_H(t_{1},\cdot ),\ldots , \mathbf{1}_{[0,t_d]} K_H(t_d,\cdot )\right) , \end{aligned}$$

where the kernel \(K_H\) is given by

$$\begin{aligned} K_H(t,s)= c_H s^{\frac{1}{2} -H} \int _s^t (u-s)^{H-\frac{3}{2}} u^{H-\frac{1}{2}} \, du \end{aligned}$$

and verifies that \(R_H(t,s)= \int _0^{s\wedge t} K_H(t,r) K_H(s,r)\, dr\), for some constant \(c_H\). Moreover, let us observe that \(K^*_H\) can be represented in the following form: for \(\varphi =(\varphi _1,\ldots ,\varphi _d)\in {\fancyscript{H}}\), we have

$$\begin{aligned} K^*_H \varphi =\left( K^*_H \varphi ^1,\ldots ,K^*_H\varphi ^d \right) , \quad \text{ where }\quad [K^*_H \varphi ^i]_t = \int _t^T \varphi _r^i \partial _r K_H(r,t) \, dr. \end{aligned}$$
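
The kernel \(K_H\) can be evaluated by standard quadrature, and the factorization \(R_H(t,s)=\int _0^{s\wedge t}K_H(t,r)K_H(s,r)\,dr\) then serves as a consistency check. A sketch of ours in Python, using the classical normalization \(c_H=[H(2H-1)/B(2-2H,H-1/2)]^{1/2}\) found in Nualart (2006); the exact constant is an assumption of this sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta

# Numerical evaluation of K_H(t, s) for H > 1/2 and a check of
# R_H(t, s) = int_0^{min(s,t)} K_H(t, r) K_H(s, r) dr.
# Assumed normalization: c_H = sqrt(H (2H-1) / B(2-2H, H-1/2)).

H = 0.7
c_H = np.sqrt(H * (2.0*H - 1.0) / beta(2.0 - 2.0*H, H - 0.5))

def K(t, s):
    if s <= 0.0 or s >= t:
        return 0.0
    # integrable endpoint singularity (u - s)^(H - 3/2) at u = s
    val, _ = quad(lambda u: (u - s)**(H - 1.5) * u**(H - 0.5), s, t)
    return c_H * s**(0.5 - H) * val

t, s = 0.8, 0.5
lhs = 0.5 * (t**(2*H) + s**(2*H) - abs(t - s)**(2*H))    # R_H(t, s)
rhs, _ = quad(lambda r: K(t, r) * K(s, r), 0.0, min(s, t), limit=200)
print(lhs, rhs)   # should agree up to quadrature error
```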

2.2.2 Malliavin derivatives

Let us start by defining the Wiener integral with respect to \(B\): for any element \(f\) in \({\fancyscript{E}}\) whose expression is given as in (9), we define the Wiener integral of \(f\) with respect to \(B\) as

$$\begin{aligned} B(f):=\sum _{j=1}^d\sum _{i=0}^{n_j-1} a_i^j (B_{t_{i+1}^j}^{j} -B_{t_i^j}^{j}). \end{aligned}$$

We also denote this integral as \( \int _0^T f_{t} dB_t\), since it coincides with a pathwise integral with respect to \(B\).

For \(\varphi :\mathbb{R }\rightarrow \mathbb{R }\) and \(j\in \{1,\ldots ,d\}\), denote by \(\varphi ^{[j]}\) the function with values in \(\mathbb{R }^d\) whose coordinates are all equal to zero except the \(j\)-th one, which equals \(\varphi \) (we use \(\varphi \) here since \(\theta \) is reserved for the parameter). It is readily seen that

$$\begin{aligned} \mathbf E \left[ B\left( \mathbf{1}_{[0,s)}^{[j]}\right) \, B\left( \mathbf{1}_{[0,t)}^{[k]}\right) \right] =\delta _{j,k}R_H(s,t). \end{aligned}$$

This definition can be extended by linearity and closure to elements of \({\fancyscript{H}}\), and we obtain the relation

$$\begin{aligned} \mathbf E \left[ B(f) \, B(g)\right] =\langle f,g\rangle _{{\fancyscript{H}}}, \end{aligned}$$

valid for any couple of elements \(f,g\in {\fancyscript{H}}\). In particular, \(B(\cdot )\) defines an isometric map from \({\fancyscript{H}}\) into a subspace of \(L^2(\Omega )\).

We can now proceed to the definition of Malliavin derivatives. Let \({\fancyscript{S}}\) be the family of smooth functionals \(F\) of the form

$$\begin{aligned} F=f(B(h_1),\dots ,B(h_n)), \end{aligned}$$
(10)

where \(h_1,\dots ,h_n\in {\fancyscript{H}},\,n\ge 1\), and \(f\) is a smooth function with polynomial growth, together with all its derivatives. Then, the Malliavin derivative of such a functional \(F\) is the \({\fancyscript{H}}\)-valued random variable defined by

$$\begin{aligned} D F= \sum _{i=1}^n \partial _{i} f(B(h_1),\dots ,B(h_n)) \, h_i. \end{aligned}$$

For all \(p>1\), it is known that the operator \(D\) is closable from \(L^p(\Omega )\) into \(L^p(\Omega ; {\fancyscript{H}})\) (see e.g. Sect. 1 in Nualart (2006)). We will still denote by \(D\) the closure of this operator, whose domain is usually denoted by \(\mathbb{D }^{1,p}\) and is defined as the completion of \({\fancyscript{S}}\) with respect to the norm

$$\begin{aligned} \Vert F\Vert _{1,p}:= \left( E(|F|^p) + E( \Vert D F\Vert _{\fancyscript{H}}^p ) \right) ^{\frac{1}{p}}. \end{aligned}$$

It should also be noticed that partial Malliavin derivatives with respect to each component \(B^{j}\) of \(B\) will be invoked: they are defined, for a functional \(F\) of the form (10) and \(j=1,\dots ,d\), as

$$\begin{aligned} D^j F=\sum _{i=1}^n \partial _{i} f(B(h_1),\dots ,B(h_n)) h_i^{[j]}, \end{aligned}$$

and then extended by closure arguments again. We refer to Sect. 1 in Nualart (2006) for the definition of higher derivatives and Sobolev spaces \(\mathbb{D }^{k,p}\) for \(k>1\). Another essential object related to those derivatives is the so-called Malliavin matrix of an \(\mathbb{R }^m\)-valued random variable \(F\in \mathbb{D }^{1,2}\), defined by

$$\begin{aligned} \gamma _{F}= \biggl (\Bigl \langle DF^{i}, DF^{j} \Bigr \rangle _{{\fancyscript{H}}}\biggr )_{1\le i,j \le m}. \end{aligned}$$
(11)

2.2.3 Skorohod integrals

We will denote by \(\delta \) the adjoint of the operator \(D\) (also referred to as the divergence operator). This operator is closed and its domain, denoted by \(\mathrm{Dom}(\delta )\), is the set of \({\fancyscript{H}}\)-valued square integrable random variables \(u\in L^2(\Omega ;{\fancyscript{H}})\) such that

$$\begin{aligned} |\mathbf E \left[ \langle D F,u\rangle _{{\fancyscript{H}}}\right] |\le C\,\Vert F\Vert _2, \end{aligned}$$

for all \(F\in \mathbb{D }^{1,2}\), where \(C\) is some constant depending on \(u\). Moreover, for \(u\in \mathrm{Dom}(\delta )\), \(\delta (u)\) is the element of \(L^2(\Omega )\) characterized by the duality relationship:

$$\begin{aligned} \mathbf E \left[ F\delta (u)\right] =\mathbf E \left[ \langle D F,u\rangle _{{\fancyscript{H}}}\right] , \quad \text{ for any }\quad F\in \mathbb{D }^{1,2}. \end{aligned}$$
(12)

The quantity \(\delta (u)\) is usually called the Skorohod integral of the process \(u\).

Skorohod integrals are obviously analytic objects, not suitable for easy numerical implementations. However, they can be related to the Young type integrals introduced in Proposition 1. For this, we need to define another functional space as follows:

Notation 5

We call \(|{\fancyscript{H}}|\) the space of measurable functions \(\varphi :[0,T]\rightarrow \mathbb{R }^d\) such that

$$\begin{aligned} \Vert \varphi \Vert ^2_{|{\fancyscript{H}}|}:= c_H \int _0^T \int _0^T |\varphi _r| |\varphi _u| |r-u|^{2H-2} dr du <+\infty , \end{aligned}$$

where \(c_H=H(2H-1)\), and we denote by \(\langle \cdot ,\cdot \rangle _{|{\fancyscript{H}}|}\) the associated inner product. We also write \(\mathbb{D }^{k,p}(|{\fancyscript{H}}|)\) for the space of \(\mathbb{D }^{k,p}\) functionals with values in \(|{\fancyscript{H}}|\).
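
For concreteness, the \(|{\fancyscript{H}}|\)-inner product is just a weighted double integral and can be approximated on a grid. A small sketch of ours (Python, \(d=1\); convergence is slow near the diagonal singularity, of order \(n^{1-2H}\)), checked against \(\langle \mathbf{1}_{[0,T]},\mathbf{1}_{[0,T]}\rangle =T^{2H}\):

```python
import numpy as np

# Tensor-grid approximation of <f, g> = c_H int int f_r g_u |r-u|^{2H-2} dr du
# with c_H = H(2H-1), in dimension d = 1.  For f = g = 1_{[0,T]} the exact
# value is R_H(T, T) = T^{2H}.

def inner_H(f, g, T, H, n=2000):
    r = (np.arange(n) + 0.5) * (T / n)            # midpoint grid
    R, U = np.meshgrid(r, r)
    diff = np.abs(R - U)
    kern = np.zeros_like(diff)
    mask = diff > 0.0
    kern[mask] = diff[mask]**(2.0*H - 2.0)        # diagonal cells dropped
    c_H = H * (2.0*H - 1.0)
    return c_H * (T / n)**2 * np.sum(f(R) * g(U) * kern)

H, T = 0.7, 1.0
one = lambda x: np.ones_like(x)
print(inner_H(one, one, T, H), T**(2.0*H))        # slowly approaches 1.0
```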

The following proposition is then a slight extension of Proposition 5.2.3 in Nualart (2006):

Proposition 4

Let \(\{u_t^{ij},\; t\in [0,T]\}\), for \(i=1,\ldots ,m\) and \(j=1,\ldots ,d\), be a stochastic process in \(\mathbb{D }^{1,2}(|{\fancyscript{H}}|)\) such that

$$\begin{aligned} \sum _{j=1}^{d}\int _0^T \int _0^T |D_s^{j} u_t^{ij}| \, |t-s|^{2H-2} ds dt <+\infty \quad a.s. \end{aligned}$$
(13)

We also assume that almost surely, \(u\) has \(\beta \)-Hölder paths with \(\beta +H>1\). Then the Young integral \(\sum _{j=1}^{d}\int _0^T u_t^{ij} \, dB_t^{j}\) exists and for all \(i=1,\ldots ,m\) can be written as

$$\begin{aligned} \sum _{j=1}^{d}\int _0^T u_t^{ij} \, dB_t^{j} = \delta (u^{i}) + c_H \sum _{j=1}^{d} \int _0^T \int _0^T D_s^{j} u_t^{ij} |t-s|^{2H-2} ds dt, \end{aligned}$$

where \(\delta (u)\) stands for the Skorohod integral of \(u\).
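
As a simple illustration of the correction term (a classical computation, added here for the reader's convenience), take \(d=1\) and \(u_t=B_t\) on \([0,T]\). The Young integral equals \(\int _0^T B_t\, dB_t=\frac{1}{2} B_T^2\), while \(D_s B_t=\mathbf{1}_{[0,t]}(s)\), so that the trace term is

$$\begin{aligned} c_H \int _0^T\!\!\int _0^T \mathbf{1}_{[0,t]}(s)\, |t-s|^{2H-2}\, ds\, dt = H(2H-1)\int _0^T \frac{t^{2H-1}}{2H-1}\, dt = \frac{T^{2H}}{2}. \end{aligned}$$

Hence \(\delta (B)=\frac{1}{2} B_T^2-\frac{1}{2} T^{2H}\), which is indeed centered since \(\mathbf E [B_T^2]=T^{2H}\).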

3 Probabilistic expression for the log-likelihood

Recall that we are focusing on Eq. (1) driven by a \(d\)-dimensional fBm \(B\), and that we have chosen to use expression (5) as a substitute for the log-likelihood function. We have thus reduced the initial maximization problem to solving \(\nabla _{l}\ell _{n} (\theta ) = 0\), which will be performed numerically by means of a root-finding algorithm.

Observe first that in order to define (5), the density of \(Y_t(\theta )\) must exist for any \(t>0\). Let us thus recall the classical setting (given in Hu and Nualart 2007) under which \(Y_t\) admits a smooth density:

Hypothesis 6

Let \(\mu \) and \(\sigma \) be coefficients satisfying Hypothesis 4. For \(\xi \in \mathbb{R }^m\) and \(\theta \in \varTheta \), set \(\alpha (\xi ;\theta )=\sigma (\xi ;\theta ) \sigma ^*(\xi ;\theta )\). Then we assume that

  (i)

    For any \(k\ge 0\) and \(j_1,\ldots ,j_k\in \{1,\ldots ,m\}\) we have

    $$\begin{aligned} \sup _{\theta \in \varTheta } \sum _{l=0}^{2} \sum _{1\le p_1,\ldots ,p_l \le q} \Vert \nabla _{p_1 \cdots p_l}^{l} \partial _{j_1,\ldots ,j_k}^{k} \mu (\cdot ;\theta )\Vert _{\infty } + \Vert \nabla _{p_1 \cdots p_l}^{l} \partial _{j_1,\ldots ,j_k}^{k} \sigma (\cdot ;\theta )\Vert _{\infty } \le c_{k}, \end{aligned}$$

    for a strictly positive constant \(c_{k}\).

  (ii)

    There exists a strictly positive constant \(\varepsilon \) such that \(\langle \alpha (\xi ;\,\theta )\eta ,\, \eta \rangle _{\mathbb{R }^m}\ge \varepsilon |\eta |^2_{\mathbb{R }^m}\) for any couple of vectors \(\eta ,\xi \in \mathbb{R }^m\), uniformly in \(\theta \in \varTheta \).

Then the density result for \(Y_t\) can be read as follows:

Theorem 7

Consider the stochastic differential Eq. (1) with initial condition \(a\in \mathbb{R }^m\). Assume Hypothesis 6 is satisfied. Then, for any \(t>0\) and \(\theta \in \varTheta \), the law of \(Y_t(\theta )\) admits a \({\fancyscript{C}}^\infty \) density, denoted by \(f(t,\cdot ;\,\theta )\), with respect to Lebesgue’s measure.

In the sequel, we shall suppose that the density \(f(t,\cdot ;\,\theta )\) exists without further mention, the aim of this section being to produce a probabilistic representation of \(f(t,\cdot ;\,\theta )\) for computational purposes. To this aim, we shall first give the equations governing the Malliavin derivatives of the processes \(Y(\theta )\) and \(\nabla Y(\theta )\), and then use a stochastic analysis formula in order to represent our log-likelihood. We separate these tasks in two different subsections.

3.1 Some Malliavin derivatives

This section is devoted to a series of preliminary lemmas which will enable us to formulate our probabilistic representation of \(f(t,\cdot ;\,\theta )\). Let us first introduce a notation which will prevail until the end of the paper:

Notation 8

For a set of indices or coordinates \((k_1,\ldots ,k_r)\) of length \(r\ge 1\) and \(1\le j\le r\), we denote by \((k_1,\ldots ,\check{k}_j,\ldots ,k_r)\) the set of indices or coordinates of length \(r-1\) where \(k_j\) has been omitted.

We now give a general expression for the higher order derivatives of \(Y_t\), borrowed from Nualart and Saussereau (2009).

Lemma 1

Assume Hypotheses 4 and 6 hold true. For \(n\ge 1\) and \((i_{1},\ldots ,i_{n})\in \{1,\ldots ,d\}^{n}\), denote by \(D^{i_{1},\ldots ,i_{n}}Y_{t}^{i}(\theta )\) the \(n^{th}\) Malliavin derivative of \(Y_{t}^{i}(\theta )\) with respect to the coordinates \(B^{i_1},\ldots ,B^{i_n}\) of \(B\). Then \(D^{i_{1},\ldots ,i_{n}}Y_{t}^{i}(\theta )\), considered as an element of \({\fancyscript{H}}^{\otimes n}\), satisfies the following linear equation: for \(t \ge r_{1}\vee \cdots \vee r_{n}\),

$$\begin{aligned}&D_{r_{1},\ldots ,r_{n}}^{i_{1},\ldots ,i_{n}} Y_{t}^{i}(\theta ) = \sum _{p=1}^{n} \alpha ^{i}_{i_p,i_{1}\ldots ,\check{\imath }_p,\ldots , i_{n}}(r_{p};r_{1},\ldots ,\check{r}_{p},\ldots ,r_{n};\theta )\nonumber \\&+ \int _{r_{1}\vee \cdots \vee r_{n}}^{t} \beta ^{i}_{i_{1},\ldots ,i_{n}} (s;r_{1},\ldots ,r_{n};\theta ) \;ds + \sum _{l=1}^{d} \int _{r_{1}\vee \cdots \vee r_{n}}^{t} \alpha ^{i}_{l,i_{1},\ldots ,i_{n}} (s;r_{1},\ldots ,r_{n};\theta ) \;dB_{s}^{l},\nonumber \\ \end{aligned}$$
(14)

where

$$\begin{aligned} \alpha ^{i}_{j,i_{1},\ldots ,i_{n}} (s;r_{1},\ldots ,r_{n};\theta )&= \sum \sum _{k_{1},\ldots ,k_{\nu }=1 }^{m} \partial _{k_{1}\ldots k_{\nu }}^{\nu } \sigma ^{ij}(Y_{s}(\theta );\theta ) \; D_{r(I_{1})}^{i(I_{1})} Y_{s}^{k_{1}}(\theta ) \ldots D_{r(I_{\nu })}^{i(I_{\nu })} Y_{s}^{k_{\nu }}(\theta )\\ \beta ^{i}_{i_{1},\ldots ,i_{n}} (s;r_{1},\ldots ,r_{n};\theta )&= \sum \sum _{k_{1},\ldots ,k_{\nu }=1 }^{m} \partial _{k_{1}\ldots k_{\nu }}^{\nu } \mu ^{i} (Y_{s}(\theta );\theta )\; D_{r(I_{1})}^{i(I_{1})} Y_{s}^{k_{1}}(\theta ) \ldots D_{r(I_{\nu })}^{i(I_{\nu })} Y_{s}^{k_{\nu }}(\theta ). \end{aligned}$$

In the expressions above, the first sums are extended to the set of all partitions \(I_{1}, \ldots , I_{\nu }\) of \(\{1,\ldots ,n\}\) and for any subset \(K=\{i_{1},\ldots ,i_{\eta }\}\) of \(\{1,\ldots ,n\}\) we set \(D_{r(K)}^{i(K)}\) for the derivative operator \(D_{r_{1},\ldots ,r_{\eta }}^{i_{1},\ldots ,i_{\eta }} \). Notice that \(D_{r_{1},\ldots ,r_{n}}^{i_{1},\ldots ,i_{n}} Y_{t}^{i}(\theta ) =0\) whenever \(t < r_{1}\vee \cdots \vee r_{n}\).

The formulas above might seem intricate. The following examples illustrate their use in a simple enough situation:

Example 1

The first order derivative \(D_{r_{1}}^{2}Y_{t}^{1}\) can be computed as

$$\begin{aligned} D_{r_{1}}^{2}Y_{t}^{1}&= \sigma ^{12}(Y_{r_{1}}(\theta ); \theta ) + \sum _{k=1}^{m} \int _{r_{1}}^{t} \partial _{k} \mu ^{1}(Y_{s}(\theta ); \theta )\, D_{r_{1}}^{2}Y_{s}^{k}(\theta )\,ds\\&+ \sum _{\ell =1}^{d} \sum _{k=1}^{m} \int _{r_{1}}^{t} \partial _{k} \sigma ^{1\ell }(Y_{s}(\theta ); \theta )\, D_{r_{1}}^{2}Y_{s}^{k}(\theta )\,dB_{s}^{\ell }, \end{aligned}$$

if \(r_{1}\le t\), and \(0\) if \(r_{1}>t\).

Example 2

The second order derivative \(D_{r_{1},r_{2}}^{1,3}Y_{t}^{2}(\theta )\) can be computed as:

$$\begin{aligned} D_{r_{1},r_{2}}^{1,3}Y_{t}^{2}(\theta )&= \alpha _{1,3}^{2}(r_{1},r_{2};\theta ) + \alpha _{3,1}^{2}(r_{2},r_{1};\theta ) \\&+ \int _{r_{1}\vee r_{2}}^{t} \beta ^{2}_{1,3}(s,r_{1},r_{2};\theta )\;ds + \sum _{l=1}^{d} \int _{r_{1}\vee r_{2}}^{t} \alpha _{l,1,3}^{2}(s,r_{1},r_{2};\theta ) dB_{s}^{l}, \end{aligned}$$

where

$$\begin{aligned} \alpha _{1,3}^{2}(r_{1},r_{2};\theta )&= \sum _{k=1}^{m} \partial _{k} \sigma ^{21}(Y_{r_{1}}(\theta );\theta )\;D_{r_{2}}^{3} Y_{r_{1}}^{k}(\theta ),\\ \alpha _{3,1}^{2}(r_{2},r_{1};\theta )&= \sum _{k=1}^{m} \partial _{k} \sigma ^{23}(Y_{r_{2}}(\theta );\theta )\;D_{r_{1}}^{1} Y_{r_{2}}^{k}(\theta ) \end{aligned}$$

and

$$\begin{aligned} \beta ^{2}_{1,3}(s,r_{1},r_{2};\theta )&= \partial _{k} \mu ^{2}(Y_{s}(\theta );\theta ) D_{r_{1},r_{2}}^{1,3} Y_{s}^{k}(\theta ) +\partial _{k_{1}k_{2}}^{2} \mu ^{2}(Y_{s}(\theta );\theta ) D_{r_{1}}^{1} Y_{s}^{k_{1}}(\theta ) D_{r_{2}}^{3} Y_{s}^{k_{2}}(\theta ), \\ \alpha _{l,1,3}^{2}(s,r_{1},r_{2};\theta )&= \partial _{k} \sigma ^{2l}(Y_{s}(\theta );\theta ) D_{r_{1},r_{2}}^{1,3} Y_{s}^{k}(\theta ) + \partial _{k_{1}k_{2}}^{2} \sigma ^{2l}(Y_{s}(\theta );\theta ) D_{r_{1}}^{1} Y_{s}^{k_{1}}(\theta ) D_{r_{2}}^{3} Y_{s}^{k_{2}}(\theta ), \end{aligned}$$

where we have used the convention of summation over repeated indices.

Our formula for the log-likelihood will also involve some derivatives of the process \(Y(\theta )\) with respect to the parameter \(\theta \). The existence of this derivative is established below:

Proposition 5

Under the same hypotheses as in Lemma 1, the random variable \(Y_{t}^{i}(\theta )\) is a smooth function of \(\theta \) for any \(t\ge 0\). We denote by \(\nabla _{l}Y_{t}^{i}(\theta )\) the derivative of \(Y_{t}^{i}(\theta )\) with respect to the \(l^{th}\) component of the vector of parameters \(\theta \). This process satisfies the following SDE:

$$\begin{aligned} \nabla _{l}Y_{t}^{i}(\theta )&= \int _{0}^{t} \Big [\sum _{k=1}^{m}\partial _{k} \mu ^{i}(Y_{u}(\theta );\theta ) \nabla _{l}Y_{u}^{k}(\theta ) + \nabla _{l}\mu ^{i}(Y_{u}(\theta );\theta )\Big ]du \\&+ \sum _{j=1}^{d}\int _{0}^{t}\Big [\sum _{k=1}^{m}\partial _{k}\sigma ^{ij}(Y_{u}(\theta );\theta ) \nabla _{l}Y_{u}^{k}(\theta ) + \nabla _{l}\sigma ^{ij}(Y_{u}(\theta );\theta )\Big ]dB_{u}^{j}. \end{aligned}$$

Proof

The proof goes exactly along the same lines as for Proposition 4 in Nualart and Saussereau (2009), and the details are left to the reader. \(\square \)
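
Numerically, the sensitivity equation of Proposition 5 is simply solved alongside the Euler scheme for \(Y\). A minimal sketch of ours for the toy one-dimensional model \(dY=-\theta _1 Y\,dt+\theta _2\,dB\) (coefficients and the notation \(Z_l\) for \(\nabla _l Y\) are our choices), whose sensitivities obey \(dZ_1=(-\theta _1 Z_1-Y)\,dt\) and \(dZ_2=-\theta _1 Z_2\,dt+dB\):

```python
import numpy as np

# Joint Euler discretization of Y and Z_l = grad_{theta_l} Y for the toy
# model dY = -theta1 * Y dt + theta2 * dB (1-d).  Proposition 5 then gives
# dZ1 = (-theta1*Z1 - Y) dt  and  dZ2 = -theta1*Z2 dt + dB.

def euler_with_sensitivities(a, theta, B, T):
    M = len(B) - 1
    dt = T / M
    Y, Z1, Z2 = a, 0.0, 0.0
    out = [(Y, Z1, Z2)]
    for k in range(M):
        dB = B[k+1] - B[k]
        # simultaneous update: right-hand sides use the old values
        Y, Z1, Z2 = (Y - theta[0]*Y*dt + theta[1]*dB,
                     Z1 + (-theta[0]*Z1 - Y)*dt,
                     Z2 - theta[0]*Z2*dt + dB)
        out.append((Y, Z1, Z2))
    return np.array(out)

# B can be any sampled fBm path on the grid, e.g. from the Cholesky sketch
# of Sect. 1.
```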

We shall also need some equations describing the gradient with respect to \(\theta \) of the Malliavin derivatives of \(Y(\theta )\). This is the aim of the following lemma:

Lemma 2

For any \(l\in \{1,\ldots ,q\}\) and \(n\ge 1\), the process \(\nabla _{l}Y(\theta )\) is \(n\)-times differentiable in the Malliavin calculus sense. Moreover, taking up the notation of Lemma 1, the process \(\nabla _{l}D^{i_{1},\ldots ,i_{n}}Y_{t}^{i}(\theta )\) satisfies the following linear equation: for \(t \ge r_{1}\vee \cdots \vee r_{n}\),

$$\begin{aligned}&\nabla _{l}D_{r_{1},\ldots ,r_{n}}^{i_{1},\ldots ,i_{n}} Y_{t}^{i}(\theta )= \sum _{p=1}^{n} \hat{\alpha }^{i,l}_{i_{p},i_{1}\ldots ,\check{\imath }_p,\ldots , i_{n}}(r_{p};r_1,\ldots ,\check{r}_{p},\ldots ,r_{n};\theta ) \\&+ \int _{r_{1}\vee \cdots \vee r_{n}}^{t} \hat{\beta }^{i,l}_{i_{1},\ldots ,i_{n}} (s;r_{1},\ldots ,r_{n};\theta ) \;ds +\sum _{\ell =1}^{d} \int _{r_{1}\vee \cdots \vee r_{n}}^{t} \hat{\alpha }^{i,l}_{\ell ,i_{1},\ldots ,i_{n}} (s;r_{1},\ldots ,r_{n};\theta ) \;dB_{s}^{\ell }, \end{aligned}$$

where \(\hat{\alpha }^{i,l}_{j,i_{1},\ldots ,i_{n}}=\nabla _{l} \alpha ^{i}_{j,i_{1},\ldots ,i_{n}}\) and \(\hat{\beta }^{i,l}_{i_{1},\ldots ,i_{n}}=\nabla _{l} \beta ^{i}_{i_{1},\ldots ,i_{n}}\). More specifically, \(\hat{\beta }^{i,p}_{i_{1},\ldots ,i_{n}}\) is defined recursively by

$$\begin{aligned}&\hat{\beta }^{i,p}_{i_{1},\ldots ,i_{n}} (s;r_{1},\ldots ,r_{n};\theta ) \\&=\sum _{I_{1} \cup \ldots \cup I_{\nu }} \sum _{k_{1},\ldots ,k_{\nu }=1 }^{m} \Big \{ \nabla _{p} [\partial _{k_{1}\ldots k_{\nu }}^{\nu } \mu ^{i}(Y_{s}(\theta );\theta )]\; D_{r(I_{1})}^{i(I_{1})} Y_{s}^{k_{1}}(\theta ) \cdots D_{r(I_{\nu })}^{i(I_{\nu })} Y_{s}^{k_{\nu }}(\theta ) \\&\quad +\partial _{k_{1}\ldots k_{\nu }}^{\nu } \mu ^{i}(Y_{s}(\theta );\theta )\; \sum _{\rho =1}^{\nu } \nabla _{p}D_{r(I_{\rho })}^{i(I_{\rho })} Y_{s}^{k_{\rho }}(\theta ) \, D_{r(I_{1})}^{i(I_{1})} Y_{s}^{k_{1}}(\theta ) \cdots D_{\check{r}(I_\rho )}^{\check{\imath }(I_\rho )}Y_s^{\check{k}_\rho }(\theta ) \cdots D_{r(I_{\nu })}^{i(I_{\nu })} Y_{s}^{k_{\nu }}(\theta ) \Big \}, \end{aligned}$$

where we have set

$$\begin{aligned} \nabla _{p} [\partial _{k_{1}\ldots k_{\nu }}^{\nu } \mu ^{i}(Y_{s}(\theta );\theta )] = \nabla _{p}\partial _{k_{1}\ldots k_{\nu }}^{\nu } \mu ^{i}(Y_{s}(\theta );\theta ) + \sum _{k=1}^{m}\partial _{k} \partial _{k_{1}\ldots k_{\nu }}^{\nu } \mu ^{i}(Y_{s}(\theta );\theta )\, \nabla _{p} Y_{s}^{k}(\theta ). \end{aligned}$$

Notice that the same kind of equation (omitted here for the sake of conciseness) holds true for the coefficients \(\hat{\alpha }^{i,l}_{j,i_{1},\ldots ,i_{n}}\).

The next object we need for our calculations is the inverse of the Malliavin matrix \(\gamma _{Y_t(\theta )}\) of \(Y_t(\theta )\). Recall that according to (11), the Malliavin matrix of \(Y_t(\theta )\) is defined by

$$\begin{aligned} \gamma _t(\theta ):=\gamma _{Y_{t}(\theta )} = \left( \left\langle D_{\cdot }Y_{t}^{i}(\theta ), D_{\cdot }Y_{t}^{j}(\theta ) \right\rangle _{{\fancyscript{H}}}\right) _{1\le i,j \le m}, \end{aligned}$$
(15)

where the inner products are taken in \({\fancyscript{H}}\); the shorthand \(\gamma _t(\theta )\) will be used in the computations below. We shall now compute \(\gamma _t^{-1}(\theta )\) as the solution to an SDE:

Proposition 6

The matrix valued process \(\gamma _t^{-1}(\theta )\) is the unique solution to the following linear equation in \(\eta \):

$$\begin{aligned} \eta _{t}(\theta )&= \tilde{\alpha }_{0}^{-1}(Y_{t}(\theta );\theta ) - \sum _{l=1}^{d}\int _{0}^{t} [ \eta _{u}(\theta ) \tilde{\alpha }_{l}(Y_{u}(\theta );\theta ) + \tilde{\alpha }_{l}^{T}(Y_{u}(\theta );\theta )\eta _{u} ] dB_{u}^{l} \nonumber \\&- \int _{0}^{t} [\eta _{u}(\theta ) \tilde{\beta }(Y_{u}(\theta );\theta ) + \tilde{\beta }^{T}(Y_{u}(\theta );\theta )\eta _{u}(\theta ) ] du, \end{aligned}$$
(16)

with

$$\begin{aligned} \tilde{\alpha }_{0}^{ii^{\prime }}(Y_{t}(\theta );\theta ) = c_H \sum _{j=1}^{d} \int _{0}^{t} \int _{0}^{t} \sigma ^{ij}(Y_{r}(\theta );\theta ) \sigma ^{i^{\prime }j}(Y_{r^{\prime }}(\theta );\theta )\; |r-r^{\prime }|^{2H-2} dr\;dr^{\prime } , \quad i,i^{\prime } = 1,\ldots , m, \end{aligned}$$

and where the other coefficients \(\tilde{\alpha }\) and \(\tilde{\beta }\) are defined by

$$\begin{aligned} \tilde{\alpha }_{l}(Y_{u}(\theta );\theta ) = \Bigl (\partial _{k} \sigma ^{i^{\prime }l} (Y_{u}(\theta );\theta ) \Bigr )_{1\le i^{\prime }, k \le m} \;\; \text{ and} \;\; \tilde{\beta }(Y_{u}(\theta );\theta )= \Bigl (\partial _{k} \mu ^{i^{\prime }} (Y_{u}(\theta );\theta ) \Bigr )_{1\le i^{\prime }, k \le m}.\nonumber \\ \end{aligned}$$
(17)

Proof

The proof of this fact is an adaptation of Theorem 7 in Hu and Nualart (2007) to the case of an SDE with drift. We include it here for the sake of completeness, and we drop the dependence of \(Y\) on \(\theta \) for notational convenience in the computations below.

Let us start by invoking Proposition 3 and Eq. (14) in order to compute the product of two first-order Malliavin derivatives:

$$\begin{aligned}&D_{r}^{j}Y_{t}^{i}\;D_{r^{\prime }}^{j}Y_{t}^{i^{\prime }} = \sigma ^{ij}(Y_{r}) \sigma ^{i^{\prime }j}(Y_{r^{\prime }}) \\&+ \sum _{k=1}^{m} \Biggl \{ \int _{0}^{t} \sum _{l=1}^{d} \biggl [ \partial _{k} \sigma ^{il}(Y_{u})\; D_{r^{\prime }}^{j}Y_{u}^{i^{\prime }}\; D_{r}^{j}Y_{u}^{k} + \partial _{k} \sigma ^{i^{\prime }l}(Y_{u})\; D_{r}^{j}Y_{u}^{i}\; D_{r^{\prime }}^{j}Y_{u}^{k}\; \biggr ]\; dB_{u}^{l} \nonumber \\&+ \int _{0}^{t} \biggl [ \partial _{k} \mu ^{i}(Y_{u})\; D_{r^{\prime }}^{j}Y_{u}^{i^{\prime }}\; D_{r}^{j}Y_{u}^{k}\;+ \partial _{k} \mu ^{i^{\prime }}(Y_{u})\; D_{r}^{j}Y_{u}^{i}\; D_{r^{\prime }}^{j}Y_{u}^{k} \biggr ] du \Biggr \}. \nonumber \end{aligned}$$
(18)

Moreover, recall that \(\gamma _t\) is defined by (15). Thus, the Malliavin matrix becomes

$$\begin{aligned} \gamma _{t}^{i i^{\prime }} = \sum _{j=1}^{d} \left\langle D^{j}Y_{t}^{i} ,\, D^{j}Y_{t}^{i^{\prime }}\right\rangle _{{\fancyscript{H}}} = c_H \sum _{j=1}^{d} \int _{0}^{t} \int _{0}^{t} D_{r}^{j}Y_{t}^{i}(\theta )\; D_{r^{\prime }}^{j}Y_{t}^{i^{\prime }}(\theta )\; |r-r^{\prime }|^{2H-2}\;dr\;dr^{\prime }. \end{aligned}$$

Plugging (18) into this relation, we end up with the following equation for \(\gamma ^{i i^{\prime }}\):

$$\begin{aligned} \gamma _{t}^{i i^{\prime }}&= \tilde{\alpha }_{0}^{ii^{\prime }} + \sum _{l=1}^{d} \int _{0}^{t} \sum _{k=1}^{m}\biggl [ \partial _{k} \sigma ^{il}(Y_{u})\; \gamma _{u}^{i^{\prime }k} + \partial _{k} \sigma ^{i^{\prime }l}(Y_{u})\; \gamma _{u}^{ik}\biggr ] dB_{u}^{l}\\&+ \int _{0}^{t} \sum _{k=1}^{m} \biggl [ \partial _{k} \mu ^{i}(Y_{u})\; \gamma _{u}^{i^{\prime }k} + \partial _{k} \mu ^{i^{\prime }}(Y_{u})\; \gamma _{u}^{ik} \biggr ] du. \end{aligned}$$

Using our notation (17) and matrix product rules, we obtain that \(\gamma _t\) is solution to:

$$\begin{aligned} \gamma _{t} = \sum _{l=1}^{d} \int _{0}^{t} ( \tilde{\alpha }_{l}(Y_{u}) \gamma _{u} + \gamma _{u} \tilde{\alpha }_{l}^{T}(Y_{u}))dB_{u}^{l} + \int _{0}^{t} ( \tilde{\beta }(Y_{u}) \gamma _{u} + \gamma _{u} \tilde{\beta }^{T}(Y_{u}) ) du. \end{aligned}$$

Consider now \(\eta \) solution to (16). Applying again Proposition 3, it is readily checked that \(\gamma _t\eta _t=\mathrm{Id}\) for any \(t\in [0,T]\), which ends the proof. \(\square \)

Remark 2

Gathering Eq. (16) and Proposition 2, it is easily seen that for any \(t>0\) and \(\theta \in \varTheta ,\,Y_t(\theta )\) is a non degenerate random variable in the sense given in Definition 2.1.2 of Nualart (2006): we have \(\det (\gamma _t^{-1})\in L^p(\Omega )\) for any \(p>1\).

Now that we have derived an equation for \(\eta =\gamma ^{-1}\), an equation for the Malliavin derivative of \(\eta \) is also available:

Proposition 7

For any \(l\in \{1,\ldots ,q\}\) and \(n\ge 1\), the process \(\eta _{t}=\gamma _t^{-1}\) is \(n\)-times differentiable in the Malliavin calculus sense. Moreover, the process \(D^{i_{1},\ldots ,i_{n}}\eta _{t}\) satisfies the following equation: for \(t \ge r_{1}\vee \cdots \vee r_{n}\),

$$\begin{aligned}&D_{r_{1},\ldots ,r_{n}}^{i_{1},\ldots ,i_{n}} \eta _{t}^{ij}(\theta ) = - \sum _{k_{1}=1}^{n} \sum _{k_{2}=1}^{k_{1}} ( D_{r_{1},\ldots ,r_{k_{2}}}^{i_{1},\ldots ,i_{k_{2}}} \tilde{\alpha }_{0}^{-1} \; D_{r_{1},r_{2},\ldots ,r_{k_{1}-k_{2}}}^{i_{1},i_{2},\ldots ,i_{k_{1}-k_{2}}} \tilde{\alpha }_{0}\; D_{r_{1},\ldots ,r_{n-k_{1}}}^{i_{1},\ldots ,i_{n-k_{1}}} \tilde{\alpha }_{0}^{-1} )^{ij} \\&- \sum _{\ell =1}^{d} \int _{r_{1}\vee \ldots \vee r_{n}}^{t} C_{\ell ,i_{1},\ldots ,i_{n}}^{ij} (s;r_{1},\ldots ,r_{n};\theta ) dB^{\ell }_{s} - \int _{r_{1}\vee \ldots \vee r_{n}}^{t} A_{i_{1},\ldots ,i_{n}}^{ij} (s;r_{1},\ldots ,r_{n};\theta ) ds, \end{aligned}$$

where

$$\begin{aligned}&A_{i_{1},\ldots ,i_{n}}^{ij} (s;r_{1},\ldots ,r_{n};\theta )= \sum \sum _{k_{1},\ldots ,k_{\nu }}^{m} \sum _{k=1}^{m}\\&\Big \{ [ \partial _{k_{1},\ldots ,k_{\nu }}^{\nu } (\tilde{\beta }(Y_{u}(\theta );\theta ))^{kj}\; D_{r(I_{1})}^{i(I_{1})}\eta _{s}^{ik}(\theta ) \ldots D_{r(I_{\nu })}^{i(I_{\nu })}\eta _{s}^{ik}(\theta )\;\;D_{r(I_{1})}^{i(I_{1})}Y_{s}^{i}(\theta ) \ldots D_{r(I_{\nu })}^{i(I_{\nu })}Y_{s}^{i}(\theta ) ] \\&+ [ \partial _{k_{1},\ldots ,k_{\nu }}^{\nu } (\tilde{\beta }(Y_{u}(\theta );\theta ))^{ik}\; D_{r(I_{1})}^{i(I_{1})}\eta _{s}^{kj}(\theta ) \ldots D_{r(I_{\nu })}^{i(I_{\nu })}\eta _{s}^{kj}(\theta )\;\; D_{r(I_{1})}^{i(I_{1})}Y_{s}^{j}(\theta ) \ldots D_{r(I_{\nu })}^{i(I_{\nu })}Y_{s}^{j}(\theta ) ] \Big \} \end{aligned}$$

and the same kind of equation holds for \(C_{l, i_{1},\ldots ,i_{n}}^{ij} (s;r_{1},\ldots ,r_{n};\theta )\), with the coefficients \(\beta \) replaced by \(\alpha _l\).

Proof

The proof of this proposition is based on Lemma 1 and the fact that \(\frac{dA_{\lambda }^{-1}}{d\lambda } = -A_{\lambda }^{-1} \frac{dA_{\lambda }}{d\lambda } A_{\lambda }^{-1}\). \(\square \)

Finally, one can also differentiate \(\eta \) with respect to our standing parameter \(\theta \), which yields:

Lemma 3

The derivative of the inverse of the Malliavin matrix \(\eta _{t}\) with respect to \(\theta \) satisfies the following SDE

$$\begin{aligned}&\nabla _{l} \eta _{t}(\theta )=\nabla _{l} \tilde{\alpha }_{0}^{-1} - \sum _{\ell =1}^{d} \int _{0}^{t} \left\{ {\nabla _{l}} \eta _{u}(\theta ) \tilde{\alpha }_{\ell } (Y_{u}(\theta );\theta ) +\eta _{u}(\theta ) \nabla _{l}[\tilde{\alpha }_{\ell }(Y_{u}(\theta );\theta )]\right. \\&+ \left. \nabla _{l} [\tilde{\alpha }_{\ell }^{T}(Y_{u}(\theta );\theta )] \eta _{u}(\theta ) + \tilde{\alpha }_{\ell }^{T}(Y_{u}(\theta );\theta ) \nabla _{l} {\eta _{u}(\theta )}\right\} dB_{u}^{\ell }- \int _{0}^{t} \{\nabla _{l} \eta _{u}(\theta ) \tilde{\beta }(Y_{u}(\theta );\theta ) \\&+ \eta _{u}(\theta ) \nabla _{l} [\tilde{\beta }(Y_{u}(\theta );\theta )] +\nabla _{l} [\tilde{\beta }^{T}(Y_{u}(\theta );\theta )] \eta _{u}(\theta ) + \tilde{\beta }^{T}(Y_{u}(\theta );\theta ) \nabla _{l} \eta _{u}(\theta )\} du, \end{aligned}$$

where \(\nabla _{l}[\tilde{\beta }_{\ell }(Y_{u})] = \partial \tilde{\beta }_{\ell }(Y_{u}) \nabla _{l}Y_{u} + \nabla _{l}\tilde{\beta }_{\ell }(Y_{u})\) and \(\nabla _{l}[\tilde{\alpha }_{\ell }(Y_{u})] = \partial \tilde{\alpha }_{\ell }(Y_{u}) \nabla _{l}Y_{u} + \nabla _{l}\tilde{\alpha }_{\ell }(Y_{u})\).

3.2 Probabilistic representation of the likelihood

We have chosen to represent the log-likelihood of our sample thanks to the following formula borrowed from the stochastic analysis literature:

Proposition 8

Let \(F\) be an \(\mathbb{R }^m\)-valued non degenerate random variable (see Remark 2 for references on this concept), and let \(f\) be the density of \(F\). For \(n\ge 1\) and \((j_1,\ldots ,j_n)\in \{1,\ldots ,m\}^{n}\), let \(H_{(j_1,\ldots ,j_n)}(F)\) be defined recursively by \(H_{(j_1)}(F)=\sum _{j=1}^{m} \delta ( (\gamma _F^{-1})^{j_1j} DF^j)\) and

$$\begin{aligned} H_{(j_1,\ldots ,j_n)}(F)=\sum _{j=1}^{m} \delta \left( \left( \gamma _F^{-1}\right) ^{j_nj} DF^j H_{(j_1,\ldots ,j_{n-1})}(F)\right) , \end{aligned}$$
(19)

where the Skorohod operator \(\delta \) is defined in Sect. 2.2.3. Then one can write

$$\begin{aligned} f(x)= \mathbf E \left[ \mathbf{1}_{(F>x)} H_{(1,\ldots , m)}(F)\right] = \mathbf E \left[ \left( F-x\right) _+ H_{(1,\ldots , m,1,\ldots , m)}(F)\right] , \end{aligned}$$
(20)

where \(\mathbf{1}_{(F>x)}:=\prod _{i=1}^{m} \mathbf{1}_{(F^{i}>x_i)}\) and \((F-x)_+:=\prod _{i=1}^{m}(F^i-x_i)_+\).

Proof

The first formula is a direct application of Proposition 2.1.5 in Nualart (2006). The second one is obtained along the same lines, integrating by parts \(m\) additional times with respect to the first one. \(\square \)
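
Before turning to the general recursion, formula (20) can be tested by Monte Carlo in the simplest possible case \(F=B_T\) with \(d=m=1\) (a sanity check of ours): there \(DF=\mathbf{1}_{[0,T]}\), \(\gamma _F=T^{2H}\), so \(H_{(1)}(F)=\delta (\gamma _F^{-1}DF)=B_T/T^{2H}\), and (20) must reproduce the \(\mathcal{N }(0,T^{2H})\) density.

```python
import numpy as np

# Monte Carlo check of f(x) = E[ 1_{F > x} H_(1)(F) ] for F = B_T, where
# H_(1)(F) = B_T / T^{2H}.  The result must match the N(0, T^{2H}) density.

rng = np.random.default_rng(3)
H, T, x, N = 0.7, 1.0, 0.3, 10**6
var = T**(2.0*H)
B_T = np.sqrt(var) * rng.standard_normal(N)             # exact law of B_T
mc = np.mean((B_T > x) * B_T) / var                     # right-hand side of (20)
exact = np.exp(-x*x / (2.0*var)) / np.sqrt(2.0*np.pi*var)
print(mc, exact)                                        # agree up to MC error
```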

The formula above can obviously be applied to \(Y_t(\theta )\) for any strictly positive \(t\), since we have noticed at Remark 2 that \(Y_t(\theta )\) is a non-degenerate random variable. However, the expression of \(H_{(j_1,\ldots ,j_n)}(Y_t(\theta ))\) given by (19) is written in terms of Skorohod integrals, which are not amenable to numerical computations. We will thus recast this expression in terms of Young integrals plus some correction terms:

Proposition 9

Under Hypotheses 4 and 6, let us define \(Q_{st}^{pji}:=(\gamma _t^{-1})^{pj} D_s^{i}Y_t^{j}(\theta )\) for \(0\le s<t\le T,\,p,j\in \{1,\ldots ,m\}\) and \(i\in \{1,\ldots , d\}\). Consider \(p\in \{1,\ldots ,m\}\) and a real valued random variable \(G\) which is smooth in the Malliavin calculus sense. Set

$$\begin{aligned} U_p(G)=\sum _{j=1}^{m} \sum _{i=1}^{d} G \int _0^{t} Q_{st}^{pji} \, dB_s^{i} - c_{H} \sum _{j=1}^{m} \sum _{i=1}^{d} \int _0^{t} \int _0^{t} D_s^{i} \left[ G Q_{rt}^{pji}\right] |r-s|^{2H-2} dr ds,\nonumber \\ \end{aligned}$$
(21)

where the integral with respect to \(B\) is understood in the Young sense. Then the quantities \(H_{(j_1,\ldots , j_n)}(Y_t(\theta ))\) defined at Proposition 8 can be expressed as

$$\begin{aligned} H_{(j_1,\ldots , j_n)}(Y_t(\theta ))=\sum _{j=1}^{m} U_{j_{n}}\circ \cdots \circ U_{j_{1}}\left( Y_t^{j}(\theta ) \right) \!. \end{aligned}$$
(22)

Proof

It is an immediate consequence of Proposition 4, since we have noticed in our Remark 2 that \(Y_t(\theta )\) is a non-degenerate random variable. \(\square \)

The previous proposition is still not sufficient to guarantee an effective computation of the log-likelihood. Indeed, the right hand side of (21) contains terms of the form \(D_s [ G Q_{rt}^{pji}]\), which should be given in a more explicit form. This is the content of our next proposition.

Proposition 10

Set \(H_{(j_1,\ldots , j_n)}(Y_t(\theta )):=K_{j_{1}\ldots j_{n}}\). Then the term \(D_s [ K_{j_{1}\ldots j_{n}} Q_{rt}^{pji}]\) in (21) can be computed inductively as follows:

(i) We have \(D_s [ K_{j_{1}\ldots j_{n}} Q_{rt}^{pji}]=D_s K_{j_{1}\ldots j_{n}} \, Q_{rt}^{pji} + K_{j_{1}\ldots j_{n}} \, D_sQ_{rt}^{pji}\), and \(D_sQ_{rt}^{pji}\) is computed by invoking Proposition 7 for the derivative of \(\gamma _t^{-1}\) and Lemma 1 for the derivative of \(Y_t(\theta )\). We are thus left with the computation of \(D_s K_{j_{1}\ldots j_{n}}\).

(ii) Assume now that we can compute \(n-r\) Malliavin derivatives of \(K_{j_{1}\ldots j_{r}}\). Notice that this condition is met for \(r=0\), since \(Y_t(\theta )\) itself can be differentiated \(n\) times in an explicit way according to Lemma 1 again. Then for any \(j_1,\ldots ,j_{r+1}\) and \(k\le n-r-1\), the quantity \(K_{j_{1}\ldots j_{r+1}}\) can be differentiated \(k\) times, with a Malliavin derivative given by

$$\begin{aligned} D_{\rho _{1} \ldots \rho _{k}}^{i_{1},\ldots ,i_{k}} K_{j_{1}\ldots j_{r+1}}&= \sum _{\ell =1}^{k} D_{\rho _{1} \ldots \check{\rho }_{\ell } \ldots \rho _{k}}^{i_{1},\ldots ,\check{i}_{\ell } \ldots ,i_{k}} (K_{j_{1}\ldots j_{r}}\; Q_{\rho _{\ell } t}^{pji}) + \sum _{j=1}^{d}\int _{0}^{t} D_{\rho _{1} \ldots \rho _{k}}^{i_{1},\ldots ,i_{k}} (K_{j_{1}\ldots j_{r}}\; Q_{s t}^{pji}) dB_{s}^{j} \nonumber \\&- c_{H} \int _{0}^{t} \int _{0}^{t} D_{r_{1} \rho _{1} \ldots \rho _{k}}^{k+1} (K_{j_{1}\ldots j_{r}}\; Q_{r_{2} t}^{pji}) |r_{1}-r_{2}|^{2H-2}dr_{1} dr_{2}. \end{aligned}$$
(23)

Proof

We focus on the induction step (ii), the other one being straightforward: for a smooth random variable \(W\), one easily gets by induction that

$$\begin{aligned} D_{r_{1}\ldots r_{p}}^{i_{1},\ldots ,i_{p}} \delta (W) = \sum _{\ell =1}^{p} D_{r_{1}\ldots \check{r}_{\ell } \ldots r_{p}}^{i_{1},\ldots ,\check{i}_{\ell } \ldots ,i_{p}} W_{r_\ell } + \delta (D^{i_{1},\ldots ,i_{p}}_{r_{1}\ldots r_{p}} W ). \end{aligned}$$
(24)

Suppose we know the \(n-r\) Malliavin derivatives for \(U_{j_{r}} \circ \dots \circ U_{j_{1}}(F) := K_{j_{1}\ldots j_{r}}\). Recall moreover that

$$\begin{aligned} K_{j_{1}\ldots j_{r+1}} = U_{j_{r+1}} ( K_{j_{1}\ldots j_{r}} ) = \delta ( K_{j_{1}\ldots j_{r}}\; Q_{\cdot t}). \end{aligned}$$

Applying directly relation (24), we thus get, for \(k \le n-r-1\):

$$\begin{aligned} D_{\rho _{1} \ldots \rho _{k}}^{i_{1},\ldots ,i_{k}} \delta (K_{j_{1}\ldots j_{r}}\; Q_{\cdot t}) = \sum _{\ell =1}^{k} D_{\rho _{1} \ldots \check{\rho }_{\ell } \ldots \rho _{k}}^{i_{1},\ldots ,\check{i}_{\ell }, \ldots ,i_{k}} (K_{j_{1}\ldots j_{r}}\; Q_{\rho _{\ell } t}) + \delta (D_{\rho _{1} \ldots \rho _{k}}^{i_{1},\ldots ,i_{k}} (K_{j_{1}\ldots j_{r}}\; Q_{\cdot t})). \end{aligned}$$

Our formula (23) is now obtained by applying Proposition 4 to the Skorohod integral

\(\delta (D_{\rho _{1} \ldots \rho _{k}}^{i_{1},\ldots ,i_{k}} (K_{j_{1}\ldots j_{r}}\; Q_{\cdot t}))\) above. \(\square \)

Example 3

As an illustration of the proposition above, we compute \(U_{2}\circ U_{1}(F)\) for \(F=Y_{t}^{i},\,i\in \{1,\ldots ,m\}\) and our \(d\)-dimensional fBm \(B\).

Write first \(U_{1}(Y_{t}^{i}) = \delta (Y_{t}^{i}\;(\gamma ^{-1})^{1j_{1}} \; D^{j_{1}}Y_{t}^{i})\). Since this quantity has to be expressed in a form suitable for numerical approximations, we write

$$\begin{aligned} U_{1}(Y_{t}^{i}) =\sum _{j_{1}=1}^{d} Y_{t}^{i} \int _{0}^{t} Q_{ut}^{1ij_{1}} dB_{u}^{j_{1}} - c_{H} \sum _{j_{1}=1}^{d} \int _{0}^{t} \int _{0}^{t} D_{u_{1}}^{j_{1}} [Y_{t}^{i} Q_{u_{2}t}^{1ij_{1}}] |u_{1}-u_{2}|^{2H-2} du_{1} du_{2}, \end{aligned}$$

where \(Q\) is defined at Proposition 9 and where the first integral in the right-hand side is understood in the Young sense. In order to evaluate the second one, we need Malliavin derivatives, which are obtained through Lemma 1 for \(Y\) and Proposition 7 for \(Q\).

We now have to differentiate \(U_{1}(Y_{t}^{i})\): the differentiation rules for Skorohod integrals immediately yield

$$\begin{aligned} D_{u_{2}}^{j_{2}}[U_{1}(Y_{t}^{i})] = Y_{t}^{i}Q_{u_2t}^{1ij_{2}} + \sum _{j_{1}=1}^{d}\delta (D_{u_{2}}^{j_{2}}[Y_{t}^{i} Q_{\cdot t}^{1ij_{1}}]). \end{aligned}$$

Once again, the Skorohod integral above is not suitable for numerical approximations. Write thus

$$\begin{aligned} D_{u_{2}}^{j_{2}}[U_{1}(Y_{t}^{i})]&= Y_{t}^{i}Q_{u_{2}t}^{1ij_{2}} + \sum _{j_{1}=1}^{d}\int _{0}^{t} D_{u_{2}}^{j_{2}}[Y_{t}^{i}Q_{rt}^{1ij_{1}}] \, dB_{r}^{j_{1}} \\&\quad - c_{H} \sum _{j_{1}=1}^{d}\int _{0}^{t} \int _{0}^{t} D_{s}^{j_{1}}D_{u_{2}}^{j_{2}} [Y_{t}^{i} Q_{rt}^{1ij_{1}}] \, |r-s|^{2H-2} \, dr \, ds, \end{aligned}$$

and compute the Malliavin derivatives of the products \(YQ\) thanks to Lemma 1 for \(Y\) and Proposition 7 for \(Q\). Once this is done, just write

$$\begin{aligned}&U_{2}(U_{1}(Y_{t}^{i})) = \delta (U_{1}(Y_{t}^{i})\, Q_{\cdot t}^{2i})\\&\quad = \sum _{j_{2}=1}^{d}U_{1}(Y_{t}^{i}) \int _{0}^{t} Q_{ut}^{2ij_{2}} dB_{u}^{j_{2}} - c_{H} \sum _{j_{2}=1}^{d}\int _{0}^{t} \int _{0}^{t} D_{u_{2}}^{j_{2}} [U_{1}(Y_{t}^{i}) Q_{u_{1}t}^{2ij_{2}}] |u_{2} - u_{1}|^{2H-2} du_{1}du_{2}. \end{aligned}$$

In order to give our formula for the derivative of the log-likelihood, we still need to compute the derivative with respect to \(\theta \) of \(H_{(j_1,\ldots , j_n)}(Y_{t}(\theta ))\). For this we state the following lemma.

Lemma 4

The derivative with respect to \(\theta \) of \(U_{p}(Y_{t}(\theta ))\) can be written as

$$\begin{aligned} \nabla _{l}U_p(Y_{t}^{i}(\theta ))&= \sum _{j=1}^{d} [\nabla _{l}Y_{t}^{i}(\theta ) \int _0^{t} Q_{st}^{pij}(\theta )\, dB_s^{j} + Y_{t}^{i}(\theta ) \int _0^{t} \nabla _{l}[Q_{st}^{pij}(\theta )]\, dB_s^{j}]\\&- c_{H}\sum _{j=1}^{d} \int _0^{t} \int _0^{t} \nabla _{l} [D_s^{j} Y_{t}^{i}(\theta ) Q_{rt}^{pij}(\theta )] |r-s|^{2H-2} dr ds, \end{aligned}$$

where \(\nabla _{l}Y_{t}^{i}(\theta )\) is computed according to Proposition 5 and \( \nabla _{l} [D_s^{j} Y_{t}^{i}]\) is given by Lemma 2. As far as \(\nabla _{l}[Q_{st}^{pj} (\theta )]\) is concerned, it is obtained through the following equation:

$$\begin{aligned} \nabla _{l}[Q_{st}^{pj} (\theta )]= \nabla _{l}\eta _{s}^{pj}(\theta )\;D_{s}Y_{t}^{j}(\theta )+\eta _{s}^{pj}(\theta ) \nabla _{l}[D_{s}Y_{t}^{j}(\theta )], \end{aligned}$$

where the expression for \(\nabla _{l}\eta _{s}^{pj}(\theta )\) is a consequence of Lemma 3.

We are now ready to state our probabilistic expression for the log-likelihood function (5).

Theorem 9

Assume Hypotheses 4 and 6 hold true. Let \(y_{t_{i}},\,i=1,\ldots ,n\), be the observation arriving at time \(t_{i}\), and let \(Y_{t_{i}}\) be the solution to the SDE (1) at time \(t_{i}\). Then the gradient of the log-likelihood function admits the following probabilistic representation: \(\nabla _{l}\ell _n(\theta ) =\sum _{i=1}^{n}\frac{V_i(\theta )}{W_i(\theta )}\), with

$$\begin{aligned} W_i(\theta )=\mathbf E \biggl [ \mathbf 1 _{(Y_{t_{i}}(\theta )>y_{t_{i}})}\; H_{(1,\ldots ,m)}\Bigl ( Y_{t_{i}}(\theta ) \Bigr ) \biggr ] \end{aligned}$$
(25)

and

$$\begin{aligned} V_i(\theta )= \mathbf E \biggl [\nabla _{l}Y_{t_{i}}(\theta ) \; \mathbf 1 _{(Y_{t_{i}}(\theta )>y_{t_{i}})} \;H_{(1,\ldots ,m,1,\ldots , m)} \Bigl ( Y_{t_{i}}(\theta ) \Bigr )\nonumber \\ + \Bigl ( Y_{t_{i}}(\theta ) - y_{t_{i}} \Bigr )_{+} \nabla _{l}H_{(1,\ldots ,m,1,\ldots , m)} \Bigl ( Y_{t_{i}}(\theta ) \Bigr ) \biggr ], \end{aligned}$$
(26)

where (i) \(H_{(j_1,\ldots ,j_n)}( Y_{t_{i}}(\theta ))\) is given recursively by (22) and computed at Proposition 10; (ii) \(\nabla _{l}Y_{t_{i}}(\theta )\) is given by Proposition 5; (iii) \(\nabla _{l}H_{(1,\ldots ,m,1,\ldots , m)}\) is obtained by applying Lemma 4.

Proof

Recall that under Hypotheses 4 and 6, \(Y_t(\theta )\) admits a \({\fancyscript{C}}^\infty \) density \(f(t,\cdot ;\, \theta )\) for any \(t>0\) and \(\theta \in \varTheta \). Moreover, we have defined \(\ell _n(\theta )\) as \(\ell _n(\theta )=\sum _{i=1}^{n} \ln (f(t_i,y_{t_i};\, \theta ))\). Thus

$$\begin{aligned} \nabla _{l}\ell _n(\theta )=\sum _{i=1}^{n}\frac{\nabla _l f(t_i,y_{t_i};\, \theta )}{f(t_i,y_{t_i};\, \theta )} :=\sum _{i=1}^{n}\frac{V_i(\theta )}{W_i(\theta )}. \end{aligned}$$

Now \(W_i(\theta )\) can be expressed as in (25) by a direct application of the first relation in (20). As far as \(V_i(\theta )\) is concerned, write

$$\begin{aligned} f(t_i,y_{t_i};\, \theta )= \mathbf E \left[ \left( Y_{t_i}(\theta )-y_{t_i}\right) _+ H_{(1,\ldots , m,1,\ldots , m)}(Y_{t_i}(\theta ))\right] , \end{aligned}$$

according to the second relation in (20). By using standard arguments, one is allowed to differentiate this expression within the expectation, which directly yields (26). \(\square \)

4 Discretization of the log-likelihood

The expression for the gradient of the log-likelihood derived in Theorem 9 is a ratio of two expectations which admit no explicit formula, even in the one-dimensional case. In addition, our goal is to find the root of this non-explicit expression, namely the ML estimator, which is an even harder task. To solve this problem in practice we first use a stochastic approximation algorithm in order to find the root of \(\nabla _{l} \ell _{n}(\theta )\). At each iteration of the algorithm we evaluate the expression by Monte Carlo (MC) simulation. Within each Monte Carlo run, since no exact simulation method is available for the kernels inside the expectations, we resort to an Euler approximation scheme. More specifically, we simulate by an Euler scheme terms such as \(Y_{t}\) and \(DY_{t}\), which are solutions to fractional stochastic differential equations.

Therefore, in our approach we have three types of error in the computation of the MLE: the error of the stochastic approximation algorithm, the Monte-Carlo error and the discretization bias introduced by the Euler approximation for the stochastic differential equations. Our aim here is to combine the Monte Carlo and Euler approximations in an optimal way in order to get a global error bound for the computation of \(\nabla _{l} \ell _{n}(\theta )\).

4.1 Pathwise convergence of the Euler scheme

The Euler scheme is the main source of error in our computations. There is always a trade-off between the number of Euler steps and the number of simulations, but the number of Euler steps is usually what drives the computational cost. This is even worse when we deal with fractional SDEs, since the rate of convergence depends on \(H\): the closer \(H\) is to \(1/2\), the more steps are required for the simulation.

In this section, we compute the magnitude of the discretization error we introduce. We measure the bias of the Euler scheme via the root mean square error. That is, we want to estimate the quantity \(\sup _{\tau \in [0,T]}( \mathbf E |Y_{\tau }(\theta ) - \bar{Y}_{\tau }^{M}(\theta ) |^{2})^{1/2}\), where \(Y_t(\theta )\) is the solution to the SDE (1) and \(\bar{Y}_{\tau }^{M}(\theta )\) is the Euler approximation of \(Y_{\tau }(\theta )\) given on the grid \(\{\tau _k;\, k\le M\}\) by

$$\begin{aligned} \bar{Y}_{\tau _{k+1}}^{M} (\theta )= \bar{Y}_{\tau _{k}}^{M}(\theta ) + \mu (\bar{Y}_{\tau _{k}}^{M}(\theta );\theta ) (\tau _{k+1}-\tau _{k}) + \sum _{j=1}^{d} \sigma ^{j}(\bar{Y}_{\tau _{k}}^{M}(\theta );\theta ) \delta B^{M,j}_{\tau _{k} \tau _{k+1}}, \end{aligned}$$
(27)

in which we denote \(\delta B^{M,j}_{\tau _{k} \tau _{k+1}} = B_{\tau _{k+1}}^{M,j} - B_{\tau _{k}}^{M,j}\) and \(\tau _{k} = \frac{kT}{M}\) for \(k = 0,\ldots , M\). Notice that those estimates can be found in Deya et al. (2012), Friz and Victoir (2010) and Mishura and Shevchenko (2008). We include the proof here because it is simple enough, and also because it can easily be generalized to the case of a linear equation. This latter case is of special interest for us, since it corresponds to Malliavin derivatives, and is not included in the aforementioned references.

Notation 10

For simplicity, in this section we write \(Y:=Y(\theta )\).

Proposition 11

Let \(T>0\) and recall that \(\bar{Y}^{M}\) is defined by Eq. (27). Then, there exists a random variable \(C_T\) with finite \(L^{p}\) moments for all \(p\ge 1\), such that for all \(1/2<\gamma < H\) we have

$$\begin{aligned} \Vert Y - \bar{Y}^{M} \Vert _{\gamma ,T} \le C_T\; M^{1-2\gamma }. \end{aligned}$$
(28)

Consequently, we obtain that the MSE is of order \({\fancyscript{O}}(M^{1-2\gamma })\).

Proof

In order to prove (28) we apply classical numerical-analysis techniques for the flow of an ordinary differential equation driven by a smooth path. Namely, the exact flow of (1) is given by \(\varPhi (y; s, t) := Y_{t}\), where \(Y_{t}\) is the unique solution of (1) on \([s,T]\) with initial condition \(Y_{s} = y\). Introduce also the numerical flow

$$\begin{aligned} \varPsi (y; \tau _{k}, \tau _{k+1}) := y + \mu (y) (\tau _{k+1}-\tau _{k}) + \sum _{j=1}^{d} \sigma ^{j}(y) \delta B^{M,j}_{\tau _{k} \tau _{k+1}}, \end{aligned}$$
(29)

where \(\tau _{k} = \frac{kT}{M},\,k=0,\ldots ,M-1\). Thus, we can write that

$$\begin{aligned} \bar{Y}_{\tau _{k+1}}^{M}&= \varPsi \Bigl (\bar{Y}_{\tau _{k}}^{M};\; \tau _{k}, \tau _{k+1}\Bigr ),\; k=0, \ldots , M-1\\ \bar{Y}_{0}^{M}&= \alpha . \end{aligned}$$

For \(q>k\) we also have that

$$\begin{aligned} \varPsi (y; \tau _{k}, \tau _{q}) := \varPsi (\cdot ; \tau _{q-1}, \tau _{q})\circ \varPsi (\cdot ; \tau _{q-2}, \tau _{q-1})\circ \ldots \circ \varPsi (y; \tau _{k}, \tau _{k+1}). \end{aligned}$$

The one-step error is given by

$$\begin{aligned} r_{k}&= \varPhi (y; \tau _{k}, \tau _{k+1}) - \varPsi (y; \tau _{k}, \tau _{k+1}) \nonumber \\&= \int _{\tau _{k}}^{\tau _{k+1}} \Bigl [ \mu (Y_{s}) - \mu (y) \Bigr ] ds + \int _{\tau _{k}}^{\tau _{k+1}} \Bigl [ \sigma (Y_{s}) - \sigma (y) \Bigr ] dB_{s} \end{aligned}$$
(30)

Furthermore, since \(Y\in {\fancyscript{C}}^{\gamma }\) and \(B \in {\fancyscript{C}}^{\gamma }\) for \(\gamma >1/2\), using (8) we have

$$\begin{aligned} \Bigl |\int _{\tau _{k}}^{\tau _{k+1}} \Bigl [ \sigma (Y_{s}) - \sigma (y) \Bigr ] dB_{s} \Bigr |&\le c_{ \gamma }\;\Vert \partial \sigma \Vert _{\infty } \Vert Y\Vert _{\gamma }\;\Vert B\Vert _{\gamma } \; \left| \frac{T}{M}\right| ^{2\gamma } \\&\le c_{\gamma ,\sigma }\;\Vert \partial \sigma \Vert _{\infty }\; \Vert B\Vert _{\gamma }^{1/\gamma }\;\Vert B\Vert _{\gamma } \; \left| \frac{T}{M}\right| ^{2 \gamma }, \end{aligned}$$

where we used the fact that \(\Vert Y\Vert _{\gamma } \le c_{\sigma }\Vert B\Vert _{\gamma }^{1/\gamma }\) (see Proposition 2). Similarly, for the drift part we have

$$\begin{aligned} \Bigl |\int _{\tau _{k}}^{\tau _{k+1}} \Bigl [ \mu (Y_{s}) - \mu (y) \Bigr ] ds \Bigr |&\le c_{\gamma }\;\Vert \partial \mu \Vert _{\infty }\;\Vert Y\Vert _{\gamma } \; \left| \frac{T}{M}\right| ^{\gamma +1}\\&\le c_{\gamma ,\mu }\;\Vert \partial \mu \Vert _{\infty }\;\Vert B\Vert _{\gamma }^{1/\gamma } \; \left| \frac{T}{M}\right| ^{\gamma +1}. \end{aligned}$$

Therefore, the one-step error (30) satisfies

$$\begin{aligned} |r_{k}| \le c_{\mu ,\sigma }\; \Vert B\Vert _{\gamma }^{1+1/\gamma }\; \left| \frac{T}{M}\right| ^{2\gamma }. \end{aligned}$$
(31)

Now, we can write the classical decomposition of the error in terms of the exact and numerical flow. Since \(\bar{Y}_{\tau _{k}}^{M} = \varPhi (\bar{Y}_{\tau _k}^{M}; \tau _{k}, \tau _{k})\) and \(Y_{\tau _{k}} = \varPhi (\bar{Y}_{\tau _0}^{M}; \tau _{0}, \tau _{k})\) we have

$$\begin{aligned} Y_{\tau _{q}} - \bar{Y}_{\tau _{q}}^{M} = \varPhi (\bar{Y}_{\tau _{0}}^{M}; \tau _{0}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{q}}^{M}; \tau _{q}, \tau _{q}) = \sum _{k=0}^{q-1} \Bigl ( \varPhi (\bar{Y}_{\tau _{k}}^{M}; \tau _{k}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{k+1}}^{M}; \tau _{k+1}, \tau _{q}) \Bigr ). \end{aligned}$$
(32)

Since \( \varPhi \Bigl (\bar{Y}_{\tau _{k}}^{M}; \tau _{k}, \tau _{q}\Bigr ) = \varPhi \Bigl ( \varPhi (\bar{Y}_{\tau _{k}}^{M}; \tau _{k}, \tau _{k+1}); \tau _{k+1}, \tau _{q}\Bigr )\) we obtain

$$\begin{aligned} \Bigl | \varPhi (\bar{Y}_{\tau _{k}}^{M}; \tau _{k}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{k+1}}^{M}; \tau _{k+1}, \tau _{q})\Bigr |&= \Bigl | \varPhi \Bigl ( \varPhi (\bar{Y}_{\tau _{k}}^{M}; \tau _{k}, \tau _{k+1}); \tau _{k+1}, \tau _{q}\Bigr ) - \varPhi (\bar{Y}_{\tau _{k+1}}^{M}; \tau _{k+1}, \tau _{q})\Bigr | \\&\le C_{T}(\Vert B\Vert _{\gamma })\; | \varPhi (\bar{Y}_{\tau _{k}}^{M}; \tau _{k}, \tau _{k+1}) - \bar{Y}_{\tau _{k+1}}^{M}|, \end{aligned}$$

where we have used the fact that

$$\begin{aligned} | \varPhi (\alpha ; t, s) - \varPhi (\beta ; t, s)| \le C_{T}(\Vert B\Vert _{\gamma }) |\alpha -\beta |, \end{aligned}$$

where \(C_{T}\) is a subexponential function (see Proposition 2 again). Moreover, owing to relation (31),

$$\begin{aligned} | \varPhi (\bar{Y}_{\tau _{k}}^{M}; \tau _{k}, \tau _{k+1}) - \bar{Y}_{\tau _{k+1}}^{M}| = | r_k | \le c_{\mu ,\sigma }\; \Vert B\Vert _{\gamma }^{1+1/\gamma }\; \left| \frac{T}{M}\right| ^{2 \gamma }. \end{aligned}$$
(33)

Therefore, plugging (33) into (32), for any \(q \le M\) we obtain

$$\begin{aligned} | \bar{Y}_{\tau _{q}}^{M} - Y_{\tau _{q}} | \le c_{\mu ,\sigma }\; C_{T}\; \Vert B\Vert _{\gamma }^{1+1/\gamma }\; \sum _{k=0}^{q-1} \left| \frac{T}{M}\right| ^{2 \gamma } \le c_{\mu ,\sigma }\; C_{T}\; \Vert B\Vert _{\gamma }^{1+1/\gamma }\; T^{2\gamma }\, M^{1-2\gamma }, \end{aligned}$$

where the last step uses that the sum contains at most \(M\) terms.

Let us push forward this analysis to Hölder type norms on the grid \(0 = \tau _{0}< \tau _{1}<\ldots < \tau _{M}=T\). We have for \(q\ge p\)

$$\begin{aligned}&\delta \Bigl (Y-\bar{Y}^{M}\Bigr )_{\tau _{p} \tau _{q}} \\&\quad = \Bigl ( \varPhi (Y_{\tau _{p}}; \tau _{p}, \tau _{q}) - Y_{\tau _{p}}\Bigr ) - \Bigl ( \varPsi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q}) - \bar{Y}_{\tau _{p}}^{M}\Bigr )\\&\quad = \Bigl ( \varPhi (Y_{\tau _{p}}; \tau _{p}, \tau _{q}) -Y_{\tau _{p}}\Bigr ) - \Bigl ( \varPhi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q}) - \bar{Y}_{\tau _{p}}^{M}\Bigr ) - \Bigl ( \varPsi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q})\Bigr )\\&\quad = \biggl ( \Bigl ( \varPhi (Y_{\tau _{p}}; \tau _{p}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q}) \Bigr ) - \Bigl ( Y_{\tau _{p}} - \bar{Y}_{\tau _{p}}^{M} \Bigr ) \biggr ) - \Bigl ( \varPsi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q})\Bigr ). \end{aligned}$$

Similar to the calculations leading to (33) we obtain

$$\begin{aligned} \Bigl | \varPsi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q})\Bigr | \le c_{\mu ,\sigma }\; \Vert B\Vert _{\gamma }^{1+1/\gamma }\;\sum _{k=p}^{q-1} \left| \frac{T}{M}\right| ^{2\gamma }. \end{aligned}$$

Moreover, owing to Proposition 2 part (3), observe that

$$\begin{aligned} \frac{\Bigl | \Bigl ( \varPhi (Y_{\tau _{p}}; \tau _{p}, \tau _{q}) - \varPhi (\bar{Y}_{\tau _{p}}^{M}; \tau _{p}, \tau _{q}) \Bigr ) - \Bigl ( Y_{\tau _{p}} - \bar{Y}_{\tau _p}^{M} \Bigr ) \Bigr |}{|\tau _{q}-\tau _{p}|^{\gamma }} \le c(\Vert B\Vert _{\gamma })\;|Y_{\tau _{p}}-\bar{Y}_{\tau _{p}}^{M}|. \end{aligned}$$

Consequently, we have that for \(0 \le p < q \le M\)

$$\begin{aligned} \Bigl |\delta \Bigl (Y-\bar{Y}^{M}\Bigr )_{\tau _{p} \tau _{q}}\Bigr | \le c^{\prime }(\Vert B\Vert _{\gamma }^{1+1/\gamma }) \Bigl \{ \sum _{k=p}^{q-1} \left| \frac{T}{M}\right| ^{2 \gamma } +\;|\tau _{q} - \tau _{p}|^{\gamma } \sum _{k=0}^{q} \left| \frac{T}{M}\right| ^{2 \gamma } \Bigr \} \end{aligned}$$

which easily yields that

$$\begin{aligned} \sup _{p,q=0,1,\ldots ,M-1, p\ne q} \frac{\Bigl |\delta \Bigl (Y-\bar{Y}^{M}\Bigr )_{\tau _{p} \tau _{q} } \Bigr |}{|\tau _{p}-\tau _{q}|^{\gamma }} \le c(\Vert B\Vert _{\gamma })\; M^{1-2\;\gamma }. \end{aligned}$$

By “lifting” this error estimate from the grid to the whole interval \([0,T]\), using that consecutive grid points are at distance \(T/M\), we end up with

$$\begin{aligned} \Vert Y - \bar{Y}^{M} \Vert _{\gamma ,\infty ,T} \le C\;M^{1-2\gamma }, \end{aligned}$$
(34)

which concludes the first part of the proof. Regarding the order of the Mean Square Error, it suffices to note that the constant \(C\) has finite \(L^{p}\) moments. \(\square \)

As mentioned before, an elaboration of Proposition 11 is needed in the sequel. Indeed, in the expression for the gradient of the log-likelihood in Theorem 9 we need to discretize more involved functionals of the underlying process, such as (14) or (16). To this aim, let us notice first that all those equations can be written under the following generic form:

$$\begin{aligned} Z_{t} = \alpha + \int _{0}^{t} \xi _{u}^{2} Z_{u} \, du + \sum _{j=1}^{d} \int _{0}^{t} \xi _{u}^{1,j} Z_{u} \, dB_{u}^{j}, \end{aligned}$$
(35)

where \(\xi ^1,\,\xi ^{2}\) are stochastic processes with bounded moments of any order. The corresponding Euler discretization is

$$\begin{aligned} \bar{Z}_{\tau _{k+1}}^{M} = \bar{Z}_{\tau _{k}}^{M} + \xi _{\tau _{k}}^{2} \bar{Z}_{\tau _{k}}^{M} (\tau _{k+1}-\tau _{k}) + \sum _{j=1}^{d} \xi _{\tau _{k}}^{1,j} \bar{Z}_{\tau _{k}}^{M} \; \delta B_{\tau _{k}\tau _{k+1}}^{j,M}, \end{aligned}$$
(36)

and we give first an approximation result in this general context:

Proposition 12

Let \(T>0\), and consider the \(\mathbb{R }^q\)-valued solution \(Z\) to Eq. (35), where \(\alpha \in \mathbb{R }^q,\,\xi ^{2}, \xi ^{1,j}\in \mathbb{R }^{q,q}\), and suppose that \(\Vert \xi ^{2}\Vert _{\gamma }\) and \(\Vert \xi ^{1,j}\Vert _{\gamma }\) belong to \(L^p(\Omega )\) for any \(p\ge 1\). Let \(\bar{Z}^{M}\) be defined by Eq. (36). Then, there exists a random variable \(C^{\prime }_T\) with finite \(L^p\) moments such that for all \(1/2<\gamma <H\) we have

$$\begin{aligned} \Vert Z - \bar{Z}^{M} \Vert _{\gamma ,T} \le C^{\prime }_T\; M^{1-2\gamma }. \end{aligned}$$
(37)

Consequently, we obtain that the mean square error is of order \(\mathcal{O }(M^{1-2\gamma })\).

Proof

We follow an approach similar to the one of the previous proposition. Thus, the exact flow is equal to \(\varPhi (\zeta ; s, t):= Z_{t}\), where \(Z_{t}\) is the unique solution of Eq. (35) on \([s,T]\) with initial condition \(Z_{s} = \zeta \). Consider also the numerical flow

$$\begin{aligned} \varPsi (\zeta ; \tau _{k}, \tau _{k+1}) := \zeta + \xi _{\tau _{k}}^{2} \zeta \, (\tau _{k+1}-\tau _{k}) + \sum _{j=1}^{d} \xi _{\tau _{k}}^{1,j} \zeta \, \delta B_{\tau _{k}\tau _{k+1}}^{j,M}, \end{aligned}$$

where \(\tau _{k} = kT/M,\,k=0,\ldots ,M-1\). Thus, we have

$$\begin{aligned} \bar{Z}_{\tau _{k+1}}^{M}&= \varPsi (\bar{Z}_{\tau _{k}}^{M}; \tau _{k}, \tau _{k+1}), \;k=0,\ldots ,M-1\\ \bar{Z}_{0}^{M}&= \alpha . \end{aligned}$$

In this case, the one-step error can be written as

$$\begin{aligned} r_{k}&= \varPhi (\zeta ; \tau _{k}, \tau _{k+1}) - \varPsi (\zeta ; \tau _{k}, \tau _{k+1})\\&= \int _{\tau _{k}}^{\tau _{k+1}} \xi _{u}^{2} (Z_{u} - \zeta ) \, du + \int _{\tau _{k}}^{\tau _{k+1}} \xi _{u}^{1} (Z_{u} - \zeta ) \, dB_{u}. \end{aligned}$$

We now treat each term separately. Therefore, using the fact that \(\Vert Z\Vert _{\gamma } \le \exp (c\Vert B\Vert _{\gamma }^{1/\gamma }) \), which is recalled at Proposition 2 point (4) in a slightly different context, we have that

$$\begin{aligned} \Bigl |\int _{\tau _{k}}^{\tau _{k+1}} \xi _{u}^{1}(Z_{u} - \zeta ) \, dB_{u} \Bigr |&\le c_{ \gamma }\; \Vert Z \xi ^{1}\Vert _{\gamma }\;\Vert B\Vert _{\gamma } \; \left| \frac{T}{M}\right| ^{2\gamma } \\&\le c_{\gamma }\; \exp (c\Vert B\Vert _{\gamma }^{1/\gamma })\;\Vert B\Vert _{\gamma } \; \left| \frac{T}{M}\right| ^{2 \gamma }. \end{aligned}$$

Similarly, we also have

$$\begin{aligned} \Bigl |\int _{\tau _{k}}^{\tau _{k+1}} \xi _{u}^{2}(Z_{u} - \zeta ) \, du \Bigr |&\le c_{ \gamma }\; \Vert Z \xi ^{2}\Vert _{\gamma } \; \left| \frac{T}{M}\right| ^{\gamma +1} \\&\le c_{\gamma }\; \exp (c\Vert B\Vert _{\gamma }^{1/\gamma }) \; \left| \frac{T}{M}\right| ^{\gamma +1}. \end{aligned}$$

Therefore, the one-step error satisfies the following inequality

$$\begin{aligned} |r_{k}| \le c_{\gamma }\, \exp (c\Vert B\Vert _{\gamma }^{1/\gamma })\;\Vert B\Vert _{\gamma } \; \left| \frac{T}{M}\right| ^{2 \gamma }. \end{aligned}$$

Along the same lines as for Proposition 11, the decomposition of the error in terms of the exact and numerical flow becomes

$$\begin{aligned} \bar{Z}^{M}_{\tau _{q}} - Z_{\tau _{q}} = \varPhi (\bar{Z}^{M}_{\tau _{q}}; \tau _{q}, \tau _{q})-\varPhi (\bar{Z}^{M}_{\tau _{0}}; \tau _{0}, \tau _{q}) = \sum _{k=0}^{q-1} \Bigl ( \varPhi (\bar{Z}^{M}_{\tau _{k+1}}; \tau _{k+1}, \tau _{q})-\varPhi (\bar{Z}_{\tau _{k}}^{M}; \tau _{k}, \tau _{q}) \Bigr ), \end{aligned}$$

and the same chain of inequalities that led from (32) to the estimate following (33) yields

$$\begin{aligned} | \bar{Z}_{\tau _{q}}^{M} - Z_{\tau _{q}} | \le c_{\gamma }\; \sum _{k=0}^{q-1}\left| \frac{T}{M}\right| ^{2 \gamma } \le c_{\gamma }\, T^{2\gamma }\, M^{1-2\gamma }. \end{aligned}$$

The claim of the proposition follows now as in Proposition 11. \(\square \)

We now use the previous proposition in order to approximate the kernels of the expectations in \(\nabla _{l}\ell _{n}(\theta )\). Let us first introduce the following notation:

Notation 11

Let \(W_{i}(\theta )\) and \(V_{i}(\theta )\) be as in (25) and (26), respectively, and define \(w_{i}(\theta )\) and \(v_{i}(\theta )\) as

$$\begin{aligned} w_i(\theta )&= \mathbf 1 _{(Y_{t_{i}}(\theta )>y_{t_{i}})}\; H_{(1,\ldots ,m)}\Bigl ( Y_{t_{i}}(\theta ) \Bigr )\end{aligned}$$
(38)
$$\begin{aligned} v_i(\theta )&= \nabla _{l}Y_{t_{i}}(\theta ) \; \mathbf 1 _{(Y_{t_{i}}(\theta )>y_{t_{i}})} \;H_{(1,\ldots ,m,1,\ldots , m)}\Bigl ( Y_{t_{i}}(\theta ) \Bigr ) + \Bigl ( Y_{t_{i}}(\theta ) - y_{t_{i}} \Bigr )_{+} \nabla _{l}H_{(1,\ldots ,m,1,\ldots , m)}\Bigl ( Y_{t_{i}}(\theta ) \Bigr ). \end{aligned}$$
(39)

Let also \(\bar{w}_{i}^{M}\) and \(\bar{v}_{i}^{M}\) be the Euler-discretized versions of (38) and (39) obtained through Proposition 12, and set \(\bar{W}_{i}^{M}(\theta )=\mathbf E [\bar{w}_{i}^{M}]\) and \(\bar{V}_{i}^{M}(\theta )=\mathbf E [\bar{v}_{i}^{M}]\).

Our convergence result for \(\nabla _{l}\ell _{n}(\theta )\) can be read as follows:

Theorem 12

Recall from Theorem 9 that \(\nabla _{l}\ell _n(\theta )\) can be decomposed as \(\nabla _{l}\ell _n(\theta ) =\sum _{i=1}^{n}\frac{V_i(\theta )}{W_i(\theta )}\). Then the following approximation result holds true:

$$\begin{aligned} \left| V_i(\theta )- \bar{V}_{i}^{M}(\theta )\right| + \left| W_i(\theta )- \bar{W}_{i}^{M}(\theta )\right| \le \frac{c}{M^{2\gamma -1}}, \end{aligned}$$

for a strictly positive constant \(c\).

Proof

We focus on the bound for \(| V_i(\theta )- \bar{V}_{i}^{M}(\theta )|\), the other one being very similar. Now, applying Proposition 12 to the particular case of the equations governing Malliavin derivatives, we easily get

$$\begin{aligned} \Vert v_{i} - \bar{v}_{i}^{M}\Vert _{\gamma , T} \le C_{2}\, M^{1-2\gamma }, \end{aligned}$$

for an integrable random variable \(C_2\). The proof is now easily finished by invoking the inequality

$$\begin{aligned} \left| V_i(\theta )- \bar{V}_{i}^{M}(\theta )\right| \le \mathbf E \left[ \Vert v_{i} - \bar{v}_{i}^{M}\Vert _{\gamma , T} \right] . \end{aligned}$$

\(\square \)

Remark 3

We have given two separate approximations for \(V_i(\theta )\) and \(W_i(\theta )\). In order to fully estimate \((V_i(\theta )/W_i(\theta ))-(\bar{V}_{i}^{M}(\theta )/\bar{W}_{i}^{M}(\theta ))\), one should also prove that \(W_i(\theta )\) is bounded away from 0. This requires a lower bound for densities of differential equations driven by fractional Brownian motion, which is beyond the scope of the current article.

4.2 Efficiency of the Monte Carlo simulation

In this section we aim to study the computational tradeoff between the length of a time period in the Euler discretization (i.e. \(1/M\)) and the number of Monte Carlo simulations of the sample path (i.e. \(N\)). In order to do so we consider \(\bar{w}_{i}^{M}\) and \(\bar{v}_{i}^{M}\) as above.

Recall that, given \(t\) units of computer time, the Monte-Carlo estimators for \(W_{i}(\theta )\) and \(V_{i}(\theta )\) can be written as

$$\begin{aligned} \frac{1}{c_{1}(t,\frac{1}{M})} \sum _{k=1}^{c_{1}(t,\frac{1}{M})} \bar{w}_{i,k}^{M},\qquad \frac{1}{c_{2}(t,\frac{1}{M})} \sum _{k=1}^{c_{2}(t,\frac{1}{M})} \bar{v}_{i,k}^{M}, \end{aligned}$$

where \(\{\bar{w}_{i, \ell }^{M}; \, \ell \ge 1\}\) (resp. \(\{\bar{v}_{i, \ell }^{M}; \, \ell \ge 1\}\)) is a sequence of i.i.d. copies of \(\bar{w}_{i}^{M}\) (resp. of \(\bar{v}_{i}^{M}\)), and \(c_{1}(t,\frac{1}{M}),\,c_{2}(t,\frac{1}{M})\) are the maximal numbers of runs one can perform within \(t\) units of computer time. Using the result by Durham and Gallant (2002) we can state the following proposition:

Proposition 13

Let \(N\) be the number of Monte Carlo simulations and \(M\) the number of steps of the Euler scheme. Then the tradeoff between \(N\) and \(M\) for computing \(W_{i}(\theta )\) (and similarly \(V_{i}(\theta )\)) is

$$\begin{aligned} N \asymp M^{\frac{\tilde{\gamma }}{2\gamma -1}-3}, \end{aligned}$$

for all \(1/2<\gamma <H\) and \(\tilde{\gamma } = Tm(d+1)\), where \(T\) is the time horizon, \(m\) the dimension of the observed process and \(d\) the dimension of the noise process.

Proof

We discuss the proof only for \(W_{i}\); by following exactly the same steps one obtains the same result for \(V_{i}\).

We only need to check that our process \(w\) satisfies the conditions of Theorem 1 in Duffie and Glynn (1995).

(i) We can easily see that the discretized \(\bar{w}^{M}_{t_{i}}\) converges uniformly to \(w_{t_{i}}\).

(ii) In addition, \(w_{t_{i}}\) has bounded moments, thus \(\mathbf E [(\bar{w}_{t_{i}}^{M})^{2}] \rightarrow \mathbf E [w_{t_{i}}^{2}]\).

(iii) From Theorem 12 we have that the rate of convergence of the Euler scheme for \(\bar{w}_{t_{i}}^{M}\) is \(M^{1-2\gamma }\), for \(1/2<\gamma <H\).

(iv) The computer time required to generate \(\bar{w}_{t_{i}}^{M}\) is given by \(\tau (1/M)\), which satisfies

$$\begin{aligned} \tau (1/M) = T m (d+1)M = \tilde{\gamma }M, \end{aligned}$$

where \(T\) is the length of the time period, \(m\) is the dimension of the SDE, \(d\) is the dimension of the fBm and \(M\) is the number of Euler steps. By applying Theorem 1 of Duffie and Glynn (1995), the optimal rule is to choose the number of Monte Carlo simulations and the number of Euler steps so that the asymptotic error is minimized. Therefore, if \(t\) denotes the total budget of computer time, then as \(t\) increases the Euler step should converge to zero with order \(\frac{1-2\gamma }{\tilde{\gamma }+2-4\gamma }\), or equivalently:

$$\begin{aligned} \frac{1}{M} \asymp t^{\frac{1-2\gamma }{\tilde{\gamma }+2-4\gamma }}, \quad \text{that is}\quad t \asymp M^{-\frac{\tilde{\gamma }+2-4\gamma }{1-2\gamma }}. \end{aligned}$$

But the total number of operations needed by the Monte Carlo procedure is of order \(\tilde{\gamma } M N\), so that \(t \asymp \tilde{\gamma } M N\). Thus, we finally obtain that \(N \asymp M^{-\frac{\tilde{\gamma }+2-4\gamma }{1-2\gamma }-1}\). \(\square \)

4.3 Discretization of the score function

Consider the following discretized version of the score function, i.e. \(\nabla _{l}\ell _{n}(\theta )\):

$$\begin{aligned} {\hat{\nabla }_{l}\ell _{n}(\theta )} = \sum _{i=1}^{n}\frac{\hat{V_{i}}}{\hat{W_{i}}} := \sum _{i=1}^{n}\frac{\frac{1}{N} \sum _{k=1}^{N}{\bar{v}_{i,k}^{M}}}{\frac{1}{N} \sum _{k=1}^{N}{\bar{w}_{i,k}^{M}}}, \end{aligned}$$
(40)

where \({\bar{w}_{i,1}^{M}},{\bar{w}_{i,2}^{M}}, \ldots \) and \({\bar{v}_{i,1}^{M}},{\bar{v}_{i,2}^{M}}, \ldots \) are i.i.d. copies of \({\bar{w}_{i}^{M}}\) and \({\bar{v}_{i}^{M}}\), respectively. Our aim in this section is to give a global bound for the mean square error obtained by approximating \(\nabla _{l}\ell _{n}(\theta )\) by \({\hat{\nabla }_{l}\ell _{n}(\theta )}\). As mentioned in Remark 3, this convergence result will be expressed in terms of the convergence of \(\hat{V_{i}}\) and \(\hat{W_{i}}\), since lower bounds for quantities like \(W_{i}\) are currently unavailable.

Proposition 14

Consider the decomposition (40) for the discretized score function \({\hat{\nabla }_{l}\ell _{n}(\theta )}\). Then \(\hat{V_{i}}\) and \(\hat{W_{i}}\) converge to their continuous counterparts \(V_{i}\) and \(W_{i}\) with rate of convergence of order \(M^{-(2\gamma -1)}\), where \(1/2<\gamma <H\) and \(M\) is the number of Euler steps used in the discretization.

Proof

We discuss the idea of the proof for the \(W_{i}\) term first:

$$\begin{aligned} \mathbf E \left( \hat{W_{i}} - W_{i}\right) ^{2}&= \mathbf E \left( \frac{1}{N} \sum _{k=1}^{N} \bar{w}^{M}_{i,k} - \mathbf E [w_{i}(\theta )]\right) ^{2}\\&= \mathbf E \biggl ( \frac{1}{N} \sum _{k=1}^{N} \bar{w}^{M}_{i,k} - \frac{1}{N} \sum _{k=1}^{N} w_{i,k} + \frac{1}{N} \sum _{k=1}^{N} w_{i,k} - \mathbf E [w_{i}(\theta )]\biggr )^{2}. \end{aligned}$$

Thanks now to the independence property between Monte Carlo runs, we get

$$\begin{aligned}&\mathbf E \left( \hat{W_{i}} - W_{i}\right) ^{2} \le \frac{2}{N} \sum _{k=1}^{N} \mathbf E (\bar{w}^{M}_{i,k} - w_{i,k})^{2} \;+\; 2\, \mathbf E \biggl ( \frac{1}{N} \sum _{k=1}^{N} w_{i,k} - \mathbf E [w_{i}(\theta )]\biggr )^{2}\\&\quad =\; \frac{1}{N} \sum _{k=1}^{N} \text{(Euler MSE)}^{2} \;+\; \text{(Monte Carlo MSE)}^{2} \asymp (M^{1-2\gamma })^{2} + \frac{1}{N}, \end{aligned}$$

and thus

$$\begin{aligned} \mathrm{MSE} \left( \hat{W_{i}} - W_{i}\right) \asymp \sqrt{(M^{1-2\gamma })^{2} + \frac{1}{N}}. \end{aligned}$$

Now, if we use Proposition 13, i.e. \(N \asymp M^{-\frac{\tilde{\gamma } +2-4\gamma }{1-2\gamma } -1}\), for all \(1/2<\gamma <H\), and \(\tilde{\gamma } = Tm(d+1)\), where \(T\) is the time horizon, \(m\) the dimension of the observed process and \(d\) the dimension of the noise process, we have

$$\begin{aligned} \mathrm{MSE} \left( \hat{W_{i}} - W_{i}\right) \asymp \sqrt{M^{2-4\gamma } + M^{\frac{\tilde{\gamma }}{1-2\gamma }+3}}\asymp M^{1-2\gamma }, \end{aligned}$$

since the first term under the square root is the dominant one.

Following the same procedure, we can show that \(\mathrm{MSE}( \hat{V_{i}} - V_{i}) \asymp M^{1-2\gamma }\) and thus the claim of the proposition follows easily. \(\square \)

Remark 4

In Proposition 14 the rate of convergence is independent of the dimension of the problem, i.e. it is independent of the parameter \(\tilde{\gamma } = T m (d+1)\).

5 Numerical examples

In this section our aim is to investigate the performance of the suggested maximum likelihood method in practice. We study the one-dimensional fractional Ornstein-Uhlenbeck process, a linear two-dimensional system of fractional SDEs and then some real data given by a financial time series. Before presenting our results, we first discuss some technical issues raised by the algorithmic implementation of our method.

The goal is to find the root of the quantity \(\nabla _{l}\ell _{n}(\theta )\) with respect to \(\theta \). We can divide this procedure into two parts. The first part consists in computing the root of the score function using a stochastic approximation algorithm. This is a stochastic optimization technique, first introduced by Robbins and Monro (1951), which is used when only noisy observations of the function are available. It is appropriate in our case, since we want to solve

$$\begin{aligned} \nabla _{l}\ell _{n}(\theta ) = 0, \end{aligned}$$

where \(\nabla _{l}\ell _{n}(\theta )\) is given by Theorem 9 and has to be approximated by \({\hat{\nabla }_{l}\ell _{n}}(\theta )\). Thus, the recursive procedure is of the following form

$$\begin{aligned} \hat{\theta }_{k+1} = \hat{\theta }_{k} - a_{k} {\hat{\nabla }_{l}\ell _{n}} (\hat{\theta }_{k}). \end{aligned}$$
(41)

where \({\hat{\nabla }_{l}\ell _{n}}(\hat{\theta }_{k})\) is the estimate of \(\nabla _{l}\ell _{n}\) at the \(k\)-th iteration based on the observations, and \((a_{k})\) is a sequence of strictly positive real numbers such that \(\sum _{k=1}^{\infty }a_k=\infty \) and \(\sum _{k=1}^{\infty }a_k^2<\infty \). Under appropriate conditions (see for example Blum 1954), the iteration in (41) converges almost surely to the root of the score function. Guidelines for the choice of the step sizes \(a_{k}\) can be found in Kushner and Yin (1997).

The second part consists of the computation of \({\hat{\nabla }_{l}\ell _{n}} (\hat{\theta }_{k})\) at each step of the stochastic approximation algorithm. Thus, for a given value of \(\theta _{k}\) (the one computed at the \(k\)-th iteration) we want to compute \({\hat{\nabla }_{l}\ell _{n}}(\theta _{k})\) from the \(n\) discrete observations of the process \(y_{t_{i}},\,i=1,\ldots , n\). Here, we describe the main idea of the algorithm for one step only. Thus, assume that we are on \([t_{i-1}, t_{i}]\), and that at time \(t_{i}\) we obtain the \(i\)-th observation. We want to compute \(W_{i}(\theta )\) and \(V_{i}(\theta )\) according to expressions (25) and (26), respectively. To compute the expectations we use plain Monte Carlo simulations. Therefore, we discretize the time interval into \(M\) steps

$$\begin{aligned} t_{i-1} = s_{0} < s_{1} < \cdots < s_{M} = t_{i}. \end{aligned}$$

From each simulated path (apart from that of the fBm) we only need to keep the terminal value, i.e. the value of the process at time \(t_{i}\). The algorithm is the following:

1. Simulate the fBm increments on the grid \(s_{0},\ldots ,s_{M}\) of \([t_{i-1}, t_{i}]\), using for example the circulant matrix method (any, preferably exact, simulation technique can be used).

2. Using the simulated values from step 1 and an Euler scheme for the SDE (1), simulate the value of the process at time \(t_{i}\). For example, for \(k=1,\ldots , M\)

$$\begin{aligned} \bar{Y}_{s_{k}}^{M} = \bar{Y}_{s_{k-1}}^{M} + \mu (\bar{Y}_{s_{k-1}}^{M}) (s_{k}-s_{k-1}) + \sum _{j=1}^{d} \sigma ^{(j)}(\bar{Y}_{s_{k-1}}^{M}) (B^{(j)}_{s_{k}} - B^{(j)}_{s_{k-1}}). \end{aligned}$$

3. Using step 2 and the observation at time \(t_{i}\), compute the indicator function \(\mathbf 1 _{(Y_{t_{i}}(\theta )> y_{t_{i}})}\).

4. Using step 1 and an Euler scheme, simulate \(D_{t_{i}}Y_{\tau }^{i}\), as given in Lemma 1 for \(n=1\) (first Malliavin derivative).

5. Using step 1 and an Euler scheme, simulate \(\eta _{t_{i}}^{kj},\,k, j=1,\ldots ,m\), as given in Proposition 3.8.

6. Steps 4 and 5 are used to compute \(Q_{st_{i}}^{pj},\,p\in \{1,\ldots ,m\},\,j\in \{1,\ldots ,d\}\), as defined in Proposition 9.

7. Simulate the Malliavin derivative of the product \(D_s [Y_{t} Q_{rt}^{pj}]\), following Proposition 10.

8. Using the previous steps, together with numerical integration for the double integral and for the stochastic integral, compute \(U_{p}(Y_{t_{i}}(\theta ))\) as defined in Proposition 9.

9. Recursively compute \(H_{(1,\ldots ,m)}(Y_{t_{i}}(\theta ))\) as given in (22).

10. Combine steps 3 and 9 to obtain the kernel \(w_{i}(\theta )\).

11. Repeat steps 1 through 10 \(N\) times and average the resulting values to obtain an estimate of the expectation \(W_{i}(\theta )\).

Using a similar procedure we can obtain an estimate for the expectation \(V_{i}(\theta )\). Finally, for each \(i=1, \ldots , n\) we compute \(V_{i}(\theta )/W_{i}(\theta )\) and sum over \(i\) to obtain the desired value of the score \({\hat{\nabla }_{l}\ell _{n}}(\theta _{k})\).

We have completed the study of our numerical approximation of the log-likelihood, and are now ready for the analysis of some numerical examples.

5.1 Fractional Ornstein-Uhlenbeck process

Though our method can be applied to highly nonlinear contexts, we focus here on some linear situations, which allow easier comparisons with existing methods or exact computations. Let us first study the one-dimensional fractional Ornstein-Uhlenbeck process, i.e.

$$\begin{aligned} dY_{t} = -\lambda Y_{t} dt \; + \; dB_{t}, \end{aligned}$$
(42)

where the solution is given by \(Y_{t}(\lambda ) = \int _{0}^{t} e^{-\lambda (t-s)} dB_{s}\) (notice that an explicit solution is available here). In this case our methodology simplifies considerably. The derivative of the log-likelihood can be written as follows:

$$\begin{aligned} \partial _{\lambda }\ell (\lambda ;y) = \sum _{i=1}^{n} \frac{\mathbf{E} \biggl [\partial _{\lambda } Y_{t}(\lambda ) \; \mathbf{1}_{(Y_{t}(\lambda )>y)} \;H_{(1,1)}(\lambda ) + \Bigl ( Y_{t}(\lambda ) - y \Bigr )_{+} \partial _{\lambda } H_{(1,1)}(\lambda ) \biggr ]}{\mathbf{E} \biggl [ \mathbf{1}_{(Y_{t}(\lambda )>y)}\; H_{(1)}\Bigl ( Y_{t}(\lambda )\Bigr ) \biggr ]}. \end{aligned}$$

The Malliavin derivative of \(Y_{t}(\lambda )\) satisfies the following ODE

$$\begin{aligned} D_{s}Y_{t}(\lambda ) = 1 - \lambda \int _{s}^{t} D_{s}Y_{u}(\lambda ) du, \end{aligned}$$

with solution \(D_{s}Y_{t}(\lambda ) = e^{-\lambda (t-s)}\; \mathbf 1 _{\{s\le t\}}\). The corresponding norm is

$$\begin{aligned} \Vert D_{\cdot } Y_{t} (\lambda ) \Vert ^{2} = c_H\; \int _{0}^{t} \int _{0}^{t} e^{-\lambda (u+v)} |u-v|^{2H-2} \, du \, dv. \end{aligned}$$

The higher-order Malliavin derivatives of \(Y_{t}(\lambda )\) are equal to zero, and the Skorohod integral of the deterministic integrand \(D_{\cdot } Y_{t}(\lambda )\) reduces to a Wiener integral, namely \(\delta (D_{\cdot } Y_{t}(\lambda )) = \int _0^t e^{-\lambda (t-u)} dB_u = Y_{t}(\lambda )\). Therefore,

$$\begin{aligned} H_{(1)}\Bigl ( Y_{t}(\lambda ) \Bigr ) = \frac{1}{\left\| D_{\cdot } Y_{t}(\lambda ) \right\| ^{2} }\; \int _{0}^{t} e^{-\lambda (t-u)} dB_{u} = \frac{Y_{t}(\lambda )}{\left\| D_{\cdot } Y_{t}(\lambda ) \right\| ^{2} } \end{aligned}$$

and thus

$$\begin{aligned} H_{(1,1)}(\lambda )&= \frac{1}{\left\| D_{\cdot } Y_{t}(\lambda ) \right\| ^{4} } \int _{0}^{t} \int _{0}^{t} e^{-\lambda (2t-u-v)} dB_{u} dB_{v} - \left\| D_{\cdot } Y_{t}(\lambda ) \right\| ^{-2} \\&= \frac{Y_{t}^{2}(\lambda )}{\left\| D_{\cdot } Y_{t}(\lambda ) \right\| ^{4}} - \frac{1}{\left\| D_{\cdot } Y_{t}(\lambda ) \right\| ^{2}}. \end{aligned}$$

The derivative with respect to the unknown parameter \(\lambda \) satisfies

$$\begin{aligned} \partial _{\lambda } Y_{t}(\lambda ) = -\int _{0}^{t} \bigl [ Y_{s}(\lambda ) + \lambda \, \partial _{\lambda } Y_{s}(\lambda ) \bigr ] \, ds, \end{aligned}$$

with solution \(\partial _{\lambda } Y_{t}(\lambda ) = -\int _{0}^{t} (t-s)\, e^{-\lambda (t-s)} dB_s\). The last term we need to compute is:

$$\begin{aligned} \partial _{\lambda } H_{(1,1)}(\lambda )&= \frac{2\, Y_{t}(\lambda )\, \partial _{\lambda } Y_{t}(\lambda )}{\Vert D_{\cdot } Y_{t}(\lambda ) \Vert ^{4}} - \frac{2\, Y_{t}^{2}(\lambda )\; \partial _{\lambda }\Vert D_{\cdot } Y_{t}(\lambda ) \Vert ^{2}}{\Vert D_{\cdot } Y_{t}(\lambda ) \Vert ^{6}} + \frac{\partial _{\lambda }\Vert D_{\cdot } Y_{t}(\lambda ) \Vert ^{2}}{\Vert D_{\cdot } Y_{t}(\lambda ) \Vert ^{4}},\\ \text{where}\quad \partial _{\lambda }\Vert D_{\cdot } Y_{t}(\lambda ) \Vert ^{2}&= -c_H \int _{0}^{t} \int _{0}^{t} (u+v)\, e^{-\lambda (u+v)}\, |u-v|^{2H-2}\, du \, dv . \end{aligned}$$

Now, we compute the MLE following the algorithm we described above. The results we obtained are summarized in Table 1.

Remark 5

The value of \(H\) used for the simulation of the process is 0.6. The number of observations is \(n=50\), the number of Euler steps is \(M=500\), the number of stochastic approximation steps is \(K=50\) and the number of MC simulations \(N=500\) (Table 1).

Table 1 MLE for the unknown parameter (\(\lambda \)) of a fractional Ornstein-Uhlenbeck process

5.2 Two-dimensional fractional SDE

In this section we study the following system of fractional OU processes:

$$\begin{aligned} dY_{t}^{(1)} = -\alpha Y_{t}^{(2)} dt \; + \;\beta dB_{t}^{(1)}\nonumber \\ dY_{t}^{(2)} = -\beta Y_{t}^{(1)} dt \; + \; \beta dB_{t}^{(2)}. \end{aligned}$$
(43)

In this case, the computations are more involved even though the coefficients of the SDEs are linear functions of \(Y\). Furthermore, the parameter we want to estimate is two-dimensional as well (\(\theta = (\alpha , \beta )^T\)), which complicates the optimization procedure. Therefore, instead of computing only one derivative, we need to compute the derivatives with respect to both \(\alpha \) and \(\beta \), and then solve the system of two equations

$$\begin{aligned} \nabla _{\alpha }\ell (\alpha , \beta ;y) = 0, \qquad \nabla _{\beta }\ell (\alpha , \beta ;y) = 0, \end{aligned}$$

where

$$\begin{aligned} \nabla _{l}\ell (\alpha , \beta ;y) = \sum _{i=1}^{n}&\Bigl ( \mathbf E \bigl [ \mathbf 1 _{(Y_{t}(\alpha , \beta )>y)}\; H_{(1,2)}( Y_{t}(\alpha , \beta )) \bigr ] \Bigr )^{-1}\\&\times \mathbf E \bigl [\nabla _{l} Y_{t}(\alpha , \beta ) \; \mathbf 1 _{(Y_{t}(\alpha , \beta )>y)} \;H_{(1,2,1,2)}(\alpha , \beta ) + ( Y_{t}(\alpha , \beta ) - y )_{+} \nabla _{l} H_{(1,2,1,2)}(\alpha , \beta ) \bigr ] \end{aligned}$$

and \(l=\alpha \) or \(\beta \). The Malliavin derivative of \(Y_{t}\) is computed as follows:

$$\begin{aligned} D_{s}Y_{t}^{(1)} = \beta - \alpha \int _{s}^{t} D_{s}Y_{u}^{(2)} du \qquad D_{s}Y_{t}^{(2)} = \beta - \beta \int _{s}^{t} D_{s}Y_{u}^{(1)} du. \end{aligned}$$

The covariance matrix \(\gamma _{t}\) is given by \((\langle D_{\cdot }Y_{t}^{i}, D_{\cdot }Y_{t}^{j} \rangle )_{1\le i,j\le 2}\). The inverse of the covariance matrix satisfies the following equation

$$\begin{aligned} \gamma _{t}^{-1} = -\int _{0}^{t} [\gamma _{u}^{-1} M + M^{T} \gamma _{u}^{-1} ] du, \end{aligned}$$

where

$$\begin{aligned} M=\left[ \begin{array}{cc} 0 &amp; \alpha \\ \beta &amp; 0 \end{array}\right] . \end{aligned}$$

Now, it remains to compute the quantities \(H_{(1,2)}\) and \(H_{(1,2,1,2)}\). This can be done using the recursive formula (22); keeping in mind that the higher-order Malliavin derivatives of \(Y\) are equal to zero, the expressions simplify (Table 2). Indeed,

$$\begin{aligned} H_{(1)}(Y_{t}) = \sum _{j=1}^{2} \biggl ( Y_{t}^{j} \int _{0}^{t} (\gamma _{s}^{-1})^{1j} D_{s}Y_{t}^{j} \, dB_{s}^{j} - c_{H} \int _{0}^{t}\int _{0}^{t} D_{s}\bigl [ Y_{t}^{j} Q_{rt}^{1j}\bigr ] \, |r-s|^{2H-2}\, dr \, ds \biggr ). \end{aligned}$$

Moreover, we can easily see that

$$\begin{aligned} H_{(1,2)} (Y_{t})&= H_{(1)}(Y_{t})\int _{0}^{t} Q_{st}\, dB_{s} - c_{H} \int _{0}^{t}\int _{0}^{t} D_{s}\bigl [H_{(1)}(Y_{t})\, Q_{rt}\bigr ] \, |r-s|^{2H-2}\, dr \, ds,\\ H_{(1,2,1,2)} (Y_{t})&= H_{(1,2,1)}(Y_{t})\int _{0}^{t} Q_{st}\, dB_{s} - c_{H} \int _{0}^{t}\int _{0}^{t} D_{s}\bigl [H_{(1,2,1)}(Y_{t})\, Q_{rt}\bigr ] \, |r-s|^{2H-2}\, dr \, ds. \end{aligned}$$
Table 2 MLE for the unknown parameters (\(\alpha ,\,\beta \)) of a two-dimensional system of fractional SDEs

Of course, recall that \(Q_{st}^{pj} = (\gamma _{s}^{-1})^{pj} D_{s}Y_{t}^{j}\). In practice, these quantities are computed recursively. The last step is to compute the derivative of \(H_{(1,2,1,2)} (Y_{t})\) with respect to \(\alpha \) and \(\beta \), which in this case is not too complicated, and then to compute the MLEs using the algorithm discussed in the previous section. Table 2 summarizes our results, and we have plotted the corresponding histograms in Fig. 1.

Remark 6

The value of \(H\) used for the simulation of the process is 0.6. The number of observations is \(n=50\), the number of Euler steps is \(M=500\), the number of stochastic approximation steps is \(K=50\) and the number of MC simulations is \(N=500\).

5.3 Application to financial data

One of the most popular applications of fractional SDEs is in finance. Hu and Oksendal (2003) and Hu et al. (2003) introduced the fractional Black-Scholes model in order to account for inconsistencies of the existing models in practice. More specifically, the stock price is described therein by a fractional geometric Brownian motion with Hurst parameter \(1/2< H <1\). The choice of this model is based on empirical studies that displayed the presence of long-range dependence in stock prices, for example Willinger et al. (1999).

Estim. parameters | Group 1: \(\hat{H_{1}}=0.59\) | Group 2: \(\hat{H_{2}}=0.63\) | Group 3: \(\hat{H_{3}}=0.61\)
\(\hat{\mu }\) | 0.015 (0.0123) | 0.019 (0.0144) | 0.011 (0.0214)
\(\hat{\sigma }\) | 0.352 (0.058) | 0.339 (0.046) | 0.341 (0.024)

However, the presence of fractional Brownian motion in the model allows for arbitrage in the general setting. It has been shown that arbitrage opportunities can be avoided in a number of ways; the reader can refer for example to Rogers (1997), Dasgupta and Kallianpur (2000) and Cheridito (2003). We choose to model the stock price as follows:

$$\begin{aligned} dS_{t} = \mu S_{t} dt + \sigma dB_{t}, \end{aligned}$$
(44)

where \(B\) is a fractional Brownian motion with Hurst index \(1/2<H<1\). For this SDE (as well as for a more general class of fractional SDEs) Guasoni (2006) proved that there is no arbitrage when transaction costs are present.

Our goal is to estimate the unknown parameters \(\mu \) and \(\sigma \) based on daily observations of the S&P 500 index (data from June 2010 until December 2010). We consider the Hurst parameter to be piecewise constant: we divide the data into three groups (of 50 daily observations each) and compute the Hurst index of each one using the rescaled-range (R/S) statistic. We obtain \(\hat{H_{1}}=0.59\) for the first group of data, \(\hat{H_{2}}=0.63\) for the second and \(\hat{H_{3}}=0.61\) for the third one. For each group, we apply our maximum likelihood approach in order to estimate \(\mu \) and \(\sigma \). The estimates are summarized in the table above.

Remark 7

The volatility is annualized. In addition, during this period of time the historical volatility is around 0.38, which is consistent with our own estimates.