
1 The Brownian Motion and Related Processes

1.1 A Brief History of Brownian Motion

Historically, Brownian motion (BM for short) is associated with the analysis of motions whose time evolution is so disordered that it seems difficult to forecast, even over a very short time interval. It plays a central role in the theory of random processes, because in many theoretical and applied problems, Brownian motion (or the diffusion processes built from it) provides simple limit models on which many calculations can be carried out.

In 1827, the English botanist Robert Brown (1773–1858) first described the erratic motion of fine organic particles in suspension in a gas or a fluid. During the nineteenth century, several physicists came to accept that this motion is so irregular that it does not seem to admit a tangent; thus one could not speak of its speed, nor apply the laws of mechanics to it! In 1900 [4], Louis Bachelier (1870–1946) introduced Brownian motion to model the dynamics of stock prices, but his approach was then forgotten until the 1960s. His Ph.D. thesis, Théorie de la spéculation, is the starting point of modern finance.

At the beginning of the twentieth century, however, it was physics that generated great interest in this process. In 1905, Albert Einstein (1879–1955) built a probabilistic model to describe the motion of a diffusive particle: he found that the law of the particle position at time t, given the initial state x, admits a density which satisfies the heat equation, and is in fact Gaussian. His theory was quickly confirmed by experimental measurements yielding satisfactory diffusion constants. The same year as Einstein, a discrete version of Brownian motion was proposed by the Polish physicist Smoluchowski using random walks.

In 1923, Norbert Wiener (1894–1964) gave a rigorous construction of the random function that is called Brownian motion; he established in particular that its trajectories are continuous. Around 1930, following an idea of Paul Langevin, Ornstein and Uhlenbeck studied the Gaussian random function which bears their name and which serves as the stationary, mean-reverting counterpart of Brownian motion.

This was the beginning of very active theoretical research in mathematics. Paul Lévy (1886–1971) then discovered, along with other mathematicians, many properties of Brownian motion [55] and introduced a first form of stochastic differential equations, whose study was later systematized by K. Itô (1915–2008). His work is gathered in a famous treatise published in 1948 [44], and the theory is usually referred to as Itô stochastic calculus.

But history sometimes takes incredible turns. Indeed, in 2000 the French Academy of Sciences opened a manuscript that had remained sealed since 1940, belonging to the young mathematician Doeblin (1915–1940), a French telegraphist who died during the German offensive. Doeblin was already known for his remarkable achievements in probability theory, owing to his work on stable laws and Markov processes. The sealed manuscript in fact gathered his most recent research, written between November 1939 and February 1940: it contained his discovery (before Itô) of stochastic differential equations and their relations with the Kolmogorov partial differential equations. Perhaps Itô stochastic calculus could have been called Doeblin stochastic calculus…

1.2 The Brownian Motion and Its Paths

In the following, we study the basic properties of the Brownian motion and its paths.

1.2.1 Definition and Existence

The very erratic path which is a specific feature of Brownian motion is in general associated with the observation that the phenomenon, although very disordered, exhibits a certain time homogeneity, i.e. the time origin is not important in describing the time evolution. These properties underlie the next definition.

Definition 1 (of Standard Brownian Motion).

A standard Brownian motion is a random process {W t ; t ≥ 0} with continuous paths, such that

  • W 0 = 0.

  • The time increment \(W_{t} - W_{s}\) with 0 ≤ s < t has the Gaussian law,Footnote 1 with zero mean and variance equal to (t − s).

  • For any \(0 = t_{0} < t_{1} < t_{2} < \cdots < t_{n}\), the increments \(\{W_{t_{i+1}} - W_{t_{i}};0 \leq i \leq n - 1\}\) are independentFootnote 2 random variables.

There are important remarks following from the definition.

  1. The state \(W_{t}\) of the system at time t is distributed as a Gaussian r.v. with mean 0 and variance t (increasing as time gets larger). Its probability density is

    $$\displaystyle{ \mathbb{P}(W_{t} \in [x,x + \mathrm{d}x]) = g(t,x)\mathrm{d}x = \frac{1} {\sqrt{2\pi t}}\exp (-x^{2}/2t)\mathrm{d}x. }$$
    (1)
  2. With probability 95 %, we have \(\vert W_{t}\vert \leq 1.96\sqrt{t}\) for a given time t (see Fig. 1). However, W may leave this confidence interval.

    Fig. 1 Simulation of a Brownian motion with the 95 %-confidence interval curves \(f_{\pm }(t) = \pm 2\sqrt{t}\)

  3. The random variable \(W_{t}\), as the sum of its increments, can be decomposed as a sum of independent Gaussian r.v.: this property serves as a basis for the stochastic calculus developed in the sequel.
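These remarks lend themselves to a quick numerical sanity check. The sketch below (the grid sizes, sample size and seed are arbitrary choices, not from the text) simulates Brownian paths from their independent Gaussian increments, as in Definition 1, and verifies empirically that about 95 % of the simulated values of \(W_{T}\) fall within \(\pm 1.96\sqrt{T}\).

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 500, 20000    # hypothetical discretization choices
dt = T / n_steps
# Independent Gaussian increments W_{t_{i+1}} - W_{t_i} ~ N(0, dt), cf. Definition 1
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(increments, axis=1)        # W sampled at times dt, 2dt, ..., T

inside = np.mean(np.abs(W[:, -1]) <= 1.96 * np.sqrt(T))  # remark 2: about 95 %
var_T = W[:, -1].var()                                   # remark 1: close to T
```
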

Theorem 1.

The Brownian motion exists!

Proof.

There are different constructive ways to prove the existence of Brownian motion. Here, we use a Fourier-based approach (proposed by Wiener), showing that W can be represented as a superposition of Gaussian signals. We also use an equivalent characterization of Brownian motion as a Gaussian processFootnote 3 with zero mean and covariance function \(\mathbb{C}\mathrm{ov}(W_{t},W_{s}) =\min (s,t) = s \wedge t\).

Let (G m ) m ≥ 0 be a sequence of independent Gaussian r.v. with zero mean and unit variance and set

$$\displaystyle{W_{t} = \frac{t} {\sqrt{\pi }}G_{0} + \sqrt{\frac{2} {\pi }} \sum _{m\geq 1}\frac{\sin (\mathit{mt})} {m} G_{m}.}$$

We now show that W is a Brownian motion on [0, π]; it is then enough to concatenate such independent processes to obtain a Brownian motion defined on \(\mathbb{R}^{+}\). We sketch the proof of our statement on W. First, the series converges a.s.Footnote 4 since the partial sums form a Cauchy sequence in L 2: indeed, thanks to the independence of the Gaussian random variables, we have

$$\displaystyle{\|\sum _{m_{1}\leq m\leq m_{2}} \frac{\sin (\mathit{mt})} {m} G_{m}\|_{L_{2}}^{2} =\sum _{ m_{1}\leq m\leq m_{2}} \frac{\sin ^{2}(\mathit{mt})} {m^{2}} \leq \sum _{m_{1}\leq m} \frac{1} {m^{2}}\mathop{\longrightarrow }\limits_{m_{1} \rightarrow +\infty }0.}$$

The partial sums have Gaussian distributions, hence so does the a.s. limit.Footnote 5 The same argument shows that W is a Gaussian process. It has zero mean and its covariance is the limit of the covariances of the partial sums: thus

$$\displaystyle{\mathbb{C}\mathrm{ov}(W_{t},W_{s}) = \frac{ts} {\pi } + \frac{2} {\pi } \sum _{m\geq 1}\frac{\sin (\mathit{mt})} {m} \frac{\sin (\mathit{ms})} {m}.}$$

The above series equals min(s, t) for (s, t) ∈ [0, π]2, by a standard computation of the Fourier coefficients of the function \(t \in [-\pi,\pi ]\mapsto \min (s,t)\) (for fixed s). The proof of the continuity of W is based on the uniform convergence of the series along appropriate subsequences, which we do not detail (see [45, pp. 21–22]). □ 
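The truncated Fourier series admits a direct numerical check. The sketch below (the truncation level M and the evaluation points s, t are arbitrary choices) compares the covariance of the M-term truncation with min(s, t).

```python
import numpy as np

def cov_truncated(t, s, M=2000):
    """Covariance of the M-term truncation of Wiener's series on [0, pi]."""
    m = np.arange(1, M + 1)
    return t * s / np.pi + (2.0 / np.pi) * np.sum(np.sin(m * t) * np.sin(m * s) / m**2)

s, t = 0.7, 2.1                 # arbitrary points in [0, pi]
approx = cov_truncated(t, s)
exact = min(s, t)               # the Brownian covariance
```

The tail of the series is bounded by \(\sum_{m > M} m^{-2} \approx 1/M\), so the truncation error is small for moderate M.
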

In many applications, it is useful to consider non-standard Brownian motions.

Definition 2 (of Arithmetic Brownian Motion).

An arithmetic Brownian motion (ABM in short) is a random process {X t ; t ≥ 0} where \(X_{t} = x_{0} + bt +\sigma W_{t}\) and

  • W is a standard Brownian motion.

  • \(x_{0} \in \mathbb{R}\) is the starting value of X.

  • \(b \in \mathbb{R}\) is the drift parameter.

  • \(\sigma \in \mathbb{R}\) is the diffusion parameter.

Usually, σ can be taken non-negative due to the symmetry of Brownian motion (see Proposition 1). X is still a Gaussian process, whose position X t at time t is distributed as \(\mathcal{N}(x_{0} + bt,\sigma ^{2}t)\) (Fig. 2).

Fig. 2 Arithmetic Brownian motion with different drift parameters
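A minimal simulation sketch of Definition 2 (all parameter values below are hypothetical) checks the announced marginal law \(\mathcal{N}(x_{0} + bT,\sigma ^{2}T)\) at a fixed time T.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
x0, b, sigma, T = 2.0, 0.5, 0.3, 1.0      # hypothetical parameters
n_paths = 100000
W_T = rng.normal(0.0, math.sqrt(T), size=n_paths)   # W_T ~ N(0, T)
X_T = x0 + b * T + sigma * W_T                      # ABM of Definition 2 at time T
mean_T, var_T = X_T.mean(), X_T.var()     # expect x0 + b*T and sigma^2 * T
```
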

1.2.2 First Easy Properties of the Brownian Path

Proposition 1.

Let \(\{W_{t};t \in \mathbb{R}^{+}\}\) be a standard Brownian motion.

  i) Symmetry property: \(\{-W_{t};t \in \mathbb{R}^{+}\}\) is a standard Brownian motion.

  ii) Scaling property: for any c > 0, \(\{W_{t}^{c};t \in \mathbb{R}^{+}\}\) is a standard Brownian motion where

    $$\displaystyle{ W_{t}^{c} = c^{-1}W_{ c^{2}t}. }$$
    (2)
  iii) Time reversal: for any fixed T, \(\hat{W}_{t}^{T} = W_{T} - W_{T-t}\) defines a standard Brownian motion on [0,T].

  iv) Time inversion: \(\{\hat{W}_{t} = \mathit{tW }_{1/t},t > 0,\quad \hat{W}_{0} = 0\}\) is a standard Brownian motion.

The scaling property is important and illustrates the fractal feature of Brownian motion paths: \(\varepsilon W_{t}\) behaves like a Brownian motion evaluated at time \(\varepsilon ^{2}t\).

Proof.

It is a direct verification of the definition of Brownian motion, based on independent, stationary and Gaussian increments. The continuity is also easy to verify, except in case iv) at time 0. For this, we use that \(\lim \limits _{t\rightarrow 0^{+}}\mathit{tW }_{1/t} =\lim \limits _{s\rightarrow +\infty }\frac{W_{s}} {s} = 0\), see Proposition 7. □ 
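The scaling property ii) can also be checked empirically on a single marginal. In the sketch below (c, t and the sample size are arbitrary choices), \(c^{-1}W_{c^{2}t}\) is verified to have variance t, as a standard Brownian motion should at time t.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
c, t = 3.0, 0.8                     # hypothetical scaling factor and time
n_samples = 200000
W_c2t = rng.normal(0.0, math.sqrt(c**2 * t), size=n_samples)  # W_{c^2 t} ~ N(0, c^2 t)
W_scaled = W_c2t / c                # c^{-1} W_{c^2 t}, cf. (2)
scaled_var = W_scaled.var()         # should be close to t
```
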

1.3 Time-Shift Invariance and Markov Property

Previously, we have studied simple spatial transformations of Brownian motion. We now consider time-shifts, starting with deterministic shifts.

Proposition 2 (Invariance by a Deterministic Time-Shift).

The Brownian Motion shifted by h ≥ 0, given by \(\{\bar{W}_{t}^{h} = W_{t+h} - W_{h};t \in \mathbb{R}^{+}\}\) , is another Brownian motion, independent of the Brownian Motion stopped at h, {W s ;s ≤ h}.

In other words, \(\{W_{t+h} = W_{h} +\bar{ W}_{t}^{h};t \in \mathbb{R}^{+}\}\) is a Brownian motion starting from W h . This property is the weak Markov property, which states (and may hold for other processes) that the distribution of W after h, conditionally on the past up to time h, depends only on the present value W h .

Proof.

The Gaussian property of \(\bar{W}^{h}\) is clear.

The independent increments of W induce those of \(\bar{W}^{h}\).

It remains to show the independence w.r.t. the past up to h, i.e. the sigma-field generated by {W s ; s ≤ h}, or equivalently w.r.t. the sigma-field generated by \(\{W_{s_{1}},\ldots W_{s_{N}}\}\) for any \(0 \leq s_{1} \leq \ldots \leq s_{N} \leq h\). The independence of increments of W ensures that \((\bar{W}_{t_{1}}^{h},\bar{W}_{t_{2}}^{h}-\bar{W}_{t_{1}}^{h},\cdots \,,\bar{W}_{t_{k}}^{h}-\bar{W}_{t_{k-1}}^{h}) = (W_{t_{1}+h}-W_{h},\cdots \,,W_{t_{k}+h}-W_{t_{k-1}+h})\) is independent of \((W_{s_{1}},W_{s_{2}} - W_{s_{1}},\cdots \,,W_{s_{j}} - W_{s_{j-1}})\). Then \((\bar{W}_{t_{1}}^{h},\bar{W}_{t_{2}}^{h},\cdots \,,\bar{W}_{t_{k}}^{h})\) is independent of {W s ; s ≤ h}. □ 

As a consequence, we can derive a nice symmetry result connecting the maximum of Brownian motion monitored along a finite time grid \(t_{0} = 0 < t_{1} < \cdots < t_{N} = T\) with the law of W T only.

Proposition 3.

For any y ≥ 0, we have

$$\displaystyle{ \mathbb{P}[\sup _{i\leq N}W_{t_{i}} \geq y] \leq 2\mathbb{P}[W_{T} \geq y] = \mathbb{P}[\vert W_{T}\vert \geq y]. }$$
(3)

Proof.

The equality on the r.h.s. comes from the symmetric distribution of W T . We now show the inequality on the left. Denote by \(t_{y}^{{\ast}}\) the first time t j at which W reaches the level y. Notice that \(\{\sup _{i\leq N}W_{t_{i}} \geq y\} =\{ t_{y}^{{\ast}}\leq T\}\) and \(\{t_{y}^{{\ast}} = t_{j}\} =\{ W_{t_{i}} < y,\forall i < j,W_{t_{j}} \geq y\}\). For each j < N, the symmetry of Brownian increments gives \(\mathbb{P}[W_{T} - W_{t_{j}} \geq 0] = \frac{1} {2}\). Since the shifted Brownian motion \((\bar{W}_{t}^{t_{j}} = W_{t_{ j}+t} - W_{t_{j}}: t \in \mathbb{R}^{+})\) is independent of (W s : s ≤ t j ), we have

$$\displaystyle\begin{array}{rcl} \frac{1} {2}\mathbb{P}[\sup _{i\leq N}W_{t_{i}} \geq y]& =& \frac{1} {2}\mathbb{P}[t_{y}^{{\ast}}\leq T] = \frac{1} {2}\sum _{j=0}^{N}\mathbb{P}[t_{ y}^{{\ast}} = t_{ j}] {}\\ & =& \frac{1} {2}\mathbb{P}[W_{t_{i}} < y,\forall i < N,W_{T} \geq y] +\sum _{ j=0}^{N-1}\mathbb{P}[W_{ t_{i}} < y,\forall i < j,W_{t_{j}} \geq y]\mathbb{P}[W_{T} - W_{t_{j}} \geq 0] {}\\ & =& \frac{1} {2}\mathbb{P}[W_{t_{i}} < y,\forall i < N,W_{T} \geq y] +\sum _{ j=0}^{N-1}\mathbb{P}[W_{ t_{i}} < y,\forall i < j,W_{t_{j}} \geq y,W_{T} - W_{t_{j}} \geq 0] {}\\ & \leq & \mathbb{P}[W_{t_{i}} < y,\forall i < N,W_{T} \geq y] +\sum _{ j=0}^{N-1}\mathbb{P}[W_{ t_{i}} < y,\forall i < j,W_{t_{j}} \geq y,W_{T} \geq y] {}\\ & =& \mathbb{P}[t_{y}^{{\ast}}\leq T,W_{ T} \geq y] = \mathbb{P}[W_{T} \geq y]. {}\\ \end{array}$$

At the two last lines, we have used \(\{W_{t_{j}} \geq y,W_{T} - W_{t_{j}} \geq 0\} \subset \{ W_{t_{j}} \geq y,W_{T} \geq y\}\) and \(\{W_{T} \geq y\} \subset \{ t_{y}^{{\ast}}\leq T\}\). □ 

Taking a grid with time step T∕N and letting N → +∞, we have \(\sup _{i\leq N}W_{t_{i}} \uparrow \sup _{0\leq t\leq T}W_{t}\) by continuity of the paths. Then we can pass to the limit (up to some probabilistic convergence technicalities) in the inequality (3) to get

$$\displaystyle{ \mathbb{P}[\sup _{0\leq t\leq T}W_{t} \geq y] \leq \mathbb{P}[\vert W_{T}\vert \geq y]. }$$
(4)

Actually, the inequality (4) is an equality: this is proved later in Proposition 5.

Now, our aim is to extend Proposition 2 to stochastic time-shifts h. Without extra assumptions on h, the result is false in general: a counter-example is the last passage time of W at zero before time 1, \(L =\sup \{ t \leq 1: W_{t} = 0\}\), which does not satisfy the property. Indeed, since \((W_{s+L} - W_{L})_{s\geq 0}\) does not vanish a.s. at short times (by the definition of L), its marginal distributions cannot be Gaussian and the time-shifted process cannot be a Brownian motion.

The right class for extension is the class of stopping times, defined as follows.

Definition 3 (Stopping Time).

A stopping time is a non-negative random variable U (possibly taking the value +∞), such that for any t ≥ 0, the event {U ≤ t} depends only on the Brownian motion values {W s ; s ≤ t}.

The stopping time is discrete if it takes only a countable set of values (u 1, ⋯ , u n , ⋯ ).

In other words, it suffices to observe the Brownian motion until time t to know whether or not the event {U ≤ t} occurs. Of course, deterministic times are stopping times. A more interesting example is the first hitting time of a level y > 0

$$\displaystyle{T_{y} =\inf \{ t > 0;W_{t} \geq y\};}$$

it is a stopping time, since \(\{T_{y} \leq t\} =\{ \exists s \leq t,W_{s} = y\}\) owing to the continuity of W. Observe that the last passage time L of the counter-example above is not a stopping time.
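The first hitting time can be approximated by monitoring a simulated path on a fine time grid. The following sketch (grid size, level y and horizon T are hypothetical choices) estimates \(\mathbb{P}(T_{y} \leq T)\) by Monte Carlo and compares it with \(\mathbb{P}(\vert W_{T}\vert \geq y)\), the bound of inequality (4); since that bound is in fact an equality for the continuous-time supremum, the two numbers should agree up to discretization and sampling error.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
T, y = 1.0, 1.0                      # horizon and level, hypothetical
n_steps, n_paths = 1000, 100000
dt = T / n_steps
W = np.zeros(n_paths)
hit = np.zeros(n_paths, dtype=bool)
for _ in range(n_steps):             # walk the grid, recording whether y was reached
    W += rng.normal(0.0, math.sqrt(dt), size=n_paths)
    hit |= (W >= y)
p_hit = hit.mean()                            # estimates P(T_y <= T) on the grid
p_bound = math.erfc(y / math.sqrt(2 * T))     # P(|W_T| >= y) = 2 P(W_T >= y)
```

The grid estimate slightly undershoots, since the continuous path may cross y between grid points.
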

Proposition 4.

Let U be a stopping time. On the event {U < +∞}, the Brownian motion shifted by U ≥ 0, i.e. \(\{\bar{W}_{t}^{U} = W_{t+U} - W_{U};t \in \mathbb{R}^{+}\}\) , is a Brownian motion independent of {W t ;t ≤ U}.

This result is usually referred to as the strong Markov property.

Proof.

We show that for any 0 ≤ t 1 < ⋯ < t k , any 0 ≤ s 1 < ⋯ < s l , any (x 1, ⋯ , x k ) and any measurable sets (B 1, ⋯ , B l−1), we have

$$\displaystyle\begin{array}{rcl} & & \mathbb{P}(\bar{W}_{t_{1}}^{U} < x_{ 1},\cdots \,,\bar{W}_{t_{k}}^{U} < x_{ k},W_{s_{1}} \in B_{1},\cdots \,,W_{s_{l-1}} \in B_{l-1},s_{l} \leq U < +\infty ) \\ & =& \mathbb{P}(W_{t_{1}}^{{\prime}} < x_{ 1},\cdots \,,W_{t_{k}}^{{\prime}} < x_{ k})\mathbb{P}(W_{s_{1}} \in B_{1},\cdots \,,W_{s_{l-1}} \in B_{l-1},s_{l} \leq U < +\infty ),\qquad {}\end{array}$$
(5)

where W ′ is a Brownian motion independent of W. We begin with the easier case where U is a discrete stopping time with values in (u n ) n ≥ 1: then

$$\displaystyle\begin{array}{rcl} & & \mathbb{P}(\bar{W}_{t_{1}}^{U} < x_{ 1},\cdots \,,\bar{W}_{t_{k}}^{U} < x_{ k},W_{s_{1}} \in B_{1},\cdots \,,W_{s_{l-1}} \in B_{l-1},s_{l} \leq U < +\infty ) {}\\ & =& \sum _{n}\mathbb{P}(\bar{W}_{t_{1}}^{U} < x_{ 1},\cdots,\bar{W}_{t_{k}}^{U} < x_{ k},W_{s_{1}} \in B_{1},\cdots \,,W_{s_{l-1}} \in B_{l-1},s_{l} \leq U,U = u_{n}) {}\\ & =& \sum _{n}\mathbb{P}(\bar{W}_{t_{1}}^{u_{n} } < x_{1},\cdots,\bar{W}_{t_{k}}^{u_{n} } < x_{k},W_{s_{1}} \in B_{1},\cdots \,,W_{s_{l-1}} \in B_{l-1},s_{l} \leq U,U = u_{n}) {}\\ & =& \sum _{n}\mathbb{P}(W_{t_{1}}^{{\prime}} < x_{ 1},\cdots,W_{t_{k}}^{{\prime}} < x_{ k})\mathbb{P}(W_{s_{1}} \in B_{1},\cdots \,,W_{s_{l-1}} \in B_{l-1},s_{l} \leq U,U = u_{n}) {}\\ & =& \mathbb{P}(W_{t_{1}}^{{\prime}} < x_{ 1},\cdots \,,W_{t_{k}}^{{\prime}} < x_{ k})\mathbb{P}(W_{s_{1}} \in B_{1},\cdots \,,W_{s_{l-1}} \in B_{l-1},s_{l} \leq U < +\infty ) {}\\ \end{array}$$

where at the second-to-last equality we applied the time-shift invariance with the deterministic shift u n . For a general stopping time U, we apply the result to the discrete stopping time \(U_{n} = \frac{[\mathit{nU}]+1} {n}\), and then pass to the limit using the continuity of W. □ 

1.4 Maximum, Behavior at Infinity, Path Regularity

We apply the strong Markov property to identify the law of the Brownian motion maximum.

Proposition 5 (Symmetry Principle).

For any y ≥ 0 and any x ≤ y, we have

$$\displaystyle\begin{array}{rcl} \mathbb{P}[\sup _{t\leq T}W_{t} \geq y;W_{T} \leq x]& =& \mathbb{P}[W_{T} \geq 2y - x],{}\end{array}$$
(6)
$$\displaystyle\begin{array}{rcl} \mathbb{P}[\sup _{t\leq T}W_{t} \geq y]& =& \mathbb{P}[\vert W_{T}\vert \geq y] = 2\int _{ \frac{y} {\sqrt{T}} }^{+\infty }\frac{e^{-\frac{1} {2} x^{2} }} {\sqrt{2\pi }} \mathrm{d}x.{}\end{array}$$
(7)

Proof.

Denote \(T_{y} =\inf \{ t > 0: W_{t} \geq y\}\), with T y  = +∞ if the set is empty. Observe that T y is a stopping time and that \(\{\sup _{t\leq T}W_{t} \geq y;W_{T} \leq x\} =\{ T_{y} \leq T;W_{T} \leq x\}\). By Proposition 4, on {T y  ≤ T}, \((W_{T_{y}+t} =\bar{ W}_{t}^{T_{y}} + y: t \in \mathbb{R}^{+})\) is a Brownian motion starting from y, independent of (W s : s ≤ T y ). By symmetry (see Fig. 3), the events {T y  ≤ T, W T  < x} and {T y  ≤ T, W T  > 2y − x} have the same probability. But for x ≤ y, we have \(\{T_{y} \leq T,W_{T} > 2y - x\} =\{ W_{T} > 2y - x\}\) and the first result is proved.

For the second result, take y = x and write \(\mathbb{P}[\sup _{t\leq T}W_{t} \geq y] = \mathbb{P}[\sup _{t\leq T}W_{t} \geq y,W_{T} > y]+\mathbb{P}[\sup _{t\leq T}W_{t} \geq y,W_{T} \leq y] = \mathbb{P}[W_{T} > y]+\mathbb{P}[W_{T} \geq y] = 2\mathbb{P}(W_{T} \geq y) = \mathbb{P}(\vert W_{T}\vert \geq y)\). □ 
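Identity (6) can be checked by Monte Carlo. In the sketch below (the values of y, x, the grid size and the sample size are hypothetical), the running supremum is approximated on a finite grid, so the agreement is up to a small discretization bias.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
T, y, x = 1.0, 1.0, 0.5              # hypothetical values with x <= y
n_steps, n_paths = 1000, 100000
dt = T / n_steps
W = np.zeros(n_paths)
hit = np.zeros(n_paths, dtype=bool)
for _ in range(n_steps):             # grid approximation of the running supremum
    W += rng.normal(0.0, math.sqrt(dt), size=n_paths)
    hit |= (W >= y)
p_joint = np.mean(hit & (W <= x))    # l.h.s. of (6), approximately
p_reflected = 0.5 * math.erfc((2 * y - x) / math.sqrt(2 * T))  # P(W_T >= 2y - x)
```
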

Fig. 3 Brownian motion \((W_{T_{y}+t} =\bar{ W}_{t}^{T_{y}} + y: t \in \mathbb{R}^{+})\) starting from y and its symmetric path

As a consequence of this identification of the law of the maximum up to a fixed time, we prove that the range of Brownian motion becomes the whole of \(\mathbb{R}\) as time goes to infinity.

Proposition 6.

With probability 1, we have

$$\displaystyle{\limsup _{t\rightarrow +\infty }W_{t} = +\infty,\qquad \liminf _{t\rightarrow +\infty }W_{t} = -\infty.}$$

Proof.

For T ≥ 0, set \(M_{T} =\sup _{t\leq T}W_{t}\). As T ↑ +∞, this defines an increasing family of r.v., thus converging a.s. to a limit r.v. M ∞ . Applying the monotone convergence theorem twice, we obtain

$$\displaystyle\begin{array}{rcl} \mathbb{P}[M_{\infty } = +\infty ] =\lim _{y\uparrow +\infty }\mathbb{P}[M_{\infty } > y]& =& \lim _{y\uparrow +\infty }\big(\lim _{T\uparrow +\infty }\mathbb{P}[M_{T} > y]\big) {}\\ & =& \lim _{y\uparrow +\infty }(\lim _{T\uparrow +\infty }\mathbb{P}[\vert W_{T}\vert \geq y]\big) = 1 {}\\ \end{array}$$

using (7). This proves that \(\limsup \limits _{t\rightarrow +\infty }W_{t} = +\infty \) a.s. and a symmetry argument gives the liminf. □ 

However, the growth rate of W is sublinear as time goes to infinity.

Proposition 7.

With probability 1, we have

$$\displaystyle{\lim _{t\rightarrow +\infty }\frac{W_{t}} {t} = 0.}$$

Proof.

The strong law of large numbers yields that \(\frac{W_{n}} {n} = \frac{1} {n}\sum _{i=1}^{n}(W_{ i} - W_{i-1})\) converges a.s. to \(\mathbb{E}(W_{1}) = 0\). The announced result is thus proved along the sequence of integers. To fill the gaps between integers, set \(\tilde{M}_{n} =\sup _{n<t\leq n+1}(W_{t} - W_{n})\) and \(\tilde{M}_{n}^{{\prime}} =\sup _{n<t\leq n+1}(W_{n} - W_{t})\): by Proposition 5, \(\tilde{M}_{n}\) and \(\tilde{M}_{n}^{{\prime}}\) have the same distribution as | W 1 | . Then the Chebyshev inequality gives

$$\displaystyle{\mathbb{P}(\vert \tilde{M}_{n}\vert + \vert \tilde{M}_{n}^{{\prime}}\vert \geq n^{3/4}) \leq 2\frac{\mathbb{E}(\vert \tilde{M}_{n}\vert ^{2}) + \mathbb{E}(\vert \tilde{M}_{ n}^{{\prime}}\vert ^{2})} {n^{3/2}} = 4n^{-3/2},}$$

implying that \(\sum _{n\geq 0}\mathbb{P}(\vert \tilde{M}_{n}\vert + \vert \tilde{M}_{n}^{{\prime}}\vert \geq n^{3/4}) < +\infty \). Thus, by the Borel–Cantelli lemma, with probability 1 we have \(\vert \tilde{M}_{n}\vert + \vert \tilde{M}_{n}^{{\prime}}\vert < n^{3/4}\) for n large enough, i.e. \(\frac{\tilde{M}_{n}} {n}\) and \(\frac{\tilde{M}_{n}^{{\prime}}} {n}\) both converge a.s. to 0. □ 

By time inversion, \(\hat{W}_{t} = tW_{1/t}\) is another Brownian motion: the growth of \(\hat{W}\) at infinity gives an estimate on W at 0, which reads

$$\displaystyle{+\infty =\limsup _{t\rightarrow +\infty }\vert \hat{W}_{t}\vert =\limsup _{s\rightarrow 0^{+}} \frac{\vert W_{s} - W_{0}\vert } {s} }$$

which shows that W is not differentiable at time 0. By time-shift invariance, the same is true at any given time t. The careful reader may notice that the set of full probability measure depends on t, and it is unclear at this stage whether a single full set works for all t simultaneously, i.e. whether

$$\displaystyle{\mathbb{P}(\exists t_{0}\text{ such that }t\mapsto W_{t}\text{ is differentiable at }t_{0}) = 0.}$$

Actually, the above result holds true; it is due to Paley–Wiener–Zygmund (1933). The following result is of a comparable nature: we claim that a.s. there does not exist any interval on which W is monotone.

Proposition 8 (Nowhere Monotonicity).

We have

$$\displaystyle{\mathbb{P}(t\mapsto W_{t}\ \mathrm{is\ monotone\ on\ some\ interval}) = 0.}$$

Proof.

Define \(M_{s,t}^{\uparrow } =\{\omega: u\mapsto W_{u}(\omega )\text{ is increasing on the interval }]s,t[\}\) and \(M_{s,t}^{\downarrow }\) similarly. Observe that

$$\displaystyle{M =\{ t\mapsto W_{t}\text{ is monotone on some interval}\} =\bigcup _{s,t\in \mathbb{Q},0\leq s<t}(M_{s,t}^{\uparrow }\cup M_{ s,t}^{\downarrow }),}$$

and since this is a countable union, it is enough to show \(\mathbb{P}(M_{s,t}^{\uparrow }) = \mathbb{P}(M_{s,t}^{\downarrow }) = 0\) to conclude \(\mathbb{P}(M) \leq \sum _{s,t\in \mathbb{Q},0\leq s<t}[\mathbb{P}(M_{s,t}^{\uparrow }) + \mathbb{P}(M_{s,t}^{\downarrow })] = 0\). For fixed n, set \(t_{i} = s + i(t - s)/n\), then

$$\displaystyle{\mathbb{P}(M_{s,t}^{\uparrow }) \leq \mathbb{P}(W_{ t_{i+1}}-W_{t_{i}} \geq 0,0 \leq i < n) =\prod _{ i=0}^{n-1}\mathbb{P}(W_{ t_{i+1}}-W_{t_{i}} \geq 0) = \frac{1} {2^{n}},}$$

using the symmetric distribution of the increments. Letting n → +∞ gives \(\mathbb{P}(M_{s,t}^{\uparrow }) = 0\). We argue similarly for \(\mathbb{P}(M_{s,t}^{\downarrow }) = 0\). □ 

In view of this lack of smoothness, it seems impossible to define a differential calculus along the paths of Brownian motion. However, as will be developed further on, Brownian motion paths enjoy a nice property of finite quadratic variation, which serves to build an appropriate stochastic calculus.

There is much more to tell about the properties of Brownian motion. We mention a few extra properties without proof:

  • Hölder regularity: for any \(\rho \in (0, \frac{1} {2})\) and any deterministic T > 0, there exists an a.s. finite r.v. C ρ, T such that

    $$\displaystyle{\forall \ 0 \leq s,t \leq T,\quad \vert W_{t} - W_{s}\vert \leq C_{\rho,T}\vert t - s\vert ^{\rho }.}$$
  • Law of the iterated logarithm: setting \(h(t) = \sqrt{2t\log \log t^{-1}}\), we have

    $$\displaystyle{\limsup _{t\downarrow 0} \frac{W_{t}} {h(t)} = 1\quad \textit{a.s.}\quad \text{and}\quad \liminf _{t\downarrow 0} \frac{W_{t}} {h(t)} = -1\quad \textit{a.s.}}$$
  • Zeros of Brownian motion: the set \(\chi =\{ t \geq 0: W_{t} = 0\}\) of the zeros of W is closed, unbounded, with null Lebesgue measure and it has no isolated points.

1.5 The Random Walk Approximation

Another algorithmic way to build Brownian motion consists in rescaling a random walk. This is very simple and very useful for numerics: it leads to the so-called tree methods, and it has connections with finite-difference schemes for PDEs.

Consider a sequence (X i ) i of independent random variables with Rademacher distribution: \(\mathbb{P}(X_{i} = \pm 1) = \frac{1} {2}\). Then

$$\displaystyle{S_{n} =\sum _{ i=1}^{n}X_{ i}}$$

defines a random walk on \(\mathbb{Z}\). Like Brownian motion, it is a process with stationary independent increments, but it is not Gaussian. Actually S n has a binomial distribution:

$$\displaystyle{\mathbb{P}(S_{n} = -n+2k) = \mathbb{P}(k\ \mathrm{upward\ steps}) = 2^{-n}\Big(\begin{array}{c} n\\ k \end{array} \Big).}$$

A direct computation shows that \(\mathbb{E}(S_{n}) = 0\) and \(\mathbb{V}\mathrm{ar}(S_{n}) = n\). When we rescale the walk and let n go to infinity, we observe, due to the Central Limit Theorem, that the distribution of \(\frac{S_{n}} {\sqrt{n}}\) converges to the Gaussian law with zero mean and unit variance. That this limit equals the law of W 1 is not a coincidence: it can be justified that the full trajectory of the suitably rescaled random walk converges to that of a Brownian motion, see Fig. 4. This result is known as Donsker's theorem; see for instance [12] for a proof.

Proposition 9.

Define (Y t n ) t as the piecewise constant process

$$\displaystyle{ Y _{t}^{n} = \frac{1} {\sqrt{n}}\sum _{i=1}^{\lfloor \mathit{nt}\rfloor }X_{ i}. }$$
(8)

The distribution of the process (Y t n ) t converges to that of a Brownian motion (W t ) t as n → +∞, i.e. for any continuous bounded functional Φ,

$$\displaystyle{\lim _{n\rightarrow \infty }\mathbb{E}(\varPhi (Y _{t}^{n}: t \leq 1)) = \mathbb{E}(\varPhi (W_{ t}: t \leq 1)).}$$

This last result gives a simple way to evaluate numerically expectations of functionals of Brownian motion. It is the principle of the so-called binomial tree methods.
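Proposition 9 suggests a simple experiment: the sketch below (the values of n and the sample size are arbitrary choices) builds \(Y_{1}^{n}\) from Rademacher steps, as in (8), and checks that its mean and variance match those of W 1.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_samples = 400, 20000            # hypothetical choices
X = rng.choice([-1.0, 1.0], size=(n_samples, n))   # Rademacher steps, P = 1/2 each
Y1 = X.sum(axis=1) / np.sqrt(n)                    # Y^n at time t = 1, cf. (8)
mean1, var1 = Y1.mean(), Y1.var()    # should approach 0 and 1 (the law of W_1)
```
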

Fig. 4 The random walk rescaled in time and space. From left to right: the process Y n for n = 50, 100, 200. The pieces of path with the same color are built from the same X i

Link with Finite Difference Schemes. The random walk can be interpreted as an explicit finite-difference (FD) scheme for the heat equation. We anticipate a little on the sequel, where the connection between Brownian motion and the heat equation will be detailed further.

For \(t = \frac{i} {n}\) (\(i \in \{ 0,\ldots,n\}\)) and \(x \in \mathbb{R}\), set

$$\displaystyle{u^{n}(t,x) = \mathbb{E}\Big(f\big(x + Y _{ \frac{ i} {n} }^{n}\big)\Big).}$$

The independence of (X i ) i gives

$$\displaystyle\begin{array}{rcl} u^{n}\big( \frac{i} {n},x\big)& =& \mathbb{E}\big(f(x + Y _{\frac{i-1} {n} }^{n} + \frac{X_{i}} {\sqrt{n}})\big) {}\\ & =& \frac{1} {2}u^{n}\big(\frac{i - 1} {n},x + \frac{1} {\sqrt{n}}\big) + \frac{1} {2}u^{n}\big(\frac{i - 1} {n},x - \frac{1} {\sqrt{n}}\big), {}\\ \frac{u^{n}\big( \frac{i} {n},x\big) - u^{n}\big(\frac{i-1} {n},x\big)} { \frac{1} {n}} & =& \frac{1} {2} \frac{u^{n}\big(\frac{i-1} {n},x + \frac{1} {\sqrt{n}}\big) - 2u^{n}\big(\frac{i-1} {n},x\big) + u^{n}\big(\frac{i-1} {n},x - \frac{1} {\sqrt{n}}\big)} {\big( \frac{1} {\sqrt{n}}\big)^{2}}.{}\\ \end{array}$$

Thus u n , related to the expectation of the random walk, can be read as an explicit FD scheme for the heat equation \(\partial _{t}u(t,x) = \frac{1} {2}\partial _{\mathit{xx}}^{2}u(t,x)\) with u(0, x) = f(x), with time step \(\frac{1} {n}\) and space step \(\frac{1} {\sqrt{n}}\).
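The averaging recursion above can be run directly as an explicit scheme on the lattice \(x_{0} + k/\sqrt{n}\). In the sketch below we take the hypothetical initial condition f(x) = x 2, for which the binomial expectation is exactly \(x_{0}^{2} + 1\) at time 1 (since the rescaled walk has variance 1); the value at the center of the lattice after n steps reproduces it up to rounding.

```python
import numpy as np

n = 200                              # number of time steps, hypothetical choice
h = 1.0 / np.sqrt(n)                 # space step 1/sqrt(n)
x0 = 0.3                             # evaluation point, hypothetical
k = np.arange(-n, n + 1)             # lattice offsets reachable in n steps
u = (x0 + k * h) ** 2                # u^n(0, .) = f on the lattice, f(x) = x^2
for _ in range(n):                   # one averaging step per time step 1/n
    u[1:-1] = 0.5 * (u[2:] + u[:-2])
u_center = u[n]                      # u^n(1, x0) = E[f(x0 + Y^n_1)]
exact = x0 ** 2 + 1.0                # = E[(x0 + W_1)^2], matched exactly here
```

The frozen boundary values never reach the center cell within n steps, so the center value is the exact binomial expectation.
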

1.6 Other Stochastic Processes

We present other one-dimensional processes with continuous trajectories which are derived from Brownian motion.

  1. Geometric Brownian motion: this model is popular in finance for modeling stocks and other assets by a positive process.

  2. Ornstein–Uhlenbeck process: it has important applications in physics, mechanics, economics and finance to model stochastic phenomena exhibiting mean-reverting features (like a spring subject to random forces, interest rates or inflation, …).

  3. Stochastic differential equations: they give the most general framework.

1.6.1 Geometric Brownian Motion

Definition 4.

A Geometric Brownian Motion (GBM in short) with deterministic initial value S 0 > 0, drift coefficient μ and diffusion coefficient σ, is a process (S t ) t ≥ 0 defined by

$$\displaystyle{ S_{t} = S_{0}e^{(\mu -\frac{1} {2} \sigma ^{2})t+\sigma W_{ t}}, }$$
(9)

where {W t ; t ≥ 0} is a standard Brownian motion.

As the argument of the exponential has a Gaussian distribution, the random variable S t (with t fixed) is said to be lognormal.

This is a process with continuous trajectories which takes strictly positive values. Geometric Brownian motion is often used as a model of asset prices (see Samuelson [65]): this choice is justified on the one hand by the positivity of S, and on the other hand by the simple Gaussian properties of its returns:

  • The log-returns log(S t ) − log(S s ) are Gaussian with mean \((\mu -\frac{1} {2}\sigma ^{2})(t - s)\) and variance σ 2(t − s).

  • For all \(0 \leq t_{0} < t_{1} < \cdots < t_{n}\), the relative increments \(\{\frac{S_{t_{i+1}}} {S_{t_{i}}};0 \leq i \leq n - 1\}\) are independent.

The assumption of Gaussian returns is not valid in practice but this model still serves as a proxy for more sophisticated models.

Calling μ the drift parameter may be surprising at first sight, since it appears in the deterministic component as \((\mu -\frac{1} {2}\sigma ^{2})t\). Actually, a straightforward computation of the expectation gives

$$\displaystyle{\mathbb{E}(S_{t}) = S_{0}e^{(\mu -\frac{1} {2} \sigma ^{2})t }\mathbb{E}(e^{\sigma W_{t} }) = S_{0}e^{(\mu -\frac{1} {2} \sigma ^{2})t }e^{\frac{1} {2} \sigma ^{2}t } = S_{0}e^{\mu t}.}$$

The above equality justifies interpreting μ as a mean drift term: \(\mu = \frac{1} {t} \log [\mathbb{E}(S_{t})/S_{0}]\).
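The identity \(\mathbb{E}(S_{t}) = S_{0}e^{\mu t}\) is easy to confirm by simulation of (9) at a single date; the parameter values below are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(6)
S0, mu, sigma, T = 100.0, 0.05, 0.2, 1.0   # hypothetical parameters
n_paths = 200000
W_T = rng.normal(0.0, math.sqrt(T), size=n_paths)
S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)   # GBM (9) at time T
mean_S = S_T.mean()
expected = S0 * math.exp(mu * T)    # E(S_T) = S_0 e^{mu T}
```
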

1.6.2 Ornstein–Uhlenbeck Process

Let us return to physics and to the Brownian motion studied by Einstein in 1905. In order to propose a more adequate model of the diffusion of particles, we introduce the Ornstein–Uhlenbeck process and its principal properties.

So far, we have presented Brownian motion as a model for a microscopic particle in suspension in a liquid subject to thermal agitation. An important criticism of this model is that the assumption of independent displacement increments does not take into account the effect of the particle speed due to its inertia.

Let us denote by m the particle mass and by \(\dot{X}(t)\) its speed. Owing to Newton's second law, the momentum change \(m\dot{X}(t +\delta t) - m\dot{X}(t)\) is equal to the resistance \(-k\dot{X}(t)\delta t\) of the medium during the time δ t, plus the momentum change due to molecular shocks, which we assume to have stationary independent increments and thus to be associated with a Brownian motion. The process modeled in this way is sometimes called the physical Brownian motion. The equation for the increments becomes

$$\displaystyle{ m\,\delta [\dot{X}(t)] = -k\dot{X}(t)\delta t + m\sigma \delta W_{t}. }$$

Trajectories of Brownian motion being non-differentiable, the equation has to be read in integral form

$$\displaystyle{m\dot{X}(t) = m\dot{X}(0) -\int _{0}^{t}k\dot{X}(s)\mathrm{d}s + m\sigma W_{ t}.}$$

\(\dot{X}(t)\) is thus a solution of the linear stochastic differential equation (known as the Langevin equation)

$$\displaystyle{ V _{t} = v_{0} - a\int _{0}^{t}V _{ s}\mathrm{d}s +\sigma W_{t} }$$

where \(a = \frac{k} {m}\). If a = 0, we recover an arithmetic Brownian motion; to avoid this trivial reduction, we assume a ≠ 0 in the sequel. However, the existence of a solution is not obvious since W is not differentiable. To overcome this difficulty, set \(Z_{t} = V _{t} -\sigma W_{t}\): this leads to the new equation

$$\displaystyle{ Z_{t} = v_{0} - a\int _{0}^{t}(Z_{ s} +\sigma W_{s})\mathrm{d}s, }$$

which is now a linear ordinary differential equation that can be solved path by path. The variation of parameters method represents the unique solution of this equation as

$$\displaystyle{Z_{t} = v_{0}e^{-\mathit{at}} -\sigma \int _{ 0}^{t}\mathit{ae}^{-a(t-s)}W_{ s}\mathrm{d}s.}$$

The initial solution is thus

$$\displaystyle{ V _{t} = v_{0}e^{-\mathit{at}} +\sigma W_{ t} -\sigma \int _{0}^{t}\mathit{ae}^{-a(t-s)}W_{ s}\mathrm{d}s. }$$
(10)

Using stochastic calculus, we will derive later (see Sect. 3.3) another convenient representation of V as follows:

$$\displaystyle{ V _{t} = v_{0}e^{-\mathit{at}} +\sigma \int _{ 0}^{t}e^{-a(t-s)}\mathrm{d}W_{ s} }$$
(11)

which uses a stochastic integral not yet defined. From (10), assuming that v 0 is deterministic, we can show the following properties (see also Sect. 3.3).

  • For a given t, V t has a Gaussian distribution: indeed, as the limit of Riemann sums, it is the a.s. limit of Gaussian r.v.’s, see footnote 5 page 111.

  • More generally, V is a Gaussian process.

  • Its mean is \(v_{0}e^{-\mathit{at}}\) and its covariance function is \(\mathbb{C}\mathrm{ov}(V _{t},V _{s}) = e^{-a(t-s)} \frac{\sigma ^{2}} {2a}(1 - e^{-2\mathit{as}})\) for t > s.

Observe that for a > 0, the Gaussian distribution of V t converges to \(\mathcal{N}(0, \frac{\sigma ^{2}} {2a})\) as t → +∞: it no longer depends on v 0, which illustrates the mean-reverting feature of this model, see Fig. 5.

Fig. 5
figure 5

Ornstein–Uhlenbeck paths with V 0 = 1, a = 2 and σ = 0.1
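These Gaussian properties can be checked numerically by simulating the process on a time grid with its exact Gaussian transition. The sketch below uses a = 2 as in Fig. 5 but a larger σ, chosen for illustration.

```python
import math
import random

random.seed(1)

v0, a, sigma = 1.0, 2.0, 0.5   # illustrative parameters
T, n = 3.0, 100                # horizon and number of grid steps
h = T / n
M = 10_000                     # number of independent paths

# Exact Gaussian transition over a step of size h:
# V_{t+h} = V_t e^{-a h} + N(0, sigma^2 (1 - e^{-2 a h}) / (2a))
decay = math.exp(-a * h)
step_std = math.sqrt(sigma**2 * (1 - math.exp(-2 * a * h)) / (2 * a))

terminal = []
for _ in range(M):
    v = v0
    for _ in range(n):
        v = v * decay + step_std * random.gauss(0.0, 1.0)
    terminal.append(v)

emp_mean = sum(terminal) / M
emp_var = sum((x - emp_mean) ** 2 for x in terminal) / (M - 1)

# Theoretical mean v0 e^{-aT} and variance sigma^2 (1 - e^{-2aT}) / (2a)
th_mean = v0 * math.exp(-a * T)
th_var = sigma**2 * (1 - math.exp(-2 * a * T)) / (2 * a)
print(emp_mean, th_mean, emp_var, th_var)
```

For this value of aT, the empirical mean and variance are already close to the limiting values 0 and \(\frac{\sigma ^{2}} {2a}\), in line with the mean-reverting behaviour.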

1.6.3 Stochastic Differential Equations and Euler Approximations

The previous example gives the generic form of a Stochastic Differential Equation, which generalizes the usual Ordinary Differential Equation \(\dot{x}_{t} = b(x_{t})\), or in integral form \(x_{t} = x_{0} +\int _{ 0}^{t}b(x_{s})\mathrm{d}s\).

Definition 5.

Let \(b,\sigma: x \in \mathbb{R}\mapsto \mathbb{R}\) be two functions, respectively the drift and the diffusion coefficient. A Stochastic Differential Equation (SDE in short) with parameter (b, σ) and initial value x is a stochastic process (X t ) t ≥ 0 solution of

$$\displaystyle{X_{t} = x +\int _{ 0}^{t}b(X_{ s})\mathrm{d}s +\int _{ 0}^{t}\sigma (X_{ s})\mathrm{d}W_{s},\quad t \geq 0,}$$

where (W t ) t is a standard Brownian motion.

A slightly more general definition (not considered here) could include the case of time-dependent coefficients b(t, x) and σ(t, x); the subsequent analysis would be quite similar. In the definition above, we use a stochastic integral \(\int _{0}^{t}\ldots \mathrm{d}W_{s}\) which has not yet been defined: it will be explained in the next section. For the moment, the reader only needs to know that in the simplest case where σ is constant, we simply have \(\int _{0}^{t}\sigma (X_{s})\mathrm{d}W_{s} =\sigma W_{t}\). The previous examples fit this setting:

  • The arithmetic Brownian motion corresponds to b(x) = b and σ(x) = σ.

  • The Ornstein–Uhlenbeck process corresponds to \(b(x) = -\mathit{ax}\) and σ(x) = σ.

Taking σ to be non-constant allows for more general situations and more flexible models. Instead of discussing now the important issues of existence and uniqueness for such SDEs, we rather consider natural approximations of them, namely the Euler scheme (the direct extension of the Euler scheme for ODEs).

Definition 6.

Let (b, σ) be given drift and diffusion coefficients. The Euler scheme associated with the SDE with coefficients (b, σ), initial value x and time step h, is defined by

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} X_{0}^{h} = x, \quad \\ X_{t}^{h} = X_{\mathit{ih}}^{h} + b(X_{\mathit{ih}}^{h})(t -\mathit{ih}) +\sigma (X_{\mathit{ih}}^{h})(W_{t} - W_{\mathit{ih}}),\quad i \geq 0,t \in (\mathit{ih},(i + 1)h].\quad \end{array}\right. }$$
(12)

In other words, X h is a piecewise arithmetic Brownian motion, whose coefficients on the interval (ih, (i + 1)h] are given by the functions (b, σ) evaluated at X ih h. In general, the law of X t h is not known analytically: at most, we can give explicit representations by induction over the time steps. On the other hand, as will be seen further on, the random simulation of X h at the times (ih) i ≥ 0 is easily performed by simulating the independent Brownian increments \((W_{(i+1)h} - W_{\mathit{ih}})\). The accuracy of the approximation of X by X h is expected to improve as h goes to 0.
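A minimal sketch of the scheme (12), applied here to the Ornstein–Uhlenbeck coefficients b(x) = −ax and σ(x) = σ, for which the exact mean \(v_{0}e^{-\mathit{at}}\) is known and can serve as a check; all parameter values are illustrative.

```python
import math
import random

random.seed(2)

# Coefficients of the SDE dX = b(X) dt + sigma(X) dW (Ornstein-Uhlenbeck case)
a, s = 2.0, 0.5

def b(x):
    return -a * x

def sigma(x):
    return s

x0, T, n = 1.0, 1.0, 100
h = T / n
M = 10_000  # number of simulated Euler paths

total = 0.0
sd = math.sqrt(h)
for _ in range(M):
    x = x0
    for _ in range(n):
        dw = random.gauss(0.0, sd)        # Brownian increment W_{(i+1)h} - W_{ih}
        x = x + b(x) * h + sigma(x) * dw  # one Euler step of (12)
    total += x

estimate = total / M
exact_mean = x0 * math.exp(-a * T)  # E(V_T) for the exact OU process
print(estimate, exact_mean)
```

The small residual discrepancy combines the statistical error of order \(1/\sqrt{M}\) and the discretization bias, which vanishes as h goes to 0.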

Complementary References. See [48, 57, 63].

2 Feynman–Kac Representations of PDE Solutions

Our purpose in this section is to make the connection between the expectations of functionals of Brownian motion and the solution of second order linear parabolic partial differential equations (PDE in short): this leads to the well-known Feynman–Kac representations. We extend this point of view to other simple processes introduced before.

2.1 The Heat Equations

2.1.1 Heat Equation in the Whole Space

Let us return to the law of x + W t , the Gaussian density of which is

$$\displaystyle{g(t,x,y):= g(t,y - x) = \frac{1} {\sqrt{2\pi t}}\exp (-(y - x)^{2}/2t),}$$

often called in this context the fundamental solution of the heat equation. One of its key properties is the convolution property

$$\displaystyle{ g(t + s,x,y) =\int _{\mathbb{R}}g(t,x,z)g(s,z,y)\mathrm{d}z }$$
(13)

which expresses, in analytical language, that \(x + W_{t+s}\) is the sum of the independent Gaussian variables x + W t and \(W_{t+s} - W_{t}\). A direct calculation shows that the Gaussian density is a solution of the heat equation with respect to both variables x and y

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} g_{t}^{{\prime}}(t,x,y) = \frac{1} {2}g_{\mathit{yy}}^{{\prime\prime}}(t,x,y),\quad \\ g_{t}^{{\prime}}(t,x,y) = \frac{1} {2}g_{\mathit{xx}}^{{\prime\prime}}(t,x,y).\quad \end{array} \right. }$$
(14)
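The two identities in (14) can be checked numerically with central finite differences; the evaluation point (t, x, y) below is arbitrary.

```python
import math

# Gaussian kernel g(t, x, y) = exp(-(y - x)^2 / (2t)) / sqrt(2 pi t)
def g(t, x, y):
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

t, x, y, eps = 1.0, 0.3, 1.1, 1e-4

# Central finite differences for g'_t, g''_yy and g''_xx
dg_dt = (g(t + eps, x, y) - g(t - eps, x, y)) / (2 * eps)
d2g_dyy = (g(t, x, y + eps) - 2 * g(t, x, y) + g(t, x, y - eps)) / eps**2
d2g_dxx = (g(t, x + eps, y) - 2 * g(t, x, y) + g(t, x - eps, y)) / eps**2

# Both second derivatives should match 2 * dg_dt, as in (14)
print(dg_dt, 0.5 * d2g_dyy, 0.5 * d2g_dxx)
```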

This property is extended to a large class of functions built from the Brownian motion.

Theorem 2 (Heat Equation with Cauchy Initial Boundary Condition).

Let f be a bounded Footnote 6 measurable function. Consider the function

$$\displaystyle{u(t,x,f) = \mathbb{E}[f(x + W_{t})] =\int _{\mathbb{R}}g(t,x,y)f(y)\mathrm{d}y:}$$

the function u is infinitely continuously differentiable in space and time for t > 0 and solves the heat equation

$$\displaystyle{ u_{t}^{{\prime}}(t,x,f) = \frac{1} {2}u_{\mathit{xx}}^{{\prime\prime}}(t,x,f),\quad u(0,x,f) = f(x). }$$
(15)

Equation (15) is the heat equation with initial boundary condition (Cauchy problem, see [22]).

Proof.

Standard Gaussian estimates allow us to differentiate u w.r.t. t or x under the integral sign: we then have

$$\displaystyle{u_{t}^{{\prime}}(t,x,f) =\int _{ \mathbb{R}}g_{t}^{{\prime}}(t,x,y)f(y)\mathit{dy} =\int _{ \mathbb{R}} \frac{1} {2}g_{\mathit{xx}}^{{\prime\prime}}(t,x,y)f(y)\mathit{dy} = \frac{1} {2}u_{\mathit{xx}}^{{\prime\prime}}(t,x,f).}$$

 □ 

When the function under consideration is regular, this relation can be given another formulation, which will play a significant role in what follows.

Proposition 10.

If f is of class \(\mathcal{C}_{b}^{2}\) (bounded and twice continuously differentiable with bounded derivatives), Footnote 7 we have

$$\displaystyle{ u_{t}^{{\prime}}(t,x,f) = u(t,x, \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}), }$$

or equivalently using a probabilistic viewpoint

$$\displaystyle{ \mathbb{E}[f(x + W_{t})] = f(x) +\int _{ 0}^{t}\mathbb{E}\big[\frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(x + W_{ s})\big]\mathrm{d}s. }$$
(16)

Proof.

Write \(u(t,x,f) = \mathbb{E}[f(x + W_{t})] =\int _{\mathbb{R}}g(t,0,y)f(x + y)\mathrm{d}y =\int _{\mathbb{R}}g(t,x,z)f(z)\mathrm{d}z\) and differentiate under the integral sign: it gives

$$\displaystyle\begin{array}{rcl} u_{\mathit{xx}}^{{\prime\prime}}(t,x,f)& =& \int _{ \mathbb{R}}g(t,0,y)f_{\mathit{xx}}^{{\prime\prime}}(x + y)\mathrm{d}y = u(t,x,f_{\mathit{ xx}}^{{\prime\prime}}) =\int _{ \mathbb{R}}g_{\mathit{xx}}^{{\prime\prime}}(t,x,z)f(z)\mathrm{d}z, {}\\ u_{t}^{{\prime}}(t,x,f)& =& \int _{ \mathbb{R}}g_{t}^{{\prime}}(t,x,z)f(z)\mathrm{d}z = \frac{1} {2}\int _{\mathbb{R}}g_{\mathit{xx}}^{{\prime\prime}}(t,x,z)f(z)\mathrm{d}z = \frac{1} {2}u(t,x,f_{\mathit{xx}}^{{\prime\prime}}), {}\\ \end{array}$$

using two integrations by parts in the first line and, in the second line, the heat equation satisfied by g. Then the probabilistic representation (16) easily follows:

$$\displaystyle\begin{array}{rcl} \mathbb{E}[f(x + W_{t})] - f(x)& =& u(t,x,f) - u(0,x,f) =\int _{ 0}^{t}u_{ t}^{{\prime}}(s,x,f)\mathrm{d}s {}\\ & =& \int _{0}^{t}u(s,x, \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}})\mathrm{d}s =\int _{ 0}^{t}\mathbb{E}\big[\frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(x + W_{ s})\big]\mathrm{d}s. {}\\ \end{array}$$

 □ 

2.1.2 Heat Equation in an Interval

We now extend the previous results in two directions: first, we allow the function f to also depend smoothly on time and second, the final time t is replaced by a stopping time U. The first extension is straightforward and we state it without proof.

Proposition 11.

Let f be a function of class \(\mathcal{C}_{b}^{1,2}\) (bounded, once continuously differentiable in time, twice in space, with bounded derivatives): we have

$$\displaystyle\begin{array}{rcl} \mathbb{E}[f(t,x + W_{t})]& =& f(0,x) +\int _{ 0}^{t}\mathbb{E}[f_{ t}^{{\prime}}(s,x + W_{ s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,x + W_{ s})]\mathrm{d}s \\ & =& f(0,x) + \mathbb{E}\left [\int _{0}^{t}(f_{ t}^{{\prime}}(s,x + W_{ s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,x + W_{ s}))\mathrm{d}s\right ].{}\end{array}$$
(17)

The second equality readily follows from Fubini’s theorem, which allows us to interchange \(\mathbb{E}\) and the time integral: this second form is more suitable for an extension to stochastic times t.

Theorem 3.

Let f be a function of class \(\mathcal{C}_{b}^{1,2}\) , we have

$$\displaystyle{ \mathbb{E}[f(U,x + W_{U})] = f(0,x) + \mathbb{E}\left [\int _{0}^{U}(f_{ t}^{{\prime}}(s,x + W_{ s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,x + W_{ s}))\mathrm{d}s\right ] }$$
(18)

for any bounded Footnote 8 stopping time U.

The above identity between expectations is far from obvious to establish by hand, since the law of U is quite general and an analytical computation is out of reach. This level of generality on U is quite interesting for applications: it provides a powerful tool to determine the distribution of hitting times, or to show how often a multidimensional Brownian motion visits a given point or a given set. Regarding this lecture, it gives a key tool to derive probabilistic representations of the heat equation with Dirichlet boundary conditions.

Proof.

Let us start by giving alternative forms of the relation (17). We observe that it could have been written with a random initial condition X 0, for instance

$$\displaystyle\begin{array}{rcl} & & \mathbb{E}[\mathbf{1}_{A_{0}}f(t,X_{0} + W_{t})] {}\\ & & = \mathbb{E}\left [\mathbf{1}_{A_{0}}f(0,X_{0}) + \mathbf{1}_{A_{0}}\int _{0}^{t}(f_{ t}^{{\prime}}(s,X_{ 0} + W_{s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,X_{ 0} + W_{s}))\mathrm{d}s\right ], {}\\ \end{array}$$

with W independent of X 0 and where the event A 0 depends on X 0. Similarly, using the time-shifted Brownian motion \(\{\bar{W}_{t}^{u} = W_{t+u} - W_{u};t \in \mathbb{R}^{+}\}\), which is independent of the initial condition x + W u (Proposition 2), this leads to

$$\displaystyle\begin{array}{rcl} & & \mathbb{E}[\mathbf{1}_{A_{u}}f(t + u,x + W_{u} +\bar{ W}_{t}^{u})] = \mathbb{E}\bigg[\mathbf{1}_{ A_{u}}f(u,x + W_{u}) + {}\\ & & \qquad \quad \qquad \mathbf{1}_{A_{u}}\int _{0}^{t}(f_{ t}^{{\prime}}(u + s,x + W_{ u} +\bar{ W}_{s}^{u}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(u + s,x + W_{ u} +\bar{ W}_{s}^{u}))\mathrm{d}s\bigg] {}\\ \end{array}$$

for any event A u depending only on the values {W s : s ≤ u}, or equivalently

$$\displaystyle\begin{array}{rcl} \mathbb{E}[\mathbf{1}_{A_{u}}f(t + u,x + W_{t+u})]& =& \mathbb{E}\bigg[\mathbf{1}_{A_{u}}f(u,x + W_{u}) {}\\ & & +\mathbf{1}_{A_{u}}\int _{u}^{t+u}(f_{ t}^{{\prime}}(s,x + W_{ s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,x + W_{ s}))\mathrm{d}s\bigg].{}\\ \end{array}$$

Set \(M_{t} = f(t,x + W_{t}) - f(0,x) -\int _{0}^{t}(f_{t}^{{\prime}}(s,x + W_{s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,x + W_{ s}))\mathrm{d}s\): our aim is to prove \(\mathbb{E}(M_{U}) = 0\). Observe that the preliminary computation has shown that

$$\displaystyle{ \mathbb{E}(\mathbf{1}_{A_{u}}(M_{t+u} - M_{u})) = 0 }$$
(19)

for t ≥ 0. In particular, taking A u  = Ω we obtain that the expectation \(\mathbb{E}(M_{t})\) is constantFootnote 9 w.r.t. t.

Now, consider first that U is a discrete stopping time valued in \(\{0 = u_{0} < u_{1} < \cdots < u_{n} = T\}\): then

$$\displaystyle{\mathbb{E}(M_{U}) =\sum _{ k=0}^{n-1}\mathbb{E}(M_{ U\wedge u_{k+1}}-M_{U\wedge u_{k}}) =\sum _{ k=0}^{n-1}\mathbb{E}(\mathbf{1}_{ U>u_{k}}(M_{u_{k+1}}-M_{u_{k}})) = 0}$$

by applying (19), since {U ≤ u k } (and hence {U > u k }) depends only on {W s : s ≤ u k } (by definition of a stopping time).

Second, for a general stopping time (bounded by T), we take \(U_{n} = \frac{[\mathit{nU}]+1} {n}\) which is a stopping time converging to U: since \((M_{t})_{0\leq t\leq T}\) is bounded and continuous, the dominated convergence theorem gives \(0 = \mathbb{E}(M_{U_{n}})\mathop{\longrightarrow }\limits_{n \rightarrow \infty }\mathbb{E}(M_{U})\). □ 

As a consequence, we now make explicit the solutions of the heat equation in an interval and with initial condition: it is a partial generalizationFootnote 10 of Theorem 2, which characterized them in the whole space. The introduction of (non-homogeneous) boundary conditions of Dirichlet type is connected to the passage time of the Brownian motion.

Corollary 1 (Heat Equation with Cauchy–Dirichlet Boundary Condition).

Consider the PDE

$$\displaystyle{\left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x) = \frac{1} {2}u_{\mathit{xx}}^{{\prime\prime}}(t,x),\quad &\mbox{ for $t > 0$ and $x \in ]a,b[$,} \\ u(0,x) = f(0,x) \quad &\mbox{ for $t = 0$ and $x \in [a,b]$}, \\ u(t,x) = f(t,x) \quad &\mbox{ for $x = a$ or $b$, with $t \geq 0$}. \end{array} \right.}$$

If a solution u of class C b 1,2 ([0,T] × [a,b]) exists, then it is given by

$$\displaystyle{u(t,x) = \mathbb{E}[f(t - U,x + W_{U})]}$$

where \(U = T_{a} \wedge T_{b} \wedge t\) (using the previous notation for the first passage time T y at the level y for the Brownian motion starting at x, i.e. (x + W t ) t≥0 ).

Proof.

First, smoothly extend the function u outside the interval [a, b] in order to apply the previous results. The way we extend it is unimportant, since u and its derivatives are only evaluated inside [a, b]. Clearly U is a stopping time bounded by t. Now apply the equality (18) to the function \((s,y)\mapsto u(t - s,y) = v(s,y)\), which is of class \(C_{b}^{1,2}([0,t] \times \mathbb{R})\) and satisfies \(v_{s}^{{\prime}}(s,y) + \frac{1} {2}v_{\mathit{yy}}^{{\prime\prime}}(s,y) = 0\) for (s, y) ∈ [0, t] × [a, b]. We obtain

$$\displaystyle{\mathbb{E}[v(U,x+W_{U})] = v(0,x)+\mathbb{E}\left [\int _{0}^{U}(v_{ s}^{{\prime}}(s,x + W_{ s}) + \frac{1} {2}v_{\mathit{yy}}^{{\prime\prime}}(s,x + W_{ s}))\mathrm{d}s\right ] = v(0,x),}$$

since for s ≤ U, (s, x + W s ) ∈ [0, t] × [a, b]. To conclude, we easily check that v(0, x) = u(t, x) and \(v(U,x + W_{U}) = f(t - U,x + W_{U})\). □ 

2.1.3 A Probabilistic Algorithm to Solve the Heat Equation

To illustrate our purpose, we consider a toy example: the numerical evaluation of \(u(t,x) = \mathbb{E}(f(x + W_{t}))\) using random simulations, which lets us discuss the main ideas underlying Monte Carlo methods. Actually, the arguments below also apply to \(u(t,x) = \mathbb{E}[f(t - U,x + W_{U})]\) with \(U = T_{a} \wedge T_{b} \wedge t\), although the simulation of (U, W U ) raises some extra significant issues.

For notational simplicity, denote by X the random variable inside the expectation to be computed, that is \(X = f(x + W_{t})\) in our toy example. In contrast to a PDE method (based on finite differences or finite elements), a standard Monte Carlo method provides an approximation of u(t, x) at a given point (t, x), without evaluating the solution at other points. Actually, this holds because the PDE is linear; in Sect. 5, devoted to non-linear PDEs, the situation is different.

The Monte Carlo method is able to provide a convergent, tractable approximation of u(t, x), with a priori error bounds, under two conditions.

  1. 1.

    An arbitrarily large number of independent realizations of X can be generated (denote them by (X i ) i ≥ 1): in our toy example, this is straightforward since it only requires the simulation of W t, which is distributed as a Gaussian r.v. \(\mathcal{N}(0,t)\), and then computing \(X = f(x + W_{t})\). The independence of the simulations is achieved by using a good random number generator, like the excellent Mersenne Twister Footnote 11 generator.

  2. 2.

    Additionally, X, which is already integrable (\(\mathbb{E}\vert X\vert < +\infty \)), is assumed to be square integrable: \(\mathbb{V}\mathrm{ar}(X) < +\infty \).

Then, by the law of large numbers, we have

$$\displaystyle{ \overline{X}_{M} = \frac{1} {M}\sum _{i=1}^{M}X_{ i}\mathop{\longrightarrow }\limits_{M \rightarrow +\infty }\mathbb{E}(X); }$$
(20)

hence the empirical mean of simulations of X provides a convergent approximation of the expectation \(\mathbb{E}(X)\). In contrast to PDE methods, where some stability conditions may be required (like the Courant–Friedrichs–Lewy condition), the above Monte Carlo method does not require any extra condition to converge: it is unconditionally convergent. The extra moment condition is used to derive a priori bounds on the statistical error: the approximation error is controlled by means of the Central Limit Theorem

$$\displaystyle{\lim _{M\rightarrow +\infty }\mathbb{P}\left (\sqrt{ \frac{M} {\mathbb{V}\mathrm{ar}(X)}}\left (\overline{X}_{M} - \mathbb{E}(X)\right ) \in [a,b]\right ) = \mathbb{P}(G \in [a,b]),}$$

where G is a centered Gaussian r.v. with unit variance. Observe that the error bounds are stochastic: we cannot do better than asserting that, with probability \(\mathbb{P}(G \in [a,b])\), the unknown expectation belongs (asymptotically as M → +∞) to the interval

$$\displaystyle{\left [\overline{X}_{M} - b\sqrt{\frac{\mathbb{V}\mathrm{ar } (X)} {M}},\overline{X}_{M} - a\sqrt{\frac{\mathbb{V}\mathrm{ar } (X)} {M}} \right ].}$$

This is known as a confidence interval at level \(\mathbb{P}(G \in [a,b])\). The larger a and b are, the larger the confidence interval and the higher the confidence probability.

To obtain a fully explicit confidence interval, one may replace \(\mathbb{V}\mathrm{ar}(X)\) by its estimator using the same simulations:

$$\displaystyle{\mathbb{V}\mathrm{ar}(X) = \mathbb{E}(X^{2}) - (\mathbb{E}(X))^{2} \approx \frac{M} {M - 1}\left ( \frac{1} {M}\sum _{i=1}^{M}X_{ i}^{2} -\overline{X}_{ M}^{2}\right ):=\sigma _{ M}^{2}.}$$

The factor \(M/(M - 1)\) serves to unbiasFootnote 12 the estimator of \(\mathbb{V}\mathrm{ar}(X)\), although this hardly matters for large M (M ≥ 100). In any case, we can prove that the above confidence intervals are asymptotically unchanged when the empirical variance σ M 2 is used instead of \(\mathbb{V}\mathrm{ar}(X)\). Gathering these different results and choosing a symmetric confidence interval with \(-a = b = 1.96\), so that \(\mathbb{P}(G \in [a,b]) \approx 95\,\%\), we obtain the following: with probability 95 % approximately, for M large enough, we have

$$\displaystyle{ \mathbb{E}(X) \in \left [\overline{X}_{M} - 1.96 \frac{\sigma _{M}} {\sqrt{M}},\overline{X}_{M} + 1.96 \frac{\sigma _{M}} {\sqrt{M}}\right ]. }$$
(21)

The symmetric confidence interval at level 99 % is given by \(-a = b = 2.58\). Since a Monte Carlo method provides random evaluations of \(\mathbb{E}(X)\), different program runs will give different results (unlike a deterministic method, which always produces the same output), which may seem uncomfortable: that is why it is important to produce a confidence interval. It is also very powerful and useful to have at hand a numerical method able to state, with high probability, that the error is at most a given threshold.
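The whole procedure, empirical mean, unbiased variance estimator, and symmetric 95 % confidence interval (21), can be sketched on the toy example \(u(t,x) = \mathbb{E}[f(x + W_{t})]\); the test function f below is an illustrative choice for which the Gaussian integral has a closed form.

```python
import math
import random

random.seed(3)

x, t = 0.5, 1.0
M = 100_000

def f(y):
    return math.exp(-y * y)   # illustrative test function

# Step 1: independent simulations X_i = f(x + W_t), with W_t ~ N(0, t)
sd = math.sqrt(t)
vals = [f(x + random.gauss(0.0, sd)) for _ in range(M)]

# Step 2: empirical mean and unbiased empirical variance (factor M/(M-1))
mean = sum(vals) / M
var = sum((v - mean) ** 2 for v in vals) / (M - 1)

# Step 3: symmetric 95% confidence interval as in (21)
half_width = 1.96 * math.sqrt(var / M)
ci = (mean - half_width, mean + half_width)

# Closed form for comparison: E[exp(-(x + W_t)^2)] = exp(-x^2/(1+2t))/sqrt(1+2t)
exact = math.exp(-x * x / (1 + 2 * t)) / math.sqrt(1 + 2 * t)
print(mean, ci, exact)
```

Note that the variance is accumulated in the same loop as the mean, as advocated below: a single pass over the simulations produces both the estimate and its confidence interval.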

The confidence interval depends on

  • The confidence level \(\mathbb{P}(G \in [a,b])\), chosen by the user.

  • The number of simulations: improving the accuracy by a factor 2 requires 4 times more simulations.

  • The variance \(\mathbb{V}\mathrm{ar}(X)\) or its estimator σ M 2, which depends on the problem at hand (and not much on M as soon as M is large). This variance can be very different from one problem to another: see on Fig. 6 the width of the confidence intervals for two similar computations. There exist variance reduction techniques able to significantly reduce this factor, providing narrower confidence intervals at the same computational cost.

    Fig. 6
    figure 6

    Monte Carlo computations of \(\mathbb{E}(e^{G/10}) = e^{\frac{1} {2} \frac{1} {10^{2}} } \approx 1.005\) on the left and \(\mathbb{E}(e^{2G}) = e^{\frac{1} {2} 2^{2} } \approx 7.389\) on the right, where G is a Gaussian r.v. with zero mean and unit variance. The empirical mean and the symmetric 95 %-confidence intervals are plotted w.r.t. the number of simulations

Another advantage of such a Monte Carlo algorithm is the simplicity of the code, which consists of one loop over the number of simulations; within this loop, the empirical variance should be computed simultaneously. However, the simulation of X can be delicate in some situations, see Sect. 4.

Finally, we focus our discussion on the impact of the dimension of the underlying PDE, which has been equal to 1 so far. Consider now a state variable in \(\mathbb{R}^{d}\) (d ≥ 1) and a heat equation with Cauchy initial condition in dimension d; (15) becomes

$$\displaystyle{ u_{t}^{{\prime}}(t,x,f) = \frac{1} {2}\varDelta u(t,x,f),\quad u(0,x,f) = f(x),\quad t > 0,x \in \mathbb{R}^{d}, }$$
(22)

where \(\varDelta =\sum _{ i=1}^{d}\partial _{x_{i}x_{i}}^{2}\) stands for the Laplacian in \(\mathbb{R}^{d}\). Using similar arguments as in dimension 1, we check that

$$\displaystyle{u(t,x,f) =\int _{\mathbb{R}^{d}} \frac{1} {(2\pi t)^{d/2}}\exp (-\vert y - x\vert ^{2}/2t)f(y)\mathrm{d}y = \mathbb{E}[f(x + W_{ t})]}$$

where \(W = \left (\begin{array}{c} W_{1}\\ \vdots \\ W_{d} \end{array} \right )\) is a d-dimensional Brownian motion, i.e. each W i is a one-dimensional Brownian motion and the d components are independent (Fig. 7).

Fig. 7
figure 7

Brownian motion in dimension 2 and 3

  • The Monte Carlo computation of u(t, x) is then achieved using independent simulations of \(X = f(x + W_{t})\): the accuracy is then of order \(1/\sqrt{N}\) and the computational effort is N × d. Thus, the dimension has a very low effect on the complexity of the algorithm.

  • In comparison, for a PDE discretization scheme to achieve an accuracy of order 1∕N, we essentiallyFootnote 13 need N points in each spatial direction, so the resulting linear system to invert is of size \(N^{d}\): thus, without going into full details, it is clear that the computational cost to achieve a given accuracy depends heavily on the dimension d, and the situation becomes less and less favourable as the dimension increases. Also, the memory required to run a PDE algorithm increases exponentially with the dimension, in contrast to a Monte Carlo approach.

It is commonly admitted that a PDE approach is more suitable and efficient in dimensions 1 and 2, whereas a Monte Carlo procedure is better adapted to higher dimensions. On the other hand, a PDE-based method computes a global approximation of u (at any point (t, x)), while a Monte Carlo scheme gives a pointwise approximation only. The probabilistic approach can be directly used for parallel computing, each processor being in charge of a batch of simulations at a given point (t, x).
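A sketch of the Monte Carlo method in dimension d = 10, where a PDE grid would already be very costly; the test function f(y) = exp(−|y|²) is an illustrative choice whose Gaussian integral factorizes into d one-dimensional ones.

```python
import math
import random

random.seed(4)

d, t = 10, 0.5
x = [0.0] * d          # evaluation point (the origin, for a simple closed form)
M = 50_000             # number of simulations

total = 0.0
sd = math.sqrt(t)
for _ in range(M):
    # One d-dimensional Brownian value at time t: d independent N(0, t) components
    total += math.exp(-sum((xi + random.gauss(0.0, sd)) ** 2 for xi in x))
estimate = total / M

# Exact value at x = 0: product over the d coordinates of (1 + 2t)^{-1/2}
exact = (1 + 2 * t) ** (-d / 2)
print(estimate, exact)
```

The cost is M × d Gaussian draws, and the statistical accuracy remains of order \(1/\sqrt{M}\) regardless of d, which illustrates the mild dependence on the dimension.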

2.2 PDE Associated to Other Processes

We extend the Feynman–Kac representation for the Brownian motion to the Arithmetic Brownian Motion and the Ornstein–Uhlenbeck process.

2.2.1 Arithmetic Brownian Motion

First consider the Arithmetic Brownian motion defined by \(\{X_{t}^{x} = x + \mathit{bt} +\sigma W_{t},t \geq 0\}\). The distribution of X t is Gaussian with mean x + bt and variance σ 2 t: we assume in the following that σ ≠ 0, which ensures that its density exists; it is given by

$$\displaystyle\begin{array}{rcl} g_{b,\sigma ^{2}}(t,x,y)& =& \frac{1} {\sqrt{2\pi \sigma ^{2 } t}}\exp -\frac{(y - x -\mathit{bt})^{2}} {2\sigma ^{2}t} = g(\sigma ^{2}t,x + bt,y) {}\\ & =& g(\sigma ^{2}t,x,y - bt). {}\\ \end{array}$$

Denote by \(L_{b,\sigma ^{2}}^{\mathtt{ABM}}\) the second order operator

$$\displaystyle{ L_{b,\sigma ^{2}}^{\mathtt{ABM}} = \frac{1} {2}\sigma ^{2}\partial _{\mathit{ xx}}^{2}\ + b\partial _{ x}\, }$$
(23)

also called infinitesimal generator Footnote 14 of X. A direct computation using the heat equation for g(t, x, y) gives

$$\displaystyle{ \partial _{t}g_{b,\sigma ^{2}}(t,x,y) = \frac{1} {2}\sigma ^{2}g_{\mathit{\mathit{ xx}}}^{{\prime\prime}}(\sigma ^{2}t,x + \mathit{bt},y) + \mathit{bg}_{ x}^{{\prime}}(\sigma ^{2}t,x + \mathit{bt},y) = L_{ b,\sigma ^{2}}^{\mathtt{ABM}}g_{ b,\sigma ^{2}}(t,x,y). }$$

Hence, multiplying by f(y) and integrating over \(y \in \mathbb{R}\), we obtain the following representation that generalizes Theorem 2.

Theorem 4.

Let f be a bounded measurable function. The function

$$\displaystyle{ u_{b,\sigma ^{2}}(t,x,f) = \mathbb{E}[f(X_{t}^{x})] =\int _{ \mathbb{R}}g_{b,\sigma ^{2}}(t,x,y)f(y)\mathrm{d}y }$$
(24)

solves

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x,f) = L_{b,\sigma ^{2}}^{\mathtt{ABM}}u(t,x,f) = \frac{1} {2}\sigma ^{2}u_{\mathit{ xx}}^{{\prime\prime}}(t,x,f) + bu_{ x}^{{\prime}}(t,x,f),\quad \\ u(0,x,f) = f(x). \quad \end{array} \right. }$$
(25)

The extension of Propositions 10 and 11 follows from the arguments used in the BM case.

Proposition 12.

If \(f \in \mathcal{C}_{b}^{1,2}\) and U is a bounded stopping time (including deterministic time), then

$$\displaystyle{\mathbb{E}[f(U,X_{U}^{x})] = f(0,x) + \mathbb{E}\big[\int _{ 0}^{U}[L_{ b,\sigma ^{2}}^{\mathtt{ABM}}f(s,X_{ s}^{x}) + f_{ t}^{{\prime}}(s,X_{ s}^{x})]\mathrm{d}s\big].}$$

Theorem 4 gives the Feynman–Kac representation of the Cauchy problem written w.r.t. the second order operator \(L_{b,\sigma ^{2}}^{\mathtt{ABM}}\). When Dirichlet boundary conditions are added, Corollary 1 extends as follows, using Proposition 12.

Corollary 2.

Assume the existence of a solution u of class C b 1,2 ([0,T] × [a,b]) to the PDE

$$\displaystyle{\left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x,f) = L_{b,\sigma ^{2}}^{\mathtt{ABM}}u(t,x,f),\quad &\mbox{ for $t > 0$ and $x \in ]a,b[$,} \\ u(0,x,f) = f(0,x) \quad &\mbox{ for $t = 0$ and $x \in [a,b]$}, \\ u(t,x,f) = f(t,x) \quad &\mbox{ for $x = a$ or $b$, with $t \geq 0$}. \end{array} \right.}$$

Then it is given by

$$\displaystyle{u(t,x) = \mathbb{E}[f(t - U^{x},X_{ U^{x}}^{x})]}$$

where \(U^{x} =\inf \{ s > 0: X_{s}^{x}\notin ]a,b[\}\wedge t\) is the first exit time from the interval ]a,b[ by the process X x before t.

As for the standard heat equation, this representation naturally leads to a probabilistic algorithm to compute the PDE solution, by taking the empirical mean of independent simulations of \(f(t - U^{x},X_{U^{x}}^{x})\).
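A sketch of this algorithm in the simplest special case b = 0, σ = 1 (standard Brownian motion), with f the indicator of the right boundary and t large: the algorithm then estimates the exit probability (x − a)∕(b − a) at b. Discrete monitoring of the exit time introduces a small bias of order \(\sqrt{h}\); all parameter values are illustrative.

```python
import math
import random

random.seed(5)

left, right = 0.0, 1.0     # the interval ]a, b[
x0, t_max = 0.3, 5.0       # starting point and time horizon
h = 0.001                  # monitoring time step
M = 10_000                 # number of simulated paths

hits_right = 0
sd = math.sqrt(h)
for _ in range(M):
    x, s = x0, 0.0
    # Walk until the path leaves ]left, right[ or the horizon is reached
    while left < x < right and s < t_max:
        x += random.gauss(0.0, sd)   # exact Brownian increment over one step
        s += h
    if x >= right:
        hits_right += 1

estimate = hits_right / M
exact = (x0 - left) / (right - left)   # exit probability at the right end
print(estimate, exact)
```

For a general drift b and diffusion coefficient σ, the same loop applies with the corresponding (exact or Euler) increments, and f evaluated at the exit position and remaining time.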

2.2.2 Ornstein–Uhlenbeck Process

Now consider the process solution to \(V _{t}^{x} = x - a\int _{0}^{t}V _{s}^{x}\mathrm{d}s +\sigma W_{t}\): we emphasize in our notation the dependence w.r.t. the initial value V 0 = x. We define an appropriate second order operator

$$\displaystyle{L_{a,\sigma ^{2}}^{\mathtt{OU}}g(t,x) = \frac{1} {2}\sigma ^{2}g_{\mathit{ xx}}^{{\prime\prime}}(t,x) -\mathit{ax}g_{ x}^{{\prime}}(t,x)}$$

which plays the role of the infinitesimal generator for the Ornstein–Uhlenbeck process. We recall that the Gaussian distribution of V t x has mean \(\mathit{xe}^{-\mathit{at}}\) and variance \(\frac{\sigma ^{2}} {2a}(1 - e^{-2\mathit{at}})\), the density of which at y (assuming σ ≠ 0 for existence) is

$$\displaystyle{p(t,x,y) = g(v_{t},\mathit{xe}^{-\mathit{at}},y),\quad \mbox{ where }v_{t}:= \frac{\sigma ^{2}} {2a}(1 - e^{-2\mathit{at}}).}$$

Using the heat equation satisfied by g, we easily derive that

$$\displaystyle{ p_{t}^{{\prime}}(t,x,y) = \frac{1} {2}\sigma ^{2}p_{\mathit{ xx}}^{{\prime\prime}}(t,x,y) -\mathit{ax}p_{ x}^{{\prime}}(t,x,y) = L_{ a,\sigma ^{2}}^{\mathtt{OU}}p(t,x,y), }$$

from which we deduce the PDE satisfied by \(u(t,x,f) = \mathbb{E}[f(V _{t}^{x})]\). Incorporating Dirichlet boundary conditions is similar to the previous cases. We state the related results without further details.

Theorem 5.

Let f be a bounded measurable function. The function

$$\displaystyle{ u(t,x,f) = \mathbb{E}[f(V _{t}^{x})] =\int _{ \mathbb{R}}p(t,x,y)f(y)\mathrm{d}y }$$

solves

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x,f) = L_{a,\sigma ^{2}}^{\mathtt{OU}}u(t,x,f),\quad \\ u(0,x,f) = f(x). \quad \end{array} \right. }$$

Proposition 13.

If \(f \in \mathcal{C}_{b}^{1,2}\) and U is a bounded stopping time, then

$$\displaystyle{\mathbb{E}[f(U,V _{U}^{x})] = f(0,x) + \mathbb{E}\big[\int _{ 0}^{U}[L_{ a,\sigma ^{2}}^{\mathtt{OU}}f(s,V _{ s}^{x}) + f_{ t}^{{\prime}}(s,V _{ s}^{x})]\mathrm{d}s\big].}$$

Corollary 3.

Assume the existence of a solution u of class C b 1,2 ([0,T] × [a,b]) to the PDE

$$\displaystyle{\left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x,f) = L_{a,\sigma ^{2}}^{\mathtt{OU}}u(t,x,f),\quad &\mbox{ for $t > 0$ and $x \in ]a,b[$,} \\ u(0,x,f) = f(0,x) \quad &\mbox{ for $t = 0$ and $x \in [a,b]$}, \\ u(t,x,f) = f(t,x) \quad &\mbox{ for $x = a$ or $b$, with $t \geq 0$}. \end{array} \right.}$$

Then u is given by

$$\displaystyle{u(t,x) = \mathbb{E}[f(t - U^{x},V _{ U^{x}}^{x})]}$$

where \(U^{x} =\inf \{ s > 0: V _{s}^{x}\notin ]a,b[\}\wedge t\) .

2.2.3 A Natural Conjecture for Stochastic Differential Equations

The previous examples serve as a preparation for more general results relating the dynamics of a process to its Feynman–Kac representation. Denote by X x the solution (whenever it exists) to the Stochastic Differential Equation

$$\displaystyle{X_{t}^{x} = x +\int _{ 0}^{t}b(X_{ s}^{x})\mathrm{d}s +\int _{ 0}^{t}\sigma (X_{ s}^{x})\mathrm{d}W_{ s},\quad t \geq 0.}$$

In view of the results obtained for simpler models, we conjecture the following facts.

  1. 1.

    Set \(L_{b,\sigma ^{2}}^{X}g = \frac{1} {2}\sigma ^{2}(x)g_{\mathit{ xx}}^{{\prime\prime}} + b(x)g_{ x}^{{\prime}}\).

  2. 2.

    \(u(t,x) = \mathbb{E}(f(X_{t}^{x}))\) solves

    $$\displaystyle{ u_{t}^{{\prime}}(t,x) = L_{ b,\sigma ^{2}}^{X}u(t,x),\quad u(0,x) = f(x). }$$
  3. 3.

    If \(f \in \mathcal{C}_{b}^{1,2}\) and U is a bounded stopping time, then

    $$\displaystyle{\mathbb{E}[f(U,X_{U}^{x})] = f(0,x) + \mathbb{E}\big[\int _{ 0}^{U}[L_{ b,\sigma ^{2}}^{X}f(s,X_{ s}^{x}) + f_{ t}^{{\prime}}(s,X_{ s}^{x})]\mathrm{d}s\big].}$$
  4. 4.

    If u of class C b 1, 2([0, T] × [a, b]) solves the PDE

    $$\displaystyle{\left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x) = L_{b,\sigma ^{2}}^{X}u(t,x),\quad &\mbox{ for $t > 0$ and $x \in ]a,b[$,} \\ u(0,x) = f(0,x) \quad &\mbox{ for $t = 0$ and $x \in [a,b]$}, \\ u(t,x) = f(t,x) \quad &\mbox{ for $x = a$ or $b$, with $t \geq 0$},\end{array} \right.}$$

    then it is given by \(u(t,x) = \mathbb{E}[f(t - U^{x},X_{U^{x}}^{x})]\) where \(U^{x} =\inf \{ s > 0: X_{s}^{x}\notin ]a,b[\}\wedge t\).

The above results could be extended to PDEs with a space variable in \(\mathbb{R}^{d}\) (d ≥ 1) by considering an \(\mathbb{R}^{d}\)-valued SDE: this would be achieved by replacing W by a d-dimensional standard Brownian motion, taking a drift coefficient \(b: \mathbb{R}^{d}\mapsto \mathbb{R}^{d}\), a diffusion coefficient \(\sigma: \mathbb{R}^{d}\mapsto \mathbb{R}^{d} \otimes \mathbb{R}^{d}\) and a reward function \(f: [0,T] \times \mathbb{R}^{d}\mapsto \mathbb{R}\), replacing the interval [a, b] by a domain D in \(\mathbb{R}^{d}\), and defining U x as the first exit time of X x from that domain. Then the operator L would be a linear parabolic second order operator of the form

$$\displaystyle{L_{b,\sigma \sigma ^{\top }}^{X}g = \frac{1} {2}\sum _{i,j=1}^{d}[\sigma \sigma ^{\top }]_{ i,j}(x)\partial _{x_{i}x_{j}}^{2}g +\sum _{ i=1}^{d}b_{ i}(x)\partial _{x_{i}}g,}$$

where ⊤ denotes the transpose. We could also add a zero-order term in \(L_{b,\sigma \sigma ^{\top }}^{X}\), by considering a discounting factor for f; we do not develop this extension further.
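Fact 2 can be sanity-checked numerically in a simple case. Take the Ornstein–Uhlenbeck drift b(x) = −ax with constant σ and f(x) = x: then \(u(t,x) = \mathbb{E}(X_{t}^{x}) = xe^{-\mathit{at}}\), which indeed solves the PDE, since the second-order term vanishes. The following Python sketch (an illustration added to these notes; step and sample sizes are ad hoc) recovers this mean with an Euler discretization:

```python
# Monte Carlo + Euler check of u(t, x) = E[X_t^x] = x * exp(-a t)
# for dX = -a X ds + sigma dW (illustrative parameters only).
import numpy as np

def euler_mean(x0, a, sigma, t, n_steps=400, n_paths=100_000, seed=0):
    """Estimate E[X_t^x] by simulating the Euler scheme over n_paths paths."""
    rng = np.random.default_rng(seed)
    h = t / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        x += -a * x * h + sigma * np.sqrt(h) * rng.standard_normal(n_paths)
    return x.mean()

x0, a, sigma, t = 1.5, 0.7, 0.4, 1.0
mc = euler_mean(x0, a, sigma, t)
exact = x0 * np.exp(-a * t)
assert abs(mc - exact) < 5e-3
```

The tolerance accounts for both the Monte Carlo sampling error and the weak bias of the Euler scheme, of order h.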

The next section provides the stochastic calculus tools that allow us to prove the validity of these Feynman–Kac type results, under appropriate smoothness and growth assumptions on b, σ, f. To allow non-smooth f or Dirichlet boundary conditions, we may additionally assume a non-degeneracy condition on \(L_{b,\sigma \sigma ^{\top }}^{X}\) (such as the ellipticity condition \(\vert \sigma \sigma ^{\top }(x)\vert \geq \frac{1} {c}\) for some c > 0).

Complementary References. See [1, 15, 20, 22, 23, 48].

3 The Itô Formula

One achievement of Itô’s formula is to go from an infinitesimal time-decomposition in expectation like

$$\displaystyle{ \mathbb{E}[f(t,x + W_{t})] - f(0,x) =\int _{ 0}^{t}\mathbb{E}[f_{ t}^{{\prime}}(s,x + W_{ s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,x + W_{ s})]\mathrm{d}s }$$

(see (17)) to a pathwise infinitesimal time-decomposition of

$$\displaystyle{f(t,x + W_{t}) - f(0,x).}$$

Since Brownian motion paths are not differentiable, it is hopeless to apply standard differential calculus based on the usual first-order Taylor formula. Instead, we go up to the second order, taking advantage of the fact that W has a finite quadratic variation. The approach presented below is taken from the nice paper Calcul d’Itô sans probabilité by Föllmer [19]; it does not lead to the most general and deepest theory, but it has the advantage of light technicalities and straightforward arguments, compared to the usual tougher arguments using \(L^{2}\)-spaces and isometry (see for instance [48] or [63] among others).

3.1 Quadratic Variation

3.1.1 Notations and Definitions

Brownian increments over a small interval [t, t + h] are centered Gaussian r.v. with variance h, and thus behave like \(\sqrt{h}\). The total variation of the paths is a.s. infinite (a function of finite variation is differentiable almost everywhere, while Brownian paths are nowhere differentiable), but the quadratic variation has interesting properties.

To avoid convergence technicalities, we consider particular time subdivisions.

Definition 7 (Dyadic Subdivision of Order n).

Let n be an integer. The subdivision of \(\mathbb{R}^{+}\) defined by \(\mathbb{D}_{n} =\{ t_{0} < \cdots < t_{i} < \cdots \,\}\) where \(t_{i} = i2^{-n}\) is called the dyadic subdivision of order n. The subdivision step is \(\delta _{n} = 2^{-n}.\)

Definition 8 (Quadratic Variation).

The quadratic variation of a Brownian motion W associated with the dyadic subdivision of order n is defined, for t ≥ 0, by

$$\displaystyle{ V _{t}^{n} =\sum _{ t_{i}\leq t}(W_{t_{i+1}} - W_{t_{i}})^{2}. }$$
(26)

3.1.2 Convergence

Then there is the following remarkable result.

Proposition 14 (Pointwise Convergence).

With probability 1, we have

$$\displaystyle{\lim _{n\rightarrow \infty }V _{t}^{n} = t}$$

for any \(t \in \mathbb{R}^{+}\) .

Had W been differentiable, the limit of \(V _{t}^{n}\) would have been equal to 0.

Proof.

First let us show the a.s. convergence for a fixed time t, and denote by n(t) the index of the dyadic subdivision of order n such that \(t_{n(t)} \leq t < t_{n(t)+1}.\) Then observe that \(V _{t}^{n} - t =\sum _{ j=0}^{n(t)}Z_{j} + (t_{n(t)+1} - t)\) where \(Z_{j} = (W_{t_{j+1}} - W_{t_{j}})^{2} - (t_{j+1} - t_{j})\). The term \(t_{n(t)+1} - t\) converges to 0 as the subdivision step shrinks to 0. The random variables Z j are independent, centered, square integrable (since the Gaussian law of \(W_{t_{j+1}} - W_{t_{j}}\) has finite fourth moments): additionally, the scaling property of Proposition 1 ensures that \(\mathbb{E}(Z_{j}^{2}) = C_{2}(t_{j+1} - t_{j})^{2}\) for a positive constant C 2. Thus

$$\displaystyle{\mathbb{E}\left (\sum _{j=0}^{n(t)}Z_{ j}\right )^{2} =\sum _{ j=0}^{n(t)}\mathbb{E}\left (Z_{ j}^{2}\right ) =\sum _{ j=0}^{n(t)}C_{ 2}(t_{j+1} - t_{j})^{2} \leq C_{ 2}(T + 1)\delta _{n}.}$$

This proves the L 2-convergence of \(\sum _{j=0}^{n(t)}Z_{j}\) towards 0.

Moreover we obtain \(\sum _{n\geq 1}\mathbb{E}\left (\sum _{j=0}^{n(t)}Z_{j}\right )^{2} < \infty \), i.e. the random series \(\sum _{n\geq 1}\left (\sum _{j=0}^{n(t)}Z_{j}\right )^{2}\) has a finite expectation; hence it is a.s. finite, and consequently its general term converges a.s. to 0. This shows that for any fixed t, \(V _{t}^{n} \rightarrow t\) except on a negligible set \(N_{t}\).

We now extend the result to all times simultaneously: first, the set \(N = \cup _{t\in \mathbb{Q}^{+}}N_{t}\) is still negligible, as a countable union of negligible sets. For an arbitrary t, take two monotone sequences of rational numbers r p ↑ t and s p ↓ t as p → +∞. Since \(t\mapsto V _{t}^{n}\) is increasing for fixed n, we deduce, for any \(\omega \notin N\),

$$\displaystyle{r_{p} =\lim _{n\rightarrow \infty }V _{r_{p}}^{n}(\omega ) \leq \liminf _{ n\rightarrow \infty }V _{t}^{n}(\omega ) \leq \limsup _{ n\rightarrow \infty }V _{t}^{n}(\omega ) \leq \lim _{ n\rightarrow \infty }V _{s_{p}}^{n}(\omega ) = s_{ p}.}$$

Passing to the limit in p gives the result. □ 
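Proposition 14 is easy to visualize by simulation: sampling W on the dyadic grid of order n, the sum of squared increments concentrates around t. A minimal Python sketch (an illustration with ad hoc parameters, not part of the original text):

```python
# Simulate V_t^n of Definition 8 on the dyadic grid and compare with t.
import numpy as np

def dyadic_quadratic_variation(t=1.0, n=18, seed=1):
    """V_t^n along the dyadic subdivision of order n, for one simulated path."""
    rng = np.random.default_rng(seed)
    h = 2.0 ** (-n)                                   # subdivision step 2^{-n}
    increments = np.sqrt(h) * rng.standard_normal(int(t / h))  # W_{t_{i+1}} - W_{t_i}
    return np.sum(increments ** 2)

v = dyadic_quadratic_variation()
assert abs(v - 1.0) < 2e-2
```

The standard deviation of \(V_t^n - t\) is of order \(\sqrt{2\delta_n t}\), so the tolerance above leaves a comfortable margin.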

As a consequence, we obtain the formula giving the infinitesimal decomposition of W t 2.

Proposition 15 (A First Itô Formula).

Let W be a standard Brownian motion. With probability 1, we have for any t ≥ 0

$$\displaystyle{ W_{t}^{2} = 2\int _{ 0}^{t}W_{ s}\mathrm{d}W_{s} + t }$$
(27)

where the stochastic integral \(\int _{0}^{t}W_{s}\mathit{dW }_{s}\) is the a.s. limit of \(\sum _{t_{i}\leq t}W_{t_{i}}(W_{t_{i+1}} - W_{t_{i}})\) , along the dyadic subdivision.

For a usual \(\mathcal{C}^{1}\)-function f(t), we have \(f^{2}(t) - f^{2}(0) = 2\int _{0}^{t}f(s)\mathrm{d}f(s)\): the extra term t in (27) is intrinsically related to Brownian motion paths.

Proof.

Adopting once again the notation with n(t), we have

$$\displaystyle\begin{array}{rcl} W_{t}^{2}& =& W_{ t}^{2} - W_{ t_{n(t)+1}}^{2} +\sum _{ t_{i}\leq t}(W_{t_{i+1}}^{2} - W_{ t_{i}}^{2}) {}\\ & =& W_{t}^{2} - W_{ t_{n(t)+1}}^{2} +\sum _{ t_{i}\leq t}(W_{t_{i+1}} - W_{t_{i}})^{2} + 2\sum _{ t_{i}\leq t}W_{t_{i}}(W_{t_{i+1}} - W_{t_{i}}). {}\\ \end{array}$$

The first term at the r.h.s. tends towards 0 by continuity of the Brownian paths. The second term is equal to V t n and converges towards t. Consequently, the third term at the right-hand side must converge a.s. towards a term that we call stochastic integral and that we denote by \(2\int _{0}^{t}W_{s}\mathrm{d}W_{s}\). □ 
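Formula (27) can be checked on a simulated path: the left-point Riemann sums defining the stochastic integral reproduce \((W_{t}^{2} - t)/2\) up to the fluctuation of \(V_t^n\) around t. A Python sketch (illustrative, with ad hoc grid size):

```python
# Check W_t^2 = 2 * sum_i W_{t_i}(W_{t_{i+1}} - W_{t_i}) + t approximately,
# as in (27), on a single simulated Brownian path.
import numpy as np

rng = np.random.default_rng(2)
t, n = 1.0, 2 ** 18
h = t / n
W = np.concatenate(([0.0], np.cumsum(np.sqrt(h) * rng.standard_normal(n))))
ito_sum = np.sum(W[:-1] * np.diff(W))   # non-anticipative (left-point) sums
assert abs(W[-1] ** 2 - (2.0 * ito_sum + t)) < 2e-2
```

The residual is exactly \(V_t^n - t\) for this path, which vanishes as the step is refined.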

The random function \(V _{t}^{n}\), as a function of t, is increasing and can be viewed as the cumulative distribution function of the positive discrete measure

$$\displaystyle{\sum _{i\geq 0}(W_{t_{i+1}} - W_{t_{i}})^{2}\delta _{ t_{i}}(.) =\mu ^{n}(.)}$$

satisfying \(\mu ^{n}(f) =\sum _{i\geq 0}f(t_{i})(W_{t_{i+1}} - W_{t_{i}})^{2}\).

The convergence of the cumulative distribution functions of \(\mu ^{n}(.)\) (Proposition 14) can then be extended to integrals of continuous functions (possibly random as well). This is the purpose of the following result, which is of a deterministic nature.

Proposition 16 (Convergence as a Positive Measure).

For any continuous function f, with probability 1 we have

$$\displaystyle{\lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f(t_{i})(W_{t_{i+1}} - W_{t_{i}})^{2} =\int _{ 0}^{t}f(s)\mathrm{d}s}$$

for any t ≥ 0.

The proof is standard: the result first holds for functions of the form \(f(s) = \mathbf{1}_{]r_{1},r_{2}]}(s)\), then for piecewise constant functions, and finally for continuous functions by simple approximations.

3.2 The Itô Formula for Brownian Motion

Differential calculus extends to functions other than x ↦ x 2 . Compared with the usual classical formula for smooth functions, a term must be added, due to the non-zero quadratic variation.

Theorem 6 (Itô Formula).

Let \(f \in \mathcal{C}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}, \mathbb{R})\) . Then with probability 1, we have, for any t ≥ 0,

$$\displaystyle\begin{array}{rcl} f(t,x + W_{t})& =& f(0,x) +\int _{ 0}^{t}f_{ x}^{{\prime}}(s,x + W_{ s})\,\mathrm{d}W_{s} \\ & & +\int _{0}^{t}f_{ t}^{{\prime}}(s,x + W_{ s})\,\mathrm{d}s + \frac{1} {2}\int _{0}^{t}f_{\mathit{ xx}}^{{\prime\prime}}(s,x + W_{ s})\,\mathrm{d}s.{}\end{array}$$
(28)

The term \(\mathcal{I}_{t}(f) =\int _{ 0}^{t}f_{x}^{{\prime}}(s,x + W_{s})\mathrm{d}W_{s}\) is called the stochastic integral of f x (s,x + W s ) w.r.t. W and it is the a.s. limit of

$$\displaystyle{\mathcal{I}_{t}^{n}(f,W) =\sum _{ t_{i}\leq t}f_{x}^{{\prime}}(t_{ i},x + W_{t_{i}})(W_{t_{i+1}} - W_{t_{i}})}$$

taken along the dyadic subdivision of order n.

The reader should compare the equality (28) with (17) to see that, under the extra assumption that f is bounded with bounded derivatives, we have proved that the stochastic integral \(\mathcal{I}_{t}(f)\) is centered:

$$\displaystyle{ \mathbb{E}(\int _{0}^{t}f_{ x}^{{\prime}}(s,x + W_{ s})\,\mathrm{d}W_{s}) = 0. }$$
(29)

This explains how we can expect to go from (28) to (17):

  1.

    Apply Itô formula.

  2.

    Take expectation.

  3.

    Prove that the stochastic integral is centered.

This gives an interesting alternative to the proof relying on the properties of the Gaussian kernel, which is difficult to extend to more general (non-Gaussian) processes.

Proof.

As before, let us introduce the index n(t) such that t n(t) ≤ t < t n(t)+1; then we can write

$$\displaystyle\begin{array}{rcl} f(t&,& x + W_{t}) = f(0,x) + [f(t,x + W_{t}) - f(t_{n(t)+1},x + W_{t_{n(t)+1}})] {}\\ & & +\sum _{t_{i}\leq t}[f(t_{i+1},x + W_{t_{i+1}}) - f(t_{i},x + W_{t_{i+1}})] {}\\ & & +\sum _{t_{i}\leq t}[f(t_{i},x + W_{t_{i+1}}) - f(t_{i},x + W_{t_{i}})]. {}\\ \end{array}$$
  • The second term of the r.h.s. \([f(t,x + W_{t}) - f(t_{n(t)+1},x + W_{t_{n(t)+1}})]\) converges to 0 by continuity of f(t, x + W t ).

  • The third term is analyzed by means of the first order Taylor formula:

    $$\displaystyle{f(t_{i+1},x + W_{t_{i+1}}) - f(t_{i},x + W_{t_{i+1}}) = f_{t}^{{\prime}}(\tau _{ i},x + W_{t_{i+1}})(t_{i+1} - t_{i})}$$

    for \(\tau _{i} \in ]t_{i},t_{i+1}[\). The uniform continuity of \(f_{t}^{{\prime}}\) on compact sets and of \((W_{s})_{0\leq s\leq t+1}\) ensures that \(\sup _{i}\vert f_{t}^{{\prime}}(\tau _{i},x + W_{t_{i+1}}) - f_{t}^{{\prime}}(t_{i},x + W_{t_{i}})\vert \rightarrow 0\): thus \(\lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f_{t}^{{\prime}}(\tau _{i},x + W_{t_{i+1}})(t_{i+1} - t_{i})\) equals

    $$\displaystyle\begin{array}{rcl} \lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f_{t}^{{\prime}}(t_{ i},x + W_{t_{i}})(t_{i+1} - t_{i}) =\int _{ 0}^{t}f_{ t}^{{\prime}}(s,x + W_{ s})\mathrm{d}s.& & {}\\ \end{array}$$
  • A second-order Taylor formula allows us to write the fourth term: \(f(t_{i},x + W_{t_{i+1}}) - f(t_{i},x + W_{t_{i}})\) equals

    $$\displaystyle\begin{array}{rcl} f_{x}^{{\prime}}(t_{ i},x + W_{t_{i}})(W_{t_{i+1}} - W_{t_{i}}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x +\xi _{i})(W_{t_{i+1}} - W_{t_{i}})^{2}& & {}\\ \end{array}$$

    where \(\xi _{i} \in (W_{t_{i}},W_{t_{i+1}})\). As before, \(\sup _{i}\vert f_{\mathit{xx}}^{{\prime\prime}}(t_{i},x +\xi _{i}) - f_{\mathit{xx}}^{{\prime\prime}}(t_{i},x + W_{t_{i}})\vert =\epsilon _{n} \rightarrow 0\), which leads to

    $$\displaystyle\begin{array}{rcl} \Big\vert \sum _{t_{i}\leq t}(f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x +\xi _{i}) - f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x + W_{t_{i}}))(W_{t_{i+1}} - W_{t_{i}})^{2}\Big\vert \leq \epsilon _{ n}V _{t}^{n},& & {}\\ \lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x + W_{t_{i}})(W_{t_{i+1}} - W_{t_{i}})^{2} =\int _{ 0}^{t}f_{\mathit{ xx}}^{{\prime\prime}}(s,x + W_{ s})\mathrm{d}s,& & {}\\ \end{array}$$

    by applying Proposition 16.

Observe that in spite of the non-differentiability of W, \(\sum _{t_{i}\leq t}f_{x}^{{\prime}}(t_{ i},x + W_{t_{i}})(W_{t_{i+1}} - W_{t_{i}})\) is necessarily convergent as a difference of convergent terms. □ 
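The proof can be replayed numerically: for a smooth test function, the discretized right-hand side of (28) matches the left-hand side on a single simulated path. Here is a Python sketch with f(t, x) = e^{−t} sin x, chosen so that \(f_t^{\prime} = f_{xx}^{\prime\prime} = -f\) (an illustration with ad hoc parameters, not part of the original text):

```python
# Pathwise check of the Ito formula (28) for f(t, x) = exp(-t) sin(x).
import numpy as np

rng = np.random.default_rng(3)
t, n, x = 1.0, 2 ** 17, 0.3
h = t / n
W = np.concatenate(([0.0], np.cumsum(np.sqrt(h) * rng.standard_normal(n))))
s = np.linspace(0.0, t, n + 1)

f = lambda u, y: np.exp(-u) * np.sin(y)    # f_t' = -f and f_xx'' = -f
fx = lambda u, y: np.exp(-u) * np.cos(y)   # f_x'

lhs = f(t, x + W[-1]) - f(0.0, x)
stoch = np.sum(fx(s[:-1], x + W[:-1]) * np.diff(W))    # dW_s term (left point)
ds_terms = np.sum(-1.5 * f(s[:-1], x + W[:-1]) * h)    # f_t' + (1/2) f_xx'' = -1.5 f
assert abs(lhs - stoch - ds_terms) < 2e-2
```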

Interestingly, we obtain a representation of the random variable f(x + W t ) as a stochastic integral, in terms of the derivatives of the solution u to the heat equation

$$\displaystyle{ u_{t}^{{\prime}}(t,x) = \frac{1} {2}u_{\mathit{xx}}^{{\prime\prime}}(t,x),\quad u(0,x) = f(x). }$$

Corollary 4.

Assume that \(u \in \mathcal{C}_{b}^{1,2}([0,T] \times \mathbb{R})\) . We have

$$\displaystyle{ f(x + W_{T}) = u(T,x) +\int _{ 0}^{T}u_{ x}^{{\prime}}(T - s,x + W_{ s})\mathrm{d}W_{s}. }$$
(30)

Proof.

Apply the Itô formula to \(v(t,x) = u(T - t,x)\) (which satisfies \(v_{t}^{{\prime}}(t,x) + \frac{1} {2}v_{\mathit{xx}}^{{\prime\prime}}(t,x) = 0\)) at time T. This gives \(f(x + W_{T}) = u(0,x + W_{T}) = u(T,x) +\int _{ 0}^{T}u_{x}^{{\prime}}(T - s,x + W_{s})\mathrm{d}W_{s}\). □ 

This representation formula leads to important remarks.

  • If the above stochastic integral has zero expectation (as for the examples presented before), taking the expectation shows that

    $$\displaystyle{u(T,x) = \mathbb{E}(f(x + W_{T})),}$$

    recovering the Feynman–Kac representation of Theorem 2.

  • Then, the above representation writes, setting \(\varPsi = f(x + W_{T})\),

    $$\displaystyle{\varPsi = \mathbb{E}(\varPsi ) +\int _{ 0}^{T}h_{ s}\mathrm{d}W_{s}.}$$

    Actually, a similar stochastic integral representation theorem holds in much greater generality on the form of Ψ, since any boundedFootnote 15 functional of (W t )0 ≤ t ≤ T can be represented as its expectation plus a stochastic integral: the process h is not tractable in general, whereas here it is explicitly related to the derivative of u along the Brownian path.

  • Assuming \(u \in \mathcal{C}_{b}^{1,2}([0,T] \times \mathbb{R})\) imposes that \(f \in \mathcal{C}_{b}^{2}(\mathbb{R})\), which is too strong for many applications: however, the assumptions on u can be relaxed to handle bounded measurable functions f, because the heat equation immediately smooths out the initial condition. The proof of this extension involves extra stochastic calculus technicalities that we do not develop.
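For f(x) = x², the heat equation solution is explicit, u(t, x) = x² + t (not bounded, so this is only a formal illustration of (30)), and \(u_x^{\prime}(T - s, x + W_s) = 2(x + W_s)\). The representation then reads \((x+W_T)^2 = x^2 + T + \int_0^T 2(x+W_s)\,\mathrm{d}W_s\), which a simulated path confirms (a Python sketch added here, with ad hoc parameters):

```python
# Pathwise check of the representation (30) in the explicit case f(x) = x^2,
# where u(t, x) = x^2 + t and u_x'(T - s, y) = 2 y.
import numpy as np

rng = np.random.default_rng(4)
T, n, x = 1.0, 2 ** 17, 0.5
h = T / n
W = np.concatenate(([0.0], np.cumsum(np.sqrt(h) * rng.standard_normal(n))))
stoch = np.sum(2.0 * (x + W[:-1]) * np.diff(W))   # left-point sums for the dW term
assert abs((x + W[-1]) ** 2 - (x ** 2 + T) - stoch) < 2e-2
```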

3.3 Wiener Integral

In general, it is not possible to make the law of the stochastic integral \(\int _{0}^{t}f_{x}^{{\prime}}(s,x + W_{s})\mathrm{d}W_{s}\) explicit, except in the situation where \(f_{x}^{{\prime}}(s,x) = h(s)\) is independent of x and square integrable. In that case, \(\int _{0}^{t}h(s)\mathrm{d}W_{s}\) is distributed as a Gaussian r.v. The resulting stochastic integral is called the Wiener integral. We sum up its important properties.

Proposition 17 (Wiener Integral and Integration by Parts).

Let \(f: [0,T]\mapsto \mathbb{R}\) be a continuously differentiable function, with bounded derivatives on [0,T].

  1.

    With probability 1, for any t ∈ [0,T] we have

    $$\displaystyle{ \int _{0}^{t}f(s)\mathrm{d}W_{ s} = f(t)W_{t} -\int _{0}^{t}W_{ s}f^{{\prime}}(s)\mathrm{d}s. }$$
    (31)
  2.

    The process \(\{\int _{0}^{t}f(s)\mathrm{d}W_{s};t \in [0,T]\}\) is a continuous Gaussian process, with zero mean and with a covariance function

    $$\displaystyle{ \mathbb{C}\mathrm{ov}(\int _{0}^{t}f(u)\mathrm{d}W_{ u},\int _{0}^{s}f(u)\mathrm{d}W_{ u}) =\int _{ 0}^{s\wedge t}f^{2}(u)\mathrm{d}u. }$$
    (32)
  3.

    For another function g satisfying the same assumptions, we have

    $$\displaystyle{ \mathbb{C}\mathrm{ov}(\int _{0}^{t}f(u)\mathrm{d}W_{ u},\int _{0}^{s}g(u)\mathrm{d}W_{ u}) =\int _{ 0}^{s\wedge t}f(u)g(u)\mathrm{d}u. }$$
    (33)

Proof.

The first item is a direct application of Theorem 6 to the function (t, x)↦f(t)x.

For any coefficients (α i )1 ≤ i ≤ N and times (T i )1 ≤ i ≤ N , \(\sum _{i=1}^{N}\alpha _{i}\int _{0}^{T_{i}}f(u)\mathrm{d}W_{u}\) is a Gaussian r.v. since it can be written as a limit of Gaussian r.v. of the form \(\sum _{j}\beta _{j}(W_{t_{j+1}} - W_{t_{j}})\): thus, \(\{\int _{0}^{t}f(s)\mathrm{d}W_{s};t \in [0,T]\}\) is a Gaussian process. Its continuity is obvious in view of (31). Its expectation is the limit of the expectation of \(\sum _{t_{i}\leq t}f(t_{i})[W_{t_{i+1}} - W_{t_{i}}]\), thus equal to 0. The covariance is the limit of the covariance

$$\displaystyle\begin{array}{rcl} & & \mathbb{C}\mathrm{ov}(\sum _{t_{i}\leq t}f(t_{i})[W_{t_{i+1}} - W_{t_{i}}],\sum _{t_{j}\leq s}f(t_{j})[W_{t_{j+1}} - W_{t_{j}}]) {}\\ & =& \sum _{t_{i}\leq t,t_{j}\leq s}f(t_{i})f(t_{j})\mathbb{C}\mathrm{ov}(W_{t_{i+1}} - W_{t_{i}},W_{t_{j+1}} - W_{t_{j}}) {}\\ & =& \sum _{t_{i}\leq t,t_{j}\leq s}f(t_{i})f(t_{j})\delta _{i,j}(t_{i+1} - t_{i})\mathop{\longrightarrow }\limits_{n \rightarrow +\infty }\int _{0}^{s\wedge t}f^{2}(u)\mathrm{d}u. {}\\ \end{array}$$

The second item is proved. The last item is proved similarly. □ 

As a consequence, going back to the Ornstein–Uhlenbeck process (Sect. 1.6.2), we can complete the proof of its representation (11) using a stochastic integral, starting from (10). For this, apply the integration by parts formula (31) to the function \(f(s) = e^{-a(t-s)}\) (t fixed): it gives \(\int _{0}^{t}e^{-a(t-s)}\mathrm{d}W_{s} = W_{t} - a\int _{0}^{t}e^{-a(t-s)}W_{s}\mathrm{d}s\). It leads to

$$\displaystyle{ V _{t} = v_{0}e^{-\mathit{at}} +\sigma \int _{ 0}^{t}e^{-a(t-s)}\mathrm{d}W_{ s}. }$$
(34)

Then the Gaussian property from Proposition 17 gives that the variance of V t is equal to \(\sigma ^{2}\int _{0}^{t}e^{-2a(t-s)}\mathrm{d}s = \frac{\sigma ^{2}} {2a}(1 - e^{-2\mathit{at}})\).
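This variance formula can be confirmed by simulating the Wiener integral in (34) with left-point sums (a Python sketch added for illustration; grid and sample sizes are ad hoc):

```python
# Monte Carlo check of Var(V_t) = sigma^2 (1 - exp(-2 a t)) / (2 a) for the
# Ornstein-Uhlenbeck representation (34), approximating the Wiener integral
# by left-point Riemann sums against Brownian increments.
import numpy as np

rng = np.random.default_rng(5)
v0, a, sigma, t = 1.0, 2.0, 0.5, 1.0
n, n_paths = 500, 10_000
h = t / n
s = np.linspace(0.0, t, n + 1)[:-1]                  # left endpoints
dW = np.sqrt(h) * rng.standard_normal((n_paths, n))
wiener = (np.exp(-a * (t - s)) * dW).sum(axis=1)     # approx. of the Wiener integral
V_t = v0 * np.exp(-a * t) + sigma * wiener
target = sigma ** 2 * (1.0 - np.exp(-2.0 * a * t)) / (2.0 * a)
assert abs(V_t.var() - target) < 5e-3
```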

3.4 Itô Formula for Other Processes

The reader should have noticed that the central property for the proof of Theorem 6 is that the Brownian motion has a finite quadratic variation. Thus, the Itô formula can directly be extended to processes X which enjoy the same property.

3.4.1 The One-Dimensional Case

In this paragraph, we first consider scalar processes. The multidimensional extension is made afterwards.

Definition 9 (Quadratic Variation of a Process).

A continuous process X has a finite quadratic variation if for any t ≥ 0, the limit

$$\displaystyle{ V _{t}^{n} =\sum _{ t_{i}\leq t}(X_{t_{i+1}} - X_{t_{i}})^{2} }$$
(35)

along the dyadic subdivision of order n, exists a.s. and is finite. We denote this limit by \(\langle X\rangle _{t}\) and it is usually called the bracket of X at time t.

If X = W is a Brownian motion, we have \(\langle X\rangle _{t} = t\). More generally, it is easy to check that \(\langle X\rangle\) is increasing and continuous. We associate a positive measure to it, which extends Proposition 16 to X.

Proposition 18.

For any continuous function f, with probability 1 for any t ≥ 0 we have

$$\displaystyle{\lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f(t_{i})(X_{t_{i+1}} - X_{t_{i}})^{2} =\int _{ 0}^{t}f(s)\mathrm{d}\langle X\rangle _{ s}.}$$

Theorem 6 becomes

Theorem 7 (Itô Formula for X).

Let \(f \in \mathcal{C}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}, \mathbb{R})\) and X be with finite quadratic variation. With probability 1, for any t ≥ 0 we have

$$\displaystyle\begin{array}{rcl} f(t,X_{t})& =& f(0,X_{0}) +\int _{ 0}^{t}f_{ x}^{{\prime}}(s,X_{ s})\,\mathrm{d}X_{s} +\int _{ 0}^{t}f_{ t}^{{\prime}}(s,X_{ s})\,\mathrm{d}s \\ & & +\frac{1} {2}\int _{0}^{t}f_{\mathit{ xx}}^{{\prime\prime}}(s,X_{ s})\,\mathrm{d}\langle X\rangle _{s}, {}\end{array}$$
(36)

where \(\int _{0}^{t}f_{x}^{{\prime}}(s,X_{s})\mathrm{d}X_{s}\) is the stochastic integral of \(f_{x}^{{\prime}}(s,X_{s})\) w.r.t. X and it is the a.s. limit of \(\sum _{t_{i}\leq t}f_{x}^{{\prime}}(t_{i},X_{t_{i}})(X_{t_{i+1}} - X_{t_{i}})\) along the dyadic subdivision of order n.

Often, the Itô formula is written formally under a differential form

$$\displaystyle{\mathrm{d}f(t,X_{t}) = f_{x}^{{\prime}}(t,X_{ t})\,\mathrm{d}X_{t} + f_{t}^{{\prime}}(t,X_{ t})\,\mathrm{d}t + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(t,X_{ t})\,\mathrm{d}\langle X\rangle _{t}.}$$

We now provide hand-made tools to compute the bracket of X in practice.

Proposition 19 (Computation of the Bracket).

Let A and M be two continuous processes such that A has a finite variation Footnote 16 and M has a finite quadratic variation:

  1.

    \(\langle A\rangle _{t} = 0\) .

  2.

    If \(X_{t} = x + M_{t}\) , then \(\langle X\rangle _{t} =\langle M\rangle _{t}\) .

  3.

    If X t = λM t , then \(\langle X\rangle _{t} =\lambda ^{2}\langle M\rangle _{t}\) .

  4.

    If \(X_{t} = M_{t} + A_{t}\) , then \(\langle X\rangle _{t} =\langle M\rangle _{t}\) .

  5.

    If X t = f(A t ,M t ) with f ∈ C 1 , then \(\langle X\rangle _{t} =\int _{ 0}^{t}[f_{m}^{{\prime}}(A_{s},M_{s})]^{2}\mathrm{d}\langle M\rangle _{s}\) .

The proof is easy and uses deterministic arguments based on the definition of \(\langle X\rangle\); we skip it. Item (5) shows that the class of processes with finite quadratic variation is stable under smooth composition. The following examples are important.

Example 1 (Arithmetic Brownian Motion).

(\(X_{t} = x + bt +\sigma W_{t}\)): we have

$$\displaystyle{\langle X\rangle _{t} =\langle \sigma W\rangle _{t} =\sigma ^{2}\langle W\rangle _{ t} =\sigma ^{2}t.}$$

Itô’s formula becomes

$$\displaystyle\begin{array}{rcl} \mathrm{d}f(t,X_{t})& =& (f_{t}^{{\prime}}(t,X_{ t}) + f_{x}^{{\prime}}(t,X_{ t})b + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(t,X_{ t})\sigma ^{2})\mathrm{d}t + f_{ x}^{{\prime}}(t,X_{ t})\sigma \mathrm{d}W_{t} \\ &:=& (f_{t}^{{\prime}}(t,X_{ t}) + L_{b,\sigma ^{2}}^{\mathtt{ABM}}f(t,X_{ t}))\mathrm{d}t + f_{x}^{{\prime}}(t,X_{ t})\sigma \mathrm{d}W_{t}. {}\end{array}$$
(37)

An important example is associated to f(x) = exp(x):

$$\displaystyle{ \mathrm{d}[\exp (X_{t})] =\exp (X_{t})(b + \frac{1} {2}\sigma ^{2})\mathrm{d}t +\exp (X_{ t})\sigma \mathrm{d}W_{t}. }$$
(38)

Example 2 (Geometric Brownian Motion).

(\(S_{t} = S_{0}e^{(\mu -\frac{1} {2} \sigma ^{2})t+\sigma W_{ t}}\)): we have

$$\displaystyle{\langle S\rangle _{t} =\int _{ 0}^{t}\sigma ^{2}S_{ s}^{2}\mathrm{d}s.}$$

From (38), we obtain a linear equation for the dynamics of S,

$$\displaystyle{\mathrm{d}S_{t} = S_{t}\mu \mathrm{d}t + S_{t}\sigma \mathrm{d}W_{t}}$$

also written \(\frac{\mathrm{d}S_{t}} {S_{t}} =\mu \mathrm{d}t +\sigma \mathrm{d}W_{t}\), with an emphasis on the financial interpretation as returns. The Itô formula writes

$$\displaystyle\begin{array}{rcl} \mathrm{d}f(t,S_{t})& =& (f_{t}^{{\prime}}(t,S_{ t}) + f_{x}^{{\prime}}(t,S_{ t})S_{t}\mu + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(t,S_{ t})S_{t}^{2}\sigma ^{2})\mathrm{d}t + f_{ x}^{{\prime}}(t,S_{ t})\sigma S_{t}\mathrm{d}W_{t} \\ &:=& (f_{t}^{{\prime}}(t,S_{ t}) + L_{\mu,\sigma ^{2}}^{\mathtt{GBM}}f(t,S_{ t}))\mathrm{d}t + f_{x}^{{\prime}}(t,S_{ t})\sigma S_{t}\mathrm{d}W_{t}. {}\end{array}$$
(39)

Example 3 (Ornstein–Uhlenbeck Process).

(\(V _{t} = v_{0} - a\int _{0}^{t}V _{s}\mathrm{d}s +\sigma W_{t}\)): we have

$$\displaystyle{\langle V \rangle _{t} =\sigma ^{2}t.}$$

The Itô formula follows

$$\displaystyle\begin{array}{rcl} \mathrm{d}f(t,V _{t})& =& (f_{t}^{{\prime}}(t,V _{ t}) - af_{x}^{{\prime}}(t,V _{ t})V _{t} + \frac{1} {2}\sigma ^{2}f_{\mathit{ xx}}^{{\prime\prime}}(t,V _{ t}))\mathrm{d}t + f_{x}^{{\prime}}(t,V _{ t})\sigma \mathrm{d}W_{t} \\ &:=& (f_{t}^{{\prime}}(t,V _{ t}) + L_{a,\sigma ^{2}}^{\mathtt{OU}}f(t,V _{ t}))\mathrm{d}t + f_{x}^{{\prime}}(t,V _{ t})\sigma \mathrm{d}W_{t}. {}\end{array}$$
(40)

Example 4 (Euler Scheme Defined in (12)).

(\(X_{t}^{h} = X_{\mathit{ih}}^{h} + b(X_{\mathit{ih}}^{h})(t - ih) +\sigma (X_{\mathit{ih}}^{h})(W_{t} - W_{\mathit{ih}})\) for i ≥ 0, t ∈ (ih, (i + 1)h]). Since X h is an arithmetic Brownian motion on each interval (ih, (i + 1)h], we easily obtain

$$\displaystyle{\langle X^{h}\rangle _{ t} =\int _{ 0}^{t}\sigma ^{2}(X_{\varphi (s)}^{h})\mathrm{d}s}$$

where \(\varphi (t) = \mathit{ih}\) for t ∈ (ih, (i + 1)h]. The Itô formula writes

$$\displaystyle\begin{array}{rcl} \mathrm{d}f(t,X_{t}^{h})& =& (f_{ t}^{{\prime}}(t,X_{ t}^{h}) + b(X_{\varphi (t)}^{h})f_{ x}^{{\prime}}(t,X_{ t}^{h}) + \frac{1} {2}\sigma ^{2}(X_{\varphi (t)}^{h})f_{\mathit{ xx}}^{{\prime\prime}}(t,X_{ t}^{h}))\mathrm{d}t \\ & & +f_{x}^{{\prime}}(t,X_{ t}^{h})\sigma (X_{\varphi (t)}^{h})\mathrm{d}W_{ t}. {}\end{array}$$
(41)
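Since the Euler scheme coincides with an arithmetic Brownian motion on each step, it can be compared pathwise with the exact geometric Brownian motion of Example 2 when both are driven by the same increments. A Python sketch (illustrative parameters and tolerance, not part of the original text):

```python
# Pathwise gap at time T between the Euler scheme of Example 4 applied to
# dS = mu S dt + sigma S dW and the exact geometric Brownian motion.
import numpy as np

def euler_vs_exact_gbm(mu=0.1, sigma=0.3, s0=1.0, T=1.0, n=2 ** 14, seed=6):
    """Return |Euler endpoint - exact endpoint| for one driving path."""
    rng = np.random.default_rng(seed)
    h = T / n
    dW = np.sqrt(h) * rng.standard_normal(n)
    x = s0
    for dw in dW:                      # Euler step: X += b(X) h + sigma(X) dW
        x += mu * x * h + sigma * x * dw
    exact = s0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * dW.sum())
    return abs(x - exact)

assert euler_vs_exact_gbm() < 0.05
```

Refining h shrinks this strong (pathwise) error, in line with the general theory of the Euler scheme.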

3.4.2 The Multidimensional Case

We briefly expose the situation when \(X = (X_{1},\ldots,X_{d})\) takes values in \(\mathbb{R}^{d}\). The main novelty consists in considering the cross quadratic variation, defined as the limit (assuming its existence, along the dyadic subdivisions) of

$$\displaystyle{ \langle X_{k},X_{l}\rangle _{t}^{n} =\sum _{ t_{i}\leq t}(X_{k,t_{i+1}} - X_{k,t_{i}})(X_{l,t_{i+1}} - X_{l,t_{i}})\mathop{\longrightarrow }\limits_{n \rightarrow +\infty }\langle X_{k},X_{l}\rangle _{t}. }$$
(42)

We list basic properties.

Properties 8

  1.

    Symmetry: \(\langle X_{k},X_{l}\rangle _{t} =\langle X_{l},X_{k}\rangle _{t}\).

  2.

    Usual bracket: \(\langle X_{k},X_{k}\rangle _{t} =\langle X_{k}\rangle _{t}\).

  3.

    Polarization: \(\langle X_{k},X_{l}\rangle _{t} = \frac{1} {4}\left (\langle X_{k} + X_{l}\rangle _{t} -\langle X_{k} - X_{l}\rangle _{t}\right ).\)

  4.

    \(\langle \cdot,\cdot \rangle _{t}\) is bilinear.

  5.

    For any continuous function f, we have

    $$\displaystyle{\lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f(t_{i})(X_{k,t_{i+1}} - X_{k,t_{i}})(X_{l,t_{i+1}} - X_{l,t_{i}}) =\int _{ 0}^{t}f(s)\mathrm{d}\langle X_{ k},X_{l}\rangle _{s}.}$$
  6.

    Let \(X_{1,t} = f(A_{1,t},M_{1,t})\) and X 2,t = g(A 2,t ,M 2,t ), where the variation (resp. quadratic variation) of A = (A 1 ,A 2 ) (resp. M = (M 1 ,M 2 )) is finite, and where f and g are two \(\mathcal{C}^{1}\) -functions: we have

    $$\displaystyle{\langle X_{1},X_{2}\rangle _{t} =\int _{ 0}^{t}f_{ m}^{{\prime}}(A_{ 1,s},M_{1,s})g_{m}^{{\prime}}(A_{ 2,s},M_{2,s})\mathrm{d}\langle M_{1},M_{2}\rangle _{s}.}$$

    In particular, \(\langle A_{1} + M_{1},A_{2} + M_{2}\rangle _{t} =\langle M_{1},M_{2}\rangle _{t}\) .

  7.

    Let W 1 and W 2 be two independent Brownian motions: then

    $$\displaystyle{\langle W_{1},W_{2}\rangle _{t} = 0.}$$

Proof.

The statements (1)–(6) are easy to check from the definition or using previous arguments. The statement (7) is important and we give details: use the polarization identity

$$\displaystyle{\langle W_{1},W_{2}\rangle _{t} = \frac{1} {4}\left (\langle W_{1} + W_{2}\rangle _{t} -\langle W_{1} - W_{2}\rangle _{t}\right ).}$$

We observe that both \(\frac{1} {\sqrt{2}}(W_{1} + W_{2})\) and \(\frac{1} {\sqrt{2}}(W_{1} - W_{2})\) are Brownian motions, since each one is a continuous Gaussian process with the right covariance function. Thus, \(\langle \frac{1} {\sqrt{2}}(W_{1} + W_{2})\rangle _{t} =\langle \frac{1} {\sqrt{2}}(W_{1} - W_{2})\rangle _{t} = t\) and the result follows. □ 
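Statement (7) is also easy to observe numerically: for two independent simulated Brownian motions, the discrete cross variation (42) is small, of order \(\sqrt{\delta _{n}t}\). A Python sketch (illustrative):

```python
# Discrete cross variation <W1, W2>_t^n of two independent Brownian motions,
# as in (42); it should be close to 0.
import numpy as np

rng = np.random.default_rng(7)
t, n = 1.0, 2 ** 18
h = t / n
dW1 = np.sqrt(h) * rng.standard_normal(n)
dW2 = np.sqrt(h) * rng.standard_normal(n)
cross = np.sum(dW1 * dW2)
assert abs(cross) < 1e-2
```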

The Itô formula naturally extends to this setting.

Theorem 9 (Multidimensional Itô Formula).

Let \(f \in \mathcal{C}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}^{d}, \mathbb{R})\) and X be a continuous d-dimensional process with finite quadratic variation. Then, with probability 1, for any t ≥ 0 we have

$$\displaystyle\begin{array}{rcl} f(t,X_{t})& =& f(0,X_{0}) +\sum _{ k=1}^{d}\int _{ 0}^{t}f_{ x_{k}}^{{\prime}}(s,X_{ s})\,\mathrm{d}X_{k,s} {}\\ & & +\int _{0}^{t}f_{ t}^{{\prime}}(s,X_{ s})\,\mathrm{d}s + \frac{1} {2}\sum _{k,l=1}^{d}\int _{ 0}^{t}f_{ x_{k},x_{l}}^{{\prime\prime}}(s,X_{ s})\,\mathrm{d}\langle X_{k},X_{l}\rangle _{s} {}\\ \end{array}$$

where the stochastic integrals are defined as before.

In particular, the integration by parts formula writes

$$\displaystyle\begin{array}{rcl} X_{1,t}X_{2,t} = X_{1,0}X_{2,0} +\int _{ 0}^{t}X_{ 1,s}\mathrm{d}X_{2,s} +\int _{ 0}^{t}X_{ 2,s}\mathrm{d}X_{1,s} +\langle X_{1},X_{2}\rangle _{t}.& & {}\\ \end{array}$$

For two independent Brownian motions, we recover the usual deterministic formula (because \(\langle W_{1},W_{2}\rangle _{t} = 0\)), but in general, formulas are different because of the quadratic variation.

3.5 More Properties on Stochastic Integrals

So far, we have defined some specific stochastic integrals, namely those appearing in the derivation of an Itô formula, which have the form

$$\displaystyle{ \int _{0}^{t}f_{ x}^{{\prime}}(s,X_{ s})\mathrm{d}X_{s} =\lim _{n\rightarrow +\infty }\sum _{t_{i}\leq t}f_{x}^{{\prime}}(t_{ i},X_{t_{i}})(X_{t_{i+1}} - X_{t_{i}}), }$$
(43)

the limit being taken along the dyadic subdivisions. Also, we have proved that if f has bounded derivatives and X = W is a Brownian motion, the above stochastic integral must have zero expectation [see equality (29)]. Moreover, we have also established that in the case of a deterministic integrand (Wiener integral), the second moment of the stochastic integral is explicit and given by

$$\displaystyle{\mathbb{E}(\int _{0}^{t}h_{ s}\mathrm{d}W_{s})^{2} =\int _{ 0}^{t}h_{ s}^{2}\mathrm{d}s.}$$

The aim of this paragraph is to extend the above properties of the first two moments to more general integrands, under suitable boundedness or integrability conditions.

3.5.1 Heuristic Arguments

In view of the previous construction, there is a natural candidate for the stochastic integral \(\int _{0}^{t}h_{s}\mathrm{d}W_{s}\). When h is a piecewise constant process (called a simple process), that is \(h_{s} = h_{t_{i}}\) if s ∈ [t i , t i+1 ] for a given deterministic time grid (t i ) i , we set

$$\displaystyle{ \int _{0}^{t}h_{ s}\mathrm{d}W_{s} =\sum _{t_{i}\leq t}h_{t_{i}}(W_{t\wedge t_{i+1}} - W_{t_{i}}). }$$
(44)

Without extra assumptions on the stochasticity of h, it is not clear why its expectation equals 0. This property should come from the centered Brownian increments \(W_{t\wedge t_{i+1}} - W_{t_{i}}\) and their independence from \(h_{t_{i}}\), so that

$$\displaystyle{\mathbb{E}(\int _{0}^{t}h_{ s}\mathrm{d}W_{s}) =\sum _{t_{i}\leq t}\mathbb{E}(h_{t_{i}})\mathbb{E}(W_{t\wedge t_{i+1}} - W_{t_{i}}) = 0.}$$

To validate this computation, we shall assume that h t depends only on the Brownian motion W before t and that it is integrable. To go to the second moment, assume additionally that h is square integrable: then

$$\displaystyle\begin{array}{rcl} & & \mathbb{E}\vert \int _{0}^{t}h_{ s}\mathrm{d}W_{s}\vert ^{2} \\ & =& 2\sum _{t_{i}<t_{j}\leq t}\mathbb{E}(h_{t_{i}}h_{t_{j}}(W_{t\wedge t_{i+1}} - W_{t_{i}}))\mathbb{E}(W_{t\wedge t_{j+1}} - W_{t_{j}}) +\sum _{t_{i}\leq t}\mathbb{E}(h_{t_{i}}^{2})\mathbb{E}\vert W_{ t\wedge t_{i+1}} - W_{t_{i}}\vert ^{2} \\ & =& \sum _{t_{i}\leq t}\mathbb{E}(h_{t_{i}}^{2})(t \wedge t_{ i+1} - t_{i}) = \mathbb{E}(\int _{0}^{t}h_{ s}^{2}\mathrm{d}s). {}\end{array}$$
(45)

This equality should be read as an isometry property (usually referred to as the Itô isometry), on which we can base an extension of the stochastic integral from simple processes to more general processes. At this point, we would need to enter into measurability considerations to describe what “h t depends only on the Brownian motion W before t” means at the most general level. It goes far beyond this introductory lecture: for an exposition of the general theory, see for instance [48] or [63].

For most of the examples considered in this lecture, we can restrict ourselves to very good integrands, in the sense that an integrand h is very good if

  1.

    (h t ) t is continuous or piecewise continuous (as for simple processes).

  2.

    For a given t, h t is a continuous functional of (W s : s ≤ t).

  3.

    It is square integrable in time and ω: \(\mathbb{E}(\int _{0}^{t}h_{s}^{2}\mathrm{d}s) < +\infty \) for any t.

This setting ensures that we can define stochastic integrals for very good integrands as the \(L^{2}\)-limit of stochastic integrals of simple integrands: indeed, a Cauchy sequence (h n ) n in \(L_{2}(\mathrm{d}t \otimes \mathrm{d}\mathbb{P})\) gives a Cauchy sequence \((\int _{0}^{t}h_{n,s}\mathrm{d}W_{s})_{n}\) in \(L_{2}(\mathbb{P})\), due to the isometry (45).
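The isometry can be tested with the very good integrand h_s = W_s, for which \(\mathbb{E}(\int_0^t W_s^2\,\mathrm{d}s) = \int_0^t s\,\mathrm{d}s = t^2/2\). A Python sketch with simple-process approximations (sample sizes and tolerance are ad hoc):

```python
# Monte Carlo check of the Ito isometry (45) with h_s = W_s:
# E |int_0^t W_s dW_s|^2 should match E int_0^t W_s^2 ds = t^2 / 2.
import numpy as np

rng = np.random.default_rng(8)
t, n, n_paths = 1.0, 512, 10_000
h = t / n
dW = np.sqrt(h) * rng.standard_normal((n_paths, n))
W_left = np.cumsum(dW, axis=1) - dW        # left endpoints W_{t_i}, starting at 0
integrals = np.sum(W_left * dW, axis=1)    # simple-process approximations
lhs = np.mean(integrals ** 2)              # E |int W dW|^2
rhs = t ** 2 / 2.0                         # E int W^2 ds
assert abs(lhs - rhs) < 0.08
```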

3.5.2 General Results

We collect here all the stochastic integration results needed in this lecture.

Theorem 10.

Let h be a very good integrand. Then the stochastic integral \(\int _{0}^{t}h_{s}\mathrm{d}W_{s}\) is such that

  1.

    It is the \(L^{2}\)-limit of \(\sum _{t_{i}\leq t}h_{t_{i}}(W_{t\wedge t_{i+1}} - W_{t_{i}})\) along time subdivisions whose step goes to 0.

  2. 2.

    It is centered: \(\mathbb{E}(\int _{0}^{t}h_{s}\mathrm{d}W_{s}) = 0\) .

  3. 3.

    It is square integrable: \(\mathbb{E}\vert \int _{0}^{t}h_{s}\mathrm{d}W_{s}\vert ^{2} = \mathbb{E}(\int _{0}^{t}h_{s}^{2}\mathrm{d}s)\) .

  4. 4.

    For two very good integrands h 1 and h 2 , we have

    $$\displaystyle{\mathbb{E}\Big[(\int _{0}^{t}h_{ 1,s}\mathrm{d}W_{s})(\int _{0}^{t}h_{ 2,s}\mathrm{d}W_{s})\Big] = \mathbb{E}(\int _{0}^{t}h_{ 1,s}h_{2,s}\mathrm{d}s).}$$

Beyond this t-by-t construction, the full theory actually provides a construction for all t simultaneously, additionally proving a time-continuity property, a general centering property (the martingale property), tight L p -estimates on the value at time t and on the extrema up to time t (Burkholder–Davis–Gundy inequalities), and so on. For multidimensional W and h, the construction should be understood componentwise. Another fruitful extension is to allow t to be a bounded stopping time, similarly to the discussion made in the proof of Theorem 3.

Another interesting part of the theory is devoted to the existence and uniqueness of solutions to Stochastic Differential Equations (also known as diffusion processes). The easiest setting is to assume globally Lipschitz coefficientsFootnote 17: it is similar to the ODE framework, and the proof is also based on the Picard fixed-point argument. We state the results without proof.

Theorem 11.

Let W be a d-dimensional standard Brownian motion.

Assume that the functions \(b: \mathbb{R}^{d}\mapsto \mathbb{R}^{d}\) and \(\sigma: \mathbb{R}^{d}\mapsto \mathbb{R}^{d} \otimes \mathbb{R}^{d}\) are globally Lipschitz. Then, for any initial condition \(x \in \mathbb{R}^{d}\) , there exists a unique Footnote 18 continuous solution \((X_{t}^{x})_{t\geq 0}\) valued in \(\mathbb{R}^{d}\) which satisfies

$$\displaystyle{ X_{t}^{x} = x +\int _{ 0}^{t}b(X_{ s}^{x})\mathrm{d}s +\int _{ 0}^{t}\sigma (X_{ s}^{x})\mathrm{d}W_{ s}, }$$
(46)

with \(\sup _{0\leq t\leq T}\mathbb{E}\vert X_{t}^{x}\vert ^{2} < +\infty \) for any given \(T \in \mathbb{R}^{+}\) .

The continuous process X x has a finite quadratic variation given by

$$\displaystyle{ \langle X_{k}^{x},X_{ l}^{x}\rangle _{ t} =\int _{ 0}^{t}[\sigma \sigma ^{\top }]_{ k,l}(X_{s}^{x})\mathrm{d}s,\quad 1 \leq k,l \leq d. }$$
(47)

Observe that this general result includes all the models considered before, such as Arithmetic and Geometric Brownian Motion and Ornstein–Uhlenbeck processes, here stated in a possibly multidimensional framework.

Complementary References. See [48, 63].

4 Monte Carlo Resolutions of Linear PDEs Related to SDEs

Probabilistic methods to solve PDEs have become very popular during the last two decades. They are usually not competitive compared to deterministic methods in low dimension, but in higher dimension they provide very good alternative schemes. In the sequel, we give a brief introduction to the topic, relying on the material presented in the previous sections. We start with linear parabolic PDEs, with Cauchy–Dirichlet boundary conditions. The next section is devoted to semi-linear PDEs.

4.1 Second Order Linear Parabolic PDEs with Cauchy Initial Condition

4.1.1 Feynman–Kac Formulas

We start with a verification Theorem generalizing Theorems 2, 4, 5 to the case of general SDEs. We incorporate a source term g.

Theorem 12.

Under the assumptions of Theorem 11, let X x be the solution (46) starting from \(x \in \mathbb{R}^{d}\) and set

$$\displaystyle{L_{b,\sigma \sigma ^{\top }}^{X} = \frac{1} {2}\sum _{i,j=1}^{d}[\sigma \sigma ^{\top }]_{ i,j}(x)\partial _{x_{i}x_{j}}^{2} +\sum _{ i=1}^{d}b_{ i}(x)\partial _{x_{i}}.}$$

Assume there is a solution \(u \in \mathcal{C}_{b}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}^{d}, \mathbb{R})\) to the PDE

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x) = L_{b,\sigma \sigma ^{\top }}^{X}u(t,x) + g(x),\quad \\ u(0,x) = f(x) \quad \end{array} \right. }$$
(48)

for two given functions \(f,g: \mathbb{R}^{d} \rightarrow \mathbb{R}\) . Then u is given by

$$\displaystyle{ u(t,x) = \mathbb{E}[f(X_{t}^{x}) +\int _{ 0}^{t}g(X_{ s}^{x})\mathrm{d}s]. }$$
(49)

Proof.

Let t be fixed. We apply the general Itô formula (Theorem 9) to the process X x and to the function v: (s, y)↦u(ts, y): it gives

$$\displaystyle\begin{array}{rcl} \mathrm{d}v(s,X_{s}^{x})& =& \big[v_{ s}^{{\prime}}(s,X_{ s}^{x}) + L_{ b,\sigma \sigma ^{\top }}^{X}v(s,X_{ s}^{x})\big]\mathrm{d}s + Dv(s,X_{ s}^{x})\sigma (X_{ s}^{x})\mathrm{d}W_{ s}{}\end{array}$$
(50)
$$\displaystyle\begin{array}{rcl} & =& -g(X_{s}^{x})\mathrm{d}s + Dv(s,X_{ s}^{x})\sigma (X_{ s}^{x})\mathrm{d}W_{ s}{}\end{array}$$
(51)

where \(\mathit{Dv}:= (\partial _{x_{1}}v,\ldots,\partial _{x_{d}}v)\). Observe that the integrand h s  = Dv(s, X s x)σ(X s x) is very good, since v has bounded derivatives, σ has linear growth, and X s has bounded second moments, locally uniformly in s: thus, the stochastic integral \(\int _{0}^{t}\mathit{Dv}(s,X_{s}^{x})\sigma (X_{s}^{x})\mathrm{d}W_{s}\) has zero expectation. Hence, applying the above decomposition between s = 0 and s = t and taking expectations, we obtain

$$\displaystyle{\mathbb{E}(f(X_{t}^{x})) = \mathbb{E}(v(t,X_{ t}^{x})) = v(0,x)-\mathbb{E}(\int _{ 0}^{t}g(X_{ s}^{x})\mathrm{d}s) = u(t,x)-\mathbb{E}(\int _{ 0}^{t}g(X_{ s}^{x})\mathrm{d}s).}$$

We are done. □ 

Smoothness assumptions on u are satisfied if f and g are smooth enough. If not, and if a uniform ellipticity condition holds on σσ ⊤, the fundamental solution of the PDE smooths the data and the result can be extended. However, the derivatives blow up as time goes to 0, and more technicalities are necessary to justify the same stochastic calculus computations. The fundamental solution p(t, x, y) has a simple probabilistic interpretation: it is the density of X t x at y. Indeed, identify \(\mathbb{E}[f(X_{t}^{x}) +\int _{ 0}^{t}g(X_{s}^{x})\mathrm{d}s]\) with

$$\displaystyle{u(t,x) =\int _{\mathbb{R}^{d}}p(t,x,y)f(y)\mathrm{d}y +\int _{ 0}^{t}\int _{ \mathbb{R}^{d}}p(s,x,y)g(y)\mathit{dy}\,\mathrm{d}s.}$$

4.1.2 Monte Carlo Schemes

Since u(t, x) is represented as an expectation, it can be computed numerically by a Monte Carlo method. The difficulty is that in general, X cannot be simulated exactly; only an approximation on a finite time-grid can be produced simply and efficiently. Namely, we use the Euler scheme with time step \(h = t/N\):

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} X_{0}^{x,h} = x, \quad \\ X_{s}^{x,h} = X_{\mathit{ih}}^{x,h} + b(X_{\mathit{ih}}^{x,h})(s -\mathit{ih}) +\sigma (X_{\mathit{ih}}^{x,h})(W_{s} - W_{\mathit{ih}}),\quad i \geq 0,s \in (\mathit{ih},(i + 1)h].\quad \end{array}\right. }$$
(52)

Observe that to get X t x, h, we do not need to sample the continuous path of X x, h (which would be as difficult as sampling a continuous path of a Brownian motion): in fact, we only need to compute X ih x, h iteratively for i = 0 to i = N. Each time iteration requires sampling d new independent Gaussian increments \(W_{k,(i+1)h} - W_{k,\mathit{ih}}\), centered with variance h: this is straightforward. The computational cost is essentially equal to C(d)N, where the constant depends on the dimension (coming from d-dimensional vector and matrix computations).
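The iteration above can be sketched in a few lines: one Gaussian increment of variance h per step, keeping only the grid values. This is a minimal one-dimensional sketch; the drift and diffusion below (a geometric Brownian motion with b(x) = 0.05x and σ(x) = 0.2x) are an illustrative choice, not imposed by the text.

```python
import numpy as np

def euler_scheme(x0, b, sigma, t, n_steps, n_paths, rng):
    """Euler scheme (52): iterate over the grid {ih}, each step consuming
    one Gaussian increment of variance h; returns samples of X_{Nh}^{x,h}."""
    h = t / n_steps
    X = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(h), size=n_paths)
        X = X + b(X) * h + sigma(X) * dW
    return X

rng = np.random.default_rng(1)
# illustrative model: geometric Brownian motion dX = 0.05 X dt + 0.2 X dW, X_0 = 1
X_t = euler_scheme(1.0, lambda x: 0.05 * x, lambda x: 0.2 * x,
                   t=1.0, n_steps=50, n_paths=100000, rng=rng)
# for this model E[X_1] = exp(0.05), which the sample mean approximates
```

The cost is indeed proportional to N (here times the number of paths), with no grid in space.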

As an approximation of the expectation of \(\mathcal{E}(f,g,X^{x}) = f(X_{t}^{x}) +\int _{ 0}^{t}g(X_{s}^{x})\mathrm{d}s\), we take the expectation of

$$\displaystyle{ \mathcal{E}(f,g,X^{x,h}) = f(X_{\mathit{ Nh}}^{x,h}) +\sum _{ i=0}^{N-1}g(X_{\mathit{ ih}}^{x,h})h, }$$
(53)

a random variable of which we sample M independent copies, that are denoted by \(\{\mathcal{E}(f,g,X^{x,h,m}): 1 \leq m \leq M\}\). Then, the Monte Carlo approximation, based on this sample of M Euler schemes with time step h, is

$$\displaystyle\begin{array}{rcl} \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m}) = u(t,x)& +& \mathop{\underbrace{ \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m}) - \mathbb{E}(\mathcal{E}(f,g,X^{x,h}))}}\limits _{\text{statistical error }\mathrm{ Err}._{\mathrm{stat}.}(h,M)} \\ & & +\mathop{\underbrace{ \mathbb{E}(\mathcal{E}(f,g,X^{x,h})) - u(t,x)}}\limits _{\text{discretization error }\mathrm{Err}._{\mathrm{ disc}.}(h)}. {}\end{array}$$
(54)

The first error contribution is due to the finite sample size: the larger M, the better the accuracy. As mentioned in Sect. 2.1.3, once renormalized by \(\sqrt{ M}\), this error is still random and its distribution is close to the Gaussian distribution with zero mean and variance \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x,h}))\): the latter still depends on h, but very little, since it is expected to be close to \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x}))\).

The second error contribution is related to the time discretization effect: the smaller the time step h, the better the accuracy. In the sequel (Sect. 4.1.3), we theoretically estimate this error in terms of h, and prove that it is of order h (and even equivalent to Ch for some constant C) under some reasonable and fairly general assumptions.
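The estimator (53)–(54), together with the 95 % confidence half-width of Sect. 2.1.3, can be sketched as follows. The model is a toy choice (an assumption of this sketch): dX = dW with f(x) = x 2 and g(x) = x, for which u(t, x) = x 2 + t + xt is known in closed form, so the total error can be observed directly.

```python
import numpy as np

def mc_estimate(x, t, n_steps, n_samples, rng, f, g, b, sigma):
    """Monte Carlo approximation (54): empirical mean of the Euler-based
    estimator (53), with the 95% half-width 1.96 sigma_{h,M}/sqrt(M)
    for the statistical error."""
    h = t / n_steps
    X = np.full(n_samples, float(x))
    riemann = np.zeros(n_samples)
    for _ in range(n_steps):
        riemann += g(X) * h          # accumulates sum_{i=0}^{N-1} g(X_{ih}) h
        X = X + b(X) * h + sigma(X) * rng.normal(0.0, np.sqrt(h), size=n_samples)
    samples = f(X) + riemann         # one copy of E(f, g, X^{x,h}) per path
    mean = samples.mean()
    half_width = 1.96 * samples.std(ddof=1) / np.sqrt(n_samples)
    return mean, half_width

rng = np.random.default_rng(2)
# toy model dX = dW: u(t,x) = E[X_t^2 + int_0^t X_s ds] = x^2 + t + x t (= 3 here)
est, hw = mc_estimate(1.0, 1.0, 100, 100000, rng,
                      lambda x: x**2, lambda x: x,
                      lambda x: 0.0 * x, lambda x: 1.0 + 0.0 * x)
```

For this toy model the Euler scheme is exact, so only the statistical error and the Riemann-sum bias of (53) remain, both well inside the confidence interval.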

What Is the Optimal Tuning of h → 0 and M → +∞? An easy complexity analysis shows that the computational effort is \(\mathcal{C}_{e} = C(d)\mathit{Mh}^{-1}\). Observe that the rate does not depend on the dimension d, unlike for a PDE method; on the other hand, the solution is computed only at a single point (t, x). The squared quadratic error is equal to

$$\displaystyle\begin{array}{rcl} [\mathrm{Err}_{2}(h,M)]^{2}&:=& \mathbb{E}\Big[ \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m}) - u(t,x)\Big]^{2} {}\\ & =& \frac{\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x,h}))} {M} +\Big [\mathbb{E}(\mathcal{E}(f,g,X^{x,h})) - u(t,x)\Big]^{2}. {}\\ \end{array}$$

Only the first factor \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x,h}))\) can be estimated with the same sample, for M large, and it depends little on h. Say that the second term is equivalent to (Ch)2 as h → 0, with C ≠ 0. Then, three asymptotic situations occur:

  1. 1.

    If M ≫ h −2, the statistical error becomes negligible and \(\frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m}) - u(t,x) \sim \mathit{Ch}\). The computational effort is \(\mathcal{C}_{e} \gg h^{-3}\) and thus \(\mathrm{Err}_{2}(h,M) \gg \mathcal{C}_{e}^{-1/3}\). Deriving a confidence interval as in Sect. 2.1.3 is meaningless: we are left with the discretization error only.

  2. 2.

    If M ≪ h −2, the discretization error becomes negligible and the distribution of \(\sqrt{M}\left ( \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m}) - u(t,x)\right )\) converges to that of a Gaussian r.v. centered with variance \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x}))\) (which can be asymptotically computed using the M-sample). Thus, we can derive confidence intervals: denoting by σ h, M 2 the empirical variance of \(\mathcal{E}(f,g,X^{x,h})\), with probability 95 % we have

    $$\displaystyle{u(t,x)\,\in \,\Big[ \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m})-1.96 \frac{\sigma _{h,M}} {\sqrt{M}}, \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m})+1.96 \frac{\sigma _{h,M}} {\sqrt{M}}\Big].}$$

    Regarding the computational effort, we have \(\mathcal{C}_{e} \gg M^{3/2}\) and thus \(\mathrm{Err}_{2}(h,M) \gg \mathcal{C}_{e}^{-1/3}\).

  3. 3.

    If M ∼ ch −2, both statistical and discretization errors have the same magnitude and one can still derive an asymptotic confidence interval, but it is no longer centered (as it is when M ≪ h −2) and unfortunately, the bias is not easily estimated on the fly. The problem is that the bias is of the same magnitude as the size of the confidence interval, which reduces the interest of having such an a priori statistical error estimate. Here, \(\mathrm{Err}_{2}(h,M) = O(\mathcal{C}_{e}^{-1/3})\).

Summing up, considering the ability to obtain on-line error estimates and optimizing the final accuracy w.r.t. the computational effort, the second case \(M = h^{-2+\varepsilon }\) (for a small \(\varepsilon > 0\)) may be the most attractive, since it achieves (almost) the best accuracy w.r.t. the computational effort and gives a centered confidence interval (and therefore tractable and meaningful error bounds).

4.1.3 Convergence of the Euler Scheme

An important issue is to analyze the impact of the time discretization of the SDE. This dates back to the end of the eighties, see [68] among others. The result below gives a mathematical justification for using the Euler scheme as an approximation of the distribution of the SDE.

Theorem 13.

Assume that b and σ are \(\mathcal{C}_{b}^{2}\) , let X x be the solution (46) starting from \(x \in \mathbb{R}^{d}\) and let X h,x be its Euler scheme defined in (52) . Assume that \(u(t,x) = \mathbb{E}[f(X_{t}^{x}) +\int _{ 0}^{t}g(X_{s}^{x})\mathrm{d}s]\) is a \(\mathcal{C}_{b}^{2,4}([0,T] \times \mathbb{R}^{d}, \mathbb{R})\) -function solution of the PDE of Theorem 12 . Then,

$$\displaystyle{\mathbb{E}\Big[f(X_{\mathit{Nh}}^{x,h}) +\sum _{ i=0}^{N-1}g(X_{\mathit{ ih}}^{x,h})h\Big] - \mathbb{E}\Big[f(X_{ t}^{x}) +\int _{ 0}^{t}g(X_{ s}^{x})\mathrm{d}s\Big] = O(h).}$$

Proof.

Denote by Err. disc. (h) the above discretization error. As in Theorem 12, we use the function v: (s, y)↦u(ts, y) (for a fixed t) and we apply the Itô formula to X h, x (Theorem 9): it gives (setting \(\mathit{Dv}:= (\partial _{x_{1}}v,\ldots,\partial _{x_{d}}v)\))

$$\displaystyle\begin{array}{rcl} \mathrm{d}v(s,X_{s}^{h,x})& =& \Big[v_{ s}^{{\prime}}(s,X_{ s}^{h,x}) + \frac{1} {2}\sum _{i,j=1}^{d}[\sigma \sigma ^{\top }]_{ i,j}(X_{\varphi (s)}^{h,x})\partial _{ x_{i}x_{j}}^{2}v(s,X_{ s}^{h,x}) {}\\ & & \qquad +\sum _{ i=1}^{d}b_{ i}(X_{\varphi (s)}^{h,x})\partial _{ x_{i}}v(s,X_{s}^{h,x})\Big]\mathrm{d}s + Dv(s,X_{ s}^{h,x})\sigma (X_{\varphi (s)}^{h,x})\mathrm{d}W_{ s}. {}\\ & =& \Big[\frac{1} {2}\sum _{i,j=1}^{d}\big([\sigma \sigma ^{\top }]_{ i,j}(X_{\varphi (s)}^{h,x}) - [\sigma \sigma ^{\top }]_{ i,j}(X_{s}^{h,x})\big)\partial _{ x_{i}x_{j}}^{2}v(s,X_{ s}^{h,x}) {}\\ & & \qquad +\sum _{ i=1}^{d}\big(b_{ i}(X_{\varphi (s)}^{h,x}) - b_{ i}(X_{s}^{h,x})\big)\partial _{ x_{i}}v(s,X_{s}^{h,x}) - g(X_{ s}^{h,x})\Big]\mathrm{d}s {}\\ & & +Dv(s,X_{s}^{h,x})\sigma (X_{\varphi (s)}^{h,x})\mathrm{d}W_{ s} {}\\ \end{array}$$

where at the second equality, we have used the PDE solved by v, evaluated at (s, X s h,x). Then, by taking the expectation (which removes the stochastic integral term because the integrand is very good), we obtain

$$\displaystyle\begin{array}{rcl} \mathrm{Err}._{\mathrm{disc}.}(h)& =& \mathbb{E}\Big[v(\mathit{Nh},X_{\mathit{Nh}}^{x,h}) +\sum _{ i=0}^{N-1}hg(X_{\mathit{ ih}}^{x,h})\Big] - v(0,x) {}\\ & =& \mathbb{E}\Big(\int _{0}^{t}\Big[\frac{1} {2}\sum _{i,j=1}^{d}\big([\sigma \sigma ^{\top }]_{ i,j}(X_{\varphi (s)}^{h,x}) - [\sigma \sigma ^{\top }]_{ i,j}(X_{s}^{h,x})\big)\partial _{ x_{i}x_{j}}^{2}v(s,X_{ s}^{h,x})\Big]\mathrm{d}s\Big) {}\\ & & +\mathbb{E}\Big(\int _{0}^{t}\Big[\sum _{ i=1}^{d}\big(b_{ i}(X_{\varphi (s)}^{h,x}) - b_{ i}(X_{s}^{h,x})\big)\partial _{ x_{i}}v(s,X_{s}^{h,x})\Big]\mathrm{d}s\Big) {}\\ & & +\mathbb{E}\Big(\int _{0}^{t}\big[g(X_{\varphi (s)}^{h,x}) - g(X_{ s}^{h,x})\big]\mathrm{d}s\Big). {}\\ \end{array}$$

The global error is represented as a summation of local errors. For instance, let us estimate the first term, related to σσ ⊤: apply once again the Itô formula on the interval \([\mathit{kh},s] \subset [\mathit{kh},(k + 1)h]\) to the function \((s,y)\mapsto \big([\sigma \sigma ^{\top }]_{i,j}(X_{\varphi (s)}^{h,x}) - [\sigma \sigma ^{\top }]_{i,j}(y)\big)\partial _{x_{i}x_{j}}^{2}v(s,y)\). It gives rise to a time integral between \(\mathit{kh} =\varphi (s)\) and s and a stochastic integral that vanishes in expectation. Proceed similarly for the other contributions with b and g. Finally we obtain a representation formula of the form

$$\displaystyle\begin{array}{rcl} \mathrm{Err}._{\mathrm{disc}.}(h) =\sum _{\alpha:0\leq \vert \alpha \vert \leq 4}\mathbb{E}\Big(\int _{0}^{t}\int _{ \varphi (s)}^{s}\partial _{ x}^{\vert \alpha \vert }v(r,X_{ r}^{h,x})l_{\alpha }\big(X_{\varphi (r)}^{h,x},X_{ r}^{h,x}\big)\mathrm{d}r\mathrm{d}s\Big)& & {}\\ \end{array}$$

where the summation runs over differentiation multi-indices of length at most 4, where the l α are functions depending on b, σ, g and their derivatives up to order 2, and where l α has at most linear growth w.r.t. its two variables. Taking advantage of the boundedness of the derivatives of v, we easily complete the proof.

Observe that, by strengthening the assumptions and by going a bit further in the analysis, we could establish an expansion w.r.t. h. □ 

The previous assumption on u implies that \(f \in \mathcal{C}_{b}^{4}\) and \(g \in \mathcal{C}_{b}^{2}\), which is too strong in practice. The extension to non-smooth f is much more difficult and we have to take advantage of the smoothness coming from the non-degenerate distribution of X or X h. We may follow the same type of computations, mixing PDE techniques and stochastic arguments, see [6]. But it is a pure stochastic analysis approach (Malliavin calculus) which provides the extension under the minimal non-degeneracy assumption (i.e. stated only at the initial point x), see [38]. We state the result without proof.

Theorem 14.

Assume that b and σ are \(\mathcal{C}_{b}^{\infty }\) , let X x be the solution (46) starting from \(x \in \mathbb{R}^{d}\) and let X h,x be its Euler scheme defined in (52) . Assume additionally that σσ (x) is invertible. Then, for any bounded measurable function f, we have

$$\displaystyle{\mathbb{E}\Big[f(X_{t}^{x,h})\Big] - \mathbb{E}\Big[f(X_{ t}^{x})\Big] = O(h).}$$

In the same reference [38], the result is also proved for hypoelliptic systems, where the hypoellipticity holds only at the starting point x. On the other hand, without such a non-degeneracy condition and for non-smooth f (like the Heaviside function), the convergence may fail.

The case of coefficients b and σ with low regularity or exploding behavior is still an active field of research.

4.1.4 Sensitivities

If in addition we are interested in computing derivatives of u(t, x) w.r.t. x or other model parameters, this is still possible using Monte Carlo simulations. For the sake of simplicity, our discussion focuses on the gradient of u w.r.t. x. Essentially, two approaches are known.

Resimulation Method. The derivative is approximated using the finite difference method

$$\displaystyle{\partial _{x_{i}}u(t,x) \approx \frac{u(t,x +\varepsilon e_{i}) - u(t,x -\varepsilon e_{i})} {2\varepsilon } }$$

where \(e_{i} = (0,\ldots,0,\mathop{1}\limits_{i^{\mathit{th}}},0,\ldots )\), and \(\varepsilon\) is small. Then, each value function is approximated by its Monte Carlo approximation given in (54). However, we have to be careful in generating the Euler schemes starting from \(x +\varepsilon e_{i}\) and \(x -\varepsilon e_{i}\): their sampling should use the same Brownian motion increments, that is

$$\displaystyle{ \partial _{x_{i}}u(t,x) \approx \frac{1} {M}\sum _{m=1}^{M}\frac{\mathcal{E}(f,g,X^{x+\varepsilon e_{i},h,m}) -\mathcal{E}(f,g,X^{x-\varepsilon e_{i},h,m})} {2\varepsilon }. }$$
(55)

Indeed, for an infinite sample (\(M = +\infty \)), whether or not we use the same driving noise has no impact on the statistical error, but for finite M, this trick likely maintains a smaller statistical error. Furthermore, the optimal choice of h, M and \(\varepsilon\) is an important issue, but the results differ according to the regularity of f and g; we do not go into details.
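The common-noise trick of (55) is easy to implement: both Euler schemes consume the same increments at each step. The model below (dX = dW with f(x) = x 2, so the exact derivative is 2x) is a toy choice assumed for this sketch.

```python
import numpy as np

def fd_delta(x, eps, t, n_steps, n_samples, rng, f, b, sigma):
    """Resimulation estimator (55): central finite difference with the SAME
    Brownian increments driving the x+eps and x-eps Euler schemes."""
    h = t / n_steps
    Xp = np.full(n_samples, x + eps)
    Xm = np.full(n_samples, x - eps)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(h), size=n_samples)  # common driving noise
        Xp = Xp + b(Xp) * h + sigma(Xp) * dW
        Xm = Xm + b(Xm) * h + sigma(Xm) * dW
    return np.mean((f(Xp) - f(Xm)) / (2.0 * eps))

rng = np.random.default_rng(3)
# toy model dX = dW, f(x) = x^2: the exact sensitivity is d/dx E[(x+W_t)^2] = 2x
delta = fd_delta(1.0, 0.01, 1.0, 50, 100000, rng, lambda x: x**2,
                 lambda x: 0.0 * x, lambda x: 1.0 + 0.0 * x)
```

With independent noises the two value estimates would each carry an O(1/ε) variance contribution; with common noise the difference quotient stays well-behaved as ε shrinks.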

Likelihood Method. To avoid the latter problem of selecting the appropriate value of the finite difference parameter \(\varepsilon\), we may prefer another Monte Carlo estimator of \(\partial _{x_{i}}u(t,x)\), which consists in appropriately weighting the output. When g equals 0, it takes the following form

$$\displaystyle{ \partial _{x_{i}}u(t,x) \approx \frac{1} {M}\sum _{m=1}^{M}f(X_{ t}^{x,h,m})H_{ t}^{x,h,m} }$$
(56)

where H t x, h, m is generated simultaneously with the Euler scheme and does not depend on f. The advantage of this approach is to avoid the possibly delicate choice of the perturbation parameter \(\varepsilon\), and it is valid for any function f: thus, it may greatly reduce the computational time if many sensitivities are required for the same model. On the other hand, the confidence interval may be larger than that of the resimulation method.

We now provide the formula for the weight H (known as the Bismut–Elworthy–Li formula). It uses the tangent process, which is the (well-defined, see [52]) derivative of x ↦ X t x w.r.t. x and which solves

$$\displaystyle\begin{array}{rcl} \mathit{DX}_{t}^{x}:= Y _{ t}^{x} = \mathrm{Id} +\int _{ 0}^{t}\mathit{Db}(X_{ s}^{x})\;Y _{ s}^{x}\;\mathrm{d}s +\sum _{ j=1}^{d}\int _{ 0}^{t}D\sigma _{ j}(X_{s}^{x})\;Y _{ s}^{x}\;\mathrm{d}W_{ j,s}& &{}\end{array}$$
(57)

where σ j is the j-th column of the matrix σ.

Theorem 15.

Assume that b and σ are \(\mathcal{C}_{b}^{2}\) -functions, that \(u \in \mathcal{C}^{1,2}([0,T] \times \mathbb{R}^{d}, \mathbb{R})\) solves the PDE (48) , and that σ is invertible with a uniformly bounded inverse σ −1 . We have

$$\displaystyle\begin{array}{rcl} \mathit{Du}(t,x) = \mathbb{E}\left (\frac{f(X_{t}^{x})} {t} \left [\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s}\right ]^{\top }\right ).& & {}\\ \end{array}$$

Proof.

First, we recall the decomposition (51) obtained from Itô formula, using \(v(s,y) = u(t - s,y)\):

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} v(r,X_{r}^{x}) = v(0,x) +\int _{ 0}^{r}\mathit{Dv}(s,X_{s}^{x})\sigma (X_{s}^{x})\mathrm{d}W_{s},\quad \forall 0 \leq r \leq t,\quad \\ f(X_{t}^{x}) = v(t,X_{t}^{x}). \quad \end{array} \right. }$$
(58)

Second, taking expectations gives \(v(0,x) = u(t,x) = \mathbb{E}(v(r,X_{r}^{x}))\) for any r ∈ [0, t]. By differentiating w.r.t. x, we obtain a nice relation keeping the expectation constant in time (actually deeply related to the martingale property):

$$\displaystyle{\mathit{Dv}(0,x) = \mathbb{E}(\mathit{Dv}(r,X_{r}^{x})Y _{ r}^{x}),\quad \forall 0 \leq r \leq t.}$$

Thus, we deduce

$$\displaystyle\begin{array}{rcl} \mathit{Du}(t,x) = \mathit{Dv}(0,x)& =& \mathbb{E}\left (\frac{1} {t}\int _{0}^{t}\mathit{Dv}(s,X_{ s}^{x})Y _{ s}^{x}\;\mathrm{d}s\right ) {}\\ & =& \mathbb{E}\left (\frac{1} {t}\left [\int _{0}^{t}\mathit{Dv}(s,X_{ s}^{x})\sigma (X_{ s}^{x})\mathrm{d}W_{ s}\right ]\left [\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s}\right ]^{\top }\right ) {}\\ & =& \mathbb{E}\left (\frac{v(t,X_{t}^{x}) - v(0,x)} {t} \left [\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s}\right ]^{\top }\right ) {}\\ & =& \mathbb{E}\left (\frac{f(X_{t}^{x})} {t} \left [\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s}\right ]^{\top }\right ) {}\\ \end{array}$$

using Theorem 10 at the second and fourth equalities, and (58) at the third one. □ 

In view of the above assumptions on u, the function f is implicitly smooth. However, under the current ellipticity condition, u is still smooth even if f is not; since the formula depends on f and not on its derivatives, it is standard to extend the formula to any bounded function f (without any regularity assumption).

The Monte Carlo evaluation of Du(t, x) easily follows by independently sampling \(\frac{f(X_{t}^{x})} {t} \left [\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s}\right ]^{\top }\) and taking the empirical mean. Exact simulation is not possible and, once again, we may use an Euler-type scheme with time step h:

  • The dimension-augmented Stochastic Differential Equation (X x, Y x) is approximated using the Euler scheme.

  • We use a simple approximation of the stochastic integral

    $$\displaystyle{\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s} \approx \sum _{ i=0}^{N-1}[\sigma ^{-1}(X_{\mathit{ ih}}^{x,h})Y _{\mathit{ ih}}^{x,h}]^{\top }(W_{ (i+1)h} - W_{\mathit{ih}}).}$$

The analysis of discretization error is more intricate than for \(\mathbb{E}(f(X_{t}^{x,h}) - f(X_{t}^{x}))\): nevertheless, the error is still of magnitude h (the convergence order is 1 w.r.t. h, as proved in [38]).
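The discretized estimator can be sketched on the simplest possible case: a toy model dX = σ dW with constant scalar σ (an assumption of this sketch), for which the tangent process reduces to Y  = 1 and the weight to H = W t ∕(σt).

```python
import numpy as np

def bel_delta(x, t, n_steps, n_samples, rng, f, sigma):
    """Likelihood (Bismut-Elworthy-Li) estimator (56), sketched for the toy
    model dX = sigma dW with constant sigma: the tangent process is Y = 1
    and the weight is H = sum_i [sigma^{-1} Y_{ih}](W_{(i+1)h}-W_{ih}) / t."""
    h = t / n_steps
    X = np.full(n_samples, float(x))
    H = np.zeros(n_samples)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(h), size=n_samples)
        H += dW / (sigma * t)        # accumulates the discretized weight
        X = X + sigma * dW
    return np.mean(f(X) * H)         # empirical mean of f(X_t^{x,h}) H_t^{x,h}

rng = np.random.default_rng(4)
# f(x) = x^2 with sigma = 1: Du(t,x) = d/dx E[(x + W_t)^2] = 2x, i.e. 2 here
delta = bel_delta(1.0, 1.0, 50, 100000, rng, lambda x: x**2, 1.0)
```

No perturbation parameter ε is needed, and the same weights H could be reused for any other payoff f.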

Theorem 16.

Under the setting of Theorem 14, for any bounded measurable function f, we have

$$\displaystyle{\mathbb{E}\left (\frac{f(X_{t}^{x,h})} {t} \sum _{i=0}^{N-1}\left [[\sigma ^{-1}(X_{\mathit{ ih}}^{x,h})Y _{\mathit{ ih}}^{x,h}]^{\top }(W_{ (i+1)h} - W_{\mathit{ih}})\right ]^{\top }\right )-\mathit{Du}(t,x) = O(h).}$$

4.1.5 Other Theoretical Estimates in Small Time

The representation formula of Theorem 15 is the starting point for getting accurate probabilistic estimates on the derivatives of the underlying PDE in small time, in terms of the fractional smoothness of f(X t x), which is related to the decay of

$$\displaystyle{\|f(X_{t}^{x}) - \mathbb{E}(f(X_{ t-s}^{y}))\vert _{ y=X_{s}^{x}}\|_{L_{2}}\quad \text{ as }s \rightarrow t.}$$

The derivatives are measured in weighted L 2-norms and, surprisingly, the above results are equivalence results [36]; we are not aware of such results obtained by PDE arguments.

Theorem 17.

Under the setting Footnote 19 of Theorem 14 , let t be fixed. For 0 < θ ≤ 1 and a bounded f, the following assertions are equivalent:

  1. i)

    For some c ≥ 0, \(\mathbb{E}\vert f(X_{t}^{x}) - \mathbb{E}(f(X_{t-s}^{y}))\vert _{y=X_{s}^{x}}\vert ^{2} \leq c(t - s)^{\theta }\) for 0 ≤ s ≤ t.

  2. ii)

    For some c ≥ 0, \(\mathbb{E}\vert \mathit{Du}(t - s,X_{s}^{x})\vert ^{2} \leq \frac{c} {(t-s)^{1-\theta }}\) for 0 ≤ s < t.

  3. iii)

    For some c ≥ 0, \(\int _{0}^{s}\mathbb{E}\vert D^{2}u(t - r,X_{r}^{x})\vert ^{2}\mathrm{d}r \leq \frac{c} {(t-s)^{1-\theta }}\) for 0 ≤ s < t.

If 0 < θ < 1, it is also equivalent to:

  1. iv)

    For some c ≥ 0, \(\mathbb{E}\vert D^{2}u(t - s,X_{s}^{x})\vert ^{2} \leq \frac{c} {(t-s)^{2-\theta }}\) for 0 ≤ s < t.

Theorem 18.

Under the setting of Theorem 14 , let t be fixed. For 0 < θ < 1 and a bounded f, the following assertions are equivalent:

  1. i)

    \(\int _{0}^{t}(t - s)^{-\theta -1}\mathbb{E}\vert f(X_{t}^{x}) - \mathbb{E}(f(X_{t-s}^{y}))\vert _{y=X_{s}^{x}}\vert ^{2}\mathrm{d}s < +\infty \) .

  2. ii)

    \(\int _{0}^{t}(t - s)^{-\theta }\mathbb{E}\vert \mathit{Du}(t - s,X_{s}^{x})\vert ^{2}\mathrm{d}s < +\infty \) .

  3. iii)

    \(\int _{0}^{t}(t - s)^{1-\theta }\mathbb{E}\vert D^{2}u(t - s,X_{s}^{x})\vert ^{2}\mathrm{d}s < +\infty \) .

4.2 The Case of Dirichlet Boundary Conditions and Stopped Processes

4.2.1 Feynman–Kac Formula

In view of Corollary 1, the natural extension of Theorem 12 to the case of Dirichlet boundary conditions is the following. We state the result without a source term, to simplify. The proof is similar and we skip it.

Theorem 19.

Let D be a bounded domain of \(\mathbb{R}^{d}\) . Under the setting of Theorem 12 , assume there is a solution \(u \in \mathcal{C}_{b}^{1,2}([0,T] \times \overline{D}, \mathbb{R})\) to the PDE

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x) = L_{b,\sigma \sigma ^{\top }}^{X}u(t,x),\quad \text{for }(t,x) \in ]0,+\infty [\times D,\quad \\ u(0,x) = f(0,x),\quad \text{for }x \in D, \quad \\ u(t,x) = f(t,x),\quad \text{for }(t,x) \in \mathbb{R}^{+} \times \partial D, \quad \end{array} \right. }$$
(59)

for a given function \(f: \mathbb{R}^{+} \times \overline{D} \rightarrow \mathbb{R}\) . Then u is given by

$$\displaystyle{ u(t,x) = \mathbb{E}[f(t -\tau ^{x} \wedge t,X_{\tau ^{x }\wedge t}^{x})] }$$
(60)

for x ∈ D, where \(\tau ^{x} =\inf \{ s > 0: X_{s}^{x}\notin D\}\) is the first exit time from D by X.

4.2.2 Monte Carlo Simulations

Performing a Monte Carlo algorithm in this context is less easy, since we additionally have to simulate the exit time of X. A simple approach consists in discretizing X using the Euler scheme with time step h, and then taking as exit time

$$\displaystyle{\tau ^{x,h} =\inf \{ ih > 0: X_{\mathit{ ih}}^{x,h}\notin D\}.}$$

It does not require any further computations than those needed to generate (X ih x, h, 0 ≤ i ≤ N). But the discretization error worsens significantly, since it becomes of magnitude \(\sqrt{ h}\). Actually, even if the values of (X ih x, h, 0 ≤ i ≤ N) are generated without error (as in the Brownian motion case or for other simple processes), the convergence order is still \(\frac{1} {2}\) w.r.t. h [27]. The deterioration of the discretization error really comes from the high irregularity of Brownian motion paths (and SDE paths): even if two successive points X ih x, h and X (i+1)h x, h are close to the boundary but inside the domain, a discrete monitoring scheme does not detect an exit, while a continuous Brownian motion-like path would likely exit the domain between ih and (i + 1)h. Moreover, it gives a systematic (in mean) overestimation of the true exit time. To overcome this lack of accuracy, there are several improved schemes.

  • The Brownian bridge technique consists in simulating the exit time of a local arithmetic Brownian motion [corresponding to the local dynamics of the Euler scheme, see (12)]. For a simple domain like a half-space, the procedure is explicit and tractable; this is related to the explicit knowledge of the distribution of the Brownian maximum, see Proposition 5. For a smooth domain, we can locally approximate the domain by half-spaces. This improvement allows one to recover order 1 of convergence, see [27, 28]. For non-smooth domains (including corners for instance) and general SDEs, providing an accurate scheme and performing its error analysis is still an open issue; for heuristics and numerical experiments, see [29] for instance.

  • The boundary shifting method consists in shrinking the domain to compensate for the systematic bias in the simulation of the discrete exit time. Very remarkably, there is a universal elementary rule for making the domain smaller:

    locally at a point y close to the boundary, move the boundary inwards by \(c_{0}\sqrt{h}\) times the norm of the diffusion coefficient in the normal direction.

    The constant c 0 is equal to the mean of the asymptotic overshoot of the Gaussian random walk as the ladder height goes to infinity: it can be expressed using the zeta function

    $$\displaystyle{c_{0} = -\frac{\zeta (\frac{1} {2})} {\sqrt{2\pi }} = 0.5826\ldots.}$$

    This procedure strictly improves on the order \(\frac{1} {2}\) of the discrete procedure, but whether the convergence order is 1 is still an open question, although numerical experiments corroborate this fact.
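The effect of the shifting rule can be observed on a toy case assumed for this sketch: a standard Brownian motion (b = 0, σ = 1) started at x = 0 in D = (−1, 1), for which the exact mean exit time is \(\mathbb{E}(\tau ^{x}) = 1\).

```python
import numpy as np

C0 = 0.5826  # c_0 = -zeta(1/2)/sqrt(2 pi), mean asymptotic overshoot

def exit_time_estimates(n_paths=20000, h=0.01, t_max=10.0, L=1.0, seed=5):
    """Discretely monitored exit time of a standard Brownian motion (sigma = 1)
    from D = (-L, L), without and with the inward boundary shift c0*sqrt(h)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / h))
    shift = C0 * np.sqrt(h)                  # shrink the domain by c0 sqrt(h) |n^T sigma|
    tau_plain = np.full(n_paths, np.inf)
    tau_shift = np.full(n_paths, np.inf)
    W = np.zeros(n_paths)
    for i in range(1, n_steps + 1):
        W += rng.normal(0.0, np.sqrt(h), size=n_paths)
        hit_p = (np.abs(W) >= L) & np.isinf(tau_plain)          # plain rule tau^{x,h}
        hit_s = (np.abs(W) >= L - shift) & np.isinf(tau_shift)  # shifted rule
        tau_plain[hit_p] = i * h
        tau_shift[hit_s] = i * h
    # truncate at t_max (exit occurs before t_max with overwhelming probability here)
    return np.minimum(tau_plain, t_max).mean(), np.minimum(tau_shift, t_max).mean()
```

The plain rule systematically overshoots the exact value 1 by an amount of order \(\sqrt{h}\), while the shifted rule lands much closer, at essentially no extra cost.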

The result is stated as follows, see [37].

Theorem 20.

Assume that the domain D is bounded and has a \(\mathcal{C}^{3}\) -boundary, that b,σ are \(\mathcal{C}_{b}^{2}\) and \(f \in \mathcal{C}_{b}^{1,2}\) . Let n(y) be the unit inward normal vector to the boundary ∂D at the closest Footnote 20 point to y on the boundary. Set

$$\displaystyle{\hat{\tau }^{x,h} =\inf \big\{ \mathit{ih} > 0: X_{\mathit{ ih}}^{x,h}\notin D\text{ or }d(X_{\mathit{ ih}}^{x,h},\partial D) \leq c_{ 0}\sqrt{h}\big\vert n^{\top }\sigma \big\vert (X_{\mathit{ ih}}^{x,h})\big\}.}$$

Then, we have

$$\displaystyle{\mathbb{E}[f(t -\hat{\tau }^{x,h} \wedge t,X_{\hat{\tau }^{ x,h}\wedge t}^{x})] - \mathbb{E}[f(t -\tau ^{x} \wedge t,X_{\tau ^{x }\wedge t}^{x})] = o(\sqrt{h}).}$$

Observe that this improvement is very cheap regarding the computational cost. It can be extended (both the numerical scheme and its mathematical analysis) to a source term, to time-dependent domains and to stationary problems (elliptic PDEs).

Complementary References. See [2, 13, 26, 49, 53, 64] for general references. For reflected processes and Neumann boundary conditions, see [10, 28]. For variance reduction techniques, see [34, 47, 58]. For domain decomposition, see [35, 62]. This list is not exhaustive.

5 Backward Stochastic Differential Equations and Semi-linear PDEs

The link between PDEs and stochastic processes has been developed for several decades and more recently, say in the last 20 years, researchers have paid attention to the probabilistic interpretation of non-linear PDEs, and in particular semi-linear PDEs. These PDEs are connected to non-linear processes, called Backward Stochastic Differential Equations (BSDEs in short). In this section, we define these equations, first introduced by Pardoux and Peng [60], and give their connection with PDEs. Finally, we present a Monte Carlo algorithm to simulate them, using empirical regressions: it has the advantage of suiting well multidimensional problems, with great generality in the type of semi-linearity.

These equations have many fruitful applications in stochastic control theory and mathematical finance, where they usually provide elegant proofs to characterize the solution to optimal investment problems, for instance; for the related applications, we refer the reader to [17, 18]. Regarding the semi-linear PDE point of view, the applications include reaction-diffusion equations in chemistry [24], evolution of species in population biology [51, 66], the Hodgkin–Huxley model in neuroscience [43], the Allen–Cahn equation for phase transitions in physics…see the introductory course [30] and references therein. For other non-linear equations connected with stochastic processes, see the aforementioned reference.

5.1 Existence of BSDE and Feynman–Kac Formula

5.1.1 Heuristics

In contrast to a Stochastic Differential Equation defined by (46), where the initial condition is given and the dynamics is imposed, a Backward SDE is defined through a random terminal condition ξ at a fixed terminal time T and a dynamics imposed by a driver g. It takes the form

$$\displaystyle{ Y _{t} =\xi +\int _{t}^{T}g(s,Y _{ s},Z_{s})\mathrm{d}s -\int _{t}^{T}Z_{ s}\mathrm{d}W_{s} }$$
(61)

where we write the integrals between t and T to emphasize the backward point of view: ξ should be thought of as a stochastic target to reach at time T. A solution to (61) is the couple (Y, Z): without extra conditions, the problem has an infinite number of solutions and thus is ill-posed. For instance, if g ≡ 0 and ξ = f(W T ): taking \(c \in \mathbb{R}\), a solution is Z t  = c and \(Y _{t} =\xi -c(W_{T} - W_{t})\), thus uniqueness fails. In addition to integrability properties (appropriate L 2-spaces) that we do not detail, an important condition is that the solution does not anticipate the future of the Brownian motion, i.e. Y t depends on the Brownian motion W only up to time t, and similarly for Z: we informally say that the solution is adapted to W. In a stochastic control problem, this adaptedness constraint is natural since it states that the value function or the decision cannot be made ahead of the flow of information given by W. Observe that in the uniqueness counter-example, Y is not adapted to W since Y t depends on the Brownian motion on [0, T] and not only on [0, t].

Taking the conditional expectation in (61) gives

$$\displaystyle{ Y _{t} = \mathbb{E}\Big(\xi +\int _{t}^{T}g(s,Y _{ s},Z_{s})\mathrm{d}s\big\vert W_{s}: s \leq t\Big), }$$
(62)

because the stochastic integral (built with Brownian increments after t) is centered conditionally on the Brownian motion up to time t. Of course, this rule is fully justified by stochastic calculus theory. Since Y t in (62) is adapted to W, it should be the right solution (if unique); then, Z serves as a control to make equation (61) valid (with Y adapted).

5.1.2 Feynman–Kac Formula

The connection with PDEs is possible when the terminal condition is a function of a (forward) SDE: this case is called a Markovian BSDE. Additionally, the driver may also depend on this SDE, as g(s, X s , Y s , Z s ) for a deterministic function g. We now proceed by a verification theorem. To allow a more natural presentation as a backward system, we choose to write the semi-linear PDE with a terminal condition at time T instead of an initial condition at time 0.

Theorem 21.

Let T > 0 be given. Under the assumptions of Theorem 11 , let X x be the solution (46) starting from \(x \in \mathbb{R}^{d}\) , assume there is a solution \(v \in \mathcal{C}_{b}^{1,2}([0,T] \times \mathbb{R}^{d}, \mathbb{R})\) to the semi-linear PDE

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} v_{t}^{{\prime}}(t,x) + L_{b,\sigma \sigma ^{\top }}^{X}v(t,x) + g(t,x,v(t,x),\mathit{Dv}(t,x)\sigma (x)) = 0,\quad \\ v(T,x) = f(x), \quad \end{array} \right. }$$
(63)

for two given functions \(f: \mathbb{R}^{d} \rightarrow \mathbb{R}\) and \(g: [0,T] \times \mathbb{R}^{d} \times \mathbb{R} \times (\mathbb{R} \otimes \mathbb{R}^{d}) \rightarrow \mathbb{R}\) . Then, Y t x = v(t,X t x ) and Z t x = [Dv σ](t,X t x ) solves the BSDE

$$\displaystyle{ Y _{t}^{x} = f(X_{ T}^{x}) +\int _{ t}^{T}g(s,X_{ s}^{x},Y _{ s}^{x},Z_{ s}^{x})\mathrm{d}s -\int _{ t}^{T}Z_{ s}^{x}\mathrm{d}W_{ s}. }$$
(64)

Proof.

The Itô formula (50) applied to v and X x gives

$$\displaystyle\begin{array}{rcl} \mathrm{d}v(s,X_{s}^{x})& =& \big[v_{ s}^{{\prime}}(s,X_{ s}^{x}) + L_{ b,\sigma \sigma ^{\top }}^{X}v(s,X_{ s}^{x})\big]\mathrm{d}s + \mathit{Dv}(s,X_{ s}^{x})\sigma (X_{ s}^{x})\mathrm{d}W_{ s} {}\\ & =& -g(s,X_{s}^{x},v(s,X_{ s}^{x}),[\mathit{Dv}\ \sigma ](s,X_{ s}^{x}))\mathrm{d}s + \mathit{Dv}(s,X_{ s}^{x})\sigma (X_{ s}^{x})\mathrm{d}W_{ s},{}\\ \end{array}$$

which writes between s = t and s = T:

$$\displaystyle\begin{array}{rcl} v(T,X_{T}^{x})& =& v(t,X_{ t}^{x}) -\int _{ t}^{T}g(s,X_{ s}^{x},v(s,X_{ s}^{x}),[\mathit{Dv}\ \sigma ](s,X_{ s}^{x}))\mathrm{d}s {}\\ & & \quad +\int _{ t}^{T}\mathit{Dv}(s,X_{ s}^{x})\sigma (X_{ s}^{x})\mathrm{d}W_{ s}. {}\\ \end{array}$$

Since v(T, . ) = f(. ), we complete the proof by identification. □ 

In particular, at time 0, where X 0 x = x, we obtain Y 0 x = v(0, x) and, in view of (62), this gives a Feynman–Kac representation of v:

$$\displaystyle{ v(0,x) = \mathbb{E}\Big(f(X_{T}^{x}) +\int _{ 0}^{T}g(s,X_{ s}^{x},Y _{ s}^{x},Z_{ s}^{x})\mathrm{d}s\Big). }$$
(65)

As in the case of linear PDEs, the assumption of uniform smoothness of v up to T is too strong to include the case of a non-smooth terminal function f. But with an extra ellipticity condition, as for the heat equation, the solution becomes smooth immediately away from T (see [21]) and a similar verification can be carried out under milder conditions.

The above Backward SDE (64) is coupled to a Forward SDE, but the latter is not coupled to the BSDE. Another interesting extension is to allow the coupling in both directions by having the coefficients of X depend on v, i.e. b(x) and σ(x) become functions of x, v(t, x), Dv(t, x). The resulting process is called a Forward Backward Stochastic Differential Equation and is related to Quasi-Linear PDEs, where the operator \(L_{b,\sigma \sigma ^{\top }}^{X}\) also depends on v and Dv, see [56].

5.1.3 Other Existence Results Without PDE Framework

So far, only Markovian BSDEs have been presented, but from the probabilistic point of view, the Markovian structure is not required to define a solution: what is really crucial is the ability to represent a random variable built from (W s : s ≤ T) as a stochastic integral w.r.t. the Brownian motion. This point has been discussed in Corollary 4. Then, in the simple case where g is Lipschitz w.r.t. y, z, (Y, Z) is built by means of a usual fixed point procedure in suitable L 2-norms and of this stochastic integral representation. We now state a more general existence and uniqueness result for BSDEs, which is valid without any underlying (finite-dimensional) semi-linear PDE; we omit the proof.

Theorem 22.

Let T > 0 be fixed and assume the assumptions of Theorem 11 for the existence of X and that

  • The terminal condition ξ = f(X s : s ≤ T) is a square integrable functional of the stochastic process (X s : s ≤ T).

  • The measurable function \(g: [0,T] \times \mathbb{R}^{d} \times \mathbb{R} \times (\mathbb{R} \otimes \mathbb{R}^{d}) \rightarrow \mathbb{R}\) is uniformly Lipschitz in (y,z):

    $$\displaystyle{\vert g(t,x,y_{1},z_{1}) - g(t,x,y_{2},z_{2})\vert \leq C_{g}(\vert y_{1} - y_{2}\vert + \vert z_{1} - z_{2}\vert ),}$$

    uniformly in (t,x).

  • The driver is square integrable at (y,z) = (0,0): \(\mathbb{E}(\int _{0}^{T}g^{2}(t,X_{t},0,0)\mathit{dt}) < +\infty \) .

Then, there exists a unique solution (Y,Z), adapted and in L 2 -spaces, to

$$\displaystyle{ Y _{t} = f(X_{s}: s \leq T) +\int _{ t}^{T}g(s,X_{ s},Y _{s},Z_{s})\mathrm{d}s -\int _{t}^{T}Z_{ s}\mathrm{d}W_{s}. }$$

Much work has been done in the last decade to go beyond the case of a Lipschitz driver, which may be too stringent for some applications. In particular, allowing g to have quadratic growth in Z is particularly interesting in exponential utility maximization problems (the non-linear PDE term is quadratic in | Dv | ). This leads to quadratic BSDEs (see for instance [50]). A simple example of such a BSDE can be cooked up from the heat equation and Brownian motion. Namely, from Corollary 4, for a smooth function f with compact support, set \(u(t,x) = \mathbb{E}(\exp (f(x + W_{t})))\) and \(v(t,y) = u(1 - t,y)\), so that

$$\displaystyle\begin{array}{rcl} \exp (f(W_{1}))& =& u(1,0) +\int _{ 0}^{1}u_{ x}^{{\prime}}(1 - s,W_{ s})\mathrm{d}W_{s}, {}\\ u(1 - t,W_{t})& =& u(1,0) +\int _{ 0}^{t}u_{ x}^{{\prime}}(1 - s,W_{ s})\mathrm{d}W_{s}, {}\\ v(t,W_{t})& =& \exp (f(W_{1})) -\int _{t}^{1}v_{ x}^{{\prime}}(s,W_{ s})\mathrm{d}W_{s}, {}\\ \end{array}$$

and by setting Y t  = log(v(t, W t )) and \(Z_{t} = v_{x}^{{\prime}}(t,W_{t})/v(t,W_{t})\), we obtain

$$\displaystyle\begin{array}{rcl} Y _{t} = f(W_{1}) +\int _{ t}^{1}\frac{1} {2}Z_{s}^{2}\mathrm{d}s -\int _{ t}^{1}Z_{ s}\mathrm{d}W_{s},& & {}\\ \end{array}$$

which is the simplest quadratic BSDE.
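This Cole–Hopf construction can be checked pathwise with a short sketch, taking f(x) = ax (linear, hence not compactly supported, but chosen so that all quantities are in closed form): then \(u(t,x) =\exp (ax + a^{2}t/2)\), \(Y _{t} = aW_{t} + a^{2}(1 - t)/2\) and \(Z_{t} = a\), and the discrete version of the quadratic BSDE holds exactly along any simulated path.

```python
import numpy as np

# Pathwise sanity check of the quadratic BSDE (a sketch: f(x) = a*x is not
# compactly supported, but it gives closed forms).  Here
# u(t, x) = E[exp(a(x + W_t))] = exp(a*x + a^2*t/2), v(t, y) = u(1-t, y),
# Y_t = log v(t, W_t) = a*W_t + a^2*(1-t)/2 and Z_t = v_x'/v = a.
rng = np.random.default_rng(1)
a, N = 0.7, 200
h = 1.0 / N
dW = np.sqrt(h) * rng.standard_normal(N)
W = np.concatenate([[0.0], np.cumsum(dW)])
t = np.linspace(0.0, 1.0, N + 1)

Y = a * W + a**2 * (1.0 - t) / 2.0
Z = a  # the Z-component is constant in this example

# The increments of Y reproduce dY = -(1/2) Z^2 dt + Z dW exactly here.
residual = Y[:-1] - (Y[1:] + 0.5 * Z**2 * h - Z * dW)
print(np.abs(residual).max())   # zero up to floating-point rounding
```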

5.2 Time Discretization and Dynamic Programming Equation

5.2.1 Explicit and Implicit Schemes

To perform the simulation, a first stage may be the derivation of a discretization scheme, written backwardly in time (backward dynamic programming equation). For the subsequent analysis, assume that the terminal condition is of the form ξ = f(X T ) where X is a standard (forward) SDE.

Consider a time grid with N time steps \(\pi =\{ 0 = t_{0} < \cdots < t_{i} < \cdots < t_{N} = T\}\), with possibly non-uniform time steps, and set \(\vert \pi \vert =\max _{i}(t_{i+1} - t_{i})\). We will suppose later that | π | → 0.

We write \(\varDelta _{i} = t_{i+1} - t_{i}\) and \(\varDelta W_{i} = W_{t_{i+1}} - W_{t_{i}}\). Writing the Eq. (64) between times t i and t i+1, we have

$$\displaystyle{Y _{t_{i}} = Y _{t_{i+1}} +\int _{ t_{i}}^{t_{i+1} }g(s,X_{s},Y _{s},Z_{s})\mathrm{d}s -\int _{t_{i}}^{t_{i+1} }Z_{s}\mathrm{d}W_{s}.}$$

Then, by applying simple approximations to the ds and dW s integrals and by replacing X by an Euler scheme computed along the grid π (denoted X π), we may define the discrete BSDE as

$$\displaystyle{(Y _{t_{i}}^{\pi },Z_{t_{i}}^{\pi }) =\mathop{ \mathrm{arg}\min }\limits_{(Y,Z) \in L_{2}(\mathcal{F}_{t_{i}}^{\pi })}\mathbb{E}(Y _{t_{i+1}}^{\pi } +\varDelta _{i}g(t_{i},X_{t_{i}}^{\pi },Y,Z) - Y - Z\varDelta W_{i})^{2}}$$

with the initialization Y T π = f(X T π) at i = N, where \(L_{2}(\mathcal{F}_{t_{i}}^{\pi })\) stands for the set of random variables (of appropriate dimension) that are square integrable and depend on the Brownian increments (Δ W j : j ≤ i − 1). The latter property is the measurability w.r.t. the sigma-field \(\mathcal{F}_{t_{i}}^{\pi }\) generated by (Δ W j : j ≤ i − 1).

Then, a direct computation using the properties of Brownian increments gives

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} Y _{T}^{\pi } = f(X_{ T}^{\pi }), \quad \\ Z_{t_{i}}^{\pi } = \frac{1} {\varDelta _{i}} \mathbb{E}(Y _{t_{i+1}}^{\pi }\varDelta W_{i}^{\top }\vert \mathcal{F}_{ t_{i}}^{\pi }),\quad i < N \quad \\ Y _{t_{i}}^{\pi } = \mathbb{E}(Y _{t_{i+1}}^{\pi } +\varDelta _{i}g(t_{i},X_{t_{i}}^{\pi },Y _{t_{i}}^{\pi },Z_{t_{i}}^{\pi })\vert \mathcal{F}_{t_{i}}^{\pi }),\quad i < N.\quad \end{array} \right. }$$
(66)

This is the implicit scheme, since the arguments of the function on the r.h.s. depend on the quantity \(Y _{t_{i}}^{\pi }\) to be computed on the l.h.s. Nevertheless, since g is uniformly Lipschitz in y, it is not difficult to show that the Dynamic Programming Equation (DPE in short) (66) is well-defined for | π | small enough and that \(Y _{t_{i}}^{\pi }\) can be computed using a Picard iteration procedure.
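As an illustration of this Picard procedure for one implicit step (a sketch with the hypothetical Lipschitz driver g(y) = −ry, for which the fixed point of y = a + Δg(y) is explicit, namely y = a∕(1 + rΔ)):

```python
# Picard iteration for one implicit step y = a + Delta * g(y), sketched with
# the hypothetical Lipschitz driver g(y) = -r*y, whose fixed point is known in
# closed form: y = a / (1 + r*Delta).
def picard_step(a, delta, g, n_iter=50):
    y = a                      # initial guess: the explicit (Euler) value
    for _ in range(n_iter):
        y = a + delta * g(y)   # a contraction as soon as delta * Lip(g) < 1
    return y

r, a, delta = 0.5, 2.0, 0.01
y = picard_step(a, delta, lambda y: -r * y)
print(y, a / (1 + r * delta))  # the two values agree
```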

It is easy to turn the previous scheme into an explicit scheme and therefore to avoid this extra Picard procedure. It reads

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} Y _{T}^{\pi } = f(X_{ T}^{\pi }), \quad \\ Z_{t_{i}}^{\pi } = \frac{1} {\varDelta _{i}} \mathbb{E}(Y _{t_{i+1}}^{\pi }\varDelta W_{i}^{\top }\vert \mathcal{F}_{ t_{i}}^{\pi }),\quad i < N \quad \\ Y _{t_{i}}^{\pi } = \mathbb{E}(Y _{t_{i+1}}^{\pi } +\varDelta _{i}g(t_{i},X_{t_{i}}^{\pi },Y _{t_{i+1}}^{\pi },Z_{t_{i}}^{\pi })\vert \mathcal{F}_{t_{i}}^{\pi }),\quad i < N.\quad \end{array} \right. }$$
(67)

In our personal experience with numerics, we have not observed a significant outperformance of one scheme over the other. Moreover, from the theoretical point of view, both schemes exhibit the same rates of convergence w.r.t. | π | , at least when the driver is Lipschitz.

The explicit scheme is the simplest one, and this is the one that we recommend in practice.

5.2.2 Time Discretization Error

Define the measure of the quadratic error

$$\displaystyle{\mathcal{E}(Y ^{\pi } - Y,Z^{\pi } - Z) =\max _{0\leq i\leq N}\mathbb{E}\vert Y _{t_{i}}^{\pi } - Y _{t_{i}}\vert ^{2} +\sum _{ i=0}^{N-1}\int _{ t_{i}}^{t_{i+1} }\mathbb{E}\vert Z_{t_{i}}^{\pi } - Z_{t}\vert ^{2}\mathrm{d}t.}$$

Although not explicitly mentioned in the previous existence results on BSDEs, this type of norm is appropriate to perform the fixed point argument in the proof of Theorem 22. We now state an error estimate [33], in order to show the convergence of the solution of the DPE to that of the BSDE.

Theorem 23.

For a driver that is Lipschitz w.r.t. (x,y,z) and \(\frac{1} {2}\) -Hölder w.r.t. t, there is a constant C independent of π such that

$$\displaystyle\begin{array}{rcl} \mathcal{E}(Y ^{\pi } - Y,Z^{\pi } - Z)& \leq & C\Big(\vert \pi \vert +\sup _{i\leq N}\mathbb{E}\vert X_{t_{i}}^{\pi } - X_{t_{i}}\vert ^{2} + \mathbb{E}\vert f(X_{ T}^{\pi }) - f(X_{ T})\vert ^{2} {}\\ & & \qquad +\sum _{ i=0}^{N-1}\frac{1} {\varDelta _{i}} \int _{t_{i}}^{t_{i+1} }\int _{t_{i}}^{t_{i+1} }\mathbb{E}\vert Z_{t} - Z_{s}\vert ^{2}\mathrm{d}s\ \mathrm{d}t\Big). {}\\ \end{array}$$

Let us discuss the nature and the magnitude of the different error contributions.

  • First, we face the strong approximation error of the forward SDE by its Euler scheme. Here we rather focus on the convergence of paths (in L 2-norms), whereas in Sect. 4.1.3, we have studied the convergence of expectations of functions of X T π towards those of X T . Anyway, the problem is now well-understood: under a Lipschitz condition on b and σ, we can prove \(\sup _{i\leq N}\mathbb{E}\vert X_{t_{i}}^{\pi } - X_{ t_{i}}\vert ^{2} = O(\vert \pi \vert )\).

  • Second, we should ensure a good strong approximation of the terminal condition: if f is Lipschitz continuous, it readily follows from the previous estimate that \(\mathbb{E}\vert f(X_{T}^{\pi }) - f(X_{T})\vert ^{2} = O(\vert \pi \vert )\). For non-Lipschitz f, there are partial answers, see [3].

  • Finally, the last contribution \(\sum _{i=0}^{N-1}\frac{1} {\varDelta _{i}} \int _{t_{i}}^{t_{i+1}}\int _{t_{ i}}^{t_{i+1}}\mathbb{E}\vert Z_{ t} - Z_{s}\vert ^{2}\mathrm{d}s\ \mathrm{d}t\) is related to the L 2 -regularity of Z (or equivalently of the gradient of the semi-linear PDE solution along the X-path) and it is intrinsic to the BSDE solution. For smooth data, Z has the same regularity as Brownian paths and this error term is O( | π | ). For non-smooth f (but under an ellipticity condition on X), the L 2-norm of Z t blows up as t → T and the rate | π | usually worsens: for instance for f(x) = 1 x ≥ 0, it becomes \(N^{-\frac{1} {2} }\) for a uniform time grid.

    The analysis is very closely related to the fractional smoothness of f(X T ) briefly discussed in Sect. 4.1.5, see also [25]. Choosing an appropriate grid of the form (see Fig. 8)

    $$\displaystyle{t_{k}^{\bar{\theta }} = T - T(1 - k/N)^{1/\bar{\theta }}\quad (\bar{\theta }\in (0,1])}$$

    compensates this blow-up (for an appropriate value of \(\bar{\theta }\)) and makes it possible to recover the rate N −1.

    Fig. 8 On the horizontal axis, the uniform grid; on the vertical axis, the grid \((t_{k}^{\bar{\theta }}: 0 \leq k \leq N)\), with T = 1
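A short sketch of this grid construction (assuming T = 1 and \(\bar{\theta }= 1/2\) for illustration):

```python
import numpy as np

# The refined grid t_k = T - T*(1 - k/N)**(1/theta_bar), which concentrates
# points near the terminal time T where the L2-norm of Z may blow up.
def refined_grid(N, T=1.0, theta_bar=0.5):
    k = np.arange(N + 1)
    return T - T * (1.0 - k / N) ** (1.0 / theta_bar)

grid = refined_grid(N=10, T=1.0, theta_bar=0.5)
steps = np.diff(grid)
print(grid[0], grid[-1])       # 0.0 and T
print(steps)                   # the steps shrink as t_k approaches T
```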

Actually, in [31], it is shown that the upper bounds in Theorem 23 can be refined for smooth data, to finally obtain that the main error comes from the strong approximation error on the forward component. This is an incentive to accurately approximate the SDE in the L 2-sense.

5.2.3 Towards the Resolution of the Dynamic Programming Equation

The effective implementation of the explicit scheme (67) requires the iterative computations of conditional expectations: this is discussed in the next paragraphs.

Prior to this, we make some preliminary simplifications for the sake of conciseness. First, we consider the case of g independent of z,

$$\displaystyle{g(t,x,y,z) = g(t,x,y),}$$

therefore we only approximate Y π; the general case is detailed in [39, 54]. Second, it can be easily seen that it is enough to condition w.r.t. \(X_{t_{i}}^{\pi }\) instead of \(\mathcal{F}_{t_{i}}^{\pi }\), because of the Markov property of X π along the grid π and the independence of the Brownian increments. Thus, (67) becomes

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} Y _{T}^{\pi } = f(X_{ T}^{\pi }), \quad \\ Y _{t_{i}}^{\pi } = \mathbb{E}(Y _{t_{i+1}}^{\pi } +\varDelta _{i}g(t_{i},X_{t_{i}}^{\pi },Y _{t_{i+1}}^{\pi })\vert X_{t_{i}}^{\pi }),\quad i < N.\quad \end{array} \right. }$$
(68)

The same arguments show that \(Y _{t_{i}}^{\pi }\) is a (measurable) deterministic function y i π of \(X_{t_{i}}^{\pi }\):

$$\displaystyle{ y_{i}^{\pi }(X_{ t_{i}}^{\pi }) = Y _{t_{i}}^{\pi }. }$$
(69)

Therefore, simulating Y π is equivalent to the computation of the functions y i π for any i and the simulation of the process X π.

5.3 Approximation of Conditional Expectations Using Least-Squares Method

5.3.1 Empirical Least-Squares Problem

We adopt the point of view of conditional expectation as a projection operator in L 2. This is not the only possible approach, but it has the following advantages (as will be seen later):

  1. 1.

    To be very flexible w.r.t. the knowledge of the model for X (or X π): only independent simulations of X π are required (which are straightforward to perform).

  2. 2.

    To be little demanding on the assumptions on the underlying stochastic model: in particular, no ellipticity or non-degeneracy condition is required; it can also include jumps (corresponding to PDEs with a non-local integro-differential operator).

  3. 3.

    To provide robust theoretical error estimates, which allow one to optimally tune the convergence parameters.

  4. 4.

    To be possibly adaptive to the data (data-driven scheme).

We recall that if a scalar random variable R (called the response) is square integrable, the conditional expectation of R given another, possibly multidimensional, r.v. O (called the observation) is given by

$$\displaystyle{\mathbb{E}(R\vert O) =\mathop{\mathrm{ Arg\ min}}\limits_{m(O)\text{ s.t. m(.) is a meas. funct. with }\mathbb{E}\vert m(O)\vert ^{2} < +\infty }\mathbb{E}\vert R-m(O)\vert ^{2}.}$$

This is a least-squares problem in infinite dimension, also called a regression problem. Usually, in this context of BSDE simulation, none of the distributions of O, R or (O, R) is known in an analytical and tractable form: thus an exact computation of \(\mathbb{E}(R\vert O)\) is hopeless. The difficulty remains unchanged if we approximate the regression function

$$\displaystyle{m(\cdot ) = \mathbb{E}(R\vert O = \cdot )}$$

on a finite-dimensional function basis. Alternatively, we can rely on independent simulations of (O, R) to compute an empirical version of m. This is the approach subsequently developed.

The basis functions are (ϕ k (. ))1 ≤ k ≤ K and we assume that \(\mathbb{E}\vert \phi _{k}(O)\vert ^{2} < +\infty \) for any k. We emphasize that

$$\displaystyle{{\it \text{we cannot assume that}}\ (\phi _{k}(O))_{1\leq k\leq K}\ {\it \text{forms an orthonormal basis in}}\ L_{2},}$$

since in our setting, the distribution of O is not explicit. Using this finite-dimensional approximation, we unfortunately expect to encounter the curse of dimensionality: the larger the dimension d of O, the larger the K required for a good accuracy on m, and the larger the complexity.

We compute the coefficients on the basis by solving an empirical least-squares problem

$$\displaystyle{(\alpha _{k}^{M})_{ k} =\mathop{\mathrm{ arg}\min }\limits_{\alpha \in \mathbb{R}^{K}} \frac{1} {M}\sum _{i=1}^{M}(R_{ i} -\sum _{k=1}^{K}\alpha _{ k}\phi _{k}(O_{i}))^{2},}$$

where \((O_{i},R_{i})_{1\leq i\leq M}\) are independent simulations of the couple (O, R). Then, for the approximation of m, we set

$$\displaystyle{\tilde{m}_{M}(.) =\sum _{ k=1}^{K}\alpha _{ k}^{M}\phi _{ k}(.).}$$

To efficiently compute the coefficients (α k M) k , we might use a singular value decomposition (SVD) to account for instability issues, see [41].
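As an illustration of the empirical least-squares problem (a sketch with hypothetical data: O standard Gaussian, R = O² + noise, and the basis (1, x, x²)), the coefficients can be computed with an SVD-based solver:

```python
import numpy as np

# Empirical least squares on a basis, solved with np.linalg.lstsq (which is
# SVD-based, hence robust to near-collinear basis functions).  Hypothetical
# model: O ~ N(0,1), R = m(O) + eps with m(x) = x^2, basis (1, x, x^2).
rng = np.random.default_rng(2)
M = 100_000
O = rng.standard_normal(M)
R = O**2 + 0.1 * rng.standard_normal(M)          # responses

A = np.column_stack([np.ones(M), O, O**2])       # design matrix [phi_k(O_i)]
alpha, *_ = np.linalg.lstsq(A, R, rcond=None)    # empirical coefficients

def m_tilde(x):                                  # regression-function estimate
    return alpha[0] + alpha[1] * x + alpha[2] * x**2

print(alpha)   # close to (0, 0, 1)
```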

5.3.2 Model-Free Error Estimates

Without extra assumptions on the model, we can derive model-free error estimates, see [42].

Theorem 24.

Assume that

  • \(R = m(O)+\epsilon\) with \(\mathbb{E}(\epsilon \vert O) = 0\).Footnote 21

  • (O 1 ,R 1 ),⋯ ,(O M ,R M ) are independent copies of (O,R).

  • \(\sigma ^{2} =\sup _{x}\mathbb{V}\mathrm{ar}(R\vert O = x) < +\infty \) .

  • Let K be a finite positive integer and Φ be the linear vector space spanned by some functions (ϕ 1 ,…ϕ K ), with dim(Φ) ≤ K. Footnote 22

Denote by μ M the empirical measure associated to (O 1 ,⋯ ,O M ), by μ the probability measure of O and by \(\vert \phi \vert _{M}^{2} = \frac{1} {M}\sum _{i=1}^{M}\phi ^{2}(O_{ i})\) the empirical L 2 -norm of ϕ w.r.t. μ M , and set:

$$\displaystyle{ \tilde{m}_{M}(.) =\mathop{\mathrm{ arg}\min }\limits_{\phi \in \varPhi } \frac{1} {M}\sum _{i=1}^{M}\vert \phi (O_{ i}) - R_{i}\vert ^{2}. }$$
(70)

Then

$$\displaystyle{\mathbb{E}(\vert \tilde{m}_{M} - m\vert _{M}^{2}) \leq \sigma ^{2} \frac{K} {M} + \min _{\phi \in \varPhi }\vert \phi - m\vert _{L_{2}(\mu )}^{2}.}$$

The first term on the r.h.s. above is interpreted as a statistical error Footnote 23 term (due to the finite sample used to compute the empirical coefficients), while the second term is an approximation error of the function class Footnote 24 (due to the finite-dimensional vector space). The first term converges to 0 as M → +∞ but blows up if K → +∞, while the second one converges to 0 as K → +∞ (at least if Φ asymptotically spans all the functions in L 2(μ)). This bias-variance decomposition shows that a trade-off between K and M is necessary to ensure a convergent approximation. Without this right balance, the approximation (70) may not be convergent. Furthermore, the parameter tuning can also be made optimal.

In the quoted reference [42], the space Φ could also depend on the simulations (data-driven approximation spaces).

Proof.

Assume that

$$\displaystyle{ \mathbb{E}\Big(\vert \tilde{m}_{M} - m\vert _{M}^{2}\big\vert O_{ 1},\cdots \,,O_{M}\Big) \leq \sigma ^{2} \frac{K} {M} +\min _{\phi \in \varPhi }\vert \phi - m\vert _{M}^{2}. }$$
(71)

Then, the announced result directly follows by taking expectations and observing that

$$\displaystyle{\mathbb{E}\big(\min _{\phi \in \varPhi }\vert \phi - m\vert _{M}^{2}) \leq \min _{\phi \in \varPhi }\mathbb{E}(\vert \phi - m\vert _{M}^{2}) =\min _{\phi \in \varPhi }\vert \phi - m\vert _{L_{2}(\mu )}^{2}.}$$

We now prove (71). As far as computations conditionally on O 1, ⋯ , O M are concerned, without loss of generality we can assume that \((\phi _{1},\ldots \phi _{K_{M}})\) is an orthonormal family in L 2(μ M), with possibly K M  ≤ K:

$$\displaystyle{ \frac{1} {M}\sum _{i=1}^{M}\phi _{ k}(O_{i})\phi _{l}(O_{i}) =\delta _{k,l}.}$$

Consequently, the solution \(\mathop{\mathrm{arg}\min }\limits_{\phi \in \varPhi } \frac{1} {M}\sum _{i=1}^{M}\vert \phi (O_{ i}) - R_{i}\vert ^{2}\) is given by

$$\displaystyle{\tilde{m}_{M}(.) =\sum _{ j=1}^{K_{M} }\alpha _{j}\phi _{j}(.)\quad \text{ with }\quad \alpha _{j} = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})R_{i}.}$$

Now, set \(\mathbb{E}^{{\ast}}(.) = \mathbb{E}(.\vert O_{1},\cdots \,,O_{M})\). Then, observe that \(\mathbb{E}^{{\ast}}(\tilde{m}_{M}(.))\) is the least-squares solution to \(\mathop{\min }\limits_{\phi \in \varPhi } \frac{1} {M}\sum _{i=1}^{M}\vert \phi (O_{ i}) - m(O_{i})\vert ^{2} =\mathop{\min }\limits_{\phi \in \varPhi }\vert \phi - m\vert _{ M}^{2}\). Indeed:

  • On the one hand, the above least-squares solution is given by \(\sum _{j=1}^{K_{M}}\alpha _{j}^{{\ast}}\phi _{j}(.)\) with \(\alpha _{j}^{{\ast}} = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})m(O_{i})\).

  • On the other hand, \(\mathbb{E}^{{\ast}}(\tilde{m}_{M}(.)) =\sum _{ j=1}^{K_{M}}\mathbb{E}^{{\ast}}(\alpha _{j})\phi _{j}(.)\) and \(\mathbb{E}^{{\ast}}(\alpha _{j}) = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})\mathbb{E}^{{\ast}}(R_{ i}) = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})\mathbb{E}(m(O_{i}) +\epsilon _{i}\vert O_{1},\cdots \,,O_{M}) =\alpha _{ j}^{{\ast}}\).

Thus, by the Pythagoras theorem, we obtain

$$\displaystyle\begin{array}{rcl} \vert \tilde{m}_{M} - m\vert _{M}^{2}& =& \vert \tilde{m}_{ M} - \mathbb{E}^{{\ast}}(\tilde{m}_{ M})\vert _{M}^{2} + \vert \mathbb{E}^{{\ast}}(\tilde{m}_{ M}) - m\vert _{M}^{2}, {}\\ \mathbb{E}^{{\ast}}\vert \tilde{m}_{ M} - m\vert _{M}^{2}& =& \mathbb{E}^{{\ast}}\vert \tilde{m}_{ M} - \mathbb{E}^{{\ast}}(\tilde{m}_{ M})\vert _{M}^{2} + \vert \mathbb{E}^{{\ast}}(\tilde{m}_{ M}) - m\vert _{M}^{2} {}\\ & =& \mathbb{E}^{{\ast}}\vert \tilde{m}_{ M} - \mathbb{E}^{{\ast}}(\tilde{m}_{ M})\vert _{M}^{2}+\mathop{\min }\limits_{\phi \in \varPhi }\vert \phi - m\vert _{ M}^{2}. {}\\ \end{array}$$

Since (ϕ j ) j is orthonormal in L 2(μ M ), we have \(\vert \tilde{m}_{M} - \mathbb{E}^{{\ast}}(\tilde{m}_{M})\vert _{M}^{2} =\sum _{ j=1}^{K_{M}}\vert \alpha _{j} - \mathbb{E}^{{\ast}}(\alpha _{j})\vert ^{2}.\) Since \(\alpha _{j} - \mathbb{E}^{{\ast}}(\alpha _{j}) = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})(R_{i} - m(O_{i}))\), we obtain

$$\displaystyle\begin{array}{rcl} \mathbb{E}^{{\ast}}\vert \tilde{m}_{ M}\,-\,\mathbb{E}^{{\ast}}(\tilde{m}_{ M})\vert _{M}^{2}& =& \sum _{ j=1}^{K_{M} } \frac{1} {M^{2}}\mathbb{E}^{{\ast}}\sum _{ i,l=1}^{M}\phi _{ j}(O_{i})\phi _{j}(O_{l})(R_{i}\,-\,m(O_{i}))(R_{l}\,-\,m(O_{l})) {}\\ & =& \sum _{j=1}^{K_{M} } \frac{1} {M^{2}}\sum _{i=1}^{M}\phi _{ j}^{2}(O_{ i})\mathbb{V}\mathrm{ar}(R_{i}\vert O_{i}) {}\\ \end{array}$$

taking advantage of the fact that the (ε i ) i , conditionally on (O 1, ⋯ ,O M ), are centered. This proves

$$\displaystyle{\mathbb{E}^{{\ast}}\vert \tilde{m}_{ M} - \mathbb{E}^{{\ast}}(\tilde{m}_{ M})\vert _{M}^{2} \leq \sigma ^{2}\sum _{ j=1}^{K_{M} } \frac{1} {M^{2}}\sum _{i=1}^{M}\phi _{ j}^{2}(O_{ i}) =\sigma ^{2}\frac{K_{M}} {M} \leq \sigma ^{2} \frac{K} {M}.}$$

The proof of (71) is complete. □ 

5.3.3 Least-Squares Method for Solving Discrete BSDE

We now apply the previous empirical least-squares method to numerically solve the DPE (68). For simplicity of exposition, we consider here only uniform time grids with N time steps, \(\varDelta _{i} = T/N\). In addition to the assumptions of Theorem 23, we assume that the terminal condition f(. ) is bounded: then, we can easily establish the following result.

Proposition 20.

Under these assumptions, the function y i π (.) defined in (69) is bounded by a constant C ⋆ , which is independent of N and i.

Actually, C ⋆ can be given explicitly in terms of the data. To enforce stability in the iterative computations of conditional expectations (68), we truncate the numerical solution at the level C ⋆ using the soft thresholding

$$\displaystyle{[\psi ]_{C_{\star }}:= -C_{\star } \vee \psi \wedge C_{\star }.}$$

Algorithm for Approximating y k π (⋅). At each time index 0 ≤ k ≤ N − 1, we consider a vector space Φ k spanned by basis functions p k (⋅ ), which are understood as vectors of K k functions. The final approximation of y k π(⋅ ) has the form

$$\displaystyle\begin{array}{rcl} y_{k}^{\pi,M}(\cdot ) = [\alpha _{ k}^{M} \cdot p_{ k}(\cdot )]_{C_{\star }}.& & {}\\ \end{array}$$

The coefficients α k M are computed with M independent simulations of \((X_{t_{k}}^{\pi })_{ k}\), denoted by \(\{(X_{t_{k}}^{\pi,m})_{ k}\}_{1\leq m\leq M}\): this single set of simulated paths is used to compute all the coefficients at once. This is done as follows:

  • Initialization: for k = N, take y N π(⋅ ) = f(⋅ ).

  • Iteration: for \(k = N - 1,\cdots \,,0\), solve the least-squares problem

    $$\displaystyle{\alpha _{k}^{M} =\mathop{\mathrm{ arg}\min }\limits_{\alpha \in \mathbb{R}^{K_{k} }}\sum _{m=1}^{M}\vert y_{ k+1}^{\pi,M}(X_{ t_{k+1}}^{\pi,m})+\varDelta _{ k}g(t_{k},X_{t_{k}}^{\pi,m},y_{ k+1}^{\pi,M}(X_{ t_{k+1}}^{\pi,m}))-\alpha \cdot p_{ k}(X_{t_{k}}^{\pi,m})\vert ^{2}}$$

    and define \(y_{k}^{\pi,M}(\cdot ) = [\alpha _{k}^{M} \cdot p_{k}(\cdot )]_{C_{\star }}\).
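The algorithm above can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation: we assume the toy data X = W in dimension 1, driver g(t,x,y) = −ry and terminal condition f(x) = cos(x), for which the exact value \(Y _{0} = e^{-(r+1/2)T}\) is available, and the basis functions are chosen by hand):

```python
import numpy as np

# Sketch of the regression-based DPE solver (68) on assumed toy data:
# X = W (b = 0, sigma = 1), T = 1, g(t, x, y) = -r*y, f(x) = cos(x).
# Exact solution: y(t, x) = exp(-(r + 1/2)(T - t)) * cos(x).
rng = np.random.default_rng(3)
r, T, N, M = 0.1, 1.0, 20, 50_000
h = T / N
C_star = 1.0                                     # a priori bound (|cos| <= 1)

# Simulate M paths on the uniform grid (the Euler scheme is exact here).
X = np.zeros((N + 1, M))
for i in range(N):
    X[i + 1] = X[i] + np.sqrt(h) * rng.standard_normal(M)

def basis(x):                                    # hand-picked basis p_k
    return np.column_stack([np.ones_like(x), x, np.cos(x), np.sin(x)])

y_next = np.cos(X[N])                            # initialization y_N = f
for k in range(N - 1, 0, -1):
    response = y_next + h * (-r * y_next)        # explicit scheme, g = -r*y
    A = basis(X[k])
    alpha, *_ = np.linalg.lstsq(A, response, rcond=None)
    y_next = np.clip(A @ alpha, -C_star, C_star)  # truncation [.]_{C_star}
# At k = 0 all paths start at X_0 = 0, so the regression reduces to a mean.
Y0 = np.mean(y_next + h * (-r * y_next))

print(Y0, np.exp(-(r + 0.5) * T))   # Monte Carlo estimate vs exact value
```

Since cos(x) lies in the span of the chosen basis, the approximation error of the function class essentially vanishes here and the remaining error is statistical and time-discretization error.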

Error Analysis. We now turn to the error estimates. The analysis combines BSDE techniques (a priori estimates using stochastic calculus) with regression tools such as those exposed in Sect. 5.3.2, but there is a slight difference which actually requires a significant improvement in the arguments. Since we use a single set of independent paths, the “responses” \((y_{k+1}^{\pi,M}(X_{t_{k+1}}^{\pi,m}))_{ 1\leq m\leq M}\) are not independent, because of their dependence through the function y k+1 π, M. To overcome this interdependence issue in the proof, we replace the random function y k+1 π, M by a deterministic neighbor: of course, there is a complexity cost to cover the different function spaces in order to provide close neighbors, and the covering numbers are well controlled using the Vapnik–Chervonenkis dimension when the function spaces are bounded (Proposition 20). This is the technical reason why we consider bounded functions. We now state a result regarding the global error, see [30, Theorem VIII.3.4] for full details.

Theorem 25.

Under the previous notations and assumptions, there is a constant C > 0 (independent of N) such that

$$\displaystyle\begin{array}{rcl} \max _{0\leq k\leq N}\mathbb{E}\vert Y _{t_{k}}^{\pi }-y_{k}^{\pi,M}(X_{ t_{k}}^{\pi })\vert ^{2}& \leq & C\sum _{ k=0}^{N-1}\bigg\{N\mathop{\underbrace{ \frac{K_{k}} {M} }}\limits _{\text{statistical error}}+\mathop{\underbrace{\min _{\phi \in \varPhi _{k}}\mathbb{E}\vert y_{k}^{\pi }(X_{t_{k}}^{\pi }) -\phi (X_{ t_{k}}^{\pi })\vert ^{2}}}\limits _{ \text{approximation error of functions class}}\bigg\} {}\\ & & +C\max _{0\leq k\leq N}\mathop{\underbrace{ \sqrt{\frac{K_{k } \log (M)} {M}} }}\limits _{\text{interdependence error}}. {}\\ \end{array}$$

When the Z-component has to be approximated as well, the estimates are slightly modified, see [54] for details.

Parameter Tuning. We conclude this analysis by providing an example of how to choose the parameters N, K k and M appropriately. Assume that the value function y π is Lipschitz continuous, uniformly in N (which usually follows from a Lipschitz terminal condition). Our objective is to achieve a global error of order \(\varepsilon = \frac{1} {N}\) for \(\max _{0\leq k\leq N}\mathbb{E}\vert Y _{t_{k}}^{\pi } - y_{ k}^{\pi,M}(X_{ t_{k}}^{\pi })\vert ^{2}\), i.e. the same error magnitude as the time-discretization error.

For the vector spaces Φ k , we consider those generated by functions that are constant on disjoint hypercubes of small edge. Since X π has exponential moments, we can restrict the partitioning to a compact set of \(\mathbb{R}^{d}\) with size clog(N) in any direction, and the induced error is smaller than N −1 provided that c is large enough. If the edge of the hypercubes is of order N −1, the vector spaces have dimension K k  ∼ N d up to logarithmic factors: then, the terms coming from the approximation error of the function class are O(N −2) and they sum up to a contribution O(N −1) as required. A quick inspection of the upper bounds of Theorem 25 shows that the strongest constraint on M comes from the statistical error: we obtain M ∼ cN 3+d, up to logarithmic terms. The complexity of the scheme is of order NM (still neglecting the log terms), because the computation of all regression coefficients at a given date has a computational cost O(Mlog(N)) due to our specific choice of function basis. Hence, the global complexity is

$$\displaystyle{\mathcal{C}\sim \varepsilon ^{-(4+d)}}$$

up to logarithmic terms. Not surprisingly, the convergence order deteriorates as the dimension increases: this is the curse of dimensionality. Had the value function been smoother, we could have used local polynomials and the convergence order would have been improved: the smoother the functions, the better the convergence rate.

In practice, the algorithm has been run up to dimension d = 10 with satisfactory results and rather short computational times (less than 1 min). There are several possible improvements to this basic version of the algorithm.

  • We can use variance reduction techniques, see [8, 9].

  • Instead of writing the DPE between t i and t i+1, it can be written between t i and T: this has the surprising effect (mathematically justified) of reducing the propagation of errors in the DPE. This scheme is called the MDP scheme (for Multi-step forward Dynamic Programming equation) and it is studied in [39].

Complementary References. For theoretical aspects, see [16, 56, 59, 61]; for applications, see [17, 18]; for numerics, see [5, 7, 11, 14, 32, 40, 54, 69]. This list is not exhaustive.