Abstract
I give a pedagogical introduction to Brownian motion and the stochastic calculus introduced by Itô in the fifties, following the elementary (at least not too technical) approach by Föllmer [Seminar on Probability, XV (Univ. Strasbourg, Strasbourg, 1979/1980) (French), pp. 143–150. Springer, Berlin, 1981]. Based on this, I develop the connection with linear and semi-linear parabolic PDEs. Then, I provide and analyze some Monte Carlo methods to approximate the solution to these PDEs. This course is aimed at master students, Ph.D. students and researchers interested in the connection of stochastic processes with PDEs and their numerical counterpart. The reader is supposed to be familiar with basic concepts of probability (say the first chapters of the book Probability Essentials by Jacod and Protter [Probability Essentials, 2nd edn. Springer, Berlin, 2003]), but no a priori knowledge of martingales and stochastic processes is required.
1 The Brownian Motion and Related Processes
1.1 A Brief History of Brownian Motion
Historically, the Brownian motion (BM in short) is associated with the analysis of motions whose time evolution is so disordered that it seems difficult to forecast, even over a very short time interval. It plays a central role in the theory of random processes, because in many theoretical and applied problems, the Brownian motion (or the diffusion processes built from it) provides simple limit models on which many calculations can be made.
In 1827, the English botanist Robert Brown (1773–1858) first described the erratic motion of fine organic particles in suspension in a gas or a fluid. During the nineteenth century, several physicists admitted that this motion is very irregular and does not seem to admit a tangent; thus one could not speak of its speed, nor apply the laws of mechanics to it! In 1900 [4], Louis Bachelier (1870–1946) introduced the Brownian motion to model the dynamics of stock prices, but his approach was then forgotten until the sixties… His Ph.D. thesis, Théorie de la spéculation, is the starting point of modern finance.
But physics, at the beginning of the twentieth century, is the field at the origin of the great interest in this process. In 1905, Albert Einstein (1879–1955) built a probabilistic model to describe the motion of a diffusive particle: he found that the law of the particle position at time t, given the initial state x, admits a density which satisfies the heat equation, and actually it is Gaussian. His theory was quickly confirmed by experimental measurements, yielding satisfactory diffusion constants. The same year as Einstein, a discrete version of the Brownian motion was proposed by the Polish physicist Smoluchowski using random walks.
In 1923, Norbert Wiener (1894–1964) rigorously constructed the random function that is called Brownian motion; he established in particular that its trajectories are continuous. Around 1930, following an idea of Paul Langevin, Ornstein and Uhlenbeck studied the Gaussian random function which bears their name and which appears as the stationary, mean-reverting counterpart of the Brownian motion.
It is the beginning of a very active theoretical research in mathematics. Paul Lévy (1886–1971) then discovered, with other mathematicians, many properties of the Brownian motion [55] and introduced a first form of stochastic differential equations, whose study was later systematized by K. Itô (1915–2008). His work is gathered in a famous treatise published in 1948 [44], which is usually referred to as Itô stochastic calculus.
But history sometimes takes incredible turns. Indeed, in 2000, the French Academy of Science opened a manuscript that had remained sealed since 1940, written by the young mathematician Doeblin (1915–1940), a French telegraphist who died during the German offensive. Doeblin was already known for his remarkable achievements in the theory of probability, due to his works on stable laws and Markov processes. This sealed manuscript gathered his recent research, written between November 1939 and February 1940: it was actually related to his discovery (before Itô) of stochastic differential equations and their relations with the Kolmogorov partial differential equations. Perhaps the Itô stochastic calculus could have been called Doeblin stochastic calculus…
1.2 The Brownian Motion and Its Paths
In the following, we study the basic properties of the Brownian motion and its paths.
1.2.1 Definition and Existence
The very erratic path which is a specific feature of the Brownian motion is in general associated with the observation that the phenomenon, although very disordered, has a certain time homogeneity, i.e. the origin date does not matter in the description of the time evolution. These properties underlie the next definition.
Definition 1 (of Standard Brownian Motion).
A standard Brownian motion is a random process {W t ; t ≥ 0} with continuous paths, such that
-
W 0 = 0.
-
The time increment \(W_{t} - W_{s}\) with 0 ≤ s < t has the Gaussian law,Footnote 1 with zero mean and variance equal to t − s.
-
For any \(0 = t_{0} < t_{1} < t_{2} < \cdots < t_{n}\), the increments \(\{W_{t_{i+1}} - W_{t_{i}};0 \leq i \leq n - 1\}\) are independentFootnote 2 random variables.
There are important remarks following from the definition.
-
1.
The state W t of the system at time t is distributed as a Gaussian r.v. with mean 0 and variance t (increasing as time gets larger). Its probability density is
$$\displaystyle{ \mathbb{P}(W_{t} \in [x,x + \mathrm{d}x]) = g(t,x)\mathrm{d}x = \frac{1} {\sqrt{2\pi t}}\exp (-x^{2}/2t)\mathrm{d}x. }$$(1)
-
2.
With probability 95 %, we have \(\vert W_{t}\vert \leq 1.96\sqrt{t}\) (see Fig. 1) for a given time t. However, it may occur that W goes outside this confidence interval.
-
3.
The random variable W t , as the sum of its increments, can be decomposed into a sum of independent Gaussian r.v.: this property serves as a basis for the stochastic calculus developed below.
Theorem 1.
The Brownian motion exists!
Proof.
There are different constructive ways to prove the existence of Brownian motion. Here, we use a Fourier-based approach (proposed by Wiener), showing that W can be represented as a superposition of Gaussian signals. Also, we use an equivalent characterization of Brownian motion as a Gaussian processFootnote 3 with zero mean and covariance function \(\mathbb{C}\mathrm{ov}(W_{t},W_{s}) =\min (s,t) = s \wedge t\).
Let (G m ) m ≥ 0 be a sequence of independent Gaussian r.v. with zero mean and unit variance and set
$$\displaystyle{ W_{t} = \frac{t} {\sqrt{\pi }}G_{0} + \sqrt{ \frac{2} {\pi }}\sum _{m\geq 1}G_{m}\frac{\sin (\mathit{mt})} {m},\qquad t \in [0,\pi ]. }$$
We now show that W is a Brownian motion on [0, π]; then it is enough to concatenate and sum up such independent processes to finally get a Brownian motion defined on \(\mathbb{R}^{+}\). We sketch the proof of our statement on W. First, the series is a.s.Footnote 4 convergent since the partial sums form a Cauchy sequence in L 2: indeed, thanks to the independence of the Gaussian random variables, we have
$$\displaystyle{ \mathbb{E}\Big[\Big(\sqrt{ \frac{2} {\pi }}\sum _{m=M}^{N}G_{m}\frac{\sin (\mathit{mt})} {m} \Big)^{2}\Big] = \frac{2} {\pi }\sum _{m=M}^{N}\frac{\sin ^{2}(\mathit{mt})} {m^{2}} \leq \frac{2} {\pi }\sum _{m\geq M} \frac{1} {m^{2}}\mathop{\longrightarrow }_{M\rightarrow +\infty }0. }$$
The partial sums have a Gaussian distribution, thus the a.s. limitFootnote 5 too. The same argument gives that W is a Gaussian process. It has zero mean and its covariance is the limit of the covariance of the partial sums: thus
$$\displaystyle{ \mathbb{C}\mathrm{ov}(W_{s},W_{t}) = \frac{\mathit{st}} {\pi } + \frac{2} {\pi }\sum _{m\geq 1}\frac{\sin (\mathit{ms})\sin (\mathit{mt})} {m^{2}}. }$$
The above series is equal to min(s, t) for (s, t) ∈ [0, π]2, by a standard computation of the Fourier coefficients of the function \(t \in [-\pi,\pi ]\mapsto \min (s,t)\) (for s fixed). The proof of continuity of W is based on the uniform convergence of the function series along some appropriate subsequences, which we do not detail (see [45, pp. 21–22]). □
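This construction is easy to test numerically. Below is a minimal sketch (Python with NumPy; the helper name `wiener_partial_sum` and the truncation level are ours), sampling a truncated version of the series used in the proof, assuming the classical Fourier expansion \(W_{t} = \frac{t}{\sqrt{\pi }}G_{0} + \sqrt{2/\pi }\sum _{m\geq 1}G_{m}\sin (mt)/m\) on [0, π].

```python
import numpy as np

def wiener_partial_sum(t, n_terms, rng):
    # Truncation of the Fourier construction on [0, pi]:
    # W_t ~ (t/sqrt(pi)) G_0 + sqrt(2/pi) * sum_{m=1}^{n_terms} G_m sin(m t)/m
    g = rng.standard_normal(n_terms + 1)            # i.i.d. N(0, 1) coefficients
    m = np.arange(1, n_terms + 1)
    sines = np.sin(np.outer(m, t)) / m[:, None]     # shape (n_terms, len(t))
    return t * g[0] / np.sqrt(np.pi) + np.sqrt(2.0 / np.pi) * (g[1:] @ sines)

rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 201)
w = wiener_partial_sum(t, n_terms=1000, rng=rng)    # one approximate path
```

Increasing `n_terms` refines the path; the truncation mainly affects the small-scale roughness of the trajectory.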
In many applications, it is useful to consider non-standard Brownian motions.
Definition 2 (of Arithmetic Brownian Motion).
An arithmetic Brownian motion (ABM in short) is a random process {X t ; t ≥ 0} where \(X_{t} = x_{0} + bt +\sigma W_{t}\) and
-
W is a standard Brownian motion.
-
\(x_{0} \in \mathbb{R}\) is the starting value of X.
-
\(b \in \mathbb{R}\) is the drift parameter.
-
\(\sigma \in \mathbb{R}\) is the diffusion parameter.
Usually, σ can be taken non-negative due to the symmetry of Brownian motion (see Proposition 1). X is still a Gaussian process, whose position X t at time t is distributed as \(\mathcal{N}(x_{0} + bt,\sigma ^{2}t)\) (Fig. 2).
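Since an ABM is Gaussian with independent increments, its exact simulation on a time grid only requires cumulative sums of independent Gaussian variables. A small illustrative sketch (Python/NumPy; the helper name and parameter values are ours):

```python
import numpy as np

def abm_path(x0, b, sigma, T, n, rng):
    # X_{t_i} = x0 + b t_i + sigma W_{t_i} on the grid t_i = i T/n, where W is
    # built from independent N(0, T/n) increments via cumulative sums.
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    W = np.concatenate(([0.0], np.cumsum(dW)))
    t = np.linspace(0.0, T, n + 1)
    return t, x0 + b * t + sigma * W

rng = np.random.default_rng(42)
t, X = abm_path(x0=1.0, b=0.5, sigma=0.2, T=2.0, n=1000, rng=rng)
```

This sampling is exact in law at the grid points: no discretization error is introduced, since the increments of X are themselves Gaussian.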
1.2.2 First Easy Properties of the Brownian Path
Proposition 1.
Let \(\{W_{t};t \in \mathbb{R}^{+}\}\) be a standard Brownian motion.
-
i)
Symmetry property: \(\{-W_{t};t \in \mathbb{R}^{+}\}\) is a standard Brownian motion.
-
ii)
Scaling property: for any c > 0, \(\{W_{t}^{c};t \in \mathbb{R}^{+}\}\) is a standard Brownian motion where
$$\displaystyle{ W_{t}^{c} = c^{-1}W_{ c^{2}t}. }$$(2)
-
iii)
Time reversal: for any fixed T, \(\hat{W}_{t}^{T} = W_{T} - W_{T-t}\) defines a standard Brownian motion on [0,T].
-
iv)
Time inversion: \(\{\hat{W}_{t} = \mathit{tW }_{1/t},t > 0,\quad \hat{W}_{0} = 0\}\) is a standard Brownian motion.
The scaling property is important and illustrates the fractal feature of Brownian motion path: \(\varepsilon\) times W t behaves like a Brownian motion at time \(\varepsilon ^{2}t\).
Proof.
It is a direct verification of the Brownian motion definition, related to independent, stationary and Gaussian increments. The continuity is also easy to verify, except for the case iv) at time 0. For this, we use that \(\lim \limits _{t\rightarrow 0^{+}}\mathit{tW }_{1/t} =\lim \limits _{s\rightarrow +\infty }\frac{W_{s}} {s} = 0\), see Proposition 7. □
1.3 Time-Shift Invariance and Markov Property
Previously, we have studied simple spatial transformation of Brownian motion. We now consider time-shifts, by first considering deterministic shifts.
Proposition 2 (Invariance by a Deterministic Time-Shift).
The Brownian Motion shifted by h ≥ 0, given by \(\{\bar{W}_{t}^{h} = W_{t+h} - W_{h};t \in \mathbb{R}^{+}\}\) , is another Brownian motion, independent of the Brownian Motion stopped at h, {W s ;s ≤ h}.
In other words, \(\{W_{t+h} = W_{h} +\bar{ W}_{t}^{h};t \in \mathbb{R}^{+}\}\) is a Brownian motion starting from W h . The above property is associated with the weak Markov property, which states (for Brownian motion, and possibly for other processes) that the distribution of W after h, conditionally on the past up to time h, depends only on the present value W h .
Proof.
The Gaussian property of \(\bar{W}^{h}\) is clear.
The independent increments of W induce those of \(\bar{W}^{h}\).
It remains to show the independence w.r.t. the past up to h, i.e. the sigma-field generated by {W s ; s ≤ h}, or equivalently w.r.t. the sigma-field generated by \(\{W_{s_{1}},\ldots W_{s_{N}}\}\) for any \(0 \leq s_{1} \leq \ldots \leq s_{N} \leq h\). The independence of increments of W ensures that \((\bar{W}_{t_{1}}^{h},\bar{W}_{t_{2}}^{h}-\bar{W}_{t_{1}}^{h},\cdots \,,\bar{W}_{t_{k}}^{h}-\bar{W}_{t_{k-1}}^{h}) = (W_{t_{1}+h}-W_{h},\cdots \,,W_{t_{k}+h}-W_{t_{k-1}+h})\) is independent of \((W_{s_{1}},W_{s_{2}} - W_{s_{1}},\cdots \,,W_{s_{j}} - W_{s_{j-1}})\). Then \((\bar{W}_{t_{1}}^{h},\bar{W}_{t_{2}}^{h},\cdots \,,\bar{W}_{t_{k}}^{h})\) is independent of {W s ; s ≤ h}. □
As a consequence, we can derive a nice symmetry result making the connection between the maximum of Brownian motion monitored along a finite time grid \(t_{0} = 0 < t_{1} < \cdots < t_{N} = T\) and that of W T only.
Proposition 3.
For any y ≥ 0, we have
$$\displaystyle{ \mathbb{P}\big[\sup _{i\leq N}W_{t_{i}} \geq y\big] \leq 2\,\mathbb{P}[W_{T} \geq y] = \mathbb{P}[\vert W_{T}\vert \geq y]. }$$(3)
Proof.
The equality at the r.h.s. comes from the symmetric distribution of W T . Now we show the inequality on the left. Denote by t y ∗ the first time t j when W reaches the level y. Notice that \(\{\sup _{i\leq N}W_{t_{i}} \geq y\} =\{ t_{y}^{{\ast}}\leq T\}\) and \(\{t_{y}^{{\ast}} = t_{j}\} =\{ W_{t_{i}} < y,\forall i < j,W_{t_{j}} \geq y\}\). For each j < N, the symmetry of Brownian increments gives \(\mathbb{P}[W_{T} - W_{t_{j}} \geq 0] = \frac{1} {2}\). Since the shifted Brownian motion \((\bar{W}_{t}^{t_{j}} = W_{t_{j}+t} - W_{t_{j}}: t \in \mathbb{R}^{+})\) is independent of (W s : s ≤ t j ), we have
At the two last lines, we have used \(\{W_{t_{j}} \geq y,W_{T} - W_{t_{j}} \geq 0\} \subset \{ W_{t_{j}} \geq y,W_{T} \geq y\}\) and \(\{W_{T} \geq y\} \subset \{ t_{y}^{{\ast}}\leq T\}\). □
Taking a grid with time step T∕N with N → +∞, we have \(\sup _{i\leq N}W_{t_{i}} \uparrow \sup _{0\leq t\leq T}W_{t}\). Then, we can pass to the limit (up to some probabilistic convergence technicalities) in the inequality (3) to get
$$\displaystyle{ \mathbb{P}\big[\sup _{0\leq t\leq T}W_{t} \geq y\big] \leq \mathbb{P}[\vert W_{T}\vert \geq y]. }$$(4)
Actually, the inequality (4) is an equality: it is proved later in Proposition 5.
Now, our aim is to extend Proposition 2 to the case of stochastic time-shifts h. Without extra assumptions on h, the result is false in general: a counter-example is the last passage time of W at zero before time 1 (\(L =\sup \{ t \leq 1: W_{t} = 0\}\)), which does not satisfy the property. Indeed, since \((W_{s+L} - W_{L})_{s\geq 0}\) does not vanish a.s. for small s > 0 (due to the definition of L), the marginal distribution cannot be Gaussian and the time-shifted process cannot be a Brownian motion.
The right class for extension is the class of stopping times, defined as follows.
Definition 3 (Stopping Time).
A stopping time is a non-negative random variable U (possibly taking the value + ∞), such that for any t ≥ 0, the event {U ≤ t} depends only on the Brownian motion values {W s ; s ≤ t}.
The stopping time is discrete if it takes only a countable set of values (u 1, ⋯ , u n , ⋯ ).
In other words, it suffices to observe the Brownian motion until time t to know whether or not the event {U ≤ t} occurs. Of course, deterministic times are stopping times. A more interesting example is the first hitting time of a level y > 0,
$$\displaystyle{ T_{y} =\inf \{ t \geq 0: W_{t} = y\} }$$
(with the convention \(\inf \emptyset = +\infty \)):
it is a stopping time, since \(\{T_{y} \leq t\} =\{ \exists s \leq t,W_{s} = y\}\) owing to the continuity of W. Observe that the counter-example of last passage time L is not a stopping time.
Proposition 4.
Let U be a stopping time. On the event {U < +∞}, the Brownian motion shifted by U ≥ 0, i.e. \(\{\bar{W}_{t}^{U} = W_{t+U} - W_{U};t \in \mathbb{R}^{+}\}\) , is a Brownian motion independent of {W t ;t ≤ U}.
This result is usually referred to as the strong Markov property.
Proof.
We show that for any 0 ≤ t 1 < ⋯ < t k , any 0 ≤ s 1 < ⋯ < s l , any (x 1, ⋯ , x k ) and any measurable sets (B 1, ⋯ , B l−1), we have
where W ′ is a Brownian motion independent of W. We begin with the easier case where U is a discrete stopping time valued in (u n ) n ≥ 1: then
applying at the last equality but one the time-shift invariance with deterministic shift u n . For the general case for U, we apply the result to the discrete stopping time \(U_{n} = \frac{[\mathit{nU}]+1} {n}\), and then pass to the limit using the continuity of W. □
1.4 Maximum, Behavior at Infinity, Path Regularity
We apply the strong Markov property to identify the law of the Brownian motion maximum.
Proposition 5 (Symmetry Principle).
For any y ≥ 0 and any x ≤ y, we have
$$\displaystyle{ \mathbb{P}\big[\sup _{t\leq T}W_{t} \geq y,W_{T} \leq x\big] = \mathbb{P}[W_{T} \geq 2y - x] }$$(6)
and
$$\displaystyle{ \mathbb{P}\big[\sup _{t\leq T}W_{t} \geq y\big] = \mathbb{P}[\vert W_{T}\vert \geq y]. }$$(7)
Proof.
Set \(T_{y} =\inf \{ t > 0: W_{t} \geq y\}\) , with T y = + ∞ if the set is empty. Observe that T y is a stopping time and that \(\{\sup _{t\leq T}W_{t} \geq y;W_{T} \leq x\} =\{ T_{y} \leq T;W_{T} \leq x\}\). By Proposition 4, on {T y ≤ T}, \((W_{T_{y}+t} =\bar{ W}_{t}^{T_{y}} + y: t \in \mathbb{R}^{+})\) is a Brownian motion starting from y, independent of (W s : s ≤ T y ). By symmetry (see Fig. 3), the events {T y ≤ T, W T < x} and {T y ≤ T, W T > 2y − x} have the same probability. But for x ≤ y, we have \(\{T_{y} \leq T,W_{T} > 2y - x\} =\{ W_{T} > 2y - x\}\) and the first result is proved.
For the second result, take y = x and write \(\mathbb{P}[\sup _{t\leq T}W_{t} \geq y] = \mathbb{P}[\sup _{t\leq T}W_{t} \geq y,W_{T} > y]+\mathbb{P}[\sup _{t\leq T}W_{t} \geq y,W_{T} \leq y] = \mathbb{P}[W_{T} > y]+\mathbb{P}[W_{T} \geq y] = 2\mathbb{P}(W_{T} \geq y) = \mathbb{P}(\vert W_{T}\vert \geq y)\). □
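The second identity of Proposition 5 lends itself to a Monte Carlo sanity check. In the sketch below (Python/NumPy; all numerical values are ours), the supremum is approximated by a maximum over a fine grid, which slightly underestimates the true supremum for any finite number of steps:

```python
import numpy as np

# Check P[sup_{t<=T} W_t >= y] = P[|W_T| >= y] by simulation.
rng = np.random.default_rng(7)
T, y, steps, paths = 1.0, 1.0, 500, 10000
dW = rng.normal(0.0, np.sqrt(T / steps), size=(paths, steps))
W = np.cumsum(dW, axis=1)                # discrete Brownian paths
lhs = np.mean(W.max(axis=1) >= y)        # P[max over the grid >= y]
rhs = np.mean(np.abs(W[:, -1]) >= y)     # P[|W_T| >= y]
```

Both estimates should be close to \(\mathbb{P}(\vert W_{1}\vert \geq 1) \approx 0.317\), with the grid-maximum estimate biased slightly low.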
As a consequence of the identification of the law of the maximum up to a fixed time, we prove that the range of Brownian motion becomes \(\mathbb{R}\) as time goes to infinity.
Proposition 6.
With probability 1, we have
$$\displaystyle{ \limsup _{t\rightarrow +\infty }W_{t} = +\infty \quad \text{and}\quad \liminf _{t\rightarrow +\infty }W_{t} = -\infty. }$$
Proof.
For T ≥ 0, set \(M_{T} =\sup _{t\leq T}W_{t}\). As T ↑ +∞, this defines an increasing sequence of r.v., thus converging a.s. to a limit r.v. M ∞ . Applying twice the monotone convergence theorem, we obtain, for any y ≥ 0,
$$\displaystyle{ \mathbb{P}(M_{\infty }\geq y) =\lim _{T\rightarrow +\infty }\mathbb{P}(M_{T} \geq y) =\lim _{T\rightarrow +\infty }\mathbb{P}(\vert W_{T}\vert \geq y) = 1, }$$
using (7). This proves that \(\limsup \limits _{t\rightarrow +\infty }W_{t} = +\infty \) a.s. and a symmetry argument gives the liminf. □
However, the increasing rate of W is sublinear as time goes to infinity.
Proposition 7.
With probability 1, we have
$$\displaystyle{ \lim _{t\rightarrow +\infty }\frac{W_{t}} {t} = 0. }$$
Proof.
The strong law of large numbers yields that \(\frac{W_{n}} {n} = \frac{1} {n}\sum _{i=1}^{n}(W_{ i} - W_{i-1})\) converges a.s. to \(\mathbb{E}(W_{1}) = 0\). The announced result is thus proved along the sequence of integers. To fill the gaps between integers, set \(\tilde{M}_{n} =\sup _{n<t\leq n+1}(W_{t} - W_{n})\) and \(\tilde{M}_{n}^{{\prime}} =\sup _{n<t\leq n+1}(W_{n} - W_{t})\): due to Proposition 5, \(\tilde{M}_{n}\) and \(\tilde{M}_{n}^{{\prime}}\) have the same distribution as | W 1 | . Then, the Chebyshev inequality writes
$$\displaystyle{ \mathbb{P}\big(\tilde{M}_{n} +\tilde{ M}_{n}^{{\prime}}\geq n^{3/4}\big) \leq \frac{\mathbb{E}\big[(\tilde{M}_{n} +\tilde{ M}_{n}^{{\prime}})^{2}\big]} {n^{3/2}} \leq \frac{2\,\mathbb{E}(\tilde{M}_{n}^{2}) + 2\,\mathbb{E}\big((\tilde{M}_{n}^{{\prime}})^{2}\big)} {n^{3/2}} = \frac{4} {n^{3/2}}, }$$
implying that \(\sum _{n\geq 0}\mathbb{P}(\tilde{M}_{n} +\tilde{ M}_{n}^{{\prime}}\geq n^{3/4}) < +\infty \). Thus, by Borel–Cantelli’s lemma, we obtain that with probability 1, for n large enough, \(\tilde{M}_{n} +\tilde{ M}_{n}^{{\prime}} < n^{3/4}\), i.e. \(\frac{\tilde{M}_{n}} {n}\) and \(\frac{\tilde{M}_{n}^{{\prime}}} {n}\) both converge a.s. to 0. □
By time inversion, \(\hat{W}_{t} = tW_{1/t}\) is another Brownian motion: the \(\hat{W}\)-growth in infinite time gives an estimate on W at 0, which writes
$$\displaystyle{ \limsup _{t\rightarrow 0^{+}}\frac{W_{t}} {t} = +\infty \quad \text{and}\quad \liminf _{t\rightarrow 0^{+}}\frac{W_{t}} {t} = -\infty \quad \textit{a.s.}, }$$
which shows that W is not differentiable at time 0. By time-shift invariance, this is also true at any given time t. The careful reader may notice that the set of full probability measure depends on t, and it is unclear at this stage whether a single full set is available for all t, i.e. whether
$$\displaystyle{ \mathbb{P}\big(\forall t \geq 0,\ s\mapsto W_{s}\ \text{is not differentiable at } t\big) = 1. }$$
Actually, the above result holds true and it is due to Paley–Wiener–Zygmund (1933). The following result is of comparable nature: we claim that a.s. there does not exist any interval on which W is monotone.
Proposition 8 (Nowhere Monotonicity).
We have
$$\displaystyle{ \mathbb{P}\big(\exists \,0 \leq s < t:\ u\mapsto W_{u}\ \text{is monotone on } ]s,t[\,\big) = 0. }$$
Proof.
Define \(M_{s,t}^{\uparrow } =\{\omega: u\mapsto W_{u}(\omega )\text{ is increasing on the interval }]s,t[\}\) and \(M_{s,t}^{\downarrow }\) similarly. Observe that the event M that W is monotone on some non-empty interval satisfies
$$\displaystyle{ M =\bigcup _{s,t\in \mathbb{Q},0\leq s<t}\big(M_{s,t}^{\uparrow }\cup M_{s,t}^{\downarrow }\big) }$$
and since this is a countable union, it is enough to show \(\mathbb{P}(M_{s,t}^{\uparrow }) = \mathbb{P}(M_{s,t}^{\downarrow }) = 0\) to conclude \(\mathbb{P}(M) \leq \sum _{s,t\in \mathbb{Q},0\leq s<t}[\mathbb{P}(M_{s,t}^{\uparrow }) + \mathbb{P}(M_{s,t}^{\downarrow })] = 0\). For fixed n, set \(t_{i} = s + i(t - s)/n\), then
$$\displaystyle{ \mathbb{P}(M_{s,t}^{\uparrow }) \leq \mathbb{P}\big(\cap _{i=0}^{n-1}\{W_{t_{i+1}} - W_{t_{i}} \geq 0\}\big) = 2^{-n}, }$$
leveraging the symmetric distribution of the increments. Taking now n large gives \(\mathbb{P}(M_{s,t}^{\uparrow }) = 0\). We argue similarly for \(\mathbb{P}(M_{s,t}^{\downarrow }) = 0\). □
In view of this lack of smoothness, it seems impossible to define differential calculus along the paths of Brownian motion. However, as it will be further developed, Brownian motion paths enjoy a nice property of finite quadratic variations, which serves to build an appropriate stochastic calculus.
There is much more to tell about the properties of Brownian motion. We mention a few extra properties without proof:
-
Hölder regularity: for any \(\rho \in (0, \frac{1} {2})\) and any deterministic T > 0, there exists an a.s. finite r.v. C ρ, T such that
$$\displaystyle{\forall \ 0 \leq s,t \leq T,\quad \vert W_{t} - W_{s}\vert \leq C_{\rho,T}\vert t - s\vert ^{\rho }.}$$ -
Law of iterated logarithm: setting \(h(t) = \sqrt{2t\log \log t^{-1}}\), we have
$$\displaystyle{\limsup _{t\downarrow 0} \frac{W_{t}} {h(t)} = 1\quad \textit{a.s.}\quad \text{and}\quad \liminf _{t\downarrow 0} \frac{W_{t}} {h(t)} = -1\quad \textit{a.s.}}$$ -
Zeros of Brownian motion: the set \(\chi =\{ t \geq 0: W_{t} = 0\}\) of the zeros of W is closed, unbounded, with null Lebesgue measure and it has no isolated points.
1.5 The Random Walk Approximation
Another algorithmic way to build a Brownian motion consists in rescaling a random walk. This is very simple and very useful for numerics: it leads to the so-called tree methods and it has some connections with finite differences in PDEs.
Consider a sequence (X i ) i ≥ 1 of independent random variables with Rademacher distribution: \(\mathbb{P}(X_{i} = \pm 1) = \frac{1} {2}\). Then
$$\displaystyle{ S_{0} = 0,\qquad S_{n} = X_{1} + \cdots + X_{n} }$$
defines a random walk on \(\mathbb{Z}\). Like Brownian motion, it is a process with stationary independent increments, but it is not Gaussian. Actually S n has a (shifted) binomial distribution:
$$\displaystyle{ \mathbb{P}(S_{n} = 2k - n) = \binom{n}{k}2^{-n},\qquad 0 \leq k \leq n. }$$
A direct computation shows that \(\mathbb{E}(S_{n}) = 0\) and \(\mathbb{V}\mathrm{ar}(S_{n}) = n\). When we rescale the walk and let n go to infinity, we observe that, due to the Central Limit Theorem, the distribution of \(\frac{S_{n}} {\sqrt{n}}\) converges to the Gaussian law with zero mean and unit variance. The fact that this is the law of W 1 is not a coincidence, since it can be justified that the full trajectory of the suitably rescaled random walk converges towards that of a Brownian motion, see Fig. 4. This result is known as the Donsker theorem, see for instance [12] for a proof.
Proposition 9.
Define (Y t n ) t as the piecewise constant process
$$\displaystyle{ Y _{t}^{n} = \frac{S_{\lfloor \mathit{nt}\rfloor }} {\sqrt{n}},\qquad t \geq 0. }$$
The distribution of the process (Y t n ) t converges to that of a Brownian motion (W t ) t as n → +∞, i.e. for any bounded continuous functional F,
$$\displaystyle{ \lim _{n\rightarrow +\infty }\mathbb{E}[F(Y ^{n})] = \mathbb{E}[F(W)]. }$$
The last result gives a simple way to evaluate numerically expectations of functionals of Brownian motion. It is the principle of the so-called binomial tree methods.
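The one-dimensional marginal of this convergence (the Central Limit Theorem at time 1) is easy to check by simulation; a quick sketch with sample sizes of our choosing:

```python
import numpy as np

# The rescaled walk at time 1, S_n / sqrt(n), should be close in law to W_1 ~ N(0, 1).
rng = np.random.default_rng(1)
n, paths = 200, 20000
steps = rng.choice([-1.0, 1.0], size=(paths, n))   # Rademacher variables X_i
Y1 = steps.sum(axis=1) / np.sqrt(n)                # samples of Y_1^n = S_n / sqrt(n)
```

The empirical mean and variance of `Y1` should be close to 0 and 1, the moments of W 1 .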
Link with Finite Difference Scheme. The random walk can be interpreted as an explicit FD scheme for the heat equation. We anticipate a bit on what follows, where the connection between Brownian motion and the heat equation will be detailed further.
For \(t = \frac{i} {n}\) (\(i \in \{ 0,\ldots,n\}\)) and \(x \in \mathbb{R}\), set
$$\displaystyle{ u^{n}(t,x) = \mathbb{E}\Big[f\Big(x + \frac{S_{\mathit{nt}}} {\sqrt{n}}\Big)\Big]. }$$
The independence of (X i ) i gives
$$\displaystyle{ u^{n}\Big(t + \frac{1} {n},x\Big) = \frac{1} {2}\Big[u^{n}\Big(t,x + \frac{1} {\sqrt{n}}\Big) + u^{n}\Big(t,x - \frac{1} {\sqrt{n}}\Big)\Big]. }$$
Thus, u n related to the expectation of the random walk can be read as an explicit FD scheme of the heat equation \(\partial _{t}u(t,x) = \frac{1} {2}\partial _{\mathit{xx}}^{2}u(t,x)\) and u(0, x) = f(x), with time step \(\frac{1} {n}\) and space step \(\frac{1} {\sqrt{n}}\).
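A hedged numerical sketch of this correspondence (Python/NumPy; the helper name, grid width and the crude edge treatment are ours): iterating the two-point average with space step \(1/\sqrt{n}\) and time step \(1/n\) reproduces the heat-equation solution at the grid nodes. For f(x) = x², the exact solution is u(t, x) = x² + t.

```python
import numpy as np

def heat_fd(f, n, width):
    # Explicit scheme with time step 1/n and space step h = 1/sqrt(n):
    # u((i+1)/n, x) = ( u(i/n, x + h) + u(i/n, x - h) ) / 2,
    # iterated n times, i.e. up to time t = 1.
    h = 1.0 / np.sqrt(n)
    x = h * np.arange(-width, width + 1)
    u = f(x).astype(float)
    for _ in range(n):
        u = 0.5 * (np.roll(u, -1) + np.roll(u, 1))
        u[0], u[-1] = u[1], u[-2]   # crude edge fix; the interior is unaffected
    return x, u

x, u = heat_fd(lambda z: z**2, n=100, width=250)
mid = len(x) // 2                    # index of x = 0; expect u(1, 0) = 1
```

Away from the boundary (information propagates one cell per step), the scheme is even exact here, because the two-point average acts exactly on quadratic data.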
1.6 Other Stochastic Processes
We present other one-dimensional processes, with continuous trajectories, which derive from the Brownian motion.
-
1.
Geometric Brownian motion: this model is popular in finance to model stocks and other assets by a positive process.
-
2.
Ornstein–Uhlenbeck process: it has important applications in physics, mechanics, economics and finance to model stochastic phenomena exhibiting mean-reverting features (like springs endowed with random forces, interest rates or inflation, …).
-
3.
Stochastic differential equations: they provide the most general framework.
1.6.1 Geometric Brownian Motion
Definition 4.
A Geometric Brownian Motion (GBM in short) with deterministic initial value S 0 > 0, drift coefficient μ and diffusion coefficient σ, is a process (S t ) t ≥ 0 defined by
$$\displaystyle{ S_{t} = S_{0}\exp \big((\mu -\tfrac{1} {2}\sigma ^{2})t +\sigma W_{t}\big), }$$
where {W t ; t ≥ 0} is a standard Brownian motion.
As the argument in the exponential has a Gaussian distribution, the random variable S t (with t fixed) is known as Lognormal.
This is a process with continuous trajectories, which takes strictly positive values. The Geometric Brownian motion is often used as a model of asset prices (see Samuelson [65]): this choice is justified, on the one hand, by the positivity of S and, on the other hand, by the simple Gaussian properties of its returns:
-
The returns log(S t ) − log(S s ) are Gaussian with mean \((\mu -\frac{1} {2}\sigma ^{2})(t - s)\) and variance σ 2(t − s).
-
For all \(0 < t_{1} < t_{2} < \cdots < t_{n}\), the relative increments \(\{\frac{S_{t_{i+1}}} {S_{t_{i}}};0 \leq i \leq n - 1\}\) are independent.
The assumption of Gaussian returns is not valid in practice but this model still serves as a proxy for more sophisticated models.
Naming μ the drift parameter may be surprising at first sight, since it appears in the deterministic component as \((\mu -\frac{1} {2}\sigma ^{2})t\). Actually, a computation of the expectation easily gives
$$\displaystyle{ \mathbb{E}(S_{t}) = S_{0}\,e^{\mu t}. }$$
The above equality gives the interpretation to μ as a mean drift term: \(\mu = \frac{1} {t} \log [\mathbb{E}(S_{t})/S_{0}]\).
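This interpretation is easy to verify by Monte Carlo, sampling S t directly from its Lognormal representation (a sketch; the constants are our choice for the demo):

```python
import numpy as np

# Sample S_t = S0 * exp((mu - sigma^2/2) t + sigma W_t) at a fixed time t and
# check E[S_t] = S0 * exp(mu t) empirically.
rng = np.random.default_rng(3)
S0, mu, sigma, t, paths = 1.0, 0.1, 0.3, 1.0, 200000
W_t = np.sqrt(t) * rng.standard_normal(paths)      # W_t ~ N(0, t)
S_t = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W_t)
```

The empirical mean of `S_t` should match \(S_{0}e^{\mu t}\) up to Monte Carlo error, and every sample is strictly positive, as claimed.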
1.6.2 Ornstein–Uhlenbeck Process
Let us return to physics and to the Brownian motion of Einstein in 1905. In order to propose a more adequate modeling of the phenomenon of particle diffusion, we introduce the Ornstein–Uhlenbeck process and its principal properties.
So far we have built the Brownian motion as a model for a microscopic particle in suspension in a liquid subjected to thermal agitation. An important criticism of this modeling concerns the assumption that displacement increments are independent: it does not take into account the effects of the particle speed due to particle inertia.
Let us denote by m the particle mass and by \(\dot{X}(t)\) its speed. Owing to Newton’s second law, the momentum change \(m\dot{X}(t +\delta t) - m\dot{X}(t)\) is equal to the resistance \(-k\dot{X}(t)\delta t\) of the medium during the time δ t, plus the momentum change due to molecular shocks, which we assume to have stationary independent increments and thus to be associated with a Brownian motion. The process thus modeled is sometimes called the physical Brownian motion. The equation for the increments becomes
$$\displaystyle{ m\dot{X}(t +\delta t) - m\dot{X}(t) = -k\dot{X}(t)\delta t + m\sigma (W_{t+\delta t} - W_{t}). }$$
Trajectories of the Brownian motion being not differentiable, the equation has to be read in an integral form
$$\displaystyle{ m\dot{X}(t) = m\dot{X}(0) - k\int _{0}^{t}\dot{X}(s)\mathrm{d}s + m\sigma W_{t}. }$$
\(\dot{X}(t)\) is thus solution of the linear stochastic differential equation (known as the Langevin equation)
$$\displaystyle{ V _{t}:=\dot{ X}(t) = v_{0} - a\int _{0}^{t}V _{s}\mathrm{d}s +\sigma W_{t}, }$$
where \(a = \frac{k} {m}\) and \(v_{0} =\dot{ X}(0)\). If a = 0, we recover an arithmetic Brownian motion and, to avoid this trivial reduction, we assume a ≠ 0 in the sequel. However, the existence of a solution is not clear, since W is not differentiable. To overcome this difficulty, set \(Z_{t} = V _{t} -\sigma W_{t}\): that leads to the new equation
$$\displaystyle{ Z_{t} = v_{0} - a\int _{0}^{t}(Z_{s} +\sigma W_{s})\mathrm{d}s, }$$
which is now a linear ordinary differential equation that can be solved path by path. The variation of parameters method gives the representation of the unique solution of this equation as
$$\displaystyle{ Z_{t} = e^{-\mathit{at}}\Big(v_{0} - a\sigma \int _{0}^{t}e^{\mathit{as}}W_{s}\mathrm{d}s\Big). }$$
The initial solution is thus
$$\displaystyle{ V _{t} = e^{-\mathit{at}}v_{0} +\sigma W_{t} - a\sigma \int _{0}^{t}e^{-a(t-s)}W_{s}\mathrm{d}s. }$$(10)
Using stochastic calculus, we will derive later (see Sect. 3.3) another convenient representation of V as follows:
$$\displaystyle{ V _{t} = e^{-\mathit{at}}v_{0} +\sigma \int _{0}^{t}e^{-a(t-s)}\mathrm{d}W_{s}, }$$
using a stochastic integral not yet defined. From (10), assuming that v 0 is deterministic, we can show the following properties (see also Sect. 3.3).
-
For a given t, V t has a Gaussian distribution: indeed, as the limit of a Riemann sum, it is the a.s. limit of a Gaussian r.v., see footnote 5 page 111.
-
More generally, V is a Gaussian process.
-
Its mean is v 0 e −a t and its covariance function is \(\mathbb{C}\mathrm{ov}(V _{t},V _{s}) = e^{-a(t-s)} \frac{\sigma ^{2}} {2a}(1 - e^{-2\mathit{as}})\) for t > s.
Observe that for a > 0, the Gaussian distribution of V t converges to \(\mathcal{N}(0, \frac{\sigma ^{2}} {2a})\) as t → +∞: it does not depend anymore on v 0 and illustrates the mean-reverting feature of this model, see Fig. 5.
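Because V is Gaussian with the mean and covariance given above, it can be simulated exactly on a time grid (with no discretization error) via the autoregressive recursion implied by these formulas. A sketch, assuming a > 0 and a helper name of our own:

```python
import numpy as np

def ou_path(v0, a, sigma, T, n, rng):
    # Exact sampling on the grid t_i = i T/n, using the Gaussian transition
    # V_{t+dt} = e^{-a dt} V_t + N(0, sigma^2 (1 - e^{-2 a dt}) / (2a)).
    dt = T / n
    decay = np.exp(-a * dt)
    std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * a))
    V = np.empty(n + 1)
    V[0] = v0
    for i in range(n):
        V[i + 1] = decay * V[i] + std * rng.standard_normal()
    return V

rng = np.random.default_rng(5)
V = ou_path(v0=2.0, a=1.0, sigma=0.5, T=10.0, n=1000, rng=rng)
```

For large T, the samples of V T forget v 0 and their distribution approaches \(\mathcal{N}(0, \frac{\sigma ^{2}} {2a})\), the mean-reverting behavior illustrated in Fig. 5.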
1.6.3 Stochastic Differential Equations and Euler Approximations
The previous example gives the generic form of a Stochastic Differential Equation, that generalizes the usual Ordinary Differential Equations x t ′ = b(x t ) or in integral form \(x_{t} = x_{0} +\int _{ 0}^{t}b(x_{s})\mathrm{d}s\).
Definition 5.
Let \(b,\sigma: x \in \mathbb{R}\mapsto \mathbb{R}\) be two functions, respectively the drift and the diffusion coefficient. A Stochastic Differential Equation (SDE in short) with parameters (b, σ) and initial value x is a stochastic process (X t ) t ≥ 0 solution of
$$\displaystyle{ X_{t} = x +\int _{0}^{t}b(X_{s})\mathrm{d}s +\int _{0}^{t}\sigma (X_{s})\mathrm{d}W_{s},\qquad t \geq 0, }$$
where (W t ) t is a standard Brownian motion.
A slightly more general definition (not considered here) could include the case of time-dependent coefficients b(t, x) and σ(t, x), the subsequent analysis would be quite similar. In the definition above, we use a stochastic integral \(\int _{0}^{t}\ldots \mathrm{d}W_{s}\) which has not yet been defined: it will be explained in the next section. For the moment, the reader needs to know that in the simplest case where σ is constant, we simply have \(\int _{0}^{t}\sigma (X_{s})\mathrm{d}W_{s} =\sigma W_{t}\). The previous examples fit this setting:
-
The arithmetic Brownian motion corresponds to b(x) = b and σ(x) = σ.
-
The Ornstein–Uhlenbeck process corresponds to \(b(x) = -\mathit{ax}\) and σ(x) = σ.
Taking σ to be non constant allows for more general situations and more flexible models. Instead of discussing now the important issues of existence and uniqueness to such SDE, we rather consider natural approximations of them, namely the Euler scheme (which is the direct extension of Euler scheme for ODEs).
Definition 6.
Let (b, σ) be given drift and diffusion coefficients. The Euler scheme associated to the SDE with coefficients (b, σ), initial value x and time step h, is defined by
$$\displaystyle{ X_{0}^{h} = x,\qquad X_{t}^{h} = X_{\mathit{ih}}^{h} + b(X_{\mathit{ih}}^{h})(t -\mathit{ih}) +\sigma (X_{\mathit{ih}}^{h})(W_{t} - W_{\mathit{ih}})\quad \text{for }t \in (\mathit{ih},(i + 1)h]. }$$
In other words, X h is a piecewise arithmetic Brownian motion, with coefficients on the interval (ih, (i + 1)h] computed according to the functions (b, σ) evaluated at \(X_{\mathit{ih}}^{h}\). In general, the law of X t h is not known analytically: at most, we can give explicit representations using an induction on the time step. On the other hand, as will be seen further, the random simulation of X h at the times (ih) i ≥ 0 is easily performed by simulating the independent Brownian increments \((W_{(i+1)h} - W_{\mathit{ih}})\). The accuracy of the approximation of X by X h is expected to improve as h goes to 0.
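A direct implementation of the scheme at the grid times (Python/NumPy; the test coefficients are ours, borrowed from the Ornstein–Uhlenbeck example):

```python
import numpy as np

def euler_scheme(b, sigma, x, T, n, rng):
    # X^h_0 = x and, with h = T/n and increments W_{(i+1)h} - W_{ih} ~ N(0, h),
    # X^h_{(i+1)h} = X^h_{ih} + b(X^h_{ih}) h + sigma(X^h_{ih}) (W_{(i+1)h} - W_{ih}).
    h = T / n
    X = np.empty(n + 1)
    X[0] = x
    dW = rng.normal(0.0, np.sqrt(h), size=n)
    for i in range(n):
        X[i + 1] = X[i] + b(X[i]) * h + sigma(X[i]) * dW[i]
    return X

rng = np.random.default_rng(9)
# Euler scheme for the Ornstein-Uhlenbeck SDE: b(x) = -a x, sigma(x) = sigma.
X = euler_scheme(lambda v: -1.0 * v, lambda v: 0.5, 2.0, 1.0, 1000, rng)
```

Taking σ ≡ 0 reduces the recursion to the classical Euler method for the ODE x′ = b(x), which gives a quick deterministic sanity check.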
2 Feynman–Kac Representations of PDE Solutions
Our purpose in this section is to make the connection between the expectations of functionals of Brownian motion and the solution of second order linear parabolic partial differential equations (PDE in short): this leads to the well-known Feynman–Kac representations. We extend this point of view to other simple processes introduced before.
2.1 The Heat Equations
2.1.1 Heat Equation in the Whole Space
Let us return to the law of x + W t , the Gaussian density of which is
$$\displaystyle{ g(t,x,y) = \frac{1} {\sqrt{2\pi t}}\exp \Big(-\frac{(y - x)^{2}} {2t} \Big), }$$
often called in this context the fundamental solution of the heat equation. One of the key properties is the property of convolution
$$\displaystyle{ g(t + s,x,y) =\int _{\mathbb{R}}g(t,x,z)g(s,z,y)\mathrm{d}z, }$$
which says in an analytical language that \(x + W_{t+s}\) is the sum of the independent Gaussian variables x + W t and \(W_{t+s} - W_{t}\). A direct calculation on the density shows that the Gaussian density is solution to the heat equation w.r.t. each of the two variables x and y:
$$\displaystyle{ \partial _{t}g(t,x,y) = \tfrac{1} {2}\partial _{\mathit{yy}}^{2}g(t,x,y) = \tfrac{1} {2}\partial _{\mathit{xx}}^{2}g(t,x,y). }$$
This property is extended to a large class of functions built from the Brownian motion.
Theorem 2 (Heat Equation with Cauchy Initial Boundary Condition).
Let f be a bounded Footnote 6 measurable function. Consider the function
$$\displaystyle{ u(t,x) = \mathbb{E}[f(x + W_{t})] =\int _{\mathbb{R}}g(t,x,y)f(y)\mathrm{d}y; }$$
the function u is infinitely continuously differentiable in space and time for t > 0 and solves the heat equation
$$\displaystyle{ \partial _{t}u(t,x) = \tfrac{1} {2}\partial _{\mathit{xx}}^{2}u(t,x)\ \ \text{for }t > 0,\qquad \lim _{t\rightarrow 0^{+}}u(t,x) = f(x). }$$(15)
Equation (15) is the heat equation with initial boundary condition (Cauchy problem, see [22]).
Proof.
Standard Gaussian estimates allow us to differentiate u w.r.t. t or x by differentiating under the integral sign: then, we have
$$\displaystyle{ \partial _{t}u(t,x) =\int _{\mathbb{R}}\partial _{t}g(t,x,y)f(y)\mathrm{d}y = \tfrac{1} {2}\int _{\mathbb{R}}\partial _{\mathit{xx}}^{2}g(t,x,y)f(y)\mathrm{d}y = \tfrac{1} {2}\partial _{\mathit{xx}}^{2}u(t,x). }$$
□
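The representation u(t, x) = 𝔼[f(x + W t )] immediately suggests a Monte Carlo method: draw Gaussian samples of W t and average f. A sketch (our choice of f and constants), compared against the closed form available when f is the indicator of \(\mathbb{R}^{+}\):

```python
import numpy as np
from math import erf, sqrt

# Monte Carlo evaluation of u(t, x) = E[f(x + W_t)], here with the bounded
# function f = 1_{[0, +inf)}, for which u(t, x) = Phi(x / sqrt(t)) in closed form.
rng = np.random.default_rng(11)
t, x, paths = 2.0, 0.3, 100000
samples = x + sqrt(t) * rng.standard_normal(paths)     # samples of x + W_t
u_mc = np.mean(samples > 0.0)                          # Monte Carlo estimate of u(t, x)
u_exact = 0.5 * (1.0 + erf(x / sqrt(2.0 * t)))         # Phi(x / sqrt(t))
```

The Monte Carlo error decreases as the inverse square root of the number of samples, a point analyzed later in the course.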
When the function considered is regular, another formulation can be given to this relation, which will play a significant role in the following.
Proposition 10.
If f is of class \(\mathcal{C}_{b}^{2}\) (bounded and twice continuously differentiable with bounded derivatives), Footnote 7 we have
$$\displaystyle{ \partial _{t}\,\mathbb{E}[f(x + W_{t})] = \tfrac{1} {2}\,\mathbb{E}[f^{{\prime\prime}}(x + W_{t})], }$$
or equivalently, using a probabilistic viewpoint,
$$\displaystyle{ \mathbb{E}[f(x + W_{t})] = f(x) +\int _{0}^{t}\mathbb{E}\big[\tfrac{1} {2}f^{{\prime\prime}}(x + W_{s})\big]\mathrm{d}s. }$$(16)
Proof.
Write \(u(t,x,f) = \mathbb{E}[f(x + W_{t})] =\int _{\mathbb{R}}g(t,0,y)f(x + y)\mathrm{d}y =\int _{\mathbb{R}}g(t,x,z)f(z)\mathrm{d}z\) and differentiate under the integral sign: it gives
$$\displaystyle{ \tfrac{1} {2}u(t,x,f^{{\prime\prime}}) = \tfrac{1} {2}\int _{\mathbb{R}}g(t,x,z)f^{{\prime\prime}}(z)\mathrm{d}z = \tfrac{1} {2}\int _{\mathbb{R}}\partial _{\mathit{zz}}^{2}g(t,x,z)f(z)\mathrm{d}z =\int _{\mathbb{R}}\partial _{t}g(t,x,z)f(z)\mathrm{d}z = \partial _{t}u(t,x,f), }$$
using two integrations by parts for the second equality and the heat equation satisfied by g for the third. Then the probabilistic representation (16) easily follows by integrating in time:
$$\displaystyle{ \mathbb{E}[f(x + W_{t})] - f(x) =\int _{0}^{t}\partial _{s}u(s,x,f)\mathrm{d}s =\int _{0}^{t}\mathbb{E}\big[\tfrac{1} {2}f^{{\prime\prime}}(x + W_{s})\big]\mathrm{d}s. }$$
□
2.1.2 Heat Equation in an Interval
We now extend the previous results in two directions: first, we allow the function f to also depend smoothly on time and second, the final time t is replaced by a stopping time U. The first extension is straightforward and we state it without proof.
Proposition 11.
Let f be a function of class \(\mathcal{C}_{b}^{1,2}\) (bounded, once continuously differentiable in time, twice in space, with bounded derivatives): we have
The second equality readily follows from Fubini’s theorem, which allows us to interchange \(\mathbb{E}\) and the time integral: this second form is more suitable for an extension to stochastic times t.
Theorem 3.
Let f be a function of class \(\mathcal{C}_{b}^{1,2}\) , we have
for any bounded Footnote 8 stopping time U.
The above identity between expectations is far from obvious to establish by hand, since the law of U is quite general and an analytical computation is out of reach. This level of generality on U is quite interesting for applications: it provides a powerful tool to determine the distribution of hitting times, or to show how often a multidimensional Brownian motion visits a given point or a given set. Regarding this lecture, it gives a key tool to derive probabilistic representations of the heat equation with Dirichlet boundary conditions.
Proof.
Let us start by giving alternative forms of the relation (17). We observe that it could have been written with a random initial condition X 0, like for instance
with W independent of X 0 and where the event A 0 depends on X 0. Similarly, using the time-shifted Brownian motion \(\{\bar{W}_{t}^{u} = W_{t+u} - W_{u};t \in \mathbb{R}^{+}\}\) that is independent of the initial condition x + W u (Proposition 2), it leads to
for any event A u depending only on the values {W s : s ≤ u}, or equivalently
Set \(M_{t} = f(t,x + W_{t}) - f(0,x) -\int _{0}^{t}(f_{t}^{{\prime}}(s,x + W_{s}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(s,x + W_{ s}))\mathrm{d}s\): our aim is to prove \(\mathbb{E}(M_{U}) = 0\). Observe that the preliminary computation has shown that
for t ≥ 0. In particular, taking A u = Ω we obtain that the expectation \(\mathbb{E}(M_{t})\) is constantFootnote 9 w.r.t. t.
Now, consider first that U is a discrete stopping time valued in \(\{0 = u_{0} < u_{1} < \cdots < u_{n} = T\}\): then
by applying (19), since {U ≤ u k } depends only on {W s : s ≤ u k } (by definition of a stopping time).
Second, for a general stopping time (bounded by T), we take \(U_{n} = \frac{[\mathit{nU}]+1} {n}\) which is a stopping time converging to U: since \((M_{t})_{0\leq t\leq T}\) is bounded and continuous, the dominated convergence theorem gives \(0 = \mathbb{E}(M_{U_{n}})\mathop{\longrightarrow }\limits_{n \rightarrow \infty }\mathbb{E}(M_{U})\). □
As a consequence, we now make explicit the solutions of the heat equation in an interval and with initial condition: it is a partial generalizationFootnote 10 of Theorem 2, which characterized them in the whole space. The introduction of (non-homogeneous) boundary conditions of Dirichlet type is connected to the passage time of the Brownian motion.
Corollary 1 (Heat Equation with Cauchy–Dirichlet Boundary Condition).
Consider the PDE
If a solution u of class C b 1,2 ([0,T] × [a,b]) exists, then it is given by
where \(U = T_{a} \wedge T_{b} \wedge t\) (using the previous notation for the first passage time T y at the level y for the Brownian motion starting at x, i.e. (x + W t ) t≥0 ).
Proof.
First, extend smoothly the function u outside the interval [a, b] in order to apply previous results. The way to extend is unimportant since u and its derivatives are only evaluated inside [a, b]. Clearly U is a bounded (by t) stopping time. Apply now the equality (18) to the function \((s,y)\mapsto u(t - s,y) = v(s,y)\) of class \(C_{b}^{1,2}([0,t] \times \mathbb{R})\), satisfying \(v_{s}^{{\prime}}(s,y) + \frac{1} {2}v_{\mathit{yy}}^{{\prime\prime}}(s,y) = 0\) for (s, y) ∈ [0, t] × [a, b]. We obtain
since for s ≤ U, (s, x + W s ) ∈ [0, t] × [a, b]. To conclude, we easily check that v(0, x) = u(t, x) and \(v(U,x + W_{U}) = f(t - U,x + W_{U})\). □
2.1.3 A Probabilistic Algorithm to Solve the Heat Equation
To illustrate our purpose, we consider a toy example regarding the numerical evaluation of \(u(t,x) = \mathbb{E}(f(x + W_{t}))\) using random simulations, in order to discuss the main ideas underlying Monte Carlo methods. Actually, the arguments below also apply to \(u(t,x) = \mathbb{E}[f(t - U,x + W_{U})]\) with \(U = T_{a} \wedge T_{b} \wedge t\), although there are some significant extra issues in the simulation of (U, W U ).
For notational simplicity, denote by X the random variable inside the expectation to compute, that is \(X = f(x + W_{t})\) in our toy example. In contrast to a PDE method (based on finite differences or finite elements), a standard Monte Carlo method provides an approximation of u(t, x) at a given point (t, x), without evaluating the values at other points. Actually, this fact holds because the PDE satisfied by u is linear; in Sect. 5, related to non-linear PDEs, the situation is different.
The Monte Carlo method is able to provide a convergent, tractable approximation of u(t, x), with a priori error bounds, under two conditions.
-
1.
An arbitrarily large number of independent realizations of X can be generated (denote them by (X i ) i ≥ 1): in our toy example, this is straightforward since it requires only the simulation of W t , which is distributed as a Gaussian r.v. \(\mathcal{N}(0,t)\); then we compute \(X = f(x + W_{t})\). The independence of the simulations is achieved by using a good random number generator, like the excellent Mersenne Twister Footnote 11 generator.
-
2.
Additionally, X, which is already integrable (\(\mathbb{E}\vert X\vert < +\infty \)), is assumed to be square integrable: \(\mathbb{V}\mathrm{ar}(X) < +\infty \).
Then, by the law of large numbers, we have
hence the empirical mean of simulations of X provides a convergent approximation of the expectation \(\mathbb{E}(X)\). In contrast to PDE methods, where some stability conditions may be required (like the Courant–Friedrichs–Lewy condition), the above Monte Carlo method does not require any extra condition to converge: it is unconditionally convergent. The extra moment condition is used to derive a priori bounds on the statistical error: the approximation error is controlled by means of the Central Limit Theorem
where G is a centered Gaussian r.v. with unit variance. Observe that the error bounds are stochastic: we cannot do better than arguing that, with probability \(\mathbb{P}(G \in [a,b])\), the unknown expectation (asymptotically as M → +∞) belongs to the interval
This is known as a confidence interval at level \(\mathbb{P}(G \in [a,b])\). The wider the interval [a,b], the larger the confidence interval and the higher the confidence probability.
To obtain a fully explicit confidence interval, one may replace \(\mathbb{V}\mathrm{ar}(X)\) by its estimator using the same simulations:
The factor \(M/(M - 1)\) plays the role of unbiasing Footnote 12 the estimator of \(\mathbb{V}\mathrm{ar}(X)\), although this correction matters little for M large (M ≥ 100). Anyway, we can prove that the above confidence intervals are asymptotically unchanged by taking the empirical variance σ M 2 instead of \(\mathbb{V}\mathrm{ar}(X)\). Gathering these different results and choosing a symmetric confidence interval with \(-a = b = 1.96\), so that \(\mathbb{P}(G \in [a,b]) \approx 95\,\%\), we obtain the following: with probability approximately 95 % for M large enough, we have
The symmetric confidence interval at level 99 % is given by \(-a = b = 2.58\). Since a Monte Carlo method provides random evaluations of \(\mathbb{E}(X)\), different program runs will give different results (in contrast to a deterministic method, which systematically has the same output), which may seem uncomfortable: that is why it is important to produce a confidence interval. It is also very powerful and useful to have at hand a numerical method able to state that the error is at most a given threshold with high probability.
The confidence interval depends on
-
The confidence level \(\mathbb{P}(G \in [a,b])\), chosen by the user.
-
The number of simulations: improving the accuracy by a factor 2 requires 4 times more simulations.
-
The variance \(\mathbb{V}\mathrm{ar}(X)\) or its estimator σ M 2, which depends on the problem at hand (and not much on M as soon as M is large). This variance can be very different from one problem to another: see Fig. 6 for the widths of the confidence intervals in two similar computations. There exist variance reduction techniques able to significantly reduce this factor, providing narrower confidence intervals while maintaining the same computational cost.
Another advantage of such a Monte Carlo algorithm is the simplicity of the code, which consists of one loop over the number of simulations; within this loop, the empirical variance should be computed simultaneously. However, the simulation procedure for X can be delicate in some situations, see Sect. 4.
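To fix ideas, here is a minimal sketch of this one-loop structure in Python (whose standard `random` module is precisely based on the Mersenne Twister generator mentioned above). The payoff f = cos and the function name `monte_carlo_heat` are illustrative assumptions of ours; cos is chosen because \(\mathbb{E}[\cos (x + W_{t})] =\cos (x)e^{-t/2}\) gives an exact benchmark.

```python
import math
import random

def monte_carlo_heat(f, x, t, M, seed=0):
    """Estimate u(t, x) = E[f(x + W_t)] together with the half-width of a
    95% confidence interval.  W_t is N(0, t), so each simulation requires
    a single Gaussian draw."""
    rng = random.Random(seed)        # Mersenne Twister underneath
    s1 = s2 = 0.0                    # running sums of X and X**2
    for _ in range(M):
        X = f(x + math.sqrt(t) * rng.gauss(0.0, 1.0))
        s1 += X
        s2 += X * X
    mean = s1 / M
    var = (s2 / M - mean * mean) * M / (M - 1)  # unbiased empirical variance
    return mean, 1.96 * math.sqrt(var / M)

# Benchmark: E[cos(x + W_t)] = cos(x) * exp(-t/2) in closed form.
est, hw = monte_carlo_heat(math.cos, 0.3, 1.0, 100_000)
exact = math.cos(0.3) * math.exp(-0.5)
```

The returned half-width is the \(1.96\,\sigma _{M}/\sqrt{M}\) term of the 95 % confidence interval, so on most runs the estimate satisfies | est − exact | ≲ hw.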
Finally, we discuss the impact of the dimension of the underlying PDE, which has been equal to 1 so far. Consider now a state variable in \(\mathbb{R}^{d}\) (d ≥ 1) and a heat equation with Cauchy initial condition in dimension d; (15) becomes
where \(\varDelta =\sum _{ i=1}^{d}\partial _{x_{i}x_{i}}^{2}\) stands for the Laplacian in \(\mathbb{R}^{d}\). Using similar arguments as in dimension 1, we check that
where \(W = \left (\begin{array}{c} W_{1}\\ \vdots \\ W_{d} \end{array} \right )\) is a d-dimensional Brownian motion, i.e. each W i is a one-dimensional Brownian motion and the d components are independent (Fig. 7).
-
The Monte Carlo computation of u(t, x) is then achieved using independent simulations of \(X = f(x + W_{t})\): the accuracy is of order \(1/\sqrt{N}\) and the computational effort is N × d. Thus, the dimension has a very mild effect on the complexity of the algorithm.
-
As a comparison with a PDE discretization scheme, to achieve an accuracy of order 1∕N, we essentiallyFootnote 13 need N points in each spatial direction, and it follows that the resulting linear system to invert is of size N d: thus, without going into full details, it is clear that the computational cost to achieve a given accuracy depends heavily on the dimension d, and the situation becomes less and less favourable as the dimension increases. Also, the memory required to run a PDE algorithm increases exponentially with the dimension, in contrast to a Monte Carlo approach.
It is commonly admitted that a PDE approach is more suitable and efficient in dimensions 1 and 2, whereas a Monte Carlo procedure is better adapted to higher dimensions. On the other hand, a PDE-based method computes a global approximation of u (at any point (t, x)), while a Monte Carlo scheme gives only a pointwise approximation. The probabilistic approach can be directly used for parallel computing, each processor being in charge of a batch of simulations at a given point (t, x).
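The discussion above can be illustrated by a small sketch in dimension d: the only change with respect to the one-dimensional case is that one Gaussian draw per coordinate is needed, so the cost per simulation grows linearly in d. The test function \(f(y) = e^{-\vert y\vert ^{2}}\) is an assumption of ours, chosen because Gaussian calculus gives \(\mathbb{E}[f(x + W_{t})] = (1 + 2t)^{-d/2}e^{-\vert x\vert ^{2}/(1+2t)}\) in closed form.

```python
import math
import random

def mc_heat_dd(f, x, t, M, seed=1):
    """Monte Carlo for u(t, x) = E[f(x + W_t)] with x in R^d: a
    d-dimensional Brownian increment is just d independent N(0, t)
    draws, so the cost per simulation grows only linearly in d."""
    rng = random.Random(seed)
    sqt = math.sqrt(t)
    acc = 0.0
    for _ in range(M):
        y = [xi + sqt * rng.gauss(0.0, 1.0) for xi in x]
        acc += f(y)
    return acc / M

# Benchmark in d = 5, x = 0: E[exp(-|W_t|^2)] = (1 + 2t)^(-d/2).
d, t = 5, 0.5
est = mc_heat_dd(lambda y: math.exp(-sum(v * v for v in y)),
                 [0.0] * d, t, 50_000)
exact = (1.0 + 2.0 * t) ** (-d / 2)
```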
2.2 PDE Associated to Other Processes
We extend the Feynman–Kac representation for the Brownian motion to the Arithmetic Brownian Motion and the Ornstein–Uhlenbeck process.
2.2.1 Arithmetic Brownian Motion
First consider the Arithmetic Brownian motion defined by \(\{X_{t}^{x} = x + \mathit{bt} +\sigma W_{t},t \geq 0\}\). The distribution of X t is Gaussian with mean x +bt and variance σ 2 t: we assume in the following that σ ≠ 0 which ensures that its density exists and is given by
Denote by \(L_{b,\sigma ^{2}}^{\mathtt{ABM}}\) the second order operator
also called infinitesimal generator Footnote 14 of X. A direct computation using the heat equation for g(t, x, y) gives
Hence, multiplying by f(y) and integrating over \(y \in \mathbb{R}\), we obtain the following representation that generalizes Theorem 2.
Theorem 4.
Let f be a bounded measurable function. The function
solves
The extension of Propositions 10 and 11 follows the arguments used for the BM case.
Proposition 12.
If \(f \in \mathcal{C}_{b}^{1,2}\) and U is a bounded stopping time (including deterministic time), then
Theorem 4 gives the Feynman–Kac representation of the Cauchy problem written w.r.t. the second order operator \(L_{b,\sigma ^{2}}^{\mathtt{ABM}}\). When Dirichlet boundary conditions are added, Corollary 1 extends as follows, using Proposition 12.
Corollary 2.
Assume the existence of a solution u of class C b 1,2 ([0,T] × [a,b]) to the PDE
Then it is given by
where \(U^{x} =\inf \{ s > 0: X_{s}^{x}\notin ]a,b[\}\wedge t\) is the first exit time from the interval ]a,b[ by the process X x before t.
As for the standard heat equation, this representation naturally leads to a probabilistic algorithm to compute the PDE solution, by taking the empirical mean of independent simulations of \(f(t - U^{x},X_{U^{x}}^{x})\).
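As a hedged illustration (not the exact simulation scheme, which is more delicate as mentioned earlier), one can approximate \((U^{x},X_{U^{x}}^{x})\) by monitoring the exit only on a discrete time grid; the helper below and its parameters are ours, and the discrete monitoring biases the exit time upward since excursions between grid points are missed.

```python
import math
import random

def abm_exit_mc(f, x, b, sigma, low, up, t, M, n=200, seed=2):
    """Naive Monte Carlo for E[f(t - U, X_U)], where X_s = x + b*s + sigma*W_s
    and U is the first exit time from ]low, up[ capped at t.  The exit is
    monitored only at n grid times, which biases U upward."""
    rng = random.Random(seed)
    h = t / n
    total = 0.0
    for _ in range(M):
        X, s = x, 0.0
        for _ in range(n):
            if not (low < X < up):
                break                          # already exited at time ~ s
            X += b * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
            s += h
        total += f(t - s, min(max(X, low), up))  # project overshoot on [low, up]
    return total / M

# Sanity check with a very wide interval: exiting before t is then extremely
# unlikely, so with f(s, y) = y the estimate is close to E[X_t] = x + b*t.
est = abm_exit_mc(lambda s, y: y, 0.0, 0.5, 1.0, -100.0, 100.0, 1.0, 5_000)
```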
2.2.2 Ornstein–Uhlenbeck Process
Now consider the process solution to \(V _{t}^{x} = x - a\int _{0}^{t}V _{s}^{x}\mathrm{d}s +\sigma W_{t}\): we emphasize in our notation the dependence w.r.t. the initial value V 0 = x. We define an appropriate second order operator
which plays the role of the infinitesimal generator for the Ornstein–Uhlenbeck process. We recall that the Gaussian distribution of V t x has mean xe −a t and variance \(\frac{\sigma ^{2}} {2a}(1 - e^{-2\mathit{at}})\), the density of which at y (assuming σ ≠ 0 for the existence) is
Using the heat equation satisfied by g, we easily derive that
from which we deduce the PDE satisfied by \(u(t,x,f) = \mathbb{E}[f(V _{t}^{x})]\). Incorporating Dirichlet boundary conditions is similar to the previous cases. We state the related results without extra details.
Theorem 5.
Let f be a bounded measurable function. The function
solves
Proposition 13.
If \(f \in \mathcal{C}_{b}^{1,2}\) and U is a bounded stopping time, then
Corollary 3.
Assume the existence of a solution u of class C b 1,2 ([0,T] × [a,b]) to the PDE
Then u is given by
where \(U^{x} =\inf \{ s > 0: V _{s}^{x}\notin ]a,b[\}\wedge t\) .
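Since V t x is exactly Gaussian with the mean and variance recalled above, \(\mathbb{E}[f(V _{t}^{x})]\) can be estimated with a single Gaussian draw per simulation, without discretizing the path. A minimal sketch (function and variable names ours):

```python
import math
import random

def ou_mc(f, x, a, sigma, t, M, seed=3):
    """Monte Carlo for E[f(V_t^x)]: the Ornstein-Uhlenbeck value V_t^x is
    exactly Gaussian with mean x*exp(-a*t) and variance
    sigma^2/(2a) * (1 - exp(-2*a*t)), so no path discretization is needed."""
    rng = random.Random(seed)
    mean = x * math.exp(-a * t)
    std = sigma * math.sqrt((1.0 - math.exp(-2.0 * a * t)) / (2.0 * a))
    return sum(f(mean + std * rng.gauss(0.0, 1.0)) for _ in range(M)) / M

# Sanity check against the exact first moment E[V_t^x] = x * exp(-a*t).
a, sigma, x, t = 1.0, 0.5, 2.0, 1.5
m1 = ou_mc(lambda v: v, x, a, sigma, t, 50_000)
exact_m1 = x * math.exp(-a * t)
```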
2.2.3 A Natural Conjecture for Stochastic Differential Equations
The previous examples serve as a preparation for more general results, relating the dynamics of a process to its Feynman–Kac representation. Denote by X x the solution (whenever it exists) to the Stochastic Differential Equation
In view of the results in simpler models, we anticipate the following facts.
-
1.
Set \(L_{b,\sigma ^{2}}^{X}g = \frac{1} {2}\sigma ^{2}(x)g_{\mathit{ xx}}^{{\prime\prime}} + b(x)g_{ x}^{{\prime}}\).
-
2.
\(u(t,x) = \mathbb{E}(f(X_{t}^{x}))\) solves
$$\displaystyle{ u_{t}^{{\prime}}(t,x) = L_{ b,\sigma ^{2}}^{X}u(t,x),\quad u(0,x) = f(x). }$$ -
3.
If \(f \in \mathcal{C}_{b}^{1,2}\) and U is a bounded stopping time, then
$$\displaystyle{\mathbb{E}[f(U,X_{U}^{x})] = f(0,x) + \mathbb{E}\big[\int _{ 0}^{U}[L_{ b,\sigma ^{2}}^{X}f(s,X_{ s}^{x}) + f_{ t}^{{\prime}}(s,X_{ s}^{x})]\mathrm{d}s\big].}$$ -
4.
If u of class C b 1, 2([0, T] × [a, b]) solves the PDE
$$\displaystyle{\left \{\begin{array}{@{}l@{\quad }l@{}} u_{t}^{{\prime}}(t,x) = L_{b,\sigma ^{2}}^{X}u(t,x),\quad &\mbox{ for $t > 0$ and $x \in ]a,b[$,} \\ u(0,x) = f(0,x) \quad &\mbox{ for $t = 0$ and $x \in [a,b]$}, \\ u(t,x) = f(t,x) \quad &\mbox{ for $x = a$ or $b$, with $t \geq 0$},\end{array} \right.}$$then it is given by \(u(t,x) = \mathbb{E}[f(t - U^{x},X_{U^{x}}^{x})]\) where \(U^{x} =\inf \{ s > 0: X_{s}^{x}\notin ]a,b[\}\wedge t\).
The above result could be extended to PDEs with a space variable in \(\mathbb{R}^{d}\) (d ≥ 1) by considering an \(\mathbb{R}^{d}\)-valued SDE: it would be achieved by replacing W by a d-dimensional standard Brownian motion, taking a drift coefficient \(b: \mathbb{R}^{d}\mapsto \mathbb{R}^{d}\), a diffusion coefficient \(\sigma: \mathbb{R}^{d}\mapsto \mathbb{R}^{d} \otimes \mathbb{R}^{d}\) and a reward function \(f: [0,T] \times \mathbb{R}^{d}\mapsto \mathbb{R}\), replacing the interval [a, b] by a domain D in \(\mathbb{R}^{d}\), and defining U x as the first exit time of X x from that domain. Then the operator L would be a linear parabolic second order operator of the form
where ⊤ denotes the transpose. We could also add a zero-order term in \(L_{b,\sigma \sigma ^{\top }}^{X}\), by considering a discounting factor for f; we do not develop this extension further.
The next section provides stochastic calculus tools that allow us to prove the validity of these Feynman–Kac type results, under appropriate smoothness and growth assumptions on b, σ, f. To allow non-smooth f or Dirichlet boundary conditions, we may additionally assume a non-degeneracy condition on \(L_{b,\sigma \sigma ^{\top }}^{X}\) (like the ellipticity condition \(\vert \sigma \sigma ^{\top }(x)\vert \geq \frac{1} {c}\) for some c > 0).
3 The Itô Formula
One achievement of Itô’s formula is to go from an infinitesimal time-decomposition in expectation like
(see (17)) to a pathwise infinitesimal time-decomposition of
Since Brownian motion paths are not differentiable, it is hopeless to apply standard differential calculus based on the usual first order Taylor formula. Instead, we go up to the second order, taking advantage of the fact that W has a finite quadratic variation. The approach presented below is taken from the nice paper Calcul d’Itô sans probabilité by Föllmer [19]: it is not the most general or deepest approach, but it has the advantage of light technicalities and straightforward arguments, compared with the usual tougher arguments using L 2-spaces and isometry (see for instance [48] or [63] among others).
3.1 Quadratic Variation
3.1.1 Notations and Definitions
Brownian increments over a small interval [t, t + h] are centered Gaussian r.v. with variance h, and thus behave like \(\sqrt{h}\). The total variation of the paths is infinite, since the trajectories are nowhere differentiable, but the quadratic variation has interesting properties.
To avoid convergence technicalities, we consider particular time subdivisions.
Definition 7 (Dyadic Subdivision of Order n).
Let n be an integer. The subdivision of \(\mathbb{R}^{+}\) defined by \(\mathbb{D}_{n} =\{ t_{0} < \cdots < t_{i} < \cdots \,\}\) where \(t_{i} = i2^{-n}\) is called the dyadic subdivision of order n. The subdivision step is \(\delta _{n} = 2^{-n}.\)
Definition 8 (Quadratic Variation).
The quadratic variation of a Brownian motion W associated with the dyadic subdivision of order n is defined, for t ≥ 0, by
3.1.2 Convergence
Then there is the following remarkable result.
Proposition 14 (Pointwise Convergence).
With probability 1, we have
for any \(t \in \mathbb{R}^{+}\) .
Had W been differentiable, the limit of V n would be equal to 0.
Proof.
First let us show the a.s. convergence for a fixed time t, and denote by n(t) the index of the dyadic subdivision of order n such that \(t_{n(t)} \leq t < t_{n(t)+1}.\) Then observe that \(V _{t}^{n} - t =\sum _{ j=0}^{n(t)}Z_{j} + (t_{n(t)+1} - t)\) where \(Z_{j} = (W_{t_{j+1}} - W_{t_{j}})^{2} - (t_{j+1} - t_{j})\). The term \(t_{n(t)+1} - t\) converges to 0 as the subdivision step shrinks to 0. The random variables Z j are independent, centered, square integrable (since the Gaussian law of \(W_{t_{j+1}} - W_{t_{j}}\) has finite fourth moments): additionally, the scaling property of Proposition 1 ensures that \(\mathbb{E}(Z_{j}^{2}) = C_{2}(t_{j+1} - t_{j})^{2}\) for a positive constant C 2. Thus
This proves the L 2-convergence of \(\sum _{j=0}^{n(t)}Z_{j}\) towards 0.
Moreover we obtain \(\sum _{n\geq 1}\mathbb{E}\left (\sum _{j=0}^{n(t)}Z_{j}\right )^{2} < \infty \), i.e. the random series \(\sum _{n\geq 1}\left (\sum _{j=0}^{n(t)}Z_{j}\right )^{2}\) has a finite expectation, whence a.s. finite and consequently its general term converges a.s. to 0. This shows that for any fixed t, V t n → t except on a negligible set N t .
We now extend the result to any time: first, the set \(N = \cup _{t\in \mathbb{Q}^{+}}N_{t}\) is still negligible, because a countable union of negligible sets is negligible. For an arbitrary t, take two monotone sequences of rational numbers r p ↑ t and s p ↓ t as p → +∞. Since t↦V t n is increasing for fixed n, we deduce, for any ω∉N
Passing to the limit in p gives the result. □
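Proposition 14 is easy to illustrate numerically: simulate the increments of a Brownian path along the dyadic subdivision of order n and sum their squares; the result should be close to t. In the sketch below (names ours), each order n uses an independently simulated path, which is enough for illustration (refining a single path dyadically is possible but heavier).

```python
import math
import random

def dyadic_qv(t, n, seed=4):
    """Quadratic variation V_t^n of a simulated Brownian path along the
    dyadic subdivision of order n (step 2**-n): the path is generated by
    its independent N(0, 2**-n) increments and their squares are summed."""
    rng = random.Random(seed)
    h = 2.0 ** (-n)
    return sum(rng.gauss(0.0, math.sqrt(h)) ** 2 for _ in range(int(t / h)))

# V_t^n should approach t = 1 as n grows (fluctuations of order 2**(-n/2)).
v8, v14 = dyadic_qv(1.0, 8), dyadic_qv(1.0, 14)
```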
As a consequence, we obtain the formula giving the infinitesimal decomposition of W t 2.
Proposition 15 (A First Itô Formula).
Let W be a standard Brownian motion. With probability 1, we have for any t ≥ 0
where the stochastic integral \(\int _{0}^{t}W_{s}\mathit{dW }_{s}\) is the a.s. limit of \(\sum _{t_{i}\leq t}W_{t_{i}}(W_{t_{i+1}} - W_{t_{i}})\) , along the dyadic subdivision.
For a usual C 1-function f(t), we have \(f^{2}(t) - f^{2}(0) = 2\int _{0}^{t}f(s)\mathrm{d}f(s)\): the extra term t in (27) is intrinsically related to the Brownian motion paths.
Proof.
Adopting once again the notation with n(t), we have
The first term on the r.h.s. tends to 0 by continuity of the Brownian paths. The second term equals V t n and converges towards t. Consequently, the third term on the r.h.s. must converge a.s. towards a limit, which we call the stochastic integral and denote by \(2\int _{0}^{t}W_{s}\mathrm{d}W_{s}\). □
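The identity (27) can also be checked numerically on a simulated path: the discrete sums \(\sum _{t_{i}\leq t}W_{t_{i}}(W_{t_{i+1}} - W_{t_{i}})\) should approach \((W_{t}^{2} - t)/2\) as the dyadic order grows. A sketch (names ours):

```python
import math
import random

def ito_w2_check(t, n, seed=5):
    """On a simulated path along the dyadic subdivision of order n, compare
    the discrete stochastic integral sum_i W_{t_i}(W_{t_{i+1}} - W_{t_i})
    with (W_t**2 - t)/2, as predicted by the first Ito formula (27)."""
    rng = random.Random(seed)
    h = 2.0 ** (-n)
    W = integral = 0.0
    for _ in range(int(t / h)):
        dW = rng.gauss(0.0, math.sqrt(h))
        integral += W * dW           # left-point (Ito) evaluation
        W += dW
    return integral, (W * W - t) / 2.0

lhs, rhs = ito_w2_check(1.0, 14)    # the gap is (t - V_t^n)/2, small for large n
```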
The random function V t n, as a function of t, is increasing and can be viewed as the cumulative distribution function of the positive discrete measure
satisfying \(\mu _{n}(f) =\sum _{i\geq 0}f(t_{i})(W_{t_{i+1}} - W_{t_{i}})^{2}\).
The convergence of the cumulative distribution functions of μ n (⋅) (Proposition 14) can then be extended to integrals of continuous functions (possibly random as well). This is the purpose of the following result, which is of a deterministic nature.
Proposition 16 (Convergence as a Positive Measure).
For any continuous function f, with probability 1 we have
for any t ≥ 0.
The proof is standard: the result first holds for functions of the form \(f(s) = \mathbf{1}_{]r_{1},r_{2}]}(s)\), then for piecewise constant functions, and finally for continuous functions by simple approximations.
3.2 The Itô Formula for Brownian Motion
Differential calculus extends to functions other than x → x 2. Compared with the usual classical formula for functions that are smooth in time, an extra term should be added, due to the non-zero quadratic variation.
Theorem 6 (Itô Formula).
Let \(f \in \mathcal{C}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}, \mathbb{R})\) . Then with probability 1, we have, for any t ≥ 0,
The term \(\mathcal{I}_{t}(f) =\int _{ 0}^{t}f_{x}^{{\prime}}(s,x + W_{s})\mathrm{d}W_{s}\) is called the stochastic integral of f x ′ (s,x + W s ) w.r.t. W and it is the a.s. limit of
taken along the dyadic subdivision of order n.
The reader should compare the equality (28) with (17) to see that, under the extra assumptions that f is bounded with bounded derivatives, we have proved that the stochastic integral \(\mathcal{I}_{t}(f)\) is centered:
This explains how we can expect to go from (28) to (17):
-
1.
Apply Itô formula.
-
2.
Take expectation.
-
3.
Prove that the stochastic integral is centered.
This is an interesting alternative proof of the property satisfied by the Gaussian kernel, whose direct computation is difficult to extend to more general (non-Gaussian) processes.
Proof.
As before, let us introduce the index n(t) such that t n(t) ≤ t < t n(t)+1; then we can write
-
The second term of the r.h.s., \([f(t,x + W_{t}) - f(t_{n(t)+1},x + W_{t_{n(t)+1}})]\), converges to 0 by continuity of \(s\mapsto f(s,x + W_{s})\).
-
The third term is analyzed by means of the first order Taylor formula:
$$\displaystyle{f(t_{i+1},x + W_{t_{i+1}}) - f(t_{i},x + W_{t_{i+1}}) = f_{t}^{{\prime}}(\tau _{ i},x + W_{t_{i+1}})(t_{i+1} - t_{i})}$$for τ i ∈ ]t i , t i+1[. The uniform continuity of (W s )0 ≤ s ≤ t+1 ensures that \(\sup _{i}\vert f_{t}^{{\prime}}(\tau _{i},x + W_{t_{i+1}}) - f_{t}^{{\prime}}(t_{i},x + W_{t_{i}})\vert \rightarrow 0\): thus \(\lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f_{t}^{{\prime}}(\tau _{i},x + W_{t_{i+1}})(t_{i+1} - t_{i})\) equals to
$$\displaystyle\begin{array}{rcl} \lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f_{t}^{{\prime}}(t_{ i},x + W_{t_{i}})(t_{i+1} - t_{i}) =\int _{ 0}^{t}f_{ t}^{{\prime}}(s,x + W_{ s})\mathrm{d}s.& & {}\\ \end{array}$$ -
A second order Taylor formula allows us to write the fourth term: \(f(t_{i},x + W_{t_{i+1}}) - f(t_{i},x + W_{t_{i}})\) equals
$$\displaystyle\begin{array}{rcl} f_{x}^{{\prime}}(t_{ i},x + W_{t_{i}})(W_{t_{i+1}} - W_{t_{i}}) + \frac{1} {2}f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x +\xi _{i})(W_{t_{i+1}} - W_{t_{i}})^{2}& & {}\\ \end{array}$$where \(\xi _{i} \in (W_{t_{i}},W_{t_{i+1}})\). Similarly to before, \(\sup _{i}\vert f_{\mathit{xx}}^{{\prime\prime}}(t_{i},x +\xi _{i}) - f_{\mathit{xx}}^{{\prime\prime}}(t_{i},x + W_{t_{i}})\vert =\epsilon _{n} \rightarrow 0\) and it leads to
$$\displaystyle\begin{array}{rcl} \Big\vert \sum _{t_{i}\leq t}(f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x +\xi _{i}) - f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x + W_{t_{i}}))(W_{t_{i+1}} - W_{t_{i}})^{2}\Big\vert \leq \epsilon _{ n}V _{t}^{n},& & {}\\ \lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f_{\mathit{xx}}^{{\prime\prime}}(t_{ i},x + W_{t_{i}})(W_{t_{i+1}} - W_{t_{i}})^{2} =\int _{ 0}^{t}f_{\mathit{ xx}}^{{\prime\prime}}(s,x + W_{ s})\mathrm{d}s,& & {}\\ \end{array}$$by applying Proposition 16.
Observe that in spite of the non-differentiability of W, \(\sum _{t_{i}\leq t}f_{x}^{{\prime}}(t_{ i},x + W_{t_{i}})(W_{t_{i+1}} - W_{t_{i}})\) is necessarily convergent as a difference of convergent terms. □
Interestingly, we obtain a representation of the random variable f(x + W T ) as a stochastic integral, in terms of the derivatives of the solution u to the heat equation.
Corollary 4.
Assume that \(u \in \mathcal{C}_{b}^{1,2}([0,T] \times \mathbb{R})\) . We have
Proof.
Apply the Itô formula to \(v(t,x) = u(T - t,x)\) (which satisfies \(v_{t}^{{\prime}}(t,x) + \frac{1} {2}v_{\mathit{xx}}^{{\prime\prime}}(t,x) = 0\)) at time T. This gives \(f(x + W_{T}) = u(0,x + W_{T}) = u(T,x) +\int _{ 0}^{T}u_{x}^{{\prime}}(T - s,x + W_{s})\mathrm{d}W_{s}\). □
This representation formula leads to important remarks.
-
If the above stochastic integral has zero expectation (as for the examples presented before), taking the expectation shows that
$$\displaystyle{u(T,x) = \mathbb{E}(f(x + W_{T})),}$$recovering the Feynman–Kac representation of Theorem 2.
-
Then, the above representation writes, setting \(\varPsi = f(x + W_{T})\),
$$\displaystyle{\varPsi = \mathbb{E}(\varPsi ) +\int _{ 0}^{T}h_{ s}\mathrm{d}W_{s}.}$$Actually, a similar stochastic integral representation theorem holds in much greater generality for Ψ, since any bounded Footnote 15 functional of (W t )0 ≤ t ≤ T can be represented as its expectation plus a stochastic integral: the process h is not tractable in general, whereas here it is explicitly related to the derivative of u along the Brownian path.
-
Assuming \(u \in \mathcal{C}_{b}^{1,2}([0,T] \times \mathbb{R})\) imposes that \(f \in \mathcal{C}_{b}^{2}(\mathbb{R})\), which is too strong for many applications: however, the assumptions on u can be relaxed to handle a bounded measurable function f, because the heat equation immediately smooths out the initial condition. The proof of this extension involves extra stochastic calculus technicalities that we do not develop.
3.3 Wiener Integral
In general, it is not possible to make the law of the stochastic integral \(\int _{0}^{t}f_{x}^{{\prime}}(s,x + W_{s})\mathrm{d}W_{s}\) explicit, except when f x ′(s, x) = h(s) is independent of x and square integrable. In that case, \(\int _{0}^{t}h(s)\mathrm{d}W_{s}\) is distributed as a Gaussian r.v. The resulting stochastic integral is called the Wiener integral. We sum up its important properties.
Proposition 17 (Wiener Integral and Integration by Parts).
Let \(f: [0,T]\mapsto \mathbb{R}\) be a continuously differentiable function, with bounded derivatives on [0,T].
-
1.
With probability 1, for any t ∈ [0,T] we have
$$\displaystyle{ \int _{0}^{t}f(s)\mathrm{d}W_{ s} = f(t)W_{t} -\int _{0}^{t}W_{ s}f^{{\prime}}(s)\mathrm{d}s. }$$(31) -
2.
The process \(\{\int _{0}^{t}f(s)\mathrm{d}W_{s};t \in [0,T]\}\) is a continuous Gaussian process, with zero mean and with a covariance function
$$\displaystyle{ \mathbb{C}\mathrm{ov}(\int _{0}^{t}f(u)\mathrm{d}W_{ u},\int _{0}^{s}f(u)\mathrm{d}W_{ u}) =\int _{ 0}^{s\wedge t}f^{2}(u)\mathrm{d}u. }$$(32) -
3.
For another function g satisfying the same assumptions, we have
$$\displaystyle{ \mathbb{C}\mathrm{ov}(\int _{0}^{t}f(u)\mathrm{d}W_{ u},\int _{0}^{s}g(u)\mathrm{d}W_{ u}) =\int _{ 0}^{s\wedge t}f(u)g(u)\mathrm{d}u. }$$(33)
Proof.
The first item is a direct application of Theorem 6 to the function (t, x)↦f(t)x.
For any coefficients (α i )1 ≤ i ≤ N and times (T i )1 ≤ i ≤ N , \(\sum _{i=1}^{N}\alpha _{i}\int _{0}^{T_{i}}f(u)\mathrm{d}W_{u}\) is a Gaussian r.v. since it can be written as a limit of Gaussian r.v. of the form \(\sum _{j}\beta _{j}(W_{t_{j+1}} - W_{t_{j}})\): thus, \(\{\int _{0}^{t}f(s)\mathrm{d}W_{s};t \in [0,T]\}\) is a Gaussian process. Its continuity is obvious in view of (31). Its expectation is the limit of the expectation of \(\sum _{t_{i}\leq t}f(t_{i})[W_{t_{i+1}} - W_{t_{i}}]\), thus equal to 0. The covariance is the limit of the covariance
The second item is proved. The last item is proved similarly. □
As a consequence, going back to the Ornstein–Uhlenbeck process (Sect. 1.6.2), we can complete the proof of its representation (11) using a stochastic integral, starting from (10). For this, apply the integration by parts formula (31) to the function \(f(s) = e^{-a(t-s)}\) (t fixed): it gives \(\int _{0}^{t}e^{-a(t-s)}\mathrm{d}W_{s} = W_{t} - a\int _{0}^{t}e^{-a(t-s)}W_{s}\mathrm{d}s\). It leads to
Then the Gaussian property from Proposition 17 gives that the variance of V t is equal to \(\sigma ^{2}\int _{0}^{t}e^{-2a(t-s)}\mathrm{d}s = \frac{\sigma ^{2}} {2a}(1 - e^{-2\mathit{at}})\).
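The variance formula just derived can be checked by brute force: discretize the Wiener integral \(\int _{0}^{t}e^{-a(t-s)}\mathrm{d}W_{s}\) as a sum of weighted independent Gaussian increments and compare the sample variance over many paths with \(\frac{1} {2a}(1 - e^{-2\mathit{at}})\). A sketch (names and parameters ours):

```python
import math
import random

def wiener_integral_var(a, t, M=10_000, n=128, seed=6):
    """Sample variance over M paths of the discretized Wiener integral
    int_0^t exp(-a*(t - s)) dW_s, to be compared with the exact value
    (1 - exp(-2*a*t)) / (2*a) given by the covariance formula (32)."""
    rng = random.Random(seed)
    h = t / n
    s2 = 0.0
    for _ in range(M):
        I = sum(math.exp(-a * (t - i * h)) * rng.gauss(0.0, math.sqrt(h))
                for i in range(n))
        s2 += I * I                  # the Wiener integral has zero mean
    return s2 / M

a, t = 1.0, 1.0
sample_var = wiener_integral_var(a, t)
exact_var = (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a)
```

The remaining discrepancy combines the statistical error in M and the left-point discretization bias in n, both small here.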
3.4 Itô Formula for Other Processes
The reader should have noticed that the central property for the proof of Theorem 6 is that the Brownian motion has a finite quadratic variation. Thus, the Itô formula can directly be extended to processes X which enjoy the same property.
3.4.1 The One-Dimensional Case
In this paragraph, we first consider scalar processes. The multidimensional extension is made afterwards.
Definition 9 (Quadratic Variation of a Process).
A continuous process X has a finite quadratic variation if for any t ≥ 0, the limit
along the dyadic subdivision of order n, exists a.s. and is finite. We denote this limit by \(\langle X\rangle _{t}\) and it is usually called the bracket of X at time t.
If X = W is a Brownian motion, we have \(\langle X\rangle _{t} = t\). More generally, it is easy to check that \(\langle X\rangle\) is increasing and continuous. We associate to it a positive measure and this extends Proposition 16 to X.
Proposition 18.
For any continuous function f, with probability 1 for any t ≥ 0 we have
Theorem 6 becomes
Theorem 7 (Itô Formula for X).
Let \(f \in \mathcal{C}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}, \mathbb{R})\) and X be with finite quadratic variation. With probability 1, for any t ≥ 0 we have
where \(\int _{0}^{t}f_{x}^{{\prime}}(s,X_{s})\mathrm{d}X_{s}\) is the stochastic integral of f x ′ (s,X s ) w.r.t. X and it is the a.s. limit of \(\sum _{t_{i}\leq t}f_{x}^{{\prime}}(t_{i},X_{t_{i}})(X_{t_{i+1}} - X_{t_{i}})\) along dyadic subdivision of order n.
Often, the Itô formula is written formally in differential form
We now provide simple, hands-on tools to compute the bracket of X in practice.
Proposition 19 (Computation of the Bracket).
Let A and M be two continuous processes such that A has a finite variation Footnote 16 and M has a finite quadratic variation:
-
1.
\(\langle A\rangle _{t} = 0\) .
-
2.
If \(X_{t} = x + M_{t}\) , then \(\langle X\rangle _{t} =\langle M\rangle _{t}\) .
-
3.
If X t = λM t , then \(\langle X\rangle _{t} =\lambda ^{2}\langle M\rangle _{t}\) .
-
4.
If \(X_{t} = M_{t} + A_{t}\) , then \(\langle X\rangle _{t} =\langle M\rangle _{t}\) .
-
5.
If X t = f(A t ,M t ) with f ∈ C 1 , then \(\langle X\rangle _{t} =\int _{ 0}^{t}[f_{m}^{{\prime}}(A_{s},M_{s})]^{2}\mathrm{d}\langle M\rangle _{s}\) .
The proof is easy and uses deterministic arguments based on the definition of \(\langle X\rangle\); we skip it. Item (5) shows that the class of processes with finite quadratic variation is stable under smooth composition. The following examples are important.
Example 1 (Arithmetic Brownian Motion).
(\(X_{t} = x + bt +\sigma W_{t}\)): we have
Itô’s formula becomes
An important example is associated to f(x) = exp(x):
Example 2 (Geometric Brownian Motion).
(\(S_{t} = S_{0}e^{(\mu -\frac{1} {2} \sigma ^{2})t+\sigma W_{ t}}\)): we have
From (38), we obtain a linear equation for the dynamics of S,
also written \(\frac{\mathrm{d}S_{t}} {S_{t}} =\mu \mathrm{d}t +\sigma \mathrm{d}W_{t}\), emphasizing the financial interpretation as returns. The Itô formula writes
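As a quick numerical sanity check (our illustration, not from the text), the closed-form geometric Brownian motion satisfies \(\mathbb{E}(S_{t}) = S_{0}e^{\mu t}\); the parameter values below are arbitrary choices.

```python
import numpy as np

# Sample S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t) and compare the
# empirical mean with the exact value E(S_t) = S_0 * exp(mu * t).
rng = np.random.default_rng(1)

S0, mu, sigma, t, M = 1.0, 0.1, 0.3, 1.0, 1_000_000
W_t = rng.normal(0.0, np.sqrt(t), size=M)
S_t = S0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * W_t)
print(S_t.mean())   # close to exp(0.1), up to the statistical error
```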
Example 3 (Ornstein–Uhlenbeck Process).
(\(V _{t} = v_{0} - a\int _{0}^{t}V _{s}\mathrm{d}s +\sigma W_{t}\)): we have
The Itô formula follows
Example 4 (Euler Scheme Defined in (12)).
(\(X_{t}^{h} = X_{\mathit{ih}}^{h} + b(X_{\mathit{ih}}^{h})(t - ih) +\sigma (X_{\mathit{ih}}^{h})(W_{t} - W_{\mathit{ih}})\) for i ≥ 0, t ∈ (ih, (i + 1)h]). Since \(X^{h}\) is an arithmetic Brownian motion on each interval (ih, (i + 1)h], we easily obtain
where \(\varphi (t) = \mathit{ih}\) for t ∈ (ih, (i + 1)h]. The Itô formula writes
3.4.2 The Multidimensional Case
We briefly expose the situation when \(X = (X_{1},\ldots,X_{d})\) takes values in \(\mathbb{R}^{d}\). The main novelty consists in considering the cross quadratic variation, defined as the limit (assuming it exists, along the dyadic subdivisions) of
We list basic properties.
Properties 8
-
1.
Symmetry: \(\langle X_{k},X_{l}\rangle _{t} =\langle X_{l},X_{k}\rangle _{t}\).
-
2.
Usual bracket: \(\langle X_{k},X_{k}\rangle _{t} =\langle X_{k}\rangle _{t}\).
-
3.
Polarization: \(\langle X_{k},X_{l}\rangle _{t} = \frac{1} {4}\left (\langle X_{k} + X_{l}\rangle _{t} -\langle X_{k} - X_{l}\rangle _{t}\right ).\)
-
4.
\(\langle \cdot,\cdot \rangle _{t}\) is bilinear.
-
5.
For any continuous function f, we have
$$\displaystyle{\lim _{n\rightarrow \infty }\sum _{t_{i}\leq t}f(t_{i})(X_{k,t_{i+1}} - X_{k,t_{i}})(X_{l,t_{i+1}} - X_{l,t_{i}}) =\int _{ 0}^{t}f(s)\mathrm{d}\langle X_{ k},X_{l}\rangle _{s}.}$$ -
6.
Let \(X_{1,t} = f(A_{1,t},M_{1,t})\) and \(X_{2,t} = g(A_{2,t},M_{2,t})\), where the variation (resp. quadratic variation) of \(A = (A_{1},A_{2})\) (resp. \(M = (M_{1},M_{2})\)) is finite, and let f and g be two \(\mathcal{C}^{1}\) -functions: we have
$$\displaystyle{\langle X_{1},X_{2}\rangle _{t} =\int _{ 0}^{t}f_{ m}^{{\prime}}(A_{ 1,s},M_{1,s})g_{m}^{{\prime}}(A_{ 2,s},M_{2,s})\mathrm{d}\langle M_{1},M_{2}\rangle _{s}.}$$In particular, \(\langle A_{1} + M_{1},A_{2} + M_{2}\rangle _{t} =\langle M_{1},M_{2}\rangle _{t}\) .
-
7.
Let W 1 and W 2 be two independent Brownian motions: then
$$\displaystyle{\langle W_{1},W_{2}\rangle _{t} = 0.}$$
Proof.
The statements (1)–(6) are easy to check from the definition or using previous arguments. Statement (7) is important and we give details: use the polarization identity
We observe that both \(\frac{1} {\sqrt{2}}(W_{1} + W_{2})\) and \(\frac{1} {\sqrt{2}}(W_{1} - W_{2})\) are Brownian motions, since each one is a continuous Gaussian process with the right covariance function. Thus, \(\langle \frac{1} {\sqrt{2}}(W_{1} + W_{2})\rangle _{t} =\langle \frac{1} {\sqrt{2}}(W_{1} - W_{2})\rangle _{t} = t\) and the result follows. □
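A simple simulation (our illustration) makes statement (7) tangible: the discretized cross bracket of two independent Brownian motions is close to 0, while each individual bracket is close to t.

```python
import numpy as np

# Discretized brackets of two independent Brownian motions on [0, 1].
rng = np.random.default_rng(2)

t, num_steps = 1.0, 2 ** 16
h = t / num_steps
dW1 = rng.normal(0.0, np.sqrt(h), size=num_steps)
dW2 = rng.normal(0.0, np.sqrt(h), size=num_steps)

cross = np.sum(dW1 * dW2)   # approximates <W1, W2>_t = 0
qv1 = np.sum(dW1 ** 2)      # approximates <W1>_t = t
print(cross, qv1)
```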
The Itô formula naturally extends to this setting.
Theorem 9 (Multidimensional Itô Formula).
Let \(f \in \mathcal{C}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}^{d}, \mathbb{R})\) and X be a continuous d-dimensional process with finite quadratic variation. Then, with probability 1, for any t ≥ 0 we have
where the sum of stochastic integrals is defined as before.
In particular, the integration by parts formula writes
For two independent Brownian motions, we recover the usual deterministic formula (because \(\langle W_{1},W_{2}\rangle _{t} = 0\)), but in general the formulas differ because of the quadratic variation.
3.5 More Properties on Stochastic Integrals
So far, we have defined some specific stochastic integrals, those appearing in deriving an Itô formula and which have the form
the limit being taken along the dyadic subdivisions. Also, we have proved that if f has bounded derivatives and X = W is a Brownian motion, the above stochastic integral has zero expectation [see equality (29)]. Moreover, we have also established that, in the case of a deterministic integrand (Wiener integral), the second moment of the stochastic integral is explicit and given by
The aim of this paragraph is to extend the above properties of the first two moments to more general integrands, under suitable boundedness or integrability conditions.
3.5.1 Heuristic Arguments
In view of the previous construction, there is a natural candidate for the stochastic integral \(\int _{0}^{t}h_{s}\mathrm{d}W_{s}\). When h is a piecewise constant process (called a simple process), that is \(h_{s} = h_{t_{i}}\) if \(s \in [t_{i},t_{i+1})\) for a given deterministic time grid \((t_{i})_{i}\), we set
Without extra assumptions on the stochasticity of h, it is not clear why its expectation equals 0. This property should come from the centered Brownian increments \(W_{t\wedge t_{i+1}} - W_{t_{i}}\) and their independence from \(h_{t_{i}}\), so that
To validate this computation, we shall assume that \(h_{t}\) depends only on the Brownian motion W before time t and is integrable. To handle the second moment, assume additionally that h is square integrable: then
This equality should be read as an isometry property (usually referred to as the Itô isometry), on which we can base an extension of the stochastic integral from simple processes to more general processes. At this point, we would need to enter into measurability considerations to describe what “\(h_{t}\) depends only on the Brownian motion W before t” means at the most general level. This goes far beyond this introductory lecture: for an exposition of the general theory, see for instance [48] or [63].
For most of the examples considered in this lecture, we can restrict to very good integrands, in the sense that an integrand h is very good if
-
1.
\((h_{t})_{t}\) is continuous or piecewise continuous (as for simple processes).
-
2.
For a given t, \(h_{t}\) is a continuous functional of \((W_{s}: s \leq t)\).
-
3.
It is square integrable in time and ω: \(\mathbb{E}(\int _{0}^{t}h_{s}^{2}\mathrm{d}s) < +\infty \) for any t.
This setting ensures that we can define stochastic integrals for very good integrands as the L 2-limit of stochastic integrals for simple integrands: indeed, a Cauchy sequence (h n ) n in \(L_{2}(\mathrm{d}t \otimes \mathrm{d}\mathbb{P})\) gives a Cauchy sequence \((\int _{0}^{t}h_{n,s}\mathrm{d}W_{s})_{n}\) in \(L_{2}(\mathbb{P})\) due to the isometry (45).
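To make the centering and isometry properties concrete, here is a Monte Carlo check (our illustration) with the simple adapted integrand \(h_{s} = \mathrm{sign}(W_{t_{i}})\) on each \([t_{i},t_{i+1})\), for which \(h_{s}^{2} = 1\) so that \(\mathbb{E}(\int _{0}^{t}h_{s}^{2}\mathrm{d}s) = t\).

```python
import numpy as np

# Simple-integrand stochastic integral: sum of h_{t_i} (W_{t_{i+1}} - W_{t_i})
# with h_{t_i} = sign(W_{t_i}), an adapted, bounded integrand.
rng = np.random.default_rng(3)

t, num_steps, num_paths = 1.0, 64, 50_000
h = t / num_steps
dW = rng.normal(0.0, np.sqrt(h), size=(num_paths, num_steps))
W_before = np.cumsum(dW, axis=1) - dW     # W_{t_i}, the value before each increment
hs = np.sign(W_before)
hs[:, 0] = 1.0                            # W_0 = 0: fix an arbitrary bounded value

integral = np.sum(hs * dW, axis=1)
mean, second_moment = integral.mean(), (integral ** 2).mean()
print(mean, second_moment)                # expect ~ 0 and ~ t
```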
3.5.2 General Results
We collect here all the stochastic integration results needed in this lecture.
Theorem 10.
Let h be a very good integrand. Then the stochastic integral \(\int _{0}^{t}h_{s}\mathrm{d}W_{s}\) is such that
-
1.
It is the \(L_{2}\) limit of \(\sum _{t_{i}\leq t}h_{t_{i}}(W_{t\wedge t_{i+1}} - W_{t_{i}})\) along time subdivisions whose time step goes to 0.
-
2.
It is centered: \(\mathbb{E}(\int _{0}^{t}h_{s}\mathrm{d}W_{s}) = 0\) .
-
3.
It is square integrable: \(\mathbb{E}\vert \int _{0}^{t}h_{s}\mathrm{d}W_{s}\vert ^{2} = \mathbb{E}(\int _{0}^{t}h_{s}^{2}\mathrm{d}s)\) .
-
4.
For two very good integrands h 1 and h 2 , we have
$$\displaystyle{\mathbb{E}\Big[(\int _{0}^{t}h_{ 1,s}\mathrm{d}W_{s})(\int _{0}^{t}h_{ 2,s}\mathrm{d}W_{s})\Big] = \mathbb{E}(\int _{0}^{t}h_{ 1,s}h_{2,s}\mathrm{d}s).}$$
Beyond the t-by-t construction, the full theory actually gives a construction for all t simultaneously, proving in addition a time continuity property, a general centering property (martingale property), tight \(L_{p}\)-estimates on the value at time t and on the extrema up to time t (Burkholder–Davis–Gundy inequalities), and so on. For multidimensional W and h, the construction should be understood componentwise. Another fruitful extension is to allow t to be a bounded stopping time, similarly to the discussion we have made in the proof of Theorem 3.
Another interesting part of the theory is devoted to the existence and uniqueness of solutions to Stochastic Differential Equations (whose solutions are also known as diffusion processes). The easiest setting is to assume globally Lipschitz coefficientsFootnote 17: it is similar to the ODE framework, and the proof is also based on the Picard fixed-point argument. We state the results without proof.
Theorem 11.
Let W be a d-dimensional standard Brownian motion.
Assume that the functions \(b: \mathbb{R}^{d}\mapsto \mathbb{R}^{d}\) and \(\sigma: \mathbb{R}^{d}\mapsto \mathbb{R}^{d} \otimes \mathbb{R}^{d}\) are globally Lipschitz. Then, for any initial condition \(x \in \mathbb{R}^{d}\) , there exists a unique Footnote 18 continuous solution \((X_{t}^{x})_{t\geq 0}\) valued in \(\mathbb{R}^{d}\) which satisfies
with \(\sup _{0\leq t\leq T}\mathbb{E}\vert X_{t}^{x}\vert ^{2} < +\infty \) for any given \(T \in \mathbb{R}^{+}\) .
The continuous process X x has a finite quadratic variation given by
Observe that this general result includes all the models considered before, such as the Arithmetic and Geometric Brownian Motions and the Ornstein–Uhlenbeck process, here stated in a possibly multidimensional framework.
4 Monte Carlo Resolutions of Linear PDEs Related to SDEs
Probabilistic methods to solve PDEs have become very popular during the last two decades. They are usually not competitive compared to deterministic methods in low dimension, but in higher dimension they provide very good alternative schemes. In the sequel, we give a brief introduction to the topic, relying on the material presented in the previous sections. We start with linear parabolic PDEs, with Cauchy–Dirichlet boundary conditions. The next section is devoted to semi-linear PDEs.
4.1 Second Order Linear Parabolic PDEs with Cauchy Initial Condition
4.1.1 Feynman–Kac Formulas
We start with a verification theorem generalizing Theorems 2, 4 and 5 to the case of general SDEs. We incorporate a source term g.
Theorem 12.
Under the assumptions of Theorem 11, let \(X^{x}\) be the solution of (46) starting from \(x \in \mathbb{R}^{d}\) and set
Assume there is a solution \(u \in \mathcal{C}_{b}^{1,2}(\mathbb{R}^{+} \times \mathbb{R}^{d}, \mathbb{R})\) to the PDE
for two given functions \(f,g: \mathbb{R}^{d} \rightarrow \mathbb{R}\) . Then u is given by
Proof.
Let t be fixed. We apply the general Itô formula (Theorem 9) to the process X x and to the function v: (s, y)↦u(t − s, y): it gives
where \(\mathit{Dv}:= (\partial _{x_{1}}v,\ldots,\partial _{x_{d}}v)\). Observe that the integrand \(h_{s} = \mathit{Dv}(s,X_{s}^{x})\sigma (X_{s}^{x})\) is very good, since v has bounded derivatives, σ has linear growth, and \(X_{s}^{x}\) has bounded second moments, locally uniformly in s: thus, the stochastic integral \(\int _{0}^{t}\mathit{Dv}(s,X_{s}^{x})\sigma (X_{s}^{x})\mathrm{d}W_{s}\) has zero expectation. Hence, applying the above decomposition between s = 0 and s = t and taking expectations, we obtain
We are done. □
Smoothness assumptions on u are satisfied if f and g are smooth enough. If not, and if a uniform ellipticity condition holds for \(\sigma \sigma ^{\top }\), the fundamental solution of the PDE smooths the data and the result can be extended. However, the derivatives blow up as time goes to 0, and more technicalities are necessary to justify the same stochastic calculus computations. The fundamental solution p(t, x, y) has a simple probabilistic interpretation: it is the density of \(X_{t}^{x}\) at y. Indeed, identify \(\mathbb{E}[f(X_{t}^{x}) +\int _{ 0}^{t}g(X_{s}^{x})\mathrm{d}s]\) with
4.1.2 Monte Carlo Schemes
Since u(t, x) is represented as an expectation, a Monte Carlo method can be used to compute the solution numerically. The difficulty is that in general X cannot be simulated exactly; only an approximation on a finite time grid can be produced simply and efficiently. Namely, we use the Euler scheme with time step \(h = t/N\):
Observe that to get \(X_{t}^{x,h}\), we do not need to sample the continuous path of \(X^{x,h}\) (as difficult as having a continuous path of a Brownian motion): in fact, we only need to compute \(X_{\mathit{ih}}^{x,h}\) iteratively for i = 0 to i = N. Each time iteration requires sampling d new independent Gaussian increments \(W_{k,(i+1)h} - W_{k,\mathit{ih}}\), centered with variance h: this is straightforward. The computational cost is essentially equal to C(d)N, where the constant depends on the dimension (coming from d-dimensional vector and matrix computations).
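The iteration just described can be sketched in a few lines; the one-dimensional coefficients b(x) = −x and σ(x) = 1 (an Ornstein–Uhlenbeck process, for which \(\mathbb{E}(X_{1}^{1}) = e^{-1}\)) are an illustrative choice of ours.

```python
import numpy as np

# Euler scheme: X_{(i+1)h} = X_{ih} + b(X_{ih}) h + sigma(X_{ih}) (W_{(i+1)h} - W_{ih}),
# iterated for i = 0, ..., N-1 and vectorized over Monte Carlo paths.
rng = np.random.default_rng(4)

def euler_terminal(x0, b, sigma, t, num_steps, num_paths, rng):
    h = t / num_steps
    x = np.full(num_paths, x0)
    for _ in range(num_steps):
        dW = rng.normal(0.0, np.sqrt(h), size=num_paths)  # Gaussian, variance h
        x = x + b(x) * h + sigma(x) * dW
    return x

x_t = euler_terminal(1.0, lambda x: -x, lambda x: 1.0,
                     t=1.0, num_steps=100, num_paths=50_000, rng=rng)
print(x_t.mean())   # close to exp(-1), up to an O(h) bias and the statistical error
```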
As an approximation of the expectation of \(\mathcal{E}(f,g,X^{x}) = f(X_{t}^{x}) +\int _{ 0}^{t}g(X_{s}^{x})\mathrm{d}s\), we take the expectation
a random variable of which we sample M independent copies, that are denoted by \(\{\mathcal{E}(f,g,X^{x,h,m}): 1 \leq m \leq M\}\). Then, the Monte Carlo approximation, based on this sample of M Euler schemes with time step h, is
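A minimal sketch of this estimator with its 95 % confidence half-width: we take a geometric Brownian motion (μ = 0.05, σ = 0.2, our choice), f(x) = x and g = 0, so that the exact value \(\mathbb{E}(S_{t}) = e^{\mu t}\) is known.

```python
import numpy as np

# Monte Carlo approximation of u(t, x) = E[f(X_t^x)] from M Euler paths.
rng = np.random.default_rng(5)

mu, sigma, x0, t = 0.05, 0.2, 1.0, 1.0
num_steps, M = 100, 40_000
h = t / num_steps

x = np.full(M, x0)
for _ in range(num_steps):                  # Euler scheme, vectorized over paths
    dW = rng.normal(0.0, np.sqrt(h), size=M)
    x = x + mu * x * h + sigma * x * dW

samples = x                                 # the M copies, with f(x) = x and g = 0
estimate = samples.mean()
half_width = 1.96 * samples.std(ddof=1) / np.sqrt(M)
print(estimate, "+/-", half_width)          # compare with exp(mu * t)
```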
The first error contribution is due to the finite sample size: the larger M, the better the accuracy. As mentioned in Sect. 2.1.3, once renormalized by \(\sqrt{ M}\), this error is still random and its distribution is close to the Gaussian distribution with zero mean and variance \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x,h}))\): the latter still depends on h, but very little, since it is expected to be close to \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x}))\).
The second error contribution is related to the time discretization: the smaller h, the better the accuracy. In the sequel (Sect. 4.1.3), we theoretically estimate this error in terms of h, and prove that it is of order h (and even equivalent to Ch for some constant C) under reasonable and fairly general assumptions.
What Is the Optimal Tuning of h → 0 and M → +∞? An easy complexity analysis shows that the computational effort is \(\mathcal{C}_{e} = C(d)\mathit{Mh}^{-1}\). Observe that the rate does not depend on the dimension d, in contrast with a PDE method, but on the other hand the solution is computed at a single point (t, x) only. The squared quadratic error is equal to
Only the first factor \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x,h}))\) can be estimated with the same sample, for M large, and it depends little on h. Say that the second term is equivalent to \((Ch)^{2}\) as h → 0, with C ≠ 0. Then, three asymptotic situations occur:
-
1.
If \(M \gg h^{-2}\), the statistical error becomes negligible and \(\frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m}) - u(t,x) \sim \mathit{Ch}\). The computational effort is \(\mathcal{C}_{e} \gg h^{-3}\) and thus \(\mathrm{Err}_{2}(h,M) \gg \mathcal{C}_{e}^{-1/3}\). Deriving a confidence interval as in Sect. 2.1.3 is then meaningless: we are left with the discretization error only.
-
2.
If \(M \ll h^{-2}\), the discretization error becomes negligible and the distribution of \(\sqrt{M}\left ( \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m}) - u(t,x)\right )\) converges to that of a Gaussian r.v. centered with variance \(\mathbb{V}\mathrm{ar}(\mathcal{E}(f,g,X^{x}))\) (which can be asymptotically computed using the M-sample). Thus, we can derive confidence intervals: denoting by \(\sigma _{h,M}^{2}\) the empirical variance of \(\mathcal{E}(f,g,X^{x,h})\), with probability 95 % we have
$$\displaystyle{u(t,x)\,\in \,\Big[ \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m})-1.96 \frac{\sigma _{h,M}} {\sqrt{M}}, \frac{1} {M}\sum _{m=1}^{M}\mathcal{E}(f,g,X^{x,h,m})+1.96 \frac{\sigma _{h,M}} {\sqrt{M}}\Big].}$$Regarding the computational effort, we have \(\mathcal{C}_{e} \gg M^{3/2}\) and thus \(\mathrm{Err}_{2}(h,M) \gg \mathcal{C}_{e}^{-1/3}\).
-
3.
If \(M \sim ch^{-2}\), both statistical and discretization errors have the same magnitude and one can still derive an asymptotic confidence interval, but it is no longer centered (as it is when \(M \ll h^{-2}\)) and, unfortunately, the bias is not easily estimated on the fly. The problem is that the bias is of the same magnitude as the size of the confidence interval, which reduces the interest of having such an a priori statistical error estimate. Here, \(\mathrm{Err}_{2}(h,M) = O(\mathcal{C}_{e}^{-1/3})\).
Summing up, considering the ability to have on-line error estimates or not, and optimizing the final accuracy w.r.t. the computational effort, the second case \(M = h^{-2+\varepsilon }\) (for a small \(\varepsilon > 0\)) may be the most attractive, since it achieves (almost) the best accuracy w.r.t. the computational effort and gives a centered confidence interval (and therefore tractable and meaningful error bounds).
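The balancing rule above can be summarized by a back-of-the-envelope computation: for a target accuracy ε, take h of order ε and M of order \(\varepsilon ^{-2}\), so that the effort \(\mathcal{C}_{e}\) scales as \(\varepsilon ^{-3}\) (up to the constant C(d)). A tiny sketch:

```python
# Tuning rule of thumb: bias ~ C * h, statistical error ~ 1 / sqrt(M).
def tuning(eps):
    h = eps                  # discretization bias of order eps
    M = round(eps ** -2)     # statistical error of order eps
    cost = M / h             # computational effort, up to the constant C(d)
    return h, M, cost

h, M, cost = tuning(1e-2)
print(h, M, cost)            # h = 0.01, M = 10000, cost ~ 1e6 = eps**-3
```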
4.1.3 Convergence of the Euler Scheme
An important issue is to analyze the impact of the time discretization of the SDE. This analysis dates back to the end of the eighties, see [68] among others. The result below gives a mathematical justification for the use of the Euler scheme as an approximation of the distribution of the SDE.
Theorem 13.
Assume that b and σ are \(\mathcal{C}_{b}^{2}\) , let \(X^{x}\) be the solution of (46) starting from \(x \in \mathbb{R}^{d}\) and let \(X^{h,x}\) be its Euler scheme defined in (52) . Assume that \(u(t,x) = \mathbb{E}[f(X_{t}^{x}) +\int _{ 0}^{t}g(X_{s}^{x})\mathrm{d}s]\) is a \(\mathcal{C}_{b}^{2,4}([0,T] \times \mathbb{R}^{d}, \mathbb{R})\) -function solution of the PDE of Theorem 12 . Then,
Proof.
Denote by Err. disc. (h) the above discretization error. As in Theorem 12, we use the function v: (s, y)↦u(t − s, y) (for a fixed t) and we apply the Itô formula to X h, x (Theorem 9): it gives (setting \(\mathit{Dv}:= (\partial _{x_{1}}v,\ldots,\partial _{x_{d}}v)\))
where at the second equality, we have used the PDE solved by v at (s, X s x). Then, by taking the expectation (it removes the stochastic integral term because the integrand is very good), we obtain
The global error is represented as a summation of local errors. For instance, let us estimate the first term, related to \(\sigma \sigma ^{\top }\): apply once again the Itô formula on the interval \([\mathit{kh},s] \subset [\mathit{kh},(k + 1)h]\) to the function \((s,y)\mapsto \big([\sigma \sigma ^{\top }]_{i,j}(X_{\varphi (s)}^{h,x}) - [\sigma \sigma ^{\top }]_{i,j}(y)\big)\partial _{x_{i}x_{j}}^{2}v(s,y)\). It gives rise to a time integral between \(\mathit{kh} =\varphi (s)\) and s and a stochastic integral that vanishes in expectation. Proceed similarly for the other contributions with b and g. Finally, we obtain a representation formula of the form
where the summation runs over differentiation multi-indices of length at most 4, where the \(l_{\alpha }\) are functions depending on b, σ, g and their derivatives up to order 2, and where \(l_{\alpha }\) has at most linear growth w.r.t. its two variables. Taking advantage of the boundedness of the derivatives of v, we easily complete the proof.
Observe that, by strengthening the assumptions and by going a bit further in the analysis, we could establish an expansion w.r.t. h. □
The previous assumption on u implies that \(f \in \mathcal{C}_{b}^{4}\) and \(g \in \mathcal{C}_{b}^{2}\), which is too strong in practice. The extension to non-smooth f is much more difficult, and we have to take advantage of the smoothness coming from the non-degenerate distribution of X or \(X^{h}\). We may follow the same type of computations, mixing PDE techniques and stochastic arguments, see [6]. But it is a pure stochastic analysis approach (Malliavin calculus) that provides the extension under the minimal non-degeneracy assumption (i.e. stated only at the initial point x), see [38]. We state the result without proof.
Theorem 14.
Assume that b and σ are \(\mathcal{C}_{b}^{\infty }\) , let \(X^{x}\) be the solution of (46) starting from \(x \in \mathbb{R}^{d}\) and let \(X^{h,x}\) be its Euler scheme defined in (52) . Assume additionally that \(\sigma \sigma ^{\top }(x)\) is invertible. Then, for any bounded measurable function f, we have
In the same reference [38], the result is also proved for hypoelliptic systems, where the hypoellipticity holds only at the starting point x. On the other hand, without such a non-degeneracy condition and for non-smooth f (like the Heaviside function), the convergence may fail.
The case of coefficients b and σ with low regularity or exploding behavior is still an active field of research.
4.1.4 Sensitivities
If in addition we are interested in computing derivatives of u(t, x) w.r.t. x or other model parameters, this is still possible using Monte Carlo simulations. For the sake of simplicity, in our discussion we focus on the gradient of u w.r.t. x. Essentially, two approaches are known.
Resimulation Method. The derivative is approximated using the finite difference method
where \(e_{i} = (0,\ldots,0,\mathop{1}\limits_{i^{\mathit{th}}},0,\ldots )\), and \(\varepsilon\) is small. Then, each value function is approximated by its Monte Carlo approximation given in (54). However, we have to be careful in generating the Euler schemes starting from \(x +\varepsilon e_{i}\) and \(x -\varepsilon e_{i}\): their sampling should use the same Brownian motion increments, that is
Indeed, for an infinite sample (\(M = +\infty \)), using the same driving noise or not has no impact on the statistical error, but for finite M, this trick usually yields a smaller statistical error. Furthermore, the optimal choice of h, M and \(\varepsilon\) is an important issue, but the results differ according to the regularity of f and g; we do not go into details.
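A sketch of this resimulation estimator with common random numbers (our illustrative model: b(x) = −x, σ = 1 and f(x) = x², for which \(\partial _{x}\mathbb{E}[(X_{t}^{x})^{2}] = 2xe^{-2t}\)):

```python
import numpy as np

# Central finite difference of x -> E[f(X_t^{x,h})], the two Euler schemes
# reusing the SAME Brownian increments, as recommended in the text.
rng = np.random.default_rng(6)

def euler_terminal(x0, dW, h):
    x = np.full(dW.shape[0], x0)
    for i in range(dW.shape[1]):
        x = x - x * h + dW[:, i]          # b(x) = -x, sigma(x) = 1
    return x

t, num_steps, M, eps = 1.0, 50, 100_000, 1e-2
h = t / num_steps
dW = rng.normal(0.0, np.sqrt(h), size=(M, num_steps))  # common increments

f = lambda x: x ** 2
grad = (f(euler_terminal(1.0 + eps, dW, h)).mean()
        - f(euler_terminal(1.0 - eps, dW, h)).mean()) / (2 * eps)
print(grad)   # compare with 2 * exp(-2) for this model at x = 1, t = 1
```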
Likelihood Method. To avoid the problem of selecting an appropriate value of the finite difference parameter \(\varepsilon\), we may prefer another Monte Carlo estimator of \(\partial _{x_{i}}u(t,x)\), which consists in appropriately weighting the output. When g equals 0, it takes the following form
where \(H_{t}^{x,h,m}\) is generated simultaneously with the Euler scheme and does not depend on f. The advantage of this approach is that it avoids the possibly delicate choice of the perturbation parameter \(\varepsilon\) and is valid for any function f: thus, it may greatly reduce the computational time if many sensitivities are required for the same model. On the other hand, the confidence interval may be larger than that of the resimulation method.
We now provide the formula for the weight H (known as the Bismut–Elworthy–Li formula). It uses the tangent process, which is the (well-defined, see [52]) derivative of \(x\mapsto X_{t}^{x}\) w.r.t. x and which solves
where \(\sigma _{j}\) is the j-th column of the matrix σ.
Theorem 15.
Assume that b and σ are \(\mathcal{C}_{b}^{2}\) -functions, that \(u \in \mathcal{C}^{1,2}([0,T] \times \mathbb{R}^{d}, \mathbb{R})\) solves the PDE (48) , and that σ is invertible with a uniformly bounded inverse σ −1 . We have
Proof.
First, we recall the decomposition (51) obtained from Itô formula, using \(v(s,y) = u(t - s,y)\):
Second, taking expectations gives \(v(0,x) = u(t,x) = \mathbb{E}(v(r,X_{r}^{x}))\) for any r ∈ [0, T]. By differentiating w.r.t. x, we obtain a nice relation keeping the expectation constant in time (actually deeply related to the martingale property):
Thus, we deduce
using Theorem 10 at the second and fourth equalities, and (58) at the third one. □
In view of the above assumptions on u, the function f is implicitly smooth. However, under the current ellipticity condition, u is still smooth even if f is not; since the formula depends only on f and not on its derivatives, it is standard to extend the formula to any bounded function f (without any regularity assumption).
The Monte Carlo evaluation of Du(t, x) easily follows by independently sampling \(\frac{f(X_{t}^{x})} {t} \left [\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s}\right ]^{\top }\) and taking the empirical mean. Exact simulation is not possible, and once again we may use an Euler-type scheme, with time step h:
-
The dimension-augmented Stochastic Differential Equation (X x, Y x) is approximated using the Euler scheme.
-
We use a simple-process approximation of the stochastic integral
$$\displaystyle{\int _{0}^{t}[\sigma ^{-1}(X_{ s}^{x})Y _{ s}^{x}]^{\top }\mathrm{d}W_{ s} \approx \sum _{ i=0}^{N-1}[\sigma ^{-1}(X_{\mathit{ ih}}^{x,h})Y _{\mathit{ ih}}^{x,h}]^{\top }(W_{ (i+1)h} - W_{\mathit{ih}}).}$$
The analysis of the discretization error is more intricate than for \(\mathbb{E}(f(X_{t}^{x,h}) - f(X_{t}^{x}))\): nevertheless, the error is still of magnitude h (the convergence order is 1 w.r.t. h, as proved in [38]).
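Putting the pieces together, here is a sketch of the likelihood estimator for the same illustrative model as before (b(x) = −x, σ = 1, f(x) = x², our choice), with the tangent process and the stochastic integral both discretized by the Euler scheme:

```python
import numpy as np

# Bismut-Elworthy-Li weight: Du(t,x) ~ E[ f(X_t) * (1/t) * int_0^t sigma^{-1} Y_s dW_s ].
rng = np.random.default_rng(7)

t, num_steps, M, x0 = 1.0, 50, 200_000, 1.0
h = t / num_steps

x = np.full(M, x0)
y = np.ones(M)                 # tangent process, Y_0 = 1
H = np.zeros(M)                # running discretized stochastic integral
for _ in range(num_steps):
    dW = rng.normal(0.0, np.sqrt(h), size=M)
    H += y * dW                # sigma^{-1}(X_ih) Y_ih dW with sigma = 1
    x = x - x * h + dW         # Euler step for X (b(x) = -x)
    y = y - y * h              # Euler step for Y (here b'(x) = -1)

grad = (x ** 2 * H / t).mean() # f(X_t) times the weight, f(x) = x**2
print(grad)                    # compare with 2 * exp(-2) for this model
```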
Theorem 16.
Under the setting of Theorem 14, for any bounded measurable function f, we have
4.1.5 Other Theoretical Estimates in Small Time
The representation formula of Theorem 15 is the starting point for obtaining accurate probabilistic estimates on the derivatives of the underlying PDE as time becomes small, in terms of the fractional smoothness of \(f(X_{t}^{x})\), which is related to the decay of
The derivatives are measured in weighted \(L_{2}\)-norms and, surprisingly, the statements below are equivalences [36]; we are not aware of such results obtained with PDE arguments.
Theorem 17.
Under the setting Footnote 19 of Theorem 14, let t be fixed; for 0 < θ ≤ 1 and a bounded f, the following assertions are equivalent:
-
i)
For some c ≥ 0, \(\mathbb{E}\vert f(X_{t}^{x}) - \mathbb{E}(f(X_{t-s}^{y}))\vert _{y=X_{s}^{x}}\vert ^{2} \leq c(t - s)^{\theta }\) for 0 ≤ s ≤ t.
-
ii)
For some c ≥ 0, \(\mathbb{E}\vert \mathit{Du}(t - s,X_{s}^{x})\vert ^{2} \leq \frac{c} {(t-s)^{1-\theta }}\) for 0 ≤ s < t.
-
iii)
For some c ≥ 0, \(\int _{0}^{s}\mathbb{E}\vert D^{2}u(t - r,X_{r}^{x})\vert ^{2}\mathrm{d}r \leq \frac{c} {(t-s)^{1-\theta }}\) for 0 ≤ s < t.
If 0 < θ < 1, it is also equivalent to:
-
iv)
For some c ≥ 0, \(\mathbb{E}\vert D^{2}u(t - s,X_{s}^{x})\vert ^{2} \leq \frac{c} {(t-s)^{2-\theta }}\) for 0 ≤ s < t.
Theorem 18.
Under the setting of Theorem 14, let t be fixed; for 0 < θ < 1 and a bounded f, the following assertions are equivalent:
-
i)
\(\int _{0}^{t}(t - s)^{-\theta -1}\mathbb{E}\vert f(X_{t}^{x}) - \mathbb{E}(f(X_{t-s}^{y}))\vert _{y=X_{s}^{x}}\vert ^{2}\mathrm{d}s < +\infty \) .
-
ii)
\(\int _{0}^{t}(t - s)^{-\theta }\mathbb{E}\vert \mathit{Du}(t - s,X_{s}^{x})\vert ^{2}\mathrm{d}s < +\infty \) .
-
iii)
\(\int _{0}^{t}(t - s)^{1-\theta }\mathbb{E}\vert D^{2}u(t - s,X_{s}^{x})\vert ^{2}\mathrm{d}s < +\infty \) .
4.2 The Case of Dirichlet Boundary Conditions and Stopped Processes
4.2.1 Feynman–Kac Formula
In view of Corollary 1, the natural extension of Theorem 12 to the case of Dirichlet boundary conditions is the following. We state the result without source term, to simplify. The proof is similar and we skip it.
Theorem 19.
Let D be a bounded domain of \(\mathbb{R}^{d}\) . Under the setting of Theorem 12 , assume there is a solution \(u \in \mathcal{C}_{b}^{1,2}([0,T] \times \overline{D}, \mathbb{R})\) to the PDE
for a given function \(f: \mathbb{R}^{+} \times \overline{D} \rightarrow \mathbb{R}\) . Then u is given by
for x ∈ D, where \(\tau ^{x} =\inf \{ s > 0: X_{s}^{x}\notin D\}\) is the first exit time from D by X.
4.2.2 Monte Carlo Simulations
Performing a Monte Carlo algorithm in this context is less straightforward, since we additionally have to simulate the exit time of X. A simple approach consists in discretizing X using the Euler scheme with time step h, and then taking as the exit time
It does not require any computations beyond those needed to generate \((X_{\mathit{ih}}^{x,h},0 \leq i\leq N)\). But the discretization error worsens considerably, since it becomes of magnitude \(\sqrt{ h}\). Actually, even if the values of \((X_{\mathit{ih}}^{x,h},0 \leq i\leq N)\) are generated without error (as in the Brownian motion case or for other simple processes), the convergence order is still \(\frac{1} {2}\) w.r.t. h [27]. The deterioration of the discretization error really comes from the high irregularity of Brownian motion paths (and SDE paths): even if two successive points \(X_{\mathit{ih}}^{x,h}\) and \(X_{(i+1)h}^{x,h}\) are close to the boundary but inside the domain, a discrete monitoring scheme does not detect the exit, while a continuous Brownian motion-like path would likely exit from the domain between ih and (i + 1)h. Moreover, this gives a systematic (in mean) overestimation of the true exit time. To overcome this lack of accuracy, there are several improved schemes.
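A minimal sketch of the discrete exit-time rule (our illustrative setting: X = W a Brownian motion and D = (−1, 1)); note that the continuous path may leave D between grid times, which is the source of the \(\sqrt{h}\) bias discussed above.

```python
import numpy as np

# Discrete monitoring: stop at the first grid time ih at which the path is outside D.
rng = np.random.default_rng(8)

t, num_steps, M = 1.0, 100, 50_000
h = t / num_steps

x = np.zeros(M)
exited = np.zeros(M, dtype=bool)
tau = np.full(M, t)                        # discrete exit time, capped at t
for i in range(1, num_steps + 1):
    x = x + rng.normal(0.0, np.sqrt(h), size=M)
    newly = (~exited) & (np.abs(x) >= 1.0)
    tau[newly] = i * h
    exited |= newly

print(exited.mean(), tau.mean())  # exit frequency before t, mean discrete exit time
```

The discrete exit frequency underestimates the continuous exit probability, which is what the improved schemes below correct.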
-
The Brownian bridge technique consists in simulating the exit time of a local arithmetic Brownian motion [corresponding to the local dynamics of the Euler scheme, see (12)]. For a simple domain like a half-space, the procedure is explicit and tractable; this is related to the explicit knowledge of the distribution of the Brownian maximum, see Proposition 5. For a smooth domain, we can locally approximate the domain by half-spaces. This improvement allows one to recover order 1 for the convergence, see [27, 28]. For non-smooth domains (including corners, for instance) and general SDEs, providing an accurate scheme and performing its error analysis is still an open issue; for heuristics and numerical experiments, see [29] for instance.
-
The boundary shifting method consists in shrinking the domain to compensate for the systematic bias in the simulation of the discrete exit time. Very remarkably, there is a universal elementary rule to make the domain smaller:
locally at a point y close to the boundary, move the boundary inwards by a quantity proportional to \(c_{0}\sqrt{h}\) times the norm of the diffusion coefficient in the normal direction.
The constant \(c_{0}\) is equal to the mean of the asymptotic overshoot of the Gaussian random walk as the ladder height goes to infinity: it can be expressed using the zeta function
$$\displaystyle{c_{0} = -\frac{\zeta (\frac{1} {2})} {\sqrt{2\pi }} = 0.5826\ldots.}$$This procedure strictly improves on the order \(\frac{1} {2}\) of the discrete procedure, but it is still an open question whether the convergence order is 1, although numerical experiments corroborate this fact.
The result is stated as follows, see [37].
Theorem 20.
Assume that the domain D is bounded and has a \(\mathcal{C}^{3}\) -boundary, that b,σ are \(\mathcal{C}_{b}^{2}\) and \(f \in \mathcal{C}_{b}^{1,2}\) . Let n(y) be the unit inward normal vector to the boundary ∂D at the closest Footnote 20 point to y on the boundary. Set
Then, we have
Observe that this improvement is very cheap in terms of computational cost. It can be extended (both regarding the numerical scheme and its mathematical analysis) to a source term, to time-dependent domains and to stationary problems (elliptic PDEs).
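The rule can be sketched as follows in the Brownian case (our illustrative domain D = (−1, 1), σ = 1, so the boundary is simply moved inwards by \(c_{0}\sqrt{h}\)); the continuous exit probability, about 0.629 here, is then well approximated by plain discrete monitoring of the shrunken domain.

```python
import numpy as np

# Boundary shifting: monitor the path on the shrunken domain
# (-barrier, barrier) with barrier = 1 - c0 * sqrt(h), c0 = -zeta(1/2)/sqrt(2 pi).
rng = np.random.default_rng(9)

c0 = 0.5826                         # the universal constant from the text
t, num_steps, M = 1.0, 100, 50_000
h = t / num_steps
barrier = 1.0 - c0 * np.sqrt(h)     # shifted boundary of D = (-1, 1), sigma = 1

x = np.zeros(M)
exited = np.zeros(M, dtype=bool)
for _ in range(num_steps):
    x = x + rng.normal(0.0, np.sqrt(h), size=M)
    exited |= np.abs(x) >= barrier

print(exited.mean())                # compare with the continuous exit probability
```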
Complementary References. See [2, 13, 26, 49, 53, 64] for general references. For reflected processes and Neumann boundary conditions, see [10, 28]. For variance reduction techniques, see [34, 47, 58]. For domain decomposition, see [35, 62]. This list is not exhaustive.
5 Backward Stochastic Differential Equations and Semi-linear PDEs
The link between PDEs and stochastic processes has been developed over several decades; more recently, say in the last 20 years, researchers have paid attention to the probabilistic interpretation of non-linear PDEs, and in particular semi-linear PDEs. These PDEs are connected to non-linear processes, called Backward Stochastic Differential Equations (BSDEs in short). In this section, we define these equations, first introduced by Pardoux and Peng [60], and give their connection with PDEs. Finally, we present a Monte Carlo algorithm to simulate them, using empirical regressions: it has the advantage of being well suited to multidimensional problems, with great generality on the type of semi-linearity.
These equations have many fruitful applications in stochastic control theory and mathematical finance, where they usually provide elegant proofs characterizing the solution to optimal investment problems, for instance; for the related applications, we refer the reader to [17, 18]. Regarding the semi-linear PDE point of view, the applications are reaction-diffusion equations in chemistry [24], evolution of species in population biology [51, 66], the Hodgkin–Huxley model in neuroscience [43], the Allen–Cahn equation for phase transitions in physics…see the introductory course [30] and references therein. For other non-linear equations with connections to stochastic processes, see the aforementioned reference.
5.1 Existence of BSDE and Feynman–Kac Formula
5.1.1 Heuristics
In contrast with a Stochastic Differential Equation defined by (46), where the initial condition is given and the dynamics is imposed, a Backward SDE is defined through a random terminal condition ξ at a fixed terminal time T and a dynamics imposed by a driver g. It takes the form
where we write the integrals between t and T to emphasize the backward point of view: ξ should be thought of as a stochastic target to reach at time T. A solution to (61) is the couple (Y, Z): without extra conditions, the problem has an infinite number of solutions and thus is ill-posed. For instance, if g ≡ 0 and ξ = f(W T ), then taking \(c \in \mathbb{R}\), a solution is Z t = c and \(Y _{t} =\xi +c(W_{T} - W_{t})\), so uniqueness fails. In addition to integrability properties (appropriate L 2-spaces) that we do not detail, an important condition is that the solution does not anticipate the future of the Brownian motion, i.e. the solution Y t depends on the Brownian motion W only up to time t, and similarly for Z: we informally say that the solution is adapted to W. In a stochastic control problem, this adaptedness constraint is natural since it states that the value function or the decision cannot anticipate the flow of information given by W. Observe that in the uniqueness counter-example, Y is not adapted to W since Y t depends on the Brownian motion on [0, T] and not only on [0, t].
Taking the conditional expectation in (61) gives
because the stochastic integral (built with Brownian increments after t) is centered conditionally on the Brownian motion up to time t. Of course, this rule is fully justified by stochastic calculus theory. Since Y t in (62) is adapted to W, it should be the right solution (if uniqueness holds); then, Z serves as a control to make Eq. (61) valid (with Y adapted).
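Concretely, a standard way to write (62) is

$$\displaystyle{Y _{t} = \mathbb{E}\Big(\xi +\int _{t}^{T}g(s,Y _{s},Z_{s})\,\mathrm{d}s\ \Big\vert \ \mathcal{F}_{t}\Big),}$$

where \(\mathcal{F}_{t}\) stands for the information generated by (W s : s ≤ t).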
5.1.2 Feynman–Kac Formula
The connection with PDEs is possible when the terminal condition is a function of a (forward) SDE: this case is called a Markovian BSDE. Additionally, the driver may also depend on this SDE, as g(s, X s , Y s , Z s ) for a deterministic function g. We now proceed by a verification theorem. To allow a more natural presentation as a backward system, we choose to write the semi-linear PDE with a terminal condition at time T instead of an initial condition at time 0.
Theorem 21.
Let T > 0 be given. Under the assumptions of Theorem 11 , let X x be the solution (46) starting from \(x \in \mathbb{R}^{d}\) , assume there is a solution \(v \in \mathcal{C}_{b}^{1,2}([0,T] \times \mathbb{R}^{d}, \mathbb{R})\) to the semi-linear PDE
for two given functions \(f: \mathbb{R}^{d} \rightarrow \mathbb{R}\) and \(g: [0,T] \times \mathbb{R}^{d} \times \mathbb{R} \times (\mathbb{R} \otimes \mathbb{R}^{d}) \rightarrow \mathbb{R}\) . Then, Y t x = v(t,X t x ) and Z t x = [Dv σ](t,X t x ) solve the BSDE
Proof.
The Itô formula (50) applied to v and X x gives
which, integrated between s = t and s = T, gives:
Since v(T, . ) = f(. ), we complete the proof by identification. □
In particular, at time 0 where X 0 x = x, we obtain Y 0 x = v(0, x) and in view of (62), it gives a Feynman–Kac representation to v:
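In the standard form (a sketch consistent with (62) and (64)), this representation reads

$$\displaystyle{v(0,x) = Y _{0}^{x} = \mathbb{E}\Big(f(X_{T}^{x}) +\int _{0}^{T}g\big(s,X_{s}^{x},Y _{s}^{x},Z_{s}^{x}\big)\,\mathrm{d}s\Big).}$$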
As in the case of linear PDEs, the assumption of uniform smoothness of v up to T is too strong to include the case of a non-smooth terminal function f. But with an extra ellipticity condition, as for the heat equation, the solution becomes smooth immediately away from T (see [21]), and a similar verification can be carried out under milder conditions.
The above Backward SDE (64) is coupled to a Forward SDE, but the latter is not coupled to the BSDE. Another interesting extension is to allow the coupling in both directions by having the coefficients of X depend on v, i.e. b(x) and σ(x) become functions of x, v(t, x), Dv(t, x). The resulting process is called a Forward Backward Stochastic Differential Equation and is related to quasi-linear PDEs, where the operator \(L_{b,\sigma \sigma ^{\top }}^{X}\) also depends on v and Dv, see [56].
5.1.3 Other Existence Results Without PDE Framework
So far, only Markovian BSDEs have been presented, but from the probabilistic point of view, the Markovian structure is not required to define a solution: what is really crucial is the ability to represent a random variable built from (W s : s ≤ T) as a stochastic integral w.r.t. the Brownian motion. This point has been discussed in Corollary 4. Then, in the simple case where g is Lipschitz w.r.t. y, z, (Y, Z) is built by means of a usual fixed point procedure in suitable L 2-norms, combined with this stochastic integral representation. We now state a more general existence and uniqueness result for BSDEs, which is valid without any underlying (finite-dimensional) semi-linear PDE; we omit the proof.
Theorem 22.
Let T > 0 be fixed and assume the assumptions of Theorem 11 for the existence of X and that
-
The terminal condition ξ = f(X s : s ≤ T) is a square integrable functional of the stochastic process (X s : s ≤ T).
-
The measurable function \(g: [0,T] \times \mathbb{R}^{d} \times \mathbb{R} \times (\mathbb{R} \otimes \mathbb{R}^{d}) \rightarrow \mathbb{R}\) is uniformly Lipschitz in (y,z):
$$\displaystyle{\vert g(t,x,y_{1},z_{1}) - g(t,x,y_{2},z_{2})\vert \leq C_{g}(\vert y_{1} - y_{2}\vert + \vert z_{1} - z_{2}\vert ),}$$uniformly in (t,x).
-
The driver is square integrable at (y,z) = (0,0): \(\mathbb{E}(\int _{0}^{T}g^{2}(t,X_{t},0,0)\mathit{dt}) < +\infty \) .
Then, there exists a unique solution (Y,Z), adapted and in L 2 -spaces, to
Much work has been done in the last decade to go beyond the case of a Lipschitz driver, which may be too stringent for some applications. In particular, a driver g with quadratic growth in Z is particularly interesting in exponential utility maximization problems (the non-linear PDE term is quadratic in | Dv | ). This leads to quadratic BSDEs (see for instance [50]). A simple example of such a BSDE can be cooked up from the heat equation and Brownian motion. Namely, from Corollary 4, for a smooth function f with compact support, set \(u(t,x) = \mathbb{E}(\exp (f(x + W_{t})))\) and \(v(t,y) = u(1 - t,y)\), so that
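indeed, since u solves the heat equation \(\partial _{t}u = \frac{1}{2}\partial _{xx}^{2}u\), the time-reversed function v satisfies

$$\displaystyle{\partial _{t}v(t,y) + \tfrac{1}{2}\,\partial _{yy}^{2}v(t,y) = 0,\qquad v(1,y) =\exp (f(y)),}$$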
and by setting Y t = log(v(t, W t )) and \(Z_{t} = v_{x}^{{\prime}}(t,W_{t})/v(t,W_{t})\) (by the Itô formula, Z is the spatial logarithmic derivative of v along the path), we obtain
which is the simplest quadratic BSDE.
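Spelled out (a sketch with the usual sign convention), this quadratic BSDE reads

$$\displaystyle{Y _{t} = f(W_{1}) +\int _{t}^{1}\tfrac{1}{2}Z_{s}^{2}\,\mathrm{d}s -\int _{t}^{1}Z_{s}\,\mathrm{d}W_{s},}$$

i.e. a BSDE with driver \(g(z) = \frac{1}{2}z^{2}\), quadratic in z.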
5.2 Time Discretization and Dynamic Programming Equation
5.2.1 Explicit and Implicit Schemes
To perform the simulation, a first stage is the derivation of a discretization scheme, written backwardly in time (backward dynamic programming equation). For the subsequent analysis, assume that the terminal condition is of the form ξ = f(X T ), where X is a standard (forward) SDE.
Consider a time grid with N time steps \(\pi =\{ 0 = t_{0} < \cdots < t_{i} < \cdots < t_{N} = T\}\), with a possibly non-uniform time step, and set \(\vert \pi \vert =\max _{i}(t_{i+1} - t_{i})\). We will later suppose that | π | → 0.
We write \(\varDelta _{i} = t_{i+1} - t_{i}\) and \(\varDelta W_{i} = W_{t_{i+1}} - W_{t_{i}}\). Writing the Eq. (64) between times t i and t i+1, we have
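That is, with the usual sign convention,

$$\displaystyle{Y _{t_{i}} = Y _{t_{i+1}} +\int _{t_{i}}^{t_{i+1}}g(s,X_{s},Y _{s},Z_{s})\,\mathrm{d}s -\int _{t_{i}}^{t_{i+1}}Z_{s}\,\mathrm{d}W_{s}.}$$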
Then, by applying simple approximations to the ds and dW s integrals and by replacing X by its Euler scheme computed along the grid π (denoted X π), we may define the discrete BSDE as
with the initialization Y T π = f(X T π) at i = N, where \(L_{2}(\mathcal{F}_{t_{i}}^{\pi })\) stands for the set of square integrable random variables (with appropriate dimension) that depend on the Brownian motion increments (Δ W j : j ≤ i − 1); the latter property is the measurability w.r.t. the sigma-field \(\mathcal{F}_{t_{i}}^{\pi }\) generated by (Δ W j : j ≤ i − 1).
Then, a direct computation using the properties of Brownian increments gives
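A sketch of the standard form of this scheme (variants exist in the literature) is

$$\displaystyle{Z_{t_{i}}^{\pi } = \frac{1}{\varDelta _{i}}\,\mathbb{E}\big(Y _{t_{i+1}}^{\pi }\,\varDelta W_{i}^{\top }\ \big\vert \ \mathcal{F}_{t_{i}}^{\pi }\big),\qquad Y _{t_{i}}^{\pi } = \mathbb{E}\big(Y _{t_{i+1}}^{\pi }\ \big\vert \ \mathcal{F}_{t_{i}}^{\pi }\big) +\varDelta _{i}\,g\big(t_{i},X_{t_{i}}^{\pi },Y _{t_{i}}^{\pi },Z_{t_{i}}^{\pi }\big).}$$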
This is the implicit scheme, since the arguments of the function on the r.h.s. depend on the quantity \(Y _{t_{i}}^{\pi }\) to be computed on the l.h.s. Nevertheless, since g is uniformly Lipschitz in y, it is not difficult to show that the Dynamic Programming Equation (DPE in short) (66) is well defined for | π | small enough and that \(Y _{t_{i}}^{\pi }\) can be computed using a Picard iteration procedure.
It is easy to turn the previous scheme into an explicit one and therefore to avoid this extra Picard procedure. It writes
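A common form of the explicit scheme is (a sketch; variants differ in where Y is frozen inside the driver)

$$\displaystyle{Y _{t_{i}}^{\pi } = \mathbb{E}\big(Y _{t_{i+1}}^{\pi } +\varDelta _{i}\,g(t_{i},X_{t_{i}}^{\pi },Y _{t_{i+1}}^{\pi },Z_{t_{i}}^{\pi })\ \big\vert \ \mathcal{F}_{t_{i}}^{\pi }\big).}$$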
In our personal experience with numerics, we have not observed that one scheme significantly outperforms the other. Moreover, from the theoretical point of view, both schemes exhibit the same rates of convergence w.r.t. | π | , at least when the driver is Lipschitz.
The explicit scheme is the simplest one, and this is the one that we recommend in practice.
5.2.2 Time Discretization Error
Define the measure of the quadratic error
Although not explicitly mentioned in the previous existence results on BSDEs, this type of norm is the appropriate one for the fixed point argument in the proof of Theorem 22. We now state an error estimate [33], in order to show the convergence of the DPE to the BSDE.
Theorem 23.
For a driver that is Lipschitz w.r.t. (x,y,z) and \(\frac{1} {2}\) -Hölder w.r.t. t, there is a constant C independent of π such that we have
Let us discuss the nature and magnitude of the different error contributions.
-
First, we face the strong approximation error of the forward SDE by its Euler scheme. Here we focus on the convergence of paths (in L 2-norms), whereas in Sect. 4.1.3 we studied the convergence of expectations of functions of X T π towards those of X T . Anyway, the problem is now well understood: under a Lipschitz condition on b and σ, we can prove \(\sup _{i\leq N}\mathbb{E}\vert X_{t_{i}}^{\pi } - X_{ t_{i}}\vert ^{2} = O(\vert \pi \vert )\).
-
Second, we should ensure a good strong approximation of the terminal condition: if f is Lipschitz continuous, it readily follows from the previous term that \(\mathbb{E}\vert f(X_{T}^{\pi }) - f(X_{T})\vert ^{2} = O(\vert \pi \vert )\). For non-Lipschitz f, there are partial answers, see [3].
-
Finally, the last contribution \(\sum _{i=0}^{N-1}\frac{1} {\varDelta _{i}} \int _{t_{i}}^{t_{i+1}}\int _{t_{ i}}^{t_{i+1}}\mathbb{E}\vert Z_{ t} - Z_{s}\vert ^{2}\mathrm{d}s\ \mathrm{d}t\) is related to the L 2 -regularity of Z (or equivalently of the gradient of the semi-linear PDE along the X-path) and is intrinsic to the BSDE solution. For smooth data, Z has the same regularity as Brownian paths and this error term is O( | π | ). For non-smooth f (but under an ellipticity condition on X), the L 2-norm of Z t blows up as t → T and the rate | π | usually worsens: for instance, for f(x) = 1 x ≥ 0, it becomes \(N^{-\frac{1} {2} }\) for a uniform time grid.
The analysis is very closely related to the fractional smoothness of f(X T ), briefly discussed in Sect. 4.1.5; see also [25]. Choosing an appropriate grid of the form (see Fig. 8)
$$\displaystyle{t_{k}^{\bar{\theta }} = T - T(1 - k/N)^{1/\bar{\theta }}\quad (\bar{\theta }\in (0,1])}$$compensates this blow-up (for an appropriate value of \(\bar{\theta }\)) and enables one to retrieve the rate N −1.
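The graded grid above is straightforward to generate; here is a minimal sketch (the function name and parameter values are ours):

```python
import numpy as np

def graded_grid(T, N, theta):
    """Grid t_k = T - T * (1 - k/N)**(1/theta), k = 0..N, with theta in (0, 1].

    theta = 1 gives the uniform grid; theta < 1 concentrates the points
    near the terminal time T, compensating the blow-up of Z there.
    """
    k = np.arange(N + 1)
    return T - T * (1.0 - k / N) ** (1.0 / theta)

grid = graded_grid(T=1.0, N=10, theta=0.5)
# grid starts at 0, ends at T, and the steps shrink towards T
```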
Actually, in [31] it is shown that the upper bounds in Theorem 23 can be refined for smooth data, to finally obtain that the main error comes from the strong approximation error on the forward component. This is an incentive to accurately approximate the SDE in the L 2-sense.
5.2.3 Towards the Resolution of the Dynamic Programming Equation
The effective implementation of the explicit scheme (67) requires the iterative computation of conditional expectations: this is discussed in the next paragraphs.
Prior to this, we make some preliminary simplifications for the sake of conciseness. First, we consider the case of g independent of z,
therefore we only approximate Y π; the general case is detailed in [39, 54]. Second, it is easily seen that it is enough to condition w.r.t. \(X_{t_{i}}^{\pi }\) instead of \(\mathcal{F}_{t_{i}}^{\pi }\), because of the Markov property of X π along the grid π and the independence of the Brownian increments. Thus, (67) becomes
The same arguments show that there exists a (measurable) deterministic function y i π such that
Therefore, simulating Y π is equivalent to the computation of the functions y i π for any i and the simulation of the process X π.
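In formulas, the z-free recursion (68) and the Markov representation (69) can be sketched as

$$\displaystyle{Y _{t_{i}}^{\pi } = \mathbb{E}\big(Y _{t_{i+1}}^{\pi } +\varDelta _{i}\,g(t_{i},X_{t_{i}}^{\pi },Y _{t_{i+1}}^{\pi })\ \big\vert \ X_{t_{i}}^{\pi }\big),\qquad Y _{t_{i}}^{\pi } = y_{i}^{\pi }(X_{t_{i}}^{\pi }).}$$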
5.3 Approximation of Conditional Expectations Using Least-Squares Method
5.3.1 Empirical Least-Squares Problem
We adopt the point of view of conditional expectation as a projection operator in L 2. This is not the only possible approach, but it has the following advantages (as will be seen later):
-
1.
To be very flexible w.r.t. the knowledge of the model for X (or X π): only independent simulations of X π are required (which are straightforward to perform).
-
2.
To be little demanding regarding the assumptions on the underlying stochastic model: in particular, no ellipticity or non-degeneracy condition is required; it could also include jumps (corresponding to PDEs with a non-local integro-differential operator).
-
3.
To provide robust theoretical error estimates, which allow one to tune the convergence parameters optimally.
-
4.
To be possibly adaptive to the data (data-driven scheme).
We recall that if a scalar random variable R (called the response) is square integrable, the conditional expectation of R given another possibly multidimensional r.v. O (called the observation) is given by
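Namely (a standard formulation), m is the best L 2-approximation of R by a function of O:

$$\displaystyle{\mathbb{E}(R\,\vert \,O) = m(O)\quad \text{where}\quad m = \mathop{\mathrm{arg}\min }\limits_{\varphi:\,\mathbb{E}\vert \varphi (O)\vert ^{2}<+\infty }\mathbb{E}\,\vert \varphi (O) - R\vert ^{2}.}$$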
This is a least-squares problem in infinite dimension, also called a regression problem. Usually, in this context of BSDE simulation, none of the distributions of O, R or (O, R) is known in an analytical and tractable form: thus an exact computation of \(\mathbb{E}(R\vert O)\) is hopeless. The difficulty remains unchanged if we approximate the regression function
on a finite-dimensional function basis. Alternatively, we can rely on independent simulations of (O, R) to compute an empirical version of m. This is the approach subsequently developed.
The basis functions are (ϕ k (. ))1 ≤ k ≤ K and we assume that \(\mathbb{E}\vert \phi _{k}(O)\vert ^{2} < +\infty \) for any k. We emphasize that the L 2-projection of m on this basis cannot be computed exactly, since in our setting the distribution of O is not explicit. Using this finite-dimensional approximation, we unfortunately anticipate the curse of dimensionality: the larger the dimension d of O, the larger the K required for a good accuracy on m, and the larger the complexity.
We compute the coefficients on the basis by solving an empirical least-squares problem
where \((O_{i},R_{i})_{1\leq i\leq M}\) are independent simulations of the couple (O, R). Then, for the approximation of m, we set
To efficiently compute the coefficients (α k M) k and to cope with instability issues, we might use an SVD decomposition, see [41].
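As a sketch (function and variable names are ours), the empirical coefficients and the resulting estimator of m can be computed with an SVD-based least-squares solver:

```python
import numpy as np

def empirical_regression(O, R, basis_funcs):
    """Empirical least-squares solution of
        alpha^M = argmin_alpha  (1/M) * sum_i | sum_k alpha_k phi_k(O_i) - R_i |^2,
    returning the estimator m_M(.) = sum_k alpha_k^M phi_k(.).

    np.linalg.lstsq is SVD-based, hence robust to (near-)colinearities
    within the basis, as advocated in the text.
    """
    A = np.column_stack([phi(O) for phi in basis_funcs])  # design matrix, M x K
    alpha, *_ = np.linalg.lstsq(A, R, rcond=None)         # empirical coefficients
    return lambda x: np.column_stack([phi(x) for phi in basis_funcs]) @ alpha

# usage sketch: regress R = O^2 + noise on the basis {1, x, x^2}
rng = np.random.default_rng(0)
O = rng.normal(size=1000)
R = O**2 + 0.1 * rng.normal(size=1000)
m_M = empirical_regression(O, R,
                           [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2])
# m_M(np.array([1.0]))[0] is close to 1.0
```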
5.3.2 Model-Free Error Estimates
Without extra assumptions on the model, we can derive model-free error estimates, see [42].
Theorem 24.
Assume that
-
\(R = m(O)+\epsilon\) with \(\mathbb{E}(\epsilon \vert O) = 0\).Footnote 21
-
(O 1 ,R 1 ),⋯ ,(O M ,R M ) are independent copies of (O,R).
-
\(\sigma ^{2} =\sup _{x}\mathbb{V}\mathrm{ar}(R\vert O = x) < +\infty \) .
-
Let K be a finite positive integer and Φ be the linear vector space spanned by some functions (ϕ 1 ,…ϕ K ), with dim(Φ) ≤ K. Footnote 22
Denote by μ M the empirical measure associated to (O 1 ,⋯ ,O M ), μ the probability measure of O and by \(\vert \phi \vert _{M}^{2} = \frac{1} {M}\sum _{i=1}^{M}\phi ^{2}(O_{ i})\) the empirical L 2 -norm of ϕ w.r.t. μ M , and set:
Then
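With \(\tilde{m}_{M}\) the empirical least-squares estimator \(\tilde{m}_{M} = \mathop{\mathrm{arg}\min }\limits_{\phi \in \varPhi } \frac{1}{M}\sum _{i=1}^{M}\vert \phi (O_{i}) - R_{i}\vert ^{2}\), the estimate takes the following standard form (cf. the proof below):

$$\displaystyle{\mathbb{E}\,\vert \tilde{m}_{M} - m\vert _{M}^{2}\ \leq \ \sigma ^{2}\,\frac{K}{M} + \mathbb{E}\Big(\min _{\phi \in \varPhi }\vert \phi - m\vert _{M}^{2}\Big).}$$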
The first term on the r.h.s. above is interpreted as a statistical error Footnote 23 term (due to the finite sample used to compute the empirical coefficients), while the second term is an approximation error of the function class Footnote 24 (due to the finite-dimensional vector space). The first term converges to 0 as M → +∞ but blows up if K → +∞, while the second one converges to 0 as K → +∞ (at least if Φ asymptotically spans all functions in L 2(μ)). This bias-variance decomposition shows that a trade-off between K and M is necessary to ensure a convergent approximation. Without the right balance, the approximation (70) may fail to converge. Furthermore, the parameters can also be tuned optimally.
In the quoted reference [42], the space Φ could also depend on the simulations (data-driven approximation spaces).
Proof.
Assume that
Then, the announced result directly follows by taking expectations and observing that
We now prove (71). Since all the computations below are made conditionally on O 1, ⋯ , O M , without loss of generality we can assume that \((\phi _{1},\ldots \phi _{K_{M}})\) is an orthonormal family in L 2(μ M), with possibly K M ≤ K:
Consequently, the solution \(\mathop{\mathrm{arg}\min }\limits_{\phi \in \varPhi } \frac{1} {M}\sum _{i=1}^{M}\vert \phi (O_{ i}) - R_{i}\vert ^{2}\) is given by
Now, set \(\mathbb{E}^{{\ast}}(.) = \mathbb{E}(.\vert O_{1},\cdots \,,O_{M})\). Then, observe that \(\mathbb{E}^{{\ast}}(\tilde{m}_{M}(.))\) is the least-squares solution to \(\mathop{\min }\limits_{\phi \in \varPhi } \frac{1} {M}\sum _{i=1}^{M}\vert \phi (O_{ i}) - m(O_{i})\vert ^{2} =\mathop{\min }\limits_{\phi \in \varPhi }\vert \phi - m\vert _{ M}^{2}\). Indeed:
-
On the one hand, the above least-squares solution is given by \(\sum _{j=1}^{K_{M}}\alpha _{j}^{{\ast}}\phi _{j}(.)\) with \(\alpha _{j}^{{\ast}} = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})m(O_{i})\).
-
On the other hand, \(\mathbb{E}^{{\ast}}(\tilde{m}_{M}(.)) =\sum _{ j=1}^{K_{M}}\mathbb{E}^{{\ast}}(\alpha _{j})\phi _{j}(.)\) and \(\mathbb{E}^{{\ast}}(\alpha _{j}) = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})\mathbb{E}^{{\ast}}(R_{ i}) = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})\mathbb{E}(m(O_{i}) +\epsilon _{i}\vert O_{1},\cdots \,,O_{M}) =\alpha _{ j}^{{\ast}}\).
Thus, by the Pythagoras theorem, we obtain
Since (ϕ j ) j is orthonormal in L 2(μ M ), we have \(\vert \tilde{m}_{M} - \mathbb{E}^{{\ast}}(\tilde{m}_{M})\vert _{M}^{2} =\sum _{ j=1}^{K_{M}}\vert \alpha _{j} - \mathbb{E}^{{\ast}}(\alpha _{j})\vert ^{2}.\) Since \(\alpha _{j} - \mathbb{E}^{{\ast}}(\alpha _{j}) = \frac{1} {M}\sum _{i=1}^{M}\phi _{ j}(O_{i})(R_{i} - m(O_{i}))\), we obtain
taking advantage of the fact that, conditionally on (O 1, ⋯O M ), the (ε i ) i are centered. This proves
The proof of (71) is complete. □
5.3.3 Least-Squares Method for Solving Discrete BSDE
We now apply the previous empirical least-squares method to numerically solve the DPE (68). For simplicity of exposition, we consider here only uniform time grids with N time steps, \(\varDelta _{i} = T/N\). In addition to the assumptions of Theorem 23, we assume that the terminal condition f(. ) is bounded: then, we can easily establish the following result.
Proposition 20.
Under these assumptions, the function y i π (.) defined in (69) is bounded by a constant C ⋆ , which is independent of N and i.
Actually, C ⋆ can be given explicitly in terms of the data. To enforce stability in the iterative computation of conditional expectations (68), we truncate the numerical solution at the level C ⋆ using the soft thresholding
Algorithm for Approximating y k π (⋅). At each time index 0 ≤ k ≤ N − 1, we consider a vector space Φ k spanned by basis functions p k (⋅ ), which are understood as vectors of K k functions. The final approximation of y k π(⋅ ) has the form
The coefficients α k M are computed with M independent simulations of \((X_{t_{k}}^{\pi })_{ k}\), denoted by \(\{(X_{t_{k}}^{\pi,m})_{ k}\}_{1\leq m\leq M}\): this single set of simulated paths is used to compute all the coefficients at once. This is done as follows:
-
Initialization: for k = N, take y N π(⋅ ) = f(⋅ ).
-
Iteration: for \(k = N - 1,\cdots \,,0\), solve the least-squares problem
$$\displaystyle{\alpha _{k}^{M} =\mathop{\mathrm{ arg}\min }\limits_{\alpha \in \mathbb{R}^{K_{k} }}\sum _{m=1}^{M}\vert y_{ k+1}^{\pi,M}(X_{ t_{k+1}}^{\pi,m})+\varDelta _{ k}g(t_{k},X_{t_{k}}^{\pi,m},y_{ k+1}^{\pi,M}(X_{ t_{k+1}}^{\pi,m}))-\alpha \cdot p_{ k}(X_{t_{k}}^{\pi,m})\vert ^{2}}$$and define \(y_{k}^{\pi,M}(\cdot ) = [\alpha _{k}^{M} \cdot p_{k}(\cdot )]_{C_{\star }}\).
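To make the algorithm concrete, here is a minimal one-dimensional sketch; all names, the choice X = W, the polynomial basis (instead of the hypercube basis), and the sanity check are ours and illustrative only:

```python
import numpy as np

def bsde_regression(f, g, T, N, M, deg, C_star, seed=0):
    """Least-squares Monte Carlo for the z-free dynamic programming equation
        y_N(.) = f(.),
        y_k(x) ~ E[ y_{k+1}(X_{k+1}) + dt * g(t_k, X_k, y_{k+1}(X_{k+1})) | X_k = x ],
    taking X = W (Brownian motion, X_0 = 0) for simplicity.  A global
    polynomial basis of degree `deg` replaces the hypercube basis of the
    text, and the regressed function is truncated at C_star for stability.
    Returns the approximation of Y_0.
    """
    rng = np.random.default_rng(seed)
    dt = T / N
    # single set of M simulated paths, used for all the regressions at once
    X = np.vstack([np.zeros(M),
                   np.cumsum(rng.normal(scale=np.sqrt(dt), size=(N, M)), axis=0)])
    Y = f(X[N])                                  # initialization: y_N = f
    for k in range(N - 1, -1, -1):               # backward in time
        response = Y + dt * g(k * dt, X[k], Y)   # explicit scheme, z-free driver
        A = np.vander(X[k], deg + 1)             # design matrix (polynomial basis)
        alpha, *_ = np.linalg.lstsq(A, response, rcond=None)
        Y = np.clip(A @ alpha, -C_star, C_star)  # truncation at level C_star
    return Y.mean()

# sanity check with g = 0 and f(x) = x^2: Y_0 = E[W_T^2] = T = 1
y0 = bsde_regression(f=lambda x: x**2, g=lambda t, x, y: 0.0 * y,
                     T=1.0, N=10, M=20000, deg=2, C_star=50.0)
# y0 is close to 1.0
```

With the basis {1, x, x²} the conditional expectations of the test case lie exactly in the span, so the only errors are of Monte Carlo type.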
Error Analysis. We now turn to the error estimates. The analysis combines BSDE techniques (a priori estimates using stochastic calculus) with regression tools like those exposed in Sect. 5.3.2, but there is a slight difference which actually requires a significant improvement of the arguments. Since we use a single set of independent paths, the “responses” \((y_{k+1}^{\pi,M}(X_{t_{k+1}}^{\pi,m}))_{ 1\leq m\leq M}\) are not independent, because of their dependence through the function y k+1 π, M. To overcome this interdependence in the proof, we replace the random function y k+1 π, M by a deterministic neighbor: of course, there is a complexity cost to cover the different function spaces in order to provide close neighbors, and the covering numbers are well controlled using the Vapnik–Chervonenkis dimension when the function spaces are bounded (Proposition 20). This is the technical reason why we consider bounded functions. We now state a result on the global error; see [30, Theorem VIII.3.4] for full details.
Theorem 25.
Under the previous notations and assumptions, there is a constant C > 0 (independent of N) such that we have
When the Z-component has to be approximated as well, the estimates are slightly modified, see [54] for details.
Parameter Tuning. We conclude this analysis by providing an example of how to choose the parameters N, K k and M appropriately. Assume that the value function y π is Lipschitz continuous, uniformly in N (which usually follows from a Lipschitz terminal condition). Our objective is to achieve a global error of order \(\varepsilon = \frac{1} {N}\) for \(\max _{0\leq k\leq N}\mathbb{E}\vert Y _{t_{k}}^{\pi } - y_{ k}^{\pi,M}(X_{ t_{k}}^{N})\vert ^{2}\), i.e. the same error magnitude as the time-discretization error.
For the vector spaces Φ k , we consider those generated by functions that are constant on disjoint hypercubes with small edges. Since X π has exponential moments, we can restrict the partitioning to a compact set of \(\mathbb{R}^{d}\) with size clog(N) in each direction, and the induced error is smaller than N −1 provided that c is large enough. If the edge of each hypercube is of order N −1, the vector spaces have dimension K k ∼ N d up to logarithmic factors: then, the terms from the approximation error of the function classes are O(N −2) and they sum up to give a contribution O(N −1) as required. A quick inspection of the upper bounds of Theorem 25 shows that the strongest constraint on M comes from the statistical error: we obtain M ∼ cN 3+d, up to logarithmic terms. The complexity of the scheme is of order NM (still neglecting the log terms), because the computation of all the regression coefficients at a given date has a computational cost O(Mlog(N)) due to our specific choice of function basis. Hence, the global complexity is
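namely, combining \(M \sim N^{3+d}\) with the N regression dates,

$$\displaystyle{\text{Complexity} \sim N \times M \sim N^{4+d}}$$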
up to logarithmic terms. Not surprisingly, the convergence order deteriorates as the dimension increases: this is the curse of dimensionality. Had the value function been smoother, we would have used local polynomials and the convergence order would have improved: the smoother the functions, the better the convergence rate.
In practice, the algorithm has been run on a computer up to dimension d = 10, with satisfactory results and rather short computational times (less than 1 min). There are several possible improvements of this basic version of the algorithm.
-
Instead of writing the DPE between t i and t i+1, it can be written between t i and T: this has the surprising effect (mathematically justified) of reducing the propagation of errors in the DPE. This scheme is called the MDP scheme (for Multi-step forward Dynamic Programming equation) and is studied in [39].
Complementary References. For theoretical aspects, see [16, 56, 59, 61]; for applications, see [17, 18]; for numerics, see [5, 7, 11, 14, 32, 40, 54, 69]. This list is not exhaustive.
Notes
- 1.
A Gaussian random variable X (see [46]) with mean μ and variance σ 2 > 0 (often denoted by \(\mathcal{N}(\mu,\sigma ^{2})\)) is the r.v. with density
$$\displaystyle{ g_{\mu,\sigma ^{2}}(x) = \frac{1} {\sigma \sqrt{2\pi }}\mathrm{exp}[-\frac{(x-\mu )^{2}} {2\sigma ^{2}} ],\quad x \in \mathbb{R}. }$$If σ 2 = 0, X = μ with probability 1. Moreover, for any \(u \in \mathbb{R}\), \(\mathbb{E}(e^{\mathit{uX}}) = e^{u\mu +\frac{1} {2} u^{2}\sigma ^{2}}\).
- 2.
Two random variables X 1 and X 2 are independent if and only if \(\mathbb{E}(f(X_{1})g(X_{2})) = \mathbb{E}(f(X_{1}))\mathbb{E}(g(X_{2}))\) for any bounded functions f and g. This extends similarly to a vector.
- 3.
\((X_{1},\ldots,X_{n})\) is a Gaussian vector if and only if for any \((\lambda _{i})_{1\leq i\leq n} \in \mathbb{R}^{n}\), \(\sum _{i=1}^{n}\lambda _{i}X_{i}\) has a Gaussian distribution. Independent Gaussian random variables form a Gaussian vector. A process (X t ) t is Gaussian if \((X_{t_{1}},\ldots,X_{t_{n}})\) is a Gaussian vector for any times \((t_{1},\ldots,t_{n})\) and any n. A Gaussian process is characterized by its mean \(m(t) = \mathbb{E}(X_{t})\) and its covariance function \(K(s,t) = \mathbb{C}\mathrm{ov}(X_{s},X_{t})\).
- 4.
We recall that “an event A occurs a.s.” (almost surely) if \(\mathbb{P}(\omega:\omega \in A) = 1\) or equivalently if {ω: ω∉A} is a set of zero probability measure.
- 5.
Here, we use the following standard result: let (X n ) n ≥ 1 be a sequence of random variables, each having the Gaussian distribution with mean μ n and variance σ n 2. If the distribution of X n converges, then (μ n , σ n 2) converge to (μ, σ 2), and the limit distribution is Gaussian with mean μ and variance σ 2. We recall that if X n converges a.s., then it also converges in distribution.
- 6.
This growth condition can be relaxed into \(\vert f(x)\vert \leq C\exp \left (\frac{\vert x\vert ^{2}} {2\alpha } \right )\) for any x, for some positive constants C and α: in that case, the smoothness of the function u is satisfied for t < α only.
- 7.
Here again, the boundedness could be relaxed to some exponential growth.
- 8.
Meaning that for a deterministic positive constant C, \(\mathbb{P}(U \leq C) = 1\).
- 9.
Actually, (19) proves that M is a martingale and the result to be proved is related to the optional sampling theorem.
- 10.
Indeed, the result gives uniqueness but not existence.
- 11.
- 12.
Indeed, we can show that \(\mathbb{E}(\sigma _{M}^{2}) = \mathbb{V}\mathrm{ar}(X)\).
- 13.
In fact, it generally depends on the regularity of u.
- 14.
This labeling comes from the infinitesimal decomposition of \(\mathbb{E}(f(X_{t}))\) as time is small, \(\partial _{t}\mathbb{E}(f(X_{t}))\vert _{t=0} = L_{b,\sigma ^{2}}^{\mathtt{ABM}}f(x)\), see Proposition 12.
- 15.
Integrability is the right assumption.
- 16.
That is the sum of \(\sum _{t_{i}\leq t}\vert A_{t_{i+1}} - A_{t_{i}}\vert \) exists and is finite, for instance A is continuously differentiable.
- 17.
Leading to the notion of strong solution; the case of non-smooth coefficients is much more delicate and related to weak solutions, see [67].
- 18.
Up to a set of zero probability measure.
- 19.
To simplify the exposition.
- 20.
Uniquely defined if y is close to the boundary.
- 21.
Meaning that \(m(O) = \mathbb{E}(R\vert O)\).
- 22.
There may be some colinearities within (ϕ j ) 1≤j≤K.
- 23.
Also called variance term.
- 24.
Squared bias term.
References
Achdou, Y., Pironneau, O.: Computational Methods for Option Pricing. SIAM Series. Frontiers in Applied Mathematics, Philadelphia (2005)
Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability, vol. 57. Springer, New York (2007)
Avikainen, R.: On irregular functionals of SDEs and the Euler scheme. Finance Stoch. 13, 381–401 (2009)
Bachelier, L.: Théorie de la spéculation. Ph.D. thesis, Ann. Sci. École Norm. Sup. (1900)
Bally, V., Pagès, G.: Error analysis of the optimal quantization algorithm for obstacle problems. Stoch. Process. Appl. 106(1), 1–40 (2003)
Bally, V., Talay, D.: The law of the Euler scheme for stochastic differential equations: I. Convergence rate of the distribution function. Probab. Theory Relat. Fields 104(1), 43–60 (1996)
Bender, C., Denk, R.: A forward scheme for backward SDEs. Stoch. Process. Their Appl. 117(12), 1793–1823 (2007)
Bender, C., Steiner, J.: Least-squares Monte Carlo for BSDEs. In: Carmona, R., Del Moral, P., Hu, P., Oudjane, N. (eds.) Numerical Methods in Finance. Series: Springer Proceedings in Mathematics, vol. 12. Springer, Berlin (2012)
Ben Zineb, T., Gobet, E.: Preliminary control variates to improve empirical regression methods. Monte Carlo Methods Appl. 19(4), 331–354 (2013)
Bossy, M., Gobet, E., Talay, D.: Symmetrized Euler scheme for an efficient approximation of reflected diffusions. J. Appl. Probab. 41(3), 877–889 (2004)
Bouchard, B., Touzi, N.: Discrete time approximation and Monte Carlo simulation of backward stochastic differential equations. Stoch. Process. Their Appl. 111, 175–206 (2004)
Breiman, L.: Probability. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1992) [Corrected reprint of the 1968 original]
Cessenat, M., Dautray, R., Ledanois, G., Lions, P.L., Pardoux, E., Sentis, R.: Méthodes probabilistes pour les équations de la physique. Collection CEA, Eyrolles (1989)
Crisan, D., Manolarakis, K.: Solving backward stochastic differential equations using the cubature method. Preprint (2010)
Durrett, R.: Brownian Motion and Martingales in Analysis. Wadsworth Mathematics Series. Wadsworth International Group, Belmont (1984)
El Karoui, N., Kapoudjian, C., Pardoux, E., Peng, S., Quenez, M.C.: Reflected solutions of backward SDE’s and related obstacle problems for PDE’s. Ann. Probab. 25(2), 702–737 (1997)
El Karoui, N., Peng, S.G., Quenez, M.C.: Backward stochastic differential equations in finance. Math. Financ. 7(1), 1–71 (1997)
El Karoui, N., Hamadène, S., Matoussi, A.: Backward stochastic differential equations and applications. In: Carmona, R. (ed.) Indifference Pricing: Theory and Applications, Chap. 8, pp. 267–320. Princeton University Press, Princeton (2008)
Föllmer, H.: Calcul d’Itô sans probabilités. In: Seminar on Probability, XV (University of Strasbourg, Strasbourg, 1979/1980) (French), pp. 143–150. Springer, Berlin (1981)
Freidlin, M.I.: Functional Integration and Partial Differential Equations. Annals of Mathematics Studies. Princeton University Press, Princeton (1985)
Friedman, A.: Partial Differential Equations of Parabolic Type. Prentice-Hall, Eglewood Cliffs (1964)
Friedman, A.: Stochastic Differential Equations and Applications, vol. 1. Academic, New York (1975) [A Subsidiary of Harcourt Brace Jovanovich, Publishers, XIII]
Friedman, A.: Stochastic Differential Equations and Applications, vol. 2. Academic, New York (1976) [A Subsidiary of Harcourt Brace Jovanovich, Publishers, XIII]
Gavalas, G.R.: Nonlinear Differential Equations of Chemically Reacting Systems. Springer Tracts in Natural Philosophy, vol. 17. Springer, New York (1968)
Geiss, C., Geiss, S., Gobet, E.: Generalized fractional smoothness and L p -variation of BSDEs with non-Lipschitz terminal condition. Stoch. Process. Their Appl. 122(5), 2078–2116 (2012)
Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2003)
Gobet, E.: Euler schemes for the weak approximation of killed diffusion. Stoch. Process. Their Appl. 87, 167–197 (2000)
Gobet, E.: Euler schemes and half-space approximation for the simulation of diffusions in a domain. ESAIM Probab. Stat. 5, 261–297 (2001)
Gobet, E.: Advanced Monte Carlo methods for barrier and related exotic options. In: Ciarlet, P.G., Bensoussan, A., Zhang, Q. (eds.) Handbook of Numerical Analysis, vol. XV. Special Volume: Mathematical Modeling and Numerical Methods in Finance, pp. 497–528. Elsevier/North-Holland, Netherlands (2009)
Gobet, E.: Méthodes de Monte-Carlo et processus stochastiques: du linéaire au non-linéaire. Editions de l’École Polytechnique, Palaiseau (2013)
Gobet, E., Labart, C.: Error expansion for the discretization of backward stochastic differential equations. Stoch. Process. Their Appl. 117(7), 803–829 (2007)
Gobet, E., Labart, C.: Solving BSDE with adaptive control variate. SIAM Numer. Anal. 48(1), 257–277 (2010)
Gobet, E., Lemor, J.P.: Numerical simulation of BSDEs using empirical regression methods: Theory and practice. In: Proceedings of the Fifth Colloquium on BSDEs, 29th May–1st June 2005, Shanghai. Available at http://hal.archives-ouvertes.fr/hal-00291199/fr/ (2006)
Gobet, E., Maire, S.: Sequential control variates for functionals of Markov processes. SIAM J. Numer. Anal. 43(3), 1256–1275 (2005)
Gobet, E., Maire, S.: Sequential Monte Carlo domain decomposition for the Poisson equation. In: Proceedings of the 17th IMACS World Congress, Scientific Computation, Applied Mathematics and Simulation, Paris, 11–15 July 2005
Gobet, E., Makhlouf, A.: L2-time regularity of BSDEs with irregular terminal functions. Stoch. Process. Their Appl. 120, 1105–1132 (2010)
Gobet, E., Menozzi, S.: Stopped diffusion processes: Boundary corrections and overshoot. Stoch. Process. Their Appl. 120, 130–162 (2010)
Gobet, E., Munos, R.: Sensitivity analysis using Itô-Malliavin calculus and martingales. Application to stochastic control problem. SIAM J. Control Optim. 43(5), 1676–1713 (2005)
Gobet, E., Turkedjiev, P.: Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions (2013) [In Revision for Mathematics of Computation]. Available at http://hal.archives-ouvertes.fr/hal-00642685
Gobet, E., Lemor, J.P., Warin, X.: A regression-based Monte Carlo method to solve backward stochastic differential equations. Ann. Appl. Probab. 15(3), 2172–2202 (2005)
Golub, G., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)
Gyorfi, L., Kohler, M., Krzyzak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer, New York (2002)
Hodgkin, A.L., Huxley, A.F., Katz, B.: Measurement of current-voltage relations in the membrane of the giant axon of Loligo. J. Physiol. 116, 424–448 (1952). Available at http://www.sfn.org/skins/main/pdf/HistoryofNeuroscience/hodgkin1.pdf
Itô, K.: On stochastic differential equations. Mem. Am. Math. Soc. (4), 1–51 (1951)
Itô, K., McKean, H.P.: Diffusion Processes and Their Sample Paths. Springer, Berlin (1965)
Jacod, J., Protter, P.: Probability Essentials, 2nd edn. Springer, Berlin (2003)
Jourdain, B., Lelong, J.: Robust adaptive importance sampling for normal random vectors. Ann. Appl. Probab. 19(5), 1687–1718 (2009)
Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus, 2nd edn. Springer, New York (1991)
Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin (1995)
Kobylanski, M.: Backward stochastic differential equations and partial differential equations with quadratic growth. Ann. Probab. 28(2), 558–602 (2000)
Kolmogorov, A.N., Petrovsky, I.G., Piskunov, N.S.: Etude de l’équation de la diffusion avec croissance de la quantité de matière et son application à un problème biologique. Bulletin Université d’État à Moscou, Série internationale A 1, pp. 1–26 (1937)
Kunita, H.: Stochastic differential equations and stochastic flows of diffeomorphisms. In: Ecole d’Eté de Probabilités de St-Flour XII, 1982. Lecture Notes in Mathematics, vol. 1097, pp. 144–305. Springer, Berlin (1984)
Lapeyre, B., Pardoux, E., Sentis, R.: Méthodes de Monte-Carlo pour les processus de transport et de diffusion. Collection Mathématiques et Applications, vol. 29. Springer, Berlin (1998)
Lemor, J.P., Gobet, E., Warin, X.: Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations. Bernoulli 12(5), 889–916 (2006)
Lévy, P.: Sur certains processus stochastiques homogènes. Compos. Math. 7, 283–339 (1939)
Ma, J., Yong, J.: Forward-Backward Stochastic Differential Equations. Lecture Notes in Mathematics, vol. 1702. Springer, Berlin (1999)
Nelson, E.: Dynamical Theories of Brownian Motion. Princeton University Press, Princeton (1967)
Newton, N.J.: Variance reduction for simulated diffusions. SIAM J. Appl. Math. 54(6), 1780–1805 (1994)
Pardoux, E.: Backward stochastic differential equations and viscosity solutions of systems of semilinear parabolic and elliptic PDEs of second order. In: Stochastic Analysis and Related Topics, VI (Geilo, 1996). Progress in Probability, vol. 42, pp. 79–127. Birkhäuser, Boston (1998)
Pardoux, E., Peng, S.G.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)
Pardoux, E., Peng, S.: Backward stochastic differential equations and quasilinear parabolic partial differential equations. In: Stochastic Partial Differential Equations and Their Applications. Proceedings of the IFIP International Conference, Charlotte/NC, 1991. Lecture Notes in Control and Information Sciences, vol. 176, pp. 200–217. Springer, Berlin (1992)
Peirano, É., Talay, D.: Domain decomposition by stochastic methods. In: Domain Decomposition Methods in Science and Engineering, pp. 131–147 (electronic). National Autonomous University of Mexico, México (2003)
Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion, 3rd edn. Comprehensive Studies in Mathematics. Springer, Berlin (1999)
Sabelfeld, K.K.: Monte Carlo Methods in Boundary Value Problems. Springer Series in Computational Physics. Springer, Berlin (1991) [translated from Russian]
Samuelson, P.A.: Proof that properly anticipated prices fluctuate randomly. Ind. Manag. Rev. 6, 42–49 (1965)
Shigesada, N., Kawasaki, K. (eds.): Biological Invasions: Theory and Practice. Oxford Series in Ecology and Evolution. Oxford University Press, Oxford (1997)
Stroock, D.W., Varadhan, S.R.S.: Multidimensional Diffusion Processes, 2nd edn. Comprehensive Studies in Mathematics. Springer, Berlin (1997)
Talay, D., Tubaro, L.: Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8(4), 94–120 (1990)
Zhang, J.: A numerical scheme for BSDEs. Ann. Appl. Probab. 14(1), 459–488 (2004)
Acknowledgements
The author’s research is partly supported by the Chair Financial Risks of the Risk Foundation and the Finance for Energy Market Research Centre.
© 2014 Springer International Publishing Switzerland
Gobet, E. (2014). Introduction to Stochastic Calculus and to the Resolution of PDEs Using Monte Carlo Simulations. In: Parés, C., Vázquez, C., Coquel, F. (eds) Advances in Numerical Simulation in Physics and Engineering. SEMA SIMAI Springer Series, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-319-02839-2_3
Print ISBN: 978-3-319-02838-5
Online ISBN: 978-3-319-02839-2