The finite-difference approach with equidistant grids is easy to understand and straightforward to implement. The resulting uniform rectangular grids are convenient, but in many applications not flexible enough. Steep gradients of the solution require a locally finer grid such that the difference quotients provide good approximations of the differentials. On the other hand, a flat gradient may be well modeled on a coarse grid. Arranging such flexibility of the grid with finite-difference methods is possible but cumbersome.

An alternative class of methods for solving PDEs that does provide high flexibility is that of finite-element methods (FEM). A “finite element” designates a mathematical object such as an interval together with a piece of a function defined on it. There are alternative names, such as variational methods, weighted residuals, or Ritz–Galerkin methods. These names hint at underlying principles that serve to derive suitable equations. As the different names suggest, there are several different approaches leading to finite elements. The methods are closely related.

The flexibility of finite-element methods is favorable not only for approximating functions, but also for approximating domains of computation that are not rectangular. This is important for multifactor options. For the one-dimensional situation of standard options, the possible improvement of a finite-element method over the standard methods of the previous chapter is not significant. With the focus on standard options, Chap. 5 may be skipped on first reading. But options with several underlyings may lead to domains of computation that are more “fancy.”

For example, a two-asset basket with portfolio value \(\alpha _{1}S_{1} +\alpha _{2}S_{2}\) in the case of a call option leads to a payoff of the type \(\varPsi (S_{1},S_{2}) = (\alpha _{1}S_{1} +\alpha _{2}S_{2} - K)^{+}\). If such an option is endowed with barriers, then it is reasonable to set up the barriers such that the payoff takes a constant value along them. For the two-asset basket, this amounts to barrier lines \(\alpha _{1}S_{1} +\alpha _{2}S_{2} =\text{ constant}\). This naturally leads to trapezoidal shapes of domains. For a special case with two knock-out barriers the payoff and the domain are illustrated by Fig. 5.1. This example will be considered in Sect. 5.4; see the domain in Fig. 5.5. In more complicated examples, the domain may be elliptic (Exercise 5.1). In such situations of non-rectangular domains, finite elements are ideally applicable and highly recommendable.

Fig. 5.1

Payoff Ψ(S 1, S 2) of a call on a two-asset basket, with knock-out barrier (Example 5.6)

Faced with the huge field of finite-element methods, in this chapter we confine ourselves to a step-by-step exposition towards the solution of two-asset options. We start with an overview of basic approaches and ideas (Sect. 5.1). Then, in Sect. 5.2, we describe the approximation with the simplest finite elements, namely piecewise straight-line segments, and apply this to a stationary model problem. These approaches will be applied to the time-dependent situation of pricing standard options in Sect. 5.3. This sets the stage for the main application of FEM in financial engineering, options on two or more assets. Section 5.4 will present an application to an exotic option with two underlyings. Here we derive a weak form of the PDE, and discuss boundary conditions. Finally, in Sect. 5.5, we give an introduction to error estimates. Methods more subtle than a mere Taylor expansion of the discretization error are required to show that quadratic convergence is possible with unstructured grids and nonsmooth solutions. To keep the exposition of an error analysis short, we concentrate on the one-dimensional situation. But the ideas extend to multidimensional scenarios.

5.1 Weighted Residuals

Many of the principles on which finite-element methods are based can be interpreted as weighted residuals. What does this mean? The heading points at the way a discretization can be set up, and how an approximation can be defined. There lies a duality in a discretization. This is illustrated by means of Fig. 5.2, which shows a partition of an x-axis. This discretization is represented either by

  (a) discrete grid points \(x_{i}\), or by

  (b) a set of subintervals.

Fig. 5.2
Discretization of a continuum

The two ways to see a discretization lead to different approaches of constructing an approximation w. Let us illustrate this with the one-dimensional situation of Fig. 5.3. An approximation w based on finite differences is built on the grid points and primarily consists of discrete points (Fig. 5.3a). In contrast, finite elements are founded on subdomains (intervals in Fig. 5.3b) with piecewise functions, which are defined by suitable criteria and constitute a global approximation w. In a narrower sense, a finite element is a pair consisting of one piece of subdomain and the corresponding function defined thereupon, mostly a polynomial. Figure 5.3 reflects the respective basic approaches; in a second step the isolated points of a finite-difference calculation can well be extended to continuous piecewise functions by means of interpolation ( Appendix C.1).

Fig. 5.3

Two kinds of approximations (one-dimensional situation)

A two-dimensional domain can be partitioned into triangles, for example, where w is again represented by piecewise polynomials. Figure 5.4 depicts the simplest such situation, namely, a triangle in an (x, y)-plane, and a piece of a linear function defined thereupon. Figure 5.5 below will provide an example how triangles easily fill a seemingly “irregular” domain.

Fig. 5.4

A simple finite element in two dimensions, based on a triangle

As will be shown next, the approaches of finite-element methods use integrals. If set up properly, the integrals require less smoothness of the functions involved. This often matches applications better and adds to the flexibility of finite-element methods. The integrals can be derived in a natural way from minimum principles, or can be constructed artificially. Finite elements based on polynomials make the calculation of the integrals easy.

5.1.1 The Principle of Weighted Residuals

To explain the principle of weighted residuals we discuss the formally simple case of the differential equation

$$\displaystyle{ Lu = f\,. }$$
(5.1)

Here L symbolizes a linear differential operator. Important examples are

$$\displaystyle\begin{array}{rcl} Lu:& =& -u''\ \text{ for }\ u(x),\ \text{ or }{}\end{array}$$
(5.2)
$$\displaystyle\begin{array}{rcl} Lu:& =& -u_{xx} - u_{yy}\ \text{ for }\ u(x,y)\,.{}\end{array}$$
(5.3)

The right-hand side f is a problem-dependent function. Solutions u of the differential equation (5.1) are studied on a domain \(\mathcal{D}\subseteq \mathbb{R}^{n}\), with n = 1 in (5.2) and n = 2 in (5.3). The piecewise approach starts with a partition of the domain into a finite number m of subdomains \(\mathcal{D}_{k}\),

$$\displaystyle{ \mathcal{D} =\bigcup _{ k=1}^{m}\mathcal{D}_{ k}\,. }$$
(5.4)

All boundaries of \(\mathcal{D}\) should be included, and approximations to u are calculated on the closure of \(\mathcal{D}\). The partition is assumed disjoint up to the boundaries of the \(\mathcal{D}_{k}\), so \(\mathcal{D}_{j}^{\circ }\cap \mathcal{D}_{k}^{\circ } =\emptyset\) for \(j\neq k\). In the one-dimensional case (n = 1), for example, the \(\mathcal{D}_{k}\) are subintervals of a whole interval \(\mathcal{D}\). In the two-dimensional case, (5.4) may describe a partition into triangles, as illustrated in Fig. 5.5.

Fig. 5.5

A simple regular finite-element discretization of a domain \(\mathcal{D}\) into triangles \(\mathcal{D}_{k}\) (see Example 5.6)

The ansatz for approximations w to a solution u is a basis representation with N basis functions \(\varphi _{i}\),

$$\displaystyle{ w:=\sum _{ i=1}^{N}c_{ i}\,\varphi _{i}\,. }$$
(5.5)

The functions \(\varphi _{i}\) are also called trial functions. In the case of one independent variable x, the \(c_{i} \in \mathbb{R}\) are constant coefficients, and the \(\varphi _{i}\) are functions of x. Typically, N is chosen and \(\varphi _{1},\ldots,\varphi _{N}\) are prescribed. Depending on this choice, the free parameters \(c_{1},\ldots,c_{N}\) are to be determined such that \(w \approx u\). The ansatz (5.5) was suggested by Ritz in 1908.

We have m subdomains and N basis functions. In the one-dimensional situation (n = 1), nodes and subintervals interlace, and m and N can essentially be identified. For n = 1 the two numbers m and N differ by at most one, depending on whether the solution is known or unknown at the end points of the interval \(\mathcal{D}\). In the latter case it is convenient to have the summation index in (5.5) run as \(i = 0,\ldots,m\). For dimensions n > 1 the number m of subdomains (e.g. triangles in case n = 2) in general differs from the number N of basis functions (nodes). For example, in Fig. 5.5 we have 75 triangles and 51 nodes; 26 of the nodes are interior nodes and 25 are placed along the boundary. That is, 1 ≤ k ≤ 75. The number N refers to the number of nodes for which a value of u is to be approximated.

One strategy to determine the coefficients c i is based on the residual function

$$\displaystyle{ R(w):= Lw - f\,. }$$
(5.6)

We look for a w such that the residual R becomes “small.” Since the \(\varphi _{i}\) are considered prescribed, in view of (5.5) N conditions or equations must be established to define and calculate the unknown \(c_{1},\ldots,c_{N}\). To this end we weight the residual R by introducing N weighting functions (test functions) \(\psi _{1},\ldots,\psi _{N}\) and require

$$\displaystyle{ \int _{\mathcal{D}}R(w)\,\psi _{j}\,\mathrm{d}\mathcal{D} = 0\quad \text{ for }j = 1,\ldots,N\,. }$$
(5.7)

This amounts to the requirement that the residual be orthogonal to the set of weighting functions \(\psi _{j}\). The “\(\mathrm{d}\mathcal{D}\)” in (5.7) symbolizes the integration that matches \(\mathcal{D}\subseteq \mathbb{R}^{n}\), such as dx for n = 1. For ease of notation, we frequently drop the dx as well as the \(\mathcal{D}\) at the n-dimensional integral. For the model problem (5.1) the system of Eqs. (5.7) consists of the N equations

$$\displaystyle{ \int _{\mathcal{D}}Lw\,\psi _{j} =\int _{\mathcal{D}}f\,\psi _{j}\quad (\,j = 1,\ldots,N) }$$
(5.8)

for the N unknowns \(c_{1},\ldots,c_{N}\), which define w. Often the equations in (5.8) are written using a formulation with inner products,

$$\displaystyle{(Lw,\psi _{j}) = (\,f,\psi _{j})\,,}$$

defined as the corresponding integrals in (5.8). For linear L the ansatz (5.5) implies

$$\displaystyle{\int Lw\,\psi _{j} =\int \left (\sum _{i}c_{i}L\varphi _{i}\right )\psi _{j} =\sum _{i}c_{i}\mathop{\underbrace{ \int L\varphi _{i}\psi _{j}}}\limits _{=:a_{ij}}\,.}$$

The integrals \(a_{ij}\) constitute a matrix A. The \(r_{j}:=\int f\psi _{j}\) set up the elements of a vector r, and the coefficients \(c_{j}\) a vector \(c = (c_{1},\ldots,c_{N})^{\mathrm{tr}}\). In vector notation the system of equations is rewritten as

$$\displaystyle{ Ac = r\,. }$$
(5.9)

This outlines the general principle, but leaves open the questions of how to handle boundary conditions and how to select basis functions \(\varphi _{i}\) and weighting functions \(\psi _{j}\). The freedom to choose trial functions \(\varphi _{i}\) and test functions \(\psi _{j}\) allows one to construct several different methods. For the time being suppose that these functions are smooth enough to be differentiated or integrated as required. We will enter a discussion of relevant function spaces in Sect. 5.5.
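To make the principle concrete, here is a minimal numerical sketch, not from the text: Galerkin’s approach for \(Lu = -u''\) on (0, 1) with u(0) = u(1) = 0, using the assumed global trial functions \(\varphi _{i}(x) =\sin (i\pi x)\) and the assumed test case f ≡ 1, whose exact solution is x(1 − x)∕2.

```python
import numpy as np

def trap(y, x):                          # simple trapezoidal quadrature
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

N = 20
x = np.linspace(0.0, 1.0, 2001)
f = np.ones_like(x)                      # assumed test case: f == 1

A = np.zeros((N, N))
r = np.zeros(N)
for j in range(1, N + 1):
    psi = np.sin(j * np.pi * x)          # Galerkin: psi_j = phi_j
    r[j - 1] = trap(f * psi, x)          # r_j = int f psi_j
    for i in range(1, N + 1):
        Lphi = (i * np.pi) ** 2 * np.sin(i * np.pi * x)   # L phi_i = -phi_i''
        A[j - 1, i - 1] = trap(Lphi * psi, x)             # a_ij = int L phi_i psi_j

c = np.linalg.solve(A, r)                # the system (5.9), A c = r
w = sum(c[i - 1] * np.sin(i * np.pi * x) for i in range(1, N + 1))
u_exact = 0.5 * x * (1.0 - x)            # exact solution of -u'' = 1
print(np.max(np.abs(w - u_exact)))       # small truncation/quadrature error
```

With this basis A is nearly diagonal because the sine functions are orthogonal; for local bases such as the hat functions of Sect. 5.2, A is sparse instead.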

5.1.2 Examples of Weighting Functions

We postpone the choice of basis functions \(\varphi _{i}\) and begin by listing important examples of how to select weighting functions ψ:

  1.)

    Galerkin’s choice: Choose \(\psi _{j}:=\varphi _{j}\) for all j. Then \(a_{ij} =\int L\varphi _{i}\,\varphi _{j}\).

  2.)

    Collocation: Choose \(\psi _{j}:=\delta (x - x_{j})\). Here δ denotes Dirac’s delta function, which in \(\mathbb{R}^{1}\) satisfies \(\int f\,\delta (x - x_{j})\,\mathrm{d}x = f(x_{j})\). As a consequence,

    $$\displaystyle{\int Lw\,\psi _{j} = Lw(x_{j})\,,\quad \int f\psi _{j} = f(x_{j})\,.}$$

    That is, a system of equations \(Lw(x_{j}) = f(x_{j})\) results, which amounts to evaluating the differential equation at selected points \(x_{j}\).

  3.)

    Least squares:

    Choose

    $$\displaystyle{\psi _{j}:={ \partial R \over \partial c_{j}}\,.}$$

    This choice of test functions deserves the name least squares: to minimize the squared residual \(\int _{\mathcal{D}}R^{2}\), with \(R = R(c_{1},\ldots,c_{N})\), the necessary condition is a vanishing gradient with respect to the \(c_{j}\), so

    $$\displaystyle{\int _{\mathcal{D}}R{\partial R \over \partial c_{j}} = 0\quad \text{ for all }j\,.}$$
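The collocation choice 2.) above can be sketched as follows, under assumptions not in the text: the polynomial basis \(\varphi _{i}(x) = x^{i}(1 - x)\) (each satisfying the boundary conditions u(0) = u(1) = 0), the model operator Lu = −u″, and f ≡ 1. Since the exact solution x(1 − x)∕2 lies in the span of this basis, collocation recovers it exactly.

```python
import numpy as np

# Collocation for -u'' = f, u(0) = u(1) = 0, with the assumed basis
# phi_i(x) = x^i (1 - x), i = 1, ..., N, and N collocation points x_j.
N = 3
xj = np.linspace(0.2, 0.8, N)               # collocation points (a choice)

def Lphi(i, x):                             # -phi_i'' for phi_i = x^i - x^(i+1)
    return -(i * (i - 1) * x ** (i - 2) - (i + 1) * i * x ** (i - 1))

# square system  L w(x_j) = f(x_j)  with f == 1:
M = np.array([[Lphi(i, xj[j]) for i in range(1, N + 1)] for j in range(N)])
c = np.linalg.solve(M, np.ones(N))

# exact solution x(1-x)/2 = 0.5 * phi_1 lies in the span, so c = (0.5, 0, 0):
print(c)
```

Note that no integration is needed here; the delta test functions reduce (5.7) to pointwise evaluation.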

5.1.3 Examples of Basis Functions

The construction of suitable basis functions \(\varphi _{i}\) observes the underlying partition into subdomains \(\mathcal{D}_{k}\). Our concern will be to meet two aims: the resulting methods must be accurate, and their implementation should be efficient.

Efficiency hinges on the sparsity of the matrices involved. In particular, if the matrix A of the linear system is sparse, then the system can be solved efficiently even when it is large. In order to achieve sparsity we require \(\varphi _{i} \equiv 0\) on most of the subdomains \(\mathcal{D}_{k}\). Figure 5.6 illustrates an example for the one-dimensional case n = 1. The hat function of Fig. 5.6 is the simplest example related to finite elements. It is piecewise linear, and each function \(\varphi _{i}\) has a support consisting of only two subintervals; \(\varphi _{i}(x)\neq 0\) only for x in its support. A consequence is

$$\displaystyle{ \int _{\mathcal{D}}\varphi _{i}\varphi _{j} = 0\ \text{ for }\vert i - j\vert> 1\,, }$$
(5.10)

as well as an analogous relation for \(\int \varphi _{i}^{{\prime}}\varphi _{j}^{{\prime}}\). We will discuss hat functions in the following Sect. 5.2. Basis functions more advanced than the canonical hat functions are constructed from piecewise polynomials of higher degree. In this way, basis functions with \(\mathcal{C}^{1}\)- or \(\mathcal{C}^{2}\)-smoothness can be obtained (Exercise 5.2). Recall from interpolation (Appendix C.1) that polynomials of degree three can lead to \(\mathcal{C}^{2}\)-smooth splines.

Fig. 5.6

“Hat function”: simple choice of finite elements

5.1.4 Smoothness

We have left open how close an approximation w of (5.5)/(5.9) is to the solution u of (5.1). Clearly, R(u) = 0 and u satisfies (5.7). But w in general does not solve (5.1). The differential equation (5.1) is a stronger requirement than the integral relations (5.7).

The accuracy depends on the smoothness of the basis functions. Depending on the chosen method, different kinds of smoothness are relevant. Let us illustrate this matter on the model problem (5.2),

$$\displaystyle{Lu = -u'',\quad \text{with }\ u,\varphi,\psi \in \{\, u\,\mid \,u(0) = u(1) = 0\,\}\,.}$$

Integration by parts formally implies

$$\displaystyle{\int _{0}^{1}\varphi ''\psi = -\int _{ 0}^{1}\varphi '\psi ' =\int _{ 0}^{1}\varphi \psi ''\,,}$$

because the boundary conditions u(0) = u(1) = 0 make the boundary terms vanish. These three versions of the integral can be distinguished by the smoothness requirements on φ and ψ, and by the question whether the integrals exist. One will choose the integral version that corresponds to the underlying method and to the smoothness of the solution. For example, for Galerkin’s approach the elements \(a_{ij}\) of A consist of the integrals

$$\displaystyle{-\int _{0}^{1}\varphi _{ i}^{{\prime\prime}}\varphi _{ j} =\int _{ 0}^{1}\varphi _{ i}^{{\prime}}\varphi _{ j}^{{\prime}}\,.}$$

We will return to the topics of accuracy, convergence, and function spaces in Sect. 5.5 (with Appendix C.3).

5.2 Ritz–Galerkin Method with One-Dimensional Hat Functions

As mentioned before, the required flexibility is provided by finite-element methods. This holds to an even larger extent in higher-dimensional spaces. In this section, for simplicity, we stick to the one-dimensional situation, \(x \in \mathbb{R}\). The dependence on the time variable t will be postponed to Sect. 5.3.

Assume a partition of the x-domain by a set of increasing mesh points \(x_{0},\ldots,x_{m}\). A nonuniform spacing is advisable in several instances in order to improve the accuracy. For example, close to the strike a denser grid is appropriate to mollify the lack of smoothness of the payoff. In contrast, to model infinity, the nodes are spread out for larger x and the final node \(x_{m}\) is shifted to a large value. One strategy is to select a spacing such that locally (up to additional scaling and shifts) \(x_{i} =\sinh (\eta _{i})\), where the \(\eta _{i}\) are chosen equidistantly. A dense spacing is also advisable for barrier options close to the barrier, where the gradient of the option price is high.
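The sinh-based spacing can be sketched as follows; the helper name `sinh_mesh`, the clustering point `x_star`, and the scale parameter `c` are illustrative assumptions, not from the text.

```python
import numpy as np

# Sketch of a sinh-concentrated mesh: equidistant eta_i are mapped through
# sinh so that nodes cluster near a chosen point x_star (e.g. the strike);
# the assumed parameter c controls how strong the clustering is.
def sinh_mesh(x_min, x_max, x_star, m, c=20.0):
    a = np.arcsinh((x_min - x_star) / c)
    b = np.arcsinh((x_max - x_star) / c)
    eta = np.linspace(a, b, m + 1)          # equidistant eta_i
    return x_star + c * np.sinh(eta)        # nonuniform x_i

x = sinh_mesh(0.0, 200.0, 100.0, 40)        # 41 nodes, dense near x = 100
h = np.diff(x)
print(h.min(), h.max())                     # smallest steps near x_star
```

The endpoints map back exactly, and the step sizes grow monotonically away from `x_star`, which rarefies the grid towards the truncated “infinity.”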

5.2.1 Hat Functions

The prototype of a finite-element method makes use of the hat functions, which we define formally (compare Figs. 5.6 and 5.7).

Fig. 5.7

Special “hat functions” φ 0 and φ m

Definition 5.1 (Hat Functions)

For \(1 \leq i \leq m - 1\) set \(\varphi _{i}(x):= 0\) on all subintervals except two:

$$\displaystyle\begin{array}{rcl} \varphi _{i}(x):& =&{ x - x_{i-1} \over x_{i} - x_{i-1}}\quad \text{ for }x_{i-1} \leq x <x_{i}\,, {}\\ \varphi _{i}(x):& =&{ x_{i+1} - x \over x_{i+1} - x_{i}}\quad \text{ for }x_{i} \leq x <x_{i+1}\,, {}\\ \end{array}$$

and boundary functions φ 0, φ m nonzero on just one subinterval:

$$\displaystyle\begin{array}{rcl} & & \varphi _{0}(x):={ x_{1} - x \over x_{1} - x_{0}}\quad \text{ for }x_{0} \leq x <x_{1}\,, {}\\ & & \varphi _{m}(x):={ x - x_{m-1} \over x_{m} - x_{m-1}}\quad \text{ for }x_{m-1} \leq x \leq x_{m}\,. {}\\ \end{array}$$

For each node x i there is one hat function. These m + 1 hat functions satisfy the following properties.
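A direct transcription of Definition 5.1; the grid and the checks below are illustrative. The hats interpolate, \(\varphi _{i}(x_{j}) =\delta _{ij}\), and they sum to one.

```python
import numpy as np

def hat(i, x, nodes):
    """Hat function phi_i of Definition 5.1 on an arbitrary grid."""
    x = np.asarray(x, dtype=float)
    m = len(nodes) - 1
    y = np.zeros_like(x)
    if i > 0:                                   # rising edge on [x_{i-1}, x_i)
        hi = (x <= nodes[i]) if i == m else (x < nodes[i])
        mask = (x >= nodes[i - 1]) & hi
        y[mask] = (x[mask] - nodes[i - 1]) / (nodes[i] - nodes[i - 1])
    if i < m:                                   # falling edge on [x_i, x_{i+1}]
        mask = (x >= nodes[i]) & (x <= nodes[i + 1])
        y[mask] = (nodes[i + 1] - x[mask]) / (nodes[i + 1] - nodes[i])
    return y

nodes = np.array([0.0, 0.3, 0.5, 1.0])          # a nonuniform example grid
xx = np.linspace(0.0, 1.0, 501)
total = sum(hat(i, xx, nodes) for i in range(len(nodes)))
print(np.allclose(total, 1.0))                  # partition of unity
print(hat(1, nodes, nodes))                     # [0. 1. 0. 0.]: phi_i(x_j) = delta_ij
```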

Properties 5.2 (Hat Functions)

The following properties (a)–(e) hold:

  (a)

    The \(\varphi _{0},\ldots,\varphi _{m}\) form a basis of the space of polygons

    $$\displaystyle\begin{array}{rcl} \{\,g \in \mathcal{C}^{0}[x_{ 0},x_{m}]\ \mid \ & & g\text{ straight line on }\mathcal{D}_{k}:= [x_{k},x_{k+1}]\,, {}\\ & & \text{for all }k = 0,\ldots,m - 1\,\}\;. {}\\ \end{array}$$

    That is to say, for each polygon v on the union of \(\mathcal{D}_{0},\ldots,\mathcal{D}_{m-1}\) there are unique coefficients \(c_{0},\ldots,c_{m}\) such that

    $$\displaystyle{v =\sum _{ i=0}^{m}c_{ i}\varphi _{i}\;.}$$
  (b)

    On any \(\mathcal{D}_{k}\) only \(\varphi _{k}\) and \(\varphi _{k+1}\) are nonzero. Hence

    $$\displaystyle{\varphi _{i}\varphi _{j} = 0\ \text{ for }\ \vert i - j\vert> 1\;,}$$

    which explains (5.10).

  (c)

    A simple approximation of the integral \(\int _{x_{0}}^{x_{m}}f\varphi _{j}\,\mathrm{d}x\) can be calculated as follows: Substitute f by the interpolating polygon

    $$\displaystyle{f_{\mathrm{p}}:=\sum _{ i=0}^{m}f_{ i}\varphi _{i}\quad \text{, where}\quad f_{i}:= f(x_{i})\;,}$$

    and obtain for each j the approximating integral

    $$\displaystyle{I_{j}:=\int _{ x_{0}}^{x_{m} }f_{\mathrm{p}}\varphi _{j}\,\mathrm{d}x =\int _{ x_{0}}^{x_{m} }\sum _{i=0}^{m}f_{ i}\varphi _{i}\varphi _{j}\,\mathrm{d}x =\sum _{ i=0}^{m}f_{ i}\mathop{\underbrace{ \int _{x_{0}}^{x_{m}}\varphi _{ i}\varphi _{j}\,\mathrm{d}x}}\limits _{=:b_{ij}}\,.}$$

    The \(b_{ij}\) constitute a symmetric matrix B, and the \(f_{i}\) a vector \(\bar{f}\). If we arrange all integrals \(I_{j}\) (\(0 \leq j \leq m\)) into a vector, then all integrals can be written in a compact way in vector notation as

    $$\displaystyle{B\bar{f}\;.}$$

    This will approximate the vector r in (5.9).

  (d)

    The “large” \((m + 1)\times (m+1)\)-matrix \(B:= (b_{ij})\) can be set up \(\mathcal{D}_{k}\)-elementwise by (2 × 2)-matrices (discussed below in Sect. 5.2.2). The (2 × 2)-matrices contain those integrals that integrate only over a single subdomain \(\mathcal{D}_{k}\). For each \(\mathcal{D}_{k}\) in our one-dimensional setting exactly the four integrals \(\int \varphi _{i}\varphi _{j}\,\mathrm{d}x\) for \(i,j \in \{ k,k + 1\}\) are nonzero. They can be arranged into a (2 × 2)-matrix

    $$\displaystyle{\int _{x_{k}}^{x_{k+1} }\left (\begin{array}{*{10}c} \varphi _{k}^{2} & \varphi _{k}\varphi _{k+1} \\ \varphi _{k+1}\varphi _{k}& \varphi _{k+1}^{2}\\ \end{array} \right )\,\mathrm{d}x\,.}$$

    (The integral over a matrix is understood elementwise.) These are the integrals on \(\mathcal{D}_{k}\), where the integrand is a product of the factors

    $$\displaystyle{{ x_{k+1} - x \over x_{k+1} - x_{k}}\ \text{ and }\ { x - x_{k} \over x_{k+1} - x_{k}}\,.}$$

    The four numbers

    $$\displaystyle{{ 1 \over (x_{k+1} - x_{k})^{2}}\int _{x_{k}}^{x_{k+1} }\left (\begin{array}{*{10}c} (x_{k+1} - x)^{2} & (x_{k+1} - x)(x - x_{k}) \\ (x - x_{k})(x_{k+1} - x)& (x - x_{k})^{2}\\ \end{array} \right )\,\mathrm{d}x}$$

    result. With \(h_{k}:= x_{k+1} - x_{k}\), integration yields the element-mass matrix (Exercise 5.3)

    $$\displaystyle{{1 \over 6}h_{k}\left (\begin{array}{*{10}c} 2&1\\ 1 &2\\ \end{array} \right )\,.}$$
  (e)

    Analogously, integrating \(\varphi _{i}^{{\prime}}\varphi _{j}^{{\prime}}\) yields

    $$\displaystyle\begin{array}{rcl} & & \int _{x_{k}}^{x_{k+1} }\left (\begin{array}{*{10}c} \varphi _{k}^{{\prime}2} & \varphi _{k}^{{\prime}}\varphi _{k+1}^{{\prime}} \\ \varphi _{k+1}^{{\prime}}\varphi _{k}^{{\prime}}& \varphi _{k+1}^{{\prime}2}\\ \end{array} \right )\,\mathrm{d}x {}\\ & & ={ 1 \over h_{k}^{2}}\int _{x_{k}}^{x_{k+1} }\left (\begin{array}{*{10}c} (-1)^{2} & (-1)1 \\ 1(-1)& 1^{2}\\ \end{array} \right )\,\mathrm{d}x ={ 1 \over h_{k}}\left (\begin{array}{*{10}c} 1 &-1\\ -1 & 1\\ \end{array} \right )\,. {}\\ & {}\\ \end{array}$$

    These matrices are called element-stiffness matrices. They are used to set up the matrix A.
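The closed-form element matrices of (d) and (e) can be checked by direct quadrature on one subinterval; the chosen interval [1.0, 1.7] is an arbitrary test case.

```python
import numpy as np

def trap(y, x):                     # simple trapezoidal quadrature
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

xk, xk1 = 1.0, 1.7                  # one subinterval D_k (arbitrary choice)
hk = xk1 - xk
x = np.linspace(xk, xk1, 20001)

up = (xk1 - x) / hk                 # phi_k restricted to D_k
down = (x - xk) / hk                # phi_{k+1} restricted to D_k

# element-mass matrix: integrals of products of the two linear factors
mass = np.array([[trap(up * up, x),   trap(up * down, x)],
                 [trap(down * up, x), trap(down * down, x)]])
print(np.allclose(mass, hk / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]]),
                  atol=1e-8))       # matches (h_k/6) [[2,1],[1,2]]

# element-stiffness matrix: the derivatives are the constants -1/hk, +1/hk
d = np.array([-1.0 / hk, 1.0 / hk])
stiff = hk * np.outer(d, d)         # integrating a constant over D_k
print(np.allclose(stiff, (1.0 / hk) * np.array([[1.0, -1.0], [-1.0, 1.0]])))
```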

5.2.2 Assembling

The next step is to assemble the matrices A and B. It might be tempting to organize this task as follows: run a double loop over all basis indices i, j (N node indices) and check for each pair (i, j) on which \(\mathcal{D}_{k}\) the integral

$$\displaystyle{\int _{\mathcal{D}_{k}}\varphi _{i}\varphi _{j}}$$

is nonzero. Such a double loop has a complexity of \(O(N^{2}m)\). This is cumbersome compared to the alternative of running a single loop over the subdomain index k and benefiting from all relevant integrals on \(\mathcal{D}_{k}\), which were precalculated above (Fig. 5.8).

Fig. 5.8

Assembling in the one-dimensional setting

To this end, split the integrals

$$\displaystyle{\int _{x_{0}}^{x_{m} } =\sum _{ k=0}^{m-1}\int _{ \mathcal{D}_{k}}}$$

to construct the (m + 1) × (m + 1)-matrices A = (a ij ) and B = (b ij ) additively out of the small element matrices. For the case of the one-dimensional hat functions with subintervals

$$\displaystyle{\mathcal{D}_{k} =\{\, x\ \mid \ x_{k} \leq x \leq x_{k+1}\,\}}$$

the element matrices are (2 × 2), see above. In this case only those integrals of \(\varphi _{i}\varphi _{j}\) and \(\varphi _{i}^{{\prime}}\varphi _{j}^{{\prime}}\) are nonzero for which \(i,j \in \mathcal{I}_{k}\), where

$$\displaystyle{ i,j \in \mathcal{I}_{k}:=\{ k,k + 1\}\,. }$$
(5.11)

\(\mathcal{I}_{k}\) is the set of indices of those products of basis functions that are nonzero on \(\mathcal{D}_{k}\). The assembling algorithm performs a loop over the subdomain index \(k = 0,1,\ldots,m - 1\) and distributes the (2 × 2)-element matrices additively to the positions \(i,j \in \mathcal{I}_{k}\). Before the assembling is started, the matrices A and B must be initialized with zeros. For \(k = 0,\ldots,m - 1\) one obtains for A the \((m + 1)\times (m+1)\)-matrix

$$\displaystyle{ A = \left (\begin{array}{*{10}c} { 1 \over h_{0}} & -{ 1 \over h_{0}} & & & \\ -{ 1 \over h_{0}} &{ 1 \over h_{0}} +{ 1 \over h_{1}} & -{ 1 \over h_{1}} & & \\ & -{ 1 \over h_{1}} &{ 1 \over h_{1}} +{ 1 \over h_{2}} & -{ 1 \over h_{2}} & \\ & & -{ 1 \over h_{2}} & \ddots & \ddots \\ & & & \ddots &\\ \end{array} \right )\,. }$$
(5.12)

The matrix B is assembled in an analogous way. In the one-dimensional situation the matrices are tridiagonal. For an equidistant grid with h = h k the matrix A specializes to

$$\displaystyle{ A ={ 1 \over h}\left (\begin{array}{*{10}c} 1 &-1& & & & 0\\ -1 & 2 &-1 & & & \\ &-1& 2 &\ddots&\\ & & \ddots &\ddots& \ddots\\ & & &\ddots & 2 &-1 \\ 0 & & & &-1& 1\\ \end{array} \right ) }$$
(5.13)

and B to

$$\displaystyle{ B ={ h \over 6}\left (\begin{array}{*{10}c} 2&1& & & &0\\ 1 &4 &1 & & & \\ &1&4&\ddots&\\ & & \ddots &\ddots& \ddots\\ & & &\ddots &4&1 \\ 0& & & &1&2\\ \end{array} \right )\,. }$$
(5.14)
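The assembling loop described above can be sketched as follows; `assemble` is a hypothetical helper name. For an equidistant grid the result reproduces the patterns (5.13)/(5.14).

```python
import numpy as np

# Assemble A and B by a single loop over the subintervals D_k, adding each
# (2x2) element matrix at the index set I_k = {k, k+1}; cf. (5.11)-(5.14).
def assemble(nodes):
    m = len(nodes) - 1
    A = np.zeros((m + 1, m + 1))             # initialize with zeros
    B = np.zeros((m + 1, m + 1))
    for k in range(m):                       # one loop over subdomains
        h = nodes[k + 1] - nodes[k]
        Ak = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])   # stiffness
        Bk = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])     # mass
        idx = np.ix_([k, k + 1], [k, k + 1])                    # positions I_k
        A[idx] += Ak
        B[idx] += Bk
    return A, B

# equidistant check against (5.13)/(5.14), with m = 4 and h = 0.25:
A, B = assemble(np.linspace(0.0, 1.0, 5))
print(A)       # tridiagonal, rows (1,-1), (-1,2,-1), ..., scaled by 1/h
print(B)       # tridiagonal, rows (2,1), (1,4,1), ..., scaled by h/6
```

The same routine handles a nonuniform grid unchanged, since each element matrix carries its own \(h_{k}\).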

5.2.3 A Simple Application

In order to demonstrate the procedure, let us consider the simple time-independent (“stationary”) model boundary-value problem

$$\displaystyle{ Lu:= -u'' = f\ \text{ with }\ u(x_{0}) = u(x_{m}) = 0\,. }$$
(5.15)

Substituting \(w:=\sum _{i=0}^{m}c_{i}\varphi _{i}\) into the differential equation, in view of (5.8), leads to

$$\displaystyle{\sum _{i=0}^{m}c_{ i}\int _{x_{0}}^{x_{m} }L\varphi _{i}\,\varphi _{j}\;\mathrm{d}x =\int _{ x_{0}}^{x_{m} }f\varphi _{j}\;\mathrm{d}x\,.}$$

This is the result of the Ritz–Galerkin approach. Next we apply integration by parts on the left-hand side, and invoke Property 5.2(c) on the right-hand side. The resulting system of equations is

$$\displaystyle{ \sum _{i=0}^{m}c_{ i}\mathop{\underbrace{ \int _{x_{0}}^{x_{m}}\varphi _{ i}^{{\prime}}\varphi _{ j}^{{\prime}}\;\mathrm{d}x}}\limits _{ a_{ij}} =\sum _{ i=0}^{m}f_{ i}\mathop{\underbrace{ \int _{x_{0}}^{x_{m}}\varphi _{ i}\varphi _{j}\;\mathrm{d}x}}\limits _{b_{ij}},\quad j = 0,1,\ldots,m\,. }$$
(5.16)

This system is preliminary because the homogeneous boundary conditions u(x 0) = u(x m ) = 0 are not yet taken into account.

At this stage, the preliminary system of Eqs. (5.16) can be written as

$$\displaystyle{ Ac = B\bar{f}\,. }$$
(5.17)

It is easy to see that the matrix A from (5.13) is singular, because

$$\displaystyle{A\,(1,1,\ldots,1)^{\mathrm{tr}} = 0\,.}$$

The singularity reflects the fact that the system (5.17) does not have a unique solution. This is consistent with the differential equation − u″ = f(x): if u(x) is a solution, then so is u(x) + α for arbitrary α. Unique solvability is attained by satisfying the boundary conditions; a solution u of − u″ = f must be fixed by at least one essential boundary condition. For our example (5.15) we know in view of \(u(x_{0}) = u(x_{m}) = 0\) the coefficients \(c_{0} = c_{m} = 0\). This information can be inserted into the system of equations in such a way that the matrix A changes to a nonsingular matrix without losing symmetry. To this end, cancel the first and the last of the m + 1 equations in (5.17), and make use of \(c_{0} = c_{m} = 0\). Then the inner part of size (m − 1) × (m − 1) of A remains. The matrix B shrinks to the rectangular size (m − 1) × (m + 1). Finally, for the special case of an equidistant grid, the system of equations is

$$\displaystyle{ \begin{array}{rcl} &&\left (\begin{array}{*{10}c} 2 &-1&& & 0\\ -1 & 2 &\ddots & & \\ & \ddots &\ddots& \ddots &\\ & &\ddots & 2&-1 \\ 0 & &&-1& 2\\ \end{array} \right )\left (\begin{array}{*{10}c} c_{1} \\ c_{2}\\ \vdots \\ c_{m-2} \\ c_{m-1}\\ \end{array} \right ) = \\ &&{h^{2} \over 6} \left (\begin{array}{*{10}c} 1&4&1& & & &0\\ &1 &4 &1 & & &\\ & & \ddots & \ddots & \ddots & & \\ & & &1&4&1&\\ 0 & & & &1 &4&1\\ \end{array} \right )\left (\begin{array}{*{10}c} \bar{f}_{0} \\ \bar{f}_{1}\\ \vdots \\ \bar{f}_{m-1} \\ \bar{f}_{m}\\ \end{array} \right )\,. \end{array} }$$
(5.18)

In (5.18) we have used an equidistant grid for the sake of a lucid exposition. Our main focus is the nonequidistant version, which is implemented just as easily. In case nonhomogeneous boundary conditions are prescribed, appropriate values of \(c_{0}\) or \(c_{m}\) are predefined. The importance of finite-element methods in structural engineering has led to calling the global matrix A the stiffness matrix; B is called the mass matrix.
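Putting the pieces together, here is a sketch of solving (5.15) via the system (5.18) for the assumed right-hand side f ≡ 1 (exact solution x(1 − x)∕2). For this particular f the nodal values of the finite-element solution turn out to be exact up to roundoff.

```python
import numpy as np

m = 50
h = 1.0 / m
nodes = np.linspace(0.0, 1.0, m + 1)

# inner (m-1) x (m-1) part of A after inserting c_0 = c_m = 0, cf. (5.18):
A_in = (1.0 / h) * (2.0 * np.eye(m - 1)
                    - np.eye(m - 1, k=1) - np.eye(m - 1, k=-1))

# full mass matrix B, assembled from the (2x2) element matrices:
B = np.zeros((m + 1, m + 1))
for k in range(m):
    B[np.ix_([k, k + 1], [k, k + 1])] += (h / 6.0) * np.array([[2.0, 1.0],
                                                               [1.0, 2.0]])

fbar = np.ones(m + 1)                    # assumed test case: f == 1 at the nodes
rhs = (B @ fbar)[1:m]                    # keep the equations j = 1, ..., m-1

c = np.linalg.solve(A_in, rhs)           # coefficients c_1, ..., c_{m-1}
u_exact = 0.5 * nodes[1:m] * (1.0 - nodes[1:m])
print(np.max(np.abs(c - u_exact)))       # exact at the nodes up to roundoff
```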

5.3 Application to Standard Options

Finite elements are especially advantageous in higher-dimensional spaces (several underlyings), but the approach also works for the one-dimensional case of standard options. This is the theme of this section. In contrast to the previous section, time must now be included.

5.3.1 European Options

We know that the valuation of single-asset European options with vanilla payoff can make use of the Black–Scholes formula. But for the sake of exposition, and for non-vanilla payoffs, let us briefly sketch a finite-element approach. Here we apply the FEM approach to the transformed version \(y_{\tau } = y_{xx}\) of the Black–Scholes equation with constant parameters. In view of the general basis representation in (5.5) one may think of starting from \(w =\sum w_{i}\varphi _{i}(x,\tau )\) with constant coefficients \(w_{i}\). This would require two-dimensional basis functions. (We shall come back to such functions in Sect. 5.4.) To make use of one-dimensional hat functions, apply a separation ansatz in the form \(\sum w_{i}(\tau )\varphi _{i}(x)\) with functions \(w_{i}(\tau )\). As a consequence of this simple approach, the same x-grid is applied for all τ, which results in a rectangular grid in the (x, τ)-plane. Dirichlet boundary conditions

$$\displaystyle{y(x_{\mathrm{min}},\tau ) =\alpha (\tau ),\ y(x_{\mathrm{max}},\tau ) =\beta (\tau )}$$

mean that in view of the shape of \(\varphi _{0},\varphi _{m}\) (Definition 5.1, Fig. 5.7) the values \(w_{0} =\alpha\) or \(w_{m} =\beta\) would be known. It is practical to separate the known terms and restrict the sum to the terms with unknown weights \(w_{i}\). This can be managed by introducing a special function \(\varphi _{\mathrm{b}}\) that compensates for the Dirichlet boundary conditions on y. The function \(\varphi _{\mathrm{b}}(x,\tau )\) is not a basis function, and is constructed in advance. For example,

$$\displaystyle{\varphi _{\mathrm{b}}(x,\tau ):= (\beta (\tau ) -\alpha (\tau ))\,{ x - x_{\mathrm{min}} \over x_{\mathrm{max}} - x_{\mathrm{min}}} +\alpha (\tau )}$$

does the job for the above boundary conditions. So \(\varphi _{\mathrm{b}}\) can be considered known, and the sum \(\sum w_{i}\varphi _{i}\) need not reflect any nonzero Dirichlet boundary conditions on y. Then the final ansatz is

$$\displaystyle{ \sum \limits _{i}w_{i}(\tau )\varphi _{i}(x) +\varphi _{\mathrm{b}}(x,\tau )\,, }$$
(5.19)

and the index i counts those nodes \(x_{i}\) for which no boundary conditions of the above type are prescribed; in case two Dirichlet boundary conditions are given, \(1 \leq i \leq m - 1\). The basis functions \(\varphi _{1},\ldots,\varphi _{N}\) are chosen to be the hat functions, which incorporate the discretization of the x-axis. Hence N = m − 1, \(x_{0}\) corresponds to \(x_{\mathrm{min}}\), and \(x_{m}\) to \(x_{\mathrm{max}}\). The functions \(w_{1},\ldots,w_{m-1}\) are unknown, and \(w_{0} = w_{m} = 0\).

Calculating derivatives of (5.19) and substituting into y τ = y xx leads to the Ritz–Galerkin approach

$$\displaystyle{\int \limits _{x_{0}}^{x_{m} }\left [\sum \limits _{i=1}^{m-1}\dot{w}_{ i}\varphi _{i} +\dot{\varphi } _{\mathrm{b}}\right ]\varphi _{j}\,\mathrm{d}x =\int \limits _{ x_{0}}^{x_{m} }\left [\sum \limits _{i=1}^{m-1}w_{ i}\varphi _{i}^{{\prime\prime}} +\varphi _{ \mathrm{ b}}^{{\prime\prime}}\right ]\varphi _{ j}\,\mathrm{d}x}$$

for \(j = 1,\ldots,m - 1\). The overdot represents differentiation with respect to τ, and the prime differentiation with respect to x. Arranging the terms that involve derivatives of \(\varphi _{\mathrm{b}}\) into vectors a(τ), b(τ),

$$\displaystyle{a(\tau ):= \left (\begin{array}{*{10}c} \int \varphi _{\mathrm{b}}^{{\prime\prime}}(x,\tau )\,\varphi _{1}(x)\,\mathrm{d}x\\ \vdots \\ \int \varphi _{\mathrm{b}}^{{\prime\prime}}(x,\tau )\,\varphi _{m-1}(x)\,\mathrm{d}x\\ \end{array} \right )\,,\quad b(\tau ):= \left (\begin{array}{*{10}c} \int \dot{\varphi }_{\mathrm{b}}(x,\tau )\,\varphi _{1}(x)\,\mathrm{d}x\\ \vdots \\ \int \dot{\varphi }_{\mathrm{b}}(x,\tau )\,\varphi _{m-1}(x)\,\mathrm{d}x\\ \end{array} \right )\,,}$$

and using the matrices A, B as in (5.13)/(5.14), we arrive after integration by parts at

$$\displaystyle{ B\dot{w} + b = -Aw - a\,. }$$
(5.20)

Note that for the specific \(\varphi _{\mathrm{b}}\) from above, \(\varphi _{\mathrm{b}}'' = 0\) and a = 0. For vanilla options, α and β can be drawn from (4.28), and b can be set up analytically; a and b can be considered as known. This completes the semidiscretization. Time τ is still continuous, and (5.20) defines the unknown vector function \(w(\tau ):= (w_{1}(\tau ),\ldots,w_{m-1}(\tau ))^{\mathrm{tr}}\) as the solution of a system of ordinary differential equations. This is a method-of-lines approach. The lines are defined by \(x = x_{i}\) for \(1 \leq i \leq m - 1\), and the approximations along the lines are given by \(w_{i}(\tau )\). Initial conditions for τ = 0 are derived from (5.19). Assume the initial condition from the payoff as y(x, 0) = γ(x); then

$$\displaystyle{\sum _{i=1}^{N}w_{ i}(0)\varphi _{i}(x) +\varphi _{\mathrm{b}}(x,0) =\gamma (x)\,.}$$

For vanilla payoff, γ is given by (4.5)/(4.6). Specifically for \(x = x_{j}\) the sum reduces to \(w_{j}(0)\cdot 1\), leading to

$$\displaystyle{w_{j}(0) =\gamma (x_{j}) -\varphi _{\mathrm{b}}(x_{j},0)\,.}$$

To complete the discretization, time τ must be discretized as well. Standard software for ODEs can be applied to (5.20), in particular codes for stiff systems. For discretization with difference quotients, consult Sect. 4.2.1. For example, apply the ODE trapezoidal rule as in (4.20) to \(\dot{w}\) in (5.20). We leave the derivation of the resulting Crank–Nicolson-type discretization as an exercise to the reader. With the usual notation of the vector w^(ν) approximating w(τ_ν), the result can be written

$$\displaystyle{ \begin{array}{rcl} (B +{ \varDelta \tau \over 2}A)\,w^{(\nu +1)} & =&(B -{ \varDelta \tau \over 2}A)\,w^{(\nu )} \\ & & -{ \varDelta \tau \over 2}\,(a^{(\nu )} + a^{(\nu +1)} + b^{(\nu )} + b^{(\nu +1)})\,. \end{array} }$$
(5.21)

The structure of (5.21) strongly resembles that of the finite-difference approach (4.24). This similarity suggests that the order of convergence is the same, because for the finite-element A and B we have (compare (5.13)/(5.14))

$$\displaystyle{A = O\left ({1 \over \varDelta x}\right )\,\,,\quad B = O(\varDelta x)\,.}$$

The separation of the variables x and τ in (5.19) allows us to investigate the orders of the discretizations separately. In Δτ, the order O(Δτ²) of the Crank–Nicolson-type approach (5.21) is clear from the ODE trapezoidal rule. It remains to derive the order of convergence with respect to the discretization in x. Because of the separation of variables, it suffices to derive the convergence for a one-dimensional model problem. This will be done in Sect. 5.5.

5.3.2 Variational Form of the Obstacle Problem

To warm up for the discussion of the American option case, let us return to the simple obstacle problem of Sect. 4.5.5 with the obstacle function g(x), or g(x, τ). This problem can be formulated as a variational inequality. The function u solving the obstacle problem can be characterized by comparing it to functions v out of a set \(\mathcal{K}\) of competing functions

$$\displaystyle\begin{array}{rcl} \mathcal{K}:=\{& & v \in \mathcal{C}^{0}[-1,1]\mid \ v(-1) = v(1) = 0\,, {}\\ & & v(x) \geq g(x)\ \text{ for } - 1 \leq x \leq 1,\ v\text{ piecewise } \in \mathcal{C}^{1}\,\}\,. {}\\ \end{array}$$

The requirements on u imply \(u \in \mathcal{K}\). For \(v \in \mathcal{K}\) we have vg ≥ 0 and in view of − u″ ≥ 0 also − u″(vg) ≥ 0. Hence for all \(v \in \mathcal{K}\) the inequality

$$\displaystyle{\int _{-1}^{1} - u''(v - g)\,\mathrm{d}x \geq 0}$$

must hold. By the LCP formulation (4.39) the integral

$$\displaystyle{\int _{-1}^{1} - u''(u - g)\,\mathrm{d}x = 0}$$

vanishes. Subtracting yields

$$\displaystyle{\int _{-1}^{1} - u''(v - u)\,\mathrm{d}x \geq 0\ \text{ for any }v \in \mathcal{K}\,.}$$

The obstacle function g does not occur explicitly in this formulation; the obstacle is implicitly defined in \(\mathcal{K}\). Integration by parts leads to

$$\displaystyle{[\mathop{\underbrace{-u'(v - u)}}\limits _{=0}]_{-1}^{1} +\int _{ -1}^{1}u'(v - u)'\,\mathrm{d}x \geq 0\,.}$$

The integral-free term vanishes because of u(−1) = v(−1),  u(1) = v(1). In summary, we have derived the statement:

$$\displaystyle{ \begin{array}{rcl} \text{If }u\text{ solves the}&&\text{obstacle problem (4.39), then} \\ &&\qquad \int _{-1}^{1}u'(v - u)'\,\mathrm{d}x \geq 0\quad \text{ for all }v \in \mathcal{K}\,.\end{array} }$$
(5.22)

Since v varies in the set \(\mathcal{K}\) of competing functions, an inequality such as (5.22) is called a variational inequality. The characterization of u by (5.22) can be used to construct an approximation w: instead of u, find a \(w \in \mathcal{K}\) such that the inequality (5.22) is satisfied for all \(v \in \mathcal{K}\),

$$\displaystyle{\int \limits _{-1}^{1}w'(v - w)'\,\mathrm{d}x \geq 0\quad \text{for all }v \in \mathcal{K}\,.}$$

The characterization (5.22) is related to a minimum problem, because the integral vanishes for v = u.

5.3.3 Variational Form of an American Option

Analogously to the simple obstacle problem, the problem of valuing American options can be formulated as a variational problem; compare Problem 4.7. The class of competing functions must be redefined as

$$\displaystyle{ \begin{array}{rcl} \mathcal{K}:=\{\,&&v \in \mathcal{C}^{0}[x_{\min },x_{\max }]\;\mid \;{\partial v \over \partial x}\ \text{ piecewise }\mathcal{C}^{0}\,, \\ &&v(x,\tau ) \geq g(x,\tau )\ \text{ for all }x,\tau \,,\ v(x,0) = g(x,0)\,, \\ &&v(x_{\max },\tau ) = g(x_{\max },\tau ),\ v(x_{\min },\tau ) = g(x_{\min },\tau )\,\}\,. \end{array} }$$
(5.23)

In the following, \(v \in \mathcal{K}\) with \(\mathcal{K}\) from (5.23). Let y denote the exact solution of Problem 4.7. As the solution of the partial differential inequality, y is \(\mathcal{C}^{2}\)-smooth on the continuation region, and \(y \in \mathcal{K}\). From

$$\displaystyle{v \geq g,\quad {\partial y \over \partial \tau } -{ \partial ^{2}y \over \partial x^{2}} \geq 0}$$

we deduce

$$\displaystyle{\int _{x_{\min }}^{x_{\max }}\left ({\partial y \over \partial \tau } -{ \partial ^{2}y \over \partial x^{2}}\right )(v - g)\;\mathrm{d}x \geq 0\,.}$$

Invoking the complementarity

$$\displaystyle{\int _{x_{\min }}^{x_{\max }}\left ({\partial y \over \partial \tau } -{ \partial ^{2}y \over \partial x^{2}}\right )(\,y - g)\,\mathrm{d}x = 0}$$

and subtraction gives

$$\displaystyle{\int _{x_{\min }}^{x_{\max }}\left ({\partial y \over \partial \tau } -{ \partial ^{2}y \over \partial x^{2}}\right )(v - y)\,\mathrm{d}x \geq 0\,.}$$

Integration by parts leads to the inequality

$$\displaystyle{\int _{x_{\min }}^{x_{\max }}\left ({\partial y \over \partial \tau } (v - y) +{ \partial y \over \partial x}\left ({\partial v \over \partial x} -{ \partial y \over \partial x}\right )\right )\,\mathrm{d}x -{ \partial y \over \partial x}(v - y)\Bigg\vert _{x_{\min }}^{x_{\max }} \geq 0\,.}$$

The nonintegral term vanishes, because at the boundary points x_min, x_max we have v = g and y = g, hence v = y. The final result is

$$\displaystyle{ I(\,y;v):=\int _{ x_{\min }}^{x_{\max }}\left ({\partial y \over \partial \tau } \cdot (v - y) +{ \partial y \over \partial x}\left ({\partial v \over \partial x} -{ \partial y \over \partial x}\right )\right )\,\mathrm{d}x \geq 0\quad \text{for all }v \in \mathcal{K}\,. }$$
(5.24)

The exact y is characterized by the fact that the inequality (5.24) holds for all comparison functions \(v \in \mathcal{K}\). For the special choice v = y the integral takes its minimal value,

$$\displaystyle{\min _{v\in \mathcal{K}}I(\,y;v) = I(\,y;y) = 0\,.}$$

A more general question is whether the inequality (5.24) holds for a \(\widehat{y} \in \mathcal{K}\) that is not \(\mathcal{C}^{2}\)-smooth on the continuation region. The aim is:

Problem 5.3 (Weak Version)

Construct a \(\widehat{y} \in \mathcal{K}\) such that \(I(\widehat{y};v) \geq 0\) for all \(v \in \mathcal{K}\).

This formulation of our problem is called the weak version, because it does not require \(\widehat{y} \in \mathcal{C}^{2}\). Solutions \(\widehat{y}\) of Problem 5.3 that are globally continuous but only piecewise \(\mathcal{C}^{1}\) are called weak solutions. The original partial differential equation requires \(y \in \mathcal{C}^{2}\) and hence more smoothness. Such \(\mathcal{C}^{2}\)-solutions are called strong solutions or classical solutions ( Sect. 5.5).

5.3.4 Implementation of Finite Elements

A discretized version of the weak problem is obtained by replacing the space \(\mathcal{K}\) by a finite-dimensional subspace \(\widehat{\mathcal{K}}\), which is spanned by a finite number of basis functions. That is, we search for a \(\widehat{y} \in \widehat{\mathcal{K}}\) such that

$$\displaystyle{I(\widehat{y};\widehat{v}) \geq 0\quad \text{ for all }\widehat{v} \in \widehat{\mathcal{K}}\,,}$$

where I( y; v) is defined in (5.24). This sets the arena for finite element methods.

As a first step toward solving the minimum problem approximately, assume as in Sect. 5.3.1 separation approximations for \(\widehat{y}\) and \(\widehat{v}\) of the similar forms

$$\displaystyle{ \begin{array}{rcl} &&\widehat{y} =\sum _{i}w_{i}(\tau )\varphi _{i}(x)\,, \\ &&\widehat{v} =\sum _{i}v_{i}(\tau )\varphi _{i}(x)\,. \end{array} }$$
(5.25)

Summation is over a finite number of terms, which represents \(\widehat{y},\,\widehat{v} \in \widehat{\mathcal{K}}\). The reduced smoothness of these expressions matches the requirements of \(\mathcal{K}\) from (5.23); time dependence is incorporated in the coefficient functions w_i and v_i. Since the basis functions φ_i represent the x_i-grid, we again perform a semidiscretization. Plugging the ansatz (5.25) into \(I(\widehat{y};\widehat{v})\) from (5.24) gives

$$\displaystyle\begin{array}{rcl} & & \int \left \{\left (\sum _{i}{\,\mathrm{d}w_{i} \over \,\mathrm{d}\tau } \varphi _{i}\right )\left (\sum _{j}(v_{j} - w_{j})\varphi _{j}\right )+\right. {}\\ & & \qquad \left.\quad \left (\sum _{i}w_{i}\varphi _{i}^{{\prime}}\right )\left (\sum _{ j}(v_{j} - w_{j})\varphi _{j}^{{\prime}}\right )\right \}\,\mathrm{d}x {}\\ & =& \sum _{i}\sum _{j}{\mathrm{d}w_{i} \over \mathrm{d}\tau } (v_{j} - w_{j})\int \varphi _{i}\varphi _{j}\,\mathrm{d}x +\sum _{i}\sum _{j}w_{i}(v_{j} - w_{j})\int \varphi _{i}^{{\prime}}\varphi _{ j}^{{\prime}}\,\mathrm{d}x \geq 0\,. {}\\ \end{array}$$

Translated into vector notation for the coefficient functions w_i(τ), v_i(τ), this is equivalent to

$$\displaystyle{\left ({\mathrm{d}w \over \mathrm{d}\tau } \right )^{\mbox{ $tr$}}B(v - w) + w^{\mbox{ $tr$}}A(v - w) \geq 0}$$

or

$$\displaystyle{(v - w)^{\mbox{ $tr$}}\left (B{\mathrm{d}w \over \mathrm{d}\tau } + Aw\right ) \geq 0\,.}$$

This is the (semi-)discretized weak version of \(I(\widehat{y};\widehat{v}) \geq 0\). The matrices A and B are defined via the assembling described above; for equidistant steps the special versions in (5.13), (5.14) arise.

As a second step, the time τ is discretized as well. To this end let us define the vectors

$$\displaystyle{w^{(\nu )}:= w(\tau _{\nu }),\quad v^{(\nu )}:= v(\tau _{\nu })\,.}$$

Upon substituting, and θ-averaging the Aw term as in Sect. 4.6.1, we arrive at the inequalities

$$\displaystyle{ \left (v^{(\nu +1)} - w^{(\nu +1)}\right )^{\mbox{ $tr$}}\left (B{ 1 \over \varDelta \tau } (w^{(\nu +1)} - w^{(\nu )}) +\theta Aw^{(\nu +1)} + (1-\theta )Aw^{(\nu )}\right ) \geq 0 }$$
(5.26)

for all ν. For θ = 1∕2 this is a Crank–Nicolson-type method. Rearranging (5.26) leads to

$$\displaystyle{\left (v^{(\nu +1)} - w^{(\nu +1)}\right )^{\mbox{ $tr$}}\left (\left (B +\varDelta \tau \,\theta A\right )w^{(\nu +1)} + \left (\varDelta \tau (1-\theta )A - B\right )w^{(\nu )}\right ) \geq 0\,.}$$

With the abbreviations

$$\displaystyle{ \begin{array}{rcl} r:& =&(B -\varDelta \tau (1-\theta )A)\,w^{(\nu )}\,, \\ C:& =&B +\varDelta \tau \,\theta A\,, \end{array} }$$
(5.27)

the inequality can be rewritten as

$$\displaystyle{ \left (v^{(\nu +1)} - w^{(\nu +1)}\right )^{\mbox{ $tr$}}\left (Cw^{(\nu +1)} - r\right ) \geq 0\,. }$$
(5.28)

This is the fully discretized version of \(I(\widehat{y};\widehat{v}) \geq 0\).

5.3.4.1 Side Conditions

To match the requirements of \(\mathcal{K}\), the inequalities \(\widehat{y} \geq g\) and \(\widehat{v} \geq g\) must hold. \(\widehat{y}(x,\tau ) \geq g(x,\tau )\) amounts to

$$\displaystyle{\sum w_{i}(\tau )\varphi _{i}(x) \geq g(x,\tau )\,.}$$

For hat functions φ_i (with φ_i(x_i) = 1 and φ_i(x_j) = 0 for j ≠ i) and x = x_j, this implies w_j(τ) ≥ g(x_j, τ). With τ = τ_ν we have

$$\displaystyle{w^{(\nu )} \geq g^{(\nu )};\quad \text{ analogously }v^{(\nu )} \geq g^{(\nu )}\,.}$$

For each time level ν we must find a solution that satisfies both the inequality (5.26)–(5.28) and the side condition

$$\displaystyle{w^{(\nu +1)} \geq g^{(\nu +1)}\ \text{ for all }\ v^{(\nu +1)} \geq g^{(\nu +1)}\,.}$$

In summary, the algorithm is

Algorithm 5.4 (Finite Elements for American Standard Options)

$$\displaystyle\begin{array}{rcl} & & \text{Choose }\theta \ (\theta = 1/2).\text{ Calculate }w^{(0)},\text{ and }C\text{ from (5.27)}. {}\\ & & \mathit{For\ }\nu = 1,\ldots,\nu _{\max }: {}\\ & & \quad \text{Calculate }r = (B -\varDelta \tau (1-\theta )A)w^{(\nu -1)}\text{ and }g = g^{(\nu )}\,. {}\\ & & \quad \text{Construct a }w\text{ such that for all }v \geq g {}\\ & & \qquad (v - w)^{\mbox{ $tr$}}(Cw - r) \geq 0,\quad w \geq g. {}\\ & & \quad \text{Set }w^{(\nu )}:= w\,. {}\\ \end{array}$$

This algorithm generates a discretized solution of the weak Problem 5.3: the vectors w define \(\widehat{y} \in \widehat{\mathcal{K}}\) via (5.25); \(\widehat{v}\) is not needed explicitly. Let us emphasize again the main step (FE), the kernel of this algorithm and its main labor: construct w such that

$$\displaystyle{ \begin{array}{rcl} \mathbf{(FE)}\quad &&\text{for all }v \geq g \\ &&(v - w)^{\mbox{ $tr$}}(Cw - r) \geq 0\,,\quad w \geq g\,. \end{array} }$$
(5.29)

This task (FE) can be reformulated into a task we already solved in Sect. 4.6. To this end recall the finite-difference equation (4.44), replacing A by C, and b by r. There the following holds for w:

$$\displaystyle{ \begin{array}{rcl} \mathbf{(FD)}\quad &&Cw - r \geq 0\,,\quad w \geq g\,, \\ &&(Cw - r)^{\mbox{ $tr$}}(w - g) = 0\,.\qquad \qquad \end{array} }$$
(5.30)

Theorem 5.5 (Equivalence)

The solution of the problem (FE) is equivalent to the solution of problem (FD).

Proof

 

  a) (FD) ⇒ (FE):

    Let w solve (FD), so w ≥ g, and

    $$\displaystyle{(v-w)^{\mbox{ $tr$}}(Cw-r) = (v-g)^{\mbox{ $tr$}}\mathop{\underbrace{(Cw - r)}}\limits _{\geq 0}-\mathop{\underbrace{(w - g)^{\mbox{ $tr$}}(Cw - r)}}\limits _{=0}}$$

    hence (v − w)^tr(Cw − r) ≥ 0 for all v ≥ g.

  b) (FE) ⇒ (FD):

    Let w solve (FE), so w ≥ g, and

    $$\displaystyle{v^{\mbox{ $tr$}}(Cw - r) \geq w^{\mbox{ $tr$}}(Cw - r)\quad \text{for all }v \geq g\,.}$$

    Suppose the kth component of Cw − r is negative, and make v_k arbitrarily large. Then the left-hand side becomes arbitrarily small, which is a contradiction. So Cw − r ≥ 0. Now

    $$\displaystyle{w \geq g\ \Longrightarrow\ (w - g)^{\mbox{ $tr$}}(Cw - r) \geq 0\,.}$$

    Setting v = g in (FE) gives (w − g)^tr(Cw − r) ≤ 0. Therefore (w − g)^tr(Cw − r) = 0.

5.3.4.2 Implementation

As a consequence of this equivalence, the solution of the finite-element problem (FE) can be calculated with the methods we applied to problem (FD) in Sect. 4.6. Following the exposition in Sect. 4.6.2, the kernel of the finite-element Algorithm 5.4 can be written as follows:

$$\displaystyle\begin{array}{rcl} \mathbf{(FE')}& & \text{Solve }Cw = r\text{ componentwise such that} {}\\ & & \text{the side condition }w \geq g\text{ is obeyed.} {}\\ \end{array}$$

The vector v is not calculated. Boundary conditions on w are set up in the same way as discussed in Sect. 4.4 and summarized in Algorithm 4.14. Consequently, in the special case of an equidistant x-grid, the finite-element algorithm closely parallels Algorithm 4.14; there is no need to repeat it ( Exercise 5.4). In the general nonequidistant case, the off-diagonal and diagonal elements of the tridiagonal matrix C vary with i, and the formulation of the SOR loop becomes more involved. The details of the implementation are technical and omitted. Algorithm 4.15 is the same in the finite-element case.
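The kernel (FE′) can be realized by a projected SOR iteration as in Sect. 4.6.2. The following Python sketch (our names `psor`, `omega`, `tol`; a dense matrix for brevity, although C is tridiagonal) illustrates the componentwise solution with projection onto w ≥ g:

```python
# Projected SOR sketch for step (FE'): solve C w = r componentwise while
# enforcing w >= g. A minimal illustration, not the book's code; psor,
# omega, tol are illustrative names.
import numpy as np

def psor(C, r, g, w0, omega=1.2, tol=1e-10, maxiter=20000):
    """Projected SOR for (v - w)^tr (C w - r) >= 0, w >= g."""
    w = np.maximum(np.asarray(w0, dtype=float).copy(), g)
    n = len(w)
    for _ in range(maxiter):
        w_old = w.copy()
        for i in range(n):
            # Gauss-Seidel value for component i ...
            gs = (r[i] - C[i, :i] @ w[:i] - C[i, i+1:] @ w[i+1:]) / C[i, i]
            # ... relaxed by omega, then projected onto w_i >= g_i
            w[i] = max(g[i], w[i] + omega * (gs - w[i]))
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w
```

At the fixed point, each component either satisfies the equation (Cw − r)_i = 0 or sits on the obstacle w_i = g_i with (Cw − r)_i ≥ 0, which is exactly the complementarity (FD).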

The computational results match those of Chap. 4 and are not repeated. The costs of the simple finite-element version presented here are slightly lower than those of the finite-difference approach, because we can take advantage of an optimal spacing of the mesh points x_i. For arguments on the closeness of \(\widehat{y}\) to y, we refer to Sect. 5.5.

5.4 Two-Asset Options

In Sect. 3.5.5 we discussed an option based on two assets with prices S_1, S_2. There we applied Monte Carlo simulation to the GBM model; see Example 3.9. For the mathematical model we have chosen the Black–Scholes market. The corresponding PDE for the value function V(S_1, S_2, t) is

$$\displaystyle{ \begin{array}{rcl} {\partial V \over \partial t} &+&{1 \over 2}\sigma _{1}^{2}S_{ 1}^{2}{ \partial ^{2}V \over \partial S_{1}^{2}} + (r -\delta _{1})S_{1}{ \partial V \over \partial S_{1}} - rV \\ &+&{1 \over 2}\sigma _{2}^{2}S_{ 2}^{2}{ \partial ^{2}V \over \partial S_{2}^{2}} + (r -\delta _{2})S_{2}{ \partial V \over \partial S_{2}} +\rho \sigma _{1}\sigma _{2}S_{1}S_{2}{ \partial ^{2}V \over \partial S_{1}\partial S_{2}} = 0\,, \end{array} }$$
(5.31)

with dividend rates δ_1, δ_2. (For the general case see Sect. 6.2.) Notice that for S_2 = 0 the familiar one-dimensional Black–Scholes equation results. The model is completed by a payoff function Ψ(S_1, S_2) and the terminal condition V(S_1, S_2, T) = Ψ(S_1, S_2). The computational domain \(\mathcal{D}\) is two-dimensional, \(\mathcal{D}\subset \mathbb{R}^{2}\) (disregarding time t).

Example 5.6 (European Call on a Basket with Double Barrier)

We consider a call on a two-asset basket with two knock-out barriers. The payoff of this exotic European-style option is

$$\displaystyle{\varPsi (S_{1},S_{2}) = (S_{1} + S_{2} - K)^{+}\,,}$$

up to the barriers (see Fig. 5.1). In the underlying basket the two assets have equal weight. The two knock-out barriers are given by B_1 and B_2: down-and-out at B_1, up-and-out at B_2. That is, the option ceases to exist when S_1 + S_2 ≤ B_1 or when S_1 + S_2 ≥ B_2; in both cases V = 0. In this example, the computational domain \(\mathcal{D}\) is easy to define: the value function is zero outside the barriers. Hence the domain is bounded by the two lines S_1 + S_2 = B_1 and S_1 + S_2 = B_2. This shape of \(\mathcal{D}\) naturally suggests tiling the domain into a grid of triangular elements \(\mathcal{D}_k\). One possible triangulation is shown in Fig. 5.5, where a structured regular subdivision is applied. For this example we choose the parameters

$$\displaystyle\begin{array}{rcl} & & K = 1\,,\;T = 1\,,\;\sigma _{1} =\sigma _{2} = 0.25\,,\;\rho = 0.7\,,\;r = 0.05\,, {}\\ & & \delta _{1} =\delta _{2} = 0\,,\;\;B_{1} = 1\,,\;B_{2} = 2\,. {}\\ \end{array}$$

The values of V for S_1 → 0 and S_2 → 0 are known from the one-dimensional Black–Scholes equation; just set either S_1 = 0 or S_2 = 0 in (5.31). These values of single-asset double-barrier options for B_1 ≤ S ≤ B_2 can be evaluated by a closed-form formula, see [172]. We shall come back to this example below.

5.4.1 Analytical Preparations

It is convenient to solve the Black–Scholes equation in divergence form. To this end, use standard PDE variables x := S_1, y := S_2 for the independent variables and u(x, y, t) for the dependent variable, and derive the vector PDE for u

$$\displaystyle{ -\nabla \cdot (D(x,y)\nabla u) + b(x,y)^{\mbox{ $tr$}}\nabla u + ru = u_{t}\,. }$$
(5.32)

This makes use of the formal “nabla” vector \(\nabla:= ({ \partial \over \partial x},{ \partial \over \partial y})^{\mbox{ $tr$}}\), and

$$\displaystyle{ \begin{array}{rcl} &&D(x,y):={ 1 \over 2}\left (\begin{array}{*{10}c} \sigma _{1}^{2}x^{2}&\rho \sigma _{1}\sigma _{2}xy \\ \rho \sigma _{1}\sigma _{2}xy&\sigma _{2}^{2}y^{2}\\ \end{array} \right )\,, \\ &&b(x,y):= -\left (\begin{array}{*{10}c} (r -\delta _{1} -\sigma _{1}^{2} -\rho \sigma _{1}\sigma _{2}/2)\,x \\ (r -\delta _{2} -\sigma _{2}^{2} -\rho \sigma _{1}\sigma _{2}/2)\,y\\ \end{array} \right )\,. \end{array} }$$
(5.33)

∇u is the gradient of u, and the dot-product notation

$$\displaystyle{\nabla \cdot U ={ \partial U_{1} \over \partial x} +{ \partial U_{2} \over \partial y} }$$

for a vector function U denotes the divergence; the ⋅ corresponds to the scalar product, analogous to the ^tr notation for vectors. The reader is invited to check the equivalence with (5.31) ( Exercise 5.5). The advantage of version (5.32) over (5.31) lies in the simple treatment of the second-order derivatives: they can be removed, and a weak version can be derived. This will become apparent below.
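The equivalence check of Exercise 5.5 can also be delegated to a computer algebra system. The following sketch, assuming sympy is available, expands the spatial part of the divergence form, −∇·(D∇u) + b^tr∇u + ru with D, b from (5.33), and compares it with the spatial part of (5.31) solved for u_t:

```python
# Symbolic check of the divergence form (5.32)/(5.33) against the
# Black-Scholes PDE (5.31); a sketch using sympy, not from the book.
import sympy as sp

x, y, r, d1, d2, s1, s2, rho = sp.symbols('x y r delta1 delta2 sigma1 sigma2 rho')
u = sp.Function('u')(x, y)

D = sp.Rational(1, 2) * sp.Matrix([[s1**2 * x**2, rho*s1*s2*x*y],
                                   [rho*s1*s2*x*y, s2**2 * y**2]])
b = -sp.Matrix([(r - d1 - s1**2 - rho*s1*s2/2) * x,
                (r - d2 - s2**2 - rho*s1*s2/2) * y])

grad_u = sp.Matrix([u.diff(x), u.diff(y)])
flux = D * grad_u
div_flux = flux[0].diff(x) + flux[1].diff(y)

# spatial part of the divergence form (5.32): -div(D grad u) + b^tr grad u + r u
lhs = -div_flux + (b.T * grad_u)[0] + r*u

# spatial part of (5.31), solved for u_t
bs_rhs = -(sp.Rational(1, 2)*s1**2*x**2*u.diff(x, 2)
           + (r - d1)*x*u.diff(x)
           + sp.Rational(1, 2)*s2**2*y**2*u.diff(y, 2)
           + (r - d2)*y*u.diff(y)
           + rho*s1*s2*x*y*u.diff(x, y)) + r*u

difference = sp.simplify(lhs - bs_rhs)
print(difference)  # 0 confirms the equivalence
```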

5.4.2 Weighted Residuals

The partial differential equation (5.32) can be represented by R(u, x, y, t) = 0, where

$$\displaystyle\begin{array}{rcl} R(u,x,y,t):=& -& \nabla \cdot (D(x,y)\nabla u(x,y,t)) + b(x,y)^{\mbox{ $tr$}}\nabla u(x,y,t) {}\\ & +& ru(x,y,t) -{ \partial u(x,y,t) \over \partial t} {}\\ \end{array}$$

denotes the residual. As in Sect. 5.1, the residual is used to set up an integral equation. To this end, introduce weighting functions v, multiply the residual of the PDE by v(x, y, t), and require

$$\displaystyle{ \int _{\mathcal{D}}R(u,x,y,t)\,v\;\mathrm{d}x\,\mathrm{d}y = 0\,. }$$
(5.34)

This integral over the computational domain \(\mathcal{D}\subset \mathbb{R}^{2}\) is a double integral. It depends on t, and should vanish for all 0 ≤ tT and arbitrary v. We consider u to be a solution in case (5.34) holds for “all” v. This is a weak version of the PDE and requires less regularity of its “weak” solutions u. Aspects of accuracy are postponed to Sect. 5.5.

To exploit the potential of the integral version (5.34), we transform the second-order derivatives to first order, comparable to integration by parts. The leading integral over the second-order term is

$$\displaystyle{\int _{\mathcal{D}}-\nabla \cdot (D\nabla u)\,v\;\mathrm{d}x\,\mathrm{d}y\,.}$$

The reader may check, for the vector U := vD∇u, the formula for the divergence ∇⋅U, namely,

$$\displaystyle{\nabla \cdot \, (vD\nabla u) = (\nabla v)^{\mbox{ $tr$}}D\nabla u + v\,\nabla \cdot (D\nabla u)\,,}$$

and hence

$$\displaystyle{-\int _{\mathcal{D}}v\,\nabla \cdot (D\nabla u)\,\mathrm{d}x\,\mathrm{d}y =\int _{\mathcal{D}}(\nabla v)^{\mbox{ $tr$}}D\nabla u\,\mathrm{d}x\,\mathrm{d}y -\int _{\mathcal{D}}\nabla \cdot (vD\nabla u)\,\mathrm{d}x\,\mathrm{d}y\,.}$$

Next we quote the divergence theorem, here for the two-dimensional situation:

$$\displaystyle{ \int _{\mathcal{D}}\nabla \cdot U\,\mathrm{d}x\,\mathrm{d}y =\int _{\partial \mathcal{D}}U^{\mbox{ $tr$}}n\,\mathrm{d}s\,, }$$
(5.35)

where \(\partial \mathcal{D}\) denotes the boundary of \(\mathcal{D}\), and n is the outward unit normal vector on \(\partial \mathcal{D}\). (n is perpendicular to the curve \(\partial \mathcal{D}\) and points away from \(\mathcal{D}\).) The parameter s measures the arclength along the boundary \(\partial \mathcal{D}\). We apply the divergence theorem to the specific vector U := vD∇u, and arrive at the result for the second-order term

$$\displaystyle{-\int _{\mathcal{D}}v\,\nabla \cdot (D\nabla u)\,\mathrm{d}x\,\mathrm{d}y =\int _{\mathcal{D}}(\nabla v)^{\mbox{ $tr$}}D\nabla u\,\mathrm{d}x\,\mathrm{d}y-\int _{\partial \mathcal{D}}(vD\nabla u)^{\mbox{ $tr$}}n\,\mathrm{d}s\,.}$$

In (5.32)/(5.33) the matrix D is symmetric, D = D^tr. For symmetric D the integrand in the boundary integral is v(∇u)^tr Dn. After the above transformations of the leading integral, we rewrite (5.34) into

$$\displaystyle{ \int _{\mathcal{D}}\left [(\nabla v)^{\mbox{ $tr$}}D\nabla u + vb^{\mbox{ $tr$}}\nabla u + ruv -{ \partial u \over \partial t} v\right ]\,\mathrm{d}x\,\mathrm{d}y-\int _{\partial \mathcal{D}}v(\nabla u)^{\mbox{ $tr$}}Dn\,\mathrm{d}s = 0\,. }$$
(5.36)

Recall that u and v as well as ∇u and ∇v depend on x, y, t, and hence the integrals depend on t. This is the weak version of the PDE (5.32).

Next discretize the time 0 ≤ tT as in Chap. 4, say, with equidistant steps Δt. For the simplest implicit approach, the derivative with respect to time t is resolved by the first-order difference quotient,

$$\displaystyle{{\partial u(x,y,t) \over \partial t} \approx { u(x,y,t +\varDelta t) - u(x,y,t) \over \varDelta t} \,.}$$

For backward running time t,

$$\displaystyle{u_{\mathrm{pre}}:= u(x,y,t +\varDelta t)}$$

is known at time t from the calculation of the previous time level. The analogue of the fully implicit time-stepping method is then to solve (5.36) at time level t for \({\partial u \over \partial t}\) replaced by

$$\displaystyle{{1 \over \varDelta t} (u_{\mathrm{pre}} - u)\,,}$$

starting at t = T − Δt with the payoff, u_pre = Ψ. With this approximation, the function u in (5.36) approximates the value function V at time level t. Alternatively, a second-order time discretization can be applied, similarly as in Sect. 4.3. For the required regularity of the functions u and v, consult Sect. 5.5.

5.4.3 Boundary

Boundary conditions enter via the boundary integral along the boundary \(\partial \mathcal{D}\). In practice, the computational domain \(\mathcal{D}\) is defined by specifying \(\partial \mathcal{D}\). To this end, express the curve \(\partial \mathcal{D}\) as the union of a finite number of non-overlapping, piecewise smooth boundary curves \(\partial \mathcal{D}_{1},\partial \mathcal{D}_{2},\ldots\). Each of these curves is parameterized as in

$$\displaystyle{\partial \mathcal{D}_{1}:=\{\, (g_{1}(\xi ),h_{1}(\xi ))\,\mid \,a_{1} \leq \xi \leq b_{1}\,\}\,.}$$

In this way, an orientation is given by starting the curve at the parameter value ξ = a_1 and ending at ξ = b_1. By specifying parameter intervals a_1 ≤ ξ ≤ b_1 and parametric functions g_1, h_1, the entire boundary is defined. The convention is to orient the parameterizations such that the domain \(\mathcal{D}\) lies on the left-hand side as we run through them for increasing parameter values ξ.
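This orientation convention can be verified numerically: traversing a correctly oriented closed boundary yields a positive signed area by the shoelace formula. A sketch for the trapezoidal domain of Example 5.6, with corners (1, 0), (2, 0), (0, 2), (0, 1); the names `polyline` and `signed_area` are ours:

```python
# Orientation check for the trapezoidal domain of Example 5.6 (barriers
# x + y = 1 and x + y = 2). Traversing the four boundary pieces for
# increasing xi should leave the domain on the left, i.e. give a positive
# signed area by the shoelace formula. Illustrative sketch only.
import numpy as np

def polyline(n=50):
    """Sample the four boundary curves in order, each over 0 <= xi < 1."""
    xi = np.linspace(0.0, 1.0, n, endpoint=False)
    d1 = np.column_stack([1 + xi, 0*xi])    # (1,0) -> (2,0), lower edge
    d2 = np.column_stack([2 - 2*xi, 2*xi])  # (2,0) -> (0,2), along x+y=2
    d3 = np.column_stack([0*xi, 2 - xi])    # (0,2) -> (0,1), left edge
    d4 = np.column_stack([xi, 1 - xi])      # (0,1) -> (1,0), along x+y=1
    return np.vstack([d1, d2, d3, d4])

def signed_area(p):
    """Shoelace formula; positive for counterclockwise orientation."""
    x, y = p[:, 0], p[:, 1]
    return 0.5 * np.sum(x*np.roll(y, -1) - np.roll(x, -1)*y)
```

The trapezoid has area 2 − 1/2 = 3/2, and the positive sign confirms that the domain lies on the left of the parameterized curve.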

Now the curve \(\partial \mathcal{D}\) is defined and we address the boundary integral along that curve. It is split into a sum of integrals according to the piecewise smooth curves \(\partial \mathcal{D}_{1},\partial \mathcal{D}_{2},\ldots\). For example, the boundary of the domain in Fig. 5.5 consists of four such parts ( Exercise 5.6).

The product-type integrand f(x, y) := v(∇u)^tr Dn suggests placing emphasis on two specific kinds of boundary conditions, namely,

  • v is prescribed (Dirichlet boundary conditions) ,

  • (∇u)tr Dn is prescribed (Neumann boundary conditions ).

The boundary differential operator (∇u)^tr Dn = n^tr D∇u can be considered as a generalized directional derivative, since \({\partial u \over \partial n} = n^{\mbox{ $tr$}}\nabla u\). Mixed boundary conditions are possible as well. If we cast the components of the vector n^tr D into a vector (α_1, α_2), then all types of boundary conditions can be written in the form

$$\displaystyle{\alpha _{1}(x,y){\partial u \over \partial x} +\alpha _{2}(x,y){\partial u \over \partial y} =\alpha _{0}(x,y)\,u +\beta (x,y)}$$

with proper functions α 0 and β. Then

$$\displaystyle{v\,(\alpha _{0}(x,y)\,u +\beta (x,y))}$$

is substituted into the boundary integral, which is approximated numerically using the edges of the triangulation of \(\mathcal{D}\).

Fortunately, boundary conditions are frequently of simple form. In particular one encounters the two types

  • u = 0 (or v = 0), which is of Dirichlet type with α 1 = α 2 = β = 0 and α 0 ≠ 0.

  • (∇u)tr Dn = 0, which is of Neumann type with α 0 = β = 0 and nonzero vector (α 1, α 2).

The boundary \(\partial \mathcal{D}\) may consist, for example, of two parts \(\partial \mathcal{D}_{\mathrm{D}}\) and \(\partial \mathcal{D}_{\mathrm{N}}\) with \(\partial \mathcal{D} = \partial \mathcal{D}_{\mathrm{D}} \cup \partial \mathcal{D}_{\mathrm{N}}\), \(\partial \mathcal{D}_{\mathrm{D}} \cap \partial \mathcal{D}_{\mathrm{N}} =\emptyset\), with Dirichlet conditions on \(\partial \mathcal{D}_{\mathrm{D}}\) and Neumann conditions on \(\partial \mathcal{D}_{\mathrm{N}}\). Clearly, boundary integrals vanish for the special cases v = 0 or (∇u)^tr Dn = 0. Neumann conditions have the advantage that they need not be specified explicitly in weak formulations. This is an advantage of FEM over discretizing the PDE by finite differences, where all boundary conditions must be implemented; for FEM it suffices to implement Dirichlet conditions. Defining the right boundary conditions can be demanding. Besides being financially meaningful, they should make the problem well-posed, that is, such that it defines a unique solution. To some extent, defining proper boundary conditions is an art.

Example 5.7 (European Binary Put as in Example 3.9)

In Chap. 3 the binary put of Example 3.9 was simulated with Monte Carlo, and no boundary or boundary conditions were needed. Here we prepare the example to be solved by FEM. Again, x := S_1, y := S_2. As in Chap. 4, the domain 0 < x < ∞, 0 < y < ∞ must be truncated to finite size. A simple choice of a computational domain is a rectangle

$$\displaystyle{\mathcal{D} =\{\, (x,y)\mid \,0 \leq x \leq x_{\mathrm{max}},\ 0 \leq y \leq y_{\mathrm{max}}\,\}}$$

with x_max, y_max large enough that the zero boundary condition u = 0 can be chosen as an approximation for x = x_max or y = y_max. The rectangle is bounded by four straight lines, which can be parameterized, for example, by

$$\displaystyle\begin{array}{rcl} & & \partial \mathcal{D}_{1}:=\{\, x =\xi,\,y = 0\ \mid \quad 0 \leq \xi \leq x_{\mathrm{max}}\,\}\,, {}\\ & & \partial \mathcal{D}_{2}:=\{\, x = x_{\mathrm{max}},\,y =\xi \ \mid \quad 0 \leq \xi \leq y_{\mathrm{max}}\,\}\,, {}\\ & & \partial \mathcal{D}_{3}:=\{\, x = x_{\mathrm{max}}-\xi,\,y = y_{\mathrm{max}}\ \mid \quad 0 \leq \xi \leq x_{\mathrm{max}}\,\}\,, {}\\ & & \partial \mathcal{D}_{4}:=\{\, x = 0,\,y = y_{\mathrm{max}} -\xi \ \mid \quad 0 \leq \xi \leq y_{\mathrm{max}}\,\}\,. {}\\ \end{array}$$

Now \(\partial \mathcal{D} = \partial \mathcal{D}_{1} \cup \partial \mathcal{D}_{2} \cup \partial \mathcal{D}_{3} \cup \partial \mathcal{D}_{4}\), and the parameterized curve has the domain on the left. Dirichlet conditions are imposed on \(\partial \mathcal{D}_{2}\) and \(\partial \mathcal{D}_{3}\), where we have chosen to approximate the boundary values by requiring u = 0. For y = 0 the boundary conditions can be chosen as the values of the one-dimensional European binary put. An analytic formula for the one-dimensional case of a European binary put is

$$\displaystyle{V _{\mathrm{binP}}^{\mathrm{Eur}}(S,t):= c\,\mathrm{e}^{-r(T-t)}\,F\left (-{\log (S/K) + (r -\sigma ^{2}/2)(T - t) \over \sigma \sqrt{T - t}} \right )\,,}$$

for a face value c, with the standard normal distribution F [172]. For y = 0 we set S = x. The same formula applies on the boundary with x = 0, then with S = y. In this way, on \(\partial \mathcal{D}_{1}\) and \(\partial \mathcal{D}_{4}\) the boundary conditions are of Dirichlet type with u = V_binP^Eur. With this choice of boundary conditions, \(\partial \mathcal{D}_{\mathrm{D}} = \partial \mathcal{D}\) and \(\partial \mathcal{D}_{\mathrm{N}} =\emptyset\). But there is a simpler choice: as [300] points out, this Dirichlet condition is implicitly defined by the PDE, because the one-dimensional PDE is embedded in (5.31) for S_1 = 0 or S_2 = 0. So no boundary condition needs to be specified along \(\partial \mathcal{D}_{1}\) and \(\partial \mathcal{D}_{4}\); this amounts to zero Neumann conditions. Both the Dirichlet version and the Neumann version work. The latter has the advantage of avoiding the effort of evaluating V_binP^Eur.
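For the Dirichlet version, the formula for V_binP^Eur must be evaluated along \(\partial \mathcal{D}_{1}\) and \(\partial \mathcal{D}_{4}\). A sketch of this evaluation in Python, with the standard normal cdf F expressed via the error function; the function name and parameter values are illustrative:

```python
# Evaluating the one-dimensional European binary (cash-or-nothing) put
# formula quoted above; binary_put is an illustrative name. F is the
# standard normal cdf, written via math.erf.
import math

def binary_put(S, t, K, T, r, sigma, c=1.0):
    """c * exp(-r(T-t)) * F(-(log(S/K) + (r - sigma^2/2)(T-t)) / (sigma sqrt(T-t)))."""
    z = -(math.log(S / K) + (r - 0.5*sigma**2) * (T - t)) / (sigma * math.sqrt(T - t))
    F = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return c * math.exp(-r * (T - t)) * F
```

On \(\partial \mathcal{D}_{1}\) one would call it with S = x, on \(\partial \mathcal{D}_{4}\) with S = y. For S → 0 the value approaches the discounted face value c e^{−r(T−t)}, and for large S it tends to zero, as expected for a put paying c when S_T < K.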

The implementation of the weak form (5.36) is straightforward when, for example, the package FreeFem++ is applied. Thereby a figure similar to Fig. 3.7 is produced easily.

5.4.4 Involved Matrices

The accuracy of FEM depends on how the grid is chosen. Algorithms for mesh generation and mesh adaption are needed, but these are demanding topics. It is cumbersome to implement a two-dimensional FEM yourself. For first results one may work with a fixed structured grid, but in general it is advisable and convenient to apply an FEM package to solve (5.36). Here we merely focus on how the two-dimensional analogue of the hat functions enters.

For the Ritz–Galerkin approach we apply the basis representation

$$\displaystyle{ w(x,y,t) =\sum _{i}w_{i}(t)\,\varphi _{i}(x,y) }$$
(5.37)

as an approximation for u, and set v = φ_j. This ansatz separates time t and “space” (x, y). The functions φ_i are defined on \(\mathcal{D}\).

For basis functions, we choose the two-dimensional hat functions, which perfectly match triangular elements. The situation is shown schematically in Fig. 5.9: the central node l belongs to several adjacent triangles, which constitute the support (shaded) on which φ_l is built from planar pieces. This defines a tent-like hat function φ_l, which is zero outside its support. By linear combination of such basis functions, piecewise planar surfaces above the computational domain are constructed. Locally, for one triangle, this may look like the element in Fig. 5.4.

Fig. 5.9

Two-dimensional hat function φ l (x, y) (zero outside the shaded area)
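On each triangle of its support, such a hat function is simply the unique linear function with value 1 at its own node and 0 at the two other vertices. A minimal Python sketch of this evaluation (function and variable names are our own):

```python
import numpy as np

def hat_on_triangle(p, tri, node):
    """Value at point p of the planar piece of the hat function that is
    centered at vertex `node` of the triangle `tri` (3x2 vertex coordinates)."""
    A = np.hstack([np.ones((3, 1)), tri])   # rows (1, x_i, y_i)
    w = np.zeros(3)
    w[node] = 1.0                           # 1 at its node, 0 at the others
    c = np.linalg.solve(A, w)               # plane c1 + c2*x + c3*y
    return c[0] + c[1] * p[0] + c[2] * p[1]

tri = np.array([[0., 0.], [1., 0.], [0., 1.]])
print(hat_on_triangle([0.0, 0.0], tri, 0))   # value 1 at its own node
print(hat_on_triangle([0.5, 0.5], tri, 0))   # vanishes on the opposite edge
```

Gluing these planar pieces over all triangles adjacent to a node yields the tent shape of Fig. 5.9.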

Notice that \(\nabla w =\sum w_{i}\nabla \varphi _{i}\). The weak form of (5.36) leads to

$$\displaystyle\begin{array}{rcl} & & \int _{\mathcal{D}}(\nabla \varphi _{j})^{\mbox{ $tr$}}D\sum w_{i}\nabla \varphi _{i} + {}\\ & & \qquad \varphi _{j}\left [b^{\mbox{ $tr$}}(\sum w_{i}\nabla \varphi _{i}) + r\sum w_{i}\varphi _{i} -\sum { \partial w_{i} \over \partial t} \varphi _{i}\right ]\,\mathrm{d}x\,\mathrm{d}y {}\\ & -& \int _{\partial \mathcal{D}}\varphi _{j}(\sum w_{i}\nabla \varphi _{i})^{\mbox{ $tr$}}Dn\,\mathrm{d}s = 0\,, {}\\ \end{array}$$

for all j. This is a system of ODEs

$$\displaystyle{ \begin{array}{rcl} &&\sum _{i}w_{i}\int _{\mathcal{D}}\left [(\nabla \varphi _{j})^{\mbox{ $tr$}}D\nabla \varphi _{i} +\varphi _{j}b^{\mbox{ $tr$}}\nabla \varphi _{i} +\varphi _{j}r\varphi _{i}\right ]\,\mathrm{d}x\,\mathrm{d}y \\ &&\quad -\sum _{i}{\partial w_{i} \over \partial t} \int _{\mathcal{D}}\varphi _{i}\varphi _{j}\,\mathrm{d}x\,\mathrm{d}y -\sum _{i}w_{i}\int _{\partial \mathcal{D}}\varphi _{j}(\nabla \varphi _{i})^{\mbox{ $tr$}}Dn\,\mathrm{d}s = 0\,. \end{array} }$$
(5.38)

As an exercise, the reader should rewrite this ODE system in matrix-vector notation. In summary, FEM needs the integrals over the domain \(\mathcal{D}\)

$$\displaystyle\begin{array}{rcl} & &\int (\nabla \varphi _{j})^{\mbox{ $tr$}}\,D\,\nabla \varphi _{i}\quad \mbox{ (``diffusion terms'')}\,, {}\\ & & \int \varphi _{j}b^{\mbox{ $tr$}}\nabla \varphi _{i}\quad \mbox{ (``convection terms'')}\,, {}\\ & & \int \gamma \varphi _{j}\varphi _{i}\quad \mbox{ (``reaction terms'')}\,, {}\\ \end{array}$$

where γ is chosen appropriately, and in addition boundary integrals along \(\partial \mathcal{D}\).

For each number k of a triangle, there are three vertices, with node numbers i, j, l as in Fig. 5.9. Hence the table \(\mathcal{I}\) of index sets that assigns nodes to triangles includes the entry

$$\displaystyle{\mathcal{I}_{k}:=\{ i,j,l\}\,.}$$

Only for the three node numbers \(i,j,l \in \mathcal{I}_{k}\) are the local integrals on \(\mathcal{D}_{k}\) nonzero. They can be arranged into 3 × 3 element matrices. For the derivation of the integrals, it makes sense to use a local numbering \(1_{k},2_{k},3_{k}\) for the nodes of \(\mathcal{D}_{k}\). The assembling loop over k distributes up to 27 local integrals calculated on \(\mathcal{D}_{k}\), nine integrals of each of the above three types, into the corresponding global matrices.
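To make the assembling concrete, here is a small Python sketch for the “diffusion” integrals with D = I; it uses the element matrix \(\vert F_{k}\vert \,G_{k}^{tr}G_{k}\) derived in Exercise 5.7 (the function names and the toy two-triangle grid are our own choices):

```python
import numpy as np

def element_stiffness(xy):
    """3x3 element stiffness matrix |F_k| G_k^tr G_k for one triangle
    with vertex coordinates xy (3x2 array), linear hat functions, D = I."""
    A = np.hstack([np.ones((3, 1)), xy])   # rows (1, x_i, y_i)
    Fk = 0.5 * np.linalg.det(A)            # signed area of the triangle
    G = np.linalg.inv(A)[1:, :]            # 2x3 matrix of hat-function gradients
    return abs(Fk) * G.T @ G

def assemble_stiffness(nodes, triangles):
    """Assembling loop over k: distribute the nine local 'diffusion'
    integrals of each triangle into the global matrix."""
    S = np.zeros((len(nodes), len(nodes)))
    for tri in triangles:                  # tri plays the role of I_k = {i, j, l}
        Se = element_stiffness(nodes[tri])
        for a, i in enumerate(tri):
            for b, j in enumerate(tri):
                S[i, j] += Se[a, b]
    return S

# two triangles tiling the unit square
nodes = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
triangles = [np.array([0, 1, 2]), np.array([0, 2, 3])]
S = assemble_stiffness(nodes, triangles)
```

The resulting global matrix is symmetric with zero row sums, since the gradients of the hat functions on each triangle sum to zero.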

Returning to Example 5.6, we solve (5.36) with FEM. Figure 5.10 shows a FEM solution with 192 triangles. Figure 5.11 illustrates a mesh structure for higher resolution obtained with FreeFem++. In the two-dimensional case, because of higher costs, we typically confine ourselves to a lower accuracy than in the one-dimensional situation. Based on our results we state

$$\displaystyle{V (1.25,\,0.25,\,0) \approx 0.2949\,.}$$
Fig. 5.10

Rough approximation of the value function V (S 1, S 2, 0) of a basket double-barrier call option, Example 5.6. With kind permission of Anna Kvetnaia

Fig. 5.11

Finer approximation of the value function V (S 1, S 2, 0) of a basket double-barrier call option, Example 5.6

Example 5.8 (Heston’s PDE)

In Example 1.16 Heston’s model was introduced, where v denotes a stochastic volatility. The corresponding PDE from [178] is

$$\displaystyle{ \begin{array}{rcl} {\partial V \over \partial t} &+&{1 \over 2}vS^{2}{\partial ^{2}V \over \partial S^{2}} +{ 1 \over 2}\sigma _{\mathrm{v}}^{2}v{\partial ^{2}V \over \partial v^{2}} +\rho \sigma _{\mathrm{v}}vS{ \partial ^{2}V \over \partial S\partial v} \\ &+&rS{\partial V \over \partial S} + [\kappa (\theta -v) -\lambda v]{\partial V \over \partial v} - rV = 0\,,\end{array} }$$
(5.39)

with parameters as in (1.59), and λ standing for the market price of volatility risk. Here we are interested in solutions V (S, v, t) on part of a two-dimensional (S, v)-plane. The PDE (5.39) can be cast into version (5.32). As an exercise, the reader is encouraged to derive D and b and, with the payoff of a call and a choice of parameters of one's own, to think about suitable boundary conditions and to do experiments with (5.39). Note that for a call, a reasonable requirement for maximal values of the volatility v is V = S. When in addition the interest rate r is replaced by a stochastic variable, the PDE is based on a three-dimensional domain [163].
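To see how the second-order terms of (5.39) fit the divergence form, one can verify symbolically that a candidate diffusion matrix reproduces them up to first-order corrections, which are then absorbed into the vector b. A sketch with sympy; the matrix D below is our own candidate for the principal part, not a formula taken from the text:

```python
import sympy as sp

S, v, rho, sig = sp.symbols('S v rho sigma_v')
V = sp.Function('V')(S, v)

# candidate symmetric diffusion matrix for the principal part of (5.39)
D = sp.Matrix([[v * S**2 / 2,       rho * sig * v * S / 2],
               [rho * sig * v * S / 2, sig**2 * v / 2]])

grad = sp.Matrix([V.diff(S), V.diff(v)])
flux = D * grad
div = flux[0].diff(S) + flux[1].diff(v)          # div(D grad V)

# the second-order terms of Heston's PDE
principal = (v * S**2 / 2 * V.diff(S, 2) + sig**2 * v / 2 * V.diff(v, 2)
             + rho * sig * v * S * V.diff(S, v))

correction = sp.expand(div - principal)
print(correction)   # only first-order derivatives of V remain
```

The leftover first-derivative terms are exactly what distinguishes the divergence form from the original PDE; they are moved into b, together with the rS and κ(θ − v) − λv coefficients.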

5.5 Error Estimates

The similarity of the finite-element equation (5.21) with the finite-difference equation (4.24) suggests that the errors may be of the same order. In fact, numerical experiments confirm that the finite-element approach with the linear basis functions from Definition 5.1 produces errors decaying quadratically with the mesh size. Applying the finite-element Algorithm 5.4 and entering the calculated data into a diagram such as Fig. 4.14 confirms the quadratic order experimentally. The proof of this order of the error is more difficult for finite-element methods because weak solutions assume less smoothness. For standard options, the separation of variables in (5.19) also separates the discussion of the order, and an analysis of the one-dimensional situation suffices. This section explains some basic ideas of how to derive error estimates. We begin by reconsidering some of the related topics that have been introduced in previous sections.

5.5.1 Strong and Weak Solutions

Our exposition will be based on the model problem (5.15). That is, the simple second-order differential equation

$$\displaystyle{ -u'' = f(x)\quad \text{ for }\alpha <x <\beta }$$
(5.40)

with given f, and homogeneous Dirichlet-boundary conditions

$$\displaystyle{ u(\alpha ) = u(\beta ) = 0 }$$
(5.41)

will serve as illustration. The differential equation is of the form Lu = f, compare (5.2). The domain \(\mathcal{D}\subseteq \mathbb{R}^{n}\) on which functions u are defined specializes for n = 1 to the open and bounded interval \(\mathcal{D} =\{\, x \in \mathbb{R}^{1}\mid \alpha <x <\beta \,\}\). For continuous f, solutions of the differential equation (5.40) satisfy \(u \in \mathcal{C}^{2}(\mathcal{D})\). In order to have operative boundary conditions, solutions u must be continuous on \(\mathcal{D}\) including its boundary, which is denoted \(\partial \mathcal{D}\). Therefore we require \(u \in \mathcal{C}^{0}(\overline{\mathcal{D}})\) where \(\overline{\mathcal{D}}:= \mathcal{D}\cup \partial \mathcal{D}\). In summary, classical solutions of second-order differential equations require

$$\displaystyle{ u \in \mathcal{C}^{2}(\mathcal{D}) \cap \mathcal{C}^{0}(\overline{\mathcal{D}})\,. }$$
(5.42)

The function space \(\mathcal{C}^{2}(\mathcal{D}) \cap \mathcal{C}^{0}(\overline{\mathcal{D}})\) must be reduced further to comply with the boundary conditions.

For weak solutions the function space is larger ( Appendix C.3). For functions u and v we define the inner product

$$\displaystyle{ (u,v):=\int _{\mathcal{D}}uv\,\mathrm{d}x\,. }$$
(5.43)

Strong solutions u of Lu = f satisfy also

$$\displaystyle{ (Lu,v) = (\,f,v)\quad \text{ for all }v\,. }$$
(5.44)

Specifically for the model problem (5.40)/(5.41) integration by parts leads to

$$\displaystyle{(Lu,v) = -\int _{\alpha }^{\beta }u''v\,\mathrm{d}x = -u'v\Big\vert _{\alpha }^{\beta } +\int _{ \alpha }^{\beta }u'v'\,\mathrm{d}x\,.}$$

The nonintegral term on the right-hand side of the equation vanishes if v also satisfies the homogeneous boundary conditions (5.41). The remaining integral is a bilinear form, which we abbreviate

$$\displaystyle{ b(u,v):=\int _{ \alpha }^{\beta }u'v'\,\mathrm{d}x\,. }$$
(5.45)

Bilinear forms such as b(u, v) from (5.45) are linear in each of the two arguments u and v. For example, b(u 1 + u 2, v) = b(u 1, v) + b(u 2, v) holds. The bilinear form (5.45) is symmetric, b(u, v) = b(v, u). For several classes of more general differential equations, analogous bilinear forms are obtained. Formally, (5.44) can be rewritten as

$$\displaystyle{ b(u,v) = (\,f,v)\,, }$$
(5.46)

where we assume that v satisfies the homogeneous boundary conditions (5.41).
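The identity (Lu, v) = b(u, v) behind (5.46) can be checked on a concrete pair of functions; a small sympy sketch for the model problem on (0, 1), with u and v of our own choosing (both vanish at the boundary, so the nonintegral term drops out):

```python
import sympy as sp

x = sp.symbols('x')
alpha, beta = 0, 1
u = sp.sin(sp.pi * x)    # smooth, u(0) = u(1) = 0
v = x * (1 - x)          # also vanishes at the boundary

# (Lu, v) with L = -d^2/dx^2, and the bilinear form b(u, v)
Lu_v = sp.integrate(-sp.diff(u, x, 2) * v, (x, alpha, beta))
b_uv = sp.integrate(sp.diff(u, x) * sp.diff(v, x), (x, alpha, beta))

print(sp.simplify(Lu_v - b_uv))   # the two integrals agree
```

This is exactly the integration by parts carried out above, specialized to one example.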

The Eq. (5.46) has been derived out of the differential equation, for the solutions of which we have assumed smoothness in the sense of (5.42). Many “solutions” of practical importance do not satisfy (5.42) and, accordingly, are not smooth. In several applications, u or derivatives of u have discontinuities. For instance, consider the obstacle problem of Sect. 4.5.5: The second derivative u″ of the solution fails to be continuous at α and β. Therefore \(u\notin \mathcal{C}^{2}(-1,1)\) no matter how smooth the data function is, compare Fig. 4.10. As mentioned earlier, integral relations require less smoothness.

In the derivation of (5.46) the integral version has resulted as a consequence of the primary differential equation. This is contrary to wide areas of applied mathematics, where an integral relation is based on first principles, and the differential equation is derived in a second step. For example, in the calculus of variations a minimization problem may be described by an integral performance measure, and the differential equation is a necessary criterion [350]. This situation suggests considering the integral relation as an equation in its own right rather than as offspring of a differential equation. This leads to the question, what is the maximal function space such that (5.46) with (5.43), (5.45) is meaningful? That means to ask, for which functions u and v do the integrals exist? For a more detailed background we refer to Appendix C.3. For the introductory exposition of this section it may suffice to sketch the maximal function space briefly. The suitable function space is denoted \(\mathcal{H}^{1}\), the version equipped with the boundary conditions is denoted \(\mathcal{H}_{0}^{1}\). This Sobolev space consists of those functions that are continuous on \(\mathcal{D}\) and that are piecewise differentiable and satisfy the boundary conditions (5.41). This function space corresponds to the class of functions \(\mathcal{K}\) in (5.23). By means of the Sobolev space \(\mathcal{H}_{0}^{1}\) a weak solution of Lu = f is defined, where L is a second-order differential operator and b the corresponding bilinear form.

Definition 5.9 (Weak Solution)

\(u \in \mathcal{H}_{0}^{1}\) is called weak solution [of Lu = f], if b(u, v) = ( f, v) holds for all \(v \in \mathcal{H}_{0}^{1}\,\).

This definition implicitly expresses the task: find a \(u \in \mathcal{H}_{0}^{1}\) such that b(u, v) = ( f, v) for all \(v \in \mathcal{H}_{0}^{1}\). This problem is called a variational problem. The model problem (5.40)/(5.41) serves as an example for Lu = f; the corresponding bilinear form b(u, v) is defined in (5.45) and ( f, v) in (5.43). For the integrals (5.43) to exist, we additionally require f to be square integrable \((\,f \in \mathcal{L}^{2}\), compare Appendix C.3). Then ( f, v) exists because of the Schwarzian inequality (C.16). In a similar way, weak solutions are introduced for more general problems; the formulation of Definition 5.9 applies.

5.5.2 Approximation on Finite-Dimensional Subspaces

For a practical computation of a weak solution the infinite-dimensional space \(\mathcal{H}_{0}^{1}\) is replaced by a finite-dimensional subspace. Such finite-dimensional subspaces are spanned by basis functions φ i . Simple examples are the hat functions of Sect. 5.2. Recalling the important role splines play as basis functions, the finite-dimensional subspaces are denoted \(\mathcal{S}\), and are called finite-element spaces. As stated in Property 5.2(a), the hat functions φ 0, …, φ m span the space of polygons. Recall that each such polygon v can be represented as linear combination

$$\displaystyle{v =\sum _{ i=0}^{m}c_{ i}\varphi _{i}\,.}$$

The coefficients c i are uniquely determined by the values of v at the nodes, c i = v(x i ). We call hat functions “linear elements” because they consist of piecewise straight lines. Apart from linear elements, quadratic or cubic elements are also used; these are piecewise polynomials of second or third degree [79, 335, 382]. The attainable accuracy is different for basis functions consisting of higher-degree polynomials.

Since by definition the functions of the Sobolev space \(\mathcal{H}_{0}^{1}\) fulfill the homogeneous boundary conditions, each subspace does so as well. Again the subscript 0 indicates the realization of the homogeneous boundary conditions (5.41). A finite-dimensional subspace of \(\mathcal{H}_{0}^{1}\) is defined by

$$\displaystyle{ \mathcal{S}_{0}:= \left \{\,v =\sum \limits _{ i=0}^{m}c_{ i}\varphi _{i}\,\mid \,\varphi _{i} \in \mathcal{H}_{0}^{1}\,\right \}\,. }$$
(5.47)

Properties of \(\mathcal{S}_{0}\) are determined by the basis functions φ i . As mentioned earlier, basis functions with small supports give rise to sparse matrices. The partition (5.4) of \(\mathcal{D}\) is implicitly included in the definition of \(\mathcal{S}_{0}\) because this information is contained in the definition of the φ i . For our purposes the hat functions suffice. The larger m is, the better \(\mathcal{S}_{0}\) approximates the space \(\mathcal{H}_{0}^{1}\), since a finer discretization (smaller \(\mathcal{D}_{k}\)) allows the functions from \(\mathcal{H}_{0}^{1}\) to be approximated better by polygons. We denote the largest diameter of the \(\mathcal{D}_{k}\) by h, and ask for convergence. That is, we study the behavior of the error for h → 0 (essentially m → ∞).

In analogy to the variational problem expressed in connection with Definition 5.9, a discrete weak solution w is defined by replacing the space \(\mathcal{H}_{0}^{1}\) by a finite-dimensional subspace \(\mathcal{S}_{0}\):

Problem 5.10 (Discrete Weak Solution)

Find a \(w \in \mathcal{S}_{0}\) such that b(w, v) = ( f, v) for all \(v \in \mathcal{S}_{0}\).

The quality of the approximation relies on the discretization fineness h of \(\mathcal{S}_{0}\), which is occasionally emphasized by writing w h .

5.5.3 Quadratic Convergence

Having defined a weak solution u and a discrete approximation w, we turn to the error u − w. To measure the distance between functions in \(\mathcal{H}_{0}^{1}\) we use the norm \(\Vert \;\Vert _{1}\) ( Appendix C.3). That is, our first aim is to construct a bound on \(\Vert u - w\Vert _{1}\). Let us suppose that the bilinear form is continuous and \(\mathcal{H}^{1}\)-elliptic:

Assumptions 5.11 (Continuous \(\mathcal{H}^{1}\)-Elliptic Bilinear Form)

  1. (a)

    There is a γ 1 > 0 such that \(\vert b(u,v)\vert \leq \gamma _{1}\Vert u\Vert _{1}\Vert v\Vert _{1}\) for all \(u,v \in \mathcal{H}^{1}\,\).

  2. (b)

    There is a γ 2 > 0 such that \(b(v,v) \geq \gamma _{2}\Vert v\Vert _{1}^{2}\) for all \(v \in \mathcal{H}^{1}\,\).

The assumption (a) is the continuity, and the property in (b) is called \(\mathcal{H}^{1}\)-ellipticity. Under the Assumptions 5.11, the problem of finding a weak solution following Definition 5.9 possesses exactly one solution \(u \in \mathcal{H}_{0}^{1}\); the same holds true for Problem 5.10. This is guaranteed by the Theorem of Lax–Milgram [53, 79]. In view of \(\mathcal{S}_{0} \subseteq \mathcal{H}_{0}^{1}\),

$$\displaystyle{b(u,v) = (\,f,v)\quad \text{ for all }v \in \mathcal{S}_{0}\,.}$$

Subtracting b(w, v) = ( f, v) and invoking the bilinearity implies

$$\displaystyle{ b(w - u,v) = 0\quad \text{ for all }v \in \mathcal{S}_{0}\,. }$$
(5.48)

The property (5.48) is called the error-projection property. The Assumptions 5.11 and the error projection are the basic ingredients to obtain a bound on the error \(\Vert u - w\Vert _{1}\):

Lemma 5.12 (Céa)

Suppose the Assumptions  5.11 are satisfied. Then

$$\displaystyle{ \Vert u - w\Vert _{1} \leq { \gamma _{1} \over \gamma _{2}}\inf _{v\in \mathcal{S}_{0}}\Vert u - v\Vert _{1}\,. }$$
(5.49)

Proof

\(v \in \mathcal{S}_{0}\) implies \(\tilde{v}:= w - v \in \mathcal{S}_{0}\). Applying (5.48) for \(\tilde{v}\) yields

$$\displaystyle{b(w - u,w - v) = 0\quad \text{ for all }v \in \mathcal{S}_{0}\,.}$$

Therefore

$$\displaystyle\begin{array}{rcl} b(w - u,w - u)& =& b(w - u,w - u) - b(w - u,w - v) {}\\ & =& b(w - u,v - u)\,. {}\\ \end{array}$$

Applying the assumptions shows

$$\displaystyle\begin{array}{rcl} \gamma _{2}\Vert w - u\Vert _{1}^{2}& \leq & \vert b(w - u,w - u)\vert =\vert b(w - u,v - u)\vert {}\\ & \leq & \gamma _{1}\Vert w - u\Vert _{1}\Vert v - u\Vert _{1}\,, {}\\ \end{array}$$

from which

$$\displaystyle{\Vert w - u\Vert _{1} \leq { \gamma _{1} \over \gamma _{2}}\Vert v - u\Vert _{1}}$$

follows. Since this holds for all \(v \in \mathcal{S}_{0}\), the assertion of the lemma is proven.

Let us check whether the Assumptions 5.11 are fulfilled by the model problem (5.40)/(5.41). For (a) this follows from the Schwarzian inequality  (C.16) with the norms

$$\displaystyle{\Vert u\Vert _{1} = \left (\int _{\alpha }^{\beta }(u^{2} + u{\prime}^{2})\,\mathrm{d}x\right )^{1/2}\,\,,\ \Vert u\Vert _{ 0} = \left (\int _{\alpha }^{\beta }u^{2}\,\mathrm{d}x\right )^{1/2}\,,}$$

because

$$\displaystyle{\left (\int _{\alpha }^{\beta }u'v'\,\mathrm{d}x\right )^{2} \leq \left (\int _{\alpha }^{\beta }u{\prime}^{2}\,\mathrm{d}x\right )\left (\int _{\alpha }^{\beta }v{\prime}^{2}\,\mathrm{d}x\right ) \leq \Vert u\Vert _{ 1}^{2}\ \Vert v\Vert _{ 1}^{2}\,.}$$

The Assumption 5.11(b) can be derived from an inequality of Poincaré type

$$\displaystyle{\int _{\alpha }^{\beta }v^{2}\,\mathrm{d}x \leq (\beta -\alpha )^{2}\int _{ \alpha }^{\beta }v{\prime}^{2}\,\mathrm{d}x\,,}$$

which in turn is proven with the Schwarzian inequality ( Exercise 5.10). Adding ∫v′ 2 dx on both sides leads to

$$\displaystyle{\Vert v\Vert _{1}^{2} \leq [(\beta -\alpha )^{2} + 1]\,b(v,v)\,,}$$

from which the constant γ 2 of Assumption 5.11(b) results. Hence Céa’s lemma applies to the model problem.

The next question is how small the infimum in (5.49) may be. This is equivalent to the question of how closely the subspace \(\mathcal{S}_{0}\) can approximate the space \(\mathcal{H}_{0}^{1}\) ( Fig. 5.12). We will show that for hat functions and \(\mathcal{S}_{0}\) from (5.47) the infimum is of the order O(h). Again h denotes the maximum mesh size, and the notation w h reminds us that the discrete solution depends on the grid with a spacing symbolized by h. To apply Céa’s lemma, we need an upper bound for the infimum of \(\Vert u - v\Vert _{1}\). Such a bound is found easily by a specific choice of v, which is taken as an arbitrary interpolating polygon u I. Then by (5.49)

$$\displaystyle{ \Vert u - w_{h}\Vert _{1} \leq { \gamma _{1} \over \gamma _{2}}\inf _{v\in \mathcal{S}_{0}}\Vert u - v\Vert _{1} \leq \ { \gamma _{1} \over \gamma _{2}}\Vert u - u_{\mathrm{I}}\Vert _{1}\,. }$$
(5.50)

It remains to bound the error of interpolating polygons. This bound is provided by the following lemma, which is formulated for \(\mathcal{C}^{2}\)-smooth functions u:

Fig. 5.12

Approximation spaces

Lemma 5.13 (Error of an Interpolating Polygon)

For \(u \in \mathcal{C}^{2}\) let u I be an arbitrary interpolating polygon and h the maximal distance between two consecutive nodes. Then

  1. (a)

     \(\mathop{\max }\limits _{x}\vert u(x) - u_{\mathrm{I}}(x)\vert \leq { h^{2} \over 8} \max \vert u''(x)\vert \,\) ,

  2. (b)

     \(\mathop{\max }\limits _{x}\vert u'(x) - u_{\mathrm{I}}^{{\prime}}(x)\vert \leq h\max \vert u''(x)\vert \,\).
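Both bounds are easy to check numerically; a sketch for u(x) = sin x on [0, π], where max | u″ | = 1 (the grid sizes are our own choice):

```python
import numpy as np

a, b = 0.0, np.pi            # u(x) = sin(x), so max |u''| = 1
x = np.linspace(a, b, 100001)

for m in (10, 20, 40):
    nodes = np.linspace(a, b, m + 1)
    h = (b - a) / m
    uI = np.interp(x, nodes, np.sin(nodes))   # interpolating polygon u_I
    err0 = np.max(np.abs(np.sin(x) - uI))
    assert err0 <= h**2 / 8 + 1e-12           # Lemma 5.13(a)

    # the derivative of the polygon is piecewise constant (the slopes)
    slopes = np.diff(np.sin(nodes)) / np.diff(nodes)
    idx = np.clip(np.searchsorted(nodes, x) - 1, 0, m - 1)
    err1 = np.max(np.abs(np.cos(x) - slopes[idx]))
    assert err1 <= h + 1e-12                  # Lemma 5.13(b)
```

The first error decays quadratically with h, the derivative error only linearly, in accordance with the two bounds of the lemma.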

We leave the proof to the reader ( Exercise 5.11). Lemma 5.13 asserts

$$\displaystyle{\Vert u - u_{\mathrm{I}}\Vert _{1} = O(h)\,,}$$

which together with (5.50) implies the claimed error statement

$$\displaystyle{ \Vert u - w_{h}\Vert _{1} = O(h)\,. }$$
(5.51)

Recall that this assertion is based on a continuous and \(\mathcal{H}^{1}\)-elliptic bilinear form and on hat functions φ i . The O(h)-order in (5.51) is dominated by the unfavorable O(h)-order of the first-order derivative in Lemma 5.13(b). This low order is at variance with the actually observed O(h 2)-order attained by the approximation w h itself (not its derivative). In fact, the quadratic order holds. The final result is

$$\displaystyle{ \Vert u - w_{h}\Vert _{0} \leq Ch^{2}\Vert u\Vert _{ 2} }$$
(5.52)

for a constant C. This result is proven with the following lemma, which is based on a tricky idea due to Nitsche.

Lemma 5.14 (Nitsche)

Assume b is a symmetric bilinear form satisfying Assumptions  5.11 , and u and w are defined as above. Then

$$\displaystyle{\Vert u - w\Vert _{1} \leq Kh^{1}\Vert \,f\Vert _{ 0}\;\mathit{\text{ implies }}\;\Vert u - w\Vert _{0} \leq Ch^{2}\Vert \,f\Vert _{ 0}\,.}$$

Proof

 Consider the auxiliary problem \(Lz =\tilde{ f}:= u - w\), with weak version

$$\displaystyle{b(z,\tilde{v}) = (\,\tilde{f},\tilde{v})_{0}\quad \text{for all }\tilde{v} \in \mathcal{H}_{0}^{1}\,,}$$

which defines z. Choose specifically \(\tilde{v} = u - w =\tilde{ f}\). Then

$$\displaystyle{b(z,u - w) = (u - w,u - w)_{0} =\Vert u - w\Vert _{0}^{2}\,.}$$

Invoking the error-projection property (5.48) we note

$$\displaystyle{0 = b(u - w,v) = b(v,u - w)\quad \text{ for all }v \in \mathcal{S}_{0}\,.}$$

Subtracting this yields

$$\displaystyle{b(z - v,u - w) =\Vert u - w\Vert _{0}^{2}\quad \text{ for all }\;v \in \mathcal{S}_{ 0}\,.}$$

We apply the continuity of b,

$$\displaystyle{\Vert u - w\Vert _{0}^{2} \leq \gamma _{ 1}\Vert z - v\Vert _{1}\;\Vert u - w\Vert _{1}\quad \text{for all }v \in \mathcal{S}_{0}\,,}$$

and choose specifically v as the finite-element approximation of z. Then

$$\displaystyle{\Vert u - w\Vert _{0}^{2} \leq \gamma _{ 1}K_{1}h^{1}\Vert \,\tilde{f}\Vert _{ 0} \cdot K_{2}h^{1}\Vert \,f\Vert _{ 0} = Ch^{2}\Vert u - w\Vert _{ 0}\;\Vert \,f\Vert _{0}\,,}$$

from which the assertion follows.

This error of the order h 2 can be observed for the examples of Sect. 5.4, but not easily. The error is somewhat hidden among the other errors, namely, localization error, interpolation error, and the error of the time discretization.
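For the model problem (5.40)/(5.41) itself, the quadratic order is clean and easy to observe; a Python sketch with linear elements on a uniform grid (as a simplification, a lumped load vector h f(x i ) replaces the exact integrals ( f, φ i )):

```python
import numpy as np

def fem_solve(m):
    """Linear FE for -u'' = f on (0,1), u(0) = u(1) = 0, uniform grid;
    f is chosen such that the exact solution is u(x) = sin(pi x)."""
    h = 1.0 / m
    x = np.linspace(0.0, 1.0, m + 1)
    f = np.pi**2 * np.sin(np.pi * x)
    n = m - 1                                  # number of interior nodes
    # stiffness matrix b(phi_i, phi_j): tridiagonal (1/h) * [-1, 2, -1]
    A = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
         + np.diag(-np.ones(n - 1), -1)) / h
    rhs = h * f[1:-1]                          # lumped load vector
    w = np.zeros(m + 1)
    w[1:-1] = np.linalg.solve(A, rhs)
    # discrete L2 error against the exact solution
    return np.sqrt(h * np.sum((w - np.sin(np.pi * x))**2))

e1, e2 = fem_solve(20), fem_solve(40)
print(e1 / e2)   # ratio close to 4: halving h quarters the error
```

Halving the mesh size reduces the error by a factor of about four, the h² decay of (5.52).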

The derivations of this section have been focused on the model problem (5.40)/(5.41) with a second-order differential equation and one independent variable x (n = 1), and have been based on linear elements. Most of the assertions can be generalized to higher-order differential equations, to higher-dimensional domains (n > 1), and to nonlinear elements. For example, in case the elements in \(\mathcal{S}\) are polynomials of degree k, and the differential equation is of order 2l, \(\mathcal{S}\subseteq \mathcal{H}^{l}\), and the corresponding bilinear form on \(\mathcal{H}^{l}\) satisfies the Assumptions 5.11 with norm ∥ ∥ l , then the inequality

$$\displaystyle{\Vert u - w_{h}\Vert _{l} \leq Ch^{k+1-l}\Vert u\Vert _{ k+1}}$$

holds. This general statement includes for k = 1,  l = 1 the special case of Eq. (5.52) discussed above. For the analysis of the general case, we refer to [79, 162]. This includes boundary conditions more general than the homogeneous Dirichlet conditions of (5.41).

5.6 Notes and Comments

On Sect. 5.1

As an alternative to piecewise defined finite elements one may use polynomials φ j that are defined globally on \(\mathcal{D}\), and that are pairwise orthogonal. The orthogonality is then the reason why many integrals vanish. Methods of this type are called spectral methods. Since the φ i are globally smooth on \(\mathcal{D}\), spectral methods can produce high accuracies. In another context, spectral methods were applied in [142]. For historical remarks on Ritz–Galerkin type methods, see [145].

Specifically designed basis functions can be generated by some low-dimensional approximation, comparable to PCA in finite dimensions ( Exercise 43). Functions that represent preferred patterns of the solution are suitable. Then the number N of modes φ i can be small. Such methods are described under the heading proper orthogonal decomposition (POD), or Karhunen–Loève expansion.

On Sect. 5.2

In the early stages of their development, finite-element methods were applied intensively in structural engineering. In this field, the stiffness matrix and the mass matrix have a physical meaning, which led to these names [382].

On Sect. 5.3

The approximation ∑w i (τ)φ i (x) for \(\hat{y}\) is a one-dimensional finite-element approach. The geometry of the grid and the accuracy resemble the finite-difference approach. A two-dimensional approach as in

$$\displaystyle{\sum w_{i}\varphi _{i}(x,\tau )}$$

with two-dimensional hat functions and constant w i is more involved and more flexible. Sections 5.3.2–5.3.4 largely follow [376].

On Sect. 5.4

For the calculation of the local integrals on an arbitrary triangle \(\mathcal{D}_{k}\) consult the special FEM literature, such as [335]. In general an irregular triangulation better exploits the potential adaptivity of FEM. In particular, close to the barriers a fine mesh is required for high accuracy [304]. Since the gradient of u varies with time, a dynamic mesh refinement might be advisable, provided accuracy or stability do not deteriorate. For American options, boundary conditions V = Ψ along the boundary are recommendable. For an illustration of assembling, see Topic 12 of the Topics fCF.

On Sect. 5.5

The assumption \(u \in \mathcal{C}^{2}\) in Lemma 5.13 can be weakened to \(u'' \in \mathcal{L}^{2}\) [351]. For domains \(\mathcal{D}\subseteq \mathbb{R}^{2}\) the claim of Lemma 5.13 holds analogously; the second-order derivative u″ is then replaced by the Hessian matrix of the second-order derivatives of u. This can be applied to mesh adaption, where one attempts to place nodes such that the Hessian is equilibrated across the mesh. The finite-dimensional function space \(\mathcal{S}_{0}\) in (5.47) is assumed to be a subspace of \(\mathcal{H}_{0}^{1}\). Elements with this property are called conforming elements. A more accurate notation for \(\mathcal{S}_{0}\) of (5.47) is \(\mathcal{S}_{0}^{1}\). In the general case, conforming elements are characterized by \(\mathcal{S}^{l} \subseteq \mathcal{H}^{l}\). In the representation of v in Eq. (5.47) we avoid discussing the technical issue of how to organize different types of boundary conditions.

There are also smooth basis functions φ, for example, cubic Hermite polynomials. For sufficiently smooth solutions, such basis functions produce higher accuracy than hat functions do. For the accuracy of finite-element methods consult, for example, [2, 19, 53, 79, 162, 351].

On Other Methods

Finite-element methods are frequently used for approximating exotic options, in particular in multidimensional situations. For different types of options special methods have been developed. For applications, computational results and accuracies see also [2, 361, 362]. Front-fixing has been applied with finite elements in [188]. The accuracy aspect is also treated in [144]. Ritz–Galerkin methods are used with wavelet functions in [185, 263]; the latter paper is specifically devoted to stochastic volatility. A penalty approach with FEM is discussed in [230], where rectangular subdomains are furnished with basis functions as products of one-dimensional hat functions of the type φ(x, y) = φ i (x)φ j ( y).

5.7 Exercises

5.1 (Elliptical Probability Curves).

Suppose the situation of two asset prices S 1(t) and S 2(t) for t > 0 governed by GBM (3.35), with initial price point (S 1(0), S 2(0)). Barriers of a barrier option can be aligned such that the probability of (S 1(t), S 2(t)) reaching the barrier has the same constant value. Define Y 1: = logS 1, Y 2: = logS 2.

  1. (a)

    Show that the curve of constant probability in the ( Y 1, Y 2)-plane has an elliptical shape.

  2. (b)

    Let the covariance matrix be

    $$\displaystyle{\varSigma = \left (\begin{array}{*{10}c} \sigma _{1}^{2} & \rho \sigma _{1}\sigma _{2} \\ \rho \sigma _{1}\sigma _{2} & \sigma _{2}^{2}\\ \end{array} \right )\,.}$$

    Calculate its eigenvalues and eigenvectors.

  3. (c)

    Sketch representative ellipses in a (Y 1, Y 2)-plane. How do they depend on ρ?

5.2 (Cubic B-Spline).

Suppose an equidistant partition of an interval is given with mesh size h = x k+1 − x k . Cubic B-splines have a support of four subintervals. In each subinterval the spline is a polynomial piece of degree three. Apart from special boundary splines, the cubic B-splines φ i are determined by the requirements

$$\displaystyle\begin{array}{rcl} \varphi _{i}(x_{i})& =& 1 {}\\ \varphi _{i}(x)& \equiv & 0\quad \text{ for }x <x_{i-2} {}\\ \varphi _{i}(x)& \equiv & 0\quad \text{ for }x> x_{i+2} {}\\ \varphi _{i}& \in & \mathcal{C}^{2}(-\infty,\infty )\,. {}\\ \end{array}$$

To construct these φ i proceed as follows:

  1. (a)

    Construct a spline S(x) that satisfies the above requirements for the special nodes

    $$\displaystyle{\tilde{x}_{k}:= -2 + k\quad \mbox{ for }k = 0,1,\ldots,4\,.}$$
  2. (b)

    Find a transformation T i (x), such that φ i = S(T i (x)) satisfies the requirements for the original nodes.

  3. (c)

    For which i, j does φ i φ j = 0 hold?

5.3 (Finite-Element Matrices).

For the hat functions φ from Sect. 5.2 calculate for arbitrary subinterval \(\mathcal{D}_{k}\) all nonzero integrals of the form

$$\displaystyle{\int \varphi _{i}\varphi _{j}\,\mathrm{d}x,\quad \int \varphi _{i}^{{\prime}}\varphi _{ j}\,\mathrm{d}x,\quad \int \varphi _{i}^{{\prime}}\varphi _{ j}^{{\prime}}\,\mathrm{d}x}$$

and represent them as local 2 × 2 matrices.

5.4 (Calculating Options with Finite Elements).

Design an algorithm for the pricing of standard options by means of finite elements. To this end proceed as outlined in Sect. 5.3. Start with a simple version using an equidistant discretization step Δx. If this works properly, change the algorithm to a version with a nonequidistant x-grid. Distribute the nodes x i more densely around x = 0. Always place a node at the strike.

5.5 (Black–Scholes Equation in Divergence Form).

  1. (a)

    Prove the equivalence of (5.31) and (5.32), where D and b are given by (5.33). Specialize this to the one-dimensional case of the Black–Scholes equation.

  2. (b)

    Show

    $$\displaystyle{b^{\mbox{ $tr$}}\nabla u + ru = \nabla \cdot (bu) +\gamma u}$$

    and determine γ for the two-dimensional case, and for the Black–Scholes equation.

  3. (c)

    With the transformation

    $$\displaystyle{x:=\log ({ S_{1} \over K_{1}}),\ y:=\log ({ S_{2} \over K_{2}})}$$

    and writing u(x, y, t) for V leads to the PDE

    $$\displaystyle\begin{array}{rcl} u_{t}& +&{ 1 \over 2}\sigma _{1}^{2}u_{ xx} + (r -\delta _{1} -{ 1 \over 2}\sigma _{1}^{2})u_{ x} - ru {}\\ & +&{ 1 \over 2}\sigma _{2}^{2}u_{ yy} + (r -\delta _{2} -{ 1 \over 2}\sigma _{2}^{2})u_{ y} +\rho \sigma _{1}\sigma _{2}u_{xy} = 0\,.\ {}\\ \end{array}$$

    What are the matrix D and the vector b such that we arrive at (5.32)?

5.6 (Outward Normals).

The boundary \(\partial \mathcal{D}\) of the trapezoidal domain \(\mathcal{D}\) in Fig. 5.5 consists of four straight lines. What are the four unit outward vectors n orthogonal to \(\partial \mathcal{D}\)? Give a parameter representation of the boundary.

5.7 (Gradient on a Triangle).

Consider hat functions φ on a triangular element \(\mathcal{D}_{k}\) with vertex nodes numbers \(\mathcal{I}_{k} =\{ i,j,l\}\), and the local plane on \(\mathcal{D}_{k}\) represented by

$$\displaystyle{w(x,y) = w_{i}\varphi _{i}(x,y) + w_{j}\varphi _{j}(x,y) + w_{l}\varphi _{l}(x,y)\,.}$$
  1. (a)

    In the three-dimensional (x, y, w)-space let the plane w(x, y) = c 1 + c 2x + c 3y interpolate the three points (x i , y i , w i ), i = 1, 2, 3 (local node numbering). That is,

    $$\displaystyle{\left (\begin{array}{*{10}c} 1&x_{1} & y_{1} \\ 1&x_{2} & y_{2} \\ 1&x_{3} & y_{3}\\ \end{array} \right )\left (\begin{array}{*{10}c} c_{1} \\ c_{2} \\ c_{3}\end{array} \right ) = \left (\begin{array}{*{10}c} w_{1} \\ w_{2} \\ w_{3}\end{array} \right ),}$$

    shortly Ac = w. Establish a formula for the gradient ∇w = (c 2, c 3)tr, showing that there is a (2 × 3)-matrix G k such that

    $$\displaystyle{\nabla w = G_{k}w\,.}$$

    Hint: Use Cramer’s rule; | F k | is the area of the triangle, where

    $$\displaystyle{F_{k}:={ 1 \over 2}\det (A)\,.}$$
  2. (b)

    Show

    $$\displaystyle{(\nabla \varphi _{i}\,\vert \,\nabla \varphi _{j}\,\vert \,\nabla \varphi _{l}) = G_{k}\,.}$$
  3. (c)

    Show

    $$\displaystyle{\int _{\mathcal{D}_{k}}\nabla \varphi _{i}^{\mbox{ $tr$}}\nabla \varphi _{ j}\,\mathrm{d}x\,\mathrm{d}y = \nabla \varphi _{i}^{\mbox{ $tr$}}\nabla \varphi _{ j}\,\vert F_{k}\vert \,,}$$

    and all nine integrals of the element stiffness matrix are obtained by

    $$\displaystyle{\vert F_{k}\vert G_{k}^{\mbox{ $tr$}}G_{ k}\,.}$$
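Parts (a)–(c) can be checked numerically. The following minimal Python sketch (the function name element_matrices is just illustrative) obtains \(G_{k}\) as the last two rows of \(A^{-1}\) — equivalent to applying Cramer’s rule — and forms the element stiffness matrix \(\vert F_{k}\vert G_{k}^{\mathrm{tr}}G_{k}\):

```python
import numpy as np

def element_matrices(p1, p2, p3):
    """Gradient matrix G_k and element stiffness matrix for a linear
    triangle with vertices p1, p2, p3 (each a pair (x, y))."""
    A = np.array([[1.0, p1[0], p1[1]],
                  [1.0, p2[0], p2[1]],
                  [1.0, p3[0], p3[1]]])
    Fk = 0.5 * np.linalg.det(A)   # signed area: F_k = det(A)/2
    G = np.linalg.inv(A)[1:, :]   # rows 2,3 of A^{-1}: since c = A^{-1}w,
                                  # (c_2, c_3)^tr = G w, i.e. G is G_k
    S = abs(Fk) * G.T @ G         # element stiffness |F_k| G_k^tr G_k
    return G, S, abs(Fk)
```

For the unit triangle with vertices (0, 0), (1, 0), (0, 1) this yields \(\vert F_{k}\vert = 1/2\), and the columns of \(G_{k}\) are the constant gradients \(\nabla \varphi _{i},\nabla \varphi _{j},\nabla \varphi _{l}\), as asserted in part (b).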

5.8 (Assembling).

Consider the domain \(\mathcal{D}:= \{(x,y)\,\vert \;x \geq 0,\,y \geq 0,\;1 \leq x + y \leq 2\}\) tiled by 12 triangles \(\mathcal{D}_{k}\), where triangles and vertices are numbered as in Fig. 5.13.

  1. (a)

    Set up the index set \(\mathcal{I}\) with entries \(\mathcal{I}_{k} = \{i_{k},j_{k},l_{k}\}\), which assigns node numbers to the kth triangle, for 1 ≤ k ≤ 12.

    Fig. 5.13 Specific triangulation and numbering, see Exercise 5.8

  2. (b)

    Formulate the assembling algorithm that builds up the global stiffness matrix out of the element stiffness matrices

    $$\displaystyle{\left (\begin{array}{*{10}c} s_{11}^{(k)} & s_{12}^{(k)} & s_{13}^{(k)} \\ s_{21}^{(k)} & s_{22}^{(k)} & s_{23}^{(k)} \\ s_{31}^{(k)} & s_{32}^{(k)} & s_{33}^{(k)}\\ \end{array} \right )}$$

    for a general index set \(\mathcal{I}\) and 1 ≤ k ≤ m.

  3. (c)

    The example of Fig. 5.13 leads to a banded stiffness matrix. What is the bandwidth?
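The assembling loop of part (b) can be sketched as follows (Python, with 0-based node indices; element_stiffness is a hypothetical callback returning the matrix \(s^{(k)}\)):

```python
import numpy as np

def assemble(n_nodes, index_sets, element_stiffness):
    """Add each (3 x 3) element stiffness matrix s^(k) into the global
    stiffness matrix, guided by the index sets I_k = {i_k, j_k, l_k}."""
    S = np.zeros((n_nodes, n_nodes))
    for k, nodes in enumerate(index_sets):
        s = element_stiffness(k)      # the (3 x 3) matrix s^{(k)}
        for a, p in enumerate(nodes):
            for b, q in enumerate(nodes):
                S[p, q] += s[a][b]    # scatter local entry to global position
    return S
```

Entries belonging to nodes shared by several triangles accumulate the corresponding local contributions, which is what produces the banded structure asked about in part (c).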

5.9 (Variable Volatility (Project)).

For variable volatility σ(S, t) and constant K, T, r, δ, PDEs of the type

$$\displaystyle{{\partial y \over \partial \tau } -{ 1 \over 2}\hat{\sigma }^{2}(x,\tau )\left ({\partial ^{2}y \over \partial x^{2}} -{ 1 \over 4}y\right ) = 0}$$

are to be solved, with τ = T − t and the transformations S ↦ x, V ↦ y from the Black–Scholes model given by (A.25), (A.26); consult Appendix A.6.

  1. (a)

    For an American put, apply these transformations to derive from \(V (S,t) \geq (K - S)^{+}\) an inequality y(x, τ) ≥ g(x, τ).

  2. (b)

    Carry out the finite-element formulation for the linear complementarity problem analogously as in Sect. 5.3.4.

  3. (c)

    Integrals will include local integrals

    $$\displaystyle{\int \sigma ^{2}(x,\tau )\varphi _{ i}\varphi _{j}\,\mathrm{d}x\,,\quad \int \sigma ^{2}(x,\tau )\varphi _{ i}^{{\prime}}\varphi _{ j}\,\mathrm{d}x\,.}$$

    Apply Simpson’s quadrature rule

    $$\displaystyle{\int _{a}^{b}f(x)dx \approx { b - a \over 6} \left [f(a) + 4f\left ({a + b \over 2} \right ) + f(b)\right ]}$$

    to approximate the above local integrals.

  4. (d)

    Set up a finite-element code, and test it with the artificial function [128]

    $$\displaystyle{\sigma (S):= 0.3 -{ 0.2 \over \log (S/K)^{2} + 1}\,.}$$
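For parts (c) and (d), Simpson’s rule and the test volatility can be coded directly. A minimal Python sketch (the default strike K = 1 is only an illustrative choice):

```python
import math

def simpson(f, a, b):
    """Simpson's quadrature rule on [a, b]."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f(0.5 * (a + b)) + f(b))

def sigma(S, K=1.0):
    """Artificial volatility of part (d): 0.3 - 0.2/(log(S/K)^2 + 1)."""
    return 0.3 - 0.2 / (math.log(S / K) ** 2 + 1.0)
```

Simpson’s rule is exact for polynomials up to degree three, so on elements where σ is (approximated as) constant it integrates the quadratic products \(\varphi _{i}\varphi _{j}\) of the hat functions exactly.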

5.10.

Assume a function v(ζ) with \(\alpha \leq \zeta \leq \beta \) and v(α) = 0.

  1. (a)

    Show

    $$\displaystyle{(v(\zeta ))^{2} \leq (\zeta -\alpha )\int _{\alpha }^{\zeta }(v^{{\prime}}(x))^{2}\,\mathrm{d}x\,.}$$

    Hint: Recall \(v(\zeta ) =\int _{\alpha }^{\zeta }v^{{\prime}}(x)\,\mathrm{d}x\), and apply the Cauchy–Schwarz inequality (C.16).

  2. (b)

    Use (a) to show

    $$\displaystyle{\int _{\alpha }^{\beta }(v(\zeta ))^{2}\,\mathrm{d}\zeta \leq { 1 \over 2}(\beta -\alpha )^{2}\int _{ \alpha }^{\beta }(v^{{\prime}}(x))^{2}\,\mathrm{d}x\,.}$$
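As a pointer for part (a): applying the hint with the factors 1 and \(v^{{\prime}}\) gives

$$\displaystyle{(v(\zeta ))^{2} = \left (\int _{\alpha }^{\zeta }1 \cdot v^{{\prime}}(x)\,\mathrm{d}x\right )^{2} \leq \int _{\alpha }^{\zeta }1^{2}\,\mathrm{d}x\;\int _{\alpha }^{\zeta }(v^{{\prime}}(x))^{2}\,\mathrm{d}x = (\zeta -\alpha )\int _{\alpha }^{\zeta }(v^{{\prime}}(x))^{2}\,\mathrm{d}x\,.}$$

Part (b) then follows by integrating over ζ and bounding the inner integral by the integral over the full interval [α, β].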

5.11.

Prove Lemma 5.13, and for \(u \in \mathcal{C}^{2}\) the assertion \(\|u - w_{h}\|_{1} = O(h)\).