1 Introduction

Monte Carlo type methods are widely used in a range of scientific applications. The dimension-independent convergence rate of the sampling error makes these methods attractive for high-dimensional problems, which often cannot be approximated well by other types of methods. However, although dimension-independent, the convergence rate of conventional Monte Carlo methods is very slow, and achieving high accuracies is often not computationally feasible.

To improve on the convergence of conventional Monte Carlo methods, one can make use of a variety of variance reduction techniques, such as control variates and antithetic sampling. A variance reduction technique that has received a lot of attention recently is the multilevel Monte Carlo (MLMC) method. It was first introduced by Heinrich [26] to compute high-dimensional, parameter-dependent integrals, and then extended by Giles [16] to infinite-dimensional integration related to stochastic differential equations (SDEs). Since then, it has been applied in many areas of mathematics related to differential equations, in particular stochastic differential equations [10, 15, 27, 29] and several types of partial differential equations (PDEs) with random forcing [17, 22] or random coefficients [2, 6, 8, 19].

In this paper, we are concerned with the application of multilevel Monte Carlo methods to elliptic PDEs with random coefficients. In particular, we will focus on rough coefficients, which cannot be uniformly bounded in the random parameter and only have Hölder continuous trajectories. This type of problem arises, for example, in stochastic groundwater flow modelling, where log-normal random coefficients are frequently used. A fundamental analysis of the multilevel Monte Carlo algorithm applied to this type of model problem was recently carried out in [6], and [8] demonstrates numerically the effectiveness of multilevel Monte Carlo methods applied to elliptic PDEs with log-normal coefficients. The purpose of this paper is to extend the analysis to cover more situations of practical interest, and to expand on some of the issues raised in [6, 8].

The analysis in [6] addresses the convergence of the multilevel Monte Carlo method for simple output functionals, such as the \(L^2\) or the \(H^1\) norm of the solution. In practical situations, however, one is often interested in more complicated functionals, such as the outflow through parts of the boundary or local pressure values. Here, we therefore extend the analysis to cover bounded linear and continuously Fréchet differentiable nonlinear functionals of the solution.

Another issue, raised in both [6] and [8], is the influence of the rough nature of our model problem on the performance of the multilevel Monte Carlo estimator. The oscillatory nature and the short characteristic length scale of the random coefficients put a bound on how coarse the coarsest level in the multilevel estimator can be. Asymptotically, as the required accuracy goes to 0, this does not have any effect on the cost of the MLMC estimator. For a fixed tolerance, however, it restricts the gain that we can expect compared to a standard Monte Carlo estimator. In this paper, we propose a solution to this problem by using smoother approximations of the random coefficient on the coarse levels of the estimator. This allows us to choose the coarsest level independently of the length scale on which the random coefficient varies. See also [19] for a similar strategy in the context of the related Brinkman problem. In [19], the decay rate of the finite element error with respect to the number of KL modes \(K\) was assumed. Here, we make no such assumption and instead use the decay rates established in [5, 6] for certain log-normal fields and covariance functions.

Finally, we extend the theoretical aspects of [6] to more challenging model problems. This includes problems posed on polygonal domains, which are frequently used in connection with finite element methods, as well as problems where the random coefficient has a jump discontinuity. It is well known that these types of model problems do not always exhibit full global regularity, which directly influences the convergence rates of the finite element error.

The outline of the rest of the paper is as follows. In §2, we present the model problem, together with the main results on its regularity and finite element error estimates. This follows closely the work in [6]. The proof of the new regularity result for polygonal/polyhedral domains is postponed to §5. For the reader’s convenience, we also briefly recall the multilevel Monte Carlo algorithm and the abstract convergence theorem. In §3, we prove convergence of the MLMC algorithm for a wide class of (linear and nonlinear) output functionals, including boundary fluxes and local averages of the pressure. In §4, we improve on the performance of the MLMC estimator by using smoother approximations of the random coefficient on coarser levels. The gains possible with this approach are established theoretically and verified numerically. Finally, in §5, we give a detailed proof of the regularity result stated in §2, and extend the results further to certain classes of discontinuous coefficients.

A key task in this paper is to keep track of how the constants in the bounds and estimates depend on the coefficient \(a(\omega ,x)\) and on the mesh size \(h\). Hence, we will almost always be stating constants explicitly. Constants that do not depend on \(a(\omega ,x)\) or \(h\) will not be explicitly stated. Instead, we will write \(b \lesssim c\) for two positive quantities \(b\) and \(c\), if \(b/c\) is uniformly bounded by a constant independent of \(a(\omega ,x)\) and of \(h\).

2 Background

2.1 Problem setting and basic finite element error analysis

Given a probability space \(\left( \varOmega , \mathcal{A }, \mathbb{P }\right) \) and \(\omega \in \varOmega \), we consider the following linear elliptic partial differential equation (PDE) with random coefficients, posed on a bounded, Lipschitz polygonal/polyhedral domain \(D \subset \mathbb{R }^d\), \(d=2,3\), and subject to Dirichlet boundary conditions: Find \(u:\varOmega \times D \rightarrow \mathbb{R }\) such that

$$\begin{aligned} -\nabla \cdot \left( a(\omega , x) \nabla u(\omega , x)\right)&= f(\omega ,x), \,\,\quad \text{ for } \, x \in D, \nonumber \\ u(\omega ,x)&= \phi _j(\omega ,x), \quad \text{ for }\, x \in \varGamma _j\,. \end{aligned}$$
(2.1)

The differential operators \(\nabla \cdot \) and \(\nabla \) are with respect to \(x \in D\), and \(\varGamma := \cup _{j=1}^m \overline{\varGamma }_j\) denotes the boundary of \(D\), partitioned into straight line segments in 2D and into planar polygonal panels in 3D. We assume that the boundary conditions are compatible, i.e. \(\phi _{j}(x) = \phi _{k}(x)\), if \( x \in \overline{\varGamma }_{j} \cap \overline{\varGamma }_{k}\). We also let \(\phi \in H^1(D)\) be an extension of the boundary data \(\{\phi _j\}_{j=1}^m\) to the interior of \(D\) whose trace coincides with \(\phi _j\) on \(\varGamma _j\).

Let us formally define, for all \(\omega \in \varOmega \),

$$\begin{aligned} a_\mathrm{min}(\omega ):= \min _{x \in \overline{D}} a(\omega , x) \qquad \text{ and } \qquad a_\mathrm{max}(\omega ):= \max _{x \in \overline{D}} a(\omega , x). \end{aligned}$$
(2.2)

We make the following assumptions on the input data:

  • A1. \(a_\mathrm{min} > 0\) almost surely and \(1/a_\mathrm{min} \in L^{p}(\varOmega )\), for all \(p \in (0,\infty )\).

  • A2. \(a \in L^{p}(\varOmega , C\,^{t}(\overline{D}))\), for some \(0 < t \le 1\) and for all \(p \in (0,\infty )\).

  • A3. \(f \in L^{p_*}(\varOmega ,H^{t-1}(D))\) and \(\phi _j \in L^{p_*}(\varOmega ,H^{t+1/2}(\varGamma _j)),j=1,\dots ,m\), for some \(p_*\in (0,\infty ]\).

Here, the space \( C\,^{t}(\overline{D})\) is the space of Hölder-continuous functions with exponent \(t\), \(H^s(D)\) is the usual fractional order Sobolev space, and \(L^q(\varOmega , \mathcal{B })\) denotes the space of \(\mathcal{B }\)-valued random fields for which the \(q^\mathrm{th}\) moment (with respect to the measure \({\mathbb{P }}\)) of the \(\mathcal{B }\)-norm is finite, see e.g. [6]. A space which will appear frequently in the error analysis is \(L^q(\varOmega , H^1_0(D))\), the space of \(H_0^1(D)\)-valued random fields with the norm on \(H_0^1(D)\) taken to be the usual \(H^1(D)\)-seminorm \(|\cdot |_{H^1(D)}\). We will weaken Assumption A2 in §5.2 and assume only piecewise continuity of \(a(\omega , \cdot )\), but we choose not to do this here for ease of presentation. For the same reason, we do not weaken Assumptions A1 and A2 to \(1/a_\mathrm{min}\) and \(\Vert a\Vert _{ C\,^{t}(\overline{D})}\) having finite moments of order \(p_a\), for some \(p_a \in (0,\infty )\), although this is possible. Finally, note that since \(a_\mathrm{max}(\omega ) = \Vert a\Vert _{ C\,^{0}(\overline{D})}\), Assumption A2 implies that \(a_\mathrm{max} \in L^p(\varOmega )\), for any \(p \in (0,\infty )\).

To simplify the notation in the following, let \(0 \!<\! C_{a,f,\phi _j} \!<\! \infty \) denote a generic constant which depends algebraically on \(L^q(\varOmega )\)-norms of \(a_\mathrm{max}, \, 1/a_\mathrm{min}, \Vert a\Vert _{C\,^{t}(\overline{D})}, \, \Vert f\Vert _{H^{t-1}(D)}\) and \(\Vert \phi _j\Vert _{H^{t+1/2}(\varGamma _j)}\), with \(q < p_*\) in the case of \(\Vert f\Vert _{H^{t-1}(D)}\) and \(\Vert \phi _j\Vert _{H^{t+1/2}(\varGamma _j)}\). Two additional random variables related to output functionals will be added to this notation later.

An example of a random field \(a(\omega ,x)\) that satisfies Assumptions A1 and A2, for all \(p \in (0,\infty )\), is a log-normal random field \(a = \exp (g)\), where the underlying Gaussian field \(g\) has a Hölder-continuous mean and a Lipschitz continuous covariance function. For example, \(g\) could have constant mean and an exponential covariance function, given by

$$\begin{aligned} \mathbb{E }\Big [(g(\omega ,x)-\mathbb{E }[g(\omega ,x)])(g(\omega ,y)-\mathbb{E }[g(\omega ,y)]) \Big ]= \sigma ^2 \exp (-\Vert x-y\Vert / \lambda ),\nonumber \\ \end{aligned}$$
(2.3)

where \(\sigma ^2\) and \(\lambda \) are real parameters known as the variance and correlation length, and \(\Vert \cdot \Vert \) denotes a norm on \(\mathbb{R }^d\). If \(\Vert \cdot \Vert = \Vert \cdot \Vert _p\), we will call it a \(p\)-norm exponential covariance.

If we denote by \(H^1_\phi (D) := \{v \in H^1(D) : v - \phi = 0\,\, \text{ on }\,\,\varGamma \}\), then the variational formulation of (2.1), parametrised by \(\omega \in \varOmega \), is to find \(u \in H^1_\phi (D)\) such that

$$\begin{aligned} b_{\omega }\big (u(\omega ,\cdot ),v\big ) = L_{\omega }(v) \,, \quad \text{ for } \text{ all }\, \quad v \in H^1_0(D). \end{aligned}$$
(2.4)

The bilinear form \(b_{\omega }\) and the linear functional \(L_{\omega }\) (both parametrised by \(\omega \)) are defined as usual, for all \(u,v\in H^1(D)\), by

$$\begin{aligned} b_{\omega }(u,v) \!:=\! \int _D a(\omega ,x)\nabla u(x) \cdot \nabla v(x) \, \mathrm{d} x\,\, \text{ and } \,\, L_{\omega }(v) \!:=\! \langle f(\omega ,\cdot ), v\rangle _{H^{t-1}(D),H^{1-t}_0(D)},\nonumber \\ \end{aligned}$$
(2.5)

where \(H^{1-t}_0(D)\) is the closure of \(C_0^\infty (D)\) in the \(H^{1-t}(D)\)-norm. We say that, for any \(\omega \in \varOmega \), \(u(\omega ,\cdot )\) is a weak solution of (2.1) iff \(u(\omega ,\cdot ) \in H^1_\phi (D)\) and \(u(\omega ,\cdot )\) satisfies (2.4). An application of the Lax–Milgram Theorem ensures existence and uniqueness of \(u(\omega ,\cdot ) \in H^1_\phi (D)\), for almost all \(\omega \), and in combination with Assumptions A1–A3, this gives the existence of a unique solution \(u \in L^p(\varOmega , H^1(D))\), for any \(p <p_*\).

In [6], a regularity analysis of the above model problem was performed under the assumption that the spatial domain \(D\) is \(C^2\). Here, the analysis is extended to polygonal/polyhedral domains, or more generally, to piecewise \(C^2\) domains that are rectilinear near the corners. This is very important since in standard finite element methods one naturally works with polygonal/polyhedral domains.

Definition 2.1

Let \(0 < \lambda _{\varDelta }(D) \le 1\) be such that for any \(0 < s \le \lambda _{\varDelta }(D)\), \(s \ne \frac{1}{2}\), the Laplace operator \(\varDelta \) is surjective as an operator from \(H^{1+s}(D) \cap H^1_0(D)\) to \(H^{s-1}(D)\). In other words, let \(\lambda _{\varDelta }(D)\) be no larger than the order of the strongest singularity of the Laplace operator with homogeneous Dirichlet boundary conditions on \(D\).

The number \(\lambda _{\varDelta }(D)\) exists for any Lipschitz polygonal/polyhedral domain, see e.g. [24, Remarks 2.4.6 and 2.6.7]. We will come back to specific values of \(\lambda _{\varDelta }(D)\) in §5.

Theorem 2.1

Let Assumptions A1–A3 hold for some \(0 < t \le 1\). Then,

$$\begin{aligned}&\Vert u(\omega , \cdot )\Vert _{H^{1+s}(D)} \lesssim \frac{a_\mathrm{max}(\omega )\Vert a(\omega , \cdot )\Vert ^2_{ C\,^{t}(\overline{D})}}{a_\mathrm{min}(\omega )^4}\nonumber \\&\qquad \left[ \Vert f(\omega , \cdot )\Vert _{H^{t-1}(D)} + \Vert a(\omega , \cdot )\Vert _{ C\,^{t}(\overline{D})}\sum _{j=1}^m\Vert \phi _j(\omega , \cdot )\Vert _{H^{t+1/2}(\varGamma _j)} \right] , \end{aligned}$$
(2.6)

for almost all \(\omega \in \varOmega \) and for all \(0 < s < t\) such that \(s \le \lambda _{\varDelta }(D)\). Moreover, \(u \in L^p(\varOmega ,H^{1+s}(D))\), for all   \(p < p_*\). If \(t=\lambda _{\varDelta }(D)=1\), then \(u \in L^p(\varOmega ,H^{2}(D))\) and the above bound holds with \(s=1\).

Proof

The proof for individual samples, for almost all \(\omega \in \varOmega \), is a classical result and follows Grisvard [23, Section 5.2]. A detailed proof making precise the dependence of the bound on the coefficients is given in Sect. 5.1. The remainder of the theorem follows by Hölder’s inequality from Assumptions A1–A3.

Theorem 2.1 can now be used to prove convergence of finite element (FE) approximations of \(u\) in the standard way. We will only consider lowest order elements in detail. Introduce a triangulation \(\mathcal{T }_h\) of \(D\), and let \(V_{h,\phi }\) be the space of continuous, piecewise linear functions on \(D\) that satisfy the boundary conditions in (2.1), i.e.

$$\begin{aligned} V_{h,\phi }&:= \Big \{{v_h} \in {C}(\overline{D})\, : \, {v_h}|_T \, \text{ linear, } \text{ for } \text{ all } T \in \mathcal{T }_h, \quad \text{ and } \quad {v_h}|_{\varGamma _{j}} = \phi _j, \\&\quad \text{ for } \text{ all } j=1,\dots ,m \Big \}. \end{aligned}$$

For simplicity, we assume that the functions \(\{\phi _{j}\}_{j=1}^{m}\) are piecewise linear with respect to the triangulation \(\mathcal{T }_h\) restricted to \(\varGamma _{j}\). Dealing with more general boundary conditions is a standard exercise in finite element analysis (see e.g. [4, §10.2]).

The finite element approximation of \(u\) in \(V_{h,\phi }\), denoted by \(u_h\), is now found by solving

$$\begin{aligned} b_{\omega }\big (u_h(\omega ,\cdot ),v\big ) = L_{\omega }(v) \,, \quad \text{ for } \text{ all }\, \quad v \in V_{h,0}. \end{aligned}$$
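The structure of this discrete problem can be illustrated with a one-dimensional analogue. The sketch below is purely illustrative (it is not from the paper): it assembles and solves the piecewise-linear FE system for \(-(a u')' = f\) on \((0,1)\) with homogeneous Dirichlet data, approximating \(a\) by its midpoint value on each element.

```python
import numpy as np

def fe_solve_1d(a_mid, f_val, h):
    """Piecewise-linear FE solution of -(a u')' = f on (0,1), u(0)=u(1)=0.
    a_mid[k] is the (midpoint) coefficient value on element k and f_val a
    constant right-hand side; a 1D stand-in for the solve of (2.1) on T_h."""
    n = len(a_mid)                               # elements; n-1 interior nodes
    main = (a_mid[:-1] + a_mid[1:]) / h          # tridiagonal stiffness matrix
    off = -a_mid[1:-1] / h
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    b = np.full(n - 1, f_val * h)                # load vector for constant f
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, b)
    return u

h = 1.0 / 16
u_h = fe_solve_1d(np.ones(16), 1.0, h)
# For a = 1, f = 1 the exact solution is u(x) = x(1-x)/2, and the
# piecewise-linear FE solution is nodally exact in this 1D setting
x = np.linspace(0.0, 1.0, 17)
print(np.max(np.abs(u_h - x * (1 - x) / 2)))
```

In the MLMC context, one such solve (on the appropriate mesh) is performed per sample \(\omega ^{(i)}\), with `a_mid` drawn from the random field.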

Using Céa’s lemma and standard interpolation results on \(V_{h,\phi }\), we then have (as in [6]) the following.

Theorem 2.2

Let Assumptions A1–A3 hold for some \(0 < t \le 1\). Then,

$$\begin{aligned} |(u-u_h)(\omega ,\cdot )|_{H^1(D)} \, \lesssim \, \left( \frac{a_\mathrm{max}(\omega )}{a_\mathrm{min}(\omega )}\right) ^{1/2} \Vert u(\omega , \cdot )\Vert _{H^{1+s}(D)} \, h^s, \end{aligned}$$

for almost all \(\omega \in \varOmega \) and for all \(0 < s < t\) such that \(s \le \lambda _{\varDelta }(D)\). Hence,

$$\begin{aligned} \Vert u-u_h\Vert _{L^p(\varOmega , H^1_0(D))} \, \le \, C_{a,f,\phi _j} \, h^s, \qquad \text{ for } \text{ all }\, \ p < p_* \,, \end{aligned}$$

with \(C_{a,f,\phi _j} < \infty \) a constant that depends on the input data, but is independent of \(h\). If A1–A3 hold with \(t=\lambda _{\varDelta }(D)=1\), then \(\Vert u-u_h\Vert _{L^p(\varOmega , H^1_0(D))} \le C_{a,f,\phi _j} \, h\).

The key novel result here (extending the results in [6]) is that the rate of convergence of the finite element error on polygonal/polyhedral domains \(D\) is the same as on \(C^2\) domains, provided the order \(\lambda _{\varDelta }(D)\) of the strongest singularity of the Laplacian on \(D\) satisfies \(\lambda _{\varDelta }(D) \ge t\), with \(t\) as in A1–A3. No additional or stronger singularities are triggered by the random coefficient, provided \(a\) satisfies A2. A sufficient (but not necessary) condition for \(\lambda _{\varDelta }(D) = 1\) is that \(D\) is convex. For \(t < 1\), even certain non-convex domains are allowed.

Remark 2.1

The results can easily be extended also to Neumann and mixed Dirichlet/Neumann boundary conditions. We will comment on this in Sect. 5.1 and confirm it numerically in Sect. 3.5. There is also no fundamental difficulty in extending the analysis to higher order finite elements. For an extension to mixed finite elements see [21].

2.2 Multilevel Monte Carlo algorithm

Before we go on to the main part of this paper, we will briefly recall the multilevel Monte Carlo algorithm. We also give a review of the main results on its convergence when applied to elliptic PDEs of the form described in the previous section.

Suppose we are interested in finding the expected value of some functional \(Q=M(u)\) of the solution \(u\) to our model problem (2.1). Since \(u\) is not easily accessible, \(Q\) is often approximated by the quantity \(Q_h:=M(u_h)\), where \(u_h\) is a finite dimensional approximation to \(u\), such as the finite element solution on a sufficiently fine spatial grid \(\mathcal{T }_h\) defined above. However, \(u_h\) may also include further approximations such as an inexact bilinear form \(b^h_\omega (\cdot ,\cdot ) \approx b_{\omega }(\cdot ,\cdot )\), e.g. due to quadrature or approximation of the input random field \(a\). We will return to this issue in Sect. 4.

To estimate \(\mathbb{E }\left[ Q\right] \), we then compute approximations (or estimators) \(\widehat{Q}_h\) to \(\mathbb{E }\left[ Q_h\right] \), and quantify the accuracy of our approximations via the root mean square error (RMSE)

$$\begin{aligned} e(\widehat{Q}_h) := \left( \mathbb{E }\big [(\widehat{Q}_h - \mathbb{E }(Q))^2\big ]\right) ^{1/2}. \end{aligned}$$

The computational cost \(\mathcal{C }_\varepsilon (\widehat{Q}_h)\) of our estimator is then quantified by the number of floating point operations needed to achieve an RMSE of \(e(\widehat{Q}_h) \le \varepsilon \). This will be referred to as the \(\varepsilon \)-cost.

The classical Monte Carlo (MC) estimator for \(\mathbb{E }\left[ Q_h\right] \) is

$$\begin{aligned} \widehat{Q}^\mathrm{MC}_{h,N} := \frac{1}{N} \sum _{i=1}^N Q_h(\omega ^{(i)}), \end{aligned}$$
(2.7)

where \(Q_h(\omega ^{(i)})\) is the \(i\)th sample of \(Q_h\) and \(N\) independent samples are computed in total.

There are two sources of error in the estimator (2.7), the approximation of \(Q\) by \(Q_h\), which is related to the spatial discretisation, and the sampling error due to replacing the expected value by a finite sample average. This becomes clear when expanding the mean square error (MSE) and using the fact that for Monte Carlo \(\mathbb{E }[\widehat{Q}^\mathrm{MC}_{h,N}] = \mathbb{E }[Q_h]\) and \(\mathbb{V }[\widehat{Q}^\mathrm{MC}_{h,N}] = N^{-1} \, \mathbb{V }[Q_h],\) where \(\mathbb{V }[X] := \mathbb{E }[(X-\mathbb{E }[X])^2]\) denotes the variance of the random variable \(X:\varOmega \rightarrow \mathbb{R }\). We get

$$\begin{aligned} e(\widehat{Q}^\mathrm{MC}_{h,N})^2 \ = \ N^{-1} \mathbb{V }[Q_h] + \big (\mathbb{E }[Q_h - Q]\big )^2. \end{aligned}$$
(2.8)

A sufficient condition to achieve an RMSE of \(\varepsilon \) with this estimator is that both of these terms are less than \(\varepsilon ^2/2\). For the first term, this is achieved by choosing a large enough number of samples, \(N=\mathcal{O }(\varepsilon ^{-2})\). For the second term, we need to choose a fine enough finite element mesh \(\mathcal{T }_h\), such that \(\mathbb{E }[Q_h - Q] = \mathcal{O }(\varepsilon )\).
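As an illustration of (2.7) and of the sample-size rule \(N=\mathcal{O }(\varepsilon ^{-2})\), consider the following sketch, where a toy scalar random variable (not the PDE functional) stands in for \(Q_h\).

```python
import numpy as np

def mc_estimate(sampler, N, rng):
    """Classical MC estimator (2.7) together with its estimated standard error."""
    samples = np.array([sampler(rng) for _ in range(N)])
    return samples.mean(), np.sqrt(samples.var(ddof=1) / N)

# Toy stand-in for Q_h: a scalar random variable with mean 0 and variance 1.
# To push the sampling term V[Q_h]/N in (2.8) below eps^2/2, take N = 2/eps^2.
eps = 0.05
N = int(np.ceil(2.0 / eps**2))   # = 800 here
rng = np.random.default_rng(1)
est, se = mc_estimate(lambda r: r.standard_normal(), N, rng)
```

The estimated standard error `se` should then be close to \(\varepsilon /\sqrt{2}\); the bias term in (2.8) has to be controlled separately through the mesh width \(h\).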

The main idea of the MLMC estimator is very simple. We sample not just from one approximation \(Q_h\) of \(Q\), but from several. Linearity of the expectation operator implies that

$$\begin{aligned} \mathbb{E }[Q_h] = \mathbb{E }[Q_{h_0}] + \sum _{\ell =1}^L \mathbb{E }[Q_{h_\ell } - Q_{h_{\ell -1}}], \end{aligned}$$
(2.9)

where \(\{h_\ell \}_{\ell = 0, \dots , L}\) are the mesh widths of a sequence of increasingly fine triangulations \(\mathcal{T }_{h_\ell }\) with \(h := h_L\), the finest mesh width, and \(k_1 \le h_{\ell -1}/h_\ell \le k_2\), for all \(\ell = 1, \dots , L\) and some \(1 < k_1 \le k_2 <\infty \). Hence, the expectation on the finest mesh is equal to the expectation on the coarsest mesh, plus a sum of corrections adding the difference in expectation between simulations on consecutive meshes. The multilevel idea is now to independently estimate each of these terms such that the overall variance is minimised for a fixed computational cost.

Setting for convenience \(Y_0 := Q_{h_0}\) and \(Y_\ell := Q_{h_\ell } - Q_{h_{\ell -1}}\), for \(1 \le \ell \le L\), we define the MLMC estimator simply as

$$\begin{aligned} \widehat{Q}^\mathrm{ML}_{h,\{N_\ell \}} \ := \ \sum _{\ell =0}^L \widehat{Y}^\mathrm{MC}_{\ell , N_\ell } \ = \ \sum _{\ell =0}^L \frac{1}{N_{\ell }} \sum _{i=1}^{N_{\ell }} Y_\ell (\omega ^{(\ell ,i)}), \end{aligned}$$
(2.10)

where importantly \(Y_\ell (\omega ^{(\ell ,i)}) = Q_{h_\ell }(\omega ^{(\ell ,i)}) - Q_{h_{\ell -1}}(\omega ^{(\ell ,i)})\), i.e. the same sample is used on both meshes.
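A minimal sketch of (2.10) may be helpful; here a hypothetical toy model for \(Q_{h_\ell }\) replaces the finite element solve, but the essential point, reusing the same random input on the two consecutive levels, is retained.

```python
import numpy as np

def mlmc_estimate(Q, N_levels, rng):
    """MLMC estimator (2.10): N_levels[ell] samples of Y_ell = Q_ell - Q_{ell-1},
    evaluating both levels with the SAME random input omega."""
    total = 0.0
    for ell, N in enumerate(N_levels):
        Y = np.empty(N)
        for i in range(N):
            omega = rng.standard_normal()   # one draw of the randomness
            Y[i] = Q(ell, omega) - (Q(ell - 1, omega) if ell > 0 else 0.0)
        total += Y.mean()
    return total

# Hypothetical level-ell approximation of Q(omega) = omega^2, with an O(h_ell)
# bias and a noise component shared across levels (for illustration only)
def Q(level, omega):
    h = 2.0 ** (-level)
    return omega**2 + h * (1.0 + 0.1 * omega)

rng = np.random.default_rng(2)
est = mlmc_estimate(Q, N_levels=[4000, 1000, 250, 60], rng=rng)
# E[Q_{h_L}] = E[omega^2] + h_L = 1 + 2^{-3} = 1.125; est fluctuates around this
```

Because the coupled differences \(Y_\ell \) have small variance, far fewer samples are needed on the expensive fine levels than on the cheap coarse level, which is the source of the MLMC savings.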

Since all the expectations \(\mathbb{E }[Y_{\ell }]\) in (2.9) are estimated independently in (2.10), the variance of the MLMC estimator is \(\sum _{\ell =0}^L N_{\ell }^{-1}\,\mathbb{V }[Y_\ell ]\), and expanding as in (2.8) leads again to

$$\begin{aligned} e(\widehat{Q}^\mathrm{ML}_{h,\{N_\ell \}})^2 \, := \, \mathbb E \Big [\big (\widehat{Q}^\mathrm{ML}_{h,\{N_\ell \}}\!-\! \mathbb E [Q]\big )^2\Big ] \! = \! \sum _{\ell =0}^L N_{\ell }^{-1}\,\mathbb{V }[Y_{\ell }] \!+ \! \big (\mathbb E [Q_{h}\!-\!Q]\big )^2.\qquad \end{aligned}$$
(2.11)

As in the classical MC case, we see that the MSE consists of two terms, the variance of the estimator and the error in mean between \(Q\) and \(Q_h\). Note that the second term is identical to the second term for classical MC in (2.8).

Let now \(\mathcal C _\ell \) denote the cost to obtain one sample of \(Q_{h_\ell }\). Then we have the following results on the \(\varepsilon \)-cost of the MLMC estimator (cf. [8, 16]).

Theorem 2.3

Suppose that there are positive constants \(\alpha , \beta , \gamma , c_{\scriptscriptstyle \mathrm{M1}}, c_{\scriptscriptstyle \mathrm{M2}}, c_{\scriptscriptstyle \mathrm{M3}} > 0\) such that \(\alpha \!\ge \!\frac{1}{2}\,\min (\beta ,\gamma )\) and

  • M1. \(\displaystyle \left| \mathbb{E }[Q_{h_\ell } - Q] \right| \ \le c_{\scriptscriptstyle \mathrm{M1}} \ h_\ell ^{\alpha }, \)

  • M2. \(\displaystyle \mathbb{V }[Q_{h_\ell }-Q_{h_{\ell -1}}] \ \le c_{\scriptscriptstyle \mathrm{M2}} \ h_\ell ^{\beta }, \)

  • M3. \(\displaystyle \mathcal C _{\ell } \ \le c_{\scriptscriptstyle \mathrm{M3}} \ h_{\ell }^{-\gamma }. \)

Then, for any \(0 < \,\varepsilon < \exp (-1)\), there exist an \(L\) and a sequence \(\{N_{\ell }\}_{\ell =0}^L\), such that \(e(\widehat{Q}^\mathrm{ML}_{h, \{N_\ell \}}) < \varepsilon \) and

$$\begin{aligned} \mathcal{C }_\varepsilon (\widehat{Q}^\mathrm{ML}_{h, \{N_\ell \}}) \lesssim \left\{ \begin{array}{l@{\quad }l} \ \varepsilon ^{-2} , &{} \text{ if } \ \beta >\gamma , \\ \ \varepsilon ^{-2} (\log \varepsilon )^2, &{} \text{ if } \ \beta =\gamma , \\ \ \varepsilon ^{-2-(\gamma - \beta )/\alpha }, &{} \text{ if } \ \beta <\gamma , \end{array}\right. \end{aligned}$$

where the hidden constant depends on \(c_{\scriptscriptstyle \mathrm{M1}}, c_{\scriptscriptstyle \mathrm{M2}}\) and \(c_{\scriptscriptstyle \mathrm{M3}}\).
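The three cost regimes can be encoded directly. The helper below (an illustration, not from the paper) returns the exponent \(r\) in \(\mathcal{C }_\varepsilon \lesssim \varepsilon ^{-r}\), ignoring the extra \((\log \varepsilon )^2\) factor in the case \(\beta = \gamma \).

```python
def mlmc_eps_cost_exponent(alpha, beta, gamma):
    """Exponent r such that Theorem 2.3 predicts an eps-cost of order
    eps^{-r} (with an additional (log eps)^2 factor when beta == gamma)."""
    assert alpha >= 0.5 * min(beta, gamma), "hypothesis of Theorem 2.3"
    if beta >= gamma:
        return 2.0
    return 2.0 + (gamma - beta) / alpha

# E.g. in 2D with an optimal solver (gamma = d = 2, a modelling assumption)
# and full regularity (alpha = 1, beta = 2, cf. Theorem 2.2 with s = 1):
print(mlmc_eps_cost_exponent(alpha=1.0, beta=2.0, gamma=2.0))   # -> 2.0
```

In this example \(\beta = \gamma \), so the predicted cost is \(\varepsilon ^{-2}(\log \varepsilon )^2\), essentially the cost of plain MC sampling without any PDE solves.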

In [6], it was shown that for our model problem and for the functional \(Q :=|u|^q_{H^1(D)}\) it follows immediately from the finite element error result in Theorem 2.2 that Assumptions M1–M2 in Theorem 2.3 hold with \(\alpha < t\) and \(\beta < 2t\) and for \(1 \le q < p_*/2\), provided Assumptions A1–A3 hold with \(0 < t < 1\) and \(p_* \in (0,\infty )\). For \(t=1\), we can even choose \(\alpha = 1\) and \(\beta = 2\). Using a duality argument we also showed that under the same hypotheses we could expect twice these rates for the functional \(Q:=\Vert u\Vert ^q_{L^2(D)}\). The aim is now to extend this theory to cover also other functionals of the solution \(u\) (see §3) as well as level-dependent estimators (see §4).

3 Output functionals

In practical applications, one is often interested in the expected value of functionals of the solution. A standard technique to prove convergence for finite element approximations of output functionals is to use a duality argument, similar to the classic Aubin-Nitsche trick used to prove optimal convergence rates for the \(L^2\)-norm.

We denote the functional of interest by \(M_{\omega }(v)\), for \(v \in H^1(D)\). Like the bilinear form \(b_{\omega }(\cdot ,\cdot )\), the functional \(M_{\omega }(\cdot )\) is again parametrised by \(\omega \), and the analysis is done almost surely in \(\omega \). When the functional does not depend on \(\omega \), we will simply write \(M(\cdot )\) instead of \(M_{\omega }(\cdot )\). Our analysis follows mainly [18].

3.1 Duality argument for linear functionals

Since it is simpler, we will first look at linear functionals. Let us assume for the moment that \(M_{\omega }:H^1(D) \rightarrow \mathbb{R }\) is linear and bounded on \(H^1(D)\), i.e. \(|M_{\omega }(v)| \lesssim \Vert v\Vert _{H^{1}(D)}\), for all \( v \in H^1(D)\). Now, let us associate with our primal problem (2.4) the following dual problem: find \(z(\omega ,\cdot ) \in H^1_0(D)\) such that

$$\begin{aligned} b_{\omega }\big (v, z(\omega ,\cdot )\big ) = M_{\omega }(v)\,, \quad \text{ for } \text{ all }\, \quad v \in H^1_0(D), \end{aligned}$$
(3.1)

and denote by \(z_h(\omega ,\cdot ) \in V_{h,0}\) the finite element approximation to (3.1). We can again apply the Lax-Milgram Theorem to ensure existence and uniqueness of a weak solution \(z(\omega ,\cdot ) \in H^1_0(D)\), for almost all \(\omega \). Moreover, since \(b_{\omega }(\cdot ,\cdot )\) is symmetric, we will also be able to apply Theorems 2.1 and 2.2. However, first we make the following observation.

Lemma 3.1

Let \(M_{\omega }:H^1(D) \rightarrow \mathbb{R }\) be linear and bounded. Then, for almost all \(\omega \in \varOmega \),

$$\begin{aligned}&\left| M_{\omega }\left( u(\omega ,\cdot )\right) - M_{\omega }\left( u_h(\omega ,\cdot )\right) \right| \nonumber \\&\quad \le a_\mathrm{max}(\omega ) \, |u(\omega ,\cdot ) -u_h(\omega ,\cdot )|_{H^1(D)} \, |z(\omega ,\cdot )-z_h(\omega ,\cdot )|_{H^1(D)} \,. \end{aligned}$$
(3.2)

Proof

Dropping for brevity the dependence of the FE functions on \(\omega \) and using the linearity of \(M_{\omega }\), the dual problem (3.1), as well as Galerkin orthogonality for the primal problem, we have

$$\begin{aligned}&\left| M_{\omega }(u) - M_{\omega }(u_h) \right| =\left| b_{\omega }(u - u_h, z)\right| = \left| b_{\omega }(u - u_h, z - z_h)\right| \\&\quad \le a_\mathrm{max}(\omega ) |u-u_h|_{H^1(D)} \, |z-z_h|_{H^1(D)}\,, \end{aligned}$$

where in the last step we have used the definition of \(a_\mathrm{max}(\omega )\) and the Cauchy-Schwarz inequality.

This simple argument will be crucial for obtaining optimal convergence rates for functionals. The assumption that \(M_{\omega }\) is linear is not necessary, however, and so we now generalise the above to nonlinear functionals.

3.2 Duality argument for nonlinear functionals

For nonlinear functionals, following [32], the dual problem is not defined as in (3.1) above. Instead, a different functional is chosen on the right hand side (one which reduces to \(M_{\omega }\) in the linear case). It is related to the derivative of the functional of interest, and so we need to assume a certain differentiability of \(M_{\omega }\). We will assume here that \(M_{\omega }\) is continuously Fréchet differentiable. In particular, this implies that \(M_{\omega }\) is also Gâteaux differentiable, with the two derivatives being the same. We will see in Remark 3.1 below that it is in fact not necessary that \(M_{\omega }\) is continuously Fréchet differentiable everywhere, but this assumption simplifies the presentation greatly.

Let \(v, \tilde{v} \in H^1(D)\). Then the Gâteaux derivative of \(M_{\omega }\) at \(\tilde{v}\) in the direction \(v\) is defined as

$$\begin{aligned} D_v M_{\omega }(\tilde{v}) := \lim _{\varepsilon \rightarrow 0} \frac{M_{\omega }(\tilde{v}+\varepsilon v)-M_{\omega }(\tilde{v})}{\varepsilon }. \end{aligned}$$

We define

$$\begin{aligned} \overline{D_v M_{\omega }}(u, u_h) := \int _0^1 D_v M_{\omega }(u+\theta (u_h-u)) \,\mathrm{d} \theta , \end{aligned}$$

which is in some sense an average derivative of \(M_{\omega }\) on the path from \(u\) to \(u_h\), and define the dual problem now as: find \(z(\omega ,\cdot ) \in H^1_0(D)\) such that

$$\begin{aligned} b_{\omega }\big (v, z(\omega ,\cdot )\big ) = \, \overline{D_v M_{\omega }}(u,u_h), \quad \text{ for } \text{ all }\, \quad v \in H^1_0(D). \end{aligned}$$
(3.3)

For any linear functional \(M_{\omega }\), we have \(\overline{D_v M_{\omega }}(u,u_h) = M_{\omega }(v)\), for all \(v \in H^1(D)\), and so (3.3) is equivalent to (3.1).
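To make these definitions concrete, consider the simple (hypothetical, \(\omega \)-independent) nonlinear functional \(M(v) := \Vert v\Vert ^2_{L^2(D)}\). Its Gâteaux derivative is

$$\begin{aligned} D_v M(\tilde{v}) = \lim _{\varepsilon \rightarrow 0} \frac{\Vert \tilde{v}+\varepsilon v\Vert ^2_{L^2(D)}-\Vert \tilde{v}\Vert ^2_{L^2(D)}}{\varepsilon } = 2\int _D \tilde{v}\, v \,\mathrm{d} x\,, \end{aligned}$$

and hence the averaged derivative is

$$\begin{aligned} \overline{D_v M}(u, u_h) = \int _0^1 2\int _D \big (u+\theta (u_h-u)\big )\, v \,\mathrm{d} x \,\mathrm{d} \theta = \int _D (u+u_h)\, v \,\mathrm{d} x\,. \end{aligned}$$

Choosing \(v = u - u_h\) indeed recovers \(M(u)-M(u_h) = \int _D (u-u_h)(u+u_h)\,\mathrm{d} x\) exactly, illustrating why this is the right choice of right hand side for the dual problem.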

For our further analysis, we need to make the following assumption on \(M_{\omega }\).

  • F1. Let \(u\) (resp. \(u_h\)) be the exact (resp. the FE) solution of (2.1). Let \(M_{\omega }\) be continuously Fréchet differentiable, and suppose that there exist \(t_* \in [0,1]\), \(q_*\in (0,\infty ]\) and \(C_\mathrm{F} \in L^{q_*}(\varOmega )\), such that

    $$\begin{aligned}&|\overline{D_v M_{\omega }}(u, u_h)| \lesssim C_\mathrm{F}(\omega ) \Vert v\Vert _{H^{1-t_*}(D)}\,, \qquad \text{ for } \text{ all }\, \ \ v \in H^1_0(D) \\&\quad \text{ and } \text{ for } \text{ almost } \text{ all } \ \ \omega \in \varOmega . \end{aligned}$$

To get well-posedness of the dual problem, as well as existence and uniqueness of the dual solution \(z(\omega ,\cdot ) \in H^1_0(D)\), for almost all \(\omega \in \varOmega \), it is sufficient (as in the linear case, courtesy of the Lax-Milgram Theorem) to assume that \(|\overline{D_v M_{\omega }}(u, u_h)|\) is bounded in \(H^1(D)\). However, in order to apply Theorem 2.2 and to prove convergence of the finite element approximation of the dual solution, it is necessary to require stronger spatial regularity for \(z\). This is only possible if we assume boundedness of \(|\overline{D_v M_{\omega }}(u, u_h)|\) in \(H^{1-t_*}(D)\) for some \(t_* > 0\). In particular, if Assumptions A1–A3 and F1 are satisfied with \(t \in (0,1]\) and \(t_* \ge t\), then by Theorem 2.1 we have for almost all \(\omega \in \varOmega \),

$$\begin{aligned} \Vert z(\omega , \cdot )\Vert _{H^{1+s}(D)} \lesssim \frac{a_\mathrm{max}(\omega )\Vert a(\omega , \cdot )\Vert ^2_{ C\,^{t}(\overline{D})}}{a_\mathrm{min}(\omega )^4} \, C_\mathrm{F}(\omega ), \end{aligned}$$

for any \(0 < s < t\) such that \(s \le \lambda _{\varDelta }(D)\). Hence,

$$\begin{aligned} \Vert z-z_h\Vert _{L^p(\varOmega , H^1_0(D))} \, \le \, C_{a,C_F} \, h^s, \quad \text{ for } \text{ all }\, \ \ p < q_*\,, \end{aligned}$$
(3.4)

for some constant \(C_{a,C_F} < \infty \) that depends on \(a\) and on the random variable \(C_\mathrm{F}\) from Assumption F1.

Recall that we assumed that the boundary data \(\{\phi _j\}_{j=1}^m\) are piecewise linear with respect to \(\mathcal{T }_h\), and so \(u-u_h \in H^1_0(D)\). From the Fundamental Theorem of Calculus for Fréchet derivatives, it follows that

$$\begin{aligned} M_{\omega }(u) - M_{\omega }(u_h)&= \int _0^1 D_{u-u_h} M_{\omega }(u+\theta (u_h-u)) \,\mathrm{d} \theta = \overline{D_{u-u_h} M_{\omega }}(u, u_h)\nonumber \\&= b_{\omega }(u - u_h, z), \end{aligned}$$
(3.5)

and so we have again the following error bound.

Lemma 3.2

Let Assumption F1 be satisfied. Then

$$\begin{aligned}&\left| M_{\omega }\left( u(\omega ,\cdot )\right) - M_{\omega }\left( u_h(\omega ,\cdot )\right) \right| \nonumber \\&\quad \le a_\mathrm{max}(\omega ) \, |u(\omega ,\cdot )-u_h(\omega ,\cdot )|_{H^1(D)} \, |z(\omega ,\cdot )-z_h(\omega ,\cdot )|_{H^1(D)} \,, \end{aligned}$$
(3.6)

for almost all \(\omega \in \varOmega \).

Remark 3.1

As already mentioned above, continuous Fréchet differentiability is not a necessary condition to obtain the bound in Lemma 3.2. It is possible to weaken Assumption F1 and to assume only slant differentiability of \(M_{\omega }(\cdot )\), which is equivalent to Lipschitz continuity (see e.g. [7]).

3.3 Multilevel Monte Carlo convergence for functionals

We are now ready to prove optimal convergence rates for the MLMC algorithm for Fréchet differentiable (and thus also for linear) functionals as defined above.

Lemma 3.3

Let Assumptions A1–A3 hold for some \(0 < t \le 1\) and \(p_* \in (0,\infty ]\), and let \(M_{\omega }(\cdot )\) satisfy Assumption F1 with \(t_* \ge t\) and \(q_* \in (0,\infty ]\). Then

$$\begin{aligned} \Vert M_{\omega }(u)- M_{\omega }(u_{h})\Vert _{L^p(\varOmega )} \, \lesssim C_{a,f,\phi _j,C_F} h^{2s}, \end{aligned}$$

for any \(p < \left( \frac{1}{p_*} + \frac{1}{q_*} \right) ^{-1}\) and \(\,0<s<t\) such that \( s\le \lambda _{\Delta }(D)\). If A1–A3 hold with \(t=1\), then \(\Vert M_{\omega }(u)- M_{\omega }(u_{h})\Vert _{L^p(\varOmega )} \, \lesssim C_{a,f,\phi _j,C_F} h^{2}\).

Proof

Using Lemma 3.2 and Hölder’s inequality, we have

$$\begin{aligned}&\Vert M_{\omega }(u)- M_{\omega }(u_{h})\Vert _{L^p(\varOmega )}^p \nonumber \\&\quad \le \Vert a^p_\mathrm{max}\Vert _{L^{q_1}(\varOmega )} \Vert \left( u-u_h\right) ^p\Vert _{L^{q_2}(\varOmega , H^1_0(D))} \Vert \left( z-z_h\right) ^p\Vert _{L^{q_3}(\varOmega , H^1_0(D))} \nonumber \\&\quad = \Vert a_\mathrm{max}\Vert ^p_{L^{pq_1}(\varOmega )} \Vert u-u_h\Vert ^p_{L^{pq_2}(\varOmega , H^1_0(D))} \Vert z-z_h\Vert ^p_{L^{pq_3}(\varOmega , H^1_0(D))}, \end{aligned}$$
(3.7)

where \(\sum _{i=1}^3 q_i^{-1} =1\). The norms on the right hand side of (3.7) are finite if we choose \(q_1 < \infty ,q_2<p_*/p\), and \(q_3<q_*/p\). The claim of the lemma now follows from (3.4) and Theorem 2.2.

The following corollary of Lemma 3.3 follows immediately using the triangle inequality.

Corollary 3.1

Let Assumptions A1–A3 hold for some \(0 < t \le 1, \lambda _{\Delta }(D) \ge t\) and \(p_* > 2\), and let \(M_{\omega }(\cdot )\) satisfy Assumption F1 with \(t_* \ge t\) and \(q_* > \frac{2p_*}{p_*-2}\). Then Assumptions M1–M2 in Theorem 2.3 hold for any \(\alpha < 2t\) and \(\beta < 4t\). For \(t=1\), we can choose \(\alpha =2\) and \(\beta =4\).

Remark 3.2

In practice, it is necessary to use quadrature to compute the integrals in the bilinear form \(b_{\omega }(u,v)\). As a consequence we will compute only an approximate finite element solution \(\tilde{u}_h \in V_{h,\phi }\) and Galerkin orthogonality for the primal problem is lost. In general, it is then only possible to prove

$$\begin{aligned} \left| M_{\omega }\left( u(\omega ,\cdot )\right) - M_{\omega }\left( \tilde{u}_h(\omega ,\cdot )\right) \right| \le C_F(\omega ) \, \Vert u(\omega ,\cdot )-\tilde{u}_h(\omega ,\cdot )\Vert _{H^1(D)}, \end{aligned}$$

instead of (3.6), where \(C_F\) is the constant from Assumption F1. It is sufficient that F1 holds with \(t_* = 0\) in this case. The higher rates of convergence from Corollary 3.1 can be recovered, also in the presence of quadrature error, if the coefficient function has additional regularity, i.e. if \(a(\omega , \cdot ) \in \mathcal{C }^{r}(\overline{D})\), with \(r \ge 2t\). For an example of a log-normal random field which has this additional regularity, see §4.1.

One can also generalise the results in this section to the case where the dual solution has less spatial regularity than the primal solution. For example, if F1 holds only for some \(t_* \in [0, t)\), Assumptions M1–M2 in Theorem 2.3 can still be verified, for any \(\alpha < t+t_*\) and \(\beta <2(t+t_*)\).

3.4 Examples of output functionals

Before we go on to show some numerical results, we give examples of output functionals which fit into the framework of §3.1–3.3. We start with linear functionals.

  (a)

    Point evaluations: Since \(a(\omega , \cdot ) \in C\,^t(\overline{D}) \subset C(\overline{D})\), we know that trajectories of the solution \(u\) are in \( C ^{1}(\overline{D})\) (see e.g. [14]), and it is meaningful to consider point values. Consider \(M^{(1)}(u) := u(x^*)\), for some \(x^* \in D\). For \(D \subset \mathbb{R }\), i.e. in one space dimension, we have the compact embedding \(H^{1/2 + \delta }(D) \hookrightarrow C^{\delta }(\overline{D})\), for any \(\delta > 0\), and so

    $$\begin{aligned} M^{(1)}(v) = v(x^*) \, \le \, \Vert v\Vert _{\sup } \, \lesssim \, \Vert v\Vert _{H^{1/2 + \delta }(D)}, \quad \text{ for } \text{ all }\, \ \ v \in H^1(D). \end{aligned}$$

    Hence, Assumption F1 is satisfied for any \(t_* < \min (\frac{1}{2},t)\) with \(C_\mathrm{F}=1\) and \(q_*=\infty \). In space dimensions higher than one, point evaluation of the pressure \(u\) is not a bounded functional on \(H^1_0(D)\). One often regularises this type of functional by approximating the point value by a local average,

    $$\begin{aligned} M^{(2)}(v) := \frac{1}{|D^*|} \int _{D^*} v(\omega , x) \, \mathrm{d} x\quad \Big [\,\approx \, v(\omega ,x^*)\,\Big ], \end{aligned}$$

    where \(D^*\) is a small subdomain of \(D\) that contains \(x^*\) [18]. Here, \(M^{(2)}\) satisfies F1 with \(C_\mathrm{F}=1,t_* = 1\) and \(q_* = \infty \), due to the Cauchy-Schwarz inequality. Similarly, point evaluations of the flux \(-a \nabla u\) can be approximated by a local average. However, in this case F1 only holds for \(t_* = 0\) with \(C_\mathrm{F}=a_{\max }\) and \(q_* = \infty \), and the convergence rate is thus the same as for the \(H^1\)-seminorm.

Next we give some examples of non-linear functionals. The first obvious example is to estimate higher order moments of linear functionals.

  (b)

    Second moment of average local pressure: Let \(M_{\omega }\) be an arbitrary linear functional and let \(q > 1\). Then

    $$\begin{aligned} D_v \big (M_{\omega }(\tilde{v})^q\big ) = \lim _{\varepsilon \rightarrow 0} \frac{M_{\omega }(\tilde{v}+\varepsilon v)^q - M_{\omega }(\tilde{v})^q}{\varepsilon } \, = \, q M_{\omega }(\tilde{v})^{q-1} M_{\omega }(v). \end{aligned}$$

    Thus, in case of the second moment of the average local pressure \(M^{(3)}(v) := M^{(2)}(v)^2\), this gives

    $$\begin{aligned} D_v M^{(3)}(\tilde{v}) = \frac{2}{|D^*|^2} \left( \,\,\int _{D^*} v(x) \, \mathrm{d} x\right) \, \left( \,\,\int _{D^*} \tilde{v}(x) \, \mathrm{d} x\right) , \end{aligned}$$

    and so

    $$\begin{aligned} |\overline{D_v M^{(3)}}(u, u_h)|&= \frac{2}{|D^*|^2} \left| \left( \,\,\int _{D^*} v(x) \, \mathrm{d} x\right) \, \left( \int _0^1 \int _{D^*} (u + \theta (u_h-u))(x) \, \mathrm{d} x\mathrm{d} \theta \right) \right| \\&= \frac{1}{|D^*|^2} \left| \left( ~\int _{D^*} v(x) \, \mathrm{d} x\right) \, \left( ~\int _{D^*} (u(\omega ,x)+u_h(\omega ,x)) \, \mathrm{d} x\right) \right| \\&\lesssim \left( \Vert u(\omega , \cdot )\Vert _{L^2(D)} + \Vert u_h(\omega , \cdot )\Vert _{L^2(D)} \right) \Vert v\Vert _{L^2(D)}\,. \end{aligned}$$

    Now, it follows from the Lax-Milgram Theorem that \(C_\mathrm{F}(\omega ) := \Vert u(\omega , \cdot )\Vert _{L^2(D)} + \Vert u_h(\omega , \cdot )\Vert _{L^2(D)} \lesssim \Vert f(\omega ,\cdot )\Vert _{H^{-1}(D)} / a_\mathrm{min}(\omega )\), and so Assumption F1 is satisfied for all \(t_* \le 1\) and \(q_* < p_*\,\).

  (c)

    Outflow through boundary: Consider \(M^{(4)}_{\omega }(v) := L_{\omega }(\psi ) - b_{\omega }(\psi ,v)\), for some given function \(\psi \in H^{1}(D)\). Note that for the solution \(u\) of (2.4), by Green’s formula, we have

    $$\begin{aligned} M^{(4)}_{\omega }(u)&= \int _D \psi (x) f(x,\omega ) \, \mathrm{d} x\!-\! \int _D a(\omega ,x) \nabla \psi (x) \!\cdot \! \nabla u(\omega ,x) \, \mathrm{d} x\nonumber \\&= \!-\int _D \psi (x) \, \nabla \!\cdot \! \left( a(\omega , x) \nabla u(\omega , x)\right) \, \mathrm{d} x\!-\! \int _D a(\omega ,x) \nabla \psi (x) \!\cdot \! \nabla u(\omega ,x) \, \mathrm{d} x\nonumber \\&= \!- \int _{\varGamma } \psi (x) a(\omega ,x) \nabla u(\omega ,x) \cdot \nu \, \mathrm{d} s\,. \end{aligned}$$
    (3.8)

    Thus, \(M^{(4)}_{\omega }(u)\) is equal to the outflow through the boundary \(\varGamma \) weighted by \(\psi \), and so \(M^{(4)}_\omega \) can be used to approximate the flux through a part \(\varGamma _\mathrm{out} \subset \varGamma \) of the boundary, by setting \(\psi |_{\varGamma _\mathrm{out}} \approx 1\) and \(\psi |_{\varGamma \backslash \varGamma _\mathrm{out}} \approx 0\), see e.g. [1, 12, 18]. Note that for \(f \not \equiv 0\) this functional is only affine, not linear; it is linear when \(f \equiv 0\). In any case,

    $$\begin{aligned} D_v M^{(4)}_\omega (\tilde{v}) \!&:= \!\lim _{\varepsilon \rightarrow 0} \frac{M^{(4)}_\omega (\tilde{v}\!+\!\varepsilon v)\!-\!M^{(4)}_\omega (\tilde{v})}{\varepsilon }\\ \!&= \!\lim _{\varepsilon \rightarrow 0} \frac{\!- \int _D a(\omega ,x) \nabla \psi (x) \cdot \nabla (\varepsilon v(\omega ,x)) \, \mathrm{d} x}{\varepsilon } \\&= \!- \int _D a(\omega ,x) \nabla \psi (x) \cdot \nabla v(x) \, \mathrm{d} x\!=\! \int _D v(x) \, \nabla \cdot \left( a(\omega , x) \nabla \psi (x)\right) \, \mathrm{d} x, \end{aligned}$$

    for \(v, \tilde{v} \in H^1_0(D)\). Since this is independent of \(\tilde{v}\), we have in particular

    $$\begin{aligned} \overline{D_v M^{(4)}_\omega }(u, u_h) = \int _D v(x) \, \nabla \cdot \left( a(\omega , x) \nabla \psi (x)\right) \, \mathrm{d} x. \end{aligned}$$

    If we now assume that Assumptions A1-A3 are satisfied for some \(0< t \le 1\) and that \(\psi \in H^{1+t}(D)\), then using Theorems 9.1.12 and 6.2.25 in [25] (see also Lemmas A.1 and A.2 in [6]), we have \(\nabla \psi \in H^{t}(D)\) and for any \(t^* < t\),

    $$\begin{aligned} |\overline{D_v M^{(4)}_\omega }(u, u_h)|&\le \Vert \nabla \cdot \left( a(\omega , \cdot ) \nabla \psi \right) \Vert _{H^{t^*-1}(D)} \Vert v\Vert _{H^{1-t^*}(D)} \nonumber \\&\lesssim \Vert \left( a(\omega ,\cdot ) \nabla \psi \right) \Vert _{H^{t^*}(D)} \Vert v\Vert _{H^{1-t^*}(D)} \nonumber \\&\lesssim \Vert a(\omega , \cdot )\Vert _{ C\,^t(\overline{D})} \Vert \nabla \psi \Vert _{H^{t^*}(D)}\Vert v\Vert _{H^{1-t^*}(D)}. \end{aligned}$$
    (3.9)

    Hence, Assumption F1 is satisfied, for any \(q_* < \infty \) and \(t_* < t\), with \(C_\mathrm{F}(\omega )=\Vert a(\omega , \cdot )\Vert _{ C\,^t(\overline{D})}\). If \(t=1\), then estimate (3.9) holds with \(t^*=t=1\), and Assumption F1 is satisfied with \(t_*=1\). Our assumption on \(\psi \) is satisfied for example if \(\psi \) is linear, which is a suitable choice for the numerical test in the next section.

The functional \(\frac{1}{|\varGamma _{\mathrm{out}}|} \int _{\varGamma _{\mathrm{out}}} a(\omega ,x) \nabla u(\omega ,x) \cdot \nu \, \mathrm{d} s\) (or its regularised equivalent over a narrow region near \(\varGamma _{\mathrm{out}}\)), which also approximates the flux through \(\varGamma _\mathrm{out}\), can only be bounded in \(H^1(D)\) and will converge with a slower rate than \(M^{(4)}_\omega \).
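The chain-rule formula derived in example (b) above can be checked numerically. The following sketch is purely illustrative (the grid, the subdomain playing the role of \(D^*\), and the functions standing in for \(\tilde{v}\) and \(v\) are all assumptions for the sake of the check); it compares a finite difference quotient of \(M^{(2)}(\cdot )^q\) with the formula \(q M_{\omega }(\tilde{v})^{q-1} M_{\omega }(v)\):

```python
import numpy as np

# Illustrative 1D grid; the mask plays the role of the subdomain D*.
x = np.linspace(0.0, 1.0, 101)
mask = (x > 0.45) & (x < 0.55)

def M(w):
    # Local average over D*, mimicking the functional M^(2).
    return w[mask].mean()

vt = np.sin(np.pi * x)     # a smooth stand-in for the state v~
v = x * (1.0 - x)          # a stand-in for the direction v in H^1_0
q, eps = 2, 1e-6

# Finite difference quotient of M(.)^q versus the chain-rule formula.
fd = (M(vt + eps * v) ** q - M(vt) ** q) / eps
exact = q * M(vt) ** (q - 1) * M(v)
```

Since \(M\) is linear here, the two quantities agree up to an \(\mathcal{O }(\varepsilon )\) remainder.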

3.5 Numerics

We consider two different model problems in 2D, both in the unit square \(D=(0,1)^2\): either (2.1) with \(f \equiv 1\) and \(\phi \equiv 0\), i.e.

$$\begin{aligned} -\nabla \cdot \left( a(\omega , x) \nabla u(\omega , x)\right) \!=\! 1, \ \ \mathrm{for} \ x \in D,\quad \mathrm{and} \ \ u(\omega ,x) \!= \!0 \ \ \mathrm{for} \ \ x \in \partial D, \end{aligned}$$
(3.10)

or the mixed boundary value problem

$$\begin{aligned}&- \nabla \cdot \left( a(\omega , x) \nabla u(\omega , x)\right) = 0, \qquad \mathrm{for} \ \ x \in D, \nonumber \\&\quad \text{ and } \quad u\big |_{x_1=0}= 1, \quad u\big |_{x_1=1}=0, \quad \frac{\partial u}{\partial \nu }\Big |_{x_2=0} =0, \quad \frac{\partial u}{\partial \nu }\Big |_{x_2=1} =0. \quad \quad \end{aligned}$$
(3.11)

We take \(a(\omega ,x)\) to be a log-normal random field, where the underlying Gaussian field has mean zero and exponential covariance function (using the 2-norm in (2.3)). We choose \(\lambda =0.3\) and \(\sigma ^2=1\). The finite element solutions are computed on a family of uniform triangular grids \(\mathcal{T }_h\) with mesh widths \(h = 1/2, 1/4, \ldots , 1/128\). The sampling from \(a(\omega ,x)\) is done using a circulant embedding technique (for details see [11, 20]). To assemble the stiffness matrix we use a quadrature rule, namely the trapezoidal rule, evaluating the coefficient function at the vertices of the grid.
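For concreteness, the sampling step can be sketched in one dimension. The following is a minimal illustration of the circulant embedding idea (cf. [11, 20]), not the implementation used for the experiments; the grid size and parameters are assumptions. The first row of the covariance matrix is embedded into a circulant matrix whose eigenvalues are obtained by FFT; for the exponential covariance the minimal embedding is nonnegative definite, so exact Gaussian samples on the grid are obtained:

```python
import numpy as np

def circulant_embedding_sample(n, lam=0.3, sigma2=1.0, rng=None):
    """One sample of a stationary Gaussian field on n uniform grid points in
    [0,1], with covariance r(t) = sigma2 * exp(-|t|/lam)."""
    if rng is None:
        rng = np.random.default_rng()
    t = np.arange(n) / (n - 1)
    r = sigma2 * np.exp(-t / lam)          # first row of the covariance matrix
    c = np.concatenate([r, r[-2:0:-1]])    # embed into a circulant of size 2n-2
    eig = np.fft.fft(c).real               # circulant eigenvalues (nonnegative
                                           # for the exponential covariance)
    m = len(c)
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    g = np.fft.fft(np.sqrt(np.maximum(eig, 0.0) / m) * z)
    return g.real[:n]                      # real part: one exact sample

g = circulant_embedding_sample(129, rng=np.random.default_rng(0))
a = np.exp(g)                              # log-normal coefficient sample
```

The 2D fields used in the experiments are generated analogously with nested block circulant matrices and 2D FFTs.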

First, we consider the approximation of the pressure at the centre of the domain for model problem (3.10). As described in §3.4 for functional \(M^{(2)}\), we approximate it by the average of \(u_h\) over the region \(D^*\), which is chosen to consist of the six elements (of a uniform grid with \(h^*=1/256\)) adjacent to the node at \((1/2, 1/2)\). To estimate the errors we approximated the exact solution \(u\) by a reference solution \(u_{h^*}\) on a grid with mesh width \(h^*=1/256\). In Fig. 1, we see that \(\big |\mathbb{E }\big [M^{(3)}(u_{h^*})-M^{(3)}(u_{h})\big ]\big |\) converges linearly in \(h\) and \(\mathbb{V }\big [M^{(3)}(u_{h})-M^{(3)}(u_{2h})\big ]\) converges quadratically, as predicted by Lemma 3.1 for the “exact” FE solution. However, in the context of numerical quadrature this is better than expected (cf. Remark 3.2).

Fig. 1
figure 1

Left plot \(\big |\mathbb{E }\big [M^{(2)}(u_{h^*})^2-M^{(2)}(u_{h})^2\big ]\big |\) for 2D model problem (3.10) with \(\lambda =0.3,\sigma ^2=1\) and \(h^*=1/256\). Right plot Corresponding variance \(\mathbb{V }\big [M^{(2)}(u_{h})^2-M^{(2)}(u_{2h})^2\big ]\). The gradient of the dotted (resp. dashed) line is \(-1\) (resp. \(-2\)).

Fig. 2
figure 2

Left Plot of \(\big |\mathbb{E }\big [M^{(4)}_\omega (u_{h^*})-M^{(4)}_\omega (u_{h})\big ]\big |\), for 2D model (3.11) problem with \(\lambda =0.3,\sigma ^2=1,\psi =x_1\) and \(h^*=1/256\). Right Corresponding variance \(\mathbb{V }\big [M^{(4)}_\omega (u_{h})-M^{(4)}_\omega (u_{2h})\big ]\). The gradient of the dotted (resp. dashed) line is \(-1\) (resp. \(-2\))

For the second model problem (3.11), we consider an approximation of the average outflow through the boundary \(\varGamma _\mathrm{out} := \{x_1=1\}\) computed via the functional \(M^{(4)}_\omega \) in §3.4. As the weight function, we choose the linear function \(\psi (x) = x_1\), which is equal to 1 at all nodes on \(\varGamma _\mathrm{out}\) and equal to 0 at all other Dirichlet nodes. Thus, \(M^{(4)}_\omega (u)\) is exactly equal to the flow through \(\varGamma _\mathrm{out}\). As predicted we see again linear convergence in \(h\) for \(\big |\mathbb{E }\big [M^{(4)}_\omega (u_{h^*})-M^{(4)}_\omega (u_{h})\big ]\big |\), and quadratic convergence for \(\mathbb{V }\big [M^{(4)}_\omega (u_{h})-M^{(4)}_\omega (u_{2h})\big ]\) in Fig. 2.

4 Level dependent estimators

The key ingredient in the multilevel Monte Carlo algorithm is the telescoping sum (2.9),

$$\begin{aligned} \mathbb{E }[Q_h] = \mathbb{E }[Q_{h_0}] + \sum _{\ell =1}^L \mathbb{E }[Q_{h_\ell } - Q_{h_{\ell -1}}]. \end{aligned}$$

We are free to choose how to approximate \(Q\) on the different levels, without violating the above identity, as long as the approximation of \(Q_{h_\ell }\) is the same in the two terms in which it appears on the right hand side, for \(\ell =0,...,L-1\). In particular, this implies that we can approximate the coefficient \(a(\omega ,x)\) differently on each level. Even though this strategy does not introduce any additional bias in the final result \(\mathbb{E }[Q_h]\), it may influence the values of the convergence rates \(\alpha \) and \(\beta \) in Theorem 2.3. One has to be careful not to introduce any additional model/approximation errors that decay at a slower rate than the discretisation error.
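In code, the telescoping structure looks as follows. This is a minimal Python sketch, not tied to the PDE setting: `Q_level` is a synthetic stand-in for \(Q_{h_\ell }\) with an \(\mathcal{O }(h_\ell )\) bias, and all names are illustrative. The essential constraint from the discussion above is that the same sample \(\omega \) enters both terms of each correction \(Q_{h_\ell } - Q_{h_{\ell -1}}\):

```python
import numpy as np

rng = np.random.default_rng(0)

def Q_level(ell, xi):
    # Synthetic stand-in for Q_{h_ell}(omega): exact value 1.0 plus an
    # O(h_ell) bias and noise driven by the sample xi, with h_ell = 2^{-ell}.
    h = 2.0 ** (-ell)
    return 1.0 + 0.5 * h + 0.1 * h * xi

def mlmc_estimate(L, N):
    """E[Q_{h_0}] + sum_l E[Q_{h_l} - Q_{h_{l-1}}], each term estimated by
    Monte Carlo; the same xi is used in both terms of each correction."""
    est = np.mean(Q_level(0, rng.standard_normal(N)))
    for ell in range(1, L + 1):
        xi = rng.standard_normal(N)
        est += np.mean(Q_level(ell, xi) - Q_level(ell - 1, xi))
    return est

print(mlmc_estimate(L=5, N=2000))  # approximately 1 + O(h_L)
```

Because the corrections use coupled samples, their variances shrink with \(h_\ell \), which is what allows most samples to be taken on the coarse, cheap levels.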

This flexibility is particularly useful when the random field \(a(\omega ,x)\) is highly oscillatory and varies on a fine scale, since coarse grids will not be able to resolve the coefficient well. As a consequence, one needs to choose the coarsest grid size \(h_0\) smaller than a certain threshold to get the MLMC estimator with the smallest absolute cost. Numerical investigations in [8], for example, show that for log-normal random fields with underlying exponential, 1-norm covariance function and correlation length \(\lambda \), the optimal choice is \(h_0 \approx \lambda \). This limits the benefit that the MLMC estimator potentially offers. A possible solution to this problem is to use smoother approximations of the coefficient on the coarser levels. We will present one way of doing this in §4.1, by using level-dependent truncations of the Karhunen–Loève expansion of \(a(\omega ,x)\).

4.1 Truncated KL-expansions

As an exemplary case, let us now consider log-normal random fields \(a\), where \(\log (a)\) has exponential, 1-norm covariance, i.e. covariance function (2.3) with \(\Vert x\Vert =\Vert x\Vert _1:= \sum _{i=1}^d |x_i|\). We will comment on the general case at the end of the section.

For a Gaussian random field \(g\), the Karhunen–Loève (KL) expansion (see e.g. [13]) is an expansion in terms of a countable set of independent, standard Gaussian random variables \(\{\xi _n\}_{n \in \mathbb N }\). It is given by

$$\begin{aligned} g(\omega ,x)= \mathbb{E }\left[ g(\omega ,x)\right] + \sum _{n=1}^\infty \sqrt{\theta _n}b_n(x)\xi _n(\omega ), \end{aligned}$$

where \(\{\theta _n\}_{n \in \mathbb N }\) are the eigenvalues and \(\{b_n\}_{n \in \mathbb N }\) are the corresponding normalised eigenfunctions of the covariance operator with kernel function

$$\begin{aligned} C(x,y) := \mathbb{E }\Big [(g(\omega ,x)-\mathbb{E }[g(\omega ,x)])(g(\omega ,y)-\mathbb{E }[g(\omega ,y)]) \Big ]. \end{aligned}$$

The log-normal coefficient field shall then be written as

$$\begin{aligned} a(\omega ,x)=\exp \left[ \mathbb{E }\left[ g(\omega ,x)\right] + \sum _{n=1}^\infty \sqrt{\theta _n}b_n(x)\xi _n(\omega )\right] , \end{aligned}$$

and the random fields resulting from truncated expansions with \(K \in \mathbb N \) terms shall be denoted by

$$\begin{aligned} g_K(\omega ,x)\!:=\!\mathbb{E }\left[ g(\omega ,x)\right] \!+\! \sum _{n=1}^K \sqrt{\theta _n}b_n(x)\xi _n(\omega ) \quad \text{ and } \quad a_K(\omega ,x)\!:=\!\exp \left[ g_K(\omega ,x) \right] . \end{aligned}$$

Moreover, we denote by \(u_K \in H^1_\phi (D)\) the weak solution to

$$\begin{aligned} -\nabla \cdot (a_K(\omega , x) \nabla u_K(\omega , x))&= f(\omega ,x), \,\, \qquad \mathrm{for} \ x \in D, \nonumber \\ u_K(\omega ,x)&= \phi _j(\omega ,x), \qquad \mathrm{for} \ x \in \varGamma _j\,. \end{aligned}$$
(4.1)

That is, our model problem (2.1) with the coefficient \(a\) replaced by its \(K\)-term approximation. The finite element approximation of \(u_K\) in \(V_{h,\phi }\) is denoted by \(u_{K,h}\). Recall that we assumed that \(\{\phi _j\}_{j=1}^m\) are piecewise linear, and so \(u_{K} - u_{K,h} \in H^1_0(D)\). It has been shown in [5, 6] that in the case of the 1-norm exponential covariance, Assumptions A1–A2 are satisfied also for \(a_K\), for any \(t < 1/2\) (independent of \(K\)). Therefore the theory in the earlier sections applies also to (4.1).

Since the convergence with respect to \(K\) is quite slow (see below), to get a good approximation to \(\mathbb{E }[Q_h]\) we need to include a large number of terms on the finest grid, both in the case of the standard and the MLMC estimator. The eigenvalues \(\{\theta _n\}_{n \in \mathbb N }\) are all non-negative with \(\sum _{n\ge 1}\theta _n<+\infty \), and if they are ordered in decreasing order of magnitude, the corresponding eigenfunctions \(\{b_n\}_{n \in \mathbb N }\) will be ordered in increasing order of oscillations over \(D\). By truncating the KL-expansion after fewer terms, we are hence disregarding the contributions of the most oscillatory eigenfunctions, leading to smoother problems that can be solved more accurately on the coarser levels. In order to determine a suitable strategy for the level dependent truncation of the KL-expansion, we make use of results from [5, 6].
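Numerically, a truncated KL expansion can be built by approximating the eigenpairs of the covariance operator by those of the covariance matrix on a uniform grid (a simple Nyström-type approximation). The sketch below is illustrative only — the grid size is an assumption, and the 1D exponential covariance is used as above — but it exhibits the ordering of the eigenvalues and the construction of \(a_K = \exp (g_K)\):

```python
import numpy as np

# Discrete KL expansion of a mean-zero Gaussian field on D = (0,1) with
# covariance C(x,y) = sigma2 * exp(-|x-y|/lam); parameters are illustrative.
sigma2, lam, n = 1.0, 0.3, 256
x = (np.arange(n) + 0.5) / n                       # midpoints, weight dx = 1/n
C = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / lam)
theta, B = np.linalg.eigh(C / n)                   # operator eigenvalues (ascending)
idx = np.argsort(theta)[::-1]                      # reorder: decreasing magnitude
theta, B = theta[idx], B[:, idx] * np.sqrt(n)      # L^2(0,1)-normalised b_n

def sample_aK(K, rng):
    """Truncated log-normal sample a_K = exp(g_K) using K KL modes."""
    xi = rng.standard_normal(K)
    gK = B[:, :K] @ (np.sqrt(theta[:K]) * xi)
    return np.exp(gK)

a = sample_aK(20, np.random.default_rng(1))
```

Sorting the eigenvalues in decreasing order puts the smoothest eigenfunctions first, so truncating after \(K\) terms discards exactly the most oscillatory contributions, as described above.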

Proposition 4.1

Let \(a\) be a log-normal random field s.t. \(\log (a)\) has 1-norm exponential covariance. Then,

$$\begin{aligned} \Vert u- u_{K}\Vert _{L^p(\varOmega ,H^1_0(D))} \, \lesssim \, C_{a,f,\phi _j} \, K^{-r} \quad \text{ and } \quad \Vert u- u_{K}\Vert _{L^p(\varOmega ,L^2(D))} \, \lesssim \, C_{a,f,\phi _j} \, K^{-r}, \end{aligned}$$

for all \(p < p_*\) and \(0<r<1/2\). The hidden constant is independent of \(K\).

As in the previous sections this result can again be extended in a straightforward way to functionals.

Corollary 4.1

Let \(a\) be as in Proposition 4.1. Suppose Assumption A3 is satisfied for some \(p_* \in (0, \infty ]\) and \(t \ge 1/2\), and suppose that for the truncated problem (4.1) and for the functional \(M_{\omega }(\cdot )\) we have Assumption F1 satisfied with \(t_* \ge \frac{1}{2}\) and \(q_* \in (0, \infty ]\), i.e. \(M_{\omega }\) is Fréchet differentiable and \(\overline{D_v M_{\omega }}(u_K, u_{K,h})\) is bounded in \(H^{1-t_*}(D)\). Assume further that there exists \(C_F^{\prime } \in L^{q_*}(\varOmega )\) such that \(\overline{D_v M_{\omega }}(u, u_{K}) \le C_F^{\prime } |v|_{H^1(D)}\), for all \(v \in H^1_0(D)\). Then

$$\begin{aligned} \Vert M_{\omega }(u)- M_{\omega }(u_{K,h})\Vert _{L^p(\varOmega )} \, \lesssim C_{a,f,\phi _j,C_F,C_F^{\prime }} \left( h^{2s} + K^{-r}\right) , \end{aligned}$$

for any \(p < \left( \frac{1}{p_*} + \frac{1}{q_*} \right) ^{-1},\,0<s<1/2\) and \(0<r<1/2\). The hidden constant is again independent of \(h\) and \(K\).

Proof

First note that due to the triangle inequality, we have of course

$$\begin{aligned} |M_{\omega }(u)\!-\! M_{\omega }(u_{K,h})| \!\le \,|M_{\omega }(u)\!-\!M_{\omega }(u_K)| \!+\! |M_{\omega }(u_{K})\!-\!M_{\omega }(u_{K,h})|. \end{aligned}$$
(4.2)

To bound the first term in (4.2), we can use (3.5) so that by assumption

$$\begin{aligned} |M_{\omega }(u)-M_{\omega }(u_{K})|= |\overline{D_{u-u_K} M_{\omega }}(u, u_{K})| \le C_F^{\prime } |u-u_{K}|_{H^1(D)}\,. \end{aligned}$$
(4.3)

It follows from Proposition 4.1 that \(\Vert u-u_{K}\Vert _{L^p(\varOmega ,H^1_0(D))} \le C_{a,f,\phi _j} K^{-r}\), for any \(p < p_*\) and \(\,0<r<1/2\). Thus, Hölder’s inequality implies that the \(L^p\)-norm of the first term is \(\mathcal{O }(K^{-r})\), for any \(p < \left( p_*^{-1} + q_*^{-1} \right) ^{-1}\) and \(\,0<r<1/2\), with a constant that is independent of \(h\) and \(K\).

As noted above, it follows from [5, §7] that Assumptions A1–A2 are satisfied for the truncated expansion \(a_K\) of a log-normal random field whose logarithm has 1-norm exponential covariance. Since Assumption A3 is also assumed to hold, it follows as in Corollary 3.1 from Hölder’s inequality that the \(L^p\)-norm of the second term in (4.2) is \(\mathcal{O }(h^{2s})\), for any \(p < \left( p_*^{-1} + q_*^{-1} \right) ^{-1}\) and \(\,0<s<1/2\), with a constant that is independent of \(h\) and \(K\).

These results suggest that, to balance the two error contributions, we should choose \(K_\ell \) proportional to a negative power of \(h_\ell \). Note that a similar strategy was already suggested in the context of the related Brinkman problem in [19]. However, there, a certain decay rate for the error with respect to the number of KL-modes \(K\) was assumed. Here we make no such assumption and instead use Proposition 4.1 for the 1-norm exponential covariance. We have the following results for the multilevel Monte Carlo convergence rates in Theorem 2.3.

Proposition 4.2

Provided Assumption F1 is satisfied with \(t_* \ge \frac{1}{2}\) and \(K_\ell \gtrsim h_\ell ^{-2}\), for all \(\ell =0,\ldots ,L\), then the convergence rate of the multilevel Monte Carlo method in §2.2 does not deteriorate when approximating the functional \(M_{\omega }(u_{h_\ell })\) by \(Q_{h_\ell } := M_{\omega }(u_{K_\ell ,h_\ell })\) on each level \(\ell \). In particular, let the assumptions of Corollary 4.1 be satisfied with \(p_* > 2\) and \(q_* > \frac{2p_*}{p_*-2}\). Then the Assumptions M1–M2 in Theorem 2.3 hold for any \(\alpha < 1\) and \(\beta < 2\). If Assumption F1 is satisfied only for some \(t_* < 1/2\), then \(K_\ell \gtrsim h_\ell ^{-(1+2t_*)}\) is a sufficient condition.

Proof

The proof is analogous to that of Corollary 3.1 using Corollary 4.1. The final statement follows from Remark 3.2.

As before, in the presence of quadrature error (cf. Remark 3.2), we will not be able to get \(\mathcal{O }(h^{2s})\) convergence for the second term in (4.2) for the approximate finite element solution \(\tilde{u}_{K,h}\). Due to the loss of Galerkin orthogonality for the primal problem, it is in general only possible to prove \( |M_{\omega }(u)-M_{\omega }(\tilde{u}_{K,h})| = \mathcal{O } \left( h^s + K^{-s}\right) \). Thus, with the quadrature error taken into account, the optimal choice is \(K_\ell \gtrsim h_\ell ^{-1}\) for all functionals which satisfy Assumption F1 with \(t_* \ge 1/2\), and we will always use this choice in our numerical tests in the next section.
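The resulting truncation levels for the two regimes compared in the numerical tests ("keep" a fixed number of modes versus "drop" modes on coarse levels) can be tabulated as follows; the grid hierarchy \(h_\ell = 2^{-(\ell +1)}\) is an illustrative choice:

```python
# Level-dependent truncation rule K_l = h_l^{-1} (quadrature included,
# t_* >= 1/2); without quadrature error, Proposition 4.2 would require
# K_l ~ h_l^{-2} instead. The hierarchy h_l = 2^{-(l+1)} is illustrative.
h = [2.0 ** (-(l + 1)) for l in range(8)]
K_drop = [round(1 / hl) for hl in h]     # level-dependent: fewer modes on coarse levels
K_keep = [round(1 / h[-1])] * len(h)     # fixed (large) number of modes on all levels
```

On the coarsest level the "drop" regime thus uses only a handful of modes, which is what makes very coarse grids viable in the experiments below.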

Let us finish this section with some comments on truncated expansions \(a_K=\exp (g_K)\) of log-normal fields with other covariance functions. The convergence rate of \(|M_{\omega }(u)-M_{\omega }(u_{K})|\) depends in general on the rate of decay of the KL-eigenvalues \(\theta _n\) and on the rate of growth of \(\Vert \nabla b_n\Vert _\infty \). If we assume that \(|M_{\omega }(u_K)-M_{\omega }(u_{K,h})| = \mathcal{O }(h^{s})\) and \(|M_{\omega }(u)-M_{\omega }(u_{K})| = \mathcal{O }(K^{-\sigma })\), for some \(0<s\le 1\) and \(0 < \sigma < \infty \), then the number of KL-terms in a multilevel Monte Carlo method on each level should satisfy \(K_{\ell } \gtrsim h_\ell ^{-\frac{s}{\sigma }}\). For smoother fields (e.g. with covariance functions from the Matérn class with large parameters), \(\frac{s}{\sigma }\) will usually be significantly smaller than \(1\), and thus the number of KL-terms only needs to grow very slowly from level to level.

The only other rigorous results regarding convergence rates for truncated expansions \(a_K=\exp (g_K)\) of log-normal fields—except those for the 1-norm exponential covariance above—are for the case of a Gaussian covariance function

$$\begin{aligned} \mathbb{E }\Big [(g(\omega ,x)\!-\!\mathbb{E }[g(\omega ,x)])(g(\omega ,y)\!-\!\mathbb{E }[g(\omega ,y)]) \Big ]=\, \sigma ^2 \exp (-\Vert x\!-\!y\Vert _2^2/ \lambda ^2), \end{aligned}$$
(4.4)

for \(g\) with \(\sigma ^2\) and \(\lambda \) as in (2.3). In this case, provided the mean is sufficiently smooth, we have \(a(\omega , \cdot ) \in C^\infty (\overline{D})\) and

$$\begin{aligned} |M_{\omega }(u)-M_{\omega }(u_{K,h})| \, \lesssim C_{a,f,\phi _j} \left( h^2 \,+\, \exp \big (-c_1 \, K^{1/d}\big )\right) , \end{aligned}$$

for some \(c_1>0\) (cf. [6]), where \(d\) is again the spatial dimension. Thus, \(K_\ell \) only needs to be increased logarithmically with \(h_{\ell }^{-d}\) in this case.

However, all these results are asymptotic results, as \(h_\ell \rightarrow 0\), and thus they only guarantee that level-dependent truncations do not deteriorate the performance of the multilevel Monte Carlo method asymptotically as the tolerance \(\varepsilon \rightarrow 0\). The real benefit of using level-dependent truncations is in absolute terms for a fixed tolerance \(\varepsilon \), since the smoother fields potentially allow the use of coarser levels and thus significant gains in the absolute cost of the algorithm. In the next section, we see that this is in fact the case and we show the gains that are possible, especially for covariance functions with short correlation length \(\lambda \).

4.2 Numerics

To be able to deal with very short correlation lengths in a reasonable time, we start with the 1D equivalent of model problem (3.10), on \(D=(0,1)\). We take \(a\) to be a log-normal random field, with \(\log (a)\) having exponential covariance function (2.3), with correlation length \(\lambda =0.01\) and variance \(\sigma ^2=1\). The results in Fig. 3 are for point evaluation of the pressure, i.e. \(M^{(1)}(u)\) from §3.4 with \(x^*=2049/4096\). Similar gains can be obtained for other quantities of interest.

In the left plot in Fig. 3, we study the behaviour of \(\mathbb{V }[Q_{h_\ell }-Q_{h_{\ell -1}}]\) and \(\mathbb{V }[Q_{h_\ell }]\). When \(\mathbb{V }[Q_{h_\ell }-Q_{h_{\ell -1}}] \ge \mathbb{V }[Q_{h_\ell }]\), there is no benefit in including level \(\ell -1\) in the multilevel estimator, since it would only increase the cost of the estimator. We can see that if we approximate \(a\) with a (large) fixed number of modes on each level (labelled “keep” in Fig. 3), we should not include any levels coarser than \(h_0 = 1/64 \, (\approx \lambda )\) in the estimator, as was already observed in [8]. With the level-dependent regime (labelled “drop”), however, it is viable to include levels as coarse as \(h_0 = 1/2\). This leads to significant reductions in computational cost, as is shown in the right plot in Fig. 3.

In the right plot in Fig. 3, we fix the required tolerance for the sampling error (i.e. the standard deviation of the estimator) at \(\delta =10^{-3}\), and look at how the cost of the different estimators grows as we decrease the mesh size \(h:= h_L\) of the finest grid, with each line in the plot using a fixed number of grid levels in the multilevel simulation (e.g. 4L means 4 levels). The computational cost of the multilevel estimator is calculated as \(N_0 h_0^{-1} + \sum _{\ell =1}^L N_\ell (h_\ell ^{-1} + h_{\ell -1}^{-1})\) work units, since we know that \(\gamma =1\) in M3 for \(d=1\). To make the estimators comparable, for each finest grid \(h_L\), the standard Monte Carlo estimator is computed with \(K_L=h_L^{-1}\) modes, the “MLMC keep” estimator is computed with \(K_\ell =h_L^{-1}\) modes on all levels, and the “MLMC drop” estimator is computed with a varying number \(K_\ell =h_\ell ^{-1}\) modes on the levels. We clearly see the benefit of using the level-dependent multilevel estimator. For example, on the grid of size \(h=1/2048\), the cheapest multilevel estimator with a fixed number of modes is the 4 level estimator, which has a cost of \(8.6 \times 10^5\) work units. The cheapest level-dependent multilevel estimator, on the other hand, is the 7 level estimator, whose computational cost is only \(1.8 \times 10^5\) units. For comparison, the cost of the single-level MC estimator on this grid is \(2.8 \times 10^6\) units.
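The cost model above, combined with the standard MLMC sample-size formula \(N_\ell \propto \sqrt{V_\ell /C_\ell }\), can be sketched as follows. The variance constants in the sketch are illustrative assumptions (they are not fitted to the experiments); the work-per-sample model \(N_0 h_0^{-1} + \sum _{\ell } N_\ell (h_\ell ^{-1} + h_{\ell -1}^{-1})\) is the one used in the text:

```python
import numpy as np

def mlmc_cost(L, h0=0.5, delta=1e-3, v0=0.05, v1=0.1):
    """Total work (gamma = 1, so h^{-1} units per sample) of an (L+1)-level
    estimator with sampling error delta. Variance model (assumed):
    V[Q_{h_0}] = v0 and V[Q_l - Q_{l-1}] = v1 * h_l^2."""
    h = h0 * 2.0 ** (-np.arange(L + 1))
    V = np.concatenate([[v0], v1 * h[1:] ** 2])
    C = np.concatenate([[1 / h[0]], 1 / h[1:] + 1 / h[:-1]])   # work per sample
    # Standard optimal sample sizes: N_l = delta^-2 sqrt(V_l/C_l) sum_k sqrt(V_k C_k).
    N = np.ceil(delta ** -2 * np.sqrt(V / C) * np.sum(np.sqrt(V * C)))
    return float(N @ C)

costs = {L: mlmc_cost(L) for L in range(1, 8)}
```

Under these assumed constants the multilevel estimator on the finest grid is cheaper than a single-level Monte Carlo estimator of the same accuracy by more than an order of magnitude, in line with the work-unit comparisons reported in the text.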

Fig. 3
figure 3

Left Plot of \(\mathbb{V }\big [M^{(1)}(u_{h})\big ]\) and \(\mathbb{V }\big [M^{(1)}(u_{h})-M^{(1)}(u_{2h})\big ]\), for (3.10) with \(d=1,\lambda =0.01,\sigma ^2=1, K_\ell =h_\ell ^{-1},h^*=1/4096,K^*=4,\!096\) and \(x^*=2,\!049/4,\!096\). Right Plot of cost versus \(1/h\) for a fixed tolerance of the sampling error of \(\delta =10^{-3}\), for the same model problem

An important point we would like to make here is that not only do the level-dependent estimators have a smaller absolute cost than the estimators with a fixed number of modes, they are also a lot more robust with respect to the coarse grids included. On the \(h=1/2048\) grid, the 11 level estimator (i.e. \(h_0=1/2\)) with fixed \(K\) costs \(1.1 \times 10^{7}\) units, which is 4 times the cost of the standard MC estimator. The 11 level estimator with level-dependent \(K_\ell \) costs \(2.4 \times 10^5\) units, which is only marginally more than the best level-dependent estimator (the 7 level estimator).

For practical purposes, the real advantage of the level-dependent approach is evident on coarser grids. We see in Fig. 3 that on grids coarser than \(h=1/256\), all multilevel estimators with a fixed number of modes are more expensive than the standard MC estimator. With the level-dependent multilevel estimators on the other hand, we can make use of (and benefit from) multilevel estimators on grids as coarse as \(h=1/64\). This is very important, especially in the limit as the correlation length \(\lambda \rightarrow 0\), as eventually all computationally feasible grids will be “coarse” with respect to \(\lambda \). With the level-dependent estimators, we can benefit from the multilevel approach even for very small values of \(\lambda \).

Let us now move on to a model problem in 2D. We will study the flow cell model problem (3.11) on \(D=(0,1)^2\), and take the outflow functional \(M^{(4)}_\omega (u)\) from §3.4 as our quantity of interest. As in §3.5, we choose the weight function \(\psi = x_1\). We choose \(a\) to be a log-normal random field such that \(\log (a)\) has 1-norm exponential covariance function (2.3), with \(\lambda =0.1\) and \(\sigma ^2=1\).

Fig. 4

Left: plot of \(\mathbb{V }\big [M^{(4)}_\omega (u_{h})\big ]\) and \(\mathbb{V }\big [M^{(4)}_\omega (u_{h})-M^{(4)}_\omega (u_{2h})\big ]\), for (3.11) with \(d=2,\lambda =0.1,\sigma ^2=1, \psi =x_1,K_\ell =4h_\ell ^{-1},h^*=1/256\) and \(K^*=1,\!024\). Right: plot of cost (CPU-time) versus \(1/h\) for a fixed tolerance of the sampling error of \(\delta =10^{-3}\), for the same model problem

The left plot in Fig. 4 is similar to the left plot in Fig. 3. We again see that the level-dependent regime allows for much coarser grids. In the right plot, we see the gains in computational cost that are possible with the level-dependent estimators. Since we do not know the value of \(\gamma \) in (M3) theoretically, we quantify the cost of the estimators by the CPU-time. The results shown are calculated with a Matlab implementation on a 3GHz Intel Core 2 Duo E8400 processor with 3.2 GByte of RAM, using the sparse direct solver provided in Matlab through the standard backslash operation to solve the linear systems for each sample. On the finest grid \(h=1/256\), we clearly see a benefit from the level-dependent estimators. The cheapest multilevel estimator with a fixed number of modes is the 5 level estimator, which takes 13.5 min. The cheapest level-dependent estimator is the 7 level estimator, which takes only 2.5 min. For comparison, the standard MC estimator takes more than 7.5 h.

5 Domains with corners and discontinuous coefficients

We now come to the last and most technical part of the paper. The first aim is to prove Theorem 2.1, i.e. to extend the regularity results in [6] to piecewise \(C^2\) domains. In this situation, the solution \(u\) can have singularities near the non-smooth parts of the boundary \(\varGamma \), i.e. near corners in 2D and near corners and edges in 3D. These singularities can reduce the overall regularity of \(u\), and hence need to be analysed. However, we will see in §5.1 that under Assumptions A1–A2, this question can be reduced to analysing the singularities of the Laplace operator on \(D\). We will follow [23, §5.2], and as in [6] we will again establish the result first “pointwise” almost surely in \(\omega \in \varOmega \). The key technicality will again be to track how the constants in all the necessary estimates, in particular in the semi-Fredholm property of the underlying random differential operator, depend on \(\omega \). In §5.2, we then extend the results also to the practically very important case where the coefficient \(a(\omega ,x)\) is discontinuous.

5.1 Regularity of random differential operators in domains with corners

Let us recall that \(D\) was assumed to be a bounded, Lipschitz polygonal/polyhedral domain in \(\mathbb{R }^d,d = 2, 3\), and that \(\lambda _\varDelta (D) \in (0,1]\) is the largest number such that for all \(0 < s \le \lambda _\varDelta (D), \, s \ne \frac{1}{2}\), the Laplace operator with homogeneous Dirichlet boundary conditions is surjective as an operator from \(H^{1+s}(D) \cap H^1_0(D)\) to \(H^{s-1}(D)\) (cf. Definition 2.1). As in [23, §5.2], for simplicity we actually consider \(D\) to be a piecewise \( C^2\) domain and restrict ourselves for the most part to \(\mathbb{R }^2\). However, we will also comment on the case \(d=3\) in Remark 5.2(c) below. We again write the boundary \(\varGamma \) as \(\varGamma = \cup _{j=1}^m \varGamma _j\), where now in 2D each \(\varGamma _j\) is an open arc of curve of class \( C^2\), and \(\overline{\varGamma }_j\) meets \(\overline{\varGamma }_{j+1}\) at \(S_j\) (where we identify \(\varGamma _{m+1}\) and \(\varGamma _1\)). We consider only domains with boundaries that are rectilinear near the corners, which of course includes Lipschitz polygonal/polyhedral domains. This means that at each corner \(S_j\), we can find a polygonal domain \(W_j \subset D\) such that the boundary \(\partial W_j\) coincides with \(\varGamma \) near \(S_j\).

By the Lax-Milgram Theorem, a unique variational solution \(u(\omega , \cdot ) \in H^1_0(D)\) to our model problem (2.1) in the curvilinear polygon \(D\) exists for almost all \(\omega \in \varOmega \) (i.e. for all \(\omega \in \varOmega \) with \(a_\mathrm{min}(\omega ) > 0\) and \(a_\mathrm{max}(\omega ) < \infty \)). Using Assumptions A1–A3, we can conclude as in [6] that \(u \in L^p(\varOmega , H^1(D))\), for all \(p< p_*\). The fact that \(D\) is no longer \( C^2\) is of no relevance here. To prove more spatial regularity on \(u\), we will now follow the proof in §5.2 of [23].

For a given \(\omega \in \varOmega \), with \(a_\mathrm{min}(\omega ) > 0\) and \(a_\mathrm{max}(\omega ) < \infty \), we define the differential operator

$$\begin{aligned} A_\omega u = -\nabla \cdot (a(\omega , \cdot ) \nabla u). \end{aligned}$$

The following key result, which is based on [28, Theorem 5.26], is proved via a homotopy method in the proof of [23, Lemma 5.2.5], for \(s=1\). The proof for \(s < 1\) is analogous.

Lemma 5.1

Let \(m=1\) and \(\omega \in \varOmega \). If \(0 < s \le \lambda _\varDelta (D)\) and if there exists \(C_{\scriptscriptstyle \mathrm{semi}}(\omega ) > 0\) such that

$$\begin{aligned} \Vert v\Vert _{H^{1+s}(D)} \le C_{\scriptscriptstyle \mathrm{semi}}(\omega ) \Vert A_\omega v \Vert _{H^{s-1}(D)}, \quad \text{ for } \text{ all }\, \ v \in H^{1+s}(D) \cap H^1_0(D), \end{aligned}$$
(5.1)

then \(A_\omega \) is surjective from \(H^{1+s}(D) \cap H^1_0(D)\) to \(H^{s-1}(D)\).

Thus, if we can establish (5.1), which essentially means that \(A_\omega \) is semi-Fredholm as an operator from \(H^{1+s}(D) \cap H^1_0(D)\) to \(H^{s-1}(D)\), for some \(s \le \lambda _\varDelta (D)\), then we can also deduce the regularity of solutions of the stochastic variational problem (2.1). The following lemma essentially follows [23, Lemma 5.2.3]. However, in the case of a random coefficient, we crucially need to make sure that the constant \(C_{\scriptscriptstyle \mathrm{semi}}(\omega )\) in (5.1) has sufficiently many moments as a random variable on \(\varOmega \). To ensure this we need to carefully track the dependence on \(a\) in the bounds in [23, Lemma 5.2.3].

Lemma 5.2

Let \(m \in \mathbb N \) and let Assumptions A1–A2 hold for some \(0< t \le 1\). Then (5.1) holds for all \(0 < s < t\) and \(s \le \lambda _\varDelta (D),s \ne \frac{1}{2}\), with

$$\begin{aligned} C_{\scriptscriptstyle \mathrm{semi}}(\omega ) \,:=\, \frac{a_\mathrm{max}(\omega )\Vert a(\omega , \cdot )\Vert ^2_{ C\,^{t}(\overline{D})}}{a_\mathrm{min}(\omega )^4}. \end{aligned}$$
(5.2)

In the case \(t= \lambda _\varDelta (D)=1\), (5.1) also holds for \(s=1\), i.e. for the \(H^2(D)\)-norm.

Proof

We first consider the case where \(m=1\) and \(t = \lambda _\varDelta (D)=1\). Note that the case \(m=1\) is all that is needed to prove Lemma 5.1. We prove the more general case \(m \in \mathbb N \) so that we can apply the bound (5.1) to any polygonal domain in the proof of Theorem 2.1. For ease of notation, we suppress the dependence on \(\omega \) in the coefficient, and denote \(A_\omega \) simply by \(A\) and \(a(\omega ,x)\) by \(a(x)\).

We will prove (5.1) by combining the regularity results of \(A\) in \( C^2\) domains, with regularity results of the Laplace operator \(-\varDelta \) on polygonal domains. Since we assume that \(\varGamma \) is rectilinear near \(S_1\), we can find a polygonal domain \(W\) such that \(W \subset D\) and \(\partial W\) coincides with \(\varGamma \) near \(S_1\). Let \(v \in H^{2}(D) \cap H^1_0(D)\) and let \(\eta \) be a smooth cut-off function with support in \(W\), such that \(\eta \equiv 1\) near \(S_1\) and then consider \(\eta v\) and \((1-\eta )v\) separately. We start with \(\eta v\).

Let \(w \in H^2(W) \cap H^1_0(W)\). Since \(\lambda _\varDelta (D)=1\), we have for any polygonal domain \(W\) the estimate

$$\begin{aligned} \Vert w\Vert _{H^2(W)} \,\lesssim \, \Vert \varDelta w \Vert _{L^2(W)}\,, \end{aligned}$$
(5.3)

where the hidden constant depends only on \(W\) (cf. [23]). Hence,

$$\begin{aligned} a(S_1) \,\Vert w\Vert _{H^2(W)}&\lesssim \Vert A w \Vert _{L^2(W)} + \Vert A w - a(S_1) \,\varDelta w \Vert _{L^2(W)} \\&\lesssim \Vert A w \Vert _{L^2(W)} + |(a(\cdot )-a(S_1) ) \nabla w |_{H^1(W)} \,. \end{aligned}$$

Now, using [6, Lemma A.2] (see also Theorem 6.2.25 in [25]) we get

$$\begin{aligned} a(S_1) \,\Vert w\Vert _{H^2(W)}&\lesssim \Vert A w \Vert _{L^2(W)}+ |a|_{ C\,^1(\overline{W})}|w|_{H^1(W)}\nonumber \\&+ \Vert a-a(S_1)\Vert _{ C\,^0(\overline{W})}\Vert \nabla w \Vert _{H^1(W)} \, . \end{aligned}$$
(5.4)

Using integration by parts and the fact that \(w=0\) on \(\partial W\), we have

$$\begin{aligned} a_\mathrm{min}\,|w|^2_{H^1(W)} \,\le \, \int _W a |\nabla w|^2 \, \mathrm{d} x\,=\, \int _W w \nabla \cdot (a \nabla w) \, \mathrm{d} x, \end{aligned}$$

and so via the Cauchy-Schwarz and the Poincaré inequalities

$$\begin{aligned} |w|_{H^1(W)} \lesssim \frac{1}{a_\mathrm{min}} \Vert A w \Vert _{L^2(W)}. \end{aligned}$$
(5.5)

Denote now by \(C\) the best constant such that (5.4) holds. Since \(a\) was assumed to be in \(C\,^1(\overline{W})\), we can choose \(W\) (and hence the support of \(\eta \)) small enough so that

$$\begin{aligned} C \Vert a-a(S_1)\Vert _{ C\,^0(\overline{W})} \le \frac{1}{2} \, a(S_1). \end{aligned}$$
(5.6)

Then, substituting (5.5) and (5.6) into (5.4) and using \(\Vert \nabla w \Vert _{H^1(W)} \le \Vert w\Vert _{H^2(W)}\) and \(a_\mathrm{min} \le a_\mathrm{max}\) we have

$$\begin{aligned} a(S_1) \,\Vert w\Vert _{H^2(W)}&\le 2 C \left( 1 \!+\! \frac{|a|_{ C\,^1(\overline{W})}}{a_\mathrm{min}} \right) \Vert A w \Vert _{L^2(W)} \! \lesssim \! \frac{ \Vert a\Vert _{ C\,^1(\overline{W})}}{a_\mathrm{min}} \Vert A w \Vert _{L^2(W)}\,. \end{aligned}$$
(5.7)

Since \(v \in H^2(D) \cap H^1_0(D)\) and \(W\) contains the support of \(\eta \), we have \(\eta v \in H^2(W) \cap H^1_0(W)\) and so estimate (5.7) applies to \(\eta v\). Thus

$$\begin{aligned} \Vert \eta v\Vert _{H^2(D)} \lesssim \frac{ \Vert a\Vert _{ C\,^1(\overline{W})}}{a_\mathrm{min}^2} \Vert A (\eta v) \Vert _{L^2(W)}. \end{aligned}$$

Let us move on to \((1-\eta )v\). Let \(D^{\prime } \subset D\) be a \( C^2\) domain that coincides with \(D\) outside of the region where \(\eta = 1\). This is always possible due to our assumptions on the geometry of \(D\) near \(S_1\). Then \((1-\eta ) v \in H^2(D^{\prime }) \cap H^1_0(D^{\prime })\), and using [6, Proposition 3.1] we have

$$\begin{aligned} \Vert (1-\eta ) v\Vert _{H^{2}(D)} \lesssim \frac{a_\mathrm{max}\Vert a\Vert _{ C\,^{1}(\overline{D}^{\prime })}}{a_\mathrm{min}^3} \, \Vert A \left( (1-\eta ) v\right) \Vert _{L^{2}(D^{\prime })}. \end{aligned}$$

Adding the last two estimates together and using the triangle inequality, we have

$$\begin{aligned} \Vert v\Vert _{H^2(D)} \lesssim \frac{ \Vert a\Vert _{ C\,^1(\overline{D})}}{ a_\mathrm{min}^2} \left( \Vert A (\eta v) \Vert _{L^2(W)} +\frac{a_\mathrm{max}}{a_\mathrm{min}} \, \Vert A ((1-\eta ) v)\Vert _{L^{2}(D^{\prime })}\right) . \end{aligned}$$
(5.8)

It remains to bound the term in the bracket on the right hand side of (5.8) in terms of \(\Vert A v \Vert _{L^2(D)}\). Note that

$$\begin{aligned} A(\eta v) = \eta (A v) + 2 a \nabla \eta \cdot \nabla v + (A \eta ) v. \end{aligned}$$

Thus, applying the triangle inequality and using the fact that \(\eta \) was assumed to be smooth with \(0 \le \eta \le 1\), we get

$$\begin{aligned} \Vert A (\eta v) \Vert _{L^2(W)} \lesssim \Vert A v \Vert _{L^2(W)} + a_\mathrm{max} |v|_{H^1(W)} + \Vert a\Vert _{ C\,^{1}(\overline{D})} \Vert v \Vert _{L^2(W)}. \end{aligned}$$
(5.9)

The hidden constant depends on \(\Vert \nabla \eta \Vert _{L^\infty (W)}\) and on \(\Vert \varDelta \eta \Vert _{L^\infty (W)}\). Finally using Poincaré’s inequality on all of \(D\), as well as an elliptic estimate similar to (5.5) for \(v\), i.e. \(|v|_{H^1(D)} \le \Vert A v \Vert _{L^2(D)} / a_\mathrm{min}\), leads to

$$\begin{aligned} \Vert A (\eta v) \Vert _{L^2(W)} \lesssim \frac{\Vert a\Vert _{ C\,^{1}(\overline{D})}}{a_\mathrm{min}} \Vert A v \Vert _{L^2(D)}. \end{aligned}$$

Substituting this and the corresponding bound for \(\Vert A ((1- \eta ) v) \Vert _{L^2(D^{\prime })}\) into (5.8), we finally get

$$\begin{aligned} \Vert v\Vert _{H^2(D)} \,\lesssim \, \frac{a_\mathrm{max}\Vert a\Vert ^2_{ C\,^{1}(\overline{D})}}{a_\mathrm{min}^4} \, \Vert A v\Vert _{L^{2}(D)}, \end{aligned}$$

for all \(v \in H^2(D) \cap H^1_0(D)\). This completes the proof for the case \(m=1\) and \(t=\lambda _\varDelta =1\).

The proof for \(t<1\) and/or \(\lambda _\varDelta (D) < 1\) follows exactly the same lines. Instead of (5.3), we start with the estimate

$$\begin{aligned} \Vert w\Vert _{H^{1+s}(W)} \lesssim \Vert \varDelta w \Vert _{H^{s-1}(W)}, \end{aligned}$$
(5.10)

which holds for any \(0< s \le \lambda _\varDelta (D),s \ne \frac{1}{2}\), and the hidden constant depends again only on \(W\) (cf. [3] for example). Using [6, Lemma A.1] (see also Theorem 9.1.12 in [25]), one can derive the following equivalent of (5.4), for any \(s \ne \frac{1}{2}\):

$$\begin{aligned} a(S_1) \,\Vert w\Vert _{H^{1+s}(W)}&\lesssim \Vert A w \Vert _{H^{s-1}(W)} + |a|_{ C\,^t(\overline{W})}| w |_{H^1(W)}\nonumber \\&+ \Vert a-a(S_1)\Vert _{ C\,^0(\overline{W})}\Vert \nabla w \Vert _{H^s(W)}\,. \end{aligned}$$

As before, the \(|w|_{H^1(W)}\) term can be bounded using integration by parts, Hölder’s inequality and the Poincaré inequality:

$$\begin{aligned} a_\mathrm{min} \, | w |^2_{H^1(W)}&\le \Vert w\Vert _{H^{1-s}(W)} \Vert A w \Vert _{H^{s-1}(W)} \lesssim \Vert w\Vert _{H^{1}(W)} \Vert A w \Vert _{H^{s-1}(W)}\\&\lesssim |w|_{H^1(W)} \Vert A w \Vert _{H^{s-1}(W)}. \end{aligned}$$

The remainder of the proof requires only minor modifications.

The case \(m>1\) is treated by repeating the above procedure with a different cut-off function \(\eta _j\) at each corner \(S_j\). Estimate (5.7) applies to \(\eta _j v\), for all \(j=1,\dots ,m\), and the regularity estimate for \( C^2\) domains from [6] applies to \((1-\sum _{j=1}^m \eta _j) v\).

Remark 5.1

Lemma 5.2 excludes the case \(s=\frac{1}{2}\). However, an inequality very similar to (5.1) can easily be proved also in this case. Since \(\Vert v\Vert _{H^{1+s}(D)} \le \Vert v\Vert _{H^{1+t}(D)}\) for any \(s \le t\), the norm \(\Vert v\Vert _{H^{3/2}(D)}\) can also be bounded, as in (5.1), if the \(H^{1/2}(D)\)-norm on the right hand side is replaced by the \(H^{1/2+\delta }(D)\)-norm, for some \(\delta > 0\).

We are now ready to prove Theorem 2.1 for \(d=2\). For the case \(d=3\), see Remark 5.2(c).

Proof of Theorem 2.1

Let \(d=2\) and suppose \(u = u(\omega ,\cdot )\) is the unique solution of (2.1). Let us first consider the case \(\phi \equiv 0\). In this case, the fact that \(u \in H^{1+s}(D) \cap H^1_0(D)\) and the bound on \(\Vert u\Vert _{H^{1+s}(D)}\) in (2.6) follow immediately from Lemmas 5.1 and 5.2, for any \(s < t\) and \(s \le \lambda _\varDelta (D)\), as well as for \(s=1\) if \(t=\lambda _\varDelta (D)=1\), since \(f = Au\).

The case \(\phi \ne 0\) now follows from a simple trace theorem, see e.g. §1.4 in [24]. We will only show the proof for \(t=\lambda _\varDelta (D)=1\) in detail. Due to Assumption A3 we can choose \(\phi \in H^2(D)\) with \(\Vert \phi \Vert _{H^2(D)} \lesssim \sum _{j=1}^m \Vert \phi _j\Vert _{H^{3/2}(\varGamma _j)}\), and so \(f_0 := f - A\phi \in L^2(D)\). Since \(u_0 := u - \phi \in H^1_0(D)\), we can apply the result we just proved for the case \(\phi \equiv 0\) to the problem \(A u_0 = f_0\) to get

$$\begin{aligned} \Vert u_0\Vert _{H^2(D)}&\lesssim C_{\scriptscriptstyle \mathrm{semi}}(\omega ) \left( \Vert A u_0 \Vert _{L^2(D)} + \Vert A \phi \Vert _{L^2(D)}\right) \\&\lesssim C_{\scriptscriptstyle \mathrm{semi}}(\omega ) \left( \Vert f\Vert _{L^2(D)} + \Vert a\Vert _{C\,^1(\overline{D})} \, \Vert \phi \Vert _{H^2(D)} \right) , \end{aligned}$$

where in the last step we have used [6, Lemma A.2]. The claim of the Theorem then follows by the triangle inequality.

Remark 5.2

  (a)

    The behaviour of the Laplace operator near corners is described in detail in [23, 24]. In particular, in the pure Dirichlet case for convex domains we always get \(\lambda _\varDelta (D) = 1\). For non-convex domains \(\lambda _\varDelta (D) = \min _{j=1}^m \pi /\theta _j\), where \(\theta _j\) is the angle at corner \(S_j\). Hence, \(\lambda _\varDelta (D) > 1/2\) for any Lipschitz polygonal domain.

  (b)

    In a similar manner one can prove regularity of \(u\) also for Neumann and mixed Dirichlet/Neumann boundary conditions provided the boundary conditions are compatible, as in our model problem (3.11). For example, in order to apply the same proof technique used here at a point where a Dirichlet and a homogeneous Neumann boundary meet, we can first reflect the problem across the Neumann boundary. Then we apply the above theory on the union of the original and the reflected domain. The regularity for the Laplacian is in general lower in the mixed Dirichlet/Neumann case than in the pure Dirichlet case. In particular, full regularity (i.e. \(\lambda _\varDelta (D) = 1\)) is only possible if all mixed angles are less than \(\pi /2\). For an arbitrary Lipschitz polygonal domain we can only guarantee \(\lambda _\varDelta (D) > 1/4\).

  (c)

    The 3D case is similar, but in addition to singularities at corners (for which the analysis is identical to the above) we also need to consider edge singularities. This is more involved and we refer to [23, §8.2.1] for more details. However, provided \(D\) is convex, we obtain again \(\lambda _\varDelta (D) = 1\) in the pure Dirichlet case.
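The corner-angle rule from part (a) can be sketched in a couple of lines; the helper below is purely illustrative (not from the paper) and covers only the pure Dirichlet case in 2D:

```python
import math

def lambda_delta_bound(angles):
    """Bound on lambda_Delta(D) for the Dirichlet Laplacian on a polygon with
    interior angles `angles` (in radians): min_j pi/theta_j, capped at 1 to
    respect the convention lambda_Delta(D) in (0, 1]."""
    return min(1.0, min(math.pi / theta for theta in angles))

print(lambda_delta_bound([math.pi / 2] * 4))  # convex square: 1.0
# L-shaped domain: one re-entrant corner of angle 3*pi/2 gives 2/3
print(lambda_delta_bound([math.pi / 2] * 5 + [3 * math.pi / 2]))
```

Since any Lipschitz polygon has all interior angles strictly below \(2\pi \), the value returned is always above \(1/2\), consistent with part (a).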

5.2 Discontinuous coefficients

We now shift our attention to the random coefficient \(a(\omega ,x)\).

In practice, one is often interested in models with discontinuous coefficients, e.g. modelling different rock strata in the subsurface. Such coefficients do not satisfy Assumption A2, and the regularity results from Theorem 2.1 cannot be applied directly. However, this loss of regularity is confined to the interface between different strata, and it is still possible to prove a limited amount of regularity even globally.

Let us consider (2.1) on a Lipschitz polygonal domain \(D \subset \mathbb{R }^2\) that can be decomposed into disjoint Lipschitz polygonal subdomains \(D_k,k=1,\ldots ,K\). Let \(PC\,^{t}(\overline{D}) \subset L^\infty (D)\) denote the space of piecewise \(C\,^{t}\) functions with respect to the partition \(\{D_k\}_{k=1}^K\) (up to the boundary of each region \(D_k\)). We replace Assumption A2 by the following milder assumption on the coefficient function \(a\):

  • A2*. \(a \in L^{p}(\varOmega ,PC\,^{t}(\overline{D}))\), for some \(0 < t \le 1\) and for all \(p \in (0,\infty )\).

Our regularity results for discontinuous coefficients rely on the following result from [23, 31]. The proof of this result uses the fact that, for \(0 \le s<1/2\), a function \(w \in H^s(D_k)\) if and only if the extension \(\tilde{w}\) of \(w\) by zero is in \(H^s(\mathbb{R }^d)\).

Lemma 5.3

Let \(v \in H^1(D)\) and \(s< 1/2\), and suppose that \(v \in H^{1+s}(D_k)\), for all \(k=1,\ldots ,K\). Then \(v \in H^{1+s}(D)\) and

$$\begin{aligned} \Vert v\Vert _{H^{1+s}(D)} \lesssim \Vert v\Vert _{H^{1}(D)} + \sum _{k=1}^K |v|_{H^{1+s}(D_k)} \,. \end{aligned}$$

Thus, we cannot expect more than \(H^{3/2-\delta }(D)\) regularity globally in the discontinuous case. However, as in the case of continuous fields, the regularity of the solution will also depend on the parameter \(t\) in Assumptions A2* and A3 (i.e. on the Hölder/Sobolev regularity of \(a\) and \(f\), respectively), as well as on the behaviour of \(A_\omega \) at any singular points. Since Lemma 5.3 restricts us to \(s < 1/2\) and since \(\lambda _\varDelta (D) > 1/2\) for any Lipschitz polygonal \(D\) in the case of a pure Dirichlet problem, we do not have to worry about corners. Instead we define the set of singular (or cross) points \(\mathcal S ^\times := \{S^\times _\ell :\ell =1,\ldots ,L\}\) to consist of all points \(S^\times _\ell \) in \(D\) where three or more subdomains meet, as well as all those points \(S^\times _\ell \) on \(\partial D\) where two or more subdomains meet. By the same arguments as in §5.1, the behaviour of \(A_\omega \) at these singular points is again fully described by studying transmission problems for the Laplace operator, i.e. elliptic problems with piecewise constant coefficients, locally near each singular point (cf. [9, 30, 31]).

Definition 5.1

Denote by \(T(\alpha _1,\ldots ,\alpha _K)\) the operator corresponding to the transmission problem for the Laplace operator with (constant) material parameter \(\alpha _k\) on subdomain \(D_k,k=1,\ldots ,K\). Let \(0 \le \lambda _{T}(D) \le 1\) be such that \(T(\alpha _1,\ldots ,\alpha _K)\) is a surjective operator from \(H^{ 1+s}(D) \cap H^1_0(D)\) to \(H^{s-1}(D)\), for any choice of \(\alpha _1, \ldots , \alpha _K\) and for \(s \le \lambda _{T}(D),s\not =1/2\). In other words, \(\lambda _{T}(D)\) is a bound on the order of the strongest singularity of \(T(\alpha _1,\ldots ,\alpha _K)\).

Without any assumptions on the partition \(\{D_k\}_{k=1}^K\) or any bounds on the constants \(\{\alpha _k\}_{k=1}^K\) it is in general not possible to choose \(\lambda _T(D)>0\). However, if no more than three regions meet at every interior singular point and no more than two at every boundary singular point, then we can choose any \(\lambda _T(D) \le 1/4\). If in addition each of the subregions \(D_k\) is convex, then we can choose any \(\lambda _T(D) < 1/2\), which due to the restrictions in Lemma 5.3 is the maximum we can achieve anyway. See for example [9, 30, 31] for details.

The following is an analogue of Theorem 2.1 on the regularity of the solution \(u\) of (2.1) for piecewise \(C\,^t\) coefficients. All the other results on the finite element discretisation error discussed above then of course again follow from this.

Theorem 5.1

Let \(D \subset \mathbb{R }^2\) be a Lipschitz polygonal domain and let \(\lambda _T(D) > 0\). Suppose Assumptions A1, A2*,A3 hold with \(t\le 1\). Then, the solution \(u\) of (2.1) is in \(L^p(\varOmega , H^{1+s}(D))\), for \(0 < s < t\) such that \(s \le \lambda _T(D)\) and for all \(p < p_*\).

Proof

Let us first consider \(\phi \equiv 0\) again. Then, the existence of a unique solution \(u(\omega ,\cdot ) \in H^1(D)\) of (2.1) follows again from the Lax-Milgram Theorem, for almost all \(\omega \in \varOmega \). Also note that, restricted to \(D_k\), the transmission operator \(T(\alpha _1,\ldots ,\alpha _K) = \alpha _k \varDelta \), for all \(k=1,\ldots ,K\). Therefore, using Assumption A2* we can prove as in §5.1 via a homotopy method that \(u(\omega ,\cdot )\) restricted to \(D_k\) is in \(H^{1+s}(D_k)\), for any \(s < t\) and \(s \le \lambda _T(D)\), for almost all \(\omega \in \varOmega \). The result then follows from Lemma 5.3 and an application of Hölder’s inequality. The case \(\phi \not \equiv 0\) follows as in the proof of Theorem 2.1 via a trace estimate.

As an example of a random coefficient that satisfies Assumption A2* for any \(t < 1/2\), we can consider a random field \(a = \exp (g)\) such that \(g|_{D_k} := g_k\), for all \(k=1,\ldots ,K\), where each \(g_k\) is an independent Gaussian random field with mean \(\mu _k \in C^{1/2}(\overline{D}_k)\) and exponential covariance function (2.3). In a similar manner, if we let each \(g_k\) be a Gaussian field with mean \(\mu _k \in C\,^1(\overline{D}_k)\) and Gaussian covariance function (4.4), then Assumption A2* is satisfied for any \(t\le 1\). The mean \(\mu _k(x)\), the variance \(\sigma _k^2\) and the correlation length \(\lambda _k\) can be vastly different from one subregion to another.
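As a small numerical illustration of such a piecewise field (a sketch under stated assumptions, not the paper's circulant-embedding code), one can sample an independent Gaussian field on each layer by dense Cholesky factorisation of the exponential covariance on a 1D slice; the layer parameters below mirror the channelised example of §5.3:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_field_1d(x, mu, sigma2, lam):
    # exponential covariance c(x, y) = sigma^2 * exp(-|x - y| / lam)
    C = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / lam)
    L = np.linalg.cholesky(C + 1e-12 * np.eye(len(x)))  # jitter for stability
    return mu + L @ rng.standard_normal(len(x))

x = np.linspace(0.0, 1.0, 61)
layers = [
    (x < 1 / 3, 0.0, 1.0, 0.3),                    # bottom layer: mu=0, lam=0.3
    ((x >= 1 / 3) & (x <= 2 / 3), 4.0, 1.0, 0.1),  # channel: mu=4, lam=0.1
    (x > 2 / 3, 0.0, 1.0, 0.3),                    # top layer
]
g = np.empty_like(x)
for mask, mu, s2, lam in layers:
    g[mask] = gaussian_field_1d(x[mask], mu, s2, lam)  # independent per layer
a = np.exp(g)  # piecewise log-normal coefficient a = exp(g)
```

Dense Cholesky scales cubically in the number of grid points, so it is used here purely for illustration; for realistic grids one would use the circulant embedding or Karhunen–Loève techniques discussed in the paper.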

5.3 Numerics

A rock formation often encountered in applications is a channelised medium. To simulate this, we divide \(D=(0,1)^2\) into 3 horizontal layers, and model the permeabilities in the 3 layers by 2 different log-normal distributions. The middle layer occupies the region \(\{1/3 \le x_2 \le 2/3\}\). The parameters in the top and bottom layer are taken to be \(\mu _1=0,\lambda _1=0.3\) and \(\sigma _1^2=1\), and for the middle layer we take \(\mu _2=4,\lambda _2=0.1\) and \(\sigma _2^2=1\) (assuming no correlation across layers). As a test problem we again choose the flow cell model problem (3.11). Samples from fields with 2-norm exponential covariance are produced using the circulant embedding technique already used in §3.5. Fields with Gaussian covariance are approximated by Karhunen–Loève expansions truncated after \(K^*=170\) terms. The eigenpairs of the covariance operator are computed numerically using a spectral collocation method.
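The truncated Karhunen–Loève construction has a simple discrete analogue, sketched below under the assumption that eigendecomposing the covariance matrix on a grid is acceptable (the paper instead computes the eigenpairs by spectral collocation; the parameter values here are illustrative):

```python
import numpy as np

def kl_modes(x, cov, K):
    """K leading eigenpairs of the covariance matrix on the grid x."""
    C = cov(x[:, None], x[None, :])
    w, V = np.linalg.eigh(C)       # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:K]  # keep the K largest
    return w[idx], V[:, idx]

def kl_sample(eigvals, eigvecs, rng):
    # truncated KL expansion: g = sum_k sqrt(lambda_k) * xi_k * phi_k
    xi = rng.standard_normal(len(eigvals))
    return eigvecs @ (np.sqrt(np.maximum(eigvals, 0.0)) * xi)

# Gaussian covariance with sigma^2 = 1 and lam = 0.3 (illustrative values)
cov = lambda s, t: np.exp(-(((s - t) / 0.3) ** 2))
x = np.linspace(0.0, 1.0, 101)
vals, vecs = kl_modes(x, cov, K=20)
g = kl_sample(vals, vecs, np.random.default_rng(1))
```

The rapid eigenvalue decay of the Gaussian covariance is what makes the aggressive level-dependent truncation of §4 effective; for the exponential covariance the eigenvalues decay only algebraically, so more modes are needed per level.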

Fig. 5

Left: plot of \(\mathbb{E }\big [|u_{h^*} - u_{h} |_{H^1(D)}\big ]\) versus \(1/h\) for model problem (3.11) with 2-norm exponential covariance, with \(\mu =\mu _1=0,\mu _2=4,\lambda =\lambda _1=0.3,\lambda _2=0.1,\sigma ^2=\sigma _1^2=\sigma _2^2=1\) and \(h^*=1/256\). Right: plot of \(\mathbb{E }\big [\Vert u_{h^*} - u_{h} \Vert _{L^2(D)}\big ]\). The gradient of the dash-dotted (resp. dotted) line is \(-1/2\) (resp. \(-1\))

Fig. 6

Left: plot of \(\mathbb{E }\big [|u_{h^*} - u_{h} |_{H^1(D)}\big ]\) versus \(1/h\) for model problem (3.11) with Gaussian covariance, with \(\mu =\mu _1=0,\mu _2=4,\lambda =\lambda _1=0.3,\lambda _2=0.1,\sigma ^2=\sigma _1^2=\sigma _2^2=1,h^*=1/256\) and \(K^*=170\). Right: plot of \(\mathbb{E }\big [\Vert u_{h^*} - u_{h} \Vert _{L^2(D)}\big ]\). The gradient of the dash-dotted (resp. dotted and dashed) line is \(-1/2\) (resp. \(-1\) and \(-2\))

Figures 5 and 6 show results for fields with exponential and Gaussian covariance functions, respectively. For comparison, we have added the graphs for the case where there is no “channel”, i.e. where the permeability field is one continuous log-normal field with \(\mu =\mu _1=0, \, \lambda =\lambda _1=0.3\) and \(\sigma ^2=\sigma _1^2=1\). As expected from the global regularity results in Theorem 5.1, we observe the same convergence rates for both the continuous and the discontinuous permeability fields in the case of an exponential covariance in Fig. 5. For the Gaussian covariance, however, we indeed observe slower convergence rates for the layered medium (Fig. 6).

6 Conclusions and further work

Multilevel Monte Carlo methods have the potential to significantly outperform standard Monte Carlo methods in a variety of contexts. In this paper, we considered the application of multilevel Monte Carlo methods to elliptic PDEs with random coefficients, in the practically relevant and technically demanding case of log-normal random coefficients with short correlation lengths, where realisations of the diffusion coefficient have limited regularity and are not uniformly bounded or elliptic. We extended the theory from [6] to cover more difficult model problems, including corner domains and discontinuous means, and we offered one possible remedy for the problem of correlation-length-dependent restrictions on the coarsest mesh size in the standard multilevel estimator. This was done by using level-dependent truncations of the Karhunen–Loève expansion of the coefficient, resulting in smoother approximations of the coefficient on coarser levels. It is possible to achieve smoother approximations of the random coefficient on the coarser grids in a similar way using the circulant embedding method used in §3, and this is the subject of a future publication. Another area of future research is the adaptive choice of spatial grids in the multilevel estimator.