1 Introduction

Mathematical models of partial differential equations (PDEs) with random input receive increasing attention in recent years. In particular, linear diffusion equations with random coefficients to model random media are considered and respective system responses or quantities of interest are studied, cf. [12, 13, 15, 16, 27, 36]. Generally, the random input is function valued and will be referred to as random field. In the numerical analysis of these problems constants in error estimates become random variables and may have a distribution with unbounded support. In the case that quantities of interest are moments of system responses, numerical analysis has been performed if certain integrability conditions are satisfied by the random system input. There, the case of Gaussian random fields (GRFs) as random inputs is frequently considered. Multilevel techniques such as multilevel Monte Carlo (MLMC) have been established to accelerate the approximation of moments. There, sample numbers are chosen in a greedy technique to optimize the error versus the required computational cost, cf. [3, 13, 21, 22, 24, 36]. The requirement of MLMC to be efficient is a small variance between higher levels or to put it differently, the strong error between higher levels has to converge with an available rate. In the case that finite element (FE) discretizations are used and the random input is unbounded such as GRFs, the numerical analysis and numerical experiments presented in these references relied on sparse direct solvers, cf. [13, 27, 36]. Hence, the applicability of iterative methods which are known to be fast for deterministic problems is of natural interest, e.g., so-called full multigrid is well-known to have optimal complexity in the case of Poisson’s equation, cf. [2].

In this paper we establish rigorously the strong convergence of a wide class of standard iterative solvers, which yields essentially optimal computational cost estimates also in the case of GRF input with low spatial regularity. As an application, MLMC is discussed with deterministic, essentially optimal estimates of the computation cost, which was previously unknown. By optimal, we mean that solutions of linear systems under consideration with dimension \({\mathcal {O}}(N)\) may be approximated in computational cost \({\mathcal {O}}(N)\) consistently with the overall discretization error. In the computational uncertainty quantification (UQ) literature, iterative solvers have been considered mostly in the context of stochastic collocation and stochastic Galerkin, cf. [18, 35, 38]. A particular variant of MLMC with multigrid for GRF inputs has been proposed in [30] and computational experiments have been performed.

The present manuscript analyzes the applicability and strong convergence of well established iterative methods for operator equations with unbounded random input in a general setting. Let \({\mathcal {A}}\) be a random, continuous linear operator from \({\mathcal {V}}\) to \({\mathcal {V}}^*\) on a probability space \((\varOmega , {\mathcal {F}}, {\mathbb {P}})\) that is \({\mathbb {P}}\)-almost surely (\({\mathbb {P}}\)-a.s.) boundedly invertible, where \({\mathcal {V}}\) is a Hilbert space and \({\mathcal {V}}^*\) its dual space. Let its expectation \({{\,\mathrm{{\mathbb {E}}}\,}}({\mathcal {A}})\) be well-defined. In the present paper we are interested in the numerical analysis of approximations of the solution to the random linear equation that for \({\mathbb {P}}\)-almost every (\({\mathbb {P}}\)-a.e.) \(\omega \in \varOmega \)

(1)

where is deterministic, in the strong sense, by iterative methods such as multigrid and domain decomposition methods. This can be rewritten in variational form to find such that for \({\mathbb {P}}\)-a.e. \(\omega \)

Let us assume that there are strictly positive random variables \({\hat{a}}\) and \({\check{a}}\) such that for \({\mathbb {P}}\)-a.e. \(\omega \)

(2)

We will be particularly interested in the case that \({\hat{a}}\) and \({\check{a}}^{-1}\) are unbounded random variables. This is for example the case for elliptic PDEs with lognormal coefficients. Thus, preconditioned finite dimensional discretizations suffer from random condition numbers that are unbounded and respectively iterative methods contract with random contraction numbers, which may have realizations arbitrarily close to one with positive probability. At first sight one may overcome this with random iteration numbers specified by a threshold of residual errors with the disadvantage of the occurrence of large iteration numbers, when samples of the random contraction number are close to one. Also, bounds on the computational work for such strategies would be probabilistic. A main new contribution of this paper is that deterministic iteration numbers exist that allow for strong convergence, i.e., convergence of well-known iterative methods in the \(L^q(\varOmega ;{\mathcal {V}})\)-norm, \(q\in [1,+\infty )\), such as multigrid, multilevel preconditioned conjugate gradient, or domain decomposition. This is possible due to tail bounds of the random contraction numbers, which for example are satisfied in the important case of elliptic PDEs with lognormal coefficients. As a consequence, deterministic, essentially optimal complexity bounds are implied for the solution of resulting random linear systems when multigrid or domain decomposition methods are applied. This enables also rigorous, deterministic, essentially optimal complexity bounds for MLMC approximations of mean fields, which was previously unknown. Assumptions on the computational cost of PDE solvers that were made in previous papers [13, 27, 36] to obtain complexity bounds of MLMC and partly to calibrate the MLMC estimator are pervaded by the new theory presented in this manuscript. We will treat the case that is symmetric for \({\mathbb {P}}\)-a.e. \(\omega \). However, the presented theory can be extended to certain non-symmetric operators, see for example [6, Sect. 11].

In Sect. 2, we will review iterative methods as they were formulated in [39] in order to discuss various multilevel method in an unified framework. It will also be highlighted which parts in the framework and in the iterative methods are random. As a main result, we will develop integrability conditions on the random contraction numbers in Sect. 3 that result in sufficiently strong tail bounds in order to ensure strong convergence in the setting of multilevel discretizations of (1). The integrability conditions posed in Sect. 3 are analyzed for several multilevel methods such as multigrid, the so-called BPX preconditioner, cf. [10], and domain decomposition methods in Sects. 4 and 5. An important application of the presented theory are lognormal diffusion problems, which are briefly reviewed in Sect. 6. In particular deterministic bounds on the computational work without assumptions on the PDE solver of MLMC are implied, which is concluded in Sect. 7.1. Numerical experiments with GRF input are presented in Sect. 7.2 and confirm the theoretical analysis.

2 Iterative methods

In this section, we review iterative methods to approximate solutions to linear equations on a finite dimensional inner product space \((V,(\cdot ,\cdot ))\), where \(\Vert \cdot \Vert \) denotes the norm that is induced by \((\cdot ,\cdot )\). Let us consider the random linear equation that for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} A(\omega )u(\omega )=f, \end{aligned}$$
(3)

where \(A:V\rightarrow V\) is a random linear operator that is \({\mathbb {P}}\)-a.s. symmetric positive definite with respect to \((\cdot ,\cdot )\). Hence, (3) is \({\mathbb {P}}\)-a.s. uniquely solvable. Note that we will often omit dependencies of random quantities on \(\omega \) for notational convenience. Let us denote the bilinear form that is induced by A by \((\cdot ,\cdot )_A\) and let \(\lambda _{\max }(A)\), \(\lambda _{\min }(A)\) denote the maximal and minimal eigenvalue of A. The condition number of A is denoted by \(\kappa (A)\) and \(\rho (A)\) denotes the spectral radius. This notation will also be used for other linear operators that occur. Note that since A is random, \(\lambda _{\max }(A)\), \(\lambda _{\min }(A)\), \(\kappa (A)\), and \(\rho (A)\) are random variables. In particular, the review article [39] enables the discussion of multigrid and domain decomposition methods in an unified framework. These methods allow in some cases for optimal preconditioning or uniform contraction numbers with respect to the dimension of V. In this section, we will mainly follow [39] and introduce abstract algorithms, which in later sections will be used as BPX or additive Schwarz preconditioner, symmetric multigrid, and overlapping domain decomposition method. We will also highlight which of the occurring objects in this review section are random.

Since A is \({\mathbb {P}}\)-a.s. symmetric positive definite, the conjugate gradient (CG) method implies after \(n\in {\mathbb {N}}\) iterations with initial guess \(U^0\) the error bound that for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} \Vert u(\omega ) - U^n(\omega )\Vert _{A(\omega )} \le 2\left( \frac{\sqrt{\kappa (A(\omega ))}-1}{\sqrt{\kappa (A(\omega ))}+1}\right) ^n \Vert u(\omega ) - U^0(\omega )\Vert _{A(\omega )}. \end{aligned}$$

Since the random condition number \(\kappa (A)\) may depend on the dimension of the linear space V, we consider the preconditioned linear system that for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} B(\omega )A(\omega ) u(\omega ) = B(\omega )f, \end{aligned}$$
(4)

where the random linear operator B is chosen to be symmetric positive definite with respect to \((\cdot ,\cdot )\). The random operator B shall satisfy that \({\mathbb {P}}\)-a.s. \(\kappa (BA)\le \kappa (A)\), which then accelerates the convergence of the CG method. The combination of preconditioning and CG will be referred to as preconditioned conjugate gradient (PCG) method.

Another method to be discussed is the linear iterative method of the form

$$\begin{aligned} U^{n+1}(\omega ) = U^n(\omega ) + B(\omega )(f - A(\omega ) U^n(\omega )), \quad n\in {\mathbb {N}}_0, \end{aligned}$$
(5)

for \({\mathbb {P}}\)-a.e. \(\omega \), where B is a suitable random operator that is not necessarily symmetric and \(U^0\) is given. Note that this linear iterative methods converges \({\mathbb {P}}\)-a.s. if \(\Vert {{\,\mathrm{Id}\,}}- BA\Vert _A <1\), \({\mathbb {P}}\)-a.s. Alternatively, one could also introduce a relaxed version of B with relaxation parameter in \((0,2/\rho (BA))\) to guarantee convergence with random contraction number \((\kappa (BA) -1)/(\kappa (BA)+1)\), cf. [39, Proposition 2.3]. Also we remark that generally the contraction number of the PCG method is smaller, cf. [39, Proposition 2.2], which is why one may say that PCG accelerates convergence.

Lemma 1

Let \({\tilde{A}}:V \rightarrow V\) be a symmetric positive definite operator with respect to \((\cdot ,\cdot )\) and let \({\tilde{B}}:V\rightarrow V\) be a symmetric positive definite preconditioner with respect to \((\cdot ,\cdot )\) for \({\tilde{A}}\). If there exists positive random variables \(c_0,c_1\) such that for \({\mathbb {P}}\)-a.e. \(\omega \in \varOmega \)

$$\begin{aligned} c_0(\omega ) (v,v)_{{\tilde{A}}} \le ( v,v )_{A(\omega )} \le c_1(\omega ) (v,v)_{{\tilde{A}}} \quad \forall v\in V, \end{aligned}$$

then for \({\mathbb {P}}\)-a.e. \(\omega \in \varOmega \)

$$\begin{aligned} \kappa ({\tilde{B}}A(\omega )) \le \kappa ({\tilde{B}}{\tilde{A}}) \frac{c_1(\omega )}{c_0(\omega )}. \end{aligned}$$

Proof

Since \({\tilde{B}}{\tilde{A}}\) is symmetric positive definite with respect to \((\cdot ,\cdot )_{{\tilde{A}}}\), it holds that

$$\begin{aligned} \kappa ({\tilde{B}}{\tilde{A}}) = \frac{\lambda _{\max }({\tilde{B}}{\tilde{A}})}{ \lambda _{\min }({\tilde{B}}{\tilde{A}})} \end{aligned}$$

and for every \(v\in V\),

$$\begin{aligned} |\lambda _{\max }({\tilde{B}}{\tilde{A}})|^{-1} (v,v)_{{\tilde{A}}} \le ({\tilde{B}}^{-1} v, v) \le |\lambda _{\max }({\tilde{B}}{\tilde{A}})|^{-1} (v,v)_{{\tilde{A}}}. \end{aligned}$$

Thus, by the third equivalence in [39, Lemma 2.1],

$$\begin{aligned} \lambda _{\max }({\tilde{B}}{\tilde{A}})^{-1} (A v, v) \le ({\tilde{B}}^{-1} v,v) \le \lambda _{\min }({\tilde{B}}{\tilde{A}})^{-1} (A v, v). \end{aligned}$$

Then, the assumption of the lemma implies that

$$\begin{aligned} c_1^{-1}\lambda _{\max }({\tilde{B}}{\tilde{A}})^{-1} (A v, v) \le ({\tilde{B}}^{-1} v,v) \le c_0^{-1}\lambda _{\min }({\tilde{B}}{\tilde{A}})^{-1} (A v, v), \end{aligned}$$

which implies the claim by the condition number estimate in [39, Lemma 2.1]. \(\square \)

For \(J\in {\mathbb {N}}\), let us assume a decomposition of V in subspaces \((V_j: j=1,\dots ,J)\), i.e., it holds that \(V_j\subset V\), \(j=1,\dots ,J\), and

$$\begin{aligned} V = \sum _{j=1}^J V_j. \end{aligned}$$
(6)

We define the orthogonal projections \(Q_j,P_j:V\rightarrow V_j\) for every \(v\in V\) by

$$\begin{aligned} (Q_j v, w_j) := (v,w_j), \quad (A P_j v,w_j) := (Av,w_j) \quad \forall w_j\in V_j \end{aligned}$$

and the operator \(A_j:V_j\rightarrow V_j\) for every \(v\in V_j\) by

$$\begin{aligned} (A_j v, w_j) := (A v,w_j) \quad \forall w_j\in V_j, \end{aligned}$$

\(j=1,\dots ,J\). Consequently it holds for every \(j=1,\dots ,J\) that \(A_j P_j = Q_j A\), which implies that if u is the random solution of (3), then \(u_j:= P_j u\) satisfied for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} A_j(\omega ) u_j(\omega ) = f_j, \end{aligned}$$

where \(f_j := Q_j f\), \(j=1,\dots ,J\). Note that \(A_j,P_j\) are random whereas \(Q_j\) is deterministic, \(j=1,\ldots ,J\). Let \(R_j:V_j\rightarrow V_j\), \(j=1,\dots ,J\), be random symmetric positive definite operators with respect to \((\cdot ,\cdot )\), which shall approximate the inverse of \(A_j\) respectively. Thus,

$$\begin{aligned} B^a:=\sum _{j=1}^J R_jQ_j \end{aligned}$$
(7)

is also symmetric positive definite, cf. [39, Lemma 3.1], and shall be used as preconditioner for A.

Algorithm 1

Apply the PCG method to (4) with the random preconditioner \(B^a\) defined in (7).

A multilevel iteration can be defined under the assumption that there are nested subspaces satisfying

$$\begin{aligned} M_1\subset M_2 \subset \cdots \subset M_J = V, \end{aligned}$$
(8)

where we also define operators \({\hat{Q}}_j,{\hat{P}}_j : M_J \rightarrow M_j\) and \({\hat{A}}_j:M_j\rightarrow M_j\) for \(j=1,\dots ,J\) respectively. The random multilevel iterations \({\hat{B}}^s_j:M_j\rightarrow M_j\), \(j=1,\dots ,J\) with parameters \(m,k\in {\mathbb {N}}\) will be defined iteratively. Set \({\hat{B}}^s_1:= {\hat{A}}_1^{-1}\) and assume that \({\hat{B}}^s_{j-1}:M_{j-1}\rightarrow M_{j-1}\) is already defined. For every \(g\in M_j\) the multigrid V-cycle iteration \({\hat{B}}^s_j g \) is defined by:

  1. 1.

    \(v := {\hat{R}}_j g\)

  2. 2.

    \(w := v + {\hat{B}}^s_{j-1}{\hat{Q}}_{j-1}(g - {\hat{A}}_j v)\)

  3. 3.

    \({\hat{B}}^s_j g := w + {\hat{R}}_j(g - {\hat{A}}_j w ) \)

Algorithm 2

Let \(U^0\) be given, then \(U^n\) is defined by the linear iteration in (5) with \(B={\hat{B}}_J^s\).

According to [9, Eq. (2.14)] the residual operator is given by

$$\begin{aligned} {\hat{E}}^s_J = ({{\,\mathrm{Id}\,}}- {\hat{B}}^s_J {\hat{A}}_J) = ({{\,\mathrm{Id}\,}}- {\hat{T}}_J)\cdots ({{\,\mathrm{Id}\,}}- {\hat{T}}_1) ({{\,\mathrm{Id}\,}}- {\hat{T}}_1^*)\cdots ({{\,\mathrm{Id}\,}}- {\hat{T}}_J^*), \end{aligned}$$

where

$$\begin{aligned} {\hat{T}}_j := {\hat{R}}_j{\hat{A}}_j{\hat{P}}_j. \end{aligned}$$

For the convergence of Algorithm 1 we have to prove bounds of \(\kappa (BA)\), whereas for Algorithm 2 we have to show that \(\Vert {\hat{E}}^s_J\Vert _A \le \delta \) for some random variable \(\delta \) taking values in (0, 1) \({\mathbb {P}}\)-a.s. The additive preconditioner in Algorithm 1 will in later applications be the BPX, if the nested decomposition (8) is considered, or the additive Schwarz preconditioner. Algorithm 2 is multigrid by symmetric V-cycle.

In [39], assumptions are introduced involving two parameters \(K_0\) and \(K_1\) that allow to discuss their convergence. Here \(K_0\) and \(K_1\) are positive random variables. We recall the two conditions [39, Eqs. (4.2) and (4.3)]. There exists a positive random variable \(K_0\) such that for every \(v\in V\), there exists a decomposition \(v=\sum _{j=1}^J v_j\) with \(v_j \in V_j\), \(j=1,\dots ,J\), such that for \({\mathbb {P}}\)-a.e. \(\omega \)

figure a

There exists a positive random variable \(K_1\) such that for \({\mathbb {P}}\)-a.e. \(\omega \) and for every \(S\subset \{1,\dots ,J\}\times \{1,\dots ,J\}\) and \(v_j, w_j \in V\), \(j=1,\dots ,J\),

figure b

Theorem 2

Let assumptions (A.1) and (A.2) be satisfied. Let \(B^a\) be the random preconditioner given by (7). Then, for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} \kappa (B^a(\omega ) A(\omega ) ) \le K_0(\omega ) K_1(\omega ). \end{aligned}$$
(9)

The residual operator \({\hat{E}}^s_J\) from Algorithm 2 satisfies for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} \Vert {\hat{E}}^s_J(\omega )\Vert _{A(\omega )} \le 1 - \frac{2 - \nu }{K_0(\omega )(1+K_1(\omega ))^2}, \end{aligned}$$
(10)

where \(\nu \ge \max _{j=1,\dots ,J}\{\rho (R_j(\omega ) A_j(\omega ))\}\).

Proof

The first assertion is explicitly [39, Theorems 4.1]. The second assertion follows by [39, Theorems 4.4 and Proposition 3.5]. \(\square \)

The random parameters \(K_0\) and \(K_1\) can be estimated in some cases with [39, Lemmas 4.5, 4.6, and 4.7]. Let us state a specific case of [39, Lemma 4.6] as the following lemma.

Lemma 3

Let \(K_3\) be a positive random variable that is independent of J and let \(\gamma \in (0,1)\) be deterministic. If for \({\mathbb {P}}\)-a.e. \(\omega \) and for every \(v,w\in V\) and every \(i,j\in \{1,\dots ,J\}\) it holds that

$$\begin{aligned} (T_i(\omega ) v, T_j(\omega ) w)_{A(\omega )} \le K_3(\omega ) \nu \gamma ^{|i-j|} (T_i(\omega ) v, v)_{A(\omega )}^{1/2} (T_j(\omega ) w, w)_{A(\omega )}^{1/2}, \end{aligned}$$
(11)

then

$$\begin{aligned} K_1(\omega ) \le K_3(\omega )\nu \frac{2}{1-\gamma }. \end{aligned}$$

Note that (11) is often called strengthened Cauchy–Schwarz inequality.

The operator \({\hat{B}}^s_J\) can also be used as preconditioner in a PCG method to accelerate convergence. The respective condition number can be bounded by the following proposition in combination with Theorem 2.

Proposition 4

Let \(\delta \) be a random variable taking values in (0, 1). If for \({\mathbb {P}}\)-a.e. \(\omega \),

$$\begin{aligned} \Vert {{\,\mathrm{Id}\,}}- {\hat{B}}^s_J(\omega )A(\omega ) \Vert _{A(\omega )} \le \delta (\omega ), \end{aligned}$$

then \(\kappa (B(\omega )A(\omega ))\le 1/(1-\delta (\omega )) \).

Proof

Since the operator \({{\,\mathrm{Id}\,}}-{\hat{B}}^s_J A\) can be written as \(E^* E\), where \(E^*\) denotes the adjoint operator of E, for some appropriate E that was discussed above, it holds that \((({{\,\mathrm{Id}\,}}- {\hat{B}}^s_J A)v,v)_A = \Vert E^*v\Vert _A^2 \ge 0\) for every \(v\in V\). Hence,

$$\begin{aligned} ({\hat{B}}^s_JAv,v)_A \le (v,v)_A \quad \forall v\in V. \end{aligned}$$
(12)

The assumption implies that

$$\begin{aligned} (1-\delta ) (v,v)_A \le ({\hat{B}}^s_J Av,v)_A \quad \forall v\in V, \end{aligned}$$
(13)

which then implies the assertion. \(\square \)

3 Strong convergence of iterative methods

We recall the possibly infinite dimensional Hilbert space \({\mathcal {V}}\) and let \(({\mathcal {H}},(\cdot ,\cdot ))\) be another Hilbert space such that the embedding \({\mathcal {H}}\subset {\mathcal {V}}\) is continuous. Let \(({\mathcal {V}}_\ell : \ell \in {\mathbb {N}})\) be a nested sequence of finite dimensional subspaces of \({\mathcal {V}}\), i.e., \({\mathcal {V}}_1 \subset {\mathcal {V}}_2\subset \dots \subset {\mathcal {V}}\). Let the finite dimensional spaces \({\mathcal {V}}_\ell \) have dimensions \(N_\ell :=\dim ({\mathcal {V}}_\ell )\), \(\ell \in {\mathbb {N}}\). Similar to Sect. 2, we introduce random operators \({\mathcal {A}}_\ell \), \({\mathcal {P}}_\ell \), \(R_\ell \), \(T_\ell \), and \(Q_\ell \) with respect to the inner product \((\cdot ,\cdot )\) of \({\mathcal {H}}\), \(\ell \in {\mathbb {N}}\). The inner product \((\cdot ,\cdot )_{{{\,\mathrm{{\mathbb {E}}}\,}}({\mathcal {A}})}\) on \({\mathcal {V}}\) is given by

The inner product \((\cdot ,\cdot )_{{\mathcal {A}}}\) with respect to the random symmetric operator \({\mathcal {A}}\) will also be considered.

For every \(\ell \in {\mathbb {N}}\), we consider the variational form of (1) on the subspace \({\mathcal {V}}_\ell \). For every \(\ell \in {\mathbb {N}}\), this gives rise to the following linear equation. We seek to find such that for \({\mathbb {P}}\)-a.e. \(\omega \)

This is uniquely solvable by the Lax–Milgram lemma using (2).

We assume that the random solution of the problem in (1) is approximated by the Galerkin approximations , \(\ell \in {\mathbb {N}}\). such that for every \(\ell \in {\mathbb {N}}\) and for \({\mathbb {P}}\)-a.e. \(\omega \)

(14)

We apply an iterative method such as Algorithm 1 or 2 with random contraction number \(\delta \), that is independent of \(\ell \), with n iterations and solve exactly on level 1, i.e., starting with \(U^n_{\ell -1}\) as initial guess for level \(\ell \) we carry out n iterations of the algorithm with a random contraction number \(\delta \) that takes values in (0, 1). Hence, we obtain a sequence \(U_\ell ^n\), \(\ell \in {\mathbb {N}}\), where we set . This multilevel process was used in [2, Sect. 3] to derive optimal complexity bounds for the solution of certain linear equations and is also commonly referred to as “full multigrid”.

Lemma 5

Let us assume that satisfies (14) for some \(s>0\) and let the sequence \((h_\ell ,\ell \in {\mathbb {N}})\) satisfy that \(h_{\ell }=\gamma ^{\ell } h_0\), \(\ell \in {\mathbb {N}}\), for some fixed \(h_0>0\) and \(\gamma \in (0,1)\). Then, for every \(\ell \in {\mathbb {N}}\), \(U^n_\ell \) and for \({\mathbb {P}}\)-a.e. \(\omega \)

Proof

The argument is for example given in [7, Chapter 10]. \(\square \)

For any Banach space \((B,\Vert \cdot \Vert _B)\) and any \(p\in [1,+\infty )\), let us denote the space of strongly measurable mappings \(X:\varOmega \rightarrow B\) such that \(\Vert X\Vert _B^p\) is integrable with respect to the probability measure \({\mathbb {P}}\) by \(L^p(\varOmega ;B)\). For \(B={\mathbb {R}}\), we simply write \(L^p(\varOmega )\).

Theorem 6

Let the assumptions of Lemma 5 be satisfied. Let us assume that \(({\mathcal {C}}/ \sqrt{{\check{a}}})\in L^{p}(\varOmega )\) for some \(p\in [1,+\infty )\), where \({\check{a}}\) is defined in (2) and \({\mathcal {C}}\) is given in (14). Further, assume that \(1/(1-\delta )\in L^{p'}(\varOmega )\) for some \(p'\in [1,+\infty )\). For every \(\eta \in (0,1)\), every deterministic number of iterations \(n\in {\mathbb {N}}\), \(q\in [1,p]\), and \(r:= p'(p-q)/(pq)\) it holds that for every \(\ell \ge 2\)

where

$$\begin{aligned} {\mathcal {C}}_{r} := 2 \max \left\{ \frac{\Vert {\mathcal {C}}/ \sqrt{{\check{a}}}\Vert _{L^q(\varOmega )}}{1-\eta }, \Vert {\mathcal {C}}/ \sqrt{{\check{a}}}\Vert _{L^p(\varOmega )} \frac{h_1^s}{1-\gamma ^{s}} \left\| \frac{1}{1-\delta }\right\| _{L^{p'}(\varOmega )}^{r}\log \left( \frac{1}{\eta \gamma ^s}\right) ^{r}\right\} . \end{aligned}$$

Proof

The idea of the proof is to decompose the probability space into \(\varOmega = \varOmega _n \cup (\varOmega _n)^{\mathsf {c}}\), where

$$\begin{aligned} \varOmega _n := \left\{ \omega \in \varOmega : \delta (\omega )^n < \eta \gamma ^s\right\} \end{aligned}$$

and therefore \((\varOmega _n)^{\mathsf {c}} = \{\omega \in \varOmega : \delta (\omega )^n \ge \eta \gamma ^s\}\). Note that both sets are measurable. For notational convenience, we omit \(\omega \) in the following when discussing subsets of \(\varOmega \). Our goal is to show, how the probability of \((\varOmega _n)^{\mathsf {c}}\) tends to zero for increasing values of \(n\in {\mathbb {N}}\), to be able to justify the applicability of a classical argument on the sets \(\varOmega _n\), \(n\in {\mathbb {N}}\). Naturally, \(\varOmega _{n_1}\subset \varOmega _{n_2}\) for every choice of natural numbers \(n_1\le n_2\). We recall a version of the Markov inequality, cf. [5, Eq. (2.1)], i.e., for a random variable X taking values in \((1,+\infty )\) and a non-decreasing function \(\phi \) such that \(\phi (t)>0\) it holds that

$$\begin{aligned} {\mathbb {P}}(X \ge t)\le {\mathbb {P}}(\phi (X)\ge \phi (t)) \le \frac{{{\,\mathrm{{\mathbb {E}}}\,}}(\phi (X))}{\phi (t)}, \quad t\in (1,+\infty ). \end{aligned}$$

We select the non-decreasing positive function \(\phi (t) := t^{p'}\). Then, for \(X=1/(1-\delta )\) and every \(t\in (1,+\infty )\) it holds that

$$\begin{aligned} {\mathbb {P}}\left( \frac{1}{1-\delta } \ge t \right) \le {{\,\mathrm{{\mathbb {E}}}\,}}\left( \left( \frac{1}{1-\delta }\right) ^{p'}\right) \frac{1}{t^{p'}} . \end{aligned}$$
(15)

Since for every \(r\in (0,1)\), \(\{\delta \ge r\} = \{1/(1-\delta ) \ge 1/(1-r)\}\), we conclude that

$$\begin{aligned} {\mathbb {P}}((\varOmega _n)^{\mathsf {c}}) = {\mathbb {P}}\left( \delta ^n \ge \eta \gamma ^s\right) = {\mathbb {P}}\left( \delta \ge \left( \eta \gamma ^s\right) ^{1/n}\right) = {\mathbb {P}}\left( \frac{1}{1-\delta } \ge \frac{1}{1-\left( \eta \gamma ^s\right) ^{1/n}}\right) . \end{aligned}$$

We observe that the function \(x\mapsto (1-(\eta \gamma ^s)^{1/x}) x\) from \((1,+\infty )\) to \((0,+\infty )\) is increasing. Since the rule of L’Hospital implies that \(\lim _{x\rightarrow +\infty } (1-(\eta \gamma ^s)^{1/x}) x = \log (1 / (\eta \gamma ^s))\), we conclude that for every \(n\in {\mathbb {N}}\)

$$\begin{aligned} 1-\left( \eta \gamma ^s\right) ^{1/n} \le \log \left( \frac{1}{\eta \gamma ^s}\right) \frac{1}{n}. \end{aligned}$$

For every \(n\in {\mathbb {N}}\), we choose \(t:= 1/(1- (\eta \gamma ^s)^{1/n})\) in (15), and conclude that for every \(n\in {\mathbb {N}}\) it holds that

$$\begin{aligned} {\mathbb {P}}((\varOmega _n)^{\mathsf {c}}) \le {{\,\mathrm{{\mathbb {E}}}\,}}\left( \left( \frac{1}{1-\delta }\right) ^{p'}\right) \log \left( \frac{1}{\eta \gamma ^s}\right) ^{p'} \left( \frac{1}{n}\right) ^{p'}. \end{aligned}$$
(16)

Hence, we have established estimates for the probability of the sets \((\varOmega _n)^{\mathsf {c}}\), \(n\in {\mathbb {N}}\). We apply Lemma 5, (2), and decompose the probability space into \(\varOmega = \varOmega _n \cup (\varOmega _n)^{\mathsf {c}}\) to obtain that for q as in the statement of the theorem

Since on \(\varOmega _n\) holds that \(\gamma ^{-s}\delta ^n<\eta \), we obtain with a geometric series argument that

$$\begin{aligned} {{\,\mathrm{{\mathbb {E}}}\,}}\left( \left( 2\frac{{\mathcal {C}}}{\sqrt{{\check{a}}}}\sum _{k=0}^{\ell -1} (\gamma ^{-s}\delta ^n)^k \right) ^q \mathbb {1}_{\varOmega _n}\right) h^{sq}_\ell \le 2^q {{\,\mathrm{{\mathbb {E}}}\,}}\left( \left( \frac{{\mathcal {C}}}{\sqrt{{\check{a}}}}\right) ^q\right) \left( \frac{1}{1-\eta }\right) ^q h^{sq}_\ell . \end{aligned}$$

The relation \(h_\ell = \gamma ^{\ell } h_0\), \(\ell \in {\mathbb {N}}\), implies that for every \(\ell \ge 2\) it holds that \(\sum _{k=0}^{\ell -1} h_{\ell -k}^s \le \sum _{\ell \ge 1} \gamma ^{\ell s} h_0 = h_1^s/(1-\gamma ^{s})\). The Hölder inequality with \(r_1=p/q\) and \(r_2=p/(p-q)\) implies with the tail bound of \(\delta ^n\) in (16) that

$$\begin{aligned} \begin{aligned}&{{\,\mathrm{{\mathbb {E}}}\,}}\left( \left( 2\frac{{\mathcal {C}}}{\sqrt{{\check{a}}}}\sum _{k=0}^{\ell -1} h^s_{\ell -k}\delta ^{nk} \right) ^q \mathbb {1}_{(\varOmega _n)^{\mathsf {c}}}\right) \\&\quad \le 2^q \left\| \frac{{\mathcal {C}}}{\sqrt{{\check{a}}}}\right\| _{L^p(\varOmega )}^q \left( \frac{h_1^{s}}{1-\gamma ^{s}}\right) ^q {{\,\mathrm{{\mathbb {E}}}\,}}\left( \mathbb {1}_{(\varOmega _n)^{\mathsf {c}}}\right) ^{1/r_2} \\&\quad \le 2^q \left\| \frac{{\mathcal {C}}}{\sqrt{{\check{a}}}}\right\| _{L^p(\varOmega )}^q \left( \frac{h_1^{s}}{1-\gamma ^{s}}\right) ^q \left\| \frac{1}{1-\delta }\right\| _{L^{p'}(\varOmega )}^{rq} \log \left( \frac{1}{\eta \gamma ^s}\right) ^{rq} \left( \frac{1}{n}\right) ^{rq}, \end{aligned} \end{aligned}$$

which finishes the proof of the theorem. \(\square \)

Remark 7

Let X be a random variable with values in (0, 1) such that \(1/(1-X)\in L^p(\varOmega )\) for some \(p\in [1,+\infty )\), then (16) implies that for every \(\eta <1\) and \(n\in {\mathbb {N}}\)

$$\begin{aligned} {\mathbb {P}}\left( X^n \ge \eta \right) \le \left\| \frac{1}{1-X}\right\| _{L^p(\varOmega )}^p \log \left( \eta ^{-1}\right) ^p n^{-p}. \end{aligned}$$

We illustrate the assumptions in Theorem 6 on \({\mathcal {C}}\) and \({\check{a}}\) in the following example.

Example 8

Consider the random coefficient function a given by \(a(\omega ,x)=\exp (y(\omega )\psi (x))\) for a.e. \(x\in D\), \({\mathbb {P}}\)-a.e. \(\omega \in \varOmega \), where \(\psi \in W^{1,\infty }(D)\) and y is standard normally distributed. It is easy to see that \(\Vert {\check{a}}^{-1}\Vert _{L^p(\varOmega )} \le \Vert a\Vert _{L^p(\varOmega ; L^\infty (D))}\le \exp (p \Vert \psi \Vert _{L^\infty (D)}^2/2)\) for every \(p\in [1,+\infty )\), where \({\check{a}}={\mathrm{ess\, inf}}_{x\in D}\{a(x)\}\).

Let us consider the elliptic Dirichlet problem \({\mathcal {A}}u:= -\nabla \cdot (a\nabla u)=f\) on a convex polygon D with \(f\in L^2(D)\), which has a unique solution \(u:\varOmega \rightarrow H^1_0(D)\) that satisfies \(\Vert u\Vert _{H^1_0(D)} \le \Vert f\Vert _{H^{-1}(D)}/{\check{a}}\), \({\mathbb {P}}\)-a.s. The Sobolev spaces are denoted by \(H^k(D)\), \(k\in {\mathbb {N}}\), and \(H^1_0(D):=\{v\in H^1(D) : v\vert _{\partial D} =0 \}\) with dual space \(H^{-1}(D)\). Since \(\psi \in W^{1,\infty }(D)\), we may write \(-\varDelta u = f/a +y \nabla \psi \cdot \nabla u\). Since the Dirichlet Laplacian is boundedly invertible from \(L^2(D)\) to \(H^2(D)\cap H^1_0(D)\), \(\Vert u\Vert _{H^2(D)}\le C{\check{a}}^{-1} ( |y|\Vert \psi \Vert _{W^{1,\infty }(D)} +1)\Vert f\Vert _{L^2(D)}\) \({\mathbb {P}}\)-a.s., where the constant C neither depends on a nor on f. Then, Céa’s lemma implies with a standard approximation property in FE spaces that (14) holds with \({\mathcal {C}}= C\sqrt{\Vert a\Vert _{L^\infty (D)}}{\check{a}}^{-1} ( |y|\Vert \psi \Vert _{W^{1,\infty }(D)} +1)\Vert f\Vert _{L^2(D)}\) and \(s=1\). By a multiple application of the Cauchy–Schwarz inequality, we conclude that \(({\mathcal {C}}/ \sqrt{{\check{a}}})\in L^{p}(\varOmega )\) for every \(p\in [1,+\infty )\).

The case \(p'=+\infty \) in Theorem 6 is trivial, since then the random contraction number may be upper bounded to be uniformly strictly less than one, i.e., if \(p'=+\infty \) in Theorem 6, then \(\hbox {ess } \sup _{\omega \in \varOmega }\delta (\omega )<1\). In this case, the standard theory applies with \({\overline{\delta }}=\hbox {ess } \sup _{\omega \in \varOmega }\delta (\omega )<1\).

4 Multigrid methods

Here, we provide sufficient conditions under which the strengthened Cauchy–Schwarz inequality holds with explicit dependence on the operator \({\mathcal {A}}\). This will allow us to show \(L^q(\varOmega )\) bounds of the condition numbers and tail bounds of the random contraction number in order to apply the strong error bounds from Theorem 6.

We will provide a proof in the case of a random, symmetric elliptic differential operators. To be specific, let \(H^1_0(D)\), \(H^{s}(D)\), \(s\in [-1,2]\), be the Sobolev–Slobodeckij spaces for some polytopal domain \(D\subset {\mathbb {R}}^d\), \(d\ge 1\) arbitrary, such that \(H^{-s}(D)\) is the dual space of \(H^s(D)\), \(s\in [0,1]\), and \( H^0(D)=L^2(D)\). The reader is referred to [23, Chapter 1] for details on Sobolev spaces. We consider the class of random symmetric operators

$$\begin{aligned} {\mathcal {A}}:=-\sum _{i,j=1}^d \frac{\partial }{\partial x_i} \left( a_{ij} \frac{\partial }{\partial x_j} \right) +a: H^1_0(D) \rightarrow (H^1_0(D))^*, \end{aligned}$$
(17)

where \((a_{ij}(x))_{i,j=1,\dots ,d}\) is a random symmetric matrices for a.e. \(x\in D\). Let us assume that \({\overline{D}} = \bigcup _{k=1}^K {\overline{D}}_k\), where the subdomains \(D_k\) are pairwise disjoint with a polytopal boundary. Furthermore, we assume that the random fields a and \(a_{ij}\) are strongly measurable as mappings from \(\varOmega \) to \(L^\infty (D)\), and \(a_{ij}\vert _{D_k}\) is strongly measurable as a mapping from \(\varOmega \) to \(W^{s,p}(D_k)\) such that \(s>d/p\), \(k=1,\dots ,K\), \(i,j=1,\dots ,d\), where \(W^{s,p}(D_k)\), \(s\ge 0\), \(p\in [1,+\infty )\), \(k=1,\dots ,K\), denote the Sobolev–Slobodeckij spaces, cf. [23, Definition 1.3.2.1]. We assume that for \({\mathbb {P}}\)-a.e. \(\omega \), \({\mathrm{ess\, inf}}_{x\in D}\{a(\omega ,x)\}\ge 0\) and that there exists a strictly positive random variable \({\check{a}}\) such that for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} \mathop {\mathrm{ess\, inf}}\limits _{x\in D} \left\{ \sum _{i,j=1}^d a_{ij}(\omega ,x)\xi _i\xi _j\right\} \ge {\check{a}}(\omega ) |\xi |^2 \quad \forall \xi \in {\mathbb {R}}^d. \end{aligned}$$

The corresponding random bilinear form is for \({\mathbb {P}}\)-a.e. \(\omega \) given by

Furthermore, let \({\hat{a}}\) be a positive random variable such that for \({\mathbb {P}}\)-a.e. \(\omega \)

(18)

The following example is a class of random coefficients, which will be further discussed in Sects. 6 and 7.2.

Example 9

The class of lognormal random coefficient fields \(a_{ij}=\exp (Z)\delta _{ij}\) and \(a=0\), where \(Z:\varOmega \rightarrow W^{s,p}(D)\), \(s>d/p\), is a strongly measurable Gaussian random field and \(\delta _{ij}\) denotes the Kronecker symbol, satisfy these conditions; see ahead in Sect. 7.2 for a class of instances of such GRFs.

The proof of the strengthened Cauchy–Schwarz inequality, see (11), draws on [9, Sects. 4 and 5] and [39, Sect. 4]. The setting in reference [9] allows for low Hölder regularity of the coefficients of elliptic operators, but does not provide a strengthened Cauchy–Schwarz inequality needed for the setting of assumptions (A.1) and (A.2). The strengthened Cauchy–Schwarz inequality proved in [39, Lemmas 6.1 and 6.3] is limited to coefficients with \(W^{1,\infty }(D)\) regularity. Here, a strengthened Cauchy–Schwarz inequality will be proved with explicit dependence on the coefficients that is valid for arbitrary low Hölder regularity of the coefficients and also allows for jumps across \(\partial D_k\) (see ahead Proposition 11). We will identify estimates with explicit dependence on the random coefficients of \({\mathcal {A}}\).

Let \(({\mathcal {T}}_\ell ,\ell \in {\mathbb {N}})\) be a nested sequence of shape regular simplicial, uniformly refined meshes of D. Note that in one refinement step one simplex is refined into \(2^d\) subsimplices. For every \(k=1,\dots ,K\) and \(\ell \in {\mathbb {N}}\), we require that \({\overline{D}}_k=\bigcup _{\tau \in {\mathcal {T}}_\ell , D_k\cap \tau \ne \emptyset }{\overline{\tau }}\). Let \({\mathcal {V}}_\ell \subset {\mathcal {V}}\) be the space of piecewise polynomial function with respect to the mesh \({\mathcal {T}}_\ell \), \(\ell \in {\mathbb {N}}\). For simplicity, we will consider here only first order FE, i.e., polynomial degree one. Define \(-\varDelta _\ell :{\mathcal {V}}_\ell \rightarrow {\mathcal {V}}_\ell \) by the bilinear form over \({\mathcal {V}}_\ell \times {\mathcal {V}}_\ell \), \(\ell \in {\mathbb {N}}\). By [9, Eq. (5.1)], there exists a deterministic constant \(C>0\) such that for every \(j\in {\mathbb {N}}\) and every \(v\in {\mathcal {V}}_j\) with \(v=\sum _{i=1}^j v_i\), \(v_i\in {\mathcal {V}}_i\), such that

$$\begin{aligned} \sum _{i=1}^j\lambda _{\max }(-\varDelta _i)\Vert v_i\Vert ^2_{L^2(D)} \le C \int _D|\nabla v|^2 \mathrm{d}x. \end{aligned}$$
(19)

Moreover, the following inverse estimates hold. There exists a constant C such that for every \(v\in {\mathcal {V}}_\ell \)

$$\begin{aligned} \Vert v\Vert _{H^1(D)} \le C h_\ell ^{-1} \Vert v\Vert _{L^2(D)} \quad \text {and}\quad \Vert v\Vert _{H^{1+s}(D)} \le C h_\ell ^{-s} \Vert v\Vert _{H^1(D)}, \end{aligned}$$
(20)

where

$$\begin{aligned} h_\ell : = \max _{\tau \in {\mathcal {T}}_\ell } \{{{\,\mathrm{diam}\,}}(\tau )\} \end{aligned}$$

and \(s\in (0,1/2)\), cf. [14, Theorem 3.2.6] and [11, Eq. (10.1)]. These inverse estimates are sharp, which can be seen by choosing v to be a nodal basis function of the FE space \({\mathcal {V}}_\ell \). Since by (18) for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} {\check{a}}(\omega )\lambda _{\max }(-\varDelta _\ell )\le \lambda _{\max }({\mathcal {A}}_\ell (\omega ))\le {\hat{a}}(\omega )\lambda _{\max }(-\varDelta _\ell ) , \end{aligned}$$
(21)

we also observe that for \({\mathbb {P}}\)-a.e. \(\omega \)

$$\begin{aligned} \lambda _{\max }({\mathcal {A}}_\ell (\omega )) \ge C{\check{a}}(\omega ) h_\ell ^{-2}, \end{aligned}$$
(22)

which for \(-\varDelta _\ell \) is a consequence of the sharpness of (20).

We require the following assumptions on the smoothers \((R_j:j=1,\dots ,J)\). There exists a deterministic \(\nu \in (0,2)\) such that for every \(j=1,\dots ,J\), and \({\mathbb {P}}\)-a.e. \(\omega \)

figure c

There exists deterministic \(c_0,c_1 >0\) such that for every \(j=1,\dots ,J\), and for \({\mathbb {P}}\)-a.e. \(\omega \)

figure d

Assume that there exists \(\gamma \in (0,1)\) such that for all \(i,j\in {\mathbb {N}}\) satisfying \(i\le j\) it holds that

figure e

Note that (B) implies that \(\rho (R_iA_i)\le \nu \) for every i. There exist smoothers that satisfy these assumptions, cf. [6, Chapter 8] and [8, 10].

Lemma 10

Let \(s\in (0,1/2)\) and \(p\in (d/s,+\infty )\), then for \({\mathbb {P}}\)-a.e. \(\omega \), for every \(\eta >0\), \(\phi \in H^1(D)\), and \(\psi \in H^{1+s}(D)\) it holds that

where for a deterministic constant C independent of \((a_{ij},a: i,j=1,\dots ,d)\)

$$\begin{aligned} {\widetilde{{\mathcal {C}}}}(\omega ) := C\left( \max _{k=1,\dots ,K}\sum _{i,j=1}^d\Vert a_{ij}(\omega )\Vert _{W^{s,p}(D_k)} + \Vert a(\omega )\Vert _{L^\infty (D)}\right) . \end{aligned}$$

Proof

The following argument originates from the proof of [9, Lemma 4.3]. We will track the dependence on the random elliptic coefficients \((a_{ij},a : i,j=1,\dots ,d)\). Let us fix \(k\in \{1,\dots ,K\}\). There exists a bounded linear extension operator \(I_k:H^1(D_k)\rightarrow H^1({\mathbb {R}}^d)\), cf. [23, Theorem 1.4.3.1]. For every function \(v:D_k \rightarrow {\mathbb {R}}\) the zero extension to \({\mathbb {R}}^d\) is denoted by \({\widetilde{v}}\). Let \({\mathscr {F}}\) denote the Fourier transform on \({\mathbb {R}}^d\). We obtain with Plancherel’s theorem

$$\begin{aligned} \begin{aligned} \int _{D_k} a_{ij} \frac{\partial \phi }{\partial x_i}\frac{\partial \psi }{\partial x_j} \mathrm {d} x&= \int _{{\mathbb {R}}^d} \frac{\partial (I_k\phi )}{\partial x_i}\widetilde{a_{ij}\frac{\partial \psi }{\partial x_j}} \mathrm {d} x \\&= \left( {\mathscr {F}} \left( \frac{\partial (I_k\phi )}{\partial x_i}\right) , {\mathscr {F}}\left( \widetilde{a_{ij}\frac{\partial \psi }{\partial x_j}}\right) \right) _{L^2({\mathbb {R}}^d)}. \end{aligned} \end{aligned}$$

Recall that \(\Vert (1+|\xi |^2)^{s/2}{\mathscr {F}}(v)\Vert _{L^2({\mathbb {R}}^d)}\) is the Bessel potential norm in the Hilbert case of order s of a function v. The fact that in the Hilbert case Bessel potential and Slobodeckij spaces are equal with equivalent norms, cf. [37, Definition 2.3.1(d), Theorem 2.3.2(d), Eq. 4.4.1(8)], and the boundedness of the zero extension as an operator from \(H^s(D_k)\) to \(H^s({\mathbb {R}}^d)\), cf. [23, Corollary 1.4.4.5], imply with the Cauchy–Schwarz inequality and differentiation rules for \({\mathscr {F}}\) that there exists a constant C such that

$$\begin{aligned} \begin{aligned}&\int _{D_k} a_{ij} \frac{\partial \phi }{\partial x_i}\frac{\partial \psi }{\partial x_j} \mathrm {d} x \\&\qquad \le C \left( \int _{{\mathbb {R}}^d} \frac{|\xi |^2}{(1+|\xi |^2)^s}{\mathscr {F}} (I_k\phi ) \mathrm {d} \xi \right) ^{1/2} \left\| a_{ij} \frac{\partial \psi }{\partial x_j}\right\| _{H^s(D_k)} \\&\qquad \le C (\eta ^{-1} \Vert \phi \Vert ^2_{L^2(D_k)} + \eta ^{s/(1-s)} \Vert \phi \Vert ^2_{H^1(D_k)})^{1/2} \left\| a_{ij} \frac{\partial \psi }{\partial x_j}\right\| _{H^s(D_k)} , \end{aligned} \end{aligned}$$

where the inequality \(|\xi |^2/(1-|\xi |^2)^s \le \eta ^{-1} + \eta ^{s/(1-s)}(1+|\xi |^2)\) is derived with elementary manipulations for every \(\eta >0\) and every \(\xi \in {\mathbb {R}}^d\). By [23, Theorem 1.4.4.2], the multiplication of elements of \(W^{s,p}(D_k)\) is a bounded linear operator on \(H^s(D_k)\). Thus, by summing over k and by the Cauchy–Schwarz inequality there exists a constant C such that

$$\begin{aligned} \begin{aligned}&\int _{D} a_{ij} \frac{\partial \phi }{\partial x_i}\frac{\partial \psi }{\partial x_j} \mathrm {d} x \\&\;\;\le C \max _{k=1,\dots ,K}\Vert a_{ij}\Vert _{W^{s,p}(D_k)} (\eta ^{-1} \Vert \phi \Vert ^2_{L^2(D)} + \eta ^{s/(1-s)} \Vert \phi \Vert ^2_{H^1(D)})^{1/2} \Vert \psi \Vert _{H^{1+s}(D)}. \end{aligned} \end{aligned}$$

Since it also holds that

$$\begin{aligned} \int _D a\phi \psi \mathrm {d}x \le \Vert a\Vert _{L^\infty (D)} (\eta ^{-1} \Vert \phi \Vert ^2_{L^2(D)} + \eta ^{s/(1-s)} \Vert \phi \Vert ^2_{H^1(D)})^{1/2} \Vert \psi \Vert _{H^{1+s}(D)}, \end{aligned}$$

the assertion of the lemma follows. \(\square \)

Proposition 11

Let Assumptions (B) and (C) be satisfied by the smoothers \((R_j:j=1\ldots ,J)\), let Assumption (D) hold, and let \(s\in (0,1/2)\) and \(p\in (d/s,+\infty )\).

Then for some deterministic constant C independent of J the inequality (A.2) holds with the random variable

$$\begin{aligned} K_1 := C \left( \max _{k=1,\dots ,K}\sum _{i,j=1}^d\Vert a_{ij}\Vert _{W^{s,p}(D_k)} + \Vert a\Vert _{L^\infty (D)}\right) ^2 \left( \frac{1}{{\check{a}}}\right) ^2. \end{aligned}$$

Proof

The proof of this proposition merges ideas of the proofs of [9, Lemma 4.2] and [39, Lemma 6.3] to obtain a strengthened Cauchy–Schwarz inequality with explicit dependence on the coefficients \((a_{ij},a:i,j=1,\ldots ,d)\) in the setting considered here that allows for low spatial regularity of \((a_{ij})\), \(i,j=1,\ldots ,d\). We may assume that \(j\ge i\) due to the symmetry of \((\cdot ,\cdot )_{\mathcal {A}}\) and let \(w\in {\mathcal {V}}_i\) and \(\phi \in {\mathcal {V}}_j\) be arbitrary, which are both elements of \(H^{1+s}(D)\) due to \(s<1/2\). The first inverse estimate in (20) and Lemma 10 imply that

where we tacitly absorbed the deterministic constant in (20) into \({\widetilde{{\mathcal {C}}}}\). The random variable \({\widetilde{{\mathcal {C}}}}\) is stated in Lemma 10. Then, the second inverse estimate in (20), (22), and the choice \(\eta := h_{j}^{2(1-s)}\) results in

where we again tacitly absorbed the deterministic constant in (22) into \({\widetilde{{\mathcal {C}}}}\) as well as the constant in (20). Since

we conclude with (18) and the assumption of the lemma that for every \(w\in {\mathcal {V}}_i\)

$$\begin{aligned} (T_j w,w)_{\mathcal {A}}\le c_1\frac{\Vert {\mathcal {A}}_j w \Vert ^2_{L^2(D)}}{\lambda _{\max }({\mathcal {A}}_j)} \le c_1 4 {\widetilde{{\mathcal {C}}}}^2 \left( \frac{1}{{\check{a}}}\right) ^2 \left( \frac{h_j}{h_i}\right) ^{2s} (w,w)_{\mathcal {A}}. \end{aligned}$$
(23)

We argue in a similar fashion as in the second part of the proof of [39, Lemma 6.3], i.e., we conclude with the Cauchy–Schwarz inequality, (23), and with the scaling property of the smoothers in (B) that

$$\begin{aligned} \begin{aligned} (T_jv,T_iw)_{\mathcal {A}}&\le (T_j v,v)_{\mathcal {A}}^{1/2} (T_j T_i w,T_i w)_{\mathcal {A}}^{1/2} \\&\le {\widetilde{{\mathcal {C}}}}^2 \left( \frac{1}{{\check{a}}}\right) ^2(T_j v,v)_{\mathcal {A}}^{1/2} \left( \frac{h_j}{h_i}\right) ^{2s} ( T_i w,T_i w)_{\mathcal {A}}^{1/2} \\&\le {\widetilde{{\mathcal {C}}}}^2 \eta \left( \frac{1}{{\check{a}}}\right) ^2 \left( \frac{h_j}{h_i}\right) ^{2s} (T_j v,v)_{\mathcal {A}}^{1/2} ( T_i w, w)_{\mathcal {A}}^{1/2}, \end{aligned} \end{aligned}$$

where we again absorbed deterministic constants into \({\widetilde{{\mathcal {C}}}}\). Since \(h_j/h_i\le \gamma ^{j-i}\) by assumption, the assertion of the proposition follows with Lemma 3. \(\square \)

Proposition 12

Let the smoothers \((R_j:j=1,\ldots ,J)\) satisfy Assumption (C) with a deterministic constant \(c_0\). There exists a deterministic constant C independent of j such that inequality (A.1) holds with the random variable

$$\begin{aligned} K_0:=C \frac{{\hat{a}}}{{\check{a}}}. \end{aligned}$$

Proof

Since the assumption implies that \((R_i^{-1} v_i,v_i)\le \lambda _{\max }({\mathcal {A}}_i)/c_0\; (v_i,v_i)\), the assertion of the proposition follows with (18), (19), and (21). \(\square \)

5 Domain decomposition methods

We will consider overlapping domain decomposition methods in the setting of assumptions (A.1) and (A.2) and for the random elliptic operator defined in (17). We denote by \({\mathcal {V}}_0\subset {\mathcal {V}}\) a coarse first order FE space with mesh width \(h_0\) of the type introduced in Sect. 4. The first order FE space with a fine grid is denoted by \({\widetilde{{\mathcal {V}}}}\subset {\mathcal {V}}\). For a given set of overlapping subdomains \((D_j:j=1,\ldots ,J)\) of D such that \({\overline{D}} = \bigcup _{j=1}^J {\overline{D}}_j\). These subdomains result for example by extending a given disjoint set of subdomains by a multiple of \(h_0\) in each spatial direction such that the union of its closures contains D. Also we assume that the boundary \(\partial D_j\) aligns with the considered mesh, \(j=1,\ldots , J\). The FE spaces \({\mathcal {V}}_j\), \(j=1,\ldots ,J\), are subspaces of \({\widetilde{{\mathcal {V}}}}\) and are defined by

$$\begin{aligned} {\mathcal {V}}_j := \{v\in {\widetilde{{\mathcal {V}}}}: v(x) =0\; \forall x \in D\backslash D_j \}, \quad j=1,\ldots ,J \end{aligned}$$

So, we consider the redundant space decomposition

$$\begin{aligned} {\widetilde{{\mathcal {V}}}} = \sum _{j=0}^J {\mathcal {V}}_j. \end{aligned}$$

We consider the case that symmetric multigrid solvers from Sect. 4 (Assumptions (B), (C), and (D) are satisfied) are used as so-called subpaces solvers \((R_j:j=0,\ldots ,J)\), which are random here. Therefore, suppose that the spaces \({\mathcal {V}}_j\) have nested subspaces \({\mathcal {M}}_{j,1}\subset \ldots \subset {\mathcal {M}}_{j,J'(j)} = {\mathcal {V}}_j\), \(j=0,\ldots ,J\). Naturally, only few levels are used on the subspace \({\mathcal {V}}_0\), i.e. \(J'(0)={\mathcal {O}}(1)\). As in Sect. 4, we seek for random variables \(K_0\) and \(K_1\) with explicit dependence on the random operator \({\mathcal {A}}\) in (17), in order to obtain \(L^q(\varOmega )\)-estimates for the condition numbers using additive Schwarz preconditioners.

Proposition 13

There exists a deterministic constant \(C>0\) that is independent of J and \(J'(j)\), \(j=0,\ldots ,J\), such that inequality (A.1) holds with the random variable

$$\begin{aligned} K_0 := C \left( \frac{{\hat{a}}}{{\check{a}}} \right) ^{4} \frac{({\hat{a}})^2}{({\check{a}})^6} \left( 1 + \left( \max _{k=1,\dots ,K}\sum _{i,j=1}^d\Vert a_{ij}\Vert _{W^{s,p}(D_k)} + \Vert a\Vert _{L^\infty (D)}\right) ^4 \right) . \end{aligned}$$

Proof

By (18) and [39, Lemmas 4.5 and 7.1],

$$\begin{aligned} K_0 \le \frac{{\hat{a}}}{{\check{a}}} \frac{1}{\min _{j=0,\ldots ,J}\lambda _{\min }(R_j{\mathcal {A}}_j)}. \end{aligned}$$

Since the \(R_j\)’s are chosen to be symmetric multigrid solvers, Propositions 12 and 11, Theorem 2, and (13) imply there exists a deterministic constant \(c>0\) such that for every \(j=0,\ldots ,J\),

$$\begin{aligned} \lambda _{\min }(R_j{\mathcal {A}}_j) \ge c \frac{({\check{a}})^5}{{\hat{a}}} \left( 1 + \left( \max _{k=1,\dots ,K}\sum _{i,j=1}^d\Vert a_{ij}\Vert _{W^{s,p}(D_k)} + \Vert a\Vert _{L^\infty (D)}\right) ^4 \right) ^{-1}. \end{aligned}$$

This implies the assertion of the proposition. \(\square \)

Proposition 14

Inequality (A.2) holds with the deterministic number

$$\begin{aligned} K_1 := (1+ |\{(i,j)\in \{1,\ldots ,J\}^2:D_i\cap D_j\}|) \end{aligned}$$

Proof

The assertion will follow by [39, Lemma 4.7] after we show an estimate of the form of Assumption (B). By (12), it holds that \(\lambda _{\max }(R_j {\mathcal {A}}_j)\le 1\), \(j=0,\ldots ,J\). Let \(j=0,\ldots ,J\) be arbitrary. Since \(R_j\) is a symmetric multigrid solver, \(R_j{\mathcal {A}}_j\) is symmetric and positive definite with respect to \((\cdot ,\cdot )_{\mathcal {A}}\). There exists an orthonormal basis of eigenvectors of \(R_j{\mathcal {A}}_j\) with respect to \((\cdot ,\cdot )_{\mathcal {A}}\) such that \(R_j{\mathcal {A}}_j v_i = \lambda _i v_i\). Hence, for every \(v\in {\mathcal {V}}_j\), \(v=\sum _i (v,v_i)_{\mathcal {A}}v_i\) and

$$\begin{aligned} \begin{aligned} (R_j{\mathcal {A}}_j v, R_j {\mathcal {A}}_j v )_{\mathcal {A}}&= \sum _{i} \lambda _i^2 ((v,v_i)_{\mathcal {A}})^2 \\&\le \lambda _{\max }(R_j{\mathcal {A}}_j) \sum _{i} \lambda _i ((v,v_i)_{\mathcal {A}})^2 = (R_j{\mathcal {A}}_j v, v )_{\mathcal {A}}, \end{aligned} \end{aligned}$$

which implies the estimate of the proposition with [39, Lemma 4.7]. \(\square \)

6 Application to lognormal diffusion problems

The presented theory in Sects. 34, and 5 is in particular applicable to lognormal diffusion problems. Let Z be a GRF on D that takes values in Hölder spaces \(C^t({\overline{D}})\) such that for some \(t\in (0,1]\)

figure f

see ahead Sect. 7.2 for a class of instances of such GFRs. Since for every \(v\in C^t({\overline{D}})\), \(t\in (0,1]\), \(\Vert \exp (v)\Vert _{C^t({\overline{D}})} \le \Vert \exp (v)\Vert _{C^0({\overline{D}})}(1+\Vert v\Vert _{C^t({\overline{D}})})\), the assumption in (E) implies by the Cauchy–Schwarz inequality

$$\begin{aligned} \exp (Z)\in L^q(\varOmega ;C^t({\overline{D}})) \quad \forall q\in [1,+\infty ) \end{aligned}$$

For the lognomal coefficient \(a:=\exp (Z)\), we consider the elliptic diffusion problem with Dirichlet boundary conditions in variational form to find such that for \({\mathbb {P}}\)-a.e. \(\omega \)

(24)

where \({\mathcal {V}}= H^1_0(D)\). Well-posedness and approximation by Finite Elements is well-known, cf. [12, 13, 36]. We use the FE spaces \({\mathcal {V}}_\ell \) from Sect. 4 with maximal mesh width \(h_\ell \) of \({\mathcal {T}}_\ell \), \(\ell \in {\mathbb {N}}\), and remark that for each \(\ell \), the space \({\mathcal {V}}_\ell \) may have the additional structure for overlapping domain decomposition methods with multigrid subspace solvers as introduced in Sect. 5.

Elements of \(H^{1+s}(D)\), \(s\in [0,1]\), can be approximated by functions in \({\mathcal {V}}_\ell \), cf. [14, Theorem 3.2.1], i.e., there exists a deterministic constant \(C>0\) such that for every there is such that

(25)

Note that the approximation property stated in [14, Theorem 3.2.1] can be interpolated to also hold for non-integer order Sobolev spaces. The following regularity estimate makes the dependence on the coefficient a explicit. For every \(s\in [0,\min \{t,t_{-\varDelta }\})\backslash \{1/2\}\) there exists a deterministic constant \(C>0\) such that for \({\mathbb {P}}\)-a.e. \(\omega \)

(26)

where \(t_{-\varDelta }\) is the maximal value such that the inverse of the Dirichlet Laplacian satisfies \((-\varDelta )^{-1} : H^{-1+t_{-\varDelta }}(D) \rightarrow {\mathcal {V}}\cap H^{1+t_{-\varDelta }}(D)\) is bounded. Recall that \(D\subset {\mathbb {R}}^d\) is a polytope. For \(d=2\), the estimate (26) is due to [36, Lemma 5.2] (for \(d=3\), the reader is referred to [36, Remark 5.2(c)]). The solution can be approximated in \({\mathcal {V}}_\ell \) by the FE approximation denoted by , in the \(L^q(\varOmega ,{\mathcal {V}})\)-norm, \(\ell \in {\mathbb {N}}\). Specifically, by Céa’s lemma, (25), and (26), there exists a deterministic constant \(C>0\) that is independent of a such that for \({\mathbb {P}}\)-a.e. \(\omega \) and every \(\ell \in {\mathbb {N}}\)

Finiteness of the right hand side follows by the Cauchy–Schwarz inequality using the Assumption (E). Since the embedding \(C^{t}({\overline{D}})\subset W^{s,p}(D)\) is continuous for every \(0< s<t\) and every \(p\in [1,+\infty )\), the conditions from Sects. 4 and 5 are satisfied. Let

$$\begin{aligned} U_\ell ^n:\varOmega \rightarrow {\mathcal {V}}_\ell \end{aligned}$$
(27)

be the result of \(n\in {\mathbb {N}}\) iterations of an iterative algorithm introduced in the previous sections, with initial guess \(U_{\ell -1}^n\), \(2\le \ell \in {\mathbb {N}}\), and . The iterative methods are symmetric multigrid (see Algorithms 2) or PCG using the BPX preconditioner (see Algorithm 1) in Sect. 4. In the setting of Sect. 5, \(U_\ell ^n:\varOmega \rightarrow {\mathcal {V}}_\ell \) in (27) may result from \(n\in {\mathbb {N}}\) iterations of PCG using the additive Schwarz preconditioner (see Algorithm 1), where symmetric multigrid (see Algorithm 2) is used as subspace solvers.

Theorem 15

For \(0<s<t\le 1\) and every \(q,r\in [1,+\infty )\), there exists a constant \(C>0\) such that for every number of iterations \(n\in {\mathbb {N}}\) and \(\ell \ge 2\),

Proof

By the Cauchy–Schwarz inequality, \(\Vert a\Vert ^2_{C^0({\overline{D}})}\Vert a\Vert ^2_{C^t({\overline{D}})}(\min _{x\in {\overline{D}}}a(x))^{-5}\in L^p(\varOmega )\) for every \(p\in [1,+\infty )\), which is one of the conditions of Theorem 6.

It remains to verify the needed properties of the random contraction number in the conditions of Theorem 6. In the framework of assumptions (A.1) and (A.2), by Theorem 2 the random contraction number \(\delta \) satisfies for the linear iteration \(1/(1-\delta ) \le K_0(1+K_1)^2/(2-\nu )\). In the case of multigrid, Propositions 11 and 12 and the Cauchy–Schwarz inequality imply that \(1/(1-\delta )\in L^{p'}(\varOmega )\) for every \(p'\in [1,+\infty )\). For overlapping domain decomposition methods, this statement is due to Propositions 13 and 14. If the additive preconditioner is applied with PCG, the random contraction number \(\delta \) satisfies by Theorem 2, \(1/(1-\delta ) \le \sqrt{K_0 K_1} +1\). By the same argument, \(1/(1-\delta )\in L^{p'}(\varOmega )\) for every \(p'\in [1,+\infty )\). Hence, the parameter r in Theorem 6 may be arbitrarily large, which implies the assertion. \(\square \)

Corollary 16

In the setting of Theorem 15, let \(U_\ell ^n\) result from \(n\in {\mathbb {N}}\) iterations of PCG with a deterministic preconditioner \({\widetilde{B}}_j\) such that \(\kappa ({\widetilde{B}}_j{{\,\mathrm{{\mathbb {E}}}\,}}({\mathcal {A}}_j))\) is bounded uniformly in j. Then, the strong convergence estimate of Theorem 15 also holds.

Proof

By Lemma 1, for \({\mathbb {P}}\)-a.e. \(\omega \), \(\kappa ({\widetilde{B}}_j{\mathcal {A}}_j(\omega ))\le {\hat{a}}(\omega )/{\check{a}}(\omega )\kappa ({\widetilde{B}}_j{{\,\mathrm{{\mathbb {E}}}\,}}({\mathcal {A}}_j))\). Since \({\hat{a}}/{\check{a}}\in L^{q'}(\varOmega )\) for every \(q'\in [1,+\infty )\), the claim follows as in the proof of Theorem 15. \(\square \)

Corollary 17

In the setting of Theorem 15, for every \(\varepsilon >0\) there exists a constant \(C_{q,\varepsilon ,s}>0\) that is independent of \(h_\ell \) such that for every \(\ell \ge 2\)

with a deterministic number of iterations given by \(n=\lceil n_0 h_\ell ^{-\varepsilon } \rceil \) for a deterministic constant \(n_0>0\).

The cost for one sample of \(U_\ell ^n\) is

$$\begin{aligned} {\mathcal {O}}(h_\ell ^{-d-\varepsilon }) \end{aligned}$$

with deterministic constants that are independent of \(h_\ell \), \(\ell \ge 0\).

Proof

The cost of one iteration is \({\mathcal {O}}(h_\ell ^{-d})\), since the matrix vector product has cost \({\mathcal {O}}(h_{\ell '}^{-d})\) for the sparse stiffness matrices that result taking the nodal basis of \({\mathcal {V}}_{\ell '}\), \(\ell '\le \ell \).

The error contributions in the estimate of Theorem 15 are equilibrated for this choice of iteration number, since r in Theorem 15 can be chosen arbitrarily large, i.e., \(r=s/\varepsilon \) is admissible. \(\square \)

Remark 18

The established theory in this paper is also applicable for deterministic preconditioners that do not imply uniform condition numbers. A possible class of examples are so-called “algebraic multigrid” (AMG) preconditioners for which a multilevel convergence theory does not seem to be available, cf. [40, Sect. 5]. However, if such a deterministic symmetric AMG preconditioner could be tuned to \({{\,\mathrm{{\mathbb {E}}}\,}}({\mathcal {A}}_j)\) or any operator which is spectrally equivalent in the sense of (2), the respective iterative method also converges strongly as in Theorem 15 by a similar argument applying Corollary 16. Note that in the example (24) (see also Example 9), if Z is stationary, then \({{\,\mathrm{{\mathbb {E}}}\,}}({\mathcal {A}})\) is the Dirichlet Laplacian multiplied by \({{\,\mathrm{{\mathbb {E}}}\,}}(\exp (Z))\), which is constant with respect to \(x\in D\).

7 Application to multilevel Monte Carlo

Multilevel Monte Carlo methods make the numerical approximation of moments of random quantities feasible where sampling is computationally costly. The analysis of the computational cost versus accuracy of MLMC that does not rely on additional assumptions on the used PDE solver is an application of the presented theory in previous sections. The MLMC estimator with \(L\in {\mathbb {N}}\) levels to approximate the mean field is given by

$$\begin{aligned} E_L^{\mathrm{ML}}(U_L^{n_L}):= \sum _{\ell =1}^L E_{M_\ell }(U_\ell ^{n_\ell } - U^{n_{\ell -1}}_{\ell -1}), \end{aligned}$$

where \((E_{M_\ell }:\ell =1,\ldots ,L)\) are Monte Carlo estimators that are mutually independent. We used the convention that \(U^{n_0}_{0} :=0\). Sample numbers \(M_\ell \), \(\ell =1,\ldots ,L\) are chosen to optimize the accuracy versus the computational cost.

7.1 Computational cost versus accuracy

In the literature [13, 22], generic asymptotic bounds of the required computational cost for a certain accuracy with MLMC are given that pose assumptions on the behavior of and on the required computational cost to sample . In [13], the assumption on the decay of was verified without investigating the computational cost to sample . In the present paper this is achieved by Corollary 17, i.e., with computational cost \({\mathcal {O}}(h_\ell ^{-d-\varepsilon })\) to sample \(U^{n_\ell }_\ell \) for any \(\varepsilon >0\), where the implied constants depend on \(\varepsilon \). Thus, by [13, Theorem 4.1], an error threshold \(0<\mathrm{TOL}\), i.e.,

can be achieved with computational cost

$$\begin{aligned} \mathrm{work}_{L} = {\left\{ \begin{array}{ll} {\mathcal {O}}(\mathrm{TOL}^{-2}) &{} \text {if } 2s > d+\varepsilon , \\ {\mathcal {O}}(\mathrm{TOL}^{-(d+\varepsilon )/s}) &{} \text {if } 2s < d+\varepsilon , \end{array}\right. } \end{aligned}$$
(28)

where \(d=1,2,3\) is the dimension of the domain D. We have assumed that a fast method is available to sample the random coefficient a. This is for example the case for certain stationary Gaussian random fields (GRFs), which can be sampled by truncated Karhunen Loève series or circulant embedding which are fast, FFT based techniques, cf. [34, Chapter 7].

7.2 Numerical experiments

We consider a class of centered, stationary GRFs that are solutions to the white noise stochastic PDE

$$\begin{aligned} (-\varDelta + \kappa ^2)^{\alpha /2} Z = W \quad \text {on }{\widetilde{D}}, \end{aligned}$$
(29)

where \({\widetilde{D}}\) is a superset of the domain D, W is spatial white noise on \({\widetilde{D}}\) (cf. [1]), \(\nu =\alpha -d/2>0\), and \(\kappa >0\). It is well-known that for \({\widetilde{D}} = {\mathbb {R}}^d\), the GRF Z has so-called Matérn covariance with smoothness parameter \(\nu \) and length scale parameter \(\lambda =\sqrt{\nu }/\kappa \), cf. [33].

In our numerical experiments, we will choose \(d=2\) and \(D=[0,1]^2\). A stationary GRF as a solution to (29) results by restricting Z to the domain D. It is convenient to choose \({\widetilde{D}}=[-1/2,3/2]^2 \) with periodic boundary conditions. Note that \(\mathrm{dist}(D,\partial {\widetilde{D}})= 1/2\), and we consider values of the correlation length, which are smaller or equal than this window size, cf. [29]. The solution Z to (29) can be obtained by a spectral Galerkin method using the eigenfunctions of the Laplacian with periodic boundary conditions on \({\widetilde{D}}\) normalized in \(L^2({\widetilde{D}})\). Since these eigenfunctions separate, we denote them with double indices \((\psi _{k_1,k_2})_{k_1,k_2\in {\mathbb {N}}_0}\). We observe that white noise W applied to any ONB yields a sequence of independent standard normally distributed random variables. Since this ONB diagonalizes the operator \((-\varDelta + \kappa ^2)^{\alpha /2}\), the stationary GRF Z can be explicitly expanded with respect to this basis. The random coefficients with respect to this ONB are given by

$$\begin{aligned} \frac{1}{(\pi ^2(k_1^2 + k_2^2) + \kappa ^2)^{\alpha /2}} y_{k_1,k_2}, \quad k_1,k_2 \in {\mathbb {N}}_0, \end{aligned}$$

where \((y_{k_1,k_2})_{k_1,k_2 \in {\mathbb {N}}_0}\) is a sequence of independent, standard normally distributed random variables. For \(\sigma >0\), to be determined later, the GRF Z will be rescaled such that the pointwise variance satisfies

$$\begin{aligned} {{\,\mathrm{{\mathbb {E}}}\,}}(Z(x)^2)=\sigma ^2, \quad \forall x\in D. \end{aligned}$$

The expansion of Z can be truncated for numerical purposes and efficiently implemented with FFT. For any \({\widetilde{k}}\in {\mathbb {N}}\), let us denote the truncation of the expansion of Z to the terms such that \(k_1,k_2\le {\widetilde{k}}\) by \(Z^{{\widetilde{k}}}\). By an argument similar to the proof of [27, Theorem 2.2], for every \(t\in (0,\nu )\), \(\varepsilon \in (0,\nu -t)\), and every \(q\in [1,+\infty ) \), there exists a constant \(C>0\) such that for every truncation \({\widetilde{k}}\in {\mathbb {N}}\)

$$\begin{aligned} \Vert Z - Z^{{\widetilde{k}}}\Vert _{L^q(\varOmega ;C^t({\widetilde{D}}))} \le C {\widetilde{k}}^{-(\nu -t -\varepsilon )}. \end{aligned}$$

The additional error introduced by truncating the expansion of Z is consistent with the FE discretization error if \({\widetilde{k}}\) is chosen level-dependently such that \({\widetilde{k}}_\ell = \lceil {\widetilde{k}}_1 h_\ell ^{-\nu } \rceil \) for some \({\widetilde{k}}_1>0\) at our disposal and FE mesh width \(h_\ell \). The reader is referred to the discussion in [27, Sects. 3 and 5]. As a consequence of Fernique’s theorem, we may conclude similarly as in [12, Proposition 3.10]. that \(\exp (Z)\in L^q(\varOmega ;C^{0}({\overline{D}}))\) for every \(q\in [1,+\infty )\); see also [27, Proposition B.1]. Thus, the assumptions in (E) are verified by the GRF Z and the developed theoretical error estimates of this paper, in particular Corollary 17 and (28), hold.

The GRF Z is taken as a random input and we seek to approximate the expectation of the solution to (24) with a MLMC estimator, where \(a=\exp (Z)\) and right hand side \(f(x_1,x_2) = \sin (\pi x_1)\sin (\pi x_2)\). We use the sample numbers according to [26, Eqs. (44) and (47)], i.e., for \(M^*\in {\mathbb {N}}\)

$$\begin{aligned} M_\ell = \lceil M_1 h_\ell ^{(2s + d+\varepsilon )/2)} \rceil , \quad \ell =2,\ldots ,L, \end{aligned}$$
(30)

and

$$\begin{aligned} M_1 = M^* {\left\{ \begin{array}{ll} \lceil 2^{sL/2}\rceil &{}\quad \text {if } \quad d+\varepsilon <2s, \\ \lceil 2^{(2s + d+\varepsilon )L /2} \rceil &{}\quad \text {if }\quad d+\varepsilon >2s, \end{array}\right. } \end{aligned}$$
(31)

here with \(M^*=5\) and \(L=1,\ldots ,8\) levels. For the level dependent truncation of the expansion of the GRF, we use \({\widetilde{k}}_\ell = \lceil {\widetilde{k}}_1 h_\ell ^{-\nu } \rceil \) with \({\widetilde{k}}_1=2\). We consider a full multigrid cycle with PCG using BPX with Jacobi smoothers and a direct solver on the lowest level. We recall that the iteration numbers depend mildly on the level of the MLMC estimator and will be chosen by \(n_\ell = \lceil n_0 h_\ell ^{-\varepsilon } \rceil \) for \(0<\varepsilon \ll 1\). We use a triangulation of D, which is uniformly refined, resulting in FE spaces \({\mathcal {V}}_\ell \subset {\mathcal {V}}\), \(\ell \ge 1\), with maximal mesh width \(h_\ell \), which incorporate Dirichlet boundary conditions. The FE spaces are spanned by a nodal basis. The implementation uses the FE C++ library BETL, cf. [28], the GRF is implemented using FFTW, cf. [17], and the execution of MLMC is parallelized using the MPI-based wrapper gMLQMC, cf. [19]. Since the GRF Z may be periodically extended, \([0,2]^2\) can be used as a computational domain, which eases implementation. The \(L^2(\varOmega ;{\mathcal {V}})\)-norm will be estimated by where R i.i.d. realizations of \(E_L^{\mathrm{ML}}(U^{n_L}_L)\) are used. The reference solution is approximated by the average of \(R=20\) realizations of \(E_L^{\mathrm{ML}}(U^{n_L}_L)\) with \(L=9\), sample numbers (30) and (31) with \(M^*=40\), and a sparse direct solver.

In Fig. 1a, b, the error is plotted versus the degrees of freedom in the FE space of the highest level that is active in the MLMC estimator. The empirical rate is computed by least squares taking into account the five data pairs corresponding to finer resolution. The fitted lines are shifted down for better visability. We remark that the relation to the total required work is asymptotically \(\mathrm{work}_L = {\mathcal {O}}(h_L^{-d-\varepsilon })\). Thus, the rate implied by (28) is for \(\nu = 0.5\) approximately 0.25. We observe that the performance of the iterative solver is in the required range of accuracy as good as a sparse direct solver. In Fig. 1a, b, the rate seems to depend on the value of the correlation length. Sometimes for practitioners, a smaller value for the correlation length is of interest. However, the performance of the iterative solver is still as good as the sparse direct solver. These numerical tests were performed with iteration numbers \(n_\ell = \lceil n_0 h_\ell ^{-\varepsilon } \rceil \) for \(0<\varepsilon \ll 1\). The presented theory also underpins the strategy to apply PCG with a stopping criterion and use \(n_\ell \) as an upper bound of the iteration numbers. This way, one may benefit from certain superlinear convergence effects of CG, cf. [4], which were for ease of exposition not accounted for in the presented theory.

Fig. 1
figure 1

Comparison of full multigrid cycle using PCG with BPX with \(n_0=5\), \(\varepsilon =0.05\) to a sparse direct solver

8 Conclusions and extensions

In the study of linear random operator equations with non-uniform input a rigorous framework has been established to verify strong convergence of a wide range of iterative solvers. This includes standard solvers such as multigrid, domain decomposition, and preconditioned conjugate gradient. In the case of lognormal random input, essentially optimal, deterministic complexity bounds are implied. This offers an alternative to direct solvers for this type of problems. For MLMC, we concluded deterministic, essentially optimal computational cost versus accuracy estimates, which is a novel result. Numerical experiments with CG preconditioned by BPX confirmed these asymptotic bounds and we conclude that standard iterative solvers are applicable in the case of lognormal Hölder continuous input. Assumptions on the computational cost of PDE solvers are also common in the context of multilevel quasi-Monte Carlo methods (MLQMC) for PDE problems, cf. [20, 26, 31, 32]. Applicability of iterative solvers for MLQMC in a certain setting has been analyzed by the author in [25].