Abstract
We develop a multilevel Monte Carlo (MLMC)-FEM algorithm for linear, elliptic diffusion problems in polytopal domain \({\mathcal {D}}\subset {\mathbb {R}}^d\), with Besov-tree random coefficients. This is to say that the logarithms of the diffusion coefficients are sampled from so-called Besov-tree priors, which have recently been proposed to model data for fractal phenomena in science and engineering. Numerical analysis of the fully discrete FEM for the elliptic PDE includes quadrature approximation and must account for (a) nonuniform pathwise upper and lower coefficient bounds, and for (b) low path-regularity of the Besov-tree coefficients. Admissible non-parametric random coefficients correspond to random functions exhibiting singularities on random fractals with tunable fractal dimension, but involve no a-priori specification of the fractal geometry of singular supports of sample paths. Optimal complexity and convergence rate estimates for quantities of interest and for their second moments are proved. A convergence analysis for MLMC-FEM is performed which yields choices of the algorithmic steering parameters for efficient implementation. A complexity (“error vs work”) analysis of the MLMC-FEM approximations is provided.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
The efficient numerical solution of partial differential equations with uncertain inputs is key in forward uncertainty quantification, i.e., the computational quantification of uncertainty of solutions to PDEs with uncertain inputs. It is also crucial in computational inverse uncertainty quantification, e.g. via Markov chain Monte Carlo methods, where numerous numerical solves of the forward model subject to realizations of the uncertain input are required. Here, we consider the linear, elliptic diffusion with uncertain coefficient. It models a wide range of phenomena such as diffusion through a medium with uncertain or even unknown permeability, e.g. in subsurface flow, light scattering in dust clouds, to name but a few. Physical modelling of subsurface flow in particular stipulates systems of fractures of uncertain geometry with high permeability along fractures (see, e.g., [8] and the references there). With fracture geometry being only statistically known, it is natural in computational uncertainty quantification (UQ) to specify the geometry in a nonparametric fashion, rather than, for instance, through a Gaussian random field (GRF for short) with a known, parametric two-point correlation to be calibrated from experimental data. This function space perspective has also become topical recently in the context of inverse imaging noisy signals. Modelling with random, fractal geometries also has found applications in biology (roots, lungs [2]). There, Gaussian parametric models have been found computationally efficient due to the availability of padding and circulant embedding based numerics, enabling the use of fast Fourier transform algorithms for sample path generation. However, Gaussian models are perceived as inadequate for the efficient representation of edges and interfaces in imaging. Accordingly, non-parametric representations of inputs with fractal irregularities in sample paths have been proposed recently, e.g. in [19, 23], and the references there. We also mention the so-called Besov priors in Bayesian inverse problems with elliptic PDE constraints (e.g. [4, 22, 24] and the references there).
In the present paper, we investigate so-called Besov random tree priors [23], as stochastic log-diffusion coefficient in a linear elliptic PDE. These priors are given by a wavelet series with stochastic coefficients, and certain terms in the expansion vanishing at random, according to the law of so-called Galton-Watson trees. Samples of the corresponding random fields involve fractal geometries, hence the Besov random tree prior may be a viable candidate in applications, where models based on GRFs do not allow for sufficiently flexibility. We quantify the pathwise regularity of the random tree prior in terms of Hölder-regularity, and investigate the forward propagation of the uncertainty in the elliptic PDE model in a bounded domain. All results in the present article encompass the “standard” Besov prior from [24] as special case, when no terms in the wavelet series are eliminated. As we point out in our analysis, regularity is inherently low, both with respect to the spatial and stochastic domain of the random field. This is taken into account when developing efficient numerical methods for the elliptic PDE problem at hand.
We develop a multilevel Monte Carlo (MLMC) Finite Element (FE) simulation algorithm and furnish its mathematical analysis for the estimation of quantities of interest (QoI) in the forward PDE model. Multilevel Monte Carlo methods ([6, 14, 15]) are, by now, a well-established methodology in computational UQ, and are effective in regimes with comparably low regularity. We mention that our MLMC-FE error analysis includes the case of standard low-regularity Besov priors as a special case. In contrast, higher-order methods that consider an equivalent parametric, deterministic PDE problem, such as (multilevel) Quasi-Monte Carlo ([20]), generalized polynomial chaos (gPC) expansions ([21]), or multilevel Smolyak quadrature ([28]) are not suitable in the present random tree model: The parametrization of the prior involves discontinuities in the stochastic domain, which strongly violates the regularity requirements of the aforementioned higher-order methods. We refer to Appendix A.3 for a detailed discussion. On the other hand, MLMC techniques merely require square-integrability of the pathwise solution
1.1 Contributions
For a model linear elliptic diffusion equation, in a polytopal domain \({\mathcal {D}}\subset {\mathbb {R}}^d\), we provide the mathematical setting and the numerical analysis of a MLMC-FEM for diffusion in random media with log-fractal Besov random tree structure. In particular, we establish well-posedness of the forward problem including strong measurability of random solutions (a key ingredient in the ensuing MLMC-FE convergence analysis), and pathwise almost sure Besov regularity of weak solutions. Technical results of independent interest include: (i) Bounds on exponential moments of Besov random variables in Hölder norms, generalizing results in [11, 23, 24], (ii) Numerical analysis of elliptic forward problems with fractal coefficient, in particular bounds on the fractal scale truncation error and on the finite element approximation error, as well as the impact of numerical quadrature in view of low (Hölder) path regularity of the random coefficients, (iii) a complete MLMC-FE convergence analysis, for estimating the mean of non-linear functionals of the random solution field.
1.2 Preliminaries and notation
We denote by \({\mathcal {V}}'\) the topological dual for any vector space \({\mathcal {V}}\) and by \(_{{\mathcal {V}}'}^{}{\langle }_{}^{} \cdot ,\cdot \rangle _{{\mathcal {V}}}\) the associated dual pairing. We write \({\mathcal {X}}\hookrightarrow {\mathcal {Y}}\) for two normed spaces \(({\mathcal {X}}, \left\| \cdot \right\| _{\mathcal {X}}), ({\mathcal {Y}}, \left\| \cdot \right\| _{\mathcal {Y}})\), if \({\mathcal {X}}\) is continuously embedded in \({\mathcal {Y}}\), i.e., there exists \(C>0\) such that \(\Vert \varphi \Vert _{\mathcal {Y}}\le C\Vert \varphi \Vert _{\mathcal {X}}\) holds for all \(\varphi \in {\mathcal {X}}\). The Borel \(\sigma \)-algebra of any metric space \(({\mathcal {X}}, d_{\mathcal {X}})\) is generated by the open sets in \({\mathcal {X}}\) and denoted by \({\mathcal {B}}({\mathcal {X}})\). For any \(\sigma \)-finite and complete measure space \((E,{\mathcal {E}},\mu )\), a Banach space \(({\mathcal {X}}, \left\| \cdot \right\| _{\mathcal {X}})\), and integrability exponent \(p\in [1,\infty ]\), we define the Lebesgue-Bochner spaces
where
In case that \({\mathcal {X}}={\mathbb {R}}\), we use the shorthand notation \(L^p(E):=L^p(E;{\mathbb {R}})\). If \(E\subset {\mathbb {R}}^d\) is a subset of the Euclidean space, we assume \({\mathcal {E}}={\mathcal {B}}(E)\) and \(\mu \) is the Lebesgue measure, unless stated otherwise. For any bounded and connected spatial domain \({\mathcal {D}}\subset {\mathbb {R}}^d\) we denote for \(k\in {\mathbb {N}}\) and \(p\in [1,\infty ]\) the standard Sobolev space \(W^{k,p}({\mathcal {D}})\) with k-order weak derivatives in \(L^p({\mathcal {D}})\). The Sobolev-Slobodeckji space with fractional order \(s\ge 0\) is denoted by \(W^{s,p}({\mathcal {D}})\). Furthermore, \(H^s({\mathcal {D}}):=W^{s,2}({\mathcal {D}})\) for any \(s\ge 0\) and we use the identification \(H^0({\mathcal {D}})=L^2({\mathcal {D}})\). Given that \({\mathcal {D}}\) is a Lipschitz domain, we define for any \(s>1/2\)
Here, \(\gamma _0\in {\mathcal {L}}(H^s({\mathcal {D}}),H^{s-1/2}(\partial {\mathcal {D}}))\) denotes the trace operator.
Let \(\textrm{C}(\overline{{\mathcal {D}}})\) denote the space of all continuous functions \(\varphi :\overline{{\mathcal {D}}}\rightarrow {\mathbb {R}}\). For any \(\alpha \in {\mathbb {N}}\), \(\textrm{C}^{\alpha }(\overline{{\mathcal {D}}})\) is the space of all functions \(\varphi \in \textrm{C}(\overline{{\mathcal {D}}})\) with \(\alpha \) continuous partial derivatives. For non-integer \(\alpha >0\), we denote by \(\textrm{C}^{\alpha }(\overline{{\mathcal {D}}})\) the space of all \(\varphi \in \textrm{C}^{\lfloor \alpha \rfloor }(\overline{{\mathcal {D}}})\) with \(\alpha -\lfloor \alpha \rfloor \)-Hölder continuous \(\lfloor \alpha \rfloor \)-th partial derivatives. For any positive, real \(\alpha >0\) we further denote by \({\mathcal {C}}^\alpha ({\mathcal {D}})\) the Hölder-Zygmund space of smoothness \(\alpha \). We refer to, e.g., [25, Section 1.2.2] for a definition. We denote by \(\textrm{S}({\mathbb {R}}^d)\) the Schwartz space of all smooth, rapidly decaying functions, and with \(\textrm{S}'({\mathbb {R}}^d)\) its dual, the space of tempered distributions. Moreover, for any open set \(O\subseteq {\mathbb {R}}^d\), \(\mathrm D(O)\) denotes the space of all smooth functions \(\varphi \in \textrm{C}^\infty (O)\) with compact support in O.
For the finite element error analysis we introduce a countable set \({\mathfrak {H}}\subset (0,\infty )\) of discretization parameters, and denote by \(h\in {\mathfrak {H}}\) a generic discretization parameter, such as in the present paper the FE meshwidth of a regular, simplicial and quasi-uniform partition of the physical domain. We further assume there exists a decreasing sequence \((h_\ell , \ell \in {\mathbb {N}})\subset {\mathfrak {H}}\) such that \(\lim _{\ell \rightarrow \infty } h_\ell = 0\).
1.3 Layout of this paper
In Sect. 2 we introduce the class of random fields taking values in the Besov spaces \(B^s_{p,p}\) which we will use in the sequel to model the logarithm of the diffusion coefficient function. Using multiresolution (“wavelet”) bases in \(B^s_{p,p}\), in Sects. 2.2, 2.3 we construct probability measures on \(B^s_{p,p}\) in the spirit of the Gaussian measure on path space for the Wiener process, in Lévy-Cieselski representation. The multilevel structure of the construction will be essential in the ensuing MLMC-FE convergence analysis and its algorithmic realization. In Sect. 3 we introduce the linear, elliptic divergence-form PDE with Besov-tree coefficients. We recapitulate (mostly known) results on existence, uniqueness and on strong measurability of random solutions. In Sect. 4 we introduce a conforming Galerkin Finite Element discretization based on continuous, piecewise linear approximations in the physical domain. We account for the error due to finite truncation of the random tree priors, and provide sharp error bounds for the Finite Element discretization errors, under the (generally low) Besov regularity of the coefficient samples. Section 5 then addresses the MLMC-FE error analysis, also for Fréchet-differentiable, possibly nonlinear functionals. Section 6 then illustrates the theory with several numerical experiments, where the impact of the parameter choices in the Besov random tree priors on the overall error convergence of the MLMC-FEM algorithms is studied. Section 7 provides a brief summary of the main results, and indicates several generalizations of these and directions of further research. Appendix A collects definitions and key properties of Galton-Watson trees which are used in the main text. Appendix B provides a detailed description of the FE implementation in the experiments reported in Sect. 6.
2 Random variables in Besov spaces
2.1 Wavelet representation of Besov spaces
Let \({\mathbb {T}}^d:=[0,1]^d\) denote the d-dimensional torus for \(d\in {\mathbb {N}}\). We briefly recall the construction of orthonormal wavelet basis on \(L^2({\mathbb {R}}^d)\) and \(L^2({\mathbb {T}}^d)\) and the wavelet representation of the associated Besov spaces. For more detailed accounts we refer to [26, Chapter 1], [27, Chapter 1.2], and to [12, Chapter 5] for orthonormal wavelets in multiresolution analysis (MRA).
2.1.1 Univariate MRA
Let \(\phi \) and \(\psi \) be compactly supported scaling and wavelet functions in \(\textrm{C}^{\alpha }({\mathbb {R}})\), \(\alpha \ge 1\), suitable for multi-resolution analysis in \(L^2({\mathbb {R}})\). We assume that \(\psi \) satisfies the vanishing moment condition
One example are Daubechies wavelets with \( M:= \lfloor \alpha \rfloor \in {\mathbb {N}}\) vanishing moments also known as \({\mathrm D}{\mathrm B}(\lfloor \alpha \rfloor )\)-wavelets), that have support \([-M+1,M]\) and are in \(\textrm{C}^1({\mathbb {R}})\) for \(M\ge 5\) (see, e.g., [12, Section 7.1]). For any \(j\in {\mathbb {N}}_0\) and \(k\in {\mathbb {Z}}\), the MRA is defined by the dilated and translated functions
As \(\Vert \phi \Vert _{L^2({\mathbb {R}})}=\Vert \psi \Vert _{L^2({\mathbb {R}})}=1\), it follows that \(((\psi _{0,k,0}), k\in {\mathbb {Z}})\,\cup \,((2^{j/2}\psi _{j,k,1}), (j,k)\in {\mathbb {N}}_0\times {\mathbb {Z}})\) is an orthonormal basis of \(L^2({\mathbb {R}})\).
2.1.2 Multivariate MRA
A corresponding isotropicFootnote 1 wavelet basis that is orthormal in \(L^2({\mathbb {R}}^d)\), \(d\ge 2\) may be constructed by tensorization of univariate MRAs. We define index sets \({\mathcal {L}}_0:=\{0,1\}^d\) and \({\mathcal {L}}_j:={\mathcal {L}}_0\setminus \{(0,\dots ,0)\}\) for \(j\in {\mathbb {N}}\). We note that \({\mathcal {L}}_j\) has cardinality \(|{\mathcal {L}}_j|=2^d\) if \(j=0\), and \(|{\mathcal {L}}_j|=2^d-1\) otherwise. For any \(l\in {\mathcal {L}}_0\), we define furthermore
to obtain that \(((\psi _{j,k,l}),\; j\in {\mathbb {N}}_0,\, k\in {\mathbb {Z}}^d,\, l\in {\mathcal {L}}_j)\) is an orthonormal basis of \(L^2({\mathbb {R}}^d)\).
Orthonormal bases consisting of locally supported, periodic functions on the torus \({\mathbb {T}}^d\) can be introduced by tensorization, as e.g. in [26, Section 1.3]. Given \(\phi \) and \(\psi \), we fix a scaling factor \(w\in {\mathbb {N}}\) such that
With this choice of w, it follows for \(j\in {\mathbb {N}}_0\) that
Now let \(K_{j}:=\{k\in {\mathbb {Z}}^d|\,0\le k_1,\dots ,k_d<2^{j}\}\subset 2^{j}{\mathbb {T}}^d\) and note that \(|K_{j+w}|=2^{d(j+w)}\). Define the one-periodic wavelet functions
and their restrictions to \({\mathbb {T}}^d\) by
We now obtain for the index set \({\mathcal {I}}_{w}:=\{j\in {\mathbb {N}}_0,\; k\in K_{j+w},\;l\in {\mathcal {L}}_j\}\) that
is a \(L^2({\mathbb {T}}^d)\)-orthonormal basis, see [26, Proposition 1.34]. We next introduce Besov spaces via suitable wavelet-characterization as developed in [26, Chapter 1.3]. For this we introduce the set of one-periodic distributions on \({\mathbb {R}}^d\) given by
see [26, Eq. (1.131)]. We distinguish between spaces of one-periodic functions on \({\mathbb {R}}^d\) and their restrictions to \({\mathbb {T}}^d\):
Definition 2.1
-
1.
For any \(p\in [1,\infty )\) and \(s\in (0,\alpha )\) the Besov space \(B_{p,p}^{s,per}({\mathbb {R}}^d)\) of one-periodic functions on \({\mathbb {R}}^d\) is given by
$$\begin{aligned} B_{p,p}^{s,per}({\mathbb {R}}^d):= \left\{ \varphi \in \textrm{S}'^{per}({\mathbb {R}}^d)\bigg |\; \sum _{(j,k,l)\in {\mathcal {I}}_w} 2^{jp\left( s+\frac{d+w}{2}-\frac{d}{p}\right) } |(\varphi ,\psi _{j+w,k,l}^{per})_{L^2({\mathbb {T}}^d)}|^p <\infty \right\} .\nonumber \\ \end{aligned}$$(7)In case that \(p=\infty \), one has
$$\begin{aligned} B_{\infty ,\infty }^{s,per}({\mathbb {R}}^d):= \left\{ \varphi \in \textrm{S}'^{per}({\mathbb {R}}^d)\bigg |\; \sup _{(j,k,l)\in {\mathcal {I}}_w} 2^{j\left( s+\frac{d+w}{2}\right) } |(\varphi ,\psi _{j+w,k,l}^{per})_{L^2({\mathbb {T}}^d)}| <\infty \right\} .\nonumber \\ \end{aligned}$$(8) -
2.
For any \(p\in [1,\infty )\) and \(s\in (0,\alpha )\) the Besov space \(B_{p,p}^s({\mathbb {T}}^d)\) on \({\mathbb {T}}^d\) is given by
$$\begin{aligned} B_{p,p}^s({\mathbb {T}}^d):= \left\{ \varphi \in \mathrm D'({\mathbb {T}}^d) \bigg |\; \sum _{(j,k,l)\in {\mathcal {I}}_w} 2^{jp\left( s+\frac{d+w}{2}-\frac{d}{p}\right) } |(\varphi ,\psi _{j+w,k}^l)_{L^2({\mathbb {T}}^d)}|^p <\infty \right\} . \end{aligned}$$(9)In case that \(p=\infty \), we set
$$\begin{aligned} B_{\infty ,\infty }^s({\mathbb {T}}^d):= \left\{ \varphi \in \mathrm D'({\mathbb {T}}^d) \bigg |\; \sup _{(j,k,l)\in {\mathcal {I}}_w} 2^{j\left( s+\frac{d+w}{2}\right) } |(\varphi ,\psi _{j+w,k}^l)_{L^2({\mathbb {T}}^d)}| <\infty \right\} . \end{aligned}$$(10)
Remark 2.2
Definition 2.1 may be generalized to define the spaces \(B_{p,q}^{s,per}({\mathbb {R}}^d)\) and \(B_{p,q}^s({\mathbb {T}}^d)\) with \(p,q\in [1,\infty ]\) and \(p\ne q\), see [26, Chapter 1.3]. The random fields introduced in Sects. 2.2 and 2.3 are naturally \(B_{p,p}^s({\mathbb {T}}^d)\)-valued by construction, thus we only treat the case \(p=q\) for the sake of brevity. By [26, Theorem 1.29] there exists a prolongation isomorphism
that extends \(\varphi \in B_{p,p}^s({\mathbb {T}}^d)\) to its (unique) one-periodic counterpart \(\textrm{prl}^{per}(\varphi )\in B_{p,p}^{s,per}({\mathbb {R}}^d)\). This in turn allows to identify any \(\varphi \in B_{p,p}^s({\mathbb {T}}^d)\) as the restriction of a periodic function \(\textrm{prl}^{per}(\varphi )\in B_{p,p}^{s,per}({\mathbb {R}}^d)\) to \({\mathbb {T}}^d\). We use \(\textrm{prl}^{per}\) to define (non-periodic) Besov space-valued random variables on subsets \({\mathcal {D}}\subset {\mathbb {T}}^d\) by restriction in Sect. 3.2.
Definition 2.1 is based on an equivalent characterization of the spaces \(B_{p,p}^{s,per}({\mathbb {R}}^d)\) and \(B_{p,p}^s({\mathbb {T}}^d)\). They are often (equivalently) defined using a dyadic partition of unity (see e.g. [26, Definitions 1.22 and 1.27]): Using the latter definition for \(B_{p,p}^{s,per}({\mathbb {R}}^d)\), [26, Theorem 1.36(i)] shows that the spaces (7) resp. (8) are isometrically isomorphic to \(B_{p,p}^{s,per}({\mathbb {R}}^d)\). As a consequence of the prolongation map \(\textrm{prl}^{per}\) in (11), the same holds true for the spaces \(B_{p,p}^s({\mathbb {T}}^d)\), see [26, Theorem 1.37(i)].
2.1.3 Besov spaces and MRAs
We define the subspace \(V_{w+1}:=\text {span}\{\psi _{w,k}^l|\; k\in K_w,\ l\in {\mathcal {L}}_0\}\subset L^2({\mathbb {T}}^d)\) and observe that \(\dim (V_{w+1})=2^{d(w+1)}\). By the multiresolution analysis for one-periodic, univariate functions in [12, Chapter 9.3], it follows that \(( (\psi _{j,k}^l),\;j\le w,\; k\in K_j,\ l\in {\mathcal {L}}_j)\) is another orthonormal basis of \(V_{w+1}\). Hence, we may replace the first \(2^{d(w+1)}\) basis functions in (5) to obtain the (computationally more convenient) \(L^2({\mathbb {T}}^d)\)-orthonormal basis
Based on (12), we define for \(s>0\), \(p\in [1,\infty )\), and sufficiently regular \(\varphi \in L^2({\mathbb {T}}^d)\) the Besov norms
and
By Definition 2.1, it follows that \(\varphi \in B_{p,p}^s({\mathbb {T}}^d)\) if and only if \(\Vert \varphi \Vert _{B_{p,p}^s({\mathbb {T}}^d)}<\infty \).
2.1.4 Notation
We fix some notation for Besov, Hölder and Zygmund spaces to be used in the remainder of this paper. As the (periodic) domain \({\mathbb {T}}^d\) does not vary in the subsequent analysis, we use the abbreviations \(B_p^s:=B_{p,p}^s({\mathbb {T}}^d)\), \(\textrm{C}^\alpha :=\textrm{C}^\alpha ({\mathbb {T}}^d)\) and \({\mathcal {C}}^\alpha :={\mathcal {C}}^\alpha ({\mathbb {T}}^d)\) for convenience in the following. Furthermore, we will assume that the basis functions in \({\varvec{\Psi }}_w, {\varvec{\Psi }}\subset \textrm{C}^\alpha \), are sufficiently smooth with Hölder index \(\alpha >s\) for given \(s>0\), and therefore omit the restriction \(s\in (0,\alpha )\) in the following. In this case, it holds that \({\varvec{\Psi }}_w\) (and thus \({\varvec{\Psi }}\)) is a basis of \(B_p^s\) for \(p<\infty \), see [26, Theorem 1.37].
We recall that for any \(s>0\) there holds \({\mathcal {C}}^s=B^s_\infty \) (see [26, Remark 1.28]), as well as \(\textrm{C}^s={\mathcal {C}}^s\) for \(s\in (0,\infty )\setminus {\mathbb {N}}\), and \(\textrm{C}^s\subsetneq {\mathcal {C}}^s\) for \(s\in {\mathbb {N}}\) (see [25, Section 1.2.2]). By (13) and (14) we further obtain the continuous embeddings
with the embedding constants in (15) bounded by one (cf. [27, Chapter 2.1]).
2.2 Besov priors
To introduce Besov space-valued random variables as in [24], we consider a complete probability space \((\Omega ,{\mathcal {A}}, P)\). Following the constructions in [4, 11, 23], based on the representation in (9), we now define \(B_p^s\)-valued random variables by replacing the \(L^2({\mathbb {T}}^d)\)-orthogonal projection coefficients \((\varphi ,\psi _{j,k}^l)_{L^2({\mathbb {T}}^d)}\) with suitable random variables. More precisely, consider for any \(p\in [1,\infty )\) an independent and identically distributed (i.i.d.) sequence \(X=((X_{j,k}^l), (j,k,l)\in {{\mathcal {I}}_{\varvec{\Psi }}})\) of p-exponential random variables. That is, each \(X_{j,k}^l:\Omega \rightarrow {\mathbb {R}}\) is \({\mathcal {A}}/{\mathcal {B}}({\mathbb {R}})\)-measurable with density
where \(\kappa >0\) is a fixed scaling parameter. We recover the normal distribution with variance \(\frac{\kappa }{2}\) if \(p=2\), and the Laplace distribution with scaling \(\kappa \) for \(p=1\).
Definition 2.3
[24, Definition 9] Let \({\varvec{\Psi }}\) be the \(L^2({\mathbb {T}}^d)\)-orthogonal wavelet basis as in (12), let \(s>0\), \(p\in [1,\infty )\) and let \(X=((X_{j,k}^l), (j,k,l)\in {{\mathcal {I}}_{\varvec{\Psi }}})\) be an i.i.d. sequence of p-exponential random variables with density \(\phi _p\) as in (16). Let the random field \(b:\Omega \rightarrow L^2({\mathbb {T}}^d)\) be given by
We call b a \(B_p^s\)-random variable.
The random variables b from Definition 2.3 are also referred to as Besov priors in the literature on inverse problems. The following regularity results are well-known:
Proposition 2.4
-
1.
([24, Lemma 10], [11, Proposition 1]) Let b be a \(B_p^s\)-random variable for \(s>0\) and \(p\in [1,\infty )\). Then, the following conditions are equivalent:
-
(i)
\(\Vert b\Vert _{B_p^t}<\infty \) holds P-a.s.;
-
(ii)
\({\mathbb {E}}\left( \exp \left( \varepsilon \Vert b\Vert _{B_p^t}^p\right) \right) <\infty \), \(\varepsilon \in \left( 0,\frac{1}{4 \kappa }\right) \);
-
(iii)
\(t<s-\frac{d}{p}\).
-
(i)
-
2.
[11, Theorem 2.1] If, in addition, \({\varvec{\Psi }}\) forms a basis of \(B_p^t\) for a \(t<s-\frac{d}{p}\), \(t\notin {\mathbb {N}}\), then it holds
$$\begin{aligned} {\mathbb {E}}\left( \exp \left( \varepsilon \Vert b\Vert _{\textrm{C}^t}\right) \right) <\infty ,\quad \varepsilon \in (0,\overline{\varepsilon }), \end{aligned}$$where \(\overline{\varepsilon }>0\) is a constant depending on p, d, s and t.
Remark 2.5
Note that \(B_p^s\)-random variables as defined above only take values in \(B_p^t\) for \(t<s-\frac{d}{p}\) based on the previous proposition. Nevertheless, we use the notion \(B_p^s\)-random variables in the following, for a clearer emphasize on the dependence of \(\eta _j\) in (17) on s.
We derive a considerably stronger version of [11, Theorem 2.1] in Theorem 2.11 below, that implies in particular
for any \(p\ge 1\) and some \(\overline{\varepsilon }>0\). In the Gaussian case with \(p=2\), this estimate would be a consequence of Fernique’s theorem, however, we are not aware of a similar result for arbitrary \(p\ge 1\) in the literature.
We recall from [26, Theorem 1.37] that \({\varvec{\Psi }}\) forms an unconditional basis of \(B_p^t\) (since \(p<\infty \)), if the scaling and wavelet functions \(\phi \) and \(\psi \) satisfy \(\phi , \psi \in \textrm{C}^\alpha ({\mathbb {R}}^d)\) for \(\alpha>t>0\) and the vanishing moment condition (2).
2.3 Besov random tree priors
Taking the cue from [23], we introduce Besov random tree priors in this subsection and derive several regularity results for this \(B_p^s\)-valued random variable. We investigate all results for periodic functions defined on the torus \({\mathbb {T}}^d\) in this subsection. For the elliptic problem in Sect. 3, we will later introduce the corresponding \(B_p^s({\mathcal {D}})\)-valued random variables on physical domains \({\mathcal {D}}\subset {\mathbb {R}}^d\) with \({\mathcal {D}}\subseteq {\mathbb {T}}^d\) by their restrictions from \({\mathbb {T}}^d\) (cf. Definition 3.6). The random tree structure in our prior construction is based on certain set-valued random variables, so-called Galton-Watson (GW) trees. For the readers’ convenience, definitions of discrete trees, GW trees, along with some other useful results, are listed in Appendix A.
Definition 2.6
[23, Definition 3] Let \({\varvec{\Psi }}\), \(s>0\), \(p\in [1,\infty )\), \(X=((X_{j,k}^l), (j,k,l)\in {{\varvec{{\mathcal {I}}}}_{\varvec{\Psi }}})\) and \((\eta _j,j\in {\mathbb {N}}_0)\) be as in Definition 2.3.
Let \({\mathfrak {T}}\) denote the set of all trees with no infinite node (cf. Definition A.1) and let \(T:\Omega \rightarrow {\mathfrak {T}}\) be a GW tree (cf. Definition A.3) with offspring distribution \({\mathcal {P}}=\text {Bin}(2^d, \beta )\) for \(\beta \in [0,1]\), and independent of X. Furthermore, let \(\mathfrak I_T\) be the set of wavelet indices associated to T from (78).
We define the random tree index set \({\mathcal {I}}_T(\omega ):=\{(j,k,l)|\; (j,k)\in \mathfrak I_T(\omega ),\; l\in {\mathcal {L}}_j\}\) and
We refer to \(b_T\) as a \(B_p^s\)-random variable with wavelet density \(\beta \).
We have depicted a sample of a binomial GW tree and the associated set of wavelet indices \(\mathfrak I_T\) for a series expansion in one physical dimension (\(d=1\)) in Fig. 1. Recall that for \(d=1\) there holds \({\mathcal {L}}_j=\{1\}\) for \(j\ge 1\).
Remark 2.7
Definition 2.6 actually slightly deviates from [23, Definition 3]. By definition of \({\mathcal {I}}_T(\omega )\), we include the constant function \(\psi _{0,0}^{(0,\dots ,0)}\equiv 1\in L^2({\mathbb {T}}^d)\) in the series expansion (18). Of course, adding the random constant \(X_{0,0}^{(0,\dots ,0)}\) does not affect the spatial regularity or integrability of \(b_T\). However, in our definition, series (18) has a natural interpretation as orthogonal expansion of a random function with respect to the (deterministic, fixed) basis \({\varvec{ \Psi }}\).
The tree structure in the “active” (i.e., with index in \({\mathcal {I}}_T\)) coefficients in the wavelet representation of \(b_T\) gives rise to random fractals on \({\mathbb {T}}^d\), that occur whenever the tree T in Definition 2.6 does not terminate after a finite number of nodes. It follows by Lemma A.4, that the latter event occurs with positive probability if \(\beta \in (2^{-d},1]\). In this case the Hausdorff dimension of the fractals is \(d+\log _2(\beta )\in (0,d]\), see [23, Section 3] for further details.
Examples of realizations of a \(B_p^s\)-random variable on \({\mathbb {T}}^2\) with varying wavelet density \(\beta \) are shown in Fig. 2.
To treat elliptic inverse problems with \(b_T\) as log-diffusion coefficient, we detail the corresponding probability space of parameters. Let \({\mathbb {Q}}_0\) denote the univariate, p-exponential measure on \(({\mathbb {R}}, {\mathcal {B}}({\mathbb {R}}))\) of the random variables \(X_{j,k}^l\) with Lebesgue density as in (16). The product-probability space of the p-exponentials X is given by \((\Omega _p, {\mathcal {A}}_p, {\mathbb {Q}}_p)\), where
Now let \(s>0\) and \(p\in [1,\infty )\) be fixed such that \(s>\frac{d}{p}\). We define the weighted \(\ell ^p\)-spaces
where
Then \((\ell _s^p, \left\| \cdot \right\| _{s,p})\) is a separable Banach space, for \(1\le p<\infty \). Moreover, we observe that for \(X\sim {\mathbb {Q}}_p\) it holds
since \(s>\frac{d}{p}\), thus \({\mathbb {Q}}_p\) is concentrated on \(\ell _s^p\). We therefore regard \((\ell _s^p, {\mathcal {B}}(\ell _s^p), {\mathbb {Q}}_p)\) as the probability space of random coefficient sequences X in the expansions (17) and (18). The set-valued random variable T is a GW tree, and hence takes values in the Polish space \(({\mathfrak {T}}, \delta _{{\mathfrak {T}}})\) of all trees with no infinite node. The metric \(\delta _{\mathfrak {T}}\) and the associated Borel \(\sigma \)-algebra \({\mathcal {B}}({\mathfrak {T}})\) with respect to \({\mathfrak {T}}\) are stated explicitly in Definition A.2 in the Appendix. The image measure \({\mathbb {Q}}_T\) of the GW tree T on \(({\mathfrak {T}}, {\mathcal {B}}({\mathfrak {T}}))\) then solely depends on the parameters \(\beta \) and d of the offspring distribution \({\mathcal {P}}=\text {Bin}(2^d, \beta )\), and is given in Equation (76) in the Appendix. Hence, the parameter probability space of GW trees is given by \(({\mathfrak {T}}, {\mathcal {B}}({\mathfrak {T}}), {\mathbb {Q}}_T)\). To combine the random coefficients X with the GW tree T, we define the cartesian product \(\Omega :=\ell _s^p\times {\mathfrak {T}}\) and equip \(\Omega \) with the metric
Proposition 2.8
The space \((\Omega , d_\Omega )\) is Polish with Borel \(\sigma \)-algebra given by \({\mathcal {B}}(\Omega )={\mathcal {B}}(\ell _s^p\times {\mathfrak {T}})={\mathcal {B}}(\ell _s^p)\otimes {\mathcal {B}}({\mathfrak {T}})\).
Proof
By Lemma [1, Lemma 2.1] the metric space \(({\mathfrak {T}}, \delta _{\mathfrak {T}})\) with \(\delta _{\mathfrak {T}}\) given in (74) is complete and separable. Therefore, separability and completeness of \((\Omega , d_\Omega )\) follows by [5, Corollary 3.39]. Moreover, \({\mathcal {B}}(\Omega )={\mathcal {B}}(\ell _s^p\times {\mathfrak {T}})={\mathcal {B}}(\ell _s^p)\otimes {\mathcal {B}}({\mathfrak {T}})\) holds by [5, Theorem 4.44]. \(\square \)
We are now ready to define the probability space associated to the \(\ell _s^p\times {\mathfrak {T}}\)-valued random variable (X, T): Let \((\Omega , {\mathcal {A}}, {\mathbb {P}})\) be the product probabilty space given by
We remark that the product structure of the measure \({\mathbb {P}}={\mathbb {Q}}_p \otimes {\mathbb {Q}}_T\) is tantamount to stochastic independence of X and T. It still remains to identify a realization of the random variable (X, T) with the corresponding random tree prior \(b_T\). To this end, we consider the canonical mapping
The map \(b_T:\Omega \rightarrow L^2({\mathbb {T}}^d)\) is indeed well-defined since \(\Vert b_T\Vert _{L^2({\mathbb {T}}^d)}<\infty \) holds due to \(s>\frac{d}{p}\). Moreover, \(b_T\) is \({\mathcal {A}}/{\mathcal {B}}(L^2({\mathbb {T}}^d))\)-measurable, as we show in Proposition 2.10 below. As \(b_T\) is a \(L^2({\mathbb {T}}^d)\)-valued random variable, we may define the pushforward probability measure of \(b_T\) via
Thus, the associated probability space of \(B_p^s\)-random variables \(b_T\) with wavelet density \(\beta \) is
Remark 2.9
We know from Proposition 2.4 that \(b_T\#{\mathbb {P}}\) is concentrated on \(B_p^t\) for any \(t\in (0, s-\frac{d}{p})\). A more refined result that concentrates \(b_T\#{\mathbb {P}}\) on Besov spaces \(B_q^t\) for \(q\ge 1\) with smoothness index \(t=t(s,d,p,\beta ,q)\) is given in Theorem 2.11 below.
We recall at this point that we have assumed Hölder-regularity \({\varvec{\Psi }}\subset \textrm{C}^{\alpha }\) for some \(\alpha \ge 1\) in Sect. 2.1. For the remainder of this article, we will from now on implicitly assume that the parameter \(s>0\) of a \(B_p^s\)-random variable satisfies \(\alpha \ge s>0\) for the sake of presentation.
Proposition 2.10
Let \(s>\frac{d}{p}\), \(\beta \in [0,1]\), and let \(b_T\) be a \(B_p^s\)-random variable with wavelet density \(\beta \). Then \(b_T:\Omega \rightarrow \textrm{C}({\mathbb {T}}^d)\) and \(b_T\) is (strongly) \({\mathcal {A}}/{\mathcal {B}}(\textrm{C}({\mathbb {T}}^d))\)-measurable.
Proof
Fix a \(t\in (0,s-\frac{d}{p})\) such that \(t\notin {\mathbb {N}}\). The second part of Proposition 2.4 then shows that \(b_T\in \textrm{C}^t \) holds P-a.s., and thus \(b_T:\Omega \mapsto \textrm{C}({\mathbb {T}}^d)\) follows, after possibly modifying \(b_T\) on a P-nullset.
As in Appendix A, we denote by \({\mathcal {U}}\) the set of all finite sequences in \({\mathbb {N}}\) and introduce the subset \({\mathcal {U}}_{\textrm{Bin}}\subset {\mathcal {U}}\) with entries in \(\{1,\dots , 2^d\}\) as
Note that \(T(\omega )\subset {\mathcal {U}}_{\textrm{Bin}}\) holds P-a.s., since \({\mathcal {P}}=\text {Bin}(2^d, \beta )\). Now let \(\mathfrak I_{d,j}\) be as in (77) and recall from Appendix A.2 that \((j,k)\in \mathfrak I_T(\omega )\) if and only if there is \(\mathfrak n\in T(\omega )\) such that \((j,k)=(|\mathfrak n|, \mathfrak I_{d,|\mathfrak n|}(|\mathfrak n|))\). Hence, we may rewrite the series expansion (18) as
As \(T:\Omega \rightarrow {\mathfrak {T}}\) is \({\mathcal {A}}/{\mathcal {B}}({\mathfrak {T}})\)-measurable, it holds that \(\mathbbm {1}_{\{\mathfrak n\in T(\cdot )\}}:\Omega \rightarrow \{0,1\}\) is measurable for any fixed \(\mathfrak n\in {\mathcal {U}}_{\textrm{Bin}}\). Also, the \(X_{j,k}^l\) are real-valued random variables and \(\psi _{j,k}^l\in \textrm{C}({\mathbb {T}}^d)\) by assumption. Thus, \(b_T:\Omega \rightarrow \textrm{C}({\mathbb {T}}^d)\) is measurable, and strongly measurable as \(\textrm{C}({\mathbb {T}}^d)\) is separable. \(\square \)
More insight in the pathwise regularity of Besov random tree priors, in particular with regard to their Hölder regularity, is obtained by the following result.
Theorem 2.11
Let \(b_T\) be a \(B_p^s\)-random variable with wavelet density \(\beta =2^{\gamma -d}\) as in Definition 2.6 with \(\gamma \in (-\infty , d]\).
-
1.)
It holds that \(b_T\in L^q(\Omega ; {B^t_q})\), and hence \(b_T\in B_q^t\) P-a.s., for all \(t>0\) and \(q\ge 1\) such that \(t < s+ \frac{d-\gamma }{q} - \frac{d}{p}\).
-
2.)
Let \(s-\frac{d}{p}>0\) and \(t\in (0,s-\frac{d}{p})\). Then there is a \(\varepsilon _p>0\) such that
$$\begin{aligned} {\mathbb {E}}\left( \exp \left( \varepsilon \Vert b\Vert _{{\mathcal {C}}^t}^p\right) \right) < \infty ,\quad \varepsilon \in (0,\varepsilon _p), \end{aligned}$$In particular, it holds \(b_T\in L^q(\Omega ; {\mathcal {C}}^t)\) for any \(q\ge 1\).
-
3.)
Let \(q\ge 1\) and \(s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q}>0\). For any \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) it holds \(b_T\in L^q(\Omega ; {\mathcal {C}}^t)\).
Proof
(1) For given \(q\ge 1\) and \(t>0\) it holds by (13) that
For any given \(j\in {\mathbb {N}}\), by Definition 2.6, the number of nodes v(j) on scale j in the random tree T is binomial distributed (conditional on \(v(j-1)\)) as
with initial value \(v(0)=1\). Now let \((X_{j,m}, j\in {\mathbb {N}}_0, m\in {\mathbb {N}})\) be an i.i.d. sequence of p-exponential random variables, independent of v(j) for any \(j\in {\mathbb {N}}\), and recall that \(l\in {\mathcal {L}}_j\) with \(|{\mathcal {L}}_0|=2^d\) and \(|{\mathcal {L}}_j|=2^d-1\) for \(\in {\mathbb {N}}\). We obtain by Fubini’s theorem, Wald’s identity, \({\mathbb {E}}(v(j))=(2^d\beta )^j=2^{j\gamma }\), and since \({\mathbb {E}}\left( |X_{j,m}|^q\right) <\infty \) for any \(q>0\) that
with \({\mathbb {E}}\left( |X_{1,1}|^q\right) <\infty \). The series converges if \(t < s+ \frac{d-\gamma }{q} - \frac{d}{p}\), in which case \(b_T\in L^q(\Omega ; {B^t_q})\), and hence \(b_T\in B^t_q\) holds P-a.s.
(2) Now let \(q_0\ge q\ge 1\), \(t_0>\frac{d}{q_0}\) and \(t=t_0-\frac{d}{q_0}\), so that \(B_{q_0}^{t_0}\hookrightarrow {\mathcal {C}}^t\) holds by (15). The embedding follows by a direct comparison of the norms in (13), (14) with \(t=t_0-\frac{d}{q_0}\), and also shows that the corresponding embedding constant \(C_0>0\) is bounded by \(C_0\le 1\).
We obtain with Hölder’s inequality and analogously to the first part the estimate
Now let \(t<s-\frac{d}{p}\) be fixed, and let \(\gamma \in (0,d]\). For every fixed \(q\ge \max (2\gamma (s-\frac{d}{p}-t)^{-1}, 1)\), we choose \(q_0:=q\) to obtain that
with a constant \(C>0\) that is independent of q. We now define for given \(\varepsilon >0\), finite \(n\in {\mathbb {N}}\) and \(p \in [1,\infty )\) the random variable
Clearly, \(E_n(\omega )\rightarrow \exp (\varepsilon \Vert b_T(\omega )\Vert ^p_{{\mathcal {C}}^t})\) holds P-.a.s as \(n\rightarrow \infty \) and Inequality (25) yields, for any \(n\in {\mathbb {N}}\) and \(n_\gamma :=\frac{1}{p}\lceil 2\gamma (s-\frac{d}{p}-t)^{-1} \rceil \), that
where \({\widetilde{C}} = {\widetilde{C}}(\gamma , s,d,p,t)>0\). The monotone convergence theorem then shows that for sufficiently small \(\varepsilon >0\) and \(t<s-\frac{d}{p}\), it holds
(3) The claim for \(\gamma \in (0,d]\) follows by the previous part. For \(\gamma \in (-\infty ,0]\), \(q\ge 1\) and \(t\in (0,s-\frac{d}{p}-\frac{\gamma }{q})\), we finally use \(q_0 = q\) and \(t_0=t+\frac{d}{q}\) in (24) to obtain that
\(\square \)
Remark 2.12
[23, Theorems 4 and 5] state that \(b_T\in B_p^t\) holds P-a.s. for all \(t\in (0, s-\gamma /p)\), and that \(b_T\notin B_p^{s-\gamma /p}\) occurs with probability \(1-p_\beta >0\) for \(\gamma \in (0, d]\), where \(p_\beta \) is the solution to the equation \(p_\beta =((1-\beta )+\beta p_\beta )^{2^d}\) (cf. Lemma A.4 in the Appendix.) We emphasize that Theorem 2.11 significantly extends these previous results, as we quantify precisely the regularity of \(b_T\) in terms of Besov and Hölder-Zygmund norms.
Recall that we may replace the Hölder-Zygmund spaces \({\mathcal {C}}^t\) in the second part of Theorem 2.11 by the “usual” Hölder spaces \(\textrm{C}^t\) if \(t\notin {\mathbb {N}}\) (which is not true for integer t). Theorem 2.11 shows that a wavelet density \(\beta =2^{\gamma -d}<1\) improves smoothness in \(B_q^t\), as the upper bound \(t < s+ \frac{d-\gamma }{q} - \frac{d}{p}\) is decreasing in \(\gamma \in (-\infty ,d]\). However, given that \(\gamma >0\) we may not expect to gain any (pathwise) Hölder regularity beyond \(t<s-\frac{d}{p}\). This is not surprising with regard to Remark 2.7: \(b_T\) admits an infinite series expansion on random fractals in \({\mathcal {D}}\) for \(\beta >2^{-d}\) with positive probability. Hence, the local Hölder-regularity of \(b_T\) on such fractals corresponds to a \(B^s_p\)-random variable b as in Definition 2.3 (with full wavelet density \(\beta =1\)). In case that \(\gamma \le 0\), the series expansion of \(b_T\) terminates almost surely after a finite number of terms. As \({\varvec{\Psi }}\subset \textrm{C}^{\alpha }\), this shows that \(b_T\in \textrm{C}^\alpha \) almost surely if \(\gamma \le 0\). Furthermore, we may increase the smoothness exponent t for \(b_T\in L^q(\Omega ; {\mathcal {C}}^t)\) to the admissible range \(t<s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q}\). For large q, we see that essentially the restriction \(t<s-\frac{d}{p}\) applies as for \(\gamma >0\). This in turn indicates that the bound
from part 2.) of Theorem 2.11 can not be improved to Hölder indices \(t\ge s-\frac{d}{p}\), even if \(\gamma \le 0\).
3 Linear elliptic PDEs with Besov random coefficients
In this section, we first recall well-posedness and regularity results for linear, second order elliptic diffusion problems with random coefficient. Thereafter, we transfer the results to a setting with Besov tree random diffusion coefficient by exploiting the results from Sect. 2.
3.1 Well-posedness and regularity
Let \({\mathcal {D}}\subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\), be a convex polygonal domain for \(d=2,3\), and a finite interval for \(d=1\), with the boundary \(\partial {\mathcal {D}}\) consisting of a finite number of line or plane segments. We consider the random elliptic problem to find \(u(\omega ):{\mathcal {D}}\rightarrow {\mathbb {R}}\) for given \(\omega \in \Omega \) such that
The diffusion coefficient a in Problem (26) admits positive paths on \({\mathcal {D}}\), i.e., \(a(\omega ):{\mathcal {D}}\rightarrow {\mathbb {R}}_{>0}\). Moreover, a is a random variable \(a:\Omega \rightarrow {\mathcal {X}}\), taking values in a suitable Banach space \({\mathcal {X}}\subset L^\infty ({\mathcal {D}})\). The source term \(f:{\mathcal {D}}\rightarrow {\mathbb {R}}\) is assumed to be a deterministic function for the sake of simplicity, but may as well be modeled by a random function \(f:{\mathcal {D}}\times \Omega \rightarrow {\mathbb {R}}\). For the variational formulation of Problem (26) we define \(H:=L^2({\mathcal {D}})\), \(V:=H_0^1({\mathcal {D}})\) and recall that \(\left\| \cdot \right\| _V:V\rightarrow {\mathbb {R}}_{\ge 0},\, v\mapsto \Vert \nabla v \Vert _H\) defines a norm on V by Poincare’s inequality. The weak formulation of Problem (26) for fixed \(\omega \in \Omega \) is to find \(u(\omega )\in V\) such that for any \(v\in V\) it holds
Definition 3.1
The map \(\omega \mapsto u(\omega ) \in V\) with \(u(\omega )\) the solution of (27) is the pathwise weak solution.
Existence and uniqueness of pathwise weak solutions are ensured by the following theorem.
Theorem 3.2
Let \(a:\Omega \rightarrow L^\infty ({\mathcal {D}})\) be strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurable such that
and let \(f\in V'\). Then, there exists for all \(\omega \in \Omega \) a unique weak solution \(u(\omega )\in V\) to Problem (26). The map \(u:\Omega \rightarrow V\) is strongly \({\mathcal {A}}/{\mathcal {B}}(V)\)-measurable.
Proof
By the completeness of \((\Omega ,{\mathcal {A}},P)\), we may assume without loss of generality that \(a_-(\omega )>0\) and \(a(\omega )\in L^\infty ({\mathcal {D}})\) holds for all \(\omega \in \Omega \)Footnote 2.
Existence and uniqueness of a pathwise solution \(u(\omega )\) now follows for all \(\omega \in \Omega \) by the Lax-Milgram Lemma. To show strong measurability of u, we use Lipschitz dependence of the coefficient-to-solution map: consider two diffusion coefficients \(a_1,a_2:\Omega \rightarrow L^\infty ({\mathcal {D}})\) that satisfy the assumption of the theorem with lower bounds \(a_{1,-}, a_{2,-}>0\) as in (28) and denote by \(u_1,u_2:\Omega \rightarrow V\) the associated unique weak solutions. Equation (26) together with \(\Vert v\Vert _V^2 = \Vert \nabla v\Vert _H^2\) and Hölder’s inequality yields for any fixed coefficients \(a_1,a_2 \in L^\infty ({\mathcal {D}})\) such that \(a_{i,-}:= {{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a_{i}(x) > 0\) that
Therefore, the data-to-solution map \(U: S \rightarrow V,\; a \mapsto u\) is (Lipschitz) continuous on the set \(S:=\{ a\in L^\infty ({\mathcal {D}})|\, {{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a(x) > 0 \}\). Since the pathwise weak solution \(u:\Omega \rightarrow V\) of (26) may be written as \(u=U\circ a\), the claim follows with the strong \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurability of \(a:\Omega \rightarrow L^\infty ({\mathcal {D}})\). \(\square \)
Lipschitz continuity (29) of the data-to-solution map will be essential in deriving error estimates in Sect. 4 ahead, and also implies strong measurability of random solutions.
Proposition 3.3
Let \(a_1,a_2:\Omega \rightarrow L^\infty ({\mathcal {D}})\) be strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurable such that
Then, for every \(f\in V'\) exists for \(i\in \{1,2\}\) and for all \(\omega \in \Omega \) a unique weak solution \(u_i(\omega )\in V\) to Problem (26) (with a in place of \(a_i\)). There holds the continuous-dependence estimate
Proof
This follows immediately with Theorem 3.2 and (29). \(\square \)
From the regularity analysis of deterministic linear elliptic problems it is well known that \(H^s({\mathcal {D}})\)-regularity of u may be derived for certain \(s>1\), provided that a is Hölder continuous. The corresponding estimates usually do not reveal the explicit dependence of constants on \(a(\omega )\) or bounds on the Hölder norm \(\Vert a(\omega )\Vert _{\textrm{C}^t}\). For the stochastic problem and the ensuing numerical analysis in Sects. 4 and 5, however, we need the explicit dependence for given \(\omega \) to ensure that all pathwise estimates also hold in in \(L^q(\Omega ; H^s({\mathcal {D}}))\) for suitable \(q\ge 1\). To obtain explicit estimates, we follow the approach from [13, Chapter 3.3] for parametric elliptic PDEs, where regularity estimates are derived via the K-method of function space interpolation.Footnote 3 One obtains in particular Hölder spaces \(\textrm{C}^r(\overline{{\mathcal {D}}})\) by interpolation ([3, Lemma 7.36]):
To investigate spatial regularity of solutions to (26), we introduce the normed space
Note that \(v=0\Leftrightarrow \Vert v\Vert _W=0\) follows by the maximum principle, since \(v\in V=H^1_0({\mathcal {D}})\) has vanishing trace. We formulate regularity results in terms of the interpolation space
For a concise notation, we further set \(W^1:=W\) in the following.
Lemma 3.4
[13, Propositions 3.2 and 3.5] Let \(a:\Omega \rightarrow \textrm{C}^r(\overline{{\mathcal {D}}})\subset L^\infty ({\mathcal {D}})\) be strongly measurable for some \(r\in (0,1]\) such that \(a_-(\omega )>0\) holds P-a.s. and let \(f\in H\). Then, there is a constant \(C=C(r,{\mathcal {D}})\), such that it holds
All results from this subsection so far hold under the considerably weaker assumption that \({\mathcal {D}}\subset {\mathbb {R}}^d\) is a bounded Lipschitz domain. However, since \({\mathcal {D}}\) is assumed convex, we are able to embed \(W^r\) in (fractional) Sobolev spaces. This is made precise in the following Lemma, which is in required for the finite element error analysis in Sect. 4.2.
Lemma 3.5
Let \({\mathcal {D}}\) be convex, \(W^r:=[V, W]_{r,\infty }\) for \(r\in (0,1)\) and let \(W^1:=W\). Then, it holds that \(W=W^1\hookrightarrow H^2({\mathcal {D}})\). Moreover, \(W^r\hookrightarrow H^{1+r_0}({\mathcal {D}})\) for any \(r_0\in (0,r)\).
Proof
By convexity of \({\mathcal {D}}\), we have that \(\Vert v\Vert _{H^2({\mathcal {D}})}\le C_{\mathcal {D}}\Vert v\Vert _W\) holds for all \(v\in W\), where \(C_{\mathcal {D}}\) only depends on the diameter of \({\mathcal {D}}\), see, e.g., [16, Theorem 3.2.1.2]. Thus, \(W\hookrightarrow H^2({\mathcal {D}})\cap V\) follows.
For the case \(r\in (0,1)\), we recall that there is \(C_V>0\), such that \(\Vert v\Vert _{H^1({\mathcal {D}})}\le C_{V}\Vert v\Vert _V\) holds for all \(v\in V\) by Poincaré’s inequality. Moreover, we have \(\Vert w\Vert _{H^2({\mathcal {D}})}\le C_{{\mathcal {D}}}\Vert w\Vert _W\) for any \(w\in W\), and hence \(W\subset H^2\cap V\) from the first part. For \(v\in V\subset H^1\) this yields
Hence, \(W^r\hookrightarrow [H^1({\mathcal {D}}),H^2({\mathcal {D}})]_{r,\infty }\). The claim now follows since for any \(\varepsilon \in (0,1+r)\) there holds
see [3, Section 7.32]. \(\square \)
3.2 Besov random tree priors as log-diffusion coefficient
To formulate Problem (26) with a Besov random tree coefficient, we assume that \({\mathcal {D}}\subseteq {\mathbb {T}}^d\). We follow [26, Section 2] and define, for given \(\omega \in \Omega \), the random element \(b_T(\omega ):{\mathcal {D}}\rightarrow {\mathbb {R}}\) as the restriction of a periodic function in \(B_{p,p}^{s,per}\) to the domain \({\mathcal {D}}\). The restriction \(\varphi |_{{\mathcal {D}}}\) of any \(\varphi \in \textrm{S}'({\mathbb {R}}^d)\) to \({\mathcal {D}}\) is in turn given by the element \(\varphi |_{{\mathcal {D}}}\in \textrm{D}'({\mathcal {D}})\) such that
where \(v_0\in \mathrm D({\mathbb {R}}^d)\subset \textrm{S}({\mathbb {R}}^d)\) denotes the zero-extension of any \(v\in \mathrm D({\mathcal {D}})\).
Definition 3.6
Let \({\mathcal {D}}\subseteq {\mathbb {T}}^d\) be a bounded, connected domain. Let \(b_T\) be given in Definition 2.6 for \(p\in [1,\infty )\), \(s>0\) and \(\beta =2^{\gamma -d}\in [0,1]\), and let \(\textrm{prl}^{per}:B_{p,p}^s({\mathbb {T}}^d)\rightarrow B_{p,p}^{s,per}({\mathbb {R}}^d)\) denote the isomorphic extension operator from (11). Then we define for any \(\omega \in \Omega \)
and call \(b_{T,{\mathcal {D}}}\) a \(B_p^s({\mathcal {D}})\)-valued random variable.
Remark 3.7
In case that \({\mathcal {D}}={\mathbb {T}}^d\), we may readily use the identification \(b_{T,{\mathcal {D}}}=b_T\). Note that \(b_{T,{\mathcal {D}}}\) is periodic in this case, in the sense that there exists an extension \(\textrm{prl}^{per}b_{T,{\mathcal {D}}}\in B_{p,p}^{s,per}({\mathbb {R}}^d)\). If \({\mathcal {D}}\subsetneq {\mathbb {T}}^d\), however, \(b_{T,{\mathcal {D}}}\) is not (necessarily) periodic, but merely the restriction of a periodic function from the torus \({\mathbb {T}}^d\).
Remark 3.8
The same procedure could be applied for general bounded domains \({\mathcal {D}}\not \subset {\mathbb {T}}^d\), by extending Definition (2.6) from the torus \({\mathbb {T}}^d\) to a sufficiently large (periodic) domain \([-L,L]^d\) for \(L>1\). This would increase the index-set \(K_j\) of wavelet coefficients by at most a constant factor on each dyadic scale j. However, all regularity proofs from Sect. 2 are carried out similar in this setting, with minor changes to absolute constants. For instance, the admissible range of \(\varepsilon \) in Proposition 2.4 may become smaller if \(L>1\), but the smoothness parameter \(t\in (0,s-\frac{d}{p})\) is unaffected. Therefore, assuming \({\mathcal {D}}\subseteq {\mathbb {T}}^d\) for the sake of brevity does not have any substantial impact on the following results.
We consider Problem (26), resp. its weak formulation (27), with \(a(\omega ):=\exp \left( b_T(\omega )\right) \), where \(b_T\) is a \(B_p^s\)-random variable with wavelet density \(\beta \). That is, we model the log-diffusion by a Besov random tree prior to incorporate fractal structures. With this preparation, we are able to derive well-posedness and regularity of the corresponding pathwise weak solution.
Theorem 3.9
Let \(a:=\exp \left( b_{T,{\mathcal {D}}}\right) \) with \(b_{T,{\mathcal {D}}}\) given in Definition 3.6 for \(p\in [1,\infty )\), \(s>0\) and \(\beta =2^{\gamma -d}\in [0,1]\), so that \(sp>d\). Furthermore, let \(f\in V'\).
-
(1)
Then, there exists almost surely a unique weak solution \(u(\omega )\in V\) to (26) and \(u:\Omega \rightarrow V\) is strongly measurable.
-
(2)
For sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that
$$\begin{aligned} \Vert u\Vert _{L^q(\Omega ; V)}\le C \Vert f\Vert _{V'} <\infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1,\text { and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$ -
(3)
Let \(r\in (0,s-\frac{d}{p})\cap (0,1]\), \(f\in H\) and \(W^r\) as in (32). For sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that
$$\begin{aligned} \Vert u\Vert _{L^q(\Omega ; W^r)} \le C \Vert f\Vert _{H} <\infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text { and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$
Proof
1.) As \(sp>d\), Theorem 2.11 shows that \(b_T\in \textrm{C}({\mathbb {T}}^d)\) holds P-a.s. Moreover, \(b_T:\Omega \rightarrow \textrm{C}({\mathbb {T}}^d)\) is strongly measurable by Proposition 2.10, and thus in particular strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathbb {T}}^d))\)-measurable, since \({\mathcal {B}}(\textrm{C}({\mathbb {T}}^d))\subset {\mathcal {B}}(L^\infty ({\mathbb {T}}^d))\). As \(b_{T,{\mathcal {D}}}\) in Definition 3.6 is the restriction of \(\textrm{ext}^{per}b_T\) to \({\mathcal {D}}\subset {\mathbb {T}}^d\), and \(a=\exp \circ \,b_{T,{\mathcal {D}}}\), it follows that, \(a:\Omega \rightarrow \textrm{C}(\overline{{\mathcal {D}}})\), and a is strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurable such that \(a_->0\) holds P-a.s. Theorem 3.2 then guarantees the P-a.s. existence of a unique pathwise weak solution \(u(\omega )\). Moreover, \(u:\Omega \rightarrow V\) is strongly measurable and Equation (27) shows that
2.) To show the second part, we fix \(t\in (0,s-\frac{d}{p})\) and \(q\ge 1\) to see that
We have used that \(\exp (\cdot )\) is strictly increasing for the second equality, and that \(b_T(x)\) is a centered random variable such that \(b_T\) and \(-b_T\) are equal in distribution. This implies in turn that \(a_-\) and \(\Vert a\Vert _{L^\infty ({\mathcal {D}})}\) are equal in distribution, which we used in the second inequality. The last estimate is due to \(\Vert b_T\Vert _{L^\infty ({\mathbb {T}}^d)}\le \Vert b_T\Vert _{{\mathcal {C}}^t}\) for any \(t>0\). For \(p=1\), we note that \(\varepsilon _p\) in the second part of Theorem 2.11 may be chosen as \(\varepsilon _p=(\kappa C)^{-1}\), where \(C>0\) is the constant in (25). Hence, for sufficiently small \(\kappa >0\), we may set \(\overline{q}:= \varepsilon _p>1\) in the claim. In case that \(p>1\), Young’s inequality shows that for any \(q\ge 1\) there is an arbitrary small \(\varepsilon >0\) and a constant \(C_\varepsilon =C_\varepsilon (p,q)\in (0,\infty )\) such that
Thus, we have no restrictions on \(q\in [1,\infty )\), which proves the second part of the claim.
3.) Let \(\left\| \cdot \right\| \) denote the Euclidean norm on \({\mathcal {D}}\). Observe that for any fixed \(r\in (0,1)\) we obtain by Taylor expansion and since \(\exp (\cdot )\) is strictly increasing that
We obtain further for \(r=1\) that
For any fixed \(r\in (0,s-\frac{d}{p})\cap (0,1]\) Lemma 3.4 now shows that
where we have used (34), (35) in the second step, applied Jensen’s inequality twice in the third step, and used again that \(b_T\) and \(-b_T\) are equal in distribution together with \(\exp (\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})}) \ge 1\) to derive the third estimate. We may further assume without loss of generality that \(\Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}\ge 1\) to obtain with (36) and Hölder’s inequality for \(q_1, q_2>1\) such that \(\frac{1}{q_1}+\frac{1}{q_2}=1\)
where we have also used that \(\Vert b_{T,{\mathcal {D}}}\Vert _{L^\infty ({\mathcal {D}})}\le \Vert b_{T}\Vert _{\textrm{C}^r}\) and that \(\Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})} \le \Vert b_T\Vert _{\textrm{C}^r}\) for any \(r>0\). To bound the Hölder-norm \(\Vert b_T\Vert _{\textrm{C}^r}\) in (37), we first consider the case \(r<1\). Then, we recall from Sect. 2.1 that \(\textrm{C}^r={\mathcal {C}}^r\) with equivalent norms, thus \(\Vert b_T\Vert _{\textrm{C}^r}\le C\Vert b_T\Vert _{{\mathcal {C}}^r}\). If \(r=1\), then \(s-\frac{d}{p}>1\), and we use the same argument to derive the bound \(\Vert b_T\Vert _{\textrm{C}^r}\le \Vert b_T\Vert _{\textrm{C}^{r+\varepsilon }}\le C\Vert b_T\Vert _{{\mathcal {C}}^{r+\varepsilon }}\) for any \(\varepsilon \in (0,s-\frac{d}{p}-1)\).
For \(p=1\), Theorem 2.11 now shows again that for sufficiently small \(\kappa >0\), there are admissible choices \(q,q_1,q_2\in [1,\infty )\), dependent on r, such that the right hand side in (37) is finite. The proof is concluded by noting that \(q\in [1,\infty )\) may again be arbitrary large in (37) if \(p>1\), independent of r. \(\square \)
4 Pathwise finite element approximation
4.1 Dimension truncation
To obtain a tractable approximation of \(b_T\) in (18), we truncate the wavelet series expansion after \(N\in {\mathbb {N}}\) scales to obtain the \(truncated random tree Besov prior \)
The corresponding diffusion problem in weak form with truncated coefficient for fixed \(\omega \in \Omega \) is to find \(u_N(\omega )\in V\) such that for all \(v\in V\)
where
Existence, uniqueness, and regularity of \(u_N\) follows analogously as for u in the previous section.
Corollary 4.1
Let \(N\in {\mathbb {N}}\), \(a_N=\exp \left( b_{T,N}|_{{\mathcal {D}}}\right) \) with \(b_{T,N}\) be given as in (38) for \(p\in [1,\infty )\), \(s>0\) and \(\beta =2^{\gamma -d}\in [0,1]\), so that \(sp>d\). Furthermore, let \(f\in V'\). Then the following holds.
-
1.)
There exists almost surely a unique weak solution \(u_N(\omega )\in V\) to the truncated Problem (39) and \(u_N:\Omega \rightarrow V\) is strongly measurable.
-
2.)
For sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\)
$$\begin{aligned} \Vert u_N\Vert _{L^q(\Omega ; V)}\le C \Vert f\Vert _{V'} <\infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text {, and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$ -
3.)
Let \(r\in (0,s-\frac{d}{p})\cap (0,1]\) and \(f\in H\). There are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\)
$$\begin{aligned} \Vert u_N\Vert _{L^q(\Omega ; W^r)}\le C \Vert f\Vert _{H} < \infty \quad {\left\{ \begin{array}{ll} &{}\text {for }q\in [1,\overline{q})\text { if }p=1\text { and} \\ &{}\text {for any }q\in [1,\infty )\text { if }p>1. \end{array}\right. } \end{aligned}$$
Proof
The result follows analogously to Theorems 2.11 and 3.9, upon observing that \(\Vert b_{T,N}(\omega )\Vert _{B^t_q}\le \Vert b_{T}(\omega )\Vert _{B^t_q}\) holds P-a.s. for any \(t>0\), \(q\in [1,\infty ]\), and \(N\in {\mathbb {N}}\). \(\square \)
The important observation from Corollary 4.1 is that the bounds are independent of N, which is crucial when estimating the finite element discretization error of \(u_N\) in the next subsection. We bound the truncation errors \(a-a_N\) and \(u-u_N\) in the remainder of this section.
Proposition 4.2
Let \(a:=\exp \left( b_{T,{\mathcal {D}}}\right) \) with \(b_{T,{\mathcal {D}}}\) as given in Definition 3.6 with \(p\in (1,\infty )\), \(s>0\), \(\beta =2^{\gamma -d}\in [0,1]\) and such that \(sp>d+\min (\gamma ,0)\). Let \(b_{T,N}\) and \(a_N\) be the approximations of \(b_T\) and a for given \(N\in {\mathbb {N}}\) as in (38) and (40), respectively.
-
1.)
For any \(q\ge 1\) and \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a constant \(C>0\) such that for every \(N\in {\mathbb {N}}\) it holds
$$\begin{aligned} \Vert b_{T,{\mathcal {D}}}-b_{T,N}|_{\mathcal {D}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \le C 2^{N\left( t-s+\frac{d}{p}+\frac{\min (\gamma ,0)}{q}\right) }. \end{aligned}$$ -
2.)
Moreover, for any \(q\ge 1\), \(\varepsilon >0\) and \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a \(C>0\) such that for every \(N\in {\mathbb {N}}\) it holds
$$\begin{aligned} \Vert a-a_N\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))} \le C 2^{N\left( t-s+\frac{d}{p}+\frac{\min (\gamma +\varepsilon ,0)}{q}\right) }. \end{aligned}$$
Proof
1.) Let \(q_0\ge q\), \(t_0>\frac{d}{q_0}\) and \(t = t_0-\frac{d}{q_0}\), so that \(B_{q_0}^{t_0}\hookrightarrow {\mathcal {C}}^t\). For any fixed \(N\in {\mathbb {N}}\), we obtain with Hölder’s inequality analogously to the proof of Theorem 2.11 the estimate
Now let \(t<s-\frac{d}{p}\) and \(\gamma \in (0,d]\) in (41), and choose \(q_0=\max \Big (\gamma N, 2\gamma (s-\frac{d}{p}-t)^{-1}, q\Big )\) (for sufficiently large, given N) to obtain that
The final bound in (42) is independent of \(q_0=q_0(N)\), which shows the first part of the claim for \(\gamma \in (0,d]\). For \(\gamma \in (-\infty ,0]\), we use \(q_0 = q\) in (41) to obtain for any \(t\in (0,s-\frac{d}{p}-\frac{\gamma }{q})\) that
2.) To prove the second part, we observe that for any \(t\in \left( 0, s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q}\right) \) there holds
where the last equation follows by independence of \(b_T - b_{T,N}\) and \(b_{T,N}\). The first factor in this expression is bounded analogously to \(\Vert e^{b_{T,{\mathcal {D}}}}\Vert _{L^q(\Omega ; {\mathcal {C}}^t(\overline{{\mathcal {D}}}))}\) by using the estimates (34) (resp. (35)), Theorem 2.11 and Hölder’s inequality
where \(q_1,q_2>1\) are such that \(\frac{1}{q_1}+\frac{1}{q_2}=1\) and the bound holds uniformly in N.
In addition, Taylor expansion yields
From the proof of the second part of Theorem 3.9 it follows that \(\Vert e^{b_{T,{\mathcal {D}}} - b_{T,N}|_{\mathcal {D}}}\Vert _{L^q(\Omega ; L^\infty ({\mathcal {D}}))}<\infty \) is bounded uniformly with respect to N for all \(q\ge 1\).
Hölder’s inequality for \(p_1,p_2>1\) such that \(\frac{1}{p_1}+\frac{1}{p_2}=1\) thus shows together with the truncation error in (43) that
The claim follows for any \(\varepsilon >0\) by choosing \(p_2>1\) so small that \(\min (\gamma , 0) \le p_2\min (\gamma +\varepsilon , 0)\). \(\square \)
Remark 4.3
We emphasize that all estimates in Proposition 4.2 are independent of \({\mathcal {D}}\subset {\mathbb {T}}^d\), as all uniform error bounds are derived with respect to \({\mathbb {T}}^d\). Proposition 4.2 shows in particular that for any \(q\ge 1\) and \(t\in (0, s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a \(C>0\) such that for any \(N\in {\mathbb {N}}\) it holds
This estimate is essential to bound the truncation error \(u-u_N\) of the approximated elliptic problem in (39), see Theorem 4.4 below. In the borderline case \(p=1\) with sufficiently small \(\kappa >0\) and \(sp>d\), we still recover the slightly weaker estimates
for sufficiently small \(q\ge 1\) (depending on \(\kappa \)) and \(t\in (0,s-\frac{d}{p})\), independently of \(\gamma \). This may be seen from by letting \(p_1\rightarrow 1\) and \(p_2\rightarrow \infty \) in the last part of the proof for Proposition 4.2.
Theorem 4.4
Let u be as in (26) with \(a=\exp \left( b_{T,{\mathcal {D}}}\right) \) and let \(u_N\) be as in (26) with \(a_N=\exp \left( b_{T,N}|_{\mathcal {D}}\right) \) given by (40). Furthermore, let \(b_{T,{\mathcal {D}}}\) be such that \(p\in (1,\infty )\), \(s>0\), \(\beta =2^{\gamma -d}\in [0,1]\), and \(sp >d\ge d + \min (\gamma ,0)\). Then, for any \(q\ge 1\) and \(t\in (0, s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q})\) there is a \(C>0\) such that for every \(N\in {\mathbb {N}}\) and it holds
Proof
For fixed \(\omega \in \Omega \) and \(N\in {\mathbb {N}}\), we obtain by Proposition 3.3
where \(a_{N,-}(\omega ):={{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a_N(\omega ,x)\). Taking expectations yields with Hölder’s inequality
where \(q_1,q_2,q_3>1\) are such that \(\frac{1}{q}=\sum _{i=1}^3\frac{1}{q_i}\) and \(\Vert f\Vert _{V'}<\infty \). As in the proof of part 2.) in Theorem 3.2, we conclude for any \(q_1\in [1,\infty )\) and \(t\in (0,s-\frac{d}{p})\) with Theorem 2.11 that
Similarly, it follows for all \(q_2\in [1,\infty )\) that
where we emphasize that the last bound is uniform with respect to N. Proposition 4.2 and Remark 4.3 show for \(q_3\in [1,\infty )\) and \(t\in (0,s-\frac{d}{p}-\frac{\min (\gamma ,0)}{q_3})\) that
This, together with (46), shows the claim, as \(q_3>q\) may be chosen arbitrary close to q, and
holds for all \(q_1,q_2\in [1,\infty )\) with \(C=C(q_1,q_2)>0\), and uniform with respect to N. \(\square \)
Remark 4.5
In view of Remark 4.3, we note that for \(p=1\) with sufficiently small \(\kappa >0\) and \(sp>d\) there holds the slightly weaker estimate
for sufficiently small \(q\ge 1\) (depending on \(\kappa \)) and \(t\in (0,s-\frac{d}{p})\), independently of \(\gamma \). This may also be seen by letting \(q_1,q_2\rightarrow \frac{1}{2q}\) and \(q_3\rightarrow \infty \) in the proof of Theorem 4.4.
4.2 Finite element discretization
The solution \(u_N:\Omega \rightarrow V\) to Problem (39) with truncated diffusion coefficient is still not fully tractable, as it takes values in the infinite-dimensional Hilbert space V. Thus, we consider Galerkin-finite element approximations of \(u_N\) in a finite-dimensional subspace of V. Corollary 4.1 provides the necessary regularity of \(u_N\), independent of the truncation index N, therefore we fix \(N\in {\mathbb {N}}\) for the remainder of this section.
We partition the convex, polytopal domain \({\mathcal {D}}\subset {\mathbb {T}}^{d}\), \(d\in \{1,2,3\}\) by a sequence of simplices (intervals/triangles/tetrahedra) or parallelotopes (intervals/parallelograms/parallelepipeds), denoted by \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\). The refinement parameter \(h>0\) takes values in a countable index set \({\mathfrak {H}}\subset (0,\infty )\) and corresponds to the longest edge of a simplex/parallelotope \(K\in {\mathcal {K}}_h\). We impose the following assumptions on \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) to obtain a sequence of “well-behaved” triangulations.
Assumption 4.6
The sequence \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) satisfies:
-
1.
Admissibility: For each \(h\in {\mathfrak {H}}\), \({\mathcal {K}}_h\) consists of open, non-empty simplices/parallelotopes K such that
-
\(\overline{{\mathcal {D}}}= \bigcup _{K\in {\mathcal {K}}_h} \overline{K}\),
-
\(K_1\cap K_2=\emptyset \) for any two \(K_1, K_2\in {\mathcal {K}}_h\) such that \(K_1\ne K_2\), and
-
the intersection \(\overline{K}_1\cap \overline{K}_2\) for \(K_1\ne K_2\) is either empty, a common edge, a common vertex, or (in space dimension \(d=3\)) a common face of \(K_1\) and \(K_2\).
-
-
2.
Shape-regularity: Let \(\rho _{K,in}\) and \(\rho _{K,out}\) denote the radius of the largest in- and circumscribed circle, respectively, for a given \(K\in {\mathcal {K}}_h\). There is a constant \(\rho > 0\) such that
$$\begin{aligned} \rho : = \sup _{h\in {\mathfrak {H}}} \sup _{K\in {\mathcal {K}}_h} \frac{\rho _{K,out}}{\rho _{K,in}} < \infty . \end{aligned}$$
Based on a given tesselation \({\mathcal {K}}_h\), we define the space of piecewise (multi-)linear finite elements
Clearly, \(V_h\subset V\) is a finite-dimensional space and we define \(n_h:=\dim (V_h)\in {\mathbb {N}}\). This yields for fixed \(\omega \in \Omega \) the fully discrete problem to find \(u_{N,h}(\omega )\in V_h\) such that for all \(v_h\in V_h\)
Theorem 4.7
Let \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) be a sequence of triangulations satisfying Assumption 4.6, and let \(u_N\) and \(u_{N,h}\) be the pathwise weak solutions to (39) and (47). Furthermore, let \(N\in {\mathbb {N}}\), \(a_N\) be given as in (40) for \(p\in [1,\infty )\) and \(s>0\), such that \(sp>d\), and with \(\beta =2^{\gamma -d}\in [0,1]\).
For any \(f\in H\), sufficiently small \(\kappa >0\) in (16) and any \(r\in (0,s-\frac{d}{p})\cap (0,1]\), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds
Proof
We recall that \(a_{N,-}(\omega ):={{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} a_{N,-}(\omega )>0\) and obtain by Cea’s Lemma
Now first suppose that \(p>1\). Since \(f\in H\), it holds by Corollary 4.1 for any \(q\ge 1\) that \(u_N\in L^q(\Omega ; W^r)\) for \(r\in (0, s-\frac{d}{p})\cap (0,1]\). For \(0<s-\frac{d}{p}\le 1\), we have \(r\in (0,s-\frac{d}{p})\), and Lemma 3.5, shows \(u_N\in L^q(\Omega ; H^{1+r_0}({\mathcal {D}}))\) for any \(r_0\in (0,r)\). It hence follows for \(r_0\in (0,r)\) that
This is a standard result for first order Lagrangian FEM, see, e.g., [17, Theorems 8.62/8.69] or [9, Theorem 4.4.20]. The constant \(C>0\) in (49) depends on the shape-regularity parameter \(\rho \) and on \({\mathcal {D}}\), but is independent of \(u_N\) and h. Combining (48) and (49) shows with Hölder’s inequality
We have used that \(a_{N,-}\) and \(\Vert a_N\Vert _{L^\infty ({\mathcal {D}})}\) are equal in distribution for the second estimate, and Lemma 3.5 in the third line. The last step follows for any \(q\in [1,\infty )\) by Corollary 4.1 and Proposition 4.2 since \(p>1\). Moreover, as a further consequence of Corollary 4.1 and Proposition 4.2, the constant \(C>0\) in the final estimate in (50) bounded independently of N and h. Since \(0<s-\frac{d}{p}\le 1\), we may choose \(r_0<r<s-\frac{d}{p}\) arbitrary close to \(s-\frac{d}{p}\).
On the other hand, if \(s-\frac{d}{p}>1\) and \(r=1\), Lemma 3.5 implies that \(u_N\in L^q(\Omega ; H^{2}({\mathcal {D}}))\). Estimates (49) and (50) then hold for \(r_0=r=1\), which proves the claim in case that \(p>1\).
For \(p=1\) and given \(q\ge 1\), we need to assume in addition that \(\kappa >0\) be sufficiently small such that Corollary 4.1 and (45) in Remark 4.3 hold with q replaced 3q. In this case, the claim for \(p=1\) follows analogously as for \(p>1\). \(\square \)
In the proof of Theorem 4.7, we obtain exponential moments of power 3q by Hölder’s inequality. For the case \(p=1\), we therefore need approximately that \(\kappa < \frac{1}{3q}\) (up to summation constants) to counter-balance this exponent and obtain \(a_N \in L^{3q}(\Omega ; L^\infty ({\mathcal {D}}))\) and \(u_N \in L^{3q}(\Omega ; W^r)\).
Theorem 4.8
Let the assumptions of Theorem 4.7 hold. For any \(f\in H\), sufficiently small \(\kappa >0\) in (16) and for any \(r\in (0,s-\frac{d}{p})\cap (0,1]\), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds
Proof
The proof uses the well-known Aubin-Nitsche duality argument. Let \(e_{N,h}:=u_N-u_{N,h}\) and consider for fixed \(\omega \in \Omega \) the dual problem to find \(\varphi (\omega )\in V\) such that for all \(v\in V\) it holds
We need to investigate the regularity and integrability of \(\varphi \) as a first step. Lemma 3.4 shows that
Let \(t\in (0, s-\frac{d}{p})\) be fixed. We integrate both sides of (52) and use Hölder’s inequality as in the third part of Theorem 3.9 (cf. Inequality (37)) to obtain for \(q_0\ge 1\) and
\(q_1,\dots ,q_3\in [1,\infty )\) such that \(1=\sum _{i=1}^3\frac{1}{q_i}\)
Note that we have again assumed that \(\Vert b_{T,{\mathcal {D}}}\Vert _{\textrm{C}^r(\overline{{\mathcal {D}}})}\ge 1\) without loss of generality to derive (53).
By Theorems 2.11, 4.7 and Proposition 4.2, we now conlude that the right hand side in (53) is finite and bounded uniformly in N for any \(q_0\ge 1\) if \(p>1\), as the Hölder conjugates \(q_1,\dots ,q_{3}\in [1,\infty )\) may be arbitrary large.
For \(p=1\), we further need that \(\kappa >0\) in (16) is sufficiently small, so that \(\varepsilon _p>q_0\max (q_1(1+\frac{1}{r}), \frac{q_2}{r})\) in Theorem 2.11 and that \(\overline{q}\ge q_0q_3\) in Theorem 4.7. Given that \(\kappa >0\) is sufficiently small, there is for any \(p\ge 1\) a \(q_0\ge 1\) such that \(\varphi \in L^{q_0}(\Omega ; W^r)\).
For the next step, we combine Equations (39) and (47) to show the Galerkin orthogonality
Let \(P_h:V\rightarrow V_h\) denote the V-orthogonal projection onto \(V_h\). Testing with \(v=e_{N,h}(\omega )\in V\) in (51) then shows together with \(v_h=P_h\varphi (\omega )\) in (54) that
Estimate (55) then yields for \(q\in [1,\infty )\) with Hölder’s inequality
First, suppose again that \(p>1\), where \(\varphi \in L^{q_0}(\Omega ; W^r)\) holds for any \(q_0\ge 1\). Proposition 4.2 and Theorem 4.7 yield
where \(C>0\) is independent of N and h.
Lemma 3.5, \(\varphi \in L^{q_0}(\Omega ; W^r)\), and (49) further show that
The claim follows as in Theorem 4.7, since \(r=r_0=1\) if \(s-\frac{d}{p}>1\), and \(r_0<r<s-\frac{d}{p}\) may be arbitrary close to \(s-\frac{d}{p}\) otherwise.
For \(p=1\) and given \(q\ge 1\) on the other hand, we need to assume that \(\kappa >0\) is sufficiently small so that \(\varphi \in L^{q_0}(\Omega ; W^r({\mathcal {D}}))\) for \(q_0=\frac{3q}{2}\), and that (45) and Theorem 4.7 hold with q replaced by \(\frac{3q}{2}\). The claim then follows as for \(p>1\) from (4.2). \(\square \)
Bounds on the overall approximation errors with respect to V and H now follow as an immediate consequence of Theorems 4.4, 4.7, 4.8 and Remark 4.5.
Corollary 4.9
Let the assumptions of Theorem 4.7 hold, let \(f\in H\), let \(t\in (0,s-\frac{d}{p})\) and assume given \(r\in (0,s-\frac{d}{p})\cap (0,1]\). Then there holds:
-
1.)
For \(p=1\) and sufficiently small \(\kappa >0\) in (16), there are constants \(\overline{q}=\overline{q}(\kappa )\in (1,\infty )\) and \(C>0\) such that for any \(q\in [1,\overline{q})\), \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds
$$\begin{aligned} \Vert u-u_{N,h}\Vert _{L^q(\Omega ; V)}&\le C(2^{-tN}+h^{r}), \\ \Vert u-u_{N,h}\Vert _{L^q(\Omega ; H)}&\le C(2^{-tN}+h^{2r}). \end{aligned}$$ -
2.)
For \(p\in (1,\infty )\) and any \(q\in [1,\infty )\) there is a constant \(C>0\) such that for any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds
$$\begin{aligned} \Vert u-u_{N,h}\Vert _{L^q(\Omega ; V)}&\le C(2^{N\left( -t+\frac{\min (\gamma ,0)}{q}\right) }+h^{r}), \\ \Vert u-u_{N,h}\Vert _{L^q(\Omega ; H)}&\le C(2^{N\left( -t+\frac{\min (\gamma ,0)}{q}\right) }+h^{2r}). \end{aligned}$$
5 Multilevel Monte Carlo estimation
We consider Monte Carlo estimation of \({\mathbb {E}}(\Psi (u))\) for a given functional \(\Psi \) and u as solution to (27) with Besov random tree coefficient a. We replace u by a tractable approximation \(u_{N,h}\) to evaluate \(\Psi (u_{N,h})\approx \Psi (u)\) and bound the overall error consisting of the pathwise discretization from Sect. 4 and the statistical error of the Monte Carlo approximation.
Assumption 5.1
-
1.)
Let \(\theta \in [0,1]\), let \(\Psi :H^\theta ({\mathcal {D}})\rightarrow {\mathbb {R}}\) be Fréchet-differentiable on \(H^\theta ({\mathcal {D}})\) and denote by
$$\begin{aligned} \Psi ':H^\theta ({\mathcal {D}})\rightarrow {\mathcal {L}}(H^\theta ({\mathcal {D}}); {\mathbb {R}})=(H^{\theta }({\mathcal {D}}))' \end{aligned}$$the Fréchet-derivative of \(\Psi \). There are constants \(C>0\), \(\rho _1,\rho _2\ge 0\) such that for all \(v\in H^\theta ({\mathcal {D}})\)
$$\begin{aligned} |\Psi (v)|\le C(1+\Vert v\Vert _{H^\theta ({\mathcal {D}})}^{\rho _1}), \quad \Vert \Psi '(v)\Vert _{{\mathcal {L}}(H^\theta ({\mathcal {D}}); {\mathbb {R}})}\le C(1+\Vert v\Vert _{H^\theta ({\mathcal {D}})}^{\rho _2}). \end{aligned}$$(57) -
2.)
For \(q:=2\max (\rho _1,\rho _2+1)\), there holds \(u\in L^{q}(\Omega ;V)\).
-
3.)
\(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) is a collection of triangulations satisfying Assumption 4.6.
-
4.)
There are constants \(t>0\), \(r\in (0,1]\) and \(C>0\) such that for \(q=2\max (\rho _1,\rho _2+1)\) and any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) it holds
$$\begin{aligned} \Vert u-u_{N,h}\Vert _{L^q(\Omega ; V)}\le C(2^{-tN}+h^{r}),&\quad \Vert u-u_{N,h}\Vert _{L^q(\Omega ; H)}\le C(2^{-tN}+h^{2r}). \end{aligned}$$
Remark 5.2
Assumption 5.1 is natural, and includes in particular bounded linear functions \(\Psi \), where \(\rho _1=1\) and \(\rho _2=0\). Item 2.) follows by Theorem 3.9 and Item 4.) by Corollary 4.9, with no further restrictions whenever \(p>1\). Only in case that \(p=1\), \(\kappa >0\) needs to be sufficiently small to ensure that all bounds hold for \(q=2\max (\rho _1,\rho _2 + 1)\ge 2\).
5.1 Singlelevel Monte Carlo
We use Monte Carlo (MC) methods to approximate \({\mathbb {E}}(\Psi (u))\) for a given functional \(\Psi \). To this end, we first consider the standard MC estimator for (general) real-valued random variables.
Definition 5.3
Let \(Y:\Omega \rightarrow {\mathbb {R}}\) be a random variable and let \((Y^{(i)},i\in {\mathbb {N}})\) be a sequence of i.i.d. copies of Y. For \(M\in {\mathbb {N}}\) we define Monte Carlo estimator \(E_M(Y):\Omega \rightarrow {\mathbb {R}}\) as
As we are not able to sample directly from the distribution of u, we rely on i.i.d. copies \((u^{(i)}_{N,h}, i\in {\mathbb {N}})\) of the pathwise approximation \(u_{N,h}\) from Sect. 4. Thereby, in addition to the statistical MC error of order \({\mathcal {O}}(M^{-1/2})\), we introduce a sampling bias that depends on N and h.
Theorem 5.4
Let \(M\in {\mathbb {N}}\), let \(E_M(\Psi (u_{N,h}))\) be the MC estimator as in (58), and let Assumption 5.1 hold. Then, there is a constant \(C>0\), such that for any \(M, N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) it holds
Proof
We split the overall error via
To bound I, we use independence of \(\Psi (u)^{(i)}\) and \(\Psi (u)^{(j)}\) for \(i\ne j\) to see that
Assumption 5.1 further shows that
where we have used that \(\theta \le 1\) and \(u\in L^{2\rho _1}(\Omega ; V)\).
To bound II, we use Equation (57) and derive the pathwise estimate
By Assumption 5.1, there is a \(C>0\) such that for every N and every \(0<h\le 1\) it holds
Furthermore, as \(\theta \in [0,1]\), we have by the Gagliardo-Nirenberg interpolation inequality
Thus, Hölder’s inequality with conjugate exponents \(q_1=\frac{\rho _2+1}{\rho _2}\), \(q_2=\rho _2+1\) (and \(q_1=\infty \), \(q_2=1\) for \(\rho _2=0\)) shows that
\(\square \)
The error contributions in Theorem 5.4 are balanced by choosing
With this choice, achieving the target accuracy \(\Vert {\mathbb {E}}(\Psi (u))-E_M(\Psi (u_{N,h}))\Vert _{L^2(\Omega )} ={\mathcal {O}}(\varepsilon )\) requires sampling \(M={\mathcal {O}}(\varepsilon ^{-2})\) high-fidelity approximations \(u_{N,h}\) with \(N={\mathcal {O}}(\frac{\log (\varepsilon )}{t})\) scales and mesh refinement \(h={\mathcal {O}}(\varepsilon ^{\frac{1}{(2-\theta )r}})\). This is computationally challenging in dimension \(d\ge 2\) and for low-regularity problems, i.e., when \(t,r>0\) are small. To alleviate the computational burden, we propose a multilevel Monte Carlo extension of the estimator \(E_M\) in the next subsection.
5.2 Multilevel Monte Carlo
The multilevel Monte Carlo (MLMC) algorithm was invented by Heinrich [18] to compute parametric integrals, then rediscovered and popularized by Giles [14, 15], and has since then found various applications in uncertainty quantification and beyond.
To apply this methodology to our model problem we fix a maximum refinement level \(L\in {\mathbb {N}}\) and consider a sequence of approximated solutions \(u_{N_\ell , h_\ell }\) with \((N_\ell , h_\ell )\in {\mathbb {N}}\times {\mathfrak {H}}\) for \(\ell \in \{1,\dots ,L\}\). We assume that \(N_1<\dots <N_L\) and \(h_1>\dots >h_L\), so that the error \(u-u_{N_\ell ,h_\ell }\) decreases with respect to the level \(\ell \). For notational convenience, we define \(\Psi _\ell :=\Psi (u_{N_\ell ,h_\ell })\) as the approximation of the quantity of interest \(\Psi (u)\) on level \(\ell \), and set \(\Psi _0:=0\). The basic idea of the MLMC method for estimating \({\mathbb {E}}(\Psi (u))\) is to exploit the telescopic expansion
of the high-fidelity approximation \(\Psi _L\). On each level \(\ell \), the correction \({\mathbb {E}}(\Psi _\ell -\Psi _{\ell -1})\) is estimated by (standard) MC estimator with \(M_\ell \) samples. This yields the multilevel Monte Carlo estimator
with level-dependent numbers of samples \(M_1,\dots ,M_L\in {\mathbb {N}}\). We assume that the estimators \(E_{M_\ell }(\Psi _\ell -\Psi _{\ell -1})\) are jointly independent across the levels \(\ell =1,\dots ,L\).
Provided that \(\textrm{Var}(\Psi _\ell -\Psi _{\ell -1})\) decays sufficiently fast in \(\ell \), we choose \(M_1>\dots >M_L\) such that the majority of samples are generated cheaply on low levels \(\ell \), while only a few expensive samples for large \(\ell \) are necessary. This entails massive computational savings compared to a singlelevel Monte Carlo (SLMC) estimator as in (58), that requires a large number of expensive samples on level L, and does not exploit the level hierarchy whatsoever. The computational gain of the MLMC method is precisely quantified under certain assumptions in Giles’ complexity theorem ([14, Theorem 3.1]). Given some \(\varepsilon >0\), Giles derives the optimal number of refinement levels L and associated numbers of samples \(M_1,\dots , M_L\) that guarantee \(\Vert {\mathbb {E}}(\Psi (u))-E^L(\Psi (u))\Vert _{L^2(\Omega )}\le \varepsilon \). The latter requires exact knowledge of all constants in Assumption 5.1, and, furthermore, exact knowledge of the cost for sampling one instance of \(\Psi _\ell \). As this is not feasible a-priori, we choose a slightly different approach to determine the MLMC parameters. We retain the optimal order of complexity as in [14, Theorem 3.1].
Assumption 5.5
Let \(({\mathcal {K}}_h)_{h\in {\mathfrak {H}}}\) be a sequence of triangulations satisfying Assumption 4.6, and assume that \(h_\ell \in {\mathfrak {H}}\) for any \(\ell \in {\mathbb {N}}\). Furthermore, in view of the multilevel convergence analysis, we assume that there are \(0<\underline{c}_{\mathcal {K}}\le \overline{c}_{\mathcal {K}}<1\) and \(h_0>0\) such that
One sample of \(\Psi _\ell =\Psi (u_{N_\ell ,h_\ell })\) with \(u_{N_\ell ,h_\ell }\in V_{h_\ell }\) and \(n_\ell :=\dim (V_{h_\ell })={\mathcal {O}}(h_\ell ^{-d})\) is realized in \({\mathcal {O}}(n_\ell )\) work and memory.
Remark 5.6
Assumption 5.5 is natural and holds, for instance, with \(\underline{c}_{\mathcal {K}}, \overline{c}_{\mathcal {K}}\approx \frac{1}{2}\) if the mesh \({\mathcal {K}}_{h_\ell }\) is obtained from \({\mathcal {K}}_{h_{\ell -1}}\) by bisection of the longest edge of each \(K\in {\mathcal {K}}_{h_{\ell -1}}\). We remark that in general, it may be hard to achieve \(\underline{c}_{\mathcal {K}}= \overline{c}_{\mathcal {K}}\), which is why we imposed an upper and lower bound in (66). Simulating \(\Psi _\ell \) requires \({\mathcal {O}}(n_\ell )={\mathcal {O}}(h_\ell ^{-d})\) floating point operations per sample when using multilevel solvers for continuous piecewise linear or multi-linear elements. We also refer to Lemma B.2 in Appendix B.3, where we show that the expected cost of sampling \(b_{N,T}\) on the associated grid is of order \({\mathcal {O}}(h_\ell ^{-d})\) if \(\beta <1\).
Theorem 5.7
Let Assumptions 5.1 and 5.5 hold. For t, r and \(\theta \) as in Assumption 5.1, let \(\varepsilon \in (0, h_0^{(2-\theta )r})\) and select the MLMC parameters in (65) for \(\ell \in \{1,\dots ,L\}\) as
For given \(L\in {\mathbb {N}}\), choose the weights \(w_\ell >0\) to determine \(M_\ell \) such that \(\sum _{\ell =1}^L w_\ell ^{-1} \le C_w < \infty \), for sufficiently large, fixed \(C_w > 0\) independent of L.
Then, there is a \(C>0\), such that for any \(\varepsilon \in (0, h_0^{(2-\theta )r})\) it holds
Proof
We use the error splitting
We obtain in the same fashion as for the term II in the proof of Theorem 5.4 that
where we have used \(2^{-tN_L}\le h_L^{(2-\theta )r}\) by (67) in the last step. To bound the second term, we expand \({\mathbb {E}}(\Psi _L)\) in a telescopic sum to obtain with (65)
The first equality holds since the MC estimators \(E_{M_1}(\Psi _1), \dots , E_{M_L}(\Psi _L-\Psi _{L-1})\) are jointly independent and unbiased in the sense that \({\mathbb {E}}(\Psi _\ell -\Psi _{\ell -1})={\mathbb {E}}(E_{M_\ell }(\Psi _\ell -\Psi _{\ell -1}))\). The triangle inequality and the estimate (62) (from the proof of Theorem 5.4) then further yield
where we have used Assumption 5.5 and the choices for \(M_\ell \) and \(N_\ell \) in (67) in the last step. As \(\sum _{\ell =1}^L w_\ell ^{-1}\le C_w<\infty \) is bounded independently of L, we conclude with (66), L as in (67), and \(0<\underline{c}_{\mathcal {K}}\le \overline{c}_{\mathcal {K}}<1\) that
\(\square \)
The computational advantages of the MLMC method are precisely quantified in the next statement. Therein, the choice of \(w_\ell \) plays a key role and depends on the relation of variance decay and cost of sampling on each level.
Theorem 5.8
Let Assumptions 5.1 and 5.5 hold, and let \(\varepsilon >0\). Given t, r and \(\theta \) and \(\varepsilon >0\), set \(L, M_\ell \) and \(N_\ell \) as in Theorem 5.7 and choose the weight functions
where \(\iota >0\) is an arbitrary small constant. Then, the MLMC estimator satisfies
with computational cost \({\mathcal {C}}_{MLMC}\) for \(\varepsilon \rightarrow 0\) of order
Proof
Since \(\underline{c}_{\mathcal {K}}\in (0,1)\), it holds in each scenario \(\sum _{\ell =1}^L w_\ell ^{-1}\le C\) for a constant \(C>0\), and uniform with respect to L. Therefore, we conclude by Theorem 5.7 that
and it remains to derive the complexity in terms of \(\varepsilon \).
Assumption 5.5 implies that \(h_L \le h_\ell \underline{c}_{\mathcal {K}}^{(L-\ell )} h_0\). We obtain with (67) that
Let \({\mathcal {C}}_\ell \) denote the work required to generate on sample of \(\Psi _\ell \). As \(h_\ell \ge \underline{c}_{\mathcal {K}}^{\ell } h_0\), it holds that
Since we generate \(M_\ell \) independent \(\Psi _\ell -\Psi _{\ell -1}\) on each level (and also generate independent samples across all levels), the overall cost of the MLMC estimator is with (68) and (69) bounded by
Now first suppose that \(2(2-\theta )r-d>0\). Since \(\underline{c}_{\mathcal {K}}\in (0,1)\), the ratio test for sum convergence shows that for any \(\iota >0\) we obtain the uniform bound (with respect \(L\in {\mathbb {N}}\))
On the other hand, if \(2(2-\theta )r-d=0\), there holds with \(w_\ell =L\)
Finally, for \(2(2-\theta )r-d<0\), it follows that there is a \(C>0\) such that
Altogether, we obtain that there exists a constant \(C>0\) independent of L such that
With L from (67) it follows that \(\underline{c}_{\mathcal {K}}^L={\mathcal {O}}(\varepsilon ^{\frac{1}{(2-\theta )r}}h_0^{-1})\) for \(\varepsilon \rightarrow 0\). This shows the following asymptotics for the \(\varepsilon \)-cost bounds as \(\varepsilon \rightarrow 0\)
\(\square \)
Remark 5.9
The asymptotic complexity bounds for \(\varepsilon \rightarrow 0\) are of the same magnitude as in [14, Theorem 3.1] and [10, Theorem 1], but only require knowledge of the parameters r, t and \(\theta \), and not on further absolute constants. From Theorem 5.4, the SLMC estimator requires for given \(\varepsilon >0\) a total of \(M\approx \varepsilon ^{-2}\) samples with refinement parameters satisfying \(2^{-tN}\approx h^{(2-\theta )r}\approx \varepsilon \). Hence, \(h={\mathcal {O}}(\varepsilon ^{\frac{1}{(2-\theta )r}})\) and, assuming availability of a linear complexity solver such as multigrid, the computational cost per sample is bounded asymptotically as \({\mathcal {O}}(\dim (V_h))={\mathcal {O}}(h^{-d})={\mathcal {O}}(\varepsilon ^{-\frac{d}{(2-\theta )r}})\). The total cost of the SLMC estimator to achieve \(\varepsilon \)-accuracy is therefore
Consequently, under the stated assumptions, MLMC-FEM achieves a considerable reduction in (asymptotic) \(\varepsilon \)-complexity, even in low-regularity regimes with \(2(2-\theta )r<d\).
In case that \(2(2-\theta )r>d\), the assumption that \(E_{M_1}(\Psi _1), \dots , E_{M_L}(\Psi _L-\Psi _{L-1})\) are independent MC estimators is not required to derive the optimal complexity \({\mathcal {C}}_{MLMC}={\mathcal {O}}(\varepsilon ^{-2})\). Instead, setting \(w_\ell =\ell ^{2(1+\iota )}\) is sufficient to compensate for the dependence across discretiation levels. This could be exploited in a simulation to “recycle” samples from coarser discretization levels in order to further increase efficiency. We refer, e.g., to the discussion in [7, Section 5.2].
6 Numerical experiments
We consider numerical experiments in the rectangular domain \({\mathcal {D}}:={\mathbb {T}}^2=(0,1)^2\) and use the constant source function \(f\equiv 1\). For the spatial discretization we use bilinear finite elements that may be efficiently computed by exploiting their tensor product-structure, see Appendix B.2 for details. The initial mesh width is given by \(h_0=\frac{1}{2}\) and we use dyadic refinements with factor \(\underline{c}_{\mathcal {K}}=\overline{c}_{\mathcal {K}}= \frac{1}{2}\) to obtain a sequence of tesselations \(({\mathcal {K}}_h,\; h=2^{-\ell } h_0,\; \ell \in {\mathbb {N}})\) that satisfies Assumption 4.6 for the MLMC algorithm. We further use midpoint quadrature to assemble the stiffness matrix for each realization of the diffusion coefficient. The resulting quadrature error does not dominate the FE error convergence from Theorems 4.7 and 4.8, as we show in Lemma B.1 in the Appendix. For given N and a rectangular mesh \({\mathcal {K}}_h\), we evaluate \(b_{T,N}\) at the midpoint of each \(K\in {\mathcal {K}}_h\) as described in Appendix B.3.
We investigate different parameter regimes of varying smoothness for the diffusion coefficient, the values and resulting pathwise approximation rates r and t as in Corollary 4.9 are collected in Table 1. In all experiments, we build the random field \(b_T\) resp. \(b_{T,N}\) based on Daubechies wavelets with five vanishing moments (“\(\textrm{DB}(5)\)-wavelets”), with smoothness \(\phi , \psi \in C^{1.177}({\mathbb {R}})\) (see [12, Section 7.1.2]). We consider the \(L^2({\mathcal {D}})\)-norm of the gradient as quantity of interest (QoI), with associated functional given by
so that Assumption 5.1 holds with \(\theta =1\).
Given \(\theta \), r and t, we prescribe target accuracies \(\varepsilon = 2^{-r\xi }\), \(\xi \in \{5,\dots , 9\}\) and select, for given \(\varepsilon >0\), the MLMC parameters as in Theorem 5.7. The maximum refinement level is denoted by \(L_\varepsilon \) and the corresponding estimators by \(E^{L_\varepsilon }(\Psi _{L_\varepsilon })\). We sample \(n_{ML}=2^8\) realizations of \(E^{L_\varepsilon }(\Psi _{L_\varepsilon })\) for every \(\varepsilon \). As reference solution, we use \(n_{ref}=2^4\) realizations of \(E^{L_{ref}}(\Psi _{L_{ref}})\) with parameters adjusted to achieve \(\varepsilon _{ref}:=2^{-r11}\). We report for prescribed \(\varepsilon \) the realized empirical RMSE
All computations are realized using MATLAB on a workstation with two Intel Xeon E5-2697 CPUs with 2.7 GHz, a total of 12 cores, and 256 Gigabyte RAM.
We start with the “smooth Gaussian” case from Table 1. A sample of the diffusion coefficient and the associated bilinear FE approximation of u is given in Fig. 3, where we also plot the (average) CPU times against the realized RMSE. As we see, the realized error is very close to the prescribed accuracy \(\varepsilon \), which corresponds to the error estimate from Theorem 5.7. Moreover, the empirical results are in line with the work estimates from Theorem 5.8, as is seen in the right plot of Fig. 3, since the computational complexity is (asymptotically) of order \({\mathcal {O}}(\varepsilon ^{-2}|\log (\varepsilon )|^2)\).
Next, we decrease the parameter s to obtain the “rough Gaussian” scenario from Table 1. A sample of the diffusion coefficient and the associated bilinear FE approximation of u is given in Fig. 4. Compared to Fig. 3, we now see more detailed, sharp features in the diffusion coefficient, due to the slower decay factor of the wavelet basis. Average CPU times vs. realized RMSE are given in Fig. 4. Again, the realized error is of order \({\mathcal {O}}(\varepsilon )\), and the computational times are asymptotically of order \({\mathcal {O}}(\varepsilon ^{-4})\), as expected from Theorem 5.8.
Finally, we investigate the “p-exponential” scenario from Table 1, where we use a heavier-tailed distribution of \(X_{j,k}^{l}\) and increase the wavelet density to \(\beta =\frac{3}{4}\). We use a standard Acceptance-Rejection algorithm to sample from the p-exponential density for \(p=1.6\). A sample of the diffusion coefficient and the associated bilinear FE approximation of u is given in Fig. 5. We observe that the variance of coefficient and solution is increased, compared to the previous two examples. This is indicated by the larger bars of the confidence intervals in the right plot of Fig. 5. The a-priori accuracy has been scaled by a factor of three in this plot, for a better visual comparison of realized and prescribed RMSE. Although absolute magnitude and variance of the realized RMSE have increased, we still recover in line with Theorem 5.8 the asymptotic error decay of order \({\mathcal {O}}(\varepsilon )\), together with CPU times of order \({\mathcal {O}}(\varepsilon ^{-\frac{8}{3}})\).
Our experiments show that sharper features in the prior model are achieved by decreasing either s or p. We emphasize that the asymptotic complexity depends on the differential dimension \(s-\frac{d}{p}\) as indicated in Table 1,the latter essentially corresponds to the parameter r in Theorem 5.7. Consequently, the effect of lowering p on the overall regularity is more pronounced in as the physical dimension d increases.
7 Conclusions
We have developed a computational framework for the efficient discretization of linear, elliptic PDEs with log-Besov random field coefficients which are modelled by a multiresolution in the physical domain whose coefficients are p-exponential with random choices of active coefficients according to GW-trees. The corresponding pathwise diffusion coefficients generally admit only rather low path regularity, thereby mandating low order Finite Element discretizations in the physical domain. We established strong pathwise solution regularity, and FE error bounds for the corresponding single-level Monte Carlo-FEM algorithm. The corresponding error vs. work bounds for the multi-level Monte Carlo algorithm follow then in the standard way. We emphasize again that higher order sampling methods seem to be obstructed by the GW-tree structure which has recently been identified in [23] as well-suited for modelling diffuse media such as clouds, fog and aerosols. The presently proposed MLMC-FE error analysis for Elliptic PDEs with (log-) Besov random tree coefficients will imply corresponding complexity bounds in multilevel Markov Chain Monte Carlo sampling strategies for Bayesian Inverse Problems on log-Besov random tree priors, as considered for example in imaging applications in [4, 11, 19, 24]. Details on their analysis and computation will be developed elsewhere.
Notes
Anisotropic tensorizations leading upon truncation to so-called “hyperbolic cross approximations” may be considered. As such constructions tend to inject preferred directions along the cartesian axes, we do not consider them here.
If this holds only P-.a.s., we may modify \(a:\Omega \rightarrow L^\infty ({\mathcal {D}})\) on a P-nullset to obtain a strongly measurable modification \({\widetilde{a}}:\Omega \rightarrow L^\infty ({\mathcal {D}})\) of a, so that \({{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} {\widetilde{a}}(x, \omega )>0\) and \({\widetilde{a}}\in L^\infty ({\mathcal {D}})\) holds for all \(\omega \in \Omega \). In fact, let
$$\begin{aligned} A_0:=\{\omega \in \Omega |\;a_-(\omega )\le 0 \text { or } a(\omega )\notin L^\infty ({\mathcal {D}})\}. \end{aligned}$$Then \(P(A_0)=0\) by assumption, and hence \(A_0\in {\mathcal {A}}\) by completeness of \((\Omega ,{\mathcal {A}},P)\). Thus, we may consider, for instance, the modification \({\widetilde{a}}(\omega ):=a(\omega )\mathbbm {1}_{\{\omega \notin A_0\}} + \mathbbm {1}_{\{\omega \in A_0\}}\). It is readily verified that \({\widetilde{a}}\) is strongly \({\mathcal {A}}/{\mathcal {B}}(L^\infty ({\mathcal {D}}))\)-measurable, and for all \(\omega \in \Omega \) it holds \({{\,\textrm{ess inf}\,}}_{x\in {\mathcal {D}}} {\widetilde{a}}(x, \omega )>0\) and \({\widetilde{a}}\in L^\infty ({\mathcal {D}})\).
Recall the K-method of interpolation of two Banach spaces \((A_0, \left\| \cdot \right\| _{A_0})\) and \((A_1, \left\| \cdot \right\| _{A_1})\) with continuous embedding \(A_1\hookrightarrow A_0\): their K-functional is defined by
$$\begin{aligned} K(a, z; A_0, A_1):=\inf _{a_1\in A_1} \{\Vert a-a_1\Vert _{A_0} + z\Vert a_1\Vert _{A_1}\}, \quad a\in A_0,\; z>0. \end{aligned}$$For any \(r\in (0,1)\) and \(q\in [1,\infty ]\) the interpolation space of order r with fine index q is
$$\begin{aligned}{}[A_0, A_1]_{r,q} = \left\{ a\in A_0|\, \Vert a\Vert _{[A_0, A_1]_{r,q}}<\infty \right\} , \end{aligned}$$where
$$\begin{aligned} \Vert a\Vert _{[A_0, A_1]_{r,q}}:= {\left\{ \begin{array}{ll} \left( \int _0^\infty z^{-rq} K(a,z;A_0, A_1)^q\frac{1}{z}dz\right) ^{\frac{1}{q}}, &{}\quad q\in [1,\infty ), \\ \sup _{z>0} z^{-r}K(a,z;A_0, A_1),&{}\quad q=\infty . \end{array}\right. } \end{aligned}$$The set \([A_0, A_1]_{r,q}\) is a (generally non-separable) Banach space.
References
Abraham, R., Delmas, J.-F.: An introduction to Galton-Watson trees and their local limits. arXiv preprint arXiv:1506.05571, (2015)
Achdou, Y., Sabot, C., Tchou, N.: Diffusion and propagation problems in some ramified domains with a fractal boundary. M2AN Math. Model. Numer. Anal. 40(4), 623–652 (2006)
Adams, R. A., Fournier, J. J.: Sobolev Spaces. Elsevier, 2nd edition, (2003)
Agapiou, S., Dashti, M., Helin, T.: Rates of contraction of posterior distributions based on \(p\)-exponential priors. Bernoulli 27(3), 1616–1642 (2021)
Aliprantis, C.D., Border, K.: Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, Berlin (2006)
Barth, A., Schwab, C., Zollinger, N.: Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients. Numer. Math. 119(1), 123–161 (2011)
Barth, A., Stein, A.: A study of elliptic partial differential equations with jump diffusion coefficients. SIAM/ASA J. Uncertain. Quant. 6(4), 1707–1743 (2018)
Berre, I., Doster, F., Keilegavlen, E.: Flow in fractured porous media: a review of conceptual models and discretization approaches. Transp. Porous Media 130(1), 215–236 (2019)
Brenner, S. C., Scott, L. R.: The Mathematical Theory of Finite Element Methods, volume 3. Springer, (2008)
Cliffe, K.A., Giles, M.B., Scheichl, R., Teckentrup, A.L.: Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14(1), 3–15 (2011)
Dashti, M., Harris, S., Stuart, A.: Besov priors for Bayesian inverse problems. Inverse Probl. Imaging 6(2), 183–200 (2012)
Daubechies, I.: Ten Lectures on Wavelets. SIAM, (1992)
Dung, D., Nguyen, V., Schwab, C., Zech, J.: Analyticity and sparsity in uncertainty quantification for PDEs with Gaussian random field inputs. Technical Report 2022-02, Seminar for Applied Mathematics, ETH Zürich, Switzerland, 2022. (to appear in Springer LNM (2023))
Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
Giles, M.B.: Multilevel Monte Carlo methods. Acta Numer. 24, 259–328 (2015)
Grisvard, P.: Elliptic Problems in Nonsmooth Domains. SIAM, (2011)
Hackbusch, W.: Elliptic Differential Equations: Theory and Numerical Treatment, volume 18. Springer, 2nd edition, (2017)
Heinrich, S.: Multilevel Monte Carlo methods. In International Conference on Large-Scale Scientific Computing, 58–67. Springer, (2001)
Helin, T., Lassas, M.: Hierarchical models in statistical inverse problems and the Mumford-Shah functional. Inverse Prob. 27(1), 015008, 32 (2011)
Herrmann, L., Schwab, C.: Multilevel quasi-Monte Carlo integration with product weights for elliptic PDEs with lognormal coefficients. ESAIM: Math. Model. Numer. Anal. 53(5), 1507–1552 (2019)
Hoang, V.H., Schwab, C.: Sparse tensor Galerkin discretization of parametric and random parabolic PDEs–analytic regularity and generalized polynomial chaos approximation. SIAM J. Math. Anal. 45(5), 3050–3083 (2013)
Hosseini, B., Nigam, N.: Well-posed Bayesian inverse problems: priors with exponential tails. SIAM/ASA J. Uncertain. Quantif. 5(1), 436–465 (2017)
Kekkonen, H., Lassas, M., Saksman, E., Siltanen, S.: Random tree Besov priors – towards fractal imaging. arXiv preprint arXiv:2103.00574, (2021)
Saksman, E., Lassas, M., Siltanen, S.: Discretization-invariant Bayesian inversion and Besov space priors. Inverse Probl. Imaging 3(1), 87–122 (2009)
Triebel, H.: Theory of Function Spaces II. Modern Birkhäuser Classics. Birkhäuser, 2nd edition, (2000)
Triebel, H.: Function Spaces and Wavelets on Domains, volume 7 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, (2008)
Triebel, H.: Theory of Function Spaces IV, volume 107 of Monographs in Mathematics. Birkhäuser, (2020)
Zech, J., Schwab, C.: Convergence rates of high dimensional Smolyak quadrature. ESAIM: Math. Model. Numer. Anal. 54(4), 1259–1307 (2020)
Acknowledgements
AS was partly funded by the ETH Foundations of Date Science Initiative (ETH-FDS), and it is gratefully acknowledged.
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Appendices
A Galton-Watson trees
We provide some basic concepts of discrete trees, give a formal definition of Galton-Watson trees, and record a result on their extinction probabilities. The presentation follows [1, Section 2], with modified notation where necessary.
1.1 A.1 Notation and basic definitions
Let \({\mathcal {U}}:=\bigcup _{n\ge 0}{\mathbb {N}}^n\) be the set of all finite sequences of positive integers, where \(\varrho :=()\) denotes the empty sequence, and we use the convention \({\mathbb {N}}^0=\{\varrho \}\). For any \(\mathfrak n\in {\mathcal {U}}\), let \(|\mathfrak n|\) denote the length of \(\mathfrak n\), with \(|\varrho |:=0\). For \(\mathfrak n,\mathfrak m\in {\mathcal {U}}\), we denote by \(\mathfrak n\mathfrak m\) the concatenation of two sequences, with the convention \(\mathfrak n\varrho =\varrho \mathfrak n=\mathfrak n\) for all \(\mathfrak n\in {\mathcal {U}}\). There exists a partial order, called the genealogical order, on \({\mathcal {U}}\): we say that \(\mathfrak m\preceq \mathfrak n\), whenever there is a \(\mathfrak n_0\in {\mathcal {U}}\) such that \(\mathfrak m\mathfrak n_0 = \mathfrak n\). We say that \(\mathfrak m\) is an ancestor of \(\mathfrak n\) and write \(\mathfrak m\prec \mathfrak n\) if \(\mathfrak m\preceq \mathfrak n\) and \(\mathfrak m\ne \mathfrak n\). The set of all ancestors of \(\mathfrak n\) is denoted by \(A_\mathfrak n:=\{\mathfrak m\in {\mathcal {U}}|\, \mathfrak m\prec \mathfrak n\}\subset {\mathcal {U}}\).
Definition A.1
[1, Section 2.1] A tree \(\textbf{t}\) is a subset \(\textbf{t}\subset {\mathcal {U}}\) that satisfies
-
\(\varrho \in \textbf{t}\),
-
If \(\mathfrak n\in \textbf{t}\), then \(A_{\mathfrak n}\subset \textbf{t}\),
-
For any \(\mathfrak n\in \textbf{t}\), there exists \(\mathfrak K_{\mathfrak n}(\textbf{t})\in {\mathbb {N}}_0\cup \{\infty \}\), such that for every \(n\in {\mathbb {N}}\), \(\mathfrak n n\in \textbf{t}\) if, and only if, \(1\le n\le \mathfrak K_{\mathfrak n}(\textbf{t})\).
We denote the set of all trees by \({\mathfrak {T}}_\infty \). Let \(|\textbf{t}|\in {\mathbb {N}}\cup \{\infty \}\) be the cardinality of the tree \(\textbf{t}\in {\mathfrak {T}}_\infty \), and introduce the set of all finite trees by \({\mathfrak {T}}_0:=\{\textbf{t}\in {\mathfrak {T}}|\, |\textbf{t}|<\infty \}\). The set \({\mathfrak {T}}_0\) is countable. The integer \(\mathfrak K_{\mathfrak n}(\textbf{t})\) represents the number of offsprings in \(\textbf{t}\) at the node \(\mathfrak n\). The set of all trees without infinite nodes is a subset \({\mathfrak {T}}_\infty \) and denoted by
For \(\mathfrak n\in \textbf{t}\), the sub-tree \(\mathfrak S_{\mathfrak n}(\textbf{t})\) of \(\textbf{t}\) above node \(\mathfrak n\) is defined as:
We also need the restriction functions \(\mathfrak r_n:{\mathfrak {T}}\rightarrow {\mathfrak {T}}\), \(n\in {\mathbb {N}}_0\) which are given by
With these preparations, we are in position to define metric and associated Borel \(\sigma \)-algebra on \({\mathfrak {T}}\).
Definition A.2
[1, Section 2.1] Let
Furthermore, define the \(\sigma \)-algebra
By [1, Lemma 2.1], \(({\mathfrak {T}}, \delta )\) is a complete and separable metric space. The countable set of all finite trees \({\mathfrak {T}}_0\) is dense in \({\mathfrak {T}}\): for all \(\textbf{t}\in {\mathfrak {T}}\), we have \(\mathfrak r_n(\textbf{t})\rightarrow \textbf{t}\) as \(n\rightarrow \infty \) in \(({\mathfrak {T}}, \delta )\). Further, \(\delta _{{\mathfrak {T}}}\) is an ultra-metric (see [1, Section 2.1]), hence \(\mathfrak r_n^{-1}(\mathfrak r_n(\textbf{t}))\) is the set of open (and closed) balls with center \(\textbf{t}\) and radius \(2^{-n}\) [1, Section 2.1] with respect to \(\delta \). By separability, \({\mathcal {B}}({\mathfrak {T}})\) coincides with the Borel \(\sigma \)-algebra on \({\mathfrak {T}}\), that is generated by all open sets on \(({\mathfrak {T}}, \delta _{{\mathfrak {T}}})\). Given a probability space \((\Omega , {\mathcal {A}}, P)\), we then call any \({\mathcal {A}}/{\mathcal {B}}({\mathfrak {T}})\)-measurable mapping \(\tau :\Omega \rightarrow {\mathfrak {T}}\) a \({\mathfrak {T}}\)-valued random variable. This allows us to formalize Galton-Watson trees:
Definition A.3
(Galton-Watson(GW) tree with offspring distribution \({\mathcal {P}}\)) A \({\mathfrak {T}}\)-valued random variable T has the branching property if, for any \(n\in {\mathbb {N}}\), conditionally on \(\{k_\varrho (\tau )=n\}\), the sub-trees \(\mathfrak S_1(T), \dots , \mathfrak S_n(T)\) are independent and distributed as the original tree \(\tau \).
Now let \({\mathcal {P}}\) be a probability distribution on \({\mathbb {N}}_0\), i.e., a probability measure on \(({\mathbb {N}}_0, {\mathcal {B}}({\mathbb {N}}_0))\). A \({\mathfrak {T}}\)-valued random variable T is called a Galton-Watson(GW) tree with offspring distribution \({\mathcal {P}}\) if T has the branching property and if \(\mathfrak K_\varrho (T)\sim {\mathcal {P}}\).
According to [1, Equation (12)], the distribution \({\mathbb {Q}}_T\) of a GW tree T, restricted to the set of finite trees, is given by
where the product on the right hand side is finite.
1.2 A.2 Random wavelet trees and extinction probabilities
Now we consider again the d-dimensional torus \({\mathbb {T}}^d\) with wavelet basis \({\varvec{\Psi }}\) as in (12). To relate the nodes of a GW tree to the wavelet indices in \({\mathcal {I}}_{{\varvec{\Psi }}}=\{j\in {\mathbb {N}}_0,\; k\in K_j,\;l\in {\mathcal {L}}_j\}\), we assume that T is a GW tree with offspring distribution \({\mathcal {P}}=\text {Bin}(2^d, \beta )\) for some \(\beta \in [0,1]\). For a given realization \(T(\omega )\) and node \(\mathfrak n\in T(\omega )\), we identify the length of \(\mathfrak n\) with the corresponding wavelet scale via \(j:=|\mathfrak n|\in {\mathbb {N}}_0\). Further, since \({\mathcal {P}}\) is binomial, there are at most \(2^{dj}\) nodes \(\mathfrak n\) of length j in \(T(\omega )\). Each of these nodes has j entries in \(\{1,\dots ,2^d\}\). We assign an integer to all \(\mathfrak n\in T(\omega )\) with \(|\mathfrak n|=j\) via the bijection
On the other hand, we assign for any \(\{ 1,\dots , 2^{jd}\}\) an index in \(K_j=\{0,\dots ,2^j-1\}^d\) via
which yields a one-to-one mapping
Thus, each \(\mathfrak n\in T(\omega )\) corresponds to a unique pair of indices (j, k) via \(\mathfrak n\mapsto (|\mathfrak n|, \mathfrak I^2_{d,|\mathfrak n|}\circ \mathfrak I^1_{d,|\mathfrak n|}(\mathfrak n))\). Collecting the pairs (j, k) for all nodes in T yields the random active index set
Then, only wavelets with indices \({\mathcal {I}}_T(\omega ):=\{(j,k,l)|\; (j,k)\in \mathfrak I_T(\omega ),\; l\in {\mathcal {L}}_j\}\subset {\mathcal {I}}_{{\varvec{\Psi }}}\) are “activated” in the series expansion of a sample of \(b_T\) in Definition 2.6.
It is then of course of interest whether the GW tree T terminates after a finite number of nodes, in which case \(\mathfrak I_T\) is finite, or if T has infinitely many nodes. In the latter case, \(b_T\) exhibits fractal structures on \({\mathbb {T}}^d\), in areas where the wavelet expansion has infinitely many terms. Let the extinction event of a GW tree T be denoted by \({\mathcal {E}}(T):=\{T\in {\mathfrak {T}}_0\}\). The extinction probability of GW trees are quantified in the following result:
Lemma A.4
[1, Corollary 2.5/Lemma 2.6] Let \(T:\Omega \rightarrow {\mathfrak {T}}\) be GW tree with offspring distribution \({\mathcal {P}}\) and let \(\zeta \sim {\mathcal {P}}\).
-
1.
If \({\mathcal {P}}(0)=0\), then \({\mathbb {Q}}_T({\mathcal {E}}(T))=0\),
-
2.
If \({\mathcal {P}}(0)\in (0,1)\) and \({\mathcal {P}}(0)+{\mathcal {P}}(1)=1\), then \({\mathbb {Q}}_T({\mathcal {E}}(T))=1\),
-
3.
If \({\mathcal {P}}(0)\in (0,1)\), \({\mathcal {P}}(0)+{\mathcal {P}}(1)<1\), and \({\mathbb {E}}(\zeta )\le 1\), then \({\mathbb {Q}}_T({\mathcal {E}}(T))=1\),
-
4.
If \({\mathcal {P}}(0)\in (0,1)\), \({\mathcal {P}}(0)+{\mathcal {P}}(1)<1\), and \({\mathbb {E}}(\zeta )> 1\), then \({\mathbb {Q}}_T({\mathcal {E}}(T))=q\in (0,1)\). Here q is the smallest root in [0, 1] of the equation \({\mathbb {E}}(q^\zeta )=q\).
Lemma A.4 does not require a Binomial offspring distribution, but remains true for arbitrary distributions \({\mathcal {P}}\) on \({\mathbb {N}}_0\). Moreover, we conclude that a GW tree T with offspring distribution \({\mathcal {P}}=\text {Bin}(2^d, \beta )\) generates finite wavelet expansions via \(\mathfrak I_T\) P-a.s. if and only if \(\beta \in [0,2^{-d}]\).
1.3 A.3 Parametrization of binomial GW trees
Recall that \(({\mathfrak {T}}, d_{\mathfrak {T}})\) is polish, but not normed. The product parameter space \({\mathbb {R}}^{\mathbb {N}}\times {\mathfrak {T}}\) of the Besov random tree prior is thus not a Banach space, which is in turn a key assumption for Bayesian inverse problems. However, it is possible to obtain suitable parametrizations for GW trees with binomial offspring distribution, by exploiting the connection to the continuous uniform distribution on (0, 1).
Let T be a GW tree with with offspring distribution \({\mathcal {P}}=\text {Bin}(2^d, \beta )\) for \(\beta \in [0,1]\), let \(\Omega _U:=[0,1]^{\mathbb {N}}\), and denote by \({\mathcal {P}}_U\) the univariate uniform distribution on (0, 1) with density \(f_U(u)=\mathbbm {1}_{\{ u\in (0,1)\}}(u)\). We equip \(\Omega _U\) with the countable product-\(\sigma \)-algebra and product measure, given by
respectively, consider the probability space \((\Omega _U, {\mathcal {A}}_U, {\mathbb {Q}}_U)\), and a sequence \(U:=(U_{j,k}, j\in {\mathbb {N}},\; k\in K_j)\) of i.i.d. uniform random variables defined on \((\Omega _U, {\mathcal {A}}_U, {\mathbb {Q}}_U)\).
Let \(j\in {\mathbb {N}}\), and consider the wavelet index \((j,k,l)\in {\mathcal {I}}_\Psi \). By (77), there is a unique node \(\mathfrak n\) of length \(|\mathfrak n|=j\) associated to (j, k) for \(k\in K_j\) via \((j,k) = (|\mathfrak n|, \mathfrak I^2_{d,|\mathfrak n|}\circ \mathfrak I^1_{d,|\mathfrak n|}(\mathfrak n))=(j, \mathfrak I^2_{d,j}\circ \mathfrak I^1_{d,j}(\mathfrak n))\). Recall that \(A_{\mathfrak n}\) is the set of ancestors of \(\mathfrak n\) and note that for each \(i\le j-1\), there exists a unique ancestor \(\mathfrak m_i\in A_\mathfrak n\) of \(\mathfrak n\) with length \(|\mathfrak m_i|=i\). We further introduce the notation \(\mathfrak m_j:=\mathfrak n\) for convenience. By independence of the elements in the sequence U we obtain that
Hence, we may parametrize T, i.e., the random index set \({\mathcal {I}}_T\), by the equivalence
for each \((j,k,l)\in {\mathcal {I}}_\Psi \) with \(j\ge 1\). Equation (80) hence shows that the parameter-to-prior map of a \(B_s^p\)-random variable with wavelet density \(\beta \) as in (18) is given by
In particular, the parametrization with respect to \(U\in (0,1)^{\mathbb {N}}\) is discontinuous, obstructing higher-order quadrature methods in the parameter domain, such as Quasi-Monte Carlo.
B Finite element approximation
This appendix collects details on the (standard) implementation of the pathwise FE discretization from Sect. 4. In particular, we analyze the quadrature error arising during matrix assembly, describe an assembly routine based on tensorization for bilinear finite elements on \({\mathbb {T}}^2\), and comment on the efficient evaluation and sampling cost of \(a_N\).
1.1 B.1 Finite element quadrature error
Let \(h>0\) be the FE meshwidth and let \(\{v_1,\dots ,v_{n_h}\}\) be a suitable basis of \(V_h\). As \(u_{h,N}=\sum _{i=1}^{n_h} \underline{u}_i v_i\) for a coefficient vector \(\underline{u}\), problem (47) is equivalent to the linear system of equations
For any \(i,j\in \{1,\dots ,n_h\}\), the entries of \({\textbf{A}}\) and \(\underline{F}\) are given by
and have in general to be evaluated by numerical quadrature. Thereby, we commit a variational crime in the assembling of \({\textbf{A}}\) and \(\underline{F}\). As \(a_n\) resp. a is of low regularity, we have to make sure to choose an appropriate quadrature method, that does not spoil the convergence rate of the finite element approximation. It turns out that the midpoint rule is sufficient for (d-)linear elements, as we show in the remainder of this subsection. We restrict ourselves to the quadrature error analysis for the stiffness matrix \({\textbf{A}}\) for brevity, the corresponding analysis for the load vector \(\underline{F}\) is carried out analogously. We denote for any simplex/parallelotope \(K\in {\mathcal {K}}_h\) its midpoint or barycenter by \(x_K^m\in K\). Furthermore, we define the piecewise constant approximation \(\overline{a}_N\) of \(a_n\) given by
As we consider piecewise (d-)linear finite elements, approximating \({\textbf{A}}_{ij}\) in (83) by midpoint quadrature on each K is equivalent to solving the following discrete problem: Find \(\overline{u}_{N,h}(\omega )\in V_h\) such that for all \(v\in V\) it holds
There exists a.s. a unique solution \(\overline{u}_{N,h}(\omega )\) and the quadrature error is bounded in the following.
Lemma B.1
Let the assumptions of Theorem 4.7 hold. For any \(f\in H\), sufficiently small \(\kappa >0\) in (16) and any \(r\in (0,s-\frac{d}{p})\cap (0,1]\), there are constants \(\overline{q}\in (1,\infty )\) and \(C>0\) such that for any \(N\in {\mathbb {N}}\) and \(h\in {\mathfrak {H}}\) there holds
Furthermore, if \(s-\frac{d}{p}>1\), the statement holds for \(r=1\).
Proof
There exists a.s. a unique solution \(\overline{u}_{N,h}(\omega )\) and the distance to \(u_{N,h}(\omega )\) is readily bounded with Proposition 3.3 by
Theorems 2.11 and Proposition 4.2 show that \(a_N\in L^{3q}(\Omega ; {\mathcal {C}}^t({\mathcal {D}}))\) for all \(t\in (0,s-\frac{d}{p})\) and \(q\ge \frac{1}{3}\). For \(p>1\), we may again choose any \(q\in [\frac{1}{3},\infty )\), for \(p=1\) we have \(q\in [\frac{1}{3},\overline{q})\), where \(\overline{q}>1\) for sufficiently small \(\kappa >0\). This yields for \(q\in [1,\overline{q})\) with Hölder’s inequality
To prove the error with respect to H, we recall the duality argument from Theorem 4.8: Let \(\overline{e}_{N,h}:=u_{N,h}-\overline{u}_{N,h}\) and consider for fixed \(\omega \in \Omega \) the dual problem to find \(\overline{\varphi }(\omega )\in V\) such that for all \(v\in V\) it holds
Analogously to the proof of Theorem 4.8, we derive the pathwise estimate
Provided sufficiently integrability if \(p=1\), we find that \(\Vert u_{N,h}-\overline{u}_{N,h}\Vert _{L^q(\Omega ; H)}\le Ch^{2r}\). \(\square \)
Note that r in Lemma B.1 is identical to r in Theorems 4.7 and 4.8. Hence the quadrature error does not dominate the finite element convergence rate. We further emphasize that Lemma B.1 holds for arbitrary piecewise (multi-)linear elements, regardless if we use simplices or parallelotopes to discretize \({\mathcal {D}}\).
1.2 B.2 Bilinear finite element discretization
We focus on the rectangular domain \({\mathcal {D}}={\mathbb {T}}^2=[0,1]^2\) in this subsection. It is convenient to use a spatial discretization based on bilinear finite elements. Let therefore \(h=1/n\) for a \(n\in {\mathbb {N}}\) and consider the nodes \(\overline{x}_i:=ih\), \(i\in \{0,\dots ,n\}\). Then \(\Xi _h:=\{\overline{x}_0,\dots , \overline{x}_n\}\subset [0,1]\) defines an equidistant mesh of \({\mathbb {T}}^1\). A rectangular tesselation of \({\mathcal {K}}_h\) of \({\mathbb {T}}^2\) is then given by the \((n+1)^2\) grid points \(\Xi _h^2:=\{(\overline{x}_{i_1}, \overline{x}_{i_2})|\, i_1,i_2\in \{0,\dots ,n\} \}\subset {\mathbb {T}}^2\). Let
be the one-dimensional hat function basis at the nodes in \(\Xi _h\). Then, the space of bilinear finite elements corresponding to \(\Xi _h^2\) resp. \({\mathcal {K}}_h\) is given by
The dyads in the tensor product basis coincide with the pointwise products
Note that \(\dim (V_h)=(n-1)^2\) due to the homogeneous Dirichlet boundary conditions. Now let \(i, j\in \{1,\dots ,(n-1)^2\}\) be indices such that \(v_i=\phi _{i_1}\otimes \phi _{i_2}\in V_h\) and \(v_j=\phi _{j_1}\otimes \phi _{j_2}\in V_h\). The entries of the associated stiffness matrix \({\textbf{A}}\in {\mathbb {R}}^{(n-1)\times (n-1)}\) are given by
We approximate the entries of \({\textbf{A}}\) by midpoint quadrature on each square in \({\mathcal {K}}_h\), which may be realized by replacing a in (85) by a suitable piecewise constant interpolation at the midpoints as in Appendix 1. Let the midpoint of each square \(K=[\overline{x}_{i_1}, \overline{x}_{i_1+1}]\times [\overline{x}_{i_2}, \overline{x}_{i_2+1}]\in {\mathcal {K}}_h\) be given by \(x_K^m=x_{i_1,i_2}^m:=(\overline{x}_{i_1}+\frac{h}{2}, \overline{x}_{i_2}+\frac{h}{2})\) for \(i_1,i_2\in \{0,\dots , n-1\}\) and define
With indices i, j as above this yields
We define the matrices \({\textbf{S}}\) and \({\textbf{M}}\) via
for \(i_1,j_1\in \{1,\dots ,n-1\}\), and observe that
This yields
and we only obtain a contribution to \({\textbf{A}}_{i,j}\) if \(|i_1-j_1|, |i_2-j_2|\le 1\). Hence, the representation in (86) may be used for an efficiently assembly of \({\textbf{A}}\).
1.3 B.3 Evaluation of \(a_N\)
To assemble \({\textbf{A}}\) via (86) it still remains to evaluate \(a_N(\omega )=\exp (b_N(\omega ))\) at the quadrature points in \(\Xi _h^2\), or, more generally, at the d-dimensional grid \(\Xi _h^d:=\otimes _{i=1}^d \Xi _h\subset [0,1]^d\). We recall that
The tensor-product representation of \(\psi _{j,k}^l\in L^2({\mathbb {T}}^2)\) given in (5) and (4) then shows
We define the vectors \(\underline{\psi _{j,k_i,l(i)}}:=2^{\frac{j}{2}}\psi _{j+w,k_i,l(i)}(\overline{x})|_{\overline{x}\in \Xi }\in {\mathbb {R}}^{n}\) and finally obtain
Therefore, it is sufficient to evaluate the scaled and shifted functions \(\psi _{j,k,l(i)}\) on the one-dimensional grid \(\Xi _h\), the values of \(b_N\), resp. \(a_N\), at the d-dimensional grid \(\Xi _h^d\) are then obtained by tensorization. Evaluating \(\psi _{j,k,l(i)}\) eventually requires to approximate the fractal functions \(\phi \) and \(\psi \) at a discrete set of points. This is feasible to arbitrary precision with the iterative Cascade algorithm, see, e.g., [12, Chapter 6.5]. Using \(J\in {\mathbb {N}}\) iterations in the Cascade algorithm yields approximate values of \(\phi , \psi \) at \(2^{J}\) dyadic grid points, which are then interpolated to obtain piece-wise linear or constant approximation of continuous \(\phi \) and \(\psi \) interpolation. The resulting error is of order \({\mathcal {O}}(2^{-J\alpha })\) if \(\phi , \psi \in \textrm{C}^\alpha ({\mathbb {R}})\) with \(\alpha \in (0,1]\). Consequently, we use \(J_\ell :=\lceil \frac{N_\ell t}{\alpha } \rceil \) on each level in the MLMC algorithm to match the midpoint quadrature error in Lemma B.1. The cost of sampling \(b_{T,N}\) on a uniform, dyadic grid is quantified in the following.
Lemma B.2
Let \(h_\ell =2^{-(\ell +1)}\) for \(\ell \in {\mathbb {N}}_0\), let \(\Xi _{h_\ell }^d:=\otimes _{i=1}^d \Xi _{h_\ell }\subset [0,1]^d\) for \(d\in {\mathbb {N}}\), and let \({\mathcal {C}}_{sample}\) denote the random cost (in terms of work and memory required) of sampling \(b_{T,N}\) with respect to the grid \(\Xi _{h_\ell }^d\). Then, there is a constant \(C>0\), independent of \(h_\ell \) and N, such that
Proof
Given that \(\psi \) and \(\phi \) are evaluated at the \(2^{\ell +1}\in {\mathbb {N}}\) grid points in \(\Xi _{h_\ell }\), we need to calculate \(2^d-1\) tensor products of scaled and translated vectors \(\underline{\psi _{j,k_i,l(i)}}\in {\mathbb {R}}^{2^{\ell +1}}\).
Recall from the MRA in Sect. 2.1.2 that tensorization yields \(2^{jd}\) one-periodic wavelet functions \(\psi _{j,k}^{l}\) on each scale \(j\in {\mathbb {N}}_0\). Moreover, the support of \(\psi _{j+w,k}^{l}\) has diameter bounded by \(2^{-d(j+1-w)}\) in \({\mathbb {T}}^d\) for fixed index \((j,k,l)\in {\mathcal {I}}_\mathbf{\Psi }\), where \(w\in {\mathbb {N}}\) is a scaling factor that only depends on the choice of \(\phi \) and \(\psi \).
Hence, the number of grid points lying in the support of \(\psi _{j,k}^{l}\) is given by
Now we also fix a realization \((T(\omega ), X(\omega ))\) of the random tree T and the coefficients X. If \((j,k,l)\in {\mathcal {I}}_T(\omega )\), we multiply the corresponding \(2^{d(l-j+w)}\) grid points with the coefficient \(X_{j,k}^l(\omega )\). Otherwise, if \((j,k,l)\notin {\mathcal {I}}_T(\omega )\), there is no contribution to \(b_{T,N}(\omega )\) from this index.
Summing over all non-zero contributions and grid points thus requires computational cost of
Since \({\mathcal {P}}=\text {Bin}(2^d, \beta )\), it readily follows that \(P((j,k,l)\in {\mathcal {I}}_{T(\omega )})=\beta ^j\), which in turn shows
\(\square \)
Remark B.3
Lemma B.2 shows that for given \(\beta \in (0,1)\), the expected cost of sampling \(b_{N,T}\) is bounded by \(Ch_\ell ^{-d}\) uniformly with respect to \(N\in {\mathbb {N}}\). Thus, the condition that a sample of \(u_{N_\ell ,h_\ell }\) may be realized with (expected) work \({\mathcal {O}}(h_\ell ^{-d})\) from Assumption 5.5 is indeed justified. On the other hand, we note that \(C=C(d,\beta )\rightarrow \infty \) as \(\beta \rightarrow 1\), resulting in a possibly large hidden constant within the asymptotic costs in Theorem 5.8. However, if we choose the error balancing \(N\propto |\log (h_\ell )|\) according to (63), we still recover the only slightly worse complexity bound of \({\mathcal {O}}(h_\ell ^{-d}|\log (h_\ell )|)\) per sample in the limit \(\beta =1\).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Schwab, C., Stein, A. Multilevel Monte Carlo FEM for elliptic PDEs with Besov random tree priors. Stoch PDE: Anal Comp 12, 1574–1627 (2024). https://doi.org/10.1007/s40072-023-00313-w
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s40072-023-00313-w