1 Introduction

1.1 Background

The level set method has been a pervasive tool for the study of interface problems since its introduction in the 1980s Osher and Sethian (1988). In a seminal paper in the 1990s, Santosa demonstrated the power of the approach for the study of inverse problems with unknown interfaces Santosa (1996). The key benefit of adopting the level set parametrization of interfaces is that topological changes are permitted; in particular, for inverse problems, the number of connected components of the field does not need to be known a priori. The idea is illustrated in Fig. 1. The type of unknown function that we might wish to reconstruct is piecewise continuous, illustrated in the bottom row by piecewise constant ternary functions. In the inversion, however, we work with a smooth function, shown in the top row and known as the level set function, which is thresholded to create the desired unknown function in the bottom row. This allows the inversion to be performed on smooth functions, and allows topological changes to be detected during the course of algorithms. After Santosa’s paper there were many subsequent papers employing the level set representation for classical inversion; examples include (Burger 2001; Tai and Chan 2004; Chung et al. 2005; Dorn and Lesselier 2006), and the references therein.

Fig. 1 Four continuous scalar fields (top) and the corresponding ternary fields formed by thresholding these fields at two levels (bottom). The smooth function in the top row is known as the level set function and is used in the inversion procedure; the discontinuous function in the bottom row is the physical unknown

In many inverse problems arising in modern science and engineering, the data are noisy, and prior regularizing information is naturally expressed probabilistically since it contains uncertainties. In this context, Bayesian inversion is a very attractive conceptual approach Kaipio and Somersalo (2005). Early adoption of the Bayesian approach within level set inversion, especially in the context of history matching for reservoir simulation, includes the papers Xie et al. (2011), Ping and Zhang (2014), and Lorentzen et al. (2012a, b). In a recent paper, Iglesias et al. (2016) developed the mathematical foundations of Bayesian level set inversion and established a well-posedness theorem, using the infinite-dimensional Bayesian framework developed in Stuart (2010), Lasanen (2012a, b), and Dashti and Stuart (2016). An ensemble Kalman filter method has also been applied in the Bayesian level set setting Iglesias (2016) to produce estimates of piecewise constant permeabilities/conductivities in groundwater flow/electrical impedance tomography (EIT) models.

For linear Bayesian inverse problems, the adoption of Gaussian priors leads to Gaussian posteriors, formulae for which can be explicitly computed (Franklin 1970; Lehtinen et al. 1999; Mandelbaum 1984). However the level set map, which takes the smooth underlying level set function (top row, Fig. 1) into the physical unknown function (bottom row, Fig. 1), is non-linear; indeed it is discontinuous. As a consequence, Bayesian level set inversion, even for inverse problems which are classically speaking ‘linear,’ does not typically admit closed-form solutions for the posterior distribution on the level set function. Thus, in order to produce samples from the posterior arising in the Bayesian approach, MCMC methods are often used. Since the posterior is typically defined on an infinite-dimensional space in the context of inverse problems, it is important that the MCMC algorithms used are well defined on such spaces. A formulation of the Metropolis–Hastings algorithm on general state spaces is given in Tierney (1998). A particular case of this algorithm, well suited to posterior distributions on function spaces with Gaussian priors, is the preconditioned Crank–Nicolson (pCN) method introduced (although not named this way) in Beskos et al. (2008). As the method is defined directly on a function space, it has desirable properties related to discretization—in particular the method is robust with respect to mesh refinement (discretization invariance)—see Cotter et al. (2013) and the references therein. On the other hand, the need for hierarchical models in Bayesian statistics, and in particular in the context of non-parametric (i.e., function space) methods in machine learning, is well established Bishop (2006). However, care is needed when using hierarchical methods in order to ensure that discretization invariance is not lost Agapiou et al. (2014). In this paper we demonstrate how hierarchical methods can be employed in the context of discretization-invariant MCMC methods for Bayesian level set inversion.

1.2 Key contributions of the paper

The key contribution of this paper is in computational statistics: we develop a Metropolis–Hastings method with mesh-independent mixing properties that yields an order of magnitude improvement over the Bayesian level set method as introduced in Iglesias et al. (2016).

Study of Fig. 1 suggests that the ability of the level set representation to accurately reconstruct piecewise continuous fields depends on two important scale parameters:

  • the length scale of the level set function, and its relation to the typical separation between discontinuities;

  • the amplitude scale of the level set function, and its relation to the levels used for thresholding.

If these two scale parameters are not set correctly, then MCMC methods to determine the level set function from data can perform poorly. This immediately suggests the idea of using hierarchical Bayesian methods in which these parameters are learned from the data. However, there is a second consideration which interacts with this discussion. From the work of Tierney (1998) it is known that absolute continuity of certain measures arising in the definition of Metropolis–Hastings methods is central to their well-definedness, and hence to discretization-invariant MCMC methods Cotter et al. (2013). Indeed, algorithms that are well defined on infinite-dimensional spaces can have spectral gaps that are bounded below independently of the mesh, so that their convergence rates do not degenerate under mesh refinement Hairer et al. (2014). The key contribution of our paper is to show how enforcing absolute continuity links the two scale parameters, and hence leads to the construction of a hierarchical Bayesian level set method with a single scalar hierarchical parameter which deals with the scale and absolute continuity issues simultaneously, resulting in effective sampling algorithms.

The hierarchical parameter is an inverse length scale within a Gaussian random field prior for the level set function. In order to preserve absolute continuity of different priors on the level set function as the length-scale parameter varies, and relatedly to make well-defined MCMC methods, the mean square amplitude of this Gaussian random field must decay proportionally to a power of the inverse length scale. It is thus natural that the level values used for thresholding should obey this power law relationship with respect to the hierarchical parameter. As a consequence the likelihood depends on the hierarchical parameter, leading to a novel form of posterior distribution.

We construct this posterior distribution and demonstrate how to sample from it using a Metropolis-within-Gibbs algorithm which alternates between updating the level set function and the inverse length scale. As a second contribution of the paper, we demonstrate the applicability of the algorithm on three inverse problems, by means of simulation studies. The first concerns reconstruction of a ternary piecewise constant field from a finite noisy set of point measurements: in this context, the Bayesian level set method is very closely related to a spatial probit model Rasmussen and Williams (2006). This relation is discussed in Sect. 2.4. The other two concern reconstruction of the coefficient of a divergence form elliptic PDE from measurements of its solution; in particular, groundwater flow (in which measurements are made in the interior of the domain) and EIT (in which measurements are made on the boundary).

1.3 Structure of the paper

In Sect. 2 we describe a family of prior distributions on the level set function, indexed by an inverse length-scale parameter, which remain absolutely continuous with respect to one another when we vary this parameter; we then place a hyperprior on this parameter. We describe an appropriate level set map, dependent on the length-scale parameter because length and amplitude scales are intimately connected through absolute continuity of measures, to transform these fields into piecewise constant ones, and use this level set map in the construction of the likelihood. We end by showing existence and well-posedness of the posterior distribution on the level set function and the inverse length-scale parameter. In Sect. 3 we describe a Metropolis-within-Gibbs MCMC algorithm for sampling the posterior distribution, taking advantage of existing state-of-the-art function space MCMC, and the absolute continuity of our prior distributions with respect to changes in the inverse length-scale parameter, established in the previous section. Section 4 contains numerical experiments for three different forward models: a linear map comprising pointwise observations, groundwater flow, and EIT; these illustrate the behavior of the algorithm and, in particular, demonstrate significant improvement with respect to non-hierarchical Bayesian level set inversion.

2 Construction of the posterior

In Sect. 2.1 we recall the definition of the Whittle-Matérn covariance functions, and define a related family of covariances parametrized by an inverse length-scale parameter \(\tau \). We use these covariances to define our prior on the level set function u, and also place a hyperprior on the parameter \(\tau \), yielding a prior \(\mathbb {P}(u,\tau )\) on a product space. In Sect. 2.2 we construct the level set map, taking into account the amplitude scaling of prior samples with \(\tau \), and incorporate this into the forward map. The inverse problem is formulated, and the resulting likelihood \(\mathbb {P}(y|u,\tau )\) is defined. Finally in Sect. 2.3 we construct the posterior \(\mathbb {P}(u,\tau |y)\) by combining the prior \(\mathbb {P}(u,\tau )\) and likelihood \(\mathbb {P}(y|u,\tau )\) using Bayes’ formula. Well-posedness of this posterior is established.

2.1 Prior

As discussed in the introduction, it can be important, within the context of Bayesian level set inversion, to attempt to learn the length scale of the level set function whose level sets determine interfaces in piecewise continuous reconstructions. This is because the typical separation between interfaces is generally not known a priori. It is also computationally expedient to work with Gaussian random field priors for the level set function, as demonstrated in Iglesias et al. (2016), Dunlop and Stuart (2015). A family of covariances parameterized by length scale is hence required.

A widely used family of distributions, allowing for control over sample regularity, amplitude, and length scale, is the Whittle-Matérn family. These are stationary Gaussian distributions with covariance function

$$\begin{aligned} c_{\sigma ,\nu ,\ell }(x,y) = \sigma ^2\frac{2^{1-\nu }}{{\varGamma }(\nu )}\left( \frac{|x-y|}{\ell }\right) ^\nu K_\nu \left( \frac{|x-y|}{\ell }\right) , \end{aligned}$$

where \(K_\nu \) is the modified Bessel function of the second kind of order \(\nu \) Matérn (2013), Stein (2012). These covariances interpolate between exponential covariance, for \(\nu = 1/2\), and Gaussian covariance, for \(\nu \rightarrow \infty \). As a consequence, the regularity of samples increases as the parameter \(\nu \) increases. The parameter \(\ell > 0\) acts as a characteristic length scale (sometimes referred to as the spatial range) and \(\sigma \) as an amplitude scale (\(\sigma ^2\) is sometimes referred to as the marginal variance). On \(\mathbb {R}^d\), samples from a Gaussian distribution with covariance function \(c_{\sigma ,\nu ,\ell }\) correspond to the solution of a particular stochastic partial differential equation (SPDE). This SPDE can be derived using the Fourier transform and the spectral representation of covariance functions—the paper Lasanen et al. (2014) derives the appropriate SPDE for the covariance function above:

$$\begin{aligned} \frac{1}{\sqrt{\beta \ell ^d}}(I-\ell ^2\Delta )^{(\nu +d/2)/2}v = W, \end{aligned}$$
(1)

where W is a white noise on \(\mathbb {R}^d\), and

$$\begin{aligned} \beta = \sigma ^2 \frac{2^d\pi ^{d/2}{\varGamma }(\nu +d/2)}{{\varGamma }(\nu )}. \end{aligned}$$

Computationally, implementation of this SPDE approach requires restriction to a bounded subset \(D\subseteq \mathbb {R}^d\), and hence the provision of boundary conditions for the SPDE in order to obtain a unique solution. Choice of these boundary conditions may significantly affect the autocorrelations near the boundary. The effects for different boundary conditions are discussed in Lasanen et al. (2014). Nonetheless, the computational expediency of the SPDE formulation makes the approach very attractive for applications and, if necessary, boundary effects can be ameliorated by generating the random fields on larger domains which are a superset of the domain of interest.
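To make this family concrete, the following minimal sketch (in Python, using numpy and scipy; the function name is ours, and this is an illustration rather than the implementation used for the experiments below) evaluates \(c_{\sigma ,\nu ,\ell }\) and verifies the exponential limit at \(\nu = 1/2\):

```python
import numpy as np
from scipy.special import gamma, kv

def matern_covariance(r, sigma=1.0, nu=0.5, ell=1.0):
    """Whittle-Matern covariance c_{sigma,nu,ell} at distances r = |x - y|."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    c = np.full_like(r, sigma**2)      # c(0) = sigma^2, taking the limit r -> 0
    s = r[r > 0] / ell
    c[r > 0] = sigma**2 * 2.0**(1 - nu) / gamma(nu) * s**nu * kv(nu, s)
    return c

# Sanity check: for nu = 1/2 the formula reduces to the exponential
# covariance sigma^2 * exp(-|x - y| / ell).
r = np.linspace(0.0, 3.0, 7)
assert np.allclose(matern_covariance(r, nu=0.5), np.exp(-r))
```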

From (1) it can be seen that the covariance operator corresponding to the covariance function \(c_{\sigma ,\nu ,\ell }\) is given by

$$\begin{aligned} \mathcal {D}_{\sigma ,\nu ,\ell } = \beta \ell ^d(I - \ell ^2\Delta )^{-\nu -d/2}. \end{aligned}$$
(2)

The fact that the scalar multiplier in front of the covariance operator \(\mathcal {D}_{\sigma ,\nu ,\ell }\) changes with the length scale means that the family of measures \(\{N(0,\mathcal {D}_{\sigma ,\nu ,\ell })\}_{\ell }\), for fixed \(\sigma \) and \(\nu \), are mutually singular. This leads to problems when trying to design hierarchical methods based around these priors. We hence work instead with the modified covariances

$$\begin{aligned} \mathcal {C}_{\alpha ,\tau } = (\tau ^2 I - \Delta )^{-\alpha }, \end{aligned}$$

where \(\tau = 1/\ell > 0\) now represents an inverse length scale, and \(\alpha = \nu +d/2\) still controls the sample regularity. To be concrete we will always assume that the domain of the Laplacian is chosen so that \(\mathcal {C}_{\alpha ,\tau }\) is well defined for all \(\tau \ge 0\); for example, we may choose a periodic box or Neumann boundary conditions on a box, in both cases with the domain restricted to functions which integrate to zero over the box, or Dirichlet boundary conditions. We have the following theorem concerning the family of Gaussians \(\{N(0,\mathcal {C}_{\alpha ,\tau })\}_{\tau \ge 0}\), proved in the Appendix.

Theorem 1

Let \(D = \mathbb {T}^d\) be the d-dimensional torus, and fix \(\alpha > 0\). Define the family of Gaussian measures \(\mu ^{\tau }_0 = N(0,\mathcal {C}_{\alpha ,\tau })\), \(\tau \ge 0\). Then

  (i) for \(d \le 3\), the \(\{\mu _0^\tau \}_{\tau \ge 0}\) are mutually equivalent;

  (ii) if \(u \sim \mu _0^\tau \), then \(\mu _0^\tau \)-a.s. we have \(u \in H^s(D)\) and \(u \in C^{\lfloor s \rfloor ,s-\lfloor s \rfloor }(D)\) for all \(s < \alpha - d/2\);Footnote 1

  (iii) if \(u \sim \mu _0^\tau \) and \(v \sim N(0,\mathcal {D}_{\sigma ,\nu ,\ell })\), then

  $$\begin{aligned} \mathbb {E}\Vert u\Vert ^2 \propto \tau ^{d-2\alpha }\cdot \mathbb {E}\Vert v\Vert ^2 \end{aligned}$$

  with constant of proportionality independent of \(\tau \).

Remark 1

  (a) The proof of this theorem is driven by the smoothness of the eigenfunctions of the Laplacian subject to periodic boundary conditions, together with the growth of its eigenvalues, which is like \(j^{2/d}\). These properties extend to Laplacians on more general domains and with more general boundary conditions, and to Laplacians with lower order perturbations, and so the above result still holds in these cases. For discussion of this in relation to (ii) see Dashti and Stuart (2016); for parts (i) and (iii) the reader can readily extend the proof given in the Appendix.

  (b) The proportionality in part (iii) above could be simplified if it were the case that \(\mathbb {E}\Vert v\Vert ^2\) were independent of \(\tau \). However, since we restrict to a bounded domain \(D\subset \mathbb {R}^d\), boundary effects mean that this is not necessarily true. Neumann boundary conditions, for example, inflate the variance up to a distance of approximately \(\ell \sqrt{8\nu } = \sqrt{8\nu }/\tau \) from the boundary Lindgren and Rue (2015). Nonetheless, at points \(x \in D\) sufficiently far away from the boundary we have \(\mathbb {E}|v(x)|^2\approx \sigma ^2\) independently of x. At these points we would hence expect that, for \(u \sim \mu _0^\tau \),

  $$\begin{aligned} \mathbb {E}|u(x)|^2 \propto \tau ^{d-2\alpha }. \end{aligned}$$

  Note also that, numerically, we may produce samples on a larger domain \(D^*\) that contains the domain of interest D, in order to minimize the boundary effects within D.

Let \(X = C(D)\) denote the space of continuous real-valued functions on domain D. In what follows we will always assume that \(\alpha - d/2 > 0\) in order that the measures have samples in X almost surely. Additionally we shall write \(\mathcal {C}_\tau \) in place of \(\mathcal {C}_{\alpha ,\tau }\) when the parameter \(\alpha \) is not of interest.

In Sect. 2.2, we pass the inverse length-scale parameter \(\tau \) to the forward map and treat it as an additional unknown in the inverse problem. We therefore require a joint prior \(\mathbb {P}(u,\tau )\) on both the level set field and on \(\tau \). We will treat \(\tau \) as a hyper-parameter, so that \(\mathbb {P}(u,\tau )\) takes the form \(\mathbb {P}(u,\tau ) = \mathbb {P}(u|\tau )\mathbb {P}(\tau )\). Specifically, we will take the conditional distribution \(\mathbb {P}(u|\tau )\) to be given by \(\mu _0^\tau = N(0,\mathcal {C}_{\tau })\), and the hyperprior \(\mathbb {P}(\tau )\) to be any probability measure \(\pi _0\) on \(\mathbb {R}^+\), the set of positive reals; in practice it will always have a Lebesgue density on \(\mathbb {R}^+\). The joint prior \(\mu _0\) on \(X\times \mathbb {R}^+\) is therefore assumed to be given by

$$\begin{aligned} \mu _0(\mathrm {d}u,\mathrm {d}\tau ) = \mu _0^\tau (\mathrm {d}u)\pi _0(\mathrm {d}\tau ). \end{aligned}$$
(3)

Non-zero means could also be considered via a change of coordinates. Discussion of prior choice for the hierarchical parameters in latent Gaussian models may be found in Fuglstad et al. (2015).
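For intuition, the following sketch illustrates sampling from the joint prior (3) on the two-dimensional torus. It uses the fact that, with periodic boundary conditions, \(\mathcal {C}_{\alpha ,\tau }\) is diagonalized by the Fourier basis with eigenvalues \((\tau ^2 + |k|^2)^{-\alpha }\); the hyperprior shown (a crude truncation of \(N(20,10^2)\) to \(\mathbb {R}^+\)) anticipates the choice made in Sect. 4, and normalization constants tied to the discrete transform convention are glossed over.

```python
import numpy as np

def sample_level_set_prior(tau, alpha=2.0, N=128, rng=None):
    """Draw u ~ N(0, (tau^2 I - Laplacian)^{-alpha}) on the 2D torus, by
    scaling Fourier coefficients of white noise by sqrt of the eigenvalues."""
    rng = rng or np.random.default_rng()
    k = np.fft.fftfreq(N, d=1.0 / N)                # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    lam = (tau**2 + kx**2 + ky**2) ** (-alpha)      # eigenvalues of C_{alpha,tau}
    xi = rng.standard_normal((N, N))                # white noise on the grid
    u = np.real(np.fft.ifft2(np.sqrt(lam) * np.fft.fft2(xi)))
    return u - u.mean()                             # mean-zero convention

# Joint prior sample (u, tau): first tau ~ pi_0, then u | tau ~ N(0, C_tau).
rng = np.random.default_rng(0)
tau = abs(rng.normal(20.0, 10.0))   # crude truncation of N(20, 10^2) to R+
u = sample_level_set_prior(tau, alpha=5.0, rng=rng)
```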

2.2 Likelihood

In the previous subsection we defined a prior distribution \(\mu _0\) on \(X\times \mathbb {R}^+\). We now define a way of constructing a piecewise constant field from a sample \((u,\tau )\). In Iglesias et al. (2016), where the Bayesian level set method was introduced, the piecewise constant field was constructed purely as a function of u as follows. Let \(n \in \mathbb {N}\) and fix constants \(-\infty = c_0< c_1< \ldots < c_n = \infty \). Given \(u \in X\), define \(D_i(u)\subseteq D\) by

$$\begin{aligned} D_i(u) = \{x \in D\,|\, c_{i-1} \le u(x) < c_i\},\quad i=1,\ldots ,n \end{aligned}$$

so thatFootnote 2 \(\overline{D} = \bigcup _{i=1}^n \overline{D}_i(u)\) and \(D_i(u)\cap D_j(u) = \varnothing \) for \(i\ne j\), \(i,j \ge 1\). Then given \(\kappa _1,\ldots ,\kappa _n \in \mathbb {R}\), define the map \(F:X\rightarrow Z\) by

$$\begin{aligned} F(u) = \sum _{i=1}^n \kappa _i\mathbbm {1}_{D_i(u)}. \end{aligned}$$
(4)

Thus F maps the level set field to the geometric field, which is the field of interest, even though inference is performed on the level set field. We may take \(Z = L^p(D)\), the space of p-integrable functions on D, for any \(1\le p \le \infty \). F(u) then defines a piecewise constant function on D; the interfaces defined by the jumps are given by the level sets \(\{x\in D\,|\, u(x) = c_i\}\).
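As a sketch (assuming fields discretized on a grid; the helper name is ours), the map (4) is a one-line thresholding operation:

```python
import numpy as np

def level_set_map(u, levels, kappa):
    """The thresholding map F of eq. (4): returns kappa_i on the region
    D_i = {x : c_{i-1} <= u(x) < c_i}, with c_0 = -inf and c_n = +inf implicit."""
    idx = np.digitize(u, levels)   # idx = i  <=>  levels[i-1] <= u < levels[i]
    return np.asarray(kappa)[idx]

# Ternary example with the values used later in Sect. 4.2: kappa = (1, 3, 5).
u = np.array([[-0.5, 0.0], [0.05, 2.0]])
print(level_set_map(u, levels=[-0.1, 0.1], kappa=[1.0, 3.0, 5.0]))
# [[1. 3.]
#  [3. 5.]]
```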

Remark 2

One of the constraints of this construction, discussed in Iglesias et al. (2016), is that in order for F(u) to pass from \(\kappa _i\) to \(\kappa _j\), it must pass through all of \(\kappa _{i+1},\ldots ,\kappa _{j-1}\) first. Thus this construction cannot represent, for example, a triple junction. It also means that it must be known a priori that, for example, level i is typically found near levels \(i-1\) and \(i+1\), but unlikely to be found near levels \(i+3\) or \(i+4\). This is potentially a significant constraint; we discuss how it may be dealt with in the conclusions.

This construction is effective for a fixed value of \(\tau \), but in light of Theorem 1(iii), the amplitude of samples from \(N(0,\mathcal {C}_{\alpha ,\tau })\) varies with \(\tau \). More specifically, since \(d-2\alpha < 0\) by assumption, samples will decay towards zero as \(\tau \) increases. For this reason, employing fixed levels \(\{c_i\}_{i=0}^n\) and then changing the value of \(\tau \) during a sampling method may render the levels out of reach. We can compensate for this by allowing the levels to change with \(\tau \), so that they decay towards zero at the same rate as the samples.

From Theorem 1(iii) and Remark 1(b) we deduce that samples u from \(N(0,\mathcal {C}_{\alpha ,\tau })\) decay towards zero at a rate of approximately \(\tau ^{d/2-\alpha }\) with respect to \(\tau \). This suggests allowing for the following dependence of the levels on the length-scale parameter \(\tau \):

$$\begin{aligned} c_i(\tau ) = \tau ^{d/2-\alpha }c_i,\quad i=1,\ldots ,n. \end{aligned}$$
(5)

In order to update these levels, we must pass the parameter \(\tau \) to the level set map F. We therefore redefine the level set map \(F:X\times \mathbb {R}^+\rightarrow Z\) as follows. Let \(n \in \mathbb {N}\), fix initial levels \(-\infty = c_0< c_1< \cdots < c_n = \infty \) and define \(c_i(\tau )\) by (5) for \(\tau > 0\). Given \(u \in X\) and \(\tau > 0\), define \(D_i(u,\tau )\subseteq D\) by

$$\begin{aligned} D_i(u,\tau )&= \{x \in D\;|\;c_{i-1}(\tau ) \le u(x) < c_i(\tau )\},\nonumber \\&\qquad i=1,\ldots , n, \end{aligned}$$
(6)

so that \(\overline{D} = \bigcup _{i=1}^n \overline{D}_i(u,\tau )\) and \(D_i(u,\tau )\cap D_j(u,\tau ) = \varnothing \) for \(i\ne j\), \(i,j \ge 1\). Now given \(\kappa _1,\ldots ,\kappa _n \in \mathbb {R}\), we define the map \(F:X\times \mathbb {R}^+\rightarrow Z\) by

$$\begin{aligned} F(u,\tau ) = \sum _{i=1}^n \kappa _i\mathbbm {1}_{D_i(u,\tau )}. \end{aligned}$$
(7)
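In code, the \(\tau \)-dependent map (7) simply rescales the levels of the earlier sketch according to (5):

```python
def level_set_map_tau(u, tau, levels, kappa, alpha, d=2):
    """The tau-dependent map F(u, tau) of eq. (7): threshold u at the
    rescaled levels c_i(tau) = tau^(d/2 - alpha) * c_i of eq. (5)."""
    c_tau = [tau ** (d / 2 - alpha) * c for c in levels]
    return level_set_map(u, c_tau, kappa)
```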

We can now define the likelihood. Let \(Y = \mathbb {R}^J\) be the data space, and let \(S:Z\rightarrow Y\) be a forward operator. Define \(\mathcal {G}:X\times \mathbb {R}^+\rightarrow Y\) by \(\mathcal {G} = S\circ F\). Assume we have data \(y \in Y\) arising from observations of some \((u,\tau ) \in X\times \mathbb {R}^+\) under \(\mathcal {G}\), corrupted by Gaussian noise \(\eta \sim \mathbb {Q}_0 := N(0,{\varGamma })\) on Y:

$$\begin{aligned} y = \mathcal {G}(u,\tau ) + \eta . \end{aligned}$$
(8)

We now construct the likelihood \(\mathbb {P}(y|u,\tau )\). In the Bayesian formulation, we place a prior \(\mu _0\) of the form (3) on the pair \((u,\tau )\). Assuming \(\mathbb {Q}_0\) is independent of \(\mu _0\), the conditional distribution \(\mathbb {Q}_{u,\tau }\) of y given \((u,\tau )\) is given by

$$\begin{aligned} \frac{\mathrm {d}\mathbb {Q}_{u,\tau }}{\mathrm {d}\mathbb {Q}_0}(y) =\exp \bigg (-{\varPhi }(u,\tau ;y) + \frac{1}{2}|y|_{\varGamma }^2\bigg ), \end{aligned}$$
(9)

where the potential (or negative log-likelihood) \({\varPhi }:X\times \mathbb {R}^+\rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} {\varPhi }(u,\tau ;y) = \frac{1}{2}|y - \mathcal {G}(u,\tau )|_{\varGamma }^2 \end{aligned}$$
(10)

and \(|\cdot |_{\varGamma }:= |{\varGamma }^{-1/2}\cdot |\).
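Assuming the sketches above, the potential (10) may be evaluated as follows; here the forward operator S and the matrix \({\varGamma }^{-1/2}\) are supplied by the user:

```python
import numpy as np

def potential(u, tau, y, S, Gamma_inv_sqrt, levels, kappa, alpha, d=2):
    """The potential Phi(u, tau; y) of eq. (10): half the squared
    Gamma-weighted data misfit, with S acting on the geometric field."""
    G = S(level_set_map_tau(u, tau, levels, kappa, alpha, d))
    misfit = Gamma_inv_sqrt @ (y - G)
    return 0.5 * float(misfit @ misfit)
```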

Denote by \(\mathrm {Im}(F) \subseteq Z\) the image of \(F:X\times \mathbb {R}^+\rightarrow Z\). In what follows we make the following assumptions on \(S:Z\rightarrow Y\).

Assumption 1

  (i) S is continuous on \(\mathrm {Im}(F)\).

  (ii) For any \(r > 0\) there exists \(C(r) > 0\) such that for any \(z \in \mathrm {Im}(F)\) with \(\Vert z\Vert _{L^\infty } \le r\), \(|S(z)| \le C(r)\).

In the next subsection we show that, under the above assumptions, the posterior distribution \(\mu ^y\) of \((u,\tau )\) given y exists, and study its properties.

2.3 Posterior

Bayes’ theorem provides a way to construct the posterior distribution \(\mathbb {P}(u,\tau |y)\) using the ingredients of the prior \(\mathbb {P}(u,\tau )\) and the likelihood \(\mathbb {P}(y|u,\tau )\) from the previous two subsections. Informally we have

$$\begin{aligned} \mathbb {P}(u,\tau |y)&\propto \mathbb {P}(y|u,\tau )\mathbb {P}(u,\tau )\\&\propto \exp \left( -{\varPhi }(u,\tau ;y)\right) \mu _0^\tau (u)\pi _0(\tau ), \end{aligned}$$

after absorbing \(y\)-dependent constants from the likelihood into the normalization constant. In order to make this formula rigorous some care must be taken, since \(\mu _0^\tau \) does not admit a Lebesgue density. The following is proved in the Appendix.

Theorem 2

Let \(\mu _0\) be given by (3), y by (8), and \({\varPhi }\) by (10). Let Assumption 1 hold. If \(\mu ^y(\mathrm {d}u,\mathrm {d}\tau )\) is the regular conditional probability measure on \((u,\tau )|y\), then \(\mu ^y \ll \mu _0\) with Radon–Nikodym derivative

$$\begin{aligned} \frac{\mathrm {d}\mu ^y}{\mathrm {d}\mu _0}(u,\tau ) = \frac{1}{Z}\exp \big (-{\varPhi }(u,\tau ;y)\big ), \end{aligned}$$

where, for almost every y,

$$\begin{aligned} Z:= \int _{X\times \mathbb {R}^+}\exp \big (-{\varPhi }(u,\tau ;y)\big )\,\mu _0(\mathrm {d}u,\mathrm {d}\tau ) > 0. \end{aligned}$$

Furthermore \(\mu ^y\) is locally Lipschitz with respect to y, in the Hellinger distance: for all \(y,y^{\prime }\) with \(\max \{|y|_{\varGamma },|y^{\prime }|_{\varGamma }\} < r\), there exists a \(C = C(r) > 0\) such that

$$\begin{aligned} d_{\mathrm {Hell}}(\mu ^y,\mu ^{y^{\prime }}) \le C|y-y^{\prime }|_{\varGamma }. \end{aligned}$$

This implies that, for all \(f \in L^2_{\mu _0}(X\times \mathbb {R}^+;E)\) for separable Banach space E,

$$\begin{aligned} \Vert \mathbb {E}^{\mu ^y}f(u,\tau ) - \mathbb {E}^{\mu ^{y^{\prime }}}f(u,\tau )\Vert _E \le C|y-y^{\prime }|. \end{aligned}$$

To the best of our knowledge this form of Bayesian posterior distribution, in which the prior hyper-parameter appears in the likelihood because it is natural, for algorithmic reasons, to scale the thresholding function with that parameter, is novel. A different form of thresholding is studied in Bolin and Lindgren (2015), where the objects of interest are boundaries defining regions in which certain events occur with a specified (typically close to 1) probability.

2.4 Relation to probit models

The Bayesian level set method has a close relation to an ordered probit model in the case that the state space X is finite dimensional. Suppose that \(X = \mathbb {R}^N\). Then, neglecting the length-scale parameter, the data \(y_{\mathrm {level}}\) in the level set method are assumed to arise via

$$\begin{aligned} y_{\mathrm {level}} = \mathcal {G}(F(u)) + \eta ,\quad \eta \sim N(0,{\varGamma }), \end{aligned}$$

where F denotes the original thresholding function as defined by (4). In an ordered probit model, the data \(y_{\mathrm {prob}}\) are assumed to arise viaFootnote 3

$$\begin{aligned} y_{\mathrm {prob}}&= \mathcal {G}(z),\\ z_n&= F(u_n + \varepsilon _n),\;\;\;\varepsilon _n \sim N(0,1),\quad n=1,\ldots ,N. \end{aligned}$$

Note that in the case of probit, the noise is applied before the thresholding F so that the geometric field takes values in the discrete set \(\{\kappa _1,\ldots ,\kappa _n\}\). In contrast in the case of the level set model the noise is applied after thresholding. If \(\mathcal {G}\) is linear then the probit model results in categorical data, while in the level set case the data can take any real value. Depending on the forward model either probit or level set may be more appropriate: the former in cases where the data are genuinely discrete and interpolation between phases does not have a meaning, such as categorical data, and the latter when they are continuous, such as when corrupted by measurement noise. The two models could also be combined, which may be interesting in some applications. In the small noise limit the models are seen to be equivalent.
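The distinction between the two generative models can be made concrete with a small sketch (identity forward map, reusing level_set_map from Sect. 2.2; the numerical values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(10)                 # latent field at N = 10 sites
levels, kappa = [-0.1, 0.1], [1.0, 3.0, 5.0]
G = lambda z: z                             # identity forward map, for illustration

# Level set model: threshold first, then corrupt with observational noise.
y_level = G(level_set_map(u, levels, kappa)) + 0.2 * rng.standard_normal(10)

# Ordered probit model: perturb the latent field first, then threshold.
y_prob = G(level_set_map(u + rng.standard_normal(10), levels, kappa))

print(y_level)   # real-valued data
print(y_prob)    # data confined to the discrete set {1, 3, 5}
```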

Placing a prior upon u leads to a well-defined posterior distribution in both cases. Dimension-robust sampling of both distributions can be performed using a prior-reversible MCMC method, such as the preconditioned Crank–Nicolson (pCN) method Cotter et al. (2013). The spatial version of probit, that is when X is a function space rather than \(\mathbb {R}^N\), is of interest to study further.

Once we introduce the hierarchical length-scale dependence, significant problems arise in sampling the probit posterior in high dimensions, due to the issues associated with measure singularity discussed above. With the level set method these problems can be circumvented through the choice of prior and the rescaling discussed in this section; a well-defined Metropolis-within-Gibbs sampling algorithm on function space is outlined in the next section.

3 MCMC algorithm for posterior sampling

Having constructed the posterior distribution on \((u,\tau )|y\), we are now faced with the task of sampling this probability distribution. We will use the Metropolis-within-Gibbs formalism, as described in for example Robert and Casella (2013), Sect. 10.3. This algorithm constructs the Markov chain \((u^{(k)},\tau ^{(k)})\) with the structure

  • \(u^{(k+1)} \sim \mathbb K^{\tau ^{(k)},y}(u^{(k)},\cdot )\),

  • \(\tau ^{(k+1)} \sim \mathbb L^{u^{(k+1)},y}(\tau ^{(k)},\cdot )\),

where \(\mathbb K^{\tau ,y}\) is a Metropolis–Hastings Markov kernel reversible with respect to \(u|(\tau ,y)\) and \(\mathbb L^{u,y}\) is a Metropolis–Hastings Markov kernel reversible with respect to \(\tau |(u,y).\) The Metropolis–Hastings method is outlined in chapter 7 of Robert and Casella (2013). See Geirsson et al. (2015) for related blocking methodologies for Gibbs samplers in the context of latent Gaussian models.

In defining the conditional distributions, and the Metropolis methods to sample from them, a key design principle is to ensure that all measures and algorithms are well defined in the infinite-dimensional setting, so that the resulting algorithms are robust to mesh refinement Cotter et al. (2013). This thinking has been behind the form of the prior and posterior distributions developed in the previous section, as we now demonstrate.

In Sect. 3.1 we define the kernel \(\mathbb K^{\tau ,y}\) and in Sect. 3.2 we define the kernel \(\mathbb L^{u,y}.\) Then in the final Sect. 3.3 we put all these building blocks together to specify the complete algorithm used.

3.1 Proposal and acceptance probability for \(u|(\tau ,y)\)

Samples from the distribution of \(u|(\tau ,y)\) can be produced using a pCN Metropolis–Hastings method Cotter et al. (2013), with proposal and acceptance probability as follows:

  1. Given u, propose

  $$\begin{aligned} v = (1-\beta ^2)^{1/2}u + \beta \xi ,\;\;\;\xi \sim N(0,\mathcal {C}_\tau ). \end{aligned}$$

  2. Accept with probability

  $$\begin{aligned} \alpha (u,v) = \mathrm{min}\big \{1,\exp \big ({\varPhi }(u,\tau ;y)-{\varPhi }(v,\tau ;y)\big )\big \} \end{aligned}$$

  or else stay at u.
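A sketch of this update in code (the helper names sample_C_tau and Phi are ours, standing for a sampler of \(N(0,\mathcal {C}_\tau )\) and the potential (10), respectively):

```python
import numpy as np

def pcn_step(u, tau, y, beta, sample_C_tau, Phi, rng):
    """One pCN Metropolis-Hastings update for u | (tau, y)."""
    xi = sample_C_tau(tau)                         # xi ~ N(0, C_tau)
    v = np.sqrt(1.0 - beta**2) * u + beta * xi     # pCN proposal
    log_alpha = min(0.0, Phi(u, tau, y) - Phi(v, tau, y))
    if np.log(rng.uniform()) < log_alpha:
        return v, True       # accept
    return u, False          # reject: stay at u
```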

3.2 Proposal and acceptance probability for \(\tau |(y,u)\)

Producing samples of \(\tau |(u,y)\) is more involved, since we must first make sense of this conditional distribution. To do this, define the three measures \(\eta _0\), \(\nu _0\), and \(\nu \) on \(X\times \mathbb {R}^+\times Y\) by

$$\begin{aligned} \eta _0(\mathrm {d}u, \mathrm {d}\tau ,\mathrm {d}y)&= \mu _0^0(\mathrm {d}u)\pi _0(\mathrm {d}\tau )\mathbb {Q}_0(\mathrm {d}y),\\ \nu _0(\mathrm {d}u, \mathrm {d}\tau ,\mathrm {d}y)&= \mu _0^\tau (\mathrm {d}u)\pi _0(\mathrm {d}\tau )\mathbb {Q}_0(\mathrm {d}y),\\ \nu (\mathrm {d}u,\mathrm {d}\tau ,\mathrm {d}y)&= \mu _0^\tau (\mathrm {d}u)\pi _0(\mathrm {d}\tau )\mathbb {Q}_{u,\tau }(\mathrm {d}y). \end{aligned}$$

Here \(\mathbb {Q}_0 = N(0,{\varGamma })\) is the distribution of the noise, and \(\mathbb {Q}_{u,\tau }\) is as defined in (9). Then we have the chain of absolute continuities \(\nu \ll \nu _0 \ll \eta _0\), with

$$\begin{aligned} \frac{\mathrm {d}\nu _0}{\mathrm {d}\eta _0}(u,\tau ,y)&= \frac{\mathrm {d}\mu _0^\tau }{\mathrm {d}\mu _0^0}(u) =: L(u,\tau ),\\ \frac{\mathrm {d}\nu }{\mathrm {d}\nu _0}(u,\tau ,y)&= \frac{\mathrm {d}\mathbb {Q}_{u,\tau }}{\mathrm {d}\mathbb {Q}_0}(y) = \exp \left( -{\varPhi }(u,\tau ;y)+\frac{1}{2}|y|_{\varGamma }^2\right) , \end{aligned}$$

and so by the chain rule we have \(\nu \ll \eta _0\) and

$$\begin{aligned} \frac{\mathrm {d}\nu }{\mathrm {d}\eta _0}(u,\tau ,y) = \frac{\mathrm {d}\mathbb {Q}_{u,\tau }}{\mathrm {d}\mathbb {Q}_0}(y)\cdot \frac{\mathrm {d}\mu _0^\tau }{\mathrm {d}\mu _0^0}(u) =: \varphi (u,\tau ,y). \end{aligned}$$

We use the conditioning lemma, Theorem 3.1 in Dashti and Stuart (2016), to prove the existence of the desired conditional distribution.

Theorem 3

Assume that \({\varPhi }:X\times \mathbb {R}^+\times Y\rightarrow \mathbb {R}\) is \(\eta _0\)-measurable and \(\eta _0\)-a.s. finite. Assume also that, for \((u,y)\) \(\mu _0^0\times \mathbb {Q}_0\)-a.s.,

$$\begin{aligned} Z_\pi := \int _{\mathbb {R}^+} \exp \big (-{\varPhi }(u,\tau ;y)\big )L(u,\tau )\,\pi _0(\mathrm {d}\tau ) > 0. \end{aligned}$$

Then the regular conditional distribution of \(\tau |(u,y)\) exists under \(\nu \), and is denoted by \(\pi ^{u,y}\). Furthermore, \(\pi ^{u,y} \ll \pi _0\) and, for \((u,y)\) \(\nu \)-a.s.,

$$\begin{aligned} \frac{\mathrm {d}\pi ^{u,y}}{\mathrm {d}\pi _0}(\tau ) = \frac{1}{Z_\pi } \exp \big (-{\varPhi }(u,\tau ;y)\big )L(u,\tau ). \end{aligned}$$

Proof

The conditional random variable \(\tau |(u,y)\) exists under \(\eta _0\), and its distribution is just \(\pi _0\) since \(\eta _0\) is a product measure. Theorem 3.1 in Dashti and Stuart (2016) then tells us that the conditional random variable \(\tau |(u,y)\) exists under \(\nu \). We denote its distribution \(\pi ^{u,y}\). Define

$$\begin{aligned} c(u,y)&= \int _{\mathbb {R}^+}\varphi (u,\tau ,y)\pi _0(\mathrm {d}\tau )\\&= \exp \left( \frac{1}{2}|y|_{\varGamma }^2\right) \int _{\mathbb {R}^+}\exp \big (-{\varPhi }(u,\tau ;y)\big ) L(u,\tau )\pi _0(\mathrm {d}\tau ). \end{aligned}$$

Now since \(\exp \big (\frac{1}{2}|y|_{\varGamma }^2\big ) \in (0,\infty )\) \(\mu _0^0\times \mathbb {Q}_0\)-a.s., we deduce that \(c(u,y) > 0\) \(\mu _0^0\times \mathbb {Q}_0\)-a.s. by the \(\mu _0^0\times \mathbb {Q}_0\)-a.s. positivity of \(Z_\pi \). By the absolute continuity \(\nu \ll \eta _0\), we deduce that \(c(u,y) > 0\) \(\nu \)-a.s. Therefore, again by Theorem 3.1 in Dashti and Stuart (2016), we have \(\pi ^{u,y}\ll \pi _0\) and, for \((u,y)\) \(\nu \)-a.s.,

$$\begin{aligned} \frac{\mathrm {d}\pi ^{u,y}}{\mathrm {d}\pi _0}(\tau )&= \frac{1}{c(u,y)}\varphi (u,\tau ,y)\\&= \frac{1}{Z_\pi } \exp \big (-{\varPhi }(u,\tau ;y)\big )L(u,\tau ). \end{aligned}$$

\(\square \)

Remark 3

Above we have used \(\mu _0^0\) as a reference measure, and the function \(L(u,\tau )\) enters our expression for the posterior. But any \(\mu _0^\lambda \) will suffice since the entire family of measures \(\{\mu _0^\tau \}_{\tau \ge 0}\) are equivalent to one another. A straightforward calculation with the chain rule gives

$$\begin{aligned} \frac{\mathrm {d}\pi ^{u,y}}{\mathrm {d}\pi _0}(\tau )&= \frac{1}{Z_{\pi ,\lambda }}\frac{\mathrm {d}\mu _0^\tau }{\mathrm {d}\mu _0^\lambda }(u)\exp \big (-{\varPhi }(u,\tau ;y)\big )\\&=\frac{1}{Z_{\pi ,\lambda }}L_{\lambda }(u,\tau )\exp \big (-{\varPhi }(u,\tau ;y)\big ). \end{aligned}$$

We now wish to sample from \(\pi ^{u,y}\) using a Metropolis–Hastings algorithm. We assume from now on that \(\pi _0\) admits a Lebesgue density, so that \(\pi ^{u,y}\) also admits a Lebesgue density. Abusing notation and using \(\pi ^{u,y},\pi _0\) to denote Lebesgue densities as well as the corresponding measures we have

$$\begin{aligned} \pi ^{u,y}(\tau ) \propto \exp \big (-{\varPhi }(u,\tau ;y)\big )L(u,\tau )\pi _0(\tau ). \end{aligned}$$

Take a proposal kernel \(Q(\tau ,\mathrm {d}\gamma ) = q(\tau ,\gamma )\,\mathrm {d}\gamma \). Define the two measures \(\rho , \rho ^T\) on \((\mathbb {R}\times \mathbb {R},\mathcal {B}(\mathbb {R})\otimes \mathcal {B}(\mathbb {R}))\) by

$$\begin{aligned} \rho (\mathrm {d}\tau , \mathrm {d}\gamma )&= \pi ^{u,y}(\mathrm {d}\tau )Q(\tau , \mathrm {d}\gamma )\\&\propto \exp \big (-{\varPhi }(u,\tau ;y)\big )L(u,\tau )\pi _0(\tau )q(\tau ,\gamma )\,\mathrm {d}\tau \mathrm {d}\gamma ,\\ \rho ^T(\mathrm {d}\tau ,\mathrm {d}\gamma )&= \rho (\mathrm {d}\gamma ,\mathrm {d}\tau ). \end{aligned}$$

Then under appropriate conditions on \(\pi _0\) and q, these two measures are equivalent. Define \(r(\tau ,\gamma )\) to be the Radon–Nikodym derivative

$$\begin{aligned} r(\tau ,\gamma )&:=\frac{\mathrm {d}\rho ^T}{\mathrm {d}\rho }(\tau ,\gamma )\\&= \exp \big ({\varPhi }(u,\tau ;y)-{\varPhi }(u,\gamma ;y)\big )\times \frac{\mathrm {d}\mu _0^\gamma }{\mathrm {d}\mu _0^\tau }(u)\\&\quad \times \frac{\pi _0(\gamma )q(\gamma ,\tau )}{\pi _0(\tau )q(\tau ,\gamma )}. \end{aligned}$$

The general form of the Metropolis–Hastings algorithm, as for example given in Tierney (1998), says that we produce samples from \(\pi ^{u,y}\) by iterating the following two steps:

  1. Given \(\tau \), propose \(\gamma \sim Q(\tau ,\mathrm {d}\gamma )\).

  2. Accept with probability \(\displaystyle \alpha (\tau ,\gamma ) = \mathrm{min}\big \{1, r(\tau ,\gamma )\big \}\), or else stay at \(\tau \).

In order to implement this algorithm, we need an expression for the Radon–Nikodym derivative \(\frac{\mathrm {d}\mu _0^\gamma }{\mathrm {d}\mu _0^\tau }(u)\). Denote by \(\{\lambda _j(\tau )\}_{j\ge 1}\) the eigenvalues of the covariance \(\mathcal {C}_\tau \), and \(\{\varphi _j\}_{j\ge 1}\) their corresponding eigenvectors. Note that because of the structure of the family \(\{\mathcal {C}_\tau \}_{\tau \ge 0}\), the eigenvectors are independent of \(\tau \). Using Proposition 3, we see that

$$\begin{aligned} \frac{\mathrm {d}\mu _0^\gamma }{\mathrm {d}\mu _0^\tau }(u)&= \prod _{j=1}^\infty \frac{\lambda _j(\tau )^{1/2}}{\lambda _j(\gamma )^{1/2}} \exp \Bigg (\frac{1}{2}\sum _{j=1}^\infty \bigg (\frac{1}{\lambda _j(\tau )} - \frac{1}{\lambda _j(\gamma )}\bigg )\langle u,\varphi _j\rangle ^2\Bigg )\nonumber \\&= \exp \Bigg (\frac{1}{2}\sum _{j=1}^\infty \bigg [\bigg (\frac{1}{\lambda _j(\tau )} - \frac{1}{\lambda _j(\gamma )}\bigg )\langle u,\varphi _j\rangle ^2 + \log \bigg (\frac{\lambda _j(\tau )}{\lambda _j(\gamma )}\bigg )\bigg ]\Bigg ). \end{aligned}$$
(11)
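Numerically, the logarithm of this derivative should be accumulated with the quadratic and logarithmic terms paired mode-by-mode, as in the second line of (11); the following sketch (truncated to finitely many modes, as in any discretization) does this:

```python
import numpy as np

def log_density_ratio(u2, lam_tau, lam_gamma):
    """log of (d mu_0^gamma / d mu_0^tau)(u) as in eq. (11), given the squared
    coefficients u2[j] = <u, phi_j>^2 and the eigenvalue arrays of C_tau, C_gamma.
    The quadratic and logarithmic terms are paired mode-by-mode before summing;
    by Theorem 4 the two sums diverge separately when d >= 2."""
    terms = (1.0 / lam_tau - 1.0 / lam_gamma) * u2 + np.log(lam_tau / lam_gamma)
    return 0.5 * float(np.sum(terms))
```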

From Theorem 1 we know that \(\mu _0^\tau \) and \(\mu _0^\gamma \) are equivalent, and so it must be the case that the expressions for the derivative above are almost-surely finite. However, this is not immediately clear from inspection of the expression; we therefore provide some intuition as to why it is so in the following theorem, whose proof is given in the Appendix.

Theorem 4

Assume that \(u \sim N(0,\mathcal {C}_0)\). Then for each \(\tau > 0\),

  (i) \(\displaystyle \sum _{j=1}^\infty \left( \frac{1}{\lambda _j(\tau )} - \frac{1}{\lambda _j(0)}\right) \langle u,\varphi _j\rangle ^2\) is almost-surely finite if and only if \(d = 1\); and

  (ii) \(\displaystyle \sum _{j=1}^\infty \left[ \left( \frac{1}{\lambda _j(\tau )} - \frac{1}{\lambda _j(0)}\right) \langle u,\varphi _j\rangle ^2 + \log \left( \frac{\lambda _j(\tau )}{\lambda _j(0)}\right) \right] \) is almost-surely finite if \(d \le 3\).

A consequence of part (i) of this result is that in dimensions 2 and 3, both the product and the sum in (11) diverge, despite the whole expression being finite. This means that care is required when numerically implementing the Gibbs update of \(\tau .\)

3.3 The algorithm

Putting the theory above together, we can write down a Metropolis-within-Gibbs algorithm for sampling the posterior distribution. Recall that we assumed the proposal kernel Q admitted a Lebesgue density q: \(Q(\tau ,\mathrm {d}\gamma ) = q(\tau ,\gamma )\mathrm {d}\gamma \).

Let \(\{\lambda _j(\tau ),\varphi _j\}_{j \ge 1}\) denote the eigenbasis associated with \(\mathcal {C}_\tau \). Define

$$\begin{aligned} w(\tau ,\gamma ) =&\exp \Bigg (\frac{1}{2}\sum _{j=1}^\infty \bigg [\left( \frac{1}{\lambda _j(\tau )} - \frac{1}{\lambda _j(\gamma )}\right) \langle u,\varphi _j\rangle ^2\\&+ \log \left( \frac{\lambda _j(\tau )}{\lambda _j(\gamma )}\right) \bigg ]\Bigg ) \end{aligned}$$

and set

$$\begin{aligned} \alpha ^\tau (u,v)&= \mathrm{min}\Big \{1,\exp \big ({\varPhi }(u,\tau ;y) - {\varPhi }(v,\tau ;y)\big )\Big \},\\ \alpha ^u(\tau ,\gamma )&= \mathrm{min}\bigg \{1,\exp \big ({\varPhi }(u,\tau ;y) - {\varPhi }(u,\gamma ;y)\big )\\&\quad \cdot w(\tau ,\gamma )\cdot \frac{\pi _0(\gamma )q(\gamma ,\tau )}{\pi _0(\tau )q(\tau ,\gamma )}\bigg \}. \end{aligned}$$

Fix jump parameter \(\beta \in (0,1]\), and generate \(\{u^{(k)},\tau ^{(k)}\}_{k\ge 0}\) as follows:

Algorithm 1 For \(k = 0, 1, 2, \ldots \): (1) propose \(v = (1-\beta ^2)^{1/2}u^{(k)} + \beta \xi \) with \(\xi \sim N(0,\mathcal {C}_{\tau ^{(k)}})\), and set \(u^{(k+1)} = v\) with probability \(\alpha ^{\tau ^{(k)}}(u^{(k)},v)\), otherwise \(u^{(k+1)} = u^{(k)}\); (2) propose \(\gamma \sim Q(\tau ^{(k)},\cdot )\), and set \(\tau ^{(k+1)} = \gamma \) with probability \(\alpha ^{u^{(k+1)}}(\tau ^{(k)},\gamma )\), otherwise \(\tau ^{(k+1)} = \tau ^{(k)}\).

Then \(\{u^{(k)},\tau ^{(k)}\}_{k\ge 0}\) is a Markov chain which is invariant with respect to \(\mu ^y(du,d\tau )\).
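For concreteness, the following sketch assembles the two updates into the full sampler. It reuses pcn_step and log_density_ratio from the sketches above; the helpers eigenvalues, coeffs2 (returning the squared KL coefficients of u) and log_pi0 (the log hyperprior density) are assumed supplied by the user, and a Gaussian random walk is used for \(\tau \) so that the symmetric proposal density q cancels in the acceptance ratio:

```python
import numpy as np

def hierarchical_mcmc(u0, tau0, y, n_iter, beta, step, sample_C_tau, Phi,
                      eigenvalues, coeffs2, log_pi0, rng=None):
    """Metropolis-within-Gibbs sampler for (u, tau) | y, alternating the pCN
    update of Sect. 3.1 with a random walk update of tau (Sect. 3.2)."""
    rng = rng or np.random.default_rng()
    u, tau, chain = u0, tau0, []
    for _ in range(n_iter):
        # (1) pCN update of the level set function u | (tau, y).
        u, _ = pcn_step(u, tau, y, beta, sample_C_tau, Phi, rng)
        # (2) Gaussian random walk update of the inverse length scale tau | (u, y).
        gamma = tau + step * rng.standard_normal()
        if gamma > 0.0:   # proposals outside the support of pi_0 are rejected
            log_r = (Phi(u, tau, y) - Phi(u, gamma, y)
                     + log_density_ratio(coeffs2(u), eigenvalues(tau),
                                         eigenvalues(gamma))
                     + log_pi0(gamma) - log_pi0(tau))
            if np.log(rng.uniform()) < min(0.0, log_r):
                tau = gamma
        chain.append((u.copy(), tau))
    return chain
```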

4 Numerical results

We perform a variety of numerical experiments to illustrate the performance of the hierarchical algorithm described in Sect. 3. We focus on three different forward models. The first is pointwise observations composed with the identity—the simplicity of this model allows us to probe the behavior of the algorithm at low computational cost, and such models are also of interest in applications such as image reconstruction—see, for example, Alvarez and Morel (1994), Sapiro (2006) and the references therein. The other two, groundwater flow and EIT, are physical models which have previously been studied extensively, including study of non-hierarchical Bayesian level set methods Iglesias et al. (2016), Dunlop and Stuart (2015). A review of studies on inverse problems associated with EIT is given in Borcea (2002).

The code used for simulations is available on GitHub at https://github.com/mattdunlop/bayes-hier/releases/v1.0.

4.1 Discretization of the problem

There are two spaces that we must discretize in order to implement the algorithm. The first is the state space, where the samples will be generated, and the second is the function space associated with the evaluation of the forward model. We briefly outline how this is done.

Our discretization for the state space relies on the Karhunen–Loève expansion of the prior. Suppose we wish to produce samples from a Gaussian measure \(N(0,\mathcal {C})\), where \(\mathcal {C}\) has associated eigenbasis \(\{\lambda _j,\varphi _j\}_{j\in \mathbb {N}}\). Then a sample u from this distribution may be represented as

$$\begin{aligned} u(x) = \sum _{j=1}^\infty \sqrt{\lambda _j}\xi _j\varphi _j(x),\;\;\;\xi _j\sim N(0,1)\text { i.i.d.} \end{aligned}$$

We discretize the space by truncating and approximating this basis, so that elements of the space are represented as

$$\begin{aligned} u^N(x) = \sum _{j=1}^N u_j^N\varphi _j^N(x). \end{aligned}$$

The inference is then performed on the random variables \(\{u_j^N\}_{j=1}^N\). Additionally, in the cases we consider, the eigenvectors associated with all covariances are given by the Fourier basis and so we may use the Fast Fourier Transform for efficient implementation.

The second discretization occurs in the solution of the differential equations. In the EIT example a finite element discretization is used, in which the functions are approximated by expansion in a finite basis. The coefficients of the expansion of the solution to the PDE in this basis are then solved for numerically. The basis is chosen such that each basis element is locally supported—this ensures that matrices arising in the implementation of the method are sparse.

The groundwater flow model uses a finite difference discretization, in which derivatives are approximated by difference quotients. For example, given a uniform grid \(\{x_i,y_j\}_{i,j=1}^N\) with spacing \(x_{i+1}-x_i = \delta \), we may approximate

$$\begin{aligned} \frac{\partial h}{\partial x}(x_i,y_j) \approx \frac{h(x_i + \delta ,y_j) - h(x_i - \delta ,y_j)}{2\delta }. \end{aligned}$$

This leads to an approximate solution to the PDE defined on the grid \(\{x_i,y_j\}_{i,j=1}^N\).
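For illustration, a few lines verify the second-order accuracy of this central difference quotient on a smooth periodic test function:

```python
import numpy as np

# Central difference approximation of dh/dx on a uniform periodic grid;
# the error decays at rate O(delta^2).
N = 64
delta = 1.0 / N
x = np.arange(N) * delta
h = np.sin(2 * np.pi * x)
dh_dx = (np.roll(h, -1) - np.roll(h, 1)) / (2 * delta)
print(np.max(np.abs(dh_dx - 2 * np.pi * np.cos(2 * np.pi * x))))  # ~1e-2
```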

The finite element, finite difference, and spectral methods outlined above can each be used for any of the PDE examples; the combinations used for illustrative purposes in this paper (finite elements for EIT, finite differences for groundwater flow) are just two of numerous possible pairings of forward model and discretization.

4.2 Identity map

The first inverse problem is based on reconstruction of a piecewise constant field from noisy pointwise observations.

4.2.1 The forward model

Let \(D = [0,1]^2\) and define a grid of observation points \(\{q_j\}_{j=1}^J\subseteq D\). Let \(Z = L^p(D)\) for some \(1\le p < \infty \) and let \(Y = \mathbb {R}^J\). The forward operator \(S:Z\rightarrow Y\) is defined by

$$\begin{aligned} S(\kappa ) = (\kappa (q_1),\ldots ,\kappa (q_J)). \end{aligned}$$

We are then interested in finding \(\kappa \), given the prior information that it is piecewise constant and takes a number of known prescribed values. Let \(\mathcal {G} = S\circ F:X\times \mathbb {R}^+\rightarrow Y\). We reconstruct \((u,\tau )\) and hence \(\kappa =F(u,\tau )\). The map S is not continuous, and so Assumption 1 does not hold. However, Proposition 2 in the Appendix shows that the map \(\mathcal {G}\) is uniformly bounded, and almost-surely continuous under the priors considered. From this the conclusions of Proposition 1 in the Appendix follow, and it is possible to deduce the conclusions of Theorem 2.
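A sketch of this observation operator on the computational grid (the uniform \(10\times 10\) observation layout anticipates Sect. 4.2.2; the helper name is ours):

```python
import numpy as np

def make_S(obs_rows, obs_cols):
    """Point observation operator S: kappa -> (kappa(q_1), ..., kappa(q_J)),
    with the q_j identified with (row, column) indices on the grid."""
    return lambda kappa_grid: kappa_grid[obs_rows, obs_cols]

# A uniform 10 x 10 observation grid (J = 100) on a 2^7 x 2^7 sampling mesh.
N, m = 2**7, 10
idx = ((np.arange(m) + 0.5) * N / m).astype(int)
rows, cols = np.meshgrid(idx, idx, indexing="ij")
S = make_S(rows.ravel(), cols.ravel())
```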

4.2.2 Simulations and results

We study the effect of different length scales, for both hierarchical and non-hierarchical methods, demonstrating the advantages of the former over the latter. To this end we define \(\tau _i^\dagger = 5i\), \(i=1,\ldots ,10\), and generate 10 different true level set fields \(u_i^\dagger \sim \mu _0^{\tau _i^\dagger }\) on a mesh of \(2^{10}\times 2^{10}\) points. This leads to 10 sets of data \(y_i\), given by

$$\begin{aligned} y_i = \mathcal {G}(u_i^\dagger ,\tau _i^\dagger ) + \eta _i,\;\;\;\eta _i \sim N(0,{\varGamma })\text { i.i.d.}, \end{aligned}$$

where we take the noise covariance \({\varGamma }= 0.2^2\cdot I\) to be white. The level set map F is defined such that there are 3 phases, taking the constant values 1, 3, and 5. The mean relative error on the generated datasets ranges from 6 to 9 %.

One of the motivations for developing a hierarchical method is that little may be known a priori about the length scale associated with the unknown geometric field. We therefore sample from the hierarchical posterior distribution associated with each \(y_i\) using a variety of initial values for the length-scale parameter. This allows us to check that, computationally, we can recover a good approximation to the true length scale even if our initial guess is poor. Specifically, for each set of data we run 10 hierarchical MCMC simulations started at the different values \(\tau = \tau _k^\dagger \), giving a total of 100 hierarchical MCMC chains. For all chains we place a relatively flat prior of \(N(20,10^2)\) on \(\tau \). On the prior for the level set function u, we take Neumann boundary conditions and fix the smoothness parameter \(\alpha = 5\). The thresholding levels in the level set map are chosen such that there is an order one amount of prior mass in all levels—specifically we take \(c_1 = -0.1\) and \(c_2 = 0.1\).

We also wish to compare how the hierarchical method compares with the non-hierarchical method. We therefore look at the 10 different posterior distributions that arise from each set of data \(y_i\) when using each of 10 fixed prior inverse length scales \(\tau _k^\dagger \), which gives another 100 MCMC chains.

We perform all sampling on a mesh of \(2^7\times 2^7\) points to avoid an inverse crime, discretizing via the discrete Fourier transform (DFT) and retaining all \(2^{14}\) modes. The observation grid \(\{q_j\}_{j=1}^{100}\) is taken to be a uniformly spaced grid of 100 points. We use a Gaussian random walk proposal distribution for the length-scale parameter. We make this choice as it is the canonical starting point for MCMC, and it works in this case. It is possible, however, that something more sophisticated may be beneficial. We produce \(5\times 10^6\) samples for each chain, and discard the first \(10^6\) samples as burn-in when calculating quantities of interest.

Fig. 2 (Identity model) The sample mean of \(\tau \) along each hierarchical MCMC chain, against the initial value of \(\tau \). The different curves arise from using different data \(y_i\)

Table 1 (Identity model) The value of \(\tau \) used to create the data \(y_i\), and the mean value of \(\tau \) across the MCMC chains and the different initial values of \(\tau \)

In Fig. 2 we look at the recovery of the true value of \(\tau \) with the hierarchical method. For large enough \(\tau _0\), the mean of \(\tau \) after the burn-in period is roughly constant with respect to the initialization point, for each posterior. This makes sense from a theoretical point of view, since these means arise from the same posterior distribution for a fixed truth, but it is also reassuring from a computational point of view, since the output is close to independent of the initial guess for the length scale. There does, however, appear to be an issue with initializing \(\tau \) at too low a value: \(\tau \) tends to get stuck far from the truth when initialized at \(\tau _0 = 5\). This effect has been detected in several other experiments and models—initializing \(\tau \) much lower than the true inverse length scale can cause the parameter to become stuck in a local minimum. Such an effect has not been observed, however, when the parameter is initialized significantly larger than the true value. Table 1 shows that recovery of the true value of \(\tau \) is very good for \(\tau ^\dagger \le 35\), though it becomes slightly worse for larger values of \(\tau ^\dagger \). The means here are calculated without the \(\tau _0 = 5\) sample means, since these are clearly outliers for most of the posteriors. One possible explanation for the lack of recovery in the cases \(\tau ^\dagger = 40\), 45, and 50 concerns the structure of the observation map S. The observation grid has a length scale associated with it, related to the distances between observation points, and so issues could arise when trying to detect a length scale of the geometric field that is significantly shorter than this. Additionally, the length scales \(1/\tau \) are closer together for larger \(\tau \), and so it may be more difficult to distinguish between particular values.

Fig. 3 (Identity model) The trace of \(\tau \) along the MCMC chain, when initialized at the 10 different initial values. True inverse length scale is \(\tau = 15\)

Fig. 4 Simulations for the identity model. a The true geometric field used to generate the data y, with true inverse length scale \(\tau = 15\). b (Top) representative samples of \(F(u,\tau )\) under the hierarchical posterior. (Middle) approximations of \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\). (Bottom) approximations of \(\mathbb {E}(F(u,\tau ))\). From left-to-right, \(\tau \) is initialized at \(\tau = 5, 15, 25, 35, 45\). c As in b, using the non-hierarchical method. From left-to-right, \(\tau \) is fixed at \(\tau = 5, 15, 25, 35, 45\)

Fig. 5 (Identity model) Approximations of \(\text {Var}(F(u,\tau ))\) using the hierarchical (top) and fixed (bottom) priors, initialized or fixed at \(\tau = 5, 15, 25, 35, 45\), from left-to-right. True inverse length scale is \(\tau = 15\)

Fig. 6 (Identity model) Representative samples \(\tau ^4\cdot u\) (top) and sample means \(\mathbb {E}(\tau ^4\cdot u)\) (bottom) of the level set function. The rescaling by \(\tau ^4\) means that the above quantities have the same approximate amplitude. True inverse length scale is \(\tau = 15\). (Left) Using the non-hierarchical method; from left-to-right \(\tau \) is fixed at \(\tau = 5, 15, 25, 35, 45\). (Right) Corresponding quantities for the hierarchical method

For brevity we now focus on the case where \(\tau ^\dagger = 15\). The traces of the values of \(\tau \) along the hierarchical chains corresponding to this truth are shown in Fig. 3. After approximately \(10^6\) samples, all chains have become centered around the true length scale. This convergence appears to be roughly linear for each chain.

Fig. 7 (Identity model) (Diagonal) empirical densities of \(\tau \) and the first five KL modes of u. (Off-diagonal) empirical joint densities. True inverse length scale is \(\tau = 15\)

Figure 4 shows the pushforwards of the sample means from the different chains under the level set map, that is, approximations of \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\). This figure also shows approximations of \(\mathbb {E}(F(u,\tau ))\) and typical samples of \(F(u,\tau )\) coming from the different chains. We see that these conditional means for the hierarchical method appear to agree with one another. This is reassuring for the reason mentioned above—they are all estimates of the mean of the same distribution. The figures for the non-hierarchical posteriors admit greater variation, especially near the boundary for higher values of \(\tau \). Moreover, not all inclusions are detected when the length-scale parameter is taken to be \(\tau = 5\). Note that the mean from the hierarchical posterior agrees closely with that from the non-hierarchical posterior using the fixed true length scale \(\tau = 15\). Additionally, even though the means are reasonable approximations to the truth in most cases, the typical samples are much worse when using the non-hierarchical method with an incorrect length-scale parameter.

We can also consider the sample variance of the pushforward of the samples by the level set map, i.e., approximations of the quantity \(\text {Var}(F(u,\tau ))\). In Fig. 5 we show this quantity for both the hierarchical and non-hierarchical priors. Note that for the non-hierarchical priors, the variance increases both at the boundary and away from the observation points for larger values of \(\tau \). Variance is also higher along the interfaces and within the central phase, since points in these locations are more likely to switch between all three phases. The hierarchical approximations all appear to agree. While the hierarchical means are very similar to the non-hierarchical means using the true length scale, as seen in Fig. 4, the hierarchical variances are smaller away from the observation points.

Additionally, we look at the level set function u itself in Fig. 6. In these plots we rescale the level set function by \(\tau ^{\alpha -d/2} = \tau ^4\) so that they are all of approximately the same amplitude. The means for both the hierarchical and non-hierarchical methods are again quite similar to one another, though the difference between the typical samples is much more stark.

Finally, in Fig. 7, we look at the joint densities of the inverse length-scale parameter \(\tau \) and first five Karhunen–Loève (KL) modes of the level set function u.Footnote 4 Non-trivial correlations are evident between \(\tau \) and each of these modes, with the support of the densities appearing non-convex. This is likely related to the non-linear scaling between the length scale and the amplitude of the level set function under the prior. Conversely the KL modes, while still correlated with one another, have simpler joint densities. Note, also, that the posterior on the length scale is centered close to the true value of the inverse length-scale parameter \(\tau .\)

Remark 4

In this section we studied the ability to recover the true length-scale parameter \(\tau ^\dagger \), given a finite number of direct noisy observations of the geometric field. The question arises of how the quality of this recovery depends upon the spatial resolution of the data. As would be expected, learning this parameter becomes more difficult when the resolution is poor, due to the lack of information in the data. However, it is interesting to note that, even in the limit of an infinite number of distinct observation points, it is unlikely that we would be able to identify \(\tau ^\dagger \) perfectly. This is suggested by a result of Zhang (2004), which states that, in the context of generalized linear mixed models, the marginal variance and length-scale parameters of a Matérn field cannot be consistently estimated in this limit when, as in our case, the domain is fixed. This is in contrast to the case in which additional data points extend the domain, where consistent estimation is possible Marshall and Mardia (1984).

4.3 Identification of geologic facies in groundwater flow

The identification of geologic facies in subsurface flow applications is a common example of a large-scale inverse problem that involves the recovery of unknown interfaces. In the case of groundwater flow, for example, the inverse problem concerns the recovery of the interface between regions of different hydraulic conductivity from measurements of hydraulic head. Geometric inverse problems of this type have recently received substantial attention from the research community Xie et al. (2011), Ping and Zhang (2014), Lorentzen et al. (2012), Lorentzen et al. (2012). Indeed, it has been recognized that the geometry determined by these interfaces constitutes one of the main sources of uncertainty to be quantified and reduced by means of Bayesian inversion.

In the context of groundwater flow, the identification of interfaces between regions with different geological properties can be posed as the recovery of a piecewise constant conductivity field parameterized by a level set function. A fully Bayesian level set framework for the solution of this type of inverse problem was recently developed in Iglesias et al. (2016). The MCMC method applied in Iglesias et al. (2016) performs well when the prior on the level set function properly encodes the intrinsic length scales of the unknown interfaces. Clearly, in practical applications such length scales are most likely unknown, and their incorrect specification may result in inaccurate and overly uncertain estimates of the unknown interfaces. The purpose of this section is to show that the proposed hierarchical Bayesian framework enables us to determine an appropriate length scale in the prior of the level set function which, in turn, captures more accurately the intrinsic length scale of the unknown interface.

4.3.1 The forward model

We are interested in the identification of a piecewise constant hydraulic conductivity, denoted by \(\kappa \), of a two-dimensional confined aquifer whose physical domain is \(D=[0,6]\times [0,6]\). We assume single-phase, steady-state Darcy flow. The piezometric head, denoted by h(x) (\(x\in D\)), which describes the flow within the aquifer, can be modeled by the solution of the following equation Bear (1972)

$$\begin{aligned} -\nabla \cdot \kappa \nabla h&=f&\qquad \text {in}~~D, \end{aligned}$$
(12)

where f represents sources/sinks and where boundary conditions need to be specified. For the present work we consider the benchmark setup used in Carrera and Neuman (1986), Hanke (1997), Iglesias and Dawson (2007), Iglesias et al. (2013), Iglesias (2016), and Iglesias et al. (2016). Concretely, we assume that f is a recharge term of the form

$$\begin{aligned} f(x_{1},x_{2})=\left\{ \begin{array}{lll} 0 &{}\quad \text {if}&{} 0< x_{2}\le 4,\\ 137&{}\quad \text {if}&{} 4< x_{2}< 5,\\ 274&{}\quad \text {if}&{} 5\le x_{2} < 6.\end{array}\right. \end{aligned}$$
(13)

and we consider the following boundary conditions

$$\begin{aligned} \begin{aligned} h(x_1,0)=100, \qquad \frac{\partial h}{\partial x_1}(6,x_2)=0,\\ -\kappa \frac{\partial h}{\partial x_1}(0,x_2)=500,\qquad \frac{\partial h}{\partial x_2}(x_1,6)=0. \end{aligned} \end{aligned}$$
(14)
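For concreteness, the recharge term (13) is straightforward to evaluate on a cell-centered grid; a minimal sketch is given below. The finite difference solver itself, and the handling of the boundary conditions (14), are not reproduced here.

```python
import numpy as np

def recharge(x2):
    """Recharge term f of (13); it depends only on the vertical coordinate."""
    return np.where(x2 >= 5.0, 274.0, np.where(x2 > 4.0, 137.0, 0.0))

# cell-centered grid on D = [0,6] x [0,6], matching the 80 x 80 inversion mesh
n = 80
xc = (np.arange(n) + 0.5) * 6.0 / n
X1, X2 = np.meshgrid(xc, xc, indexing="ij")
f_grid = recharge(X2)
```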

We consider the inverse problem of recovering \(\kappa \) from observations \(\{\ell _j(h)\}_{j=1}^{64}\) of h given by (12)–(14). We assume we have smoothed point observations given by

$$\begin{aligned} \ell _j(h) = \int _D \frac{1}{2\pi \varepsilon ^2}e^{-\frac{1}{2\varepsilon ^2}|x-q_j|^2}h(x)\,\mathrm {d}x, \end{aligned}$$

where \(\varepsilon > 0\) and \(\{q_j\}_{j=1}^{64}\subseteq D\) is a grid of 64 observation points equally distributed on D. Let \(Z = L^p(D)\) for some \(1\le p < \infty \) and \(Y = \mathbb {R}^{64}\). Given \(\kappa \in Z\), let h be given by (12)–(14). Then the forward map \(S:Z\rightarrow Y\) is given by

$$\begin{aligned} \kappa \mapsto (\ell _1(h),\ldots ,\ell _{64}(h)). \end{aligned}$$
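A quadrature approximation of these observation functionals is sketched below. The grid of head values would come from the cell-centered finite difference solver; the mollification width `eps` is a placeholder, since its specific value is not stated here.

```python
import numpy as np

def smoothed_observations(h_grid, xc, q_points, eps=0.1):
    """Approximate ell_j(h), the Gaussian-mollified point observations,
    by a cell-centered Riemann sum. eps = 0.1 is a placeholder value.
    h_grid: (n, n) head values; xc: (n,) cell-center coordinates."""
    X1, X2 = np.meshgrid(xc, xc, indexing="ij")
    cell_area = (xc[1] - xc[0]) ** 2
    obs = []
    for q1, q2 in q_points:
        w = np.exp(-((X1 - q1) ** 2 + (X2 - q2) ** 2) / (2 * eps ** 2))
        w /= 2 * np.pi * eps ** 2
        obs.append(np.sum(w * h_grid) * cell_area)
    return np.array(obs)

# the 8 x 8 grid of 64 observation points equally distributed over D
qs = (np.arange(8) + 0.5) * 6.0 / 8
q_points = [(a, b) for a in qs for b in qs]
```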

We assume that each \(\kappa _i\) in the definition of the level set map F is strictly positive. The image of F is then contained in the set of bounded fields on D that are bounded below by \(\min _i\kappa _i > 0\). In Iglesias et al. (2016) the map S is shown to be continuous and uniformly bounded on such fields, with respect to \(\Vert \cdot \Vert _{L^p(D)}\) for some p, and so Assumptions 1 hold. As a consequence Theorem 2 applies directly.

4.3.2 Simulations and results

In the previous example we illustrated, with a simple model, the capability of the proposed framework to recover a specified true length scale and a true level set function defining the true discontinuous field from which synthetic data are generated. We must reiterate, however, that in practice we wish to recover the true discontinuous field; the level set function is merely an artifact used to parameterize that field. In practical applications the aim of the proposed hierarchical Bayesian level set framework is to infer a length scale, alongside a level set function which, by means of expression (7), produces a discontinuous field that captures the desired piecewise constant field as accurately as possible and, in particular, the intrinsic length scales of the interfaces determined by the discontinuities of the true geometric field. Therefore, in order to test our methodology in the applied setting of groundwater flow, rather than specifying a true level set function, in this subsection we consider the true hydraulic conductivity \(\kappa ^{\dagger }\) whose logarithm is displayed in Fig. 9a. This \(\kappa ^\dagger \) is defined to take the constant values \(e^{1.5}\), \(e^{4}\) and \(e^{6.5}\), so that the values of the conductivity on the three regions differ by at least one order of magnitude. It is a channelized conductivity typical of fluvial environments, often used as a benchmark for subsurface flow inversion (Ping and Zhang 2014; Lorentzen et al. 2012; Xie et al. 2011; Iglesias et al. 2016). While there is indeed an intrinsic length scale in the channelized structure, this true conductivity field does not come from a specified level set prior.

Fig. 8
figure 8

(Groundwater flow model) trace plots of \(\tau \) obtained from six hierarchical MCMC chains

Fig. 9
figure 9

Simulations for the groundwater flow model. a (Left) logarithm of the true hydraulic conductivity field used to generate the data y. (Right) true pressure field and the grid of observation points. b (Top) logarithm of representative samples of \(F(u,\tau )\) under the hierarchical posterior. (Middle) logarithm of the approximations of \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\). (Bottom) logarithm of the approximations of \(\mathbb {E}(F(u,\tau ))\). From left-to-right, \(\tau \) is initialized at \(\tau = 10, 30, 50, 70, 90\). c As in b, using the non-hierarchical method. From left-to-right, \(\tau \) is fixed at \(\tau = 10, 30, 50, 70, 90\)

Fig. 10
figure 10

(Groundwater flow model) approximations of \(\text {Var}\big (F(u,\tau )\big )\) using the hierarchical (top) and the non-hierarchical (bottom) MCMC

Fig. 11
figure 11

(Groundwater flow model) representative samples and sample means of the level set function. The rescaling \(\tau ^4\) means that the above quantities have the same approximate amplitude. True inverse length scale is \(\tau = 15\). a (Top) representative samples of the rescaled level set function \(\tau ^4\cdot u\) and (bottom) approximations of \(\mathbb {E}(\tau ^4\cdot u)\) using the hierarchical method. From left-to-right, \(\tau \) is initialized at \(\tau = 10, 30, 50,70, 90\). b As in a, using the non-hierarchical method. From left-to-right, \(\tau \) is fixed at \(\tau = 10, 30, 50, 70, 90\)

Fig. 12
figure 12

(Groundwater flow model) (diagonal) empirical densities of \(\tau \) and the first five KL modes of u. (off-diagonal) empirical joint densities

Synthetic data are generated by means of

$$\begin{aligned} y= (\ell _1(h^{\dagger }),\ldots ,\ell _{64}(h^{\dagger }))+ \eta ,\quad \eta \sim N(0,{\varGamma }), \end{aligned}$$

where \(h^{\dagger }\) is the solution to (12)–(14) for \(\kappa =\kappa ^{\dagger }\). Equations (12)–(14) have been solved with cell-centered finite differences Arbogast et al. (1997). In order to avoid inverse crimes, synthetic data are generated on a finer grid (\(160\times 160\) cells) than the one used for the inversion (\(80\times 80\) cells). The discretization of the level set function is performed via the DFT, and we retain all modes. In addition, \({\varGamma }\) is a diagonal matrix given by \({\varGamma }_{i,i}= 0.0175\,\ell _i(h^{\dagger })\); in other words, we add noise that corresponds to 1.75 % of the size of the noise-free observations. For the prior on the level set function u, we take Neumann boundary conditions and fix the smoothness parameter \(\alpha = 5\).
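The data generation step can be sketched as follows, reading the formula for \({\varGamma }\) literally and reusing the observation sketch above; the fine-grid head field below is a placeholder for the \(160\times 160\) finite difference solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder fine-grid head field; in practice this is the 160 x 160
# cell-centered finite difference solution of (12)-(14) with kappa-dagger
n_fine = 160
xc_fine = (np.arange(n_fine) + 0.5) * 6.0 / n_fine
h_true = np.full((n_fine, n_fine), 100.0)

ell_true = smoothed_observations(h_true, xc_fine, q_points)  # noise-free data

# Diagonal noise covariance read literally from the text:
# Gamma_ii = 0.0175 * ell_i. If Gamma is instead meant elementwise as
# (0.0175 * ell_i)^2, i.e. a 1.75 % standard deviation, replace
# np.sqrt(gamma_diag) by 0.0175 * np.abs(ell_true).
gamma_diag = 0.0175 * ell_true
y = ell_true + np.sqrt(gamma_diag) * rng.standard_normal(ell_true.shape)
```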

We consider a Gaussian prior \(N(35,10^2)\) for \(\tau \) and use a Gaussian random walk proposal distribution for this parameter. We then apply the hierarchical MCMC method from subsection 3.3, initialized at each of the six choices \(\tau =1,10,30,50,70,90\), with the level set function u initialized as a sample of the prior with the given \(\tau \). We thus produce six MCMC chains of length \(4\times 10^6\) and discard the first \(10^6\) samples as burn-in for the computation of quantities of interest. The trace plots of \(\tau \) are displayed in Fig. 8, from which we clearly observe that all chains, regardless of their initial point, stabilize and produce samples around \(\tau =18\). In the top row of Fig. 9b we display the logarithm of some representative samples of \(F(u,\tau )\) under the hierarchical posterior. The middle row of Fig. 9b shows the logarithm of \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\), i.e., the pushforward of the posterior means obtained using the hierarchical method. The bottom row of Fig. 9b displays the logarithm of the approximations of \(\mathbb {E}(F(u,\tau ))\), that is, the expected value of the pushforward samples under the posterior. These results correspond to the five MCMC chains with \(\tau \) initialized at \(\tau =10,30,50,70,90\) (the results for \(\tau =1\) have been omitted). Similarly, Fig. 10 (top) shows the approximations of the variance of the pushforward samples of the posterior, i.e., \(\text {Var}\big (F(u,\tau )\big )\). Clearly, both \(\mathbb {E}(F(u,\tau ))\) and \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\) result in fields that provide a reasonable approximation of the true geometric field. Note that, as expected, the largest uncertainty in the distribution of the pushforward samples is around the interfaces between the regions of different conductivity. In Fig. 11a we show some representative samples of u (top) and approximations to \(\mathbb {E}(u)\) (bottom). In these plots, as before, we rescale the level set function by \(\tau ^{\alpha -d/2} = \tau ^4\) so that the fields are all of approximately the same amplitude. In Fig. 12 we display the empirical densities of \(\tau \) and the first five KL modes of u. A key observation is that, although the true hydraulic conductivity is not generated by thresholding a Gaussian random field, and hence there is no “true” length scale, the posterior nonetheless settles on a narrow range of values of \(\tau \) consistent with the data.
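One sweep of the Metropolis-within-Gibbs sampler just described can be sketched as below. The pCN step for u is standard; the acceptance ratio for the \(\tau \) step is written schematically, since its precise form involves the reparametrization of subsection 3.3, and all function arguments (`sample_prior`, `neg_log_likelihood`, `log_tau_prior`) are placeholders for the problem-specific components.

```python
import numpy as np

def mwg_sweep(u, tau, beta, sigma_tau, rng,
              sample_prior, neg_log_likelihood, log_tau_prior):
    """One Metropolis-within-Gibbs sweep: pCN for u | tau, then a Gaussian
    random walk for tau | u. The tau-acceptance below is schematic; in the
    paper it also accounts for the tau-dependence of the prior on u."""
    # pCN proposal for the level set function, prior N(0, C_tau)
    xi = sample_prior(tau, rng)
    u_prop = np.sqrt(1.0 - beta ** 2) * u + beta * xi
    if np.log(rng.uniform()) < (neg_log_likelihood(u, tau)
                                - neg_log_likelihood(u_prop, tau)):
        u = u_prop

    # Gaussian random walk proposal for tau; its prior here is N(35, 10^2)
    tau_prop = tau + sigma_tau * rng.standard_normal()
    if tau_prop > 0.0:
        log_a = (neg_log_likelihood(u, tau) - neg_log_likelihood(u, tau_prop)
                 + log_tau_prior(tau_prop) - log_tau_prior(tau))
        if np.log(rng.uniform()) < log_a:
            tau = tau_prop
    return u, tau
```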

From these results we also clearly see that the hierarchical MCMC algorithm produces similar outcomes regardless of the initialization of the inverse length scale \(\tau \), reflecting ergodicity of the Markov chain. The results for \(\tau =1\) are not shown, but they are very similar to those from the other chains. As with the results from the previous subsection, the similarity in outcomes between all six chains is not surprising, as they all target the same posterior distribution; the fact that this posterior distribution on \(\tau \) concentrates near a single value is of particular interest, however, because it shows that the true geometric field has an intrinsic length scale, even though it was not constructed via the map \(F(u,\tau )\). Furthermore, this similarity of outcomes between chains showcases the main advantage of the proposed framework over the non-hierarchical one. Indeed, as stated earlier, the proposed method has the ability to recover a distribution for the intrinsic length scale which gives rise to reasonably accurate estimates (i.e., \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\) and \(\mathbb {E}(F(u,\tau ))\)) of the true geometric field. We now present the numerical results from applying a non-hierarchical MCMC algorithm in which the inverse length scale \(\tau \) is fixed. We consider again six MCMC chains, with \(\tau \) now fixed at the values \(\tau =1,10,30,50,70,90\) used to initialize the hierarchical chains above. Results analogous to those presented for the hierarchical method can be found in Fig. 9c, the bottom row of Fig. 10, and Fig. 11b. Clearly, the failure to properly prescribe the intrinsic length scale in the non-hierarchical method results in inaccurate estimates of the true geometric field. We observe that for \(\tau > 30\) the estimates of the truth given by \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\) and \(\mathbb {E}(F(u,\tau ))\) are substantially inaccurate and the uncertainty measured by \(\text {Var}\big (F(u,\tau )\big )\) is large. The non-hierarchical MCMC for \(\tau =1\) did not converge; those results are not shown. The non-hierarchical MCMC thus provides reasonable estimates only for \(\tau =10\) and \(\tau =30\). However, even these results are visibly suboptimal when compared to those from the hierarchical framework.

4.4 Electrical impedance tomography

Finally we consider the electrical impedance tomography (EIT) problem. This problem has previously been approached with a non-hierarchical Bayesian level set method Dunlop and Stuart (2015). In this subsection we show that the hierarchical approach outperforms the non-hierarchical approach in the case where the true conductivity is a binary field, given the same number of forward model evaluations.

Fig. 13
figure 13

(EIT model) the trace of \(\tau \) along the MCMC chain, when initialized at the 5 different values \(\tau = 10, 30, 50, 70, 90\)

4.4.1 The forward model

EIT is an imaging technique which attempts to infer the internal conductivity of a body from boundary voltage measurements. Typical applications include medical imaging, as well as subsurface imaging where it is known as electrical resistivity tomography (ERT). We utilize the complete electrode model (CEM), proposed in Somersalo et al. (1992). This is a physically accurate model which has been shown to agree with experimental data up to measurement precision. The strong form of the PDE governing the model is given by

$$\begin{aligned} {\left\{ \begin{array}{ll} \displaystyle -\nabla \cdot (\kappa (x)\nabla v(x)) = 0 &{}\quad x \in D\\ \displaystyle \int _{e_l} \kappa \frac{\partial v}{\partial n}\,\mathrm {d}S = I_l &{}\quad l=1,\ldots ,L\\ \displaystyle \kappa (x)\frac{\partial v}{\partial n}(x) = 0 &{}\quad x \in \partial D\setminus \bigcup _{l=1}^L e_l\\ \displaystyle v(x) + z_l\kappa (x)\frac{\partial v}{\partial n}(x) = V_l &{}\quad x \in e_l, l=1,\ldots ,L. \end{array}\right. } \end{aligned}$$

Here \(D\subseteq \mathbb {R}^2\) is the domain and \(\{e_l\}_{l=1}^L \subseteq \partial D\) are electrodes on the boundary upon which currents \(\{I_l\}_{l=1}^L\) are injected and voltages \(\{V_l\}_{l=1}^L\) are read. The numbers \(\{z_l\}_{l=1}^L\) represent the contact impedances of the electrodes. The field \(\kappa \) represents the conductivity of the body and v represents the potential within the body. It should be noted that the solution of this PDE comprises both a potential \(v \in H^1(D)\) and a vector \(\{V_l\}_{l=1}^L\) of boundary voltage measurements.

The inverse problem we consider is the recovery of \(\kappa \) from a sequence of boundary voltage measurements. A number of linearly independent current stimulation patterns \(\{I_l\}_{l=1}^L\) may be applied to provide more information; we assume that we perform the maximal number \(M = L-1\) of such measurements. Let \(Z = L^p(D)\) for some \(1\le p < \infty \) and \(Y = \mathbb {R}^{J}\), where \(J = LM\). We concatenate the boundary voltage measurements arising from the different stimulation patterns to yield a map \(S:Z\rightarrow Y\),

$$\begin{aligned} \kappa \mapsto (V^{(1)},V^{(2)},\ldots ,V^{(M)}), \end{aligned}$$

where \(V^{(m)} = \{V_l^{(m)}\}_{l=1}^L \in \mathbb {R}^L\), \(m=1,\ldots ,M\).

For the experiments we work on the circular domain \(D = \{x \in \mathbb {R}^2\;|\;|x| < 1\}\). Sixteen electrodes are equally spaced around the boundary, providing 50 % coverage. All contact impedances are taken to be \(z_l = 0.01\). Adjacent electrodes are stimulated with a current of 0.1, so that the matrix of stimulation patterns \(I = \{I^{(j)}\}_{j=1}^{15} \in \mathbb {R}^{16\times 15}\) is given by

$$\begin{aligned} I = 0.1\times \left( \begin{array}{llll} +1 &{}\quad 0 &{}\quad \cdots &{}\quad 0\\ -1 &{}\quad +1 &{}\quad \cdots &{}\quad 0\\ 0 &{}\quad -1 &{}\quad \ddots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad +1\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -1 \end{array}\right) . \end{aligned}$$
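This adjacent-pair pattern is simple to construct programmatically; a minimal sketch:

```python
import numpy as np

L, M = 16, 15                 # electrodes and stimulation patterns
I_stim = np.zeros((L, M))
for j in range(M):            # pattern j drives electrodes j and j + 1
    I_stim[j, j] = 0.1
    I_stim[j + 1, j] = -0.1
```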

We define our forward map \(\mathcal {G}:X\times \mathbb {R}^+\rightarrow \mathbb {R}^J\) by \(\mathcal {G} = S\circ F\). As in the groundwater flow example, we assume that each \(\kappa _i\) in the definition of the level set map is strictly positive. We do not have a continuity result for the map S on \(L^p\) for any \(1\le p < \infty \). However, the almost-sure continuity of the map \(\mathcal {G}\) can be established via a modification of the proof of Proposition 3.5 in Dunlop and Stuart (2015) to include the parameter \(\tau \); this modification is almost identical to the proof of Proposition 1 given in the appendix. The uniform boundedness of \(\mathcal {G}\) follows similarly from a result in Dunlop and Stuart (2015). Hence, as was the case with the identity map example, the conclusions of Proposition 1 follow, and we can deduce the conclusions of Theorem 2.

Fig. 14
figure 14

Simulations for the EIT model. a (Left) true conductivity field used to generate the data y. (Right) the entries \(y_i\) of the data vector y, plotted against i. b (Top) Representative samples of \(F(u,\tau )\) under the hierarchical posterior. (Middle) approximations of \(F(\mathbb {E}(u),\mathbb {E}(\tau ))\). (Bottom) approximations of \(\mathbb {E}(F(u,\tau ))\). From left-to-right, \(\tau \) is initialized at \(\tau = 10, 30, 50, 70, 90\). c As in b, using the non-hierarchical method. From left-to-right, \(\tau \) is fixed at \(\tau = 10, 30, 50, 70, 90\)

Fig. 15
figure 15

(EIT model) approximations of \(\text {Var}(F(u,\tau ))\) using the hierarchical (top) and fixed (bottom) priors, with \(\tau \) initialized or fixed at \(\tau = 10, 30, 50, 70, 90\), from left-to-right

Fig. 16
figure 16

(EIT model) representative samples and sample means of the level set function. The rescaling \(\tau ^4\) means that the above quantities have the same approximate amplitude. True inverse length scale is \(\tau = 15\). a (Top) representative samples of the rescaled level set function \(\tau ^4\cdot u\) and (bottom) approximations of \(\mathbb {E}(\tau ^4\cdot u)\) using the hierarchical method. From left-to-right, \(\tau \) is initialized at \(\tau = 10, 30, 50, 70, 90\). b As in a, using the non-hierarchical method. From left-to-right, \(\tau \) is fixed at \(\tau = 10, 30, 50, 70, 90\)

Fig. 17
figure 17

(EIT model) (diagonal) empirical densities of \(\tau \) and the first five KL modes of u. (off-diagonal) empirical joint densities

4.4.2 Simulations and results

We fix a true conductivity \(\kappa ^\dagger \), shown in Fig. 14. As with the groundwater flow experiments, this is constructed explicitly and does not have a true value of \(\tau \) associated with it. We generate data y as

$$\begin{aligned} y = S(\kappa ^\dagger ) + \eta ,\;\;\;\eta \sim N(0,{\varGamma }), \end{aligned}$$

where we take the noise covariance \({\varGamma }= 0.0002^2\cdot I\) to be white. The mean relative error on the generated data is approximately 12 %. The data are generated using a mesh of 43,264 elements, and simulations are performed using a mesh of 10,816 elements, in order to avoid an inverse crime. Forward solves are performed using the EIDORS software Adler and Lionheart (2006). All level set field samples are defined on the square \([-1,1]^2\) and restricted to the domain D. This has the advantage of allowing efficient sampling via the Fast Fourier Transform, though it has the drawback of possibly introducing non-trivial boundary effects on the domain; no such effects are observed in our problem, however. The discretization on the square is performed via the DFT on a grid of \(2^7\times 2^7\) points, and we retain all modes.
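The FFT-based sampling of the level set field can be sketched as below, assuming a Whittle–Matérn-type spectrum proportional to \((\tau ^2 + |k|^2)^{-\alpha }\). For brevity the sketch uses periodic rather than Neumann boundary conditions, and ignores normalization constants, which are immaterial when thresholding a zero-mean field at level zero.

```python
import numpy as np

def sample_level_set_field(tau, alpha=5, n=2 ** 7, length=2.0, rng=None):
    """Draw a stationary Gaussian field on [-1,1]^2 via the FFT, assuming
    spectral eigenvalues (tau^2 + |k|^2)^(-alpha) on a periodic Fourier
    basis (a proxy for the Neumann conditions used in the text)."""
    rng = np.random.default_rng() if rng is None else rng
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    K1, K2 = np.meshgrid(k, k, indexing="ij")
    lam = (tau ** 2 + K1 ** 2 + K2 ** 2) ** (-alpha)
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    # the real part of the inverse FFT gives a real Gaussian field; the
    # overall scale is irrelevant here since F thresholds the field at zero
    return np.fft.ifft2(np.sqrt(lam) * noise).real
```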

The level set map F is defined such that there are two phases, taking the constant values 1 and 10. We take the prior level set field mean to be zero, so that in this case F (and hence \({\varPhi }\)) becomes independent of \(\tau \). Thus a forward model evaluation is not required for the Gibbs update of \(\tau \), and each sample of \((u,\tau )\) using the hierarchical method costs virtually the same as one of u using the non-hierarchical method.
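This independence can be seen as the invariance of zero-level thresholding under positive rescaling of a zero-mean field, as the following check illustrates:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(1000)
binary_map = lambda v: np.where(v > 0, 10.0, 1.0)   # two phases, values 1 and 10
assert np.array_equal(binary_map(u), binary_map(0.37 * u))  # any c > 0 works
```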

Similarly to the previous experiments, we initialize the hierarchical sampling at \(\tau = 10,30,50,70,90\) to check the robustness of the method. We use a sharper prior on \(\tau \) than was used previously, and again use a Gaussian random walk proposal distribution for \(\tau \). We fix the smoothness parameter \(\alpha = 5\) in the prior for u, again with Neumann boundary conditions. To compare the hierarchical method with the non-hierarchical one, we also consider the five posterior distributions that arise when using each of the five fixed prior inverse length scales \(\tau = 10,30,50,70,90\), giving another five MCMC chains. For both methods we produce \(4\times 10^6\) samples per chain and discard the first \(2\times 10^6\) samples as burn-in when calculating quantities of interest.

The traces of the values of \(\tau \) along the hierarchical chains are shown in Fig. 13. With the exception of the chain initialized at \(\tau = 10\), the chains converge to the same approximate value of \(\tau \). Unlike in previous experiments, the traces have a relatively flat period before the approximately linear convergence to the common length scale. The chain initialized at \(\tau = 90\) requires roughly \(10^6\) more samples to converge than the other converging chains.

Figure 14 shows the pushforwards of the sample means from the different chains under the level set map, along with approximations of \(\mathbb {E}(F(u,\tau ))\) and typical samples of \(F(u,\tau )\) from the different posteriors. In both the hierarchical and non-hierarchical methods, the chains initialized or fixed at \(\tau = 10\) fail to recover the true conductivity, similarly to what was observed in the identity map experiments with \(\tau = 5\). The other chains for the hierarchical method produce very similar results to one another, while the effect of fixing the length scale to be too short is apparent in the figures for the non-hierarchical method.

In Fig. 15 we see approximations to \(\text {Var}(F(u,\tau ))\) under the different posteriors. In both cases, variance is highest around the boundaries of the two inclusions. The difference between the hierarchical and non-hierarchical methods is more apparent here, with higher variance between the two inclusions when the length scale is fixed to be too short.

Again, we look at the level set function u itself in Fig. 16. In these plots, as before, we rescale the level set function by \(\tau ^{\alpha -d/2} = \tau ^4\) so that the fields are all of approximately the same amplitude. As in the previous experiments, there is noticeable contrast between the means for the hierarchical and non-hierarchical methods, and yet more contrast between the typical samples.

Finally, in Fig. 17, we show the posterior densities on the inverse length scale and the first five KL modes, as well as correlations between them. As with the groundwater flow example, although there is no “true” inverse length scale, the data are sufficiently informative to define a small range of values for this parameter under the posterior.

5 Conclusions

The level set method is an attractive approach to inverse problems for the detection of interfaces. Furthermore, the Bayesian approach is particularly desirable when there is a need to quantify uncertainty. In this paper we have shown that Bayesian level set inversion is considerably enhanced by a hierarchical approach in which the length scale of the underlying level set function is inferred from the data. We have demonstrated this by means of three examples of interest arising in, respectively, the information, physical and medical sciences; however, many potential applications remain to be explored and this provides an interesting avenue for future work.

We also developed the theoretical underpinnings for our hierarchical method. Our work is based on a Metropolis-within-Gibbs approach which alternates between updating the level set function and the length scale. The Metropolis method we use for the level set field update does not use derivatives of the log-likelihood, and could be improved by doing so, for instance using the infinite-dimensional variants of MALA and HMC (which use first-derivative information; see the citations in Cotter et al. (2013)) or the manifold MALA and HMC methods, which use higher-order derivatives Girolami and Calderhead (2011). Another interesting direction for future work is the design of methods with more informed proposals which exploit correlations between the level set function and its length scale. Finally, it would be interesting to consider pseudo-marginal methods to sample the hierarchical parameter alone, as in Filippone and Girolami (2014).

Assuming independence under the prior, it would require little further work to treat the thresholding levels \(\{c_i\}\) and the values of the thresholded function \(\{\kappa _i\}\) as part of the inference as well; we omitted this here for the sake of clarity. Such a model may be more realistic, and numerical studies of such models may prove interesting. Another extension of interest may be to place a hyperprior upon the regularity parameter also, which may be useful for improving rates of convergence van der Vaart and van Zanten (2009). This is a more challenging task, again related to singularity of measures. The paper Agapiou et al. (2014) discusses ways in which this may be done; however, it is still an open question in terms of theory and optimal algorithms. Additionally, it may be of interest to overcome the restriction of the ordering of phases \(\{\kappa _i\}\) by means of a vector level set method Tai and Chan (2004).

Finally we mention that the use of a single length scale within an isotropic prior is a simple example of more sophisticated hierarchical approaches which attempt to learn non-stationary and non-isotropic features of the level set function from the data Calvetti and Somersalo (2007), Calvetti and Somersalo (2008). This provides an interesting opportunity for future work, and for ideas from machine learning to play a role in the solution of inverse problems for interfaces.