1 Introduction

Prediction of flow and transport in subsurface reservoirs is typically fraught with diverse types of uncertainties, including imperfect knowledge of the spatial distribution of system parameters, types of boundary conditions and their values, as well as forcing terms (e.g. Lin et al. 2010; Tartakovsky et al. 2012; Tartakovsky 2013 and references therein). All these uncertainties should be appropriately considered and their impact on the quality of model predictions needs to be quantified in a rigorous way. These requirements should also be compatible with the operational challenges associated with the analysis and management of complex settings such as those characterizing natural aquifer systems.

Bayesian inference is a convenient and flexible theoretical framework within which all these issues can be tackled. Bayesian approaches enable one to incorporate available data from diverse sources into a stochastic model inversion, starting from prior information. The latter is then updated through conditioning on observations to yield posterior probability distributions of system parameters and responses. Recent applications of Bayesian characterization of uncertain parameter fields in subsurface flow and transport settings can be found, among others, in Rubin et al. (2010), Murakami et al. (2010), Chen et al. (2012), and Over et al. (2013) and references therein.

The application of the Bayesian framework to (stochastic) inverse modeling of groundwater flow typically requires multiple forward solutions of the mathematical model governing the spatial and temporal evolution of the system. The Markov Chain Monte Carlo (MCMC) method is one of the most widely employed approaches in the context of porous media characterization. MCMC has been applied with varying degrees of success in hydrogeology for stochastic model calibration and uncertainty quantification (e.g., Vrugt et al. 2003, 2008; Zanini and Kitanidis 2009; Keating et al. 2010; Schoups and Vrugt 2010; Huard et al. 2010; Zheng and Han 2016). Shi et al. (2012) employed MCMC for vadose zone characterization and compared the results against those obtained through a nonlinear regression method. These authors found that MCMC (a) produces results of higher fidelity and (b) is computationally more advantageous than nonlinear regression for problems with a parameter space of relatively small dimensionality.

Routine application of MCMC to stochastic inverse groundwater flow modeling under realistic conditions is hampered by practical challenges due to the usually high dimensionality of the parameter space. Parameterizing the spatially heterogeneous distribution of model attributes, such as system transmissivity, via the truncated Karhunen–Loève expansion (KLE) (Loeve 1977) is a viable strategy to alleviate this difficulty. In essence, the Karhunen–Loève representation of a random spatial field is based on the spectral expansion of the process covariance function. This approach has been broadly used (Li and Cirpka 2006; Efendiev et al. 2006; Marzouk and Najm 2009; Ray et al. 2012; Laloy et al. 2013; Mara et al. 2015), mainly because it enables one to reduce the dimensionality of the problem while preserving, to a prescribed degree, the key characteristics of the considered stochastic model (Marzouk and Najm 2009). The KLE has recently been used by Das et al. (2010) in conjunction with the MCMC technique to characterize the saturated hydraulic conductivity of a mildly heterogeneous agricultural field. These authors rely on a truncated KLE, retaining only a reduced number of terms (or modes) in the expansion.

The number of terms that enables the truncated KLE to provide a computationally affordable and accurate system representation depends on the functional format of the covariance function (e.g., exponential, Gaussian, spherical, or other) as well as on the degree of spatial persistence, or correlation, of the field. The magnitude of the eigenvalues of the covariance function tends to decay rapidly for heterogeneous fields characterized by large correlation scales (relative to a characteristic length scale of the flow domain). In these cases, retaining fewer than 20 terms in the KLE typically captures more than 90% of the energy of the target spatial random field (Das et al. 2010). Conversely, the number of terms to be retained in the KLE to achieve an appropriate representation of a random parameter field increases as the correlation scale of the covariance function decreases. This can severely limit the effectiveness of the technique for heterogeneous fields with short-range correlation (relative to the domain size).

In this work, we focus on such strongly heterogeneous fields, for which Bayesian inference becomes highly challenging and computationally demanding due to the large number of terms that must be retained in the KLE. The main objective of this work is to develop an operational strategy that renders the MCMC method computationally affordable for the stochastic characterization of random parameter fields with short-range correlation. Our strategy is data-driven and is based on decomposing the stochastic inverse modeling procedure of fully saturated groundwater flow into the following two steps:

  1. Starting from a highly parameterized system, a set of sparse KLEs is formed by progressively reducing the dimensionality of the parameter space. For each KLE, the Maximum a Posteriori (MAP) estimate of the eigenmodes in the expansion is obtained through inverse modeling of flow (against available observations of the system state, i.e., hydraulic heads or fluxes, and, optionally, of system parameters, i.e., hydraulic conductivity/transmissivity). Once this MAP estimate is obtained, a new sparse KLE is constructed by removing the least influential components of the expansion via an analysis of the spatial variance of the resulting estimated field.

  2. A model selection criterion is employed to select the optimal sparse KLE, as driven by the available data. The posterior statistical distribution of the corresponding eigenmodes is then obtained, relying on the DREAM(ZS) MCMC sampler developed by Laloy and Vrugt (2012).

The work is organized as follows: Sect. 2 introduces the flow problem and Sect. 3 the Karhunen–Loève decomposition. In Sect. 4, we detail the way the Bayesian inference is performed for a stochastic field of the kind we consider in our computational example. Section 5 summarizes the main elements of the information criterion we employ for model selection. Section 6 illustrates our strategy to achieve dimensionality reduction of the parameter space. Section 7 is devoted to the presentation of an application of our technique to the stochastic inversion of flow through a strongly heterogeneous random porous medium. The key findings are then summarized in the conclusions.

2 The flow model

We consider two-dimensional steady-state fully saturated groundwater flow taking place within a spatially bounded domain, D, governed by

$$\begin{cases} \nabla \cdot \left( T(\varvec{x}) \nabla h(\varvec{x}) \right) = 0, & \varvec{x} \in D \\ h(\varvec{x}) = h_{0}, & \varvec{x} \in \partial D_{1} \\ \left( -T(\varvec{x}) \nabla h(\varvec{x}) \right) \cdot \varvec{\eta}_{\partial D_{2}} = g_{0}, & \varvec{x} \in \partial D_{2} \end{cases}$$
(1)

Here, \(\varvec{x} = (x, y)\) is the vector of spatial coordinates; \(h(\varvec{x})\) [L] and \(T(\varvec{x})\) [L\(^{2}\)T\(^{-1}\)] respectively are the hydraulic head and transmissivity fields; Dirichlet and Neumann boundary conditions, corresponding to a prescribed hydraulic head, \(h_{0}\), or normal flux, \(g_{0}\), are respectively imposed along the (disjoint) boundary segments \(\partial D_{1}\) and \(\partial D_{2}\), which together form the domain boundary \(\partial D\); \(\varvec{\eta}_{\partial D_{2}}\) is the outward unit vector normal to \(\partial D_{2}\).

Given the spatial distribution of \(T\left( {\varvec{x}} \right)\), the numerical solution of the forward problem (1) is performed through the mixed-hybrid finite element method (Younes et al. 2010) upon discretizing D with uniform square elements.
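The paper solves (1) with the mixed-hybrid finite element method of Younes et al. (2010). For readers who want to experiment, the sketch below swaps in a much simpler cell-centred finite-volume scheme (not the authors' method) on a uniform square mesh, assuming permeameter-like conditions: prescribed heads on the left and right sides (playing the role of \(\partial D_1\)) and no-flow (\(g_0 = 0\)) on the top and bottom (\(\partial D_2\)). Face transmissivities use the harmonic mean of the two adjacent cells; the function name `solve_steady_flow` is hypothetical.

```python
import numpy as np

def solve_steady_flow(T, h_left=1.0, h_right=0.0):
    """Cell-centred finite-volume solution of div(T grad h) = 0 on a grid of
    square cells. T has shape (ny, nx); heads are prescribed on the left and
    right sides, and the top and bottom sides are no-flow."""
    ny, nx = T.shape
    n = nx * ny
    A = np.zeros((n, n))
    b = np.zeros(n)

    def idx(i, j):
        return i * nx + j  # row i runs along y, column j along x

    for i in range(ny):
        for j in range(nx):
            p = idx(i, j)
            for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ii, jj = i + di, j + dj
                if 0 <= ii < ny and 0 <= jj < nx:
                    # interior face: harmonic mean of the two cell values;
                    # with square cells the face-length/distance ratio is 1
                    t = 2.0 * T[i, j] * T[ii, jj] / (T[i, j] + T[ii, jj])
                    A[p, p] += t
                    A[p, idx(ii, jj)] -= t
                elif jj < 0 or jj >= nx:
                    # Dirichlet side: distance to the boundary is half a cell
                    t = 2.0 * T[i, j]
                    A[p, p] += t
                    b[p] += t * (h_left if jj < 0 else h_right)
                # top/bottom sides: zero normal flux, no contribution
    return np.linalg.solve(A, b).reshape(ny, nx)
```

For homogeneous T the scheme reproduces the linear head profile exactly, which is a convenient sanity check. A dense solve is used only for brevity; a sparse solver would be needed for a mesh of realistic size.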

Observations of \({h}\left( \varvec{x} \right)\) and \(T\left( {\varvec{x}} \right)\) are assumed to be jointly available at a set of M points \({\varvec{x}}_{i} = \left( {x_{i} ,y_{i} } \right)\) (i = 1, 2,…, M) within D. We collect these data into the observation vector m. For the purpose of our demonstration we assume that the functional format of the covariance of \(Y\left( {\varvec{x}} \right) = \text{log}\left( {T\left( {\varvec{x}} \right)} \right)\) is deterministically known together with its parameters. We consider log-transmissivity Y as a Gaussian field that can be represented by its Karhunen–Loève expansion (Loeve 1977).

3 Karhunen–Loève expansion

Let \(Y\left( {\varvec{x},\omega } \right) = \text{log}\left( {T\left( {\varvec{x},\omega } \right)} \right)\) be a Gaussian random process, where \(\varvec{x} \in {D}\) and \(\omega \in \Omega\) (\(\Omega\) being a suitable probability space). One can characterize Y through its mean, \(\mu_{Y}\), and two-point covariance function, \({\text{C}}_{\text{Y}}\)(x, x′), between locations x and x′. Covariance \({\text{C}}_{\text{Y}}\) is bounded, symmetric, and positive definite (assuming that \(Y \in L^{2} \left( D \right),\,\forall {\varvec{x}} \in D\)). The Karhunen–Loève expansion (KLE) of the random field \(Y\left( {{\varvec{x}},\omega } \right)\) is defined as

$${\text{Y}}\left( {\varvec{x},\omega } \right) \equiv \mu_{Y} + \sum\limits_{i = 1}^{ + \infty } {\sqrt {\lambda_{i} } \xi_{i} \left( \omega \right)\varphi_{i} \left( \varvec{x} \right)}$$
(2)

Here, \(\lambda_{i}\) and \(\varphi_{i} \left( {\varvec{x}} \right)\) respectively are eigenvalues and eigenfunctions of \({\text{C}}_{\text{Y}}\)(x, x′), \(\left\{ {\xi_{i} } \right\}_{i = 1}^{\infty }\) being a set of statistically independent standard normal random variables. According to Mercer’s theorem (Mercer, 1909) \({\text{C}}_{\text{Y}}\)(x, x′) can be decomposed as

$${\text{C}}_{\text{Y}} \left( {\varvec{x},\varvec{x^{\prime}}} \right) = \sum\limits_{i = 1}^{\infty } {{\lambda}_{i} \varphi_{i} \left( \varvec{x} \right)\,} \varphi_{i} \left( {\varvec{x^{\prime}}} \right)$$
(3)

where \(\lambda_{i}\) and \(\varphi_{i} \left( {\varvec{x}} \right)\) are obtained by solving the following Fredholm equation

$$\int\limits_{D} {{\text{C}}_{Y} \left( {\varvec{x},\varvec{x^{\prime}}} \right)\varphi_{i} \left( {\varvec{x^{\prime}}} \right)\text{d}\varvec{x^{\prime}}} = \lambda_{i} \varphi_{i} \left( \varvec{x} \right).$$
(4)

The eigenfunctions \(\left\{ {\varphi_{i} \left( \varvec{x} \right)} \right\}_{i = 1}^{\infty }\) are orthonormal and form a complete basis in \(L^{2} \left( D \right)\), i.e.,

$$\int\limits_{D} {\varphi_{i} \left( \varvec{x} \right)\varphi_{j} \left( \varvec{x} \right)\text{d}\varvec{x}} = {\delta}_{{ij}}$$
(5)

\({\delta}_{{ij}}\) being the Kronecker delta.

The separability assumption is often adopted for the covariance function of Y in stochastic analyses of flow and transport in randomly heterogeneous porous and/or fractured formations. This assumption has enabled analytical solutions for key moments of hydraulic heads and fluxes and of contaminant transport, and it facilitates basic studies of uncertainty propagation in such random porous and fractured media (see, e.g., Dagan 1989; Zhang 2002, and references therein). Adopting this simplified format also has the practical advantage that the model parameters can be estimated relatively straightforwardly from the type and quantity of data that are typically available (see, e.g., Gneiting et al. 2007; Genton 2007). In the following, we assume that the covariance function of \({Y}\left( {\varvec{x},\omega } \right)\) has the separable exponential form

$${\text{C}}_{\text{Y}} \left( {\varvec{x},\varvec{x^{\prime}}} \right) = \sigma^{2} \exp \left( { - \frac{{\left| {x - x^{\prime}} \right|}}{\eta } - \frac{{\left| {y - y^{\prime}} \right|}}{\eta }} \right)$$
(6)

where \(\sigma_{{}}^{2}\) and \(\eta\) respectively are the variance and correlation length of Y. The eigenvalues \(\lambda_{\,i}\) and corresponding eigenfunctions appearing in (2)–(5) can be readily computed (Zhang and Lu 2004) by solving a system of two coupled algebraic equations. In the most general case, the eigenvalue problem (4) is solved numerically (e.g., Phoon et al. 2002). Note that other models could be employed for the representation of \({\text{C}}_{\text{Y}}\), including, e.g., the Modified Exponential and the Spartan covariance (e.g. Spanos et al. 2007; Tsantili and Hristopulos 2016; Su and Lucor 2006), which might require a smaller number of KL terms than the exponential covariance (Spanos et al. 2007).
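As an illustration of the numerical route mentioned above, the sketch below assumes a Nyström (midpoint quadrature) discretization of the one-dimensional kernel \(\exp(-|x-x'|/\eta)\) on a uniform grid and then exploits separability: the two-dimensional eigenvalues are \(\sigma^2\lambda^x_i\lambda^y_j\) and the eigenfunctions are the products \(\varphi^x_i(x)\varphi^y_j(y)\). This is not the semi-analytical procedure of Zhang and Lu (2004); function names and grid resolution are illustrative choices.

```python
import numpy as np

def kl_eigenpairs_1d(L, eta, n):
    """Nystrom (midpoint-rule) approximation of the 1D eigenproblem
    int_0^L exp(-|x - x'| / eta) phi(x') dx' = lam * phi(x)."""
    dx = L / n
    x = (np.arange(n) + 0.5) * dx                  # cell midpoints
    C = np.exp(-np.abs(x[:, None] - x[None, :]) / eta)
    lam, V = np.linalg.eigh(C * dx)                # symmetric matrix -> eigh
    order = np.argsort(lam)[::-1]                  # largest eigenvalue first
    lam, V = lam[order], V[:, order]
    phi = V / np.sqrt(dx)                          # so that sum(phi_i**2) * dx = 1
    return x, lam, phi

def kl_eigenvalues_2d(L, eta, sigma2, n):
    """Separable covariance (6) on a square of side L: the 2D eigenvalues are
    sigma2 * lam_x[i] * lam_y[j] (here lam_x == lam_y), sorted decreasingly."""
    _, lam1, _ = kl_eigenpairs_1d(L, eta, n)
    lam2 = sigma2 * np.outer(lam1, lam1).ravel()
    return np.sort(lam2)[::-1]
```

Because the trace of the discretized 1D operator equals L, the computed 2D eigenvalues sum to \(\sigma^2 L^2\), the discrete counterpart of (8).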

As shown in Zhang and Lu (2004), values \(\lambda_{\,i}\) monotonically decrease at the rate of \(1/i^{2}\). One can then approximate \({Y}\left( {\varvec{x},\omega } \right)\) by considering a finite number of terms in (2), i.e.,

$$Y\left( {\varvec{x},\omega } \right) \approx \mu_{Y} + \sum\limits_{i = 1}^{K} {\sqrt {\lambda_{i} } \xi_{i} \left( \omega \right)\varphi_{i} \left( \varvec{x} \right)}$$
(7)

with \(\varvec{\xi}\sim \text{N}\left( {0,\mathbf{I}_{K} } \right)\), \(\mathbf{I}_{K}\) being the identity matrix of size K. We note that

$$\sum_{i = 1}^{+\infty} \lambda_{i} = \bar{D}\,\sigma^{2}$$
(8)

\(\bar{D}\) being a measure of the area of the domain. Hence, the number of terms to be retained in (7) can be selected in a way that the ratio

$$e\left( K \right) = \frac{{\sum\nolimits_{i = 1}^{K} {\lambda_{i} } }}{{\sum\nolimits_{i = 1}^{\infty } {\lambda_{i} } }}$$
(9)

is larger than a given threshold. In our computational examples we follow Das et al. (2010) and set \(e\left( K \right)\) > 0.90, which captures more than 90% of the variance of Y. The number of terms to be retained in (7) depends on the correlation length of the covariance function of Y, small values of \(\eta\) usually corresponding to large values of K. As such, strongly heterogeneous stochastic fields, which are associated with high variance and/or small correlation lengths, pose a clear challenge for an effective representation grounded on the KLE.
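Criterion (9) can be applied as sketched below, assuming the eigenvalues are available in an array (for instance from a numerical eigen-solver). Because only finitely many eigenvalues are ever computed, the sketch takes the denominator of (9) from (8), i.e., \(\bar{D}\sigma^2\), rather than from the sum of the computed eigenvalues; the helper names are hypothetical.

```python
import numpy as np

def energy_ratio(lam, total):
    """e(K) of Eq. (9) for K = 1..len(lam); `total` is the full sum of the
    eigenvalues, equal to D_bar * sigma^2 by Eq. (8)."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    return np.cumsum(lam) / total

def smallest_K(lam, total, threshold=0.90):
    """Smallest K with e(K) > threshold, or None if the supplied eigenvalues
    never reach the threshold."""
    e = energy_ratio(lam, total)
    hits = np.nonzero(e > threshold)[0]
    return int(hits[0]) + 1 if hits.size else None
```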

The forward problem is tackled by solving (1) for several realizations of the Y spatial field. These are obtained by evaluating (7) after sampling the random vector \(\left\{ {\xi_{i} } \right\}_{i = 1}^{K}\) from the standard multivariate Gaussian distribution. The way the randomness of Y propagates to the output of the flow model can then be analyzed through numerical Monte Carlo simulations. In the context of a stochastic inverse problem, one is mainly interested in characterizing a collection of Y fields that are consistent with the observations grouped in vector \(\varvec{m}\). When the stochastic inverse problem is set in a Bayesian framework, the posterior (updated) probability density function (pdf) of the field \({Y}\left( {\varvec{x},\omega } \right)\) is inferred on the basis of available data and prior knowledge about the system.
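Drawing realizations from (7) amounts to one matrix-vector product per draw. The sketch assumes that the retained eigenfunctions have already been evaluated at the grid points and stored column-wise in `phi` (a hypothetical layout); each column of the output is one realization, which could then be passed to a forward solver of (1) for Monte Carlo uncertainty propagation.

```python
import numpy as np

def sample_Y(mu, lam, phi, n_real, rng):
    """Draw realisations of Eq. (7): Y = mu + sum_i sqrt(lam_i) xi_i phi_i,
    with xi ~ N(0, I_K). phi has shape (n_points, K)."""
    K = len(lam)
    xi = rng.standard_normal((K, n_real))            # one column per realisation
    return mu + phi @ (np.sqrt(lam)[:, None] * xi)   # shape (n_points, n_real)
```

With many draws, the sample covariance of the realizations approaches the truncated form of (3), \(\sum_{i\le K}\lambda_i\varphi_i(\varvec{x})\varphi_i(\varvec{x}')\).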

4 Bayesian inference and Markov Chain Monte Carlo (MCMC) sampling

Characterizing the posterior pdf of \({Y}\left( {\varvec{x},\omega } \right)\) in the context of Bayesian inference is tantamount to assessing the joint posterior pdf of the entries of the random vector \(\varvec{\xi}= \left\{ {\xi_{i} } \right\}_{i = 1}^{K}\). The conditional posterior distribution of \({\varvec{\xi}}\) is defined as

$$\, p\left( {\varvec{\xi}\left| \varvec{m} \right.} \right) \propto p\left( {\varvec{m}\left|\varvec{\xi}\right.} \right)p\left(\varvec{\xi}\right)$$
(10)

Here, \(p\left( {{\varvec{m}}|\varvec{\xi}} \right)\) is the likelihood function and \(p\left( {\varvec{\xi}} \right)\) is the prior probability density function of \({\varvec{\xi}}\), which encapsulates any prior knowledge about the log-transmissivity field. As stated in Sect. 3, we consider \({Y}\left( {\varvec{x},\omega } \right)\) as a Gaussian process with the covariance function defined in (6). It then follows that the prior of \({\varvec{\xi}}\) is the standard multivariate normal distribution, i.e., \(\varvec{\xi} \sim \text{N}\left( {0,{\mathbf{I}}_{K} } \right)\).

The conditional posterior distribution (10) can be characterized through diverse numerical methods. Markov Chain Monte Carlo (MCMC) samplers are particularly suited for this task. Several MCMC algorithms have been proposed in the literature (e.g., Haario et al. 2001; Green and Mira 2001; ter Braak and Vrugt 2008; Vrugt et al. 2009a; Laloy and Vrugt 2012), all of which rely on the Metropolis–Hastings algorithm. In the latter, at the ith iteration a candidate value \(\varvec{\xi}^{i}\) is generated from a proposal distribution \(q\left( {\varvec{\xi}^{i} \left| {\varvec{\xi}^{i - 1} } \right.} \right)\) conditioned on the current state \(\varvec{\xi}^{i - 1}\). The candidate is accepted or rejected on the basis of the associated Hastings ratio, defined as

$$\alpha = \min \left( 1,\frac{p\left( \varvec{\xi}^{i} \left| \varvec{m} \right. \right) q\left( \varvec{\xi}^{i-1} \left| \varvec{\xi}^{i} \right. \right)}{p\left( \varvec{\xi}^{i-1} \left| \varvec{m} \right. \right) q\left( \varvec{\xi}^{i} \left| \varvec{\xi}^{i-1} \right. \right)} \right)$$
(11)
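A bare-bones random-walk Metropolis sampler illustrates (11). This is a generic textbook sketch, not DREAM(ZS): with a symmetric Gaussian proposal the two \(q\) terms cancel, and the ratio is evaluated in log space to avoid underflow. The `log_post` callable (the unnormalized log of (10)) and the step size are assumptions supplied by the user.

```python
import numpy as np

def log_hastings_ratio(log_post_new, log_post_old, log_q_old_given_new, log_q_new_given_old):
    """Log of the ratio inside min(1, .) in Eq. (11)."""
    return (log_post_new + log_q_old_given_new) - (log_post_old + log_q_new_given_old)

def random_walk_metropolis(log_post, xi0, n_iter, step, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal, so the
    proposal terms cancel in the Hastings ratio."""
    xi = np.asarray(xi0, dtype=float)
    lp = log_post(xi)
    chain = np.empty((n_iter, xi.size))
    n_acc = 0
    for it in range(n_iter):
        cand = xi + step * rng.standard_normal(xi.size)
        lp_cand = log_post(cand)
        log_alpha = min(0.0, log_hastings_ratio(lp_cand, lp, 0.0, 0.0))
        if np.log(rng.uniform()) < log_alpha:    # accept with probability alpha
            xi, lp = cand, lp_cand
            n_acc += 1
        chain[it] = xi                           # rejected moves repeat the state
    return chain, n_acc / n_iter
```

For the KLE problem, `log_post` would combine the log-likelihood of the data in \(\varvec{m}\) with the standard normal log-prior of \(\varvec{\xi}\), as in (10).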

Convergence of the chain to the target distribution, i.e., \(p\left( {\varvec{\xi}|\varvec{m}} \right)\), is typically achieved after a burn-in period. Considerable research efforts to improve the efficiency of MCMC samplers have focused on reducing the burn-in period (see, e.g., Haario et al. 2001; Green and Mira 2001; Vrugt et al. 2009a, among others). The choice of the proposal distribution \(q\left( {\cdot\left| \cdot \right.} \right)\) and of the updating strategy is key to speeding up convergence of the algorithm. A common strategy to accelerate convergence of the MCMC sampler relies on characterizing the modes of the posterior pdf \(p\left( {\varvec{\xi}|\varvec{m}} \right)\) (Vrugt and Bouten 2002). Assuming a unimodal pdf, the mode corresponds to the Maximum A Posteriori (MAP) value, defined as

$$\varvec{\xi}^{MAP} = \mathop{\arg\max}\limits_{\varvec{\xi}} \; p\left( \varvec{\xi}|\varvec{m} \right)$$
(12)

The MAP characterization enables the MCMC sampler to be initialized approximately around the most likely values associated with the posterior distribution of the model parameter set (Vrugt and Bouten 2002).
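The MAP search used later in the procedure relies on the Levenberg–Marquardt algorithm. The sketch below is a minimal version under stated assumptions: Gaussian measurement noise with a single standard deviation `sigma`, the standard normal prior of \(\varvec{\xi}\) from this section, a forward-difference Jacobian, and a simple multiply/divide rule for the damping parameter. The forward model is a user-supplied callable standing in for a solver of (1); all names are hypothetical.

```python
import numpy as np

def fd_jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian of a vector-valued function."""
    f0 = fun(x)
    J = np.empty((f0.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (fun(x + dx) - f0) / eps
    return J

def map_levenberg_marquardt(forward, data, sigma, xi0, n_iter=50, mu=1e-2):
    """Minimise -log p(xi|m) = 0.5*||(data - forward(xi))/sigma||^2 + 0.5*||xi||^2
    (Gaussian noise, xi ~ N(0, I)) by Levenberg-Marquardt on the stacked residual."""
    def resid(xi):
        return np.concatenate([(data - forward(xi)) / sigma, xi])

    xi = np.asarray(xi0, dtype=float)
    cost = 0.5 * resid(xi) @ resid(xi)
    for _ in range(n_iter):
        r = resid(xi)
        J = fd_jacobian(resid, xi)
        A = J.T @ J
        # Marquardt-scaled damped Gauss-Newton step
        step = np.linalg.solve(A + mu * np.diag(np.diag(A)), -J.T @ r)
        new = xi + step
        new_cost = 0.5 * resid(new) @ resid(new)
        if new_cost < cost:          # accept and move towards Gauss-Newton
            xi, cost, mu = new, new_cost, mu * 0.3
        else:                        # reject and move towards gradient descent
            mu *= 10.0
    return xi
```

The returned estimate can then serve as the starting point of the MCMC chains.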

Here, we employ the DREAM(ZS) software to generate samples from the conditional posterior distribution of \(\varvec{\xi}\) (Laloy and Vrugt 2012). This adaptive algorithm runs multiple chains in parallel to explore the parameter space. Vrugt et al. (2009b) compared the DREAM algorithm with the generalized likelihood uncertainty estimation (GLUE) method. A key feature of DREAM(ZS) is that it generates candidates by sampling from an archive of past states collected in a sample Z. Thus, only a few parallel chains are required for posterior sampling, and the burn-in period is markedly reduced. The efficiency of the algorithm has been successfully tested on several high-dimensional, complex, and nonlinear problems. These studies pointed out that the computational effort can be demanding when the process model requires long simulation times. In these instances one can reduce computational costs either by resorting to a surrogate model of the process considered (Kennedy and O’Hagan 2001; Higdon et al. 2008; Cui et al. 2011; Laloy et al. 2013) or by developing a strategy to reduce the dimensionality of the stochastic inverse problem. Here we focus on the latter strategy and explore its effectiveness by way of a suite of computational examples.

5 Model selection criterion

The strong heterogeneity of the domain we consider leads to a KLE with a large number of terms. Inferring the posterior joint pdf (10) through MCMC for such high-dimensional problems is practically unaffordable. It is then desirable to further reduce the dimensionality of the inverse problem before running the MCMC sampler. We propose doing so via a model selection criterion. As an example, here we rely on the Kashyap information criterion, KIC (Kashyap 1982); other alternatives (e.g., AIC (Akaike 1974), AICc (Hurvich and Tsai 1989), or BIC (Schwarz 1978)) are fully compatible with our procedure.

The expression for KIC is derived from the Bayesian Model Evidence (BME) defined as

$$\, p\left( {\varvec{m}|M_{k} } \right) = \int\limits_{{KL_{k} }} {p\left( {\varvec{m}|M_{k} ,\varvec{\xi}} \right)} p\left( {\varvec{\xi}|M_{k} } \right)\text{d}\varvec{\xi}$$
(13)

where \(\{ M_{k} ,\;k = 1, \ldots ,N_{k} \}\) is a set of competing alternative models and \(M_{k}\) depends on \(KL_{k}\) parameters collected in vector \(\varvec{\xi}\). BME (13) quantifies how likely model \(M_{k}\) is, given the data \({\varvec{m}}\). The competing models we consider in our framework are the candidate KLEs.

The integral in (13) is not straightforward to evaluate analytically, especially for high-dimensional parameter spaces. An approximate form of (13) can be obtained through the Laplace approximation, which assumes that the posterior distribution of the parameters in \(\varvec{\xi}\) is Gaussian and highly peaked around its (local) maximum a posteriori (MAP) estimate \(\varvec{\xi}^{MAP}\). Expanding the logarithm of the integrand in (13) in a Taylor series centered at the MAP, retaining terms up to second order, and integrating the exponential of the resulting quadratic form yields (see Schöniger et al. 2014)

$$\, p\left( {{\varvec{m}}\left| {M_{k} } \right.} \right) = \, p\left( {\varvec{\xi}^{MAP} \left| {M_{k} } \right.} \right)p\left( {{\varvec{m}}\left| {M_{k} ,\varvec{\xi}^{MAP} } \right.} \right)\left( {2\pi } \right)^{K/2} \left| {\mathbf{H}} \right|^{ - 1/2}$$
(14)

where \({\mathbf{H}}\) is the Hessian matrix evaluated at the MAP, usually approximated by the Fisher information matrix \({\mathbf{F}}\). One then defines KIC as

$$KIC_{k} = - 2\ln \left( p\left( \varvec{m}|\varvec{\xi}^{MAP} ,M_{k} \right) \right) - 2\ln \left( p\left( \varvec{\xi}^{MAP}|M_{k} \right) \right) - K \ln ( 2\pi ) + \ln \left( \left| \mathbf{F} \right| \right)$$
(15)

Note that \(p\left( {\left.\varvec{\xi}\right|M_{k} } \right)\) corresponds to the prior assigned to the KL terms denoted in (10) by \(p\left(\varvec{\xi}\right)\) and \(p\left( {{\varvec{m}}|\varvec{\xi},M_{k} } \right)\) is the likelihood denoted \(p\left( {{\varvec{m}}|\varvec{\xi}} \right)\) in (10).
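The KIC can be evaluated directly from quantities available at the MAP. The sketch below assumes Gaussian measurement errors with covariance `Sigma`, a Gaussian prior with covariance `C_prior`, and a Jacobian `J` of the model outputs at the MAP; the Fisher matrix is formed as \(\mathbf{J}^T\varvec{\Sigma}^{-1}\mathbf{J} + \mathbf{C}^{-1}\), and the prior term is evaluated at the MAP, as implied by (14). For a linear forward model the Laplace approximation is exact, so KIC then coincides with \(-2\ln\) of the BME (13), which provides a convenient check.

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """log N(x; 0, cov) for a full covariance matrix."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x @ np.linalg.solve(cov, x) + logdet + x.size * np.log(2 * np.pi))

def kic(residual, Sigma, xi_map, C_prior, J):
    """KIC of Eq. (15); residual = data - model(xi_map), J = Jacobian at xi_map."""
    K = xi_map.size
    F = J.T @ np.linalg.solve(Sigma, J) + np.linalg.inv(C_prior)  # Fisher matrix
    log_lik = gaussian_logpdf(residual, Sigma)                      # at the MAP
    log_prior = gaussian_logpdf(xi_map, C_prior)                    # at the MAP
    return -2 * log_lik - 2 * log_prior - K * np.log(2 * np.pi) + np.linalg.slogdet(F)[1]
```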

6 Strategy for dimensionality reduction of the inverse problem

As stated in Sect. 4, the approach we employ to reduce the dimensionality of the inverse problem relies on representing the Y field via a sparse truncated KL parameterization. The strongly heterogeneous random fields we consider are characterized by a small correlation scale, relative to a characteristic length scale of the flow domain. Values of Y in these fields vary rapidly and irregularly in space rather than smoothly, so that representing them through the KLE still requires a notably high-dimensional parameter space to capture the major details of the underlying field. This constitutes a critical challenge and tends to hamper the effectiveness of characterizing the Y field through Bayesian inference approaches based on MCMC samplers. To alleviate this difficulty, we propose a strategy to further reduce the dimensionality of the parameterization of the problem. We construct models with different degrees of complexity through sparse KLEs and evaluate their performance against the available observations. We associate the degree of complexity of a model with the number of parameters retained in (7). Our model selection strategy is driven by the available information content and is based on model selection criteria of the kind illustrated in Sect. 5, which we employ to identify the eigenmodes (i.e., the parameters) of the sparse KLE that are most influential for the interpretation of the observed data.

We start by recasting the truncated KLE (7) as

$${Y}\left( {\varvec{x},\omega } \right) \approx \mu_{Y} + \sum\limits_{i = 1}^{K} {\theta_{i} \left( \omega \right)\varphi_{i} \left( \varvec{x} \right)}$$
(16)

where \(\theta_{i} = \sqrt {\lambda_{i} } \xi_{i}\), so that the parameter prior is now \(\theta_{i} \sim \text{N}\left( {0,\lambda_{i} } \right)\). Since the eigenfunctions \(\left\{ {\varphi_{i} \left( \varvec{x} \right)} \right\}_{i = 1}^{K}\) are orthonormal within the spatial domain \({{D}}\) [see (5)], (16) provides a variance decomposition of \({Y}\left( {\varvec{x},\omega } \right)\), i.e.,

$$E_{D} \left[ \left( Y(\varvec{x},\omega) - \mu_{Y} \right)^{2} \right] = \frac{1}{\bar{D}}\int_{D} \left( Y(\varvec{x},\omega) - \mu_{Y} \right)^{2} \text{d}\varvec{x} = \frac{1}{\bar{D}}\sum_{i = 1}^{K} \theta_{i}^{2}(\omega)$$
(17)

Note that the spatial variance depends on ω, i.e., on the random realization (or draw) considered. Suppose that the MAP estimate \(\varvec{\theta}_{{}}^{MAP}\) is considered. Then, (17) indicates that \(\left( {\theta_{i}^{MAP} } \right)^{2}\) is a measure of the contribution of the ith eigenmode to the spatial variance of the stochastic field. The key idea underlying the approach is that eigenmodes with negligible contribution to (17) can be discarded from the expansion (16) so that dimensionality reduction of the inverse problem can be achieved. We do so according to the procedure detailed in the following where we assume, for the sake of simplicity, that the posterior pdf (10) is unimodal.
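The decomposition can be checked numerically. The sketch below assumes a synthetic orthonormal basis on a uniform grid (obtained from a QR factorization of random vectors, not actual KL eigenfunctions), scaled so that the discrete analogue of (5) holds. The integral of \((Y-\mu_Y)^2\) over the domain then equals \(\sum_i\theta_i^2\), its spatial mean is that sum divided by \(\bar{D}\), and the entries of `partial_var` are the per-mode contributions used to rank the eigenmodes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, K, dA = 400, 6, 0.25                 # grid cells, modes, cell area
Q, _ = np.linalg.qr(rng.standard_normal((n_cells, K)))
phi = Q / np.sqrt(dA)                         # sum_x phi_i phi_j dA = delta_ij
theta = rng.standard_normal(K) * np.sqrt([3.0, 2.0, 1.0, 0.5, 0.2, 0.1])
mu_Y = -1.0
Y = mu_Y + phi @ theta                        # Eq. (16) on the grid
integral = np.sum((Y - mu_Y) ** 2) * dA       # int_D (Y - mu_Y)^2 dx
D_bar = n_cells * dA                          # domain area
spatial_mean_sq = integral / D_bar            # E_D[(Y - mu_Y)^2]
partial_var = theta ** 2                      # per-mode contribution
ranking = np.argsort(partial_var)[::-1]       # most influential mode first
```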

  1. Start by retaining the first \({K}\) eigenmodes of the covariance function that capture most of the energy of the stochastic process. As an example, in our demonstration we select

    $$\sum_{i = 1}^{K} \frac{\lambda_{i}}{\bar{D}\,\sigma^{2}} \ge 0.90$$
    (18)
  2. Find the maximum a posteriori estimate, \(\varvec{\theta}^{MAP} = \mathop{\arg\max}\limits_{\varvec{\theta}} p\left( \varvec{\theta} \left| \varvec{m} \right. \right)\); here, we do so by relying on the Levenberg–Marquardt (LM; Levenberg 1944; Marquardt 1963) algorithm.

  3. Compute the value of a given model selection criterion. As a reference metric, we consider the KIC (Kashyap 1982) criterion (15), reformulated here as

    $$KIC_{K} = - 2\ln \left( p\left( \varvec{m}|\varvec{\theta}^{MAP} ,K \right) \right) - 2\ln \left( p\left( \varvec{\theta}^{MAP}|K \right) \right) - K\ln \left( 2\pi \right) + \ln \left( \left| \mathbf{F} \right| \right)$$
    (19)

    Here, K indicates the number of terms retained in the KL expansion; \({p}\left( {\varvec{m}\left| {\varvec{\theta}^{MAP} ,{K}} \right.} \right)\) is the likelihood function evaluated at the MAP estimate (Schöniger et al. 2014); \({p}\left( {\left.\varvec{\theta}\right|{{K}}} \right)\) is the prior pdf of the current K KL terms, evaluated at \(\varvec{\theta}^{MAP}\) (recall that \({p}\left( {\theta_{i} |K} \right)\sim {\text{N}}\left( {0,\lambda_{i} } \right)\)); and \(\left| {\mathbf{F}} \right|\) is the determinant of the Fisher information matrix evaluated at the MAP.

  4. Compute the contribution of the ith eigenmode to the spatial variance of the stochastic field, as quantified by the partial variance \(\left( {\theta_{i}^{MAP} } \right)^{2}\) for i = 1,…,K.

  5. Sort the eigenmodes \(\left( \lambda_{i}, \varphi_{i}(\varvec{x}) \right)\) according to their partial variance [from largest to smallest \(\left( {\theta_{i}^{MAP} } \right)^{2}\); see (17)].

  6. Keep the \(K^{new}\) most significant eigenmodes, such that

    $$\sum_{i = 1}^{K^{new}} (\theta_{i}^{MAP})^{2} \Big/ \sum_{i = 1}^{K} (\theta_{i}^{MAP})^{2} \ge 0.90$$
    (20)
  7. If \(K^{new} = 1\), go to step 8 of the procedure; otherwise, set \(K = K^{new}\), construct a new sparse KLE, and go to step 2.

  8. Finally, set \({K}^{\text{opt}} = \mathop {\arg \min }\limits_{K} \left( {{KIC}_{K} } \right)\) and use DREAM(ZS) to sample the sparse KLE coefficients according to the target pdf \({p}\left( {\varvec{\theta} \left| \varvec{m} \right.} \right)\).
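Steps 1 to 7 and the KIC-based choice of step 8 can be prototyped on a toy problem before coupling them to a flow solver. The sketch below assumes a linear forward model \(\varvec{d} = \mathbf{G}\varvec{\theta} + \varvec{e}\) with independent Gaussian noise, so that the MAP (step 2) and the KIC (19) have closed forms instead of requiring the LM algorithm. The matrix `G`, the helper names, and the extra stopping test `k_new == len(keep)` (added here only to guarantee termination; it is not part of the procedure above) are illustrative assumptions, and the DREAM(ZS) sampling of step 8 is not reproduced.

```python
import numpy as np

def map_and_kic(G, d, sigma, lam, keep):
    """Closed-form MAP and KIC (19) for the toy linear model
    d = G[:, keep] @ theta + e, e ~ N(0, sigma^2 I), theta_i ~ N(0, lam_i)."""
    Gk, lk = G[:, keep], lam[keep]
    F = Gk.T @ Gk / sigma**2 + np.diag(1.0 / lk)          # Fisher matrix
    theta = np.linalg.solve(F, Gk.T @ d / sigma**2)       # MAP estimate (step 2)
    r = d - Gk @ theta
    m2_loglik = r @ r / sigma**2 + d.size * np.log(2 * np.pi * sigma**2)
    m2_logprior = np.sum(theta**2 / lk + np.log(2 * np.pi * lk))
    kic = (m2_loglik + m2_logprior - len(keep) * np.log(2 * np.pi)
           + np.linalg.slogdet(F)[1])                     # step 3
    return theta, kic

def sparse_kle_selection(G, d, sigma, lam, K0, frac=0.90):
    keep = np.arange(K0)                                  # step 1: first K0 modes
    history = []
    while True:
        theta, kic = map_and_kic(G, d, sigma, lam, keep)
        history.append((keep.copy(), kic))
        pv = theta**2                                     # step 4: partial variances
        order = np.argsort(pv)[::-1]                      # step 5: largest first
        cum = np.cumsum(pv[order]) / pv.sum()
        k_new = int(np.searchsorted(cum, frac)) + 1       # step 6: criterion (20)
        if k_new == 1 or k_new == len(keep):              # step 7 (+ safeguard)
            break
        keep = keep[order[:k_new]]
    best_keep, best_kic = min(history, key=lambda item: item[1])  # step 8: min KIC
    return best_keep, best_kic, history
```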

Hence, step 8 yields the optimal sparse KLE, selected on the basis of the chosen information criterion (19). The Bayesian inference of the values of the reduced subset of parameters \(\left\{ {\theta_{i}^{{}} } \right\}_{i = 1}^{{K^{opt} }}\) is then performed with the MCMC DREAM(ZS) sampler.

Note that while we assume here that the target pdf (10) is unimodal, the procedure can be extended to the case of multimodal distributions by searching in step 2 for all optimum values obtained using multiple starting points in the LM algorithm.

7 Results and discussion

7.1 Setting of the inverse problem

We analyze and exemplify the performance of our approach through a set of computational studies performed on synthetic systems. We consider a two-dimensional square domain of side L = 10 m discretized with a mesh of 10,000 uniform square elements. The steady-state flow problem described by (1) is solved under permeameter-like boundary conditions, corresponding to uniform-in-the-mean groundwater flow driven by a prescribed head drop. As a test bed for our approach, and following the discussion of Sect. 3, we consider the exponential covariance function (6) with correlation length η/L = 0.1 and variance \(\sigma^{2} = 1\). An unconditional realization of the heterogeneous Y field, which we consider as reference, is generated using the KLE with 400 terms. Figure 1 depicts the cumulative sum of the normalized eigenvalues (9) for this setting. These results suggest that K ≈ 150 terms are required for the KLE to capture about 90% of the system variance.

Fig. 1

Cumulative sum of the normalized eigenvalues [see (9)] for the exponential covariance with η/L = 0.1 and variance \(\sigma_{{}}^{2} = 1\)

The steady-state forward flow problem is then solved for the reference Y field. Values of Y and hydraulic head are jointly sampled at 25 randomly selected locations in the domain and constitute the entries of the observation vector m. We assume that both head and Y measurements are noisy. Measurement errors are taken to be uncorrelated in space and are modeled as zero-mean Gaussian random variables with known standard deviations, denoted as \(\sigma_{h}\) and \(\sigma_{Y}\) for head and Y data, respectively. Figure 2 depicts the reference Y field and the 25 locations at which observations of both Y and hydraulic head are collected in our example.

Fig. 2

Reference log-transmissivity field, \(Y\left( \varvec{x} \right)\). Crosses indicate locations where head and Y values are jointly sampled

Following Bayes’ theorem, the posterior pdf of the KLE modes is given by

$$p\left( \varvec{\theta}\left| \varvec{m},K,\sigma_{h},\sigma_{Y},\mathbf{C} \right. \right) \propto \exp \left( - \frac{SS_{1}(\varvec{\theta})}{2\sigma_{h}^{2}} - \frac{SS_{2}(\varvec{\theta})}{2\sigma_{Y}^{2}} \right) \times \exp \left( -\frac{1}{2}\varvec{\theta}^{T} \mathbf{C}^{-1}\varvec{\theta} \right)$$
(21)

where the superscript T denotes transposition and C is the (diagonal) prior covariance matrix of the KLE modes, defined by

$$\mathbf{C} = \left[ \begin{array}{cccc} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{K} \end{array} \right]$$
(22)

Here, \(SS_{1} \left(\varvec{\theta}\right)\) and \(SS_{2} \left(\varvec{\theta}\right)\) are the sums of squared differences between observed and modeled (i.e., computed with K modes of the KLE) head and Y values, respectively. The measurement error standard deviation of pressure heads is set to \(\sigma_{h} = 0.05\) m, which corresponds to 5% of the largest head variation \(\left( h_{\max} - h_{\min} \right)\) in the domain. Two scenarios, corresponding to different standard deviations of the Y measurement errors, are investigated, i.e., \(\sigma_{Y} = 0.1\) and 0.5, which respectively correspond to 2 and 10% of the largest Y variation \(\left( Y_{\max} - Y_{\min} \right)\) across the domain.
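For illustration, the unnormalized log of the posterior (21) can be evaluated as in the Python sketch below. It assumes that the model predictions at the observation points have already been computed from the K-term KLE and a forward flow solver (not shown), and it exploits the diagonal form of C in (22), so that \(\varvec{\theta}^{T}\mathbf{C}^{-1}\varvec{\theta} = \sum_{i} \theta_{i}^{2}/\lambda_{i}\). The function name and signature are our own.

```python
import numpy as np

def log_posterior(theta, lam, Y_obs, Y_mod, h_obs, h_mod, sigma_Y, sigma_h):
    """Unnormalized log of the posterior (21).

    theta: KLE modes; lam: their prior variances (diagonal of C, Eq. 22).
    Y_mod, h_mod: model predictions at the observation points.
    """
    ss1 = np.sum((np.asarray(h_obs) - h_mod) ** 2)   # SS_1, head misfit
    ss2 = np.sum((np.asarray(Y_obs) - Y_mod) ** 2)   # SS_2, Y misfit
    log_like = -ss1 / (2.0 * sigma_h**2) - ss2 / (2.0 * sigma_Y**2)
    log_prior = -0.5 * np.sum(np.asarray(theta) ** 2 / lam)
    return log_like + log_prior
```

Working in log space avoids the underflow that evaluating the exponentials in (21) directly would cause for realistic misfits.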

Consistent with the assumptions underlying (18), the information matrix F embedded in KIC (19) is given by (Schöniger et al. 2014)

$${\mathbf{F}} = {\mathbf{J}}^{T} {\varvec{\Sigma}}^{ - 1} \,\,{\mathbf{J}} + {\mathbf{C}}^{ - 1}$$
(23)

where J is the Jacobian matrix evaluated at the MAP estimate and \({\varvec{\Sigma}}\) is the measurement-error covariance matrix, defined as

$${\varvec{\Sigma}}\,\, = \left[ {\begin{array}{*{20}c} {\sigma_{Y}^{2} {\mathbf{I}}_{{N_{obs} /2}} } & 0 \\ 0 & {\sigma_{h}^{2} {\mathbf{I}}_{{N_{obs} /2}} } \\ \end{array} } \right] ;$$
(24)

\(N_{obs}\) being the number of data collected in the vector m (here, \(N_{obs} = 50\)) and \({\mathbf{I}}_{N_{obs}/2}\) the identity matrix of size \(N_{obs}/2\).
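The information matrix (23), with the block-diagonal \(\varvec{\Sigma}\) of (24), can be assembled as sketched below. The forward-difference Jacobian helper and the forward-model callable it takes are hypothetical, since the way J is computed is not detailed here. The log-determinant \(\ln|\mathbf{F}|\) is also returned because Kashyap-type criteria typically involve it. Rows of J are assumed to be ordered as Y observations followed by head observations, consistent with (24).

```python
import numpy as np

def fd_jacobian(forward, theta, eps=1e-6):
    """Forward-difference Jacobian of a (hypothetical) forward model."""
    theta = np.asarray(theta, dtype=float)
    f0 = np.asarray(forward(theta))
    J = np.empty((f0.size, theta.size))
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        J[:, k] = (np.asarray(forward(theta + step)) - f0) / eps
    return J

def information_matrix(J, lam, sigma_Y, sigma_h):
    """F = J^T Sigma^{-1} J + C^{-1}, Eqs. (23)-(24); returns F and ln|F|."""
    n_obs = J.shape[0]
    half = n_obs // 2
    # Diagonal of Sigma: first half Y errors, second half head errors
    var = np.concatenate([np.full(half, sigma_Y**2), np.full(n_obs - half, sigma_h**2)])
    F = J.T @ (J / var[:, None]) + np.diag(1.0 / np.asarray(lam))
    sign, logdet = np.linalg.slogdet(F)
    return F, logdet
```

Dividing the rows of J by the variances exploits the diagonal structure of \(\varvec{\Sigma}\) and avoids forming its inverse explicitly.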

We remark that Bayesian inversion with MCMC using the KLE of the Y field with K = 150 terms, which captures approximately 90% of the variance associated with the postulated exponential covariance function (6), proved computationally unaffordable because of the large number of parameters. The following section illustrates our application of the dimensionality reduction strategy described in Sect. 6.

7.2 KLE with dimensionality reduction

We apply the model reduction strategy described in Sect. 6, starting from the KLE with K = 150. The components of the MAP vector \(\varvec{\theta}^{MAP}\) are estimated through the LM algorithm, and the corresponding value of KIC (19) is computed following steps 1–4 of the algorithm described in Sect. 6. The algorithm is iterated until only one term remains in the sparse KLE. This screening phase required about 370 model calls and is computationally inexpensive compared with the cost of MCMC sampling (around 50,000 model calls).
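One possible organization of the screening phase is a backward elimination, sketched below in Python. The three callables are hypothetical placeholders: `fit_map` stands for the LM-based MAP estimation, `criterion` for KIC (19), and `relevance` for the mode-ranking measure (17), none of which is reproduced here. For simplicity the sketch drops one mode per step; the actual number of modes removed per iteration is governed by the algorithm of Sect. 6.

```python
import numpy as np

def screen_modes(modes, fit_map, criterion, relevance):
    """Backward elimination over KLE modes (one mode dropped per step).

    fit_map(active)            -> MAP estimate of the active modes
    criterion(active, theta)   -> model selection score, lower is better
    relevance(active, theta)   -> one relevance value per active mode
    All three are hypothetical placeholders for the steps of Sect. 6.
    """
    active = list(modes)
    history = []
    while active:
        theta_map = fit_map(active)
        history.append((list(active), criterion(active, theta_map)))
        if len(active) == 1:
            break
        # Remove the least relevant mode and refit on the next pass
        active.pop(int(np.argmin(relevance(active, theta_map))))
    best_set, best_score = min(history, key=lambda item: item[1])
    return best_set, best_score, history
```

Recording the full history allows plotting the criterion against the number of retained modes, as in Fig. 3.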

Figure 3 depicts the dependence of KIC on the number of modes (1 ≤ K ≤ 150) retained in the sparse KLE, as obtained by applying the reduction procedure described in Sect. 6. KIC attains its minimum when only 19 or 12 components of the sparse KLE are retained, for \(\sigma_{Y} = 0.1\) and 0.5, respectively. In other words, the information content of the available noisy measurements allows identifying a sparse KLE representation of the Y field based on a reduced number of components, i.e., K = 19 or 12 in the cases analyzed. This result is consistent with the general notion that fewer parameters are needed to interpret data affected by larger measurement errors. We note that results of similar quality are obtained with other criteria, such as AIC (Akaike 1974) or BIC (Schwarz 1978) (not shown). Sorted in order of importance, the modes retained at the optimum are identified by the sets of indices {i = 2, 17, 21, 49, 7, 38, 69, 8, 28, 79, 41, 33, 36, 20, 40, 80, 78, 13, 10} and {i = 2, 8, 36, 49, 17, 30, 21, 79, 38, 122, 129, 6} for \(\sigma_{Y} = 0.1\) and 0.5, respectively. We recall that modes are selected and ranked according to their relevance [see (17) and step 5 of the reduction algorithm].
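For comparison with KIC, the standard AIC and BIC are obtained from \(-2\ln L\), evaluated at the estimated parameters, by adding \(2K\) and \(K\ln N_{obs}\), respectively. The sketch below uses the Gaussian likelihood implied by (21) with known \(\sigma_{h}\) and \(\sigma_{Y}\); the function names are ours.

```python
import numpy as np

def neg2_loglik(ss1, ss2, sigma_h, sigma_Y, n_h, n_Y):
    """-2 ln L for independent Gaussian errors with known standard deviations."""
    return (ss1 / sigma_h**2 + ss2 / sigma_Y**2
            + n_h * np.log(2.0 * np.pi * sigma_h**2)
            + n_Y * np.log(2.0 * np.pi * sigma_Y**2))

def aic(m2ll, k):
    """Akaike information criterion for k estimated parameters."""
    return m2ll + 2.0 * k

def bic(m2ll, k, n_obs):
    """Bayesian information criterion with n_obs data points."""
    return m2ll + k * np.log(n_obs)
```

Because BIC penalizes each additional mode by \(\ln N_{obs}\) (about 3.9 for \(N_{obs} = 50\)) rather than 2, it tends to favor sparser expansions than AIC.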

Fig. 3 Selection of the optimal number of modes, \(K_{opt}\), based on the KIC model selection criterion (16) for the standard deviations of data measurement errors: (left) \(\sigma_{h} = 0.05\), \(\sigma_{Y}\) = 0.1 and (right) \(\sigma_{h} = 0.05\), \(\sigma_{Y}\) = 0.5

Finally, the resulting Y field parameterizations are employed to estimate the posterior pdf (21) through DREAM(ZS). Figure 4 depicts the inferred posterior marginal pdfs of the first three KL modes in the sets of indices listed above, as obtained from stochastic model calibration via MCMC for the two scenarios examined. These results indicate that the mode values are well identified: their posterior pdfs are unimodal, approximately symmetric, and span a narrow range of values for both values of \(\sigma_{Y}\) considered. Results of similar quality are obtained for the remaining modes retained in these sets (not shown).
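DREAM(ZS) is a multi-chain adaptive sampler whose implementation is beyond the scope of a short example. As a much simpler stand-in, the Python sketch below implements a single-chain random-walk Metropolis sampler that can be applied to any log-posterior such as (21). It is not DREAM(ZS), and its proposal step size must be tuned by hand.

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, step, n_iter=5000, seed=0):
    """Single-chain random-walk Metropolis sampler (not DREAM(ZS)).

    log_post: callable returning the unnormalized log-posterior, e.g. (21).
    step: standard deviation of the isotropic Gaussian proposal.
    Returns the chain (n_iter x dim) and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    accepted = 0
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, exp(lp_prop - lp))
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter
```

In practice an initial burn-in portion of the chain is discarded before marginal pdfs such as those of Fig. 4 are computed.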

Fig. 4 Inferred posterior probability distributions of selected KL eigenmodes after statistical calibration with MCMC for the standard deviations of data measurement errors: (left column) \(\sigma_{h} = 0.05\), \(\sigma_{Y}\) = 0.1 and (right column) \(\sigma_{h} = 0.05\), \(\sigma_{Y}\) = 0.5

Figure 5 depicts the results of the MCMC-based inversion at the measurement locations, for h and Y and for both values of \(\sigma_{Y}\) tested. The 95% uncertainty bounds (corresponding to the 2.5th and 97.5th percentiles of the distributions) representing parametric uncertainty (narrow bounds in the figure) are shown together with the total predictive uncertainty (wide bounds in the figure), the latter accounting for both parametric uncertainty and measurement errors. Virtually all observations lie within the 95% total uncertainty range for both values of \(\sigma_{Y}\). As expected, the total uncertainty of the Y estimates increases with \(\sigma_{Y}\). The parametric uncertainty is slightly larger for \(\sigma_{Y} = 0.1\) than for \(\sigma_{Y} = 0.5\), i.e., for the reduced model with 19 modes than for the one with 12 modes.
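Both kinds of bounds can be computed from posterior predictive samples as sketched below. Parametric bounds are percentiles of the model outputs; total bounds are percentiles of the same outputs perturbed with Gaussian measurement noise of the assumed standard deviation. The array layout (one row per posterior draw, one column per location) and the function name are illustrative assumptions.

```python
import numpy as np

def predictive_bounds(pred_samples, sigma, level=0.95, seed=0):
    """Parametric and total predictive bounds from posterior draws.

    pred_samples: array (n_samples, n_points) of model outputs (h or Y).
    sigma: measurement-error standard deviation added for the total bounds.
    Returns two (2, n_points) arrays: [lower; upper] percentiles.
    """
    rng = np.random.default_rng(seed)
    q = [100 * (1 - level) / 2, 100 * (1 + level) / 2]   # 2.5th and 97.5th
    param = np.percentile(pred_samples, q, axis=0)
    noisy = pred_samples + rng.normal(0.0, sigma, pred_samples.shape)
    total = np.percentile(noisy, q, axis=0)
    return param, total
```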

Fig. 5 MCMC predictive uncertainty of the statistically calibrated reduced models. First row: data corrupted by Gaussian errors with standard deviation \(\sigma_{h} = 0.05\) (heads) and \(\sigma_{Y}\) = 0.1 (log-transmissivity). Second row: data corrupted by Gaussian errors with \(\sigma_{h} = 0.05\) (heads) and \(\sigma_{Y}\) = 0.5 (log-transmissivity)

Figure 6a, b depict the MAP estimate of the Y field for \(\sigma_{Y} = 0.1\) and 0.5, respectively. Figure 6c, d depict the spatial distribution of the width of the 95% total uncertainty range of h for the same two cases, and the corresponding width for Y is shown in Fig. 6e, f. A direct comparison of Figs. 6a, b and 2 suggests that, in both cases, the identified (optimum) sparse KLEs yield a good MAP approximation of the reference log-transmissivity field, reproducing well the spatial pattern of low and high conductivity regions. Nevertheless, even though the MAP estimate can be deemed satisfactory, the total predictive uncertainty (Fig. 6c–f) associated with the stochastic field remains large at locations far from the measurements. This feature is especially evident for \(\sigma_{Y} = 0.5\).

Fig. 6 Results of the sparse KLE inversion with DREAM(ZS) MCMC. Data are characterized by (left column) \(\sigma_{Y}\) = 0.1 or (right column) \(\sigma_{Y}\) = 0.5. First row (a, b): MAP estimate of the Y field. The last two rows show the width of the 95% total predictive uncertainty range for (c, d) pressure head and (e, f) log-transmissivity

7.3 Predictive performance

Figure 5 suggests that the calibrated models provide a satisfactory probabilistic representation of the observations. We now analyze their predictive performance at other locations in the domain by comparing the reference values at unsampled locations against the corresponding MCMC predictive distributions of h(x) and Y(x). The estimated cumulative distribution functions (CDFs) of h and Y are depicted in Figs. 7 and 8, respectively, together with the corresponding reference values, for \(\sigma_{Y} = 0.1\) and 0.5. Only a few selected locations, representative of the range of results obtained in our simulations, are displayed. At some locations the reference value lies within the range of values associated with non-negligible probability for both CDFs. At other locations this holds for only one of the two posterior CDFs, most frequently the one associated with the larger measurement-error variance. Finally, at some locations far from measurements the reference values are not captured by either CDF. Hence, a parameterization based on a KLE of reduced dimensionality may yield collections of solutions that do not encompass the reference solution at some unsampled locations far from measurements. The quality of the estimation can be improved, for instance, by increasing the number of measurements and/or the threshold for the selection of eigenmodes in the MAP, so as to retain a larger number of KL eigenmodes and bring the inverse solutions closer to the reference solution.
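Checking whether a reference value falls where a predictive CDF carries non-negligible probability can be sketched with an empirical CDF, as below. The 2.5% tail threshold used to define non-negligible probability is a hypothetical choice for illustration, since no threshold is specified here; the function names are ours.

```python
import numpy as np

def ecdf_at(samples, value):
    """Empirical CDF of posterior predictive samples evaluated at value."""
    samples = np.sort(np.asarray(samples))
    return np.searchsorted(samples, value, side="right") / samples.size

def reference_captured(samples, value, tail=0.025):
    """True if value lies away from both tails of the ECDF (hypothetical rule)."""
    F = ecdf_at(samples, value)
    return tail <= F <= 1.0 - tail
```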

Fig. 7 Comparison between cumulative distribution functions of pressure heads at selected unsampled locations [red: reduced sparse KLE with 19 modes (\(\sigma_{Y}\) = 0.1); green: reduced sparse KLE with 12 modes (\(\sigma_{Y}\) = 0.5)]. Blue dashed lines indicate reference values. Coordinate pairs in parentheses correspond to the locations selected in the domain

Fig. 8 Comparison between cumulative distribution functions of log-transmissivity at selected unsampled locations [red: reduced sparse KLE with 19 modes (\(\sigma_{Y}\) = 0.1); green: reduced sparse KLE with 12 modes (\(\sigma_{Y}\) = 0.5)]. Blue dashed lines indicate reference values. Coordinate pairs in parentheses correspond to the locations selected in the domain

8 Conclusions

We develop an operational strategy to obtain computationally affordable Bayesian estimates of satisfactory quality of heterogeneous transmissivity fields from data sampled at a set of locations in an aquifer. We do so by modeling the (natural) logarithm of transmissivity as a stochastic Gaussian process, parameterized through a truncated KLE. We consider strongly heterogeneous transmissivity fields, such as those characterized by a correlation length that is short with respect to the domain size, for which Bayesian inference is highly challenging and computationally demanding because of the large number of terms that must be retained in the KLE.

Our strategy starts from a highly parameterized field and yields a set of sparse KLEs of reduced dimensionality, the MAP estimate of the eigenmodes in each sparse KLE being obtained through inverse modeling of flow against noisy data. The optimal number of modes to be retained in the expansion is selected through a model selection criterion informed by the available observations. The posterior distribution of the corresponding eigenmodes is then obtained using the DREAM(ZS) MCMC sampler developed by Laloy and Vrugt (2012).

The approach is illustrated through a suite of computational examples in which noisy log-transmissivity and head values are sampled from a reference transmissivity field. The methodology yields a satisfactory inversion of the stochastic field, with a good probabilistic representation of the observations. At some unsampled locations far from measurements, however, the collection of estimated solutions may not encompass the reference values. The quality of the estimation could be improved, for instance, by increasing the number of measurements and/or the threshold for the selection of KL eigenmodes in the MAP.