Identification of non-Gaussian parameters in heterogeneous aquifers by a modified probability conditioning method through hydraulic-head assimilation

Lan, Tian; Shi, Xiaoqing; Chen, Yan; Li, Liangping; Wu, Jichun; Duan, Limin; Liu, Tingxi

doi:10.1007/s10040-020-02243-6

Identification of non-Gaussian parameters in heterogeneous aquifers by a modified probability conditioning method through hydraulic-head assimilation

Identification des paramètres non gaussiens des aquifères hétérogènes par une méthode modifiée de probabilités conditionnelles par assimilation des charges hydrauliques

Identificación de parámetros no gaussianos en acuíferos heterogéneos mediante un método modificado de probabilidad condicionada a través de la aproximación de la carga hidráulica

基于改进概率条件法同化水头数据推估非均质含水层的非高斯参数

Identificação de parâmetros não gaussianos em aquíferos heterogêneos por um método de condicionamento de probabilidade modificado através da assimilação de pressão hidráulica

Paper
Published: 13 October 2020

Volume 29, pages 819–839, (2021)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Hydrogeology Journal Aims and scope Submit manuscript

Identification of non-Gaussian parameters in heterogeneous aquifers by a modified probability conditioning method through hydraulic-head assimilation

Download PDF

Tian Lan¹,
Xiaoqing Shi ORCID: orcid.org/0000-0002-5074-8856¹,
Yan Chen²,
Liangping Li³,
Jichun Wu¹,
Limin Duan⁴ &
…
Tingxi Liu⁴

440 Accesses
2 Citations
Explore all metrics

Abstract

Parameter estimation with uncertainty quantification is essential in groundwater modeling to ensure model quality; however, parameter estimation, especially for non-Gaussian distributed parameters in highly heterogeneous aquifers, is still a great challenge. The ensemble smoother with multiple data assimilation (ES-MDA) is one of the most popular and effective ensemble-based data assimilation algorithms. However, it only works for multi-Gaussian fields, since two-point statistics are used to estimate the co-relation between parameters and state variables. The probability conditioning method (PCM) has the capability to integrate nonlinear flow data into facies simulation, but it has an assumption of homogeneity within each facies. Full characterization of facies and estimates of hydraulic conductivity within each facies are equally important. This work firstly modifies the original PCM, introducing a new probability assignment method, to consider within-facies heterogeneities, and then it is further combined with the ES-MDA to estimate non-Gaussian distributed hydraulic parameters in a groundwater model. The proposed method is evaluated using a two-facies case and a three-facies case in groundwater modeling. Both cases demonstrate that the modified PCM is effective for facies delineation, especially to identify high heterogeneities in each facies, as well as non-Gaussian characteristics with good connectivity within certain facies. The results also show that the performances of data reproduction and model prediction are of high accuracy and low uncertainty, which is attributed to the accurate characterization of the non-Gaussian parameters in the heterogeneous aquifers used.

Résumé

L’évaluation de paramètres avec quantification d’incertitude est essentielle en modélisation des eaux souterraines pour s’assurer de la qualité du modèle. Cependant, l’estimation des paramètres, en particulier pour les paramètres distribués non gaussiens dans les aquifères très hétérogènes, reste un défi majeur. L’ensemble plus fluide avec assimilation de données multiples (ES-MDA est l’un des algorithmes d’assimilation de données basés sur des ensembles les plus populaires et les plus efficaces. Cependant, cela ne fonctionne que pour les champs multi-gaussiens, car les statistiques à deux points sont utilisées pour estimer la co-relation entre les paramètres et les variables d’état. La méthode des probabilités conditionnelles (MPC) a la capacité d’intégrer des données d’écoulement non linéaires dans la simulation de faciès, mais avec une hypothèse d’homogénéité à l’intérieur de chaque faciès. La caractérisation complète des faciès et les estimations de la conductivité hydraulique à l’intérieur de chaque faciès sont également importantes. Ce travail modifie d’abord la MPC originale, en introduisant une nouvelle méthode d’attribution de probabilité, pour prendre en compte les hétérogénéités intra-faciès, puis il est ensuite combiné avec l’ES-MDA pour estimer les paramètres hydrauliques distribués non gaussiens dans un modèle d’eau souterraine. La méthode proposée est évaluée en utilisant un cas à deux faciès et un cas à trois faciès dans la modélisation des eaux souterraines. Les deux cas démontrent que la MPC modifiée est efficace pour la délimitation des faciès, en particulier pour identifier des hétérogénéités élevées dans chaque faciès, ainsi que des caractéristiques non gaussiennes avec une bonne connectivité dans certains faciès. Les résultats montrent également que les performances de reproduction des données et de prédiction des modèles sont d’une grande précision et d’une faible incertitude, ce qui est attribué à la caractérisation précise des paramètres non gaussiens dans les aquifères hétérogènes utilisés.

Resumen

La estimación de parámetros con cuantificación de la incertidumbre es esencial en la modelización de aguas subterráneas para garantizar la calidad del modelo. Sin embargo, la estimación de parámetros, especialmente para los parámetros distribuidos no gaussianos en acuíferos muy heterogéneos, sigue siendo un gran desafío. El ensamble con una asimilación de datos múltiples (ES-MDA) es uno de los algoritmos basados en conjuntos más comunes y eficaces. Sin embargo, sólo funciona para campos multi-Gaussianos, ya que se utilizan estadísticas de dos puntos para estimar la correlación entre los parámetros y las variables de estado. El método de evaluación de la probabilidad (PCM) tiene la capacidad de integrar datos de flujo no lineal en la simulación de facies, pero tiene un supuesto de homogeneidad dentro de cada facie. La caracterización completa de las facies y las estimaciones de la conductividad hidráulica dentro de cada una de ellas son igualmente importantes. Este trabajo modifica en primer lugar el PCM original, introduciendo un nuevo método de asignación de probabilidades, para considerar las heterogeneidades dentro de las facies, y luego se combina con el ES-MDA para estimar los parámetros hidráulicos distribuidos no gaussianos en un modelo de aguas subterráneas. El método propuesto se evalúa utilizando un caso de dos facies y un caso de tres facies en la modelización de aguas subterráneas. Ambos casos demuestran que el PCM modificado es eficaz para la definición de facies, especialmente para identificar altas heterogeneidades en cada una de ellas, así como características no gaussianas con buena conectividad dentro de ciertas facies. Los resultados también muestran que los resultados de la reproducción de datos y la predicción del modelo son de gran exactitud y baja incertidumbre, lo que se atribuye a la caracterización precisa de los parámetros no gaussianos en los acuíferos heterogéneos utilizados.

摘要

不确定性量化参数估计对于地下水模拟至关重要,可以保障模型的质量。但是,参数估计,尤其是高度非均质含水层中非高斯分布参数的估计,仍然是一个巨大的挑战。多源数据同化的集合平滑器(ES-MDA)是当前最流行和最有效的集合数据同化算法之一。但是,该算法仅适用于多元高斯场,它采用两点统计估计参数和状态变量之间的相互关系。概率条件法(PCM)可以将非线性的地下水流数据集成到岩相模拟中,但它假设每个相内均质。完整刻画岩相分布和估计每个相中的水力传导系数同样重要。本研究首先修正了原始的PCM,引入了一种新的概率分配方法,以考虑岩相内的非均质性,然后将其与ES-MDA进一步结合以估计地下水流模型中的非高斯分布水力参数。在地下水模拟中,采用两相和三相算例对所提出的新方法进行了评估。两个算例结果均表明,经修正的PCM可有效地描述岩相分布,特别是识别每个相中的高度非均质性,以及在某些岩相中具有良好连通性的非高斯特征。研究结果还表明,本方法重现观测数据和模型预测的性能具有较高的准确性和较低的不确定性,这归因于准确刻画了非均质含水层的非高斯参数。

Resumo

A estimativa de parâmetros com quantificação de incerteza é essencial na modelagem de águas subterrâneas para garantir a qualidade do modelo. No entanto, a estimativa de parâmetros, especialmente para parâmetros com distribuição não-gaussiana em aquíferos altamente heterogêneos, ainda é um grande desafio. O método ensemble smoother with multiple data assimilation (ES-MDA) é um dos mais populares e eficazes algoritmos de assimilação de dados baseados em conjunto. No entanto, ele só funciona para campos multi-Gaussianos, uma vez que estatísticas de dois pontos são usadas para estimar a correlação entre parâmetros e variáveis de estado. O método de condicionamento de probabilidade (MCP) tem a capacidade de integrar dados de fluxo não linear na simulação de fácies, mas pressupõe homogeneidade dentro de cada fácies. A caracterização completa da fácies e as estimativas da condutividade hidráulica dentro de cada fácies são igualmente importantes. Primeiramente, este trabalho modifica o MCP original, introduzindo um novo método de atribuição de probabilidade, para considerar heterogeneidades dentro da fácies, e então é ainda combinado com o ES-MDA para estimar parâmetros hidráulicos de distribuição não-gaussiana em um modelo de água subterrânea. O método proposto é avaliado usando um caso de duas fácies e um de três fácies na modelagem de águas subterrâneas. Ambos os casos demonstram que o MCP modificado é eficaz para delineamento de fácies, especialmente para identificar altas heterogeneidades em cada fácies, bem como características não gaussianas com boa conectividade dentro de certas fácies. Os resultados também mostram que os desempenhos de reprodução dos dados e do modelo de previsão são de alta exatidão e baixa incerteza, o que é atribuído à caracterização precisa dos parâmetros não gaussianos nos aquíferos heterogêneos utilizados.

Experimental sandbox tracer tests to characterize a two-facies aquifer via an ensemble smoother

Article Open access 18 July 2023

Evaluating parameter inversion efficiency in Heterogeneous Groundwater models using Karhunen-Loève expansion: a comparative study of genetic algorithm, ensemble smoother, and MCMC

Article 10 June 2024

Identification of hydraulic conductivity and groundwater contamination sources with an Unscented Kalman Smoother

Article 27 June 2024

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Inverse problems are very important in groundwater modeling since the quality of the groundwater model largely depends on the quality of the model parameters (Gómez-Hernández et al. 2003; Karahan and Ayvaz 2008; Franssen et al. 2009; Zhou et al. 2014). Many studies have focused on parameter estimation in the last few decades (e.g. Carrera et al. 2005; Dagan 1985; Doherty 2004; Gómez-Hernández et al. 2003; Franssen et al. 2009; Neuman 1973; Oliver et al. 1997; Zhou et al. 2014). Due to the intrinsic heterogeneities of natural porous media and the scarcity of observation data, accurate characterization of the spatial distribution of hydraulic properties and corresponding uncertainty is always a key issue in groundwater management and protection (De et al. 1999; Carrera et al. 2005; Zhou et al. 2014).

Inverse methods are often used by conditioning on observation data (e.g. flow data, concentration data, and hydrogeophysical data) to characterize the spatial variation of parameters (e.g. hydraulic conductivity). Data assimilation methods have been popular in recent decades as they can assimilate different sources of information to estimate parameters and predict states (Oliver and Chen 2008, 2011; Chen and Oliver 2011; Chen et al. 2009; Chen and Zhang 2006; Nan and Wu 2011; Li et al. 2012a; Zhou et al. 2014; Xue and Zhang 2014; Man et al. 2016; Lan et al. 2018; Evensen 2018). Well-known data assimilation methods include the Kalman filter (KF, Kalman 1960), ensemble Kalman filter (EnKF, Evensen 2009), ensemble smoother (Van Leeuwen and Evensen 1996), and iterative ensemble smoothers (e.g. ensemble smoother with multiple data assimilation, ES-MDA, Emerick and Reynolds 2013a). They have gained popularity due to their simplicity and flexibility in implementation. For subsurface flow problems, the ES-MDA method, proposed by Emerick and Reynolds (2013a), can typically obtain better data reproduction and better estimation of parameters, compared to the EnKF (Emerick 2016; Emerick and Reynolds 2012, 2013a, b). However, the data assimilation methods mentioned previously cannot get optimal solutions when they are applied to groundwater inversion problems where estimated parameters usually have a non-Gaussian distribution. Accounting for non-Gaussian distributions of hydraulic conductivity is very significant since flow and transport predictions are dramatically different between Gaussian and non-Gaussian conductivity fields (Gómez-Hernández and Wen 1998; Zinn and Harvey 2003; Feyen and Caers 2005; Lee et al. 2007).

There have been many published papers on alleviating this non-Gaussian challenge. On the one hand, much research focused on parameterization methods that can represent the non-Gaussian parameters using latent Gaussian variables (Dorn and Villegas 2008; Sarma et al. 2008; Chen et al. 2009; Chang et al. 2010; Li et al. 2012b; Zhou et al. 2012a; Xu and Gómez-Hernández 2016; Xu and Gómez-Hernández 2017; Li et al. 2018). Liu and Oliver (2005) applied the truncated pluri-Gaussian model to match geologic facies using dynamic flow data via the EnKF. Jafarpour and McLaughlin (2008) applied the discrete cosine transformation to decrease the non-Gaussianity of parameters, and then used a traditional method to accomplish the model inversion. Chang et al. (2010) applied level-set parameterization for the non-Gaussian parameter estimation problem. Zhou et al. (2011) combined normal score transformation and the EnKF together to solve the non-Gaussian inversion problem. Moreover, as machine learning has become more and more popular, some researchers have begun to use machine learning algorithms as parameterization methods (Canchumuni et al. 2017; Mo et al. 2020). However, most of these studies are illustrated based on two-facies cases, and their applicability in multiple faces, such as three-facies, is worth further research and demonstration. Sometimes, change from two-facies to three-facies could lead to new challenges for some parameterization methods, as the existence of the third facies greatly increases the complicity of facies delineation (Chen et al. 2015, 2016).

On the other hand, many researchers focused on geostatistical methods, for which the core purpose is to condition facies simulation to the non-linear flow data (Strebelle 2002; Caers and Hoffman 2006; Cao et al. 2018; Hansen et al. 2018; Laloy et al. 2018; Khaninezhad et al. 2019). Jafarpour and Khodabakhshi (2011) proposed a probability conditioning method (PCM) based on the tau-model proposed by Journel (2002), and combined it with the EnKF. Zhou et al. (2012b) developed a pattern-search-based method following the idea of direct sampling (Mariethoz et al. 2010). Li et al. (2013) proposed an ensemble PATtern (EnPAT) search method that simultaneously updated both hydraulic conductivity and hydraulic head. Recently, Ma and Jafarpour (2018a) proposed a new pilot-points method for conditioning discrete MPS facies simulation on dynamic flow data and coupled it with ES to test several numerical experiments.

Among these methods mentioned already, the PCM gained popularity as it can condition both facies and hydraulic properties on flow data via the probability map (Jafarpour and Khodabakhshi 2011). Lots of research has been focused on understanding its mathematical principles and improving its performance (Khodabakhshi and Jafarpour 2014; Ma and Jafarpour 2019). Khodabakhshi and Jafarpour (2013) proposed an adaptive sampling strategy based on the PCM when multiple training images are used to acknowledge the uncertainty. Ma and Jafarpour (2018b) improved the PCM by constructing a probability map based on first- and second-order moments and introducing pixel-based tau values. To the best of the authors’ knowledge, these further studies on the PCM are based on the assumption that conductivities within each facies are homogeneous. However, conductivity heterogeneities within facies play an important role in the groundwater flow and transport model (Zhang et al. 2013).

To illustrate the effect of heterogeneous conductivities within facies on the flow and transport, two cases were constructed for comparison (case homo and case heter; more details of the flow and transport model can be found in the ‘Appendix’). In these two cases, everything is the same except that the conductivities in each facies in case homo are set as a constant (equal to the mean value in case heter, i.e. 3 ln(m/day) for the channel and –2 ln(m/day) for nonchannel, see Fig. 1b,c). Figure 1d shows that the breakthrough curves are rather different in case homo and case heter for both observation points. In addition, Fig. 1e,f illustrates that the flow fields are strongly different even though the two cases are identical except for the conductivity heterogeneities within the facies. These observations indicate that accurate characterization of the conductivity heterogeneities within each facies is also significant in the non-Gaussian inversion problems.

This work first proposes a modified PCM to characterize geological facies as well as hydraulic conductivities within each facies by combining PCM with the ES-MDA. Note that the ES-MDA was chosen instead of the EnKF as the data assimilation method, due to the better performance from ES-MDA for subsurface inverse problems (Emerick 2016; Emerick and Reynolds 2012, 2013a, b). Meanwhile, to the best of the authors’ knowledge, this work is the first time that the PCM is used to simultaneously estimate facies and conductivity fields in groundwater models, especially for models with three facies.

The rest of the paper is organized as follows. The relevant methods and the proposed method are introduced. Afterwards, a two-facies synthetic case and a three-facies synthetic case are constructed to illustrate the performance of the proposed method. Then, a sensitivity analysis to evaluate the impact of two important parameters of the proposed algorithm is carried out, followed by summarization.

System model and methodologies

Groundwater flow model

The flow model is assumed to be transient, and its governing equation is as follows (Bear 1972),

$$ \nabla \cdotp \left(K\nabla H\right)+W={\mu}_s\frac{\partial H}{\partial t} $$

(1)

where ∇ is the divergence operator; ∇ is the gradient operator; K is the hydraulic conductivity [L T⁻¹]; H is the hydraulic head [L]; W is the volumetric injection (pumping) flow rate per unit volume of the aquifer [L T⁻¹]; μ_s is the specific storage of the aquifer [L⁻¹]; and t is the time [T].

Ensemble smoother with multiple data assimilation (ES-MDA)

The ES-MDA is one of the most popular data assimilation methods with better performance and higher efficiency compared to the EnKF. In the ES-MDA, all the parameters of interest p are augmented with state variable h into a joint state vector x = [ph]^T, and an ensemble of N_e realizations of parameters is generated. The principle of the ES-MDA is very similar to that of the EnKF. There are only two differences between these two methods: one is that the ES-MDA uses global updates with all available data while the classical EnKF carries out updates sequentially using data from different times, and the other difference is that the ES-MDA uses multiple data assimilations with inflation coefficients, while the EnKF performs only one assimilation with each set of data. The main procedures of the ES-MDA are listed here:

Step 1.
Decide the number of data assimilations (N_a) and choose the coefficient (α_i, i = 1, …, N_a) for each data assimilation step satisfying the constraint in Eq. (2). Considering the computational cost and its performance, the number of assimilation times (iteration times) in the ES-MDA algorithm is chosen to be 4 in this work (Emerick and Reynolds 2013a), and the coefficient of each data assimilation in the ES-MDA is chosen to be α₁ = 9.333, α₂ = 7.0, α₃ = 4.0 and α₄ = 2.0 (Emerick and Reynolds 2013a).

$$ \sum \limits_{i=1}^{N_{\mathrm{a}}}\frac{1}{\alpha_i}=1 $$

(2)

Step 2.
For each realization, run the forward model G(.) from time zero

$$ {d}_i=G\left({\mathrm{x}}_i^{\mathrm{f}}\right),i=1,2,\dots, {N}_{\mathrm{e}} $$

(3)

In the aforementioned equation, i is the ensemble member index, and superscript f denotes forecast.

Step 3.
Update the ensemble of realizations using Eq. (4).

$$ {\mathbf{x}}_i^{\mathrm{a}}={\mathbf{x}}_i^{\mathrm{f}}+{\mathbf{C}}_{\mathrm{YD}}{\left({\mathbf{C}}_{\mathrm{D}\mathrm{D}}+{\alpha}_l{\mathbf{C}}_{\mathrm{D}}\right)}^{-1}\left({\mathbf{d}}_{{\mathrm{obs}}_i}-{\mathbf{d}}_i\right),i=1,2,\dots, {N}_{\mathrm{e}} $$

(4)

In the preceding equation, C_YD is the cross-covariance matrix between the forecast state and the predicted data, C_DD is the covariance matrix of the predicted data, l is the iteration index of the ES-MDA, l = 1, 2, …, N_a, C_D is the covariance matrix of the measurements error, d_obs is the perturbed observations with noise covariance α_lC_D, d is the predicted data, and superscript a denotes analysis.

After step 3, the updated ensemble X^a is obtained. Then, go back to step 2; the updated ensemble obtained from this step is implemented for the next data assimilation. Repeat steps 2–3 until the total number of data assimilations N_a is reached.

Probability conditioning method (PCM)

Original PCM

The probability conditioning method (PCM) was proposed to constrain single normal equation SIMulation (SNESIM)-based (Strebelle 2002) facies simulations on flow data (Jafarpour and Khodabakhshi 2011). The implementation of the PCM consists of two main steps. In the first step, flow data are used to update the lnK field through ES-MDA, and then the updated lnK field is used to infer a facies probability map. In the second step, the probability map is used as soft data (through the tau-model, Journel 2002) in the SNESIM algorithm to generate new (updated) realizations of facies indicators.

In order to infer a facies probability map from the nonlinear flow data, the EnKF data assimilation method was used in the original PCM to update conductivities (denoted as lnK) based on the flow data. Then the updated hydraulic properties were used to infer the facies probability map. For a model with two facies types, channel and nonchannel, with homogeneous lnK values, i.e. lnK₁ and lnK₀ respectively, one can use the following equation to calculate the facies probability map based on the updated lnK,

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{channel}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-9.7em {p}_{\mathrm{max}},\overline{lnK}(x)>\mathit{\ln}{K}_1\\ {}\kern-9.7em {p}_{\mathrm{min}},\overline{lnK}(x)<\mathit{\ln}{K}_0\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{lnK}(x)-\mathit{\ln}{K}_0}{\mathit{\ln}{K}_1-\mathit{\ln}{K}_0},\kern0.5em \mathrm{otherwise}\end{array}\end{array}} $$

(5)

where $ \overline{\ln K}(x) $ denotes the ensemble mean of lnK at grid cell x; lnK₁ and lnK₀ denote the homogeneous lnK values in the channel and non-channel respectively; and [p_min,p_max] are the boundaries of probability values. In this work, [p_min,p_max] are set as [0.01, 0.99].

For the two-facies case, one first uses Eq. (5) to calculate the probability of the channel in each grid based on the lnK ensemble which is conditioned to flow data, then the probability of the nonchannel is equal to 1 – P(f(x) = channel). However, for the three-facies case, it is necessary to calculate the probability of each facies type separately. The equations for the three-facies case are shown in the following, which are similar to that used in the two-facies case, calculating probabilities based on updated lnK,

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{high}\;\ln K\;\mathrm{facies}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-9.7em {p}_{\mathrm{max}},\overline{\ln K}(x)>\ln {K}_2\\ {}\kern-9.7em {p}_{\mathrm{min}},\overline{\ln K}(x)<\ln {K}_1\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{\ln K}(x)-\ln {K}_1}{\ln {K}_2-\ln {K}_1},\kern0.5em \mathrm{otherwise}\end{array}\end{array}} $$

(6)

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{mid}\;\ln K\;\mathrm{facies}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-7.7em {p}_{\mathrm{min}},\overline{\ln K}(x)>\ln {K}_2\; or\;\overline{\ln K}(x)<\ln {K}_0\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{\ln K}(x)-\ln {K}_0}{\ln {K}_1-\ln {K}_0},\ln {K}_0<\overline{\ln K}(x)<\ln {K}_1\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\ln {K}_2-\overline{\ln K}(x)}{\ln {K}_2-\ln {K}_1},\ln {K}_1<\overline{\ln K}(x)<\ln {K}_2\end{array}\end{array}} $$

(7)

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{low}\;\ln K\;\mathrm{facies}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-9.7em {p}_{\mathrm{min}},\overline{\ln K}(x)>\ln {K}_1\\ {}\kern-9.7em {p}_{\mathrm{max}},\overline{\ln K}(x)<\ln {K}_0\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\ln {K}_1-\overline{\ln K}(x)}{\ln {K}_1-\ln {K}_0},\kern0.5em \mathrm{otherwise}\end{array}\end{array}} $$

(8)

where $ \overline{\ln K}(x) $ denotes the ensemble mean of lnK at grid cell x; lnK₀, lnK₁ and lnK₂ denote homogeneous lnK values in the three facies (from low to high), respectively; and [p_min,p_max] are the boundaries of probability values. In this work, [p_min,p_max] are set as [0.01, 0.99].

Modified PCM

It is evident that there is an assumption of homogeneity in the preceding equations. To apply the scenario to non-Gaussian and heterogeneous cases, the preceding equations are modified as follows. It should be noted that non-Gaussian and heterogeneous lnK fields in this work are constructed by combining the SNESIM algorithm (used to generate facies distribution) and GCOSIM3D algorithm (used to generate heterogeneity within each facies type).

For the two-facies case,

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{channel}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-9.7em {p}_{\mathrm{max}},\overline{\ln K}(x)>\overline{\ln {K}_1}\\ {}\kern-9.7em {p}_{\mathrm{min}},\overline{\ln K}(x)<\overline{\ln {K}_0}\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{\ln K}(x)-\overline{\ln {K}_0}}{\overline{\ln {K}_1}-\overline{\ln {K}_0}},\kern0.5em \mathrm{otherwise}\end{array}\end{array}} $$

(9)

where $ \overline{\ln K}(x) $ denotes the ensemble mean of lnK at grid cell x; $ \overline{\ln {K}_1} $ and $ \overline{\ln {K}_0} $ are the means of lnK in channel and nonchannel type facies; and [p_min,p_max] are the boundaries of probability values. In this work, [p_min,p_max] are set as [0.01, 0.99].

For the three-facies case,

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{high}\;\ln K\;\mathrm{facies}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-9.7em {p}_{\mathrm{max}},\overline{\ln K}(x)>\overline{\ln {K}_2}\\ {}\kern-9.7em {p}_{\mathrm{min}},\overline{\ln K}(x)<\overline{\ln {K}_1}\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{\ln K}(x)-\overline{\ln {K}_1}}{\overline{\ln {K}_2}-\overline{\ln {K}_1}},\kern0.5em \mathrm{otherwise}\end{array}\end{array}} $$

(10)

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{mid}\;\ln K\;\mathrm{facies}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-7.7em {p}_{\mathrm{min}},\overline{\ln K}(x)>\overline{\ln {K}_2}\ or\ \overline{\ln K}(x)<\overline{\ln {K}_0}\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{\ln K}(x)-\overline{\ln {K}_0}}{\overline{\ln {K}_1}-\overline{\ln {K}_0}},\overline{\ln {K}_0}<\overline{\ln K}(x)<\overline{\ln {K}_1}\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{\ln {K}_2}-\overline{\ln K}(x)}{\overline{\ln {K}_2}-\overline{\ln {K}_1}},\overline{\ln {K}_1}<\overline{\ln K}(x)<\overline{\ln {K}_2}\end{array}\end{array}} $$

(11)

$$ {\displaystyle \begin{array}{c}P\left(f(x)=\mathrm{low}\;\ln K\;\mathrm{facies}|\mathrm{flow}\ \mathrm{data}\right)\\ {}=\Big\{\begin{array}{c}\kern-9.7em {p}_{\mathrm{min}},\overline{\ln K}(x)>\overline{\ln {K}_1}\\ {}\kern-9.7em {p}_{\mathrm{max}},\overline{\ln K}(x)<\overline{\ln {K}_0}\\ {}{p}_{\mathrm{min}}+\left({p}_{\mathrm{max}}-{p}_{\mathrm{min}}\right)\frac{\overline{\ln {K}_1}-\overline{\ln K}(x)}{\overline{\ln {K}_1}-\overline{\ln {K}_0}},\kern0.5em \mathrm{otherwise}\end{array}\end{array}} $$

(12)

where $ \overline{\ln K}(x) $ denotes the ensemble mean of lnK at grid cell x; $ \overline{\ln {K}_0} $, $ \overline{\ln {K}_1} $ and $ \overline{\ln {K}_2} $ denote the means of lnK values in the three facies (from low to high), respectively; and [p_min,p_max] are the boundaries of probability values. In this work, [p_min,p_max] are set as [0.01, 0.99].

Note that these equations could be extended to more facies types, providing the differences between each $ \overline{\ln {K}_i} $ are significant.

Parameter estimation scheme

In this paper, to apply the PCM in non-Gaussian and heterogeneous parameter estimation cases, the original PCM is first modified to remove its homogeneity limitation (see details in section ‘Probability conditioning method (PCM)’), and then the modified PCM is combined with the ES-MDA instead of the EnKF due to the superior performance of ES-MDA for subsurface parameter estimation problem. In order to show the proposed scheme thoroughly, Fig. 2 shows the framework of the original PCM in Jafarpour and Khodabakhshi (2011) together with the framework resulting from this study. As shown in Fig. 2, in order to estimate non-Gaussian parameters in heterogeneous aquifers, the proposed scheme has an additional step of updating lnK_i for facies type i compared to the original PCM.

Figure 2 shows that the proposed parameter estimation framework includes seven steps overall:

Step 1.
Generate initial realizations of the facies indicator using the SNESIM algorithm, initial ensembles of heterogeneous lnK of different facies types using GCOSIM3D algorithm (Gómez-Hernández and Journel 1993), and an initial probability map according to the number of facies (if there are n types of facies in the study domain, then the initial probability in each grid is set to 1/n).
Step 2.
Generate the non-Gaussian lnK ensemble by mapping lnK ensembles in a different facies to facies ensemble. For example, for the jth facies realization, if the facies type indicator in grid i is 0, then the lnK value in this grid is set to be the corresponding value in grid i of the jth lnK₀ realization.
Step 3.
Run the forward model with each non-Gaussian lnK realization.
Step 4.
Update lnK_i ensembles and the non-Gaussian lnK ensemble based on the ES-MDA equations introduced in section ‘Ensemble smoother with multiple data assimilations (ES-MDA)’.
Step 5.
Calculate the probability map based on the updated lnK according to equations stated in section ‘Modified PCM’.
Step 6.
Generate new facies realizations using the updated probability map in SNESIM.
Step 7.
Generate the updated non-Gaussian lnK ensemble by mapping the updated lnK_i ensembles in step 4 to the updated facies ensemble in step 6.

Since the ES-MDA is an iterative data assimilation method, step 3 to step 7 is repeated N_a times.

Case 1: two-facies case

Case setup

In this case, the flow is assumed to be transient in a two-dimensional (2D) confined aquifer with a starting head of 0 m. As shown in Fig. 3, the dimension of the aquifer is 600 m × 600 m and the grid size is 10 m in both horizontal x and y directions. In this case, the upward and downward boundaries are assumed to be impermeable, and the head of the left boundary is fixed to be 0 m. The flux at the right boundary is shown in Fig. 3c. More details can be found in Table 1.

Table 1 Flow model parameters in synthetic cases

Full size table

It is assumed that there are two facies types in the study domain, the channelized facies field and lnK field (Fig. 3), and are constructed in the following three steps:

Step 1.
Generate the facies field using the SNESIM algorithm with the training image (Fig. 4) in Strebelle (2002).
Step 2.
Generate two Gaussian random fields, lnK₀ and lnK₁, of the same size as the study domain using GCOSIM3D (Gómez-Hernández and Journel 1993) with parameters shown in Table 2 for sand (channel) and shale (nonchannel). In GCOSIM3D, the log-conductivity fields are characterized by their mean, standard deviation and directional correlation lengths in the two spatial dimensions (λ_x and λ_y).
Step 3.
Assemble the non-Gaussian lnK field by populating regions with one facies type (from step 1) with log-conductivity values from the corresponding Gaussian random field from step 2, i.e. the lnK value of a grid cell is based on the facies indicator value, if the facies indicator is equal to 1 then the lnK of this grid cell is set to the corresponding lnK₁ value at this grid cell, and vice versa.

It should be noted here that both the reference fields and the initial realizations are generated using the procedures already mentioned.

Table 2 Geostatistical parameters to characterize the spatial distribution of lnK within each two facies in case 1

Full size table

In order to estimate the facies map and heterogeneous lnK map in this synthetic case, nine observation wells are randomly chosen (Fig. 3c) to get observation data for data assimilation. The measurement errors of the head are assumed to follow the standard normal distribution with mean of zero and standard deviation of 0.01 m. The numerical code MODFLOW−2000 (Harbaugh et al. 2000) is used to solve the flow model in this case.

Case 1: results and analysis

Estimation results

Figures 5 and 6 show the evolution of three individual realizations, and the ensemble mean and ensemble standard deviation with the four iterations (data assimilations) of the ES-MDA. The ensemble mean of the initial realizations do not show any channelized feature, but the spatial structures start to appear and become evident during the data assimilation. For instance, at the first assimilation step, the upper channel is well identified; and at the final step, the ensemble mean has good connectivity and clear channel boundaries, recovering most channel locations in the reference model. In addition, Fig. 6 shows that the lnK heterogeneities in two facies are also well characterized in this case. The standard deviation has decreased dramatically from conditioning to head data, with the highest uncertainly remaining only near the estimated channel boundary at the final iteration.

Similarly, Fig. 7 shows the evolution of the probability map of the channel facies with the iterations of the ES-MDA. The high probability region at the last iteration clearly identifies the channel location in the reference model, demonstrating the effectiveness of this proposed method. The gradually refined probability map with the iterations of the ES-DMA constrains the facies models simulated from the SNESIM for the next data assimilation, hence increasing the estimate accuracy with each iteration.

In order to evaluate the estimation results further and quantitatively, two quantitative indicators are analyzed: root mean square error (RMSE) and the fraction of the correct facies indicator.

Since the true distribution of the estimated parameters in the synthetic case is known, it is possible to calculate the deviation of the estimation from the truth (reference field). The RMSE is a commonly used indicator in parameter estimation, measuring the accuracy of estimation results. In this work, RMSE_i at grid i is computed as follows

$$ {\mathrm{RMSE}}_i=\sqrt{\frac{1}{N_{\mathrm{r}}}\sum \limits_{j=1}^{N_{\mathrm{r}}}{\left({Y}_{j,i}-{Y}_{\mathrm{r}\mathrm{ef},i}\right)}^2} $$

(13)

where Y_ref,i and Y_j,i are the reference value and jth realization value at grid i respectively, and N_r denotes the total number of realizations used in ES-MDA.

Figure 8 shows the map of RMSE from the initial ensemble and from the final ensemble. The initial RMSE values in the study domain are relatively large, indicating the low accuracy of the heterogeneity characterization. However, after the data assimilation, the RMSE values decrease dramatically, illustrating the ability of this proposed method to capture spatial heterogeneity.

In this paper, the ‘fraction of correct facies’ is defined as the number of grid cells for which the facies indicator is correctly estimated divided by the total number of grid cells in the study domain. The average fraction of all realizations, E_f, is used to quantitatively evaluate the quality of the reconstructed facies model.

Figure 9 shows the evolution of the E_f during data assimilation, and one finds that the E_f increases as the assimilation step advances. After only two steps, facies indicators in around 80% of grid cells are correctly estimated. For the final ensemble, facies indicators are estimated correctly in around 85% of grid cells, which shows the efficiency and effectiveness of this proposed method.

Data reproduction and prediction performance

The foregoing analysis is focused on demonstrating and illustrating the performance of parameter estimation. To further evaluate the estimation results, the performance of data reproduction and model prediction is illustrated here.

To quantitatively assess the performance of data reproduction and model prediction, the mean absolute error (MAE) is used in this work. It is calculated as follows

$$ {\mathrm{MAE}}_j=\frac{1}{n_{\mathrm{obs}}}\sum \limits_{i=1}^{n_{\mathrm{obs}}}\mid {h}_{ij}^s-{h}_i\mid $$

(14)

where n_obs is the total number of head data used for data assimilation, h_i is the ith observation head data, and $ {h}_{ij}^s $ is the corresponding simulated head data of jth realization.

The scatterplots of the observed data and the ensemble mean of the simulated head data are shown in Fig. 10. Linear fit results and MAE are included to evaluate the overall model calibration performance. A perfect result would show the simulated head data on the 45° line. Figure 10 shows that the cloud from the final ensemble (blue) is much closer to the 45° line compared to the cloud from the initial ensemble. Based on this, one can argue that this proposed method has a rather good performance in terms of data reproduction. The MAE values of the initial and the final ensemble, shown in Table 3, also suggest that the final ensemble obtained a good match to data with small uncertainty, since the min, max, mean, and standard deviation of the MAE are all largely reduced compared to those from the initial ensemble.

Table 3 The MAE values of initial and final ensembles for data reproduction. SD standard deviation

Full size table

In order to evaluate the prediction ability, the updated lnK is used to forecast head data in the next 500 days. All model parameters remain the same. The scatterplots of true values and the ensemble mean of the simulated data are shown in Fig. 11. Linear fit results and MAE are also included to evaluate the overall model prediction performance. Figure 11 shows that the average simulated prediction data of the final ensemble are very close to the 45° line, and they are dramatically better than those of the initial ensemble, showing the good prediction ability of this proposed method. In addition, Table 4 shows MAE values of the initial and final ensembles. It shows that the final ensemble has much lower values in terms of min, max, mean, and standard deviation relative to the initial ensemble, which further suggests that the prediction data of the final ensemble have high accuracy and low uncertainty.

Table 4 The MAE values of initial and final ensembles for data prediction. SD standard deviation

Full size table

Case 2: three facies case

Case setup

To further investigate the applicability of the proposed scheme to the estimation of facies and heterogeneous lnK in the multiple facies case, this proposed method was applied to an example with three facies.

In this case, the flow is assumed to be transient in a 2D confined aquifer with a starting head of 0 m. As shown in Fig. 12, the dimension of the aquifer is 600 m × 600 m and the grid size is 10 m in both horizontal x and y directions. In this case, the head of all boundaries is fixed to be 0 m. There are three pumping wells in the study domain (blue crosses in Fig. 12c), and the flux at each pumping well is set as 60 m³/day. The locations of three pumping wells and another 10 observation wells are shown in Fig. 12c. More details can be found in Table 1.

It is assumed that there are three facies types in the study domain, and the facies field and lnK field (Fig. 12) are constructed in the same way as introduced in case 1. A low-permeability facies is added to the training image used in case 1 to generate a new training image with three facies, namely high conductivity (sandstone channels), medium conductivity (shale background), and low conductivity (lens-shaped clay). The new training image is shown in Figure 13. More details of parameter settings can be found in Table 5.

Table 5 Geostatistical parameters to characterize the spatial distribution of lnK within each facies in case 2

Full size table

In order to estimate the facies map and heterogeneous lnK map in this synthetic case, 13 observation wells (3 pumping wells and another 10 observation wells) are used (Fig. 12c) to get observation data for data assimilation. The measurement errors of the head are assumed to follow the standard normal distribution with mean of zero and standard deviation of 0.01 m. The numerical code MODFLOW−2000 (Harbaugh et al. 2000) is used to solve the flow model in this case.

Case 2: results and analysis

Estimation results

Figures 14 and 15 show the evolution of three individual realizations, and the ensemble mean and ensemble standard deviation with the four iterations (data assimilations) of the ES-MDA. The ensemble mean of the initial realizations do not show any evident non-Gaussian and heterogeneous features despite that single initial realizations have non-Gaussian and heterogeneous features, but spatial structures start to appear and become evident during the data assimilation. For instance, at the first assimilation step, the bottom channel is identified; at the second step, the upper channel and lens-shaped low-lnK distribution become evident. Finally, at the fourth step, the ensemble mean has good connectivity within certain facies and clear boundaries between different facies types, recovering most non-Gaussianity and heterogeneities in the reference fields. The standard deviation has significantly reduced from conditioning to head data, indicating the decrease of uncertainty. Figure 15 shows that the lnK heterogeneities in three facies types are also well characterized in this case. So, one can argue that this proposed method has a strong ability to recover non-Gaussian characteristics and estimate parameter heterogeneities via conditioning to head data.

In addition, Fig. 16 shows that probability maps gradually recover the spatial distribution of the three facies in the study domain based on the updated lnK ensemble, showing the effectiveness of this proposed method. In this way, it is possible to convert flow data into soft data on which SNESIM conditions facies realizations, hence increase the accuracy of the geo-statistical simulation. Furthermore, only after two assimilation steps, the probability map characterizes most spatial features of the three facies, illustrating the efficiency of this proposed method.

In order to quantitatively evaluate the estimation results further, two quantitative indicators are again analyzed: root mean square error (RMSE) and the fraction of correct facies indicators.

Figure 17 shows the map of RMSE from the initial ensemble and from the final ensemble. The initial RMSE values in the study domain are relatively large, indicating low accuracy of the heterogeneity characterization. However, the RMSE values of the final ensemble (estimation result) decrease dramatically compared to the initial ones, illustrating the ability of this proposed method to capture spatial heterogeneity.

Figure 18 shows the evolution of the E_f during the data assimilations, and it is evident that the E_f increases as the assimilation step advances. After only one assimilation step, facies indicators in around 75% of grid cells are correctly estimated. For the final ensemble, facies indicators are estimated correctly in around 81% of grid cells, which shows the efficiency and effectiveness of this proposed method in characterizing non-Gaussian features in the multifacies case.

Data reproduction and prediction performance

The above analysis is focused on demonstrating and illustrating the performance of parameter estimation. To further evaluate the estimation results, the performance of data reproduction and model prediction is illustrated here.

The scatterplots of the observed data and the ensemble mean of the simulated head data are shown in Fig. 19. Linear fit results and MAE are included to evaluate the overall data reproduction performance in this case as well. Figure 19 shows that the cloud from the final ensemble (blue) is much closer to the 45° line compared to the cloud from the initial ensemble, illustrating the accuracy of this proposed method in terms of data reproduction. The MAE values of the initial and the final ensemble shown in Table 6 show that the final ensemble obtained a good match to data with small uncertainty, since the min, max, mean, and standard deviation of MAE are all significantly reduced compared to the initial ensemble.

Table 6 The MAE values of initial and final ensembles for data reproduction in case 2. SD standard deviation

Full size table

In order to evaluate the prediction ability, the updated lnK was used to forecast head data in the next 500 days. All model parameters remain the same. The scatterplots of true values and the ensemble mean of the simulated data are shown in Fig. 20. Linear fit results and MAE are also included to evaluate the overall model prediction performance. In Fig. 20, it is easy to see that the average simulated prediction data of the final ensemble are very close to the 45° line, and they are dramatically better than those of the initial ensemble, showing the good prediction ability of this proposed method. In addition, Table 7 shows MAE values of the initial and final ensemble. It shows that the final ensemble has much lower values in terms of min, max, mean, and standard deviation relative to the initial ensemble, illustrating the prediction ability of the proposed method as prediction data of the final ensemble show high accuracy and low uncertainty.

Table 7 The MAE values of initial and final ensembles for data prediction in case 2. SD standard deviation

Full size table

Discussion

Effect of ensemble size

The results of case 2 shown in the previous section are based on an ensemble size of 300. To evaluate the impact of the ensemble size on parameter estimation, an analysis with ensemble sizes of 150, 300, 500, 800, and 1,000 (Table 8) is performed here. Note that scalar RMSE and ensemble spread are used in this section to evaluate the performance of different parameter settings for simplicity. These two indicators are defined as follows

$$ {\mathrm{RMSE}}_{\mathrm{scalar}}=\sqrt{\frac{1}{N_{\mathrm{m}}}\sum \limits_{i=1}^{N_{\mathrm{m}}}{\left(\overline{Y_{\mathrm{e},i}}-{Y}_{\mathrm{r},i}\right)}^2} $$

(15)

$$ \mathrm{Ensemble}\kern0.17em \mathrm{Spread}=\sqrt{\frac{1}{N_{\mathrm{m}}}\sum \limits_{i=1}^{N_{\mathrm{m}}}\operatorname{var}\left({Y}_{\mathrm{e},i}\right)} $$

(16)

where $ \overline{Y_{\mathrm{e},i}} $ and Y_r,i are the ensemble mean value and the reference value at location i respectively; var(Y_e,i) is the ensemble variance at location i; and N_m is the total number of nodes in the study domain.

Table 8 Data-assimilation-related parameters used in different cases

Full size table

The evolution of the RMSE_scalar and the ensemble spread with data assimilations for the cases with different ensemble size are shown in Fig. 21. For the RMSE_scalar, all cases with ensemble size of 300 and above show similar performance. When the ensemble size is only 150, the RMSE_scalar increases slightly after the second data assimilation. In terms of ensemble spread, all cases show somewhat similar behavior. The small differences in ensemble spread among the cases might not be statistically significant, since ES-MDA is an ensemble-based method and its results typically vary to a certain degree when with different ensembles of the same size. Based on both the RMSE_scalar and the ensemble spread, it seems that an ensemble size of 300 is an appropriate choice and further increasing the ensemble size does not provide much improvement to results. Note that in a typical use of the ensemble-based data assimilation methods for parameter estimation of subsurface models, increasing ensemble size would result in large improvement to ensemble spread (maintaining ensemble variability). The improvement in this case is not obvious because in the PCM workflow, facies realizations are always regenerated using the updated probability map, and this resimulation of facies realizations introduces new variability to the ensemble for the next data assimilation (acting almost like covariance inflation in a sense).

Effect of assimilation steps

The number of assimilation steps (N_a) is another important parameter in this proposed method. The results of case 2 (section ‘Case 2: three facies case’) are obtained using four data assimilations. To evaluate the impact of the number of assimilation steps on the results of parameter estimation, an analysis with two, four, and eight assimilation steps (Table 8) is performed here.

Figure 22 shows the RMSE_scalar and the ensemble spread for the cases with different numbers of data assimilations. It shows that the quality of parameter estimation is acceptable when there are only two data assimilations, but it is slightly less accurate compared to case 2 where N_a is 4. However, the performance of parameter estimation does not improve significantly as N_a further increases. Again, in this example, N_a equal to 4 appears to be a good choice, balancing the performance and computation cost.

Conclusions

In order to fully characterize both the facies boundary and heterogeneity of hydraulic conductivity within each facies, a modified PCM is proposed, in which a data assimilation method that is more suitable for subsurface parameters estimation (ES-MDA) is used with the probability conditioning method (PCM) and the estimation is extended to include heterogeneity within facies. It is of interest to note that, to the best of the authors’ knowledge, this work is the first time that the PCM is used to estimate both facies and conductivity fields in groundwater modeling, especially for models with three facies types. To illustrate and demonstrate the effectiveness and efficiency of this proposed method, a two-facies case and a three-facies case were constructed, both with heterogeneities within each facies type, and both quantitative and qualitative measures were used to evaluate the results.

For both test cases, the proposed method was able to identify nearly correct facies boundary locations in a few data assimilations. The calibrated models were able to reproduce head data that were used for conditioning during data assimilation, and the predictability of the calibrated models are also highly improved compared to models that were conditioned to head data.

A sensitivity analysis is also carried out to evaluate the impact of two important parameters of the proposed algorithm, ensemble size (N_e) and assimilation steps (N_a), using the case with three facies types. The analysis showed that for this particular case an ensemble size of 300 and ES-MDA with four data assimilations are good choices, balancing the performance and computational cost. As noted in the discussion, increasing ensemble size did not show as a significant impact, as typically it would for a standard application of the ensemble-based method to parameter estimation in subsurface models, because the facies regeneration step in PCM injects additional variability after each data assimilation.

An important issue in conditioning facies simulation to flow data indirectly is the usage of facies probability maps for soft conditioning in the SNESIM algorithm. However, having distinctive hydraulic conductivity for each of the facies types is a critical condition for the effectiveness of PCM. Therefore, it is of interest to note that PCM may not be suitable for nonclassical non-Gaussian problems where there are no evident multi-peaks in the probability density function of the conductivity field, meaning that the distance between peaks of distributions of the hydraulic conductivity for each facies is not significant compared to the standard deviation of the distributions. The impact of the distinctiveness of the distribution of the hydraulic properties among different facies on PCM is worth further research and discussion.

References

Bear J (1972) Dynamics of fluids in porous materials. Dover, New York
Google Scholar
Caers J, Hoffman T (2006) The probability perturbation method: a new look at Bayesian inverse modeling. Math Geol 38:81–100
Article Google Scholar
Canchumuni SA, Emerick AA, Pacheco MA (2017) Integration of ensemble data assimilation and deep learning for history matching facies models. OTC Brasil, Offshore Technology Conference, Rio de Janeiro, 29–31 October 2019
Cao Z, Li L, Chen K (2018) Bridging iterative ensemble smoother and multiple-point geostatistics for better flow and transport modeling. J Hydrol 565:411–421
Article Google Scholar
Carrera J, Alcolea A, Medina A, Hidalgo J, Slooten LJ (2005) Inverse problem in hydrogeology. Hydrogeol J 13(1):206–222
Article Google Scholar
Chang H, Zhang D, Lu Z (2010) History matching of facies distribution with the EnKF and level set parameterization. J Comput Phys 229:8011–8030. https://doi.org/10.1016/j.jcp.2010.07.005
Article Google Scholar
Chen C, Gao G, Honorio J, Gelderblom P, Jaakkola T (2015) Integration of principal-component-analysis and streamline information for the history matching of channelized reserviors. J Pet Technol 1(4):138–141
Google Scholar
Chen C, Gao G, Gelderblom P, Jimenez E (2016) Integration of cumulative-distribution-function mapping with principal-component analysis for the history matching of channelized reservoirs. SPE Reserv Eval Eng 19(02):278–293
Article Google Scholar
Chen Y, Oliver DS (2011) Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Math Geosci 44:1–26. https://doi.org/10.1007/s11004-011-9376-z
Article Google Scholar
Chen Y, Zhang D (2006) Data assimilation for transient flow in geologic formations via ensemble Kalman filter. Adv Water Resour 29:1107–1122. https://doi.org/10.1016/j.advwatres.2005.09.007
Article Google Scholar
Chen Y, Oliver DS, Zhang D (2009) Data assimilation for nonlinear problems by ensemble Kalman filter with reparameterization. J Pet Sci Eng 66:1–14. https://doi.org/10.1016/j.petrol.2008.12.002
Article Google Scholar
Dagan G (1985) Stochastic modeling of groundwater flow by unconditional and conditional probabilities: the inverse problem. Water Resour Res 21(1):65–72
Article Google Scholar
De Marsily WG, Delhomme J-P, Delay F, Buoro A (1999) 40 years of inverse problems in hydrogeology. C R Acad Sci Serie Ii Fascicule A 329(2):73-87
Doherty J (2004) PEST: model-independent parameter estimation. User’s manual, 5th edn. Watermark, Brisbane, Australia
Google Scholar
Dorn O, Villegas R (2008) History matching of petroleum reservoirs using a level set technique. Inverse Problems. https://doi.org/10.1088/0266-5611/24/3/035015
Emerick AA (2016) Analysis of the performance of ensemble-based assimilation of production and seismic data. J Pet Sci Eng 139:219–239. https://doi.org/10.1016/j.petrol.2016.01.029
Article Google Scholar
Emerick AA, Reynolds AC (2012) History matching time-lapse seismic data using the ensemble Kalman filter with multiple data assimilations. Comput Geosci 16:639–659
Article Google Scholar
Emerick AA, Reynolds AC (2013a) Ensemble smoother with multiple data assimilation. Comput Geosci-Uk 55:3–15. https://doi.org/10.1016/j.cageo.2012.03.011
Article Google Scholar
Emerick AA, Reynolds AC (2013b) Investigation of the sampling performance of ensemble-based methods with a simple reservoir model. Comput Geosci 17:325
Article Google Scholar
Evensen G (2009) Data assimilation: the ensemble Kalman filter. Springer, Berlin
Book Google Scholar
Evensen G (2018) Analysis of iterative ensemble smoothers for solving inverse problems. Comput Geosci. https://doi.org/10.1007/s10596-018-9731-y
Feyen L, Caers J (2005) Multiple-point geostatistics: a powerful tool to improve groundwater flow and transport predictions in multi-modal formations. Geostatistics for Environmental Applications. Springer, Heidelberg, Germany, pp 197–208
Google Scholar
Franssen HJH, Alcolea A, Riva M, Bakr M, Wiel NVD, Stauffer F, Guadagnini A (2009) A comparison of seven methods for the inverse modelling of groundwater flow: application to the characterisation of well catchments. Adv Water Resour 32(6):851–872
Article Google Scholar
Gómez-Hernández JJ, Journel AG (1993) Joint sequential simulation of MultiGaussian fields. Geostatistics Troia’92. Springer, Dordrecht, The Netherlands, pp 85–94. https://doi.org/10.1007/978-94-011-1739-5_8
Book Google Scholar
Gómez-Hernández JJ, Hendricks Franssen HJ, Sahuquillo A (2003) Stochastic conditional inverse modeling of subsurface mass transport: a brief review and the self-calibrating method. Stoch Env Res Risk A 17(5):319–328
Article Google Scholar
Gómez-Hernández JJ, Wen XH (1998) To be or not to be multi-Gaussian? A reflection on stochastic hydrogeology. Adv Water Resour 21(1):47–61
Article Google Scholar
Hansen TM, Mosegaard K, Cordua KS (2018) Multiple point statistical simulation using uncertain (soft) conditional data. Comput Geosci 114:1–10
Article Google Scholar
Harbaugh AW, Banta ER, Hill MC, McDonald MG (2000) MODFLOW-2000, the U.S. Geological survey modular ground-water model: user guide to modularization concepts and the ground-water flow process. US Geol Surv Open-File Rep 00-92:121
Google Scholar
Jafarpour B, Khodabakhshi M (2011) A probability conditioning method (PCM) for nonlinear flow data integration into multipoint statistical facies simulation. Math Geosci 43:133–164. https://doi.org/10.1007/s11004-011-9316-y
Article Google Scholar
Jafarpour B, McLaughlin DB (2008) History matching with an ensemble Kalman filter and discrete cosine parameterization. Comput Geosci 12:227–244
Article Google Scholar
Journel AG (2002) Combining knowledge from diverse sources: an alternative to traditional data independence hypotheses. Math Geol 34:573–596
Article Google Scholar
Kalman RE (1960) A new approach to linear filtering and prediction problems. Trans ASME J Basic Eng 82(D):35–45
Article Google Scholar
Karahan H, Ayvaz MT (2008) Simultaneous parameter identification of a heterogeneous aquifer system using artificial neural networks. Hydrogeol J 16(5):817–827
Article Google Scholar
Khaninezhad R, Golmohammadi A, Jafarpour B (2019) A pattern-matching method for flow model calibration under training image constraint. Comput Geosci. https://doi.org/10.1016/J.ADVWATRES.2016.04.007
Khodabakhshi M, Jafarpour B (2013) A Bayesian mixture-modeling approach for flow-conditioned multiple-point statistical facies simulation from uncertain training images. Water Resour Res 49(1):328–342
Article Google Scholar
Khodabakhshi M, Jafarpour B (2014) Adaptive conditioning of multiple-point statistical facies simulation to flow data with probability maps. Math Geosci 46(5):573–595
Article Google Scholar
Laloy E, Hérault R, Jacques D, Linde N (2018) Training-image based geostatistical inversion using a spatial generative adversarial neural network. Water Resour Res 54(1):381–406
Article Google Scholar
Lan T, Shi X, Jiang B, Sun Y, Wu J (2018) Joint inversion of physical and geochemical parameters in groundwater models by sequential ensemble-based optimal design. Stoch Env Res Risk A 32:1919–1937. https://doi.org/10.1007/s00477-018-1521-5
Article Google Scholar
Lee SY, Carle SF, Fogg GE (2007) Geologic heterogeneity and a comparison of two geostatistical models: sequential Gaussian and transition probability-based geostatistical simulation. Adv Water Resour 30(9):1914–1932
Article Google Scholar
Li L, Zhou H, Gómez-Hernández JJ, Hendricks Franssen HJ (2012a) Jointly mapping hydraulic conductivity and porosity by assimilating concentration data via ensemble Kalman filter. J Hydrol 428–429:152–169. https://doi.org/10.1016/j.jhydrol.2012.01.037
Article Google Scholar
Li L, Zhou H, Hendricks Franssen HJ, Gómez-Hernández JJ (2012b) Groundwater flow inverse modeling in non-MultiGaussian media: performance assessment of the normal-score ensemble Kalman filter. Hydrol Earth Syst Sci 16:573–590. https://doi.org/10.5194/hess-16-573-2012
Article Google Scholar
Li L, Srinivasan S, Zhou H, Gómez-Hernández JJ (2013) Simultaneous estimation of geologic and reservoir state variables within an ensemble-based multiple-point statistic framework. Math Geosci 46(5):597–623
Article Google Scholar
Li L, Stetler L, Cao Z, Davis A (2018) An iterative normal-score ensemble smoother for dealing with non-Gaussianity in data assimilation. J Hydrol 567:759–766
Article Google Scholar
Liu N, Oliver DS (2005) Ensemble Kalman filter for automatic history matching of geologic facies. J Pet Sci Eng 47:147–161. https://doi.org/10.1016/j.petrol.2005.03.006
Article Google Scholar
Ma W, Jafarpour B (2018a) Pilot points method for conditioning multiple-point statistical facies simulation on flow data. Adv Water Resour 115:219–233
Article Google Scholar
Ma W, Jafarpour B (2018b) An improved probability conditioning method for constraining multiple-point statistical facies simulation on nonlinear flow data. Society of Petroleum Engineers. https://doi.org/10.2118/190077-ms
Ma W, Jafarpour B (2019) Production data integration into complex geologic facies models: exploiting the behavior of multiple-point statistical simulation for effective data conditioning. SPE Reservoir Simulation Conference, Galveston, TX, April 2019
Man J, Zhang J, Li W, Zeng L, Wu L (2016) Sequential ensemble-based optimal design for parameter estimation. Water Resour Res 52:7577–7592. https://doi.org/10.1002/2016wr018736
Article Google Scholar
Mariethoz G, Renard P, Straubhaar J (2010) The direct sampling method to perform multiple-point geostatistical simulations. Water Resour Res 46(11):1–14
Google Scholar
Mo S, Zabaras N, Shi X, Wu J (2020) Integration of adversarial autoencoders with residual dense convolutional networks for estimation of non-Gaussian hydraulic conductivities. Water Resour Res 56(2):e2019WR026082
Article Google Scholar
Nan T, Wu J (2011) Groundwater parameter estimation using the ensemble Kalman filter with localization. Hydrogeol J 19(3):547–561
Article Google Scholar
Neuman SP (1973) Calibration of distributed parameter groundwater flow models viewed as a multiple objective decision process under uncertainty. Water Resour Res 9(4):1006–1021
Article Google Scholar
Oliver DS, Chen Y (2008) Improved initial sampling for the ensemble Kalman filter. Comput Geosci 13:13–27. https://doi.org/10.1007/s10596-008-9101-2
Article Google Scholar
Oliver DS, Chen Y (2011) Recent progress on reservoir history matching: a review. Comput Geosci 15(1):185–221
Article Google Scholar
Oliver DS, Cunha LB, Reynolds AC (1997) Markov chain Monte Carlo methods for conditioning a permeability field to pressure data. Math Geol 29(1):61–91
Article Google Scholar
Sarma P, Durlofsky LJ, Aziz K (2008) Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Math Geosci 40(1):3–32
Article Google Scholar
Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34:1–21
Article Google Scholar
Van Leeuwen PJ, Evensen G (1996) Data assimilation and inverse methods in terms of a probabilistic formulation. Mon Weather Rev 124:2898–2913
Article Google Scholar
Xu T, Gómez-Hernández JJ (2016) Characterization of non-Gaussian conductivities and porosities with hydraulic heads, solute concentrations, and water temperatures. Water Resour Res. https://doi.org/10.1002/2016wr019011
Xu T, Gómez-Hernández JJ (2017) Simultaneous identification of a contaminant source and hydraulic conductivity via the restart normal-score ensemble Kalman filter. Adv Water Resour. https://doi.org/10.1016/j.advwatres.2017.12.011
Xue L, Zhang D (2014) A multimodel data assimilation framework via the ensemble Kalman filter. Water Resour Res 50:4197–4219. https://doi.org/10.1002/2013wr014525
Article Google Scholar
Zhang Y, Green CT, Fogg GE (2013) The impact of medium architecture of alluvial settings on non-Fickian transport. Adv Water Resour 54:78–99
Article Google Scholar
Zheng C (2006) MT3DMS v5.2 supplemental user’s guide: technical report to the U.S. Department of Geological Sciences, University of Alabama. Army Engineer Research and Development Center, Vicksburg, MS, 24 pp
Google Scholar
Zhou HY, Gomez-Hernandez JJ, Franssen HJH, Li LP (2011) An approach to handling non-Gaussianity of parameters and state variables in ensemble Kalman filtering. Adv Water Resour 34:844–864. https://doi.org/10.1016/j.advwatres.2011.04.014
Article Google Scholar
Zhou HY, Li LP, Gómez-Hernández JJ (2012a) Characterizing curvilinear features using the localized normal-score ensemble Kalman filter. Abstr Appl Anal. https://doi.org/10.1007/s11004-020-09882-1
Zhou HY, Gómez-Hernández JJ, Li LP (2012b) A pattern-search-based inverse method. Water Resour Res 48(3):1–17
Article Google Scholar
Zhou HY, Gómez-Hernández JJ, Li LP (2014) Inverse methods in hydrogeology: evolution and recent trends. Adv Water Resour 63:22–37
Article Google Scholar
Zinn B, Harvey CF (2003) When good statistical models of aquifer heterogeneity go bad: a comparison of flow, dispersion, and mass transfer in connected and multivariate Gaussian hydraulic conductivity fields. Water Resour Res. https://doi.org/10.1029/2001wr001146

Download references

Acknowledgements

The authors thank the two anonymous reviewers and the associate editor Dr. Yong Zhang for their constructive comments, which significantly improved the quality of this paper.

Funding

This work was financially supported by the National Key Research and Development Program of China (No. 2018YFC0406402) and the National Natural Science Foundation of China (No. 41672229 and No. 41730856).

Author information

Authors and Affiliations

Key Laboratory of Surficial Geochemistry, Ministry of Education and School of Earth Sciences and Engineering, Nanjing University, Nanjing, 210023, China
Tian Lan, Xiaoqing Shi & Jichun Wu
Geoscience Research Centre, Total E&P UK, Westhill, UK
Yan Chen
Department of Geology and Geological Engineering, South Dakota School of Mines and Technology, Rapid City, 57701, USA
Liangping Li
Water Conservancy and Civil Engineering College, Inner Mongolia Agricultural University, Hohhot, 010018, Inner Mongolia, China
Limin Duan & Tingxi Liu

Authors

Tian Lan
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoqing Shi
View author publications
You can also search for this author in PubMed Google Scholar
Yan Chen
View author publications
You can also search for this author in PubMed Google Scholar
Liangping Li
View author publications
You can also search for this author in PubMed Google Scholar
Jichun Wu
View author publications
You can also search for this author in PubMed Google Scholar
Limin Duan
View author publications
You can also search for this author in PubMed Google Scholar
Tingxi Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiaoqing Shi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In case homo and case heter, the flow is assumed to be transient in a 2D confined aquifer with a starting head of 0 m. As shown in 1, the dimension of the aquifer is 600 m × 600 m and the grid size is 10 m both in horizontal x and y directions. The height of the model is 10 m. The upward and downward boundaries are assumed to be impermeable, the head of the left boundary is fixed to be 0 m, and the flux at the right boundary is set as −300 m³/day. Additionally, the porosity and the specific storage are set to 0.3 and 0.0003 m⁻¹ respectively

Meanwhile, there is a line source at the left boundary with a constant concentration of 100 mg/L. It is of interested to note that only advection and dispersion are considered in these two comparing cases. The longitudinal dispersivity and horizontal transverse dispersivity are set to be 10 and 1 m, respectively. The governing equation for aqueous species’ transportation is defined as (Zheng 2006):

$$ \frac{\partial {C}_n}{\partial t}=\nabla \cdotp \left(D\cdotp \nabla {C}_n\right)-\nabla \cdotp \left(v{C}_n\right)+\frac{q_s}{\theta }{C}_n^s $$

(17)

where C_n is the aqueous concentration of the nth component [M L⁻³]; t is the time [T]; D is the diffusion coefficient [L² T⁻¹]; v = (−K ∇ H)/θ[L² T⁻¹]; q_s is the volumetric flow rate per unit volume of the aquifer [T⁻¹]; θ is the effective porosity; and C_n^s is the concentration of the source or sink flux of the nth component [M L⁻³]. The numerical code MT3DMS (Zheng 2006) is used to solve the solute model. The total simulation time is 500 days with 100 time steps.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lan, T., Shi, X., Chen, Y. et al. Identification of non-Gaussian parameters in heterogeneous aquifers by a modified probability conditioning method through hydraulic-head assimilation. Hydrogeol J 29, 819–839 (2021). https://doi.org/10.1007/s10040-020-02243-6

Download citation

Received: 17 February 2020
Accepted: 13 September 2020
Published: 13 October 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10040-020-02243-6

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Identification of non-Gaussian parameters in heterogeneous aquifers by a modified probability conditioning method through hydraulic-head assimilation

Abstract

Résumé

Resumen

摘要

Resumo

Similar content being viewed by others

Experimental sandbox tracer tests to characterize a two-facies aquifer via an ensemble smoother

Evaluating parameter inversion efficiency in Heterogeneous Groundwater models using Karhunen-Loève expansion: a comparative study of genetic algorithm, ensemble smoother, and MCMC

Identification of hydraulic conductivity and groundwater contamination sources with an Unscented Kalman Smoother

Introduction

System model and methodologies

Groundwater flow model

Ensemble smoother with multiple data assimilation (ES-MDA)

Probability conditioning method (PCM)

Original PCM

Modified PCM

Parameter estimation scheme

Case 1: two-facies case

Case setup

Case 1: results and analysis

Estimation results

Data reproduction and prediction performance

Case 2: three facies case

Case setup

Case 2: results and analysis

Estimation results

Data reproduction and prediction performance

Discussion

Effect of ensemble size

Effect of assimilation steps

Conclusions

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Appendix

Appendix

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation