1 Introduction

Groundwater contamination is a serious environmental issue that has been occurring in various areas worldwide for many years. Contamination is caused by various contaminants such as anthropogenic chemicals and radioactive and microbiological substances. Contamination by soluble and mobile contaminants tends to spread inconspicuously into extensive areas because its extension is not noticed unless the water quality is systematically monitored. To suppress the extension of contamination and form a proper remediation plan via pumping and chemical processing, correct estimations of the contaminant plume evolution and its uncertainty given the water quality data at wells are essential. To achieve this, the construction of an accurate hydrogeological model to simulate the groundwater flow and contaminant transport is indispensable.

A contaminant source (i.e., the source location or release history) and hydraulic properties are two predominant uncertain factors in the estimation of the contaminant plume distribution. Uncertainty in the contaminant source arises in accidental contamination events, because the source is not identified or recorded except in laboratory experiments. In real-world events, release history records are particularly rare, as in past cases of underground contamination at nuclear facilities (OECD 2014); although source locations have been detected by preliminary surveys or historical site assessments in drain lines, sumps, pipes, and tanks, the temporal release histories have not been traced in all cases. Among the relevant hydraulic properties, the permeability expressed by the hydraulic conductivity or transmissivity is the most essential parameter for simulating groundwater flow and contaminant transport. Despite the significance and heterogeneity of the spatial distribution, the amount and location of the measured permeability data at wells are usually limited by practical constraints such as time and cost. Therefore, there are large uncertainties in permeability distributions estimated by simply interpolating and extrapolating the measured permeability or indirectly using hydraulic head or other data. The correctness and uncertainty of the estimated permeability distribution necessarily control the estimation accuracy of the contaminant plume evolution.

The joint identification of the contaminant release history and hydraulic properties has been studied using several approaches. Examples of hydraulic property identification approaches include non-linear maximum likelihood estimation (Wagner 1992) and trained artificial neural network (Singh and Datta 2004) for homogeneous fields and restart normal-score ensemble Kalman filter (EnKF) (Sanchez-Leon et al. 2016; Chen et al. 2018; Xu and Gomez-Hernandez 2018) and ensemble smoother (ES) with multiple data assimilation (Xu et al. 2021) for heterogeneous fields. The high accuracy of EnKF methods has been confirmed in a sandbox experiment (Chen et al. 2021). However, there are two problems with the above studies:

  1. The release patterns of contaminants are assumed to be known. Typically, the contaminant source parameters, such as the source location, initial release time, release duration, and mass-loading rate, are determined by assuming constant release (e.g., Xu et al. 2021). However, the release pattern in actual contamination events is usually unknown and uncertain. Therefore, a random function should be applied to the release pattern (Snodgrass and Kitanidis 1997).

  2. Although EnKF methods have the advantage of enabling production of non-Gaussian distributions without considering the spatial correlation of the hydraulic conductivity, they require an impractically large amount of spatiotemporal measurement data of the head and concentration for usual contaminant cases.

To overcome the first issue, the quasi-linear geostatistical approach (GA; e.g., Snodgrass and Kitanidis 1997; Gyzl et al. 2004; Shlomi and Michalak 2007) is applicable by introducing the prior information of release history with geostatistical trend and covariance. The applicability of GA has been verified at real-world sites contaminated by water-soluble contaminants: 1,4-dioxane (Woodbury et al. 1998; Michalak and Kitanidis 2002), tetrachloroethene and trichloroethene (Michalak and Kitanidis 2003), and hexachlorocyclohexane (Gyzl et al. 2014). However, most of these studies only incorporated the uncertainty of the contaminant source using a deterministic hydraulic property model.

GA can also solve the second issue if extended to hydraulic tomography studies (e.g., Li et al. 2007, 2008; Cardiff et al. 2009; Cardiff and Barrash 2011; Pouladi et al. 2021), and its applicability using head data has been verified by field tests (e.g., Illuman et al. 2009; Wang et al. 2017; Zha et al. 2018; Luo et al. 2022). However, GA tends to generate a spatially smoother best estimate than the true distribution, which is its main drawback. This smoothing effect is caused by modeling the hydraulic conductivity as a multivariate Gaussian, which is usually inadequate for the estimation of heterogeneous fields such as aquifers in fluvial deposits, where several strata with highly different permeabilities coexist (Mo et al. 2020). However, the assumption of a Gaussian field is applicable to cases of groundwater contamination that occur in a single aquifer. The smoothing effect has been reduced via joint inversion of the head and temperature data (Jiang and Woodbury 2006) and the head and tracer data (e.g., Harvey and Gorelick 1995; Cirpka and Kitanidis 2000; Xu and Kitanidis 2014), as well as in combination with a convolutional neural network (Vu and Jardani 2022).

Although the tracer test data can indeed improve the performance and accuracy of GA, the implementation of many tests over a contaminated area necessarily disturbs the contamination situation, renders situation assessments difficult, and possibly further extends the contamination. In contrast to such impractical testing, measuring the transient concentration data in groundwater at wells over an area is possible and preserves the situation. Therefore, through the joint use of head and transient concentration data, the estimation accuracies of both the contaminant plume distribution and the hydraulic conductivity are expected to be effectively improved. To achieve this, an estimation of the unknown initial plume distribution is indispensable.

Given the above background, this study aims to accurately estimate the contaminant plume evolution by considering uncertainties in both the temporal release history and the heterogeneous transmissivity fields. Accordingly, the GA method is further developed for a joint clarification of the contaminant plume and transmissivity distributions using both the head and contaminant concentration data. The joint clarification is achieved by combining previous estimation methods for each component: a contaminant plume with an unknown release history is estimated using the method of Shlomi and Michalak (2007) and the hydraulic transmissivity is estimated using the method of Kitanidis and Lee (2014). This paper begins with a review of the previous estimation methods and then proposes a combined method. This method consists of the following three steps: separate initial estimations of the transmissivity and the initial plume distributions using the head and concentration data, respectively; an iterative update of their distributions via joint use of the data; and an estimation of the contaminant plume evolution and its uncertainties based on their conditional realizations. The proposed method is verified by two numerical experiments assuming groundwater contamination in a two-dimensional aquifer, and the results are then discussed.

2 Methods

2.1 Iterative Estimation of Contaminant Plume and Hydraulic Transmissivity

Previous geostatistical approaches for contaminant plume estimation with unknown release histories (e.g., Shlomi and Michalak 2007) cannot consider the uncertainty of the hydraulic transmissivity. To address this problem, this study developed a GA method to estimate the contaminant plume evolution z(x,t) (x: space and t: time) and its uncertainty by combining previous estimation methods for the contaminant plume and transmissivity, reviewed in Sects. 2.2 and 2.3, for the uncertainties of the release history s(t) and log-transmissivity r(x), respectively. Because both z and r are necessary for each estimation, an iterative approach using both the head and concentration data is proposed as shown in Fig. 1.

Fig. 1 Flowchart of the iterative estimation of the contaminant plume and the hydraulic transmissivity distributions, z(x,t) and r(x), respectively

The first step is the initial estimation of r and the initial contaminant plume z0 = z(t0) (t0: initial measurement time), using the head φ and initial concentration data z0*, separately. The next step is to update r based on the estimated z0 using both φ and the transient concentration data z*(t). The posterior pdfs of r and z0 are iteratively calculated until the posterior pdf of r reaches its maximum. In this step, the mutual uncertainties of r and z0 are not considered (i.e., the uncertainty of r is not considered in the z0 estimation, and vice versa). Finally, the best estimate of z(t) is obtained using the best estimates of r and the corresponding z0. To consider both the uncertainties of r and z0, the estimation method for the uncertainty of z(t) based on Nr × Nz0 conditional realizations of r and z0 (N: number of realizations) is developed as described in Sect. 2.4.

2.2 Geostatistical Inversion for Initial Contaminant Plume Estimation

This section reviews preceding studies of the quasi-linear GA for the estimation of the contaminant plume distribution from a known source with an unknown release history (e.g., Kitanidis 1995; Snodgrass and Kitanidis 1997; Shlomi and Michalak 2007). Under a steady state flow, \({\varvec{z}}_{0}^{\user2{*}} \in {\mathbb{R}}^{{n_{z} \times 1}}\) (nz-dimensional real space) is related linearly to the release history \({\varvec{s}} \in {\mathbb{R}}^{{m_{t} \times 1}}\) at each time \(t_{j} \left( {j = 1, \ldots ,m_{t} } \right)\) such that

$$ \begin{array}{*{20}c} {{\varvec{z}}_{0}^{\user2{*}} = {\varvec{H}}_{s}^{*} {\varvec{s}} + {\varvec{v}}_{z} ,} \\ \end{array} $$
(1)

where \({\varvec{H}}_{s}^{*} \in {\mathbb{R}}^{{n_{z} \times m_{t} }}\) and \({\varvec{v}}_{z} \in {\mathbb{R}}^{{n_{z} \times 1}}\) stand for the Jacobian matrix and the model mismatch error at the measurement points, respectively. \({\varvec{H}}_{s}^{*}\) expresses the sensitivity of the concentrations at each measurement point and time and can be calculated in advance by a flow and transport simulation for the release of a unit concentration pulse. Therefore, the unknown s can be obtained by solving Eq. (1) inversely.

The geostatistical inversion incorporates the temporal correlation of s and assumes that s and vz are random vectors following the multivariate Gaussian distributions \({\varvec{s}}\sim N\user2{ }\left( {{\varvec{X}}_{s} {\varvec{\beta}}_{s} ,{\varvec{Q}}_{s} \left( {\theta_{s} } \right)} \right)\) and \({\varvec{v}}_{{\varvec{z}}} \sim N\left( {{\bf 0},{\varvec{R}}_{z} } \right)\), where \({\varvec{X}}_{s} \in {\mathbb{R}}^{{m_{t} \times p_{s} }}\) is a known matrix of basis functions; \({\varvec{\beta}}_{s} \in {\mathbb{R}}^{{p_{s} \times 1}}\) are ps unknown drift coefficients; \({\varvec{Q}}_{s} \left( {\theta_{s} } \right) \in {\mathbb{R}}^{{m_{t} \times m_{t} }}\) is the generalized covariance matrix of s; \(\theta_{s}\) is the structural parameter of Qs; and Rz is the error covariance matrix of \({\varvec{z}}_{0}^{\user2{*}}\). This study assumes an uncorrelated error of \({\varvec{R}}_{z} = \sigma_{{R_{z} }}^{2} {\varvec{I}}\), where \(\sigma_{{R_{z} }}^{2}\) is the variance of the error and \({\varvec{I}} \in {\mathbb{R}}^{{n_{z} \times n_{z} }}\) is the identity matrix. The unknown s can be estimated from \({\varvec{z}}_{0}^{\user2{*}}\) by maximizing the posterior pdf \(p^{{{\prime \prime }}} \left( {{\varvec{s}},\user2{ \beta }_{s} } \right)\) obtained via Bayes’ rule as

$$ \begin{array}{*{20}c} { - {\text {ln}}p^{\prime\prime}\left( {{\varvec{s}},\user2{ \beta }_{s} } \right) = \frac{1}{2}\left( {{\varvec{z}}_{0}^{\user2{*}} - {\varvec{H}}_{s}^{*} {\varvec{s}}} \right)^{T} {\varvec{R}}_{z}^{ - 1} \left( {{\varvec{z}}_{0}^{\user2{*}} - {\varvec{H}}_{s}^{*} {\varvec{s}}} \right) + \frac{1}{2}\left( {{\varvec{s}} - {\varvec{X}}_{s} {\varvec{\beta}}_{s} } \right)^{T} {\varvec{Q}}_{s}^{ - 1} \left( {{\varvec{s}} - {\varvec{X}}_{s} {\varvec{\beta}}_{s} } \right).} \\ \end{array} $$
(2)

The structural parameters \({\varvec{\theta}} = \left( {\theta_{s} , \sigma_{{R_{z} }} } \right)^{T}\) can be iteratively estimated using a restricted maximum likelihood approach that minimizes the objective function \(L\left( {\varvec{\theta}} \right)\) (Kitanidis 1995)

$$ \begin{array}{*{20}c} {L\left( {\varvec{\theta}} \right) = \frac{1}{2}\ln \left| {{\varvec{\varSigma}}_{s} } \right| + \frac{1}{2}\ln \left| {{\varvec{X}}_{s}^{T} {\varvec{H}}_{s}^{*T}{\varvec{\varSigma}}_{s}^{ - 1} {\varvec{H}}_{s}^{*} {\varvec{X}}_{s} } \right| + \frac{1}{2}{\varvec{z}}_{0}^{*T}{\varvec{\varXi}}_{s} {\varvec{z}}_{0}^{*} ,} \\ \end{array} $$
(3)
$$ \begin{array}{*{20}c} {{\varvec{\varSigma}}_{s} = {\varvec{H}}_{s}^{*} {\varvec{Q}}_{s} {\varvec{H}}_{s}^{*T} + {\varvec{R}}_{z} ,} \\ \end{array} $$
(4)
$$ \begin{array}{*{20}c} {{\varvec{\varXi}}_{s} ={\varvec{\varSigma}}_{s}^{ - 1} -{\varvec{\varSigma}}_{s}^{ - 1} {\varvec{H}}_{s}^{*} {\varvec{X}}_{s} \left( {{\varvec{X}}_{s}^{T} {\varvec{H}}_{s}^{*T}{\varvec{\varSigma}}_{s}^{ - 1} {\varvec{H}}_{s}^{*} {\varvec{X}}_{s} } \right)^{ - 1} {\varvec{X}}_{s}^{T} {\varvec{H}}_{s}^{*T}{\varvec{\varSigma}}_{s}^{ - 1} .} \\ \end{array} $$
(5)
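As a rough illustration of how the objective function of Eqs. (3) to (5) might be evaluated, the following NumPy sketch computes L(θ) for a candidate covariance and error level. The function name `reml_objective`, the exponential covariance, the random stand-in Jacobian, and the coarse grid of candidate values are all assumptions made for this example and are not taken from the study. The quadratic term is evaluated as z^T Ξ z with Ξ as defined in Eq. (5), which is how the expression is read here following Kitanidis (1995).

```python
import numpy as np


def reml_objective(z_obs, H, X, Q, sigma_R):
    """Evaluate L(theta) of Eqs. (3)-(5) for one candidate (Q, sigma_R)."""
    n = z_obs.size
    Sigma = H @ Q @ H.T + sigma_R**2 * np.eye(n)      # Eq. (4), R_z = sigma_R^2 I
    HX = H @ X
    Sigma_inv_HX = np.linalg.solve(Sigma, HX)
    A = HX.T @ Sigma_inv_HX                           # X^T H^T Sigma^-1 H X
    Sigma_inv_z = np.linalg.solve(Sigma, z_obs)
    # Product Xi z with Xi from Eq. (5), without forming Xi explicitly
    Xi_z = Sigma_inv_z - Sigma_inv_HX @ np.linalg.solve(A, HX.T @ Sigma_inv_z)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_A = np.linalg.slogdet(A)
    return 0.5 * logdet_Sigma + 0.5 * logdet_A + 0.5 * float(z_obs @ Xi_z)


# Hypothetical toy problem: 8 observations, 20 release times
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
H = rng.normal(size=(8, 20))          # stand-in for the Jacobian H_s^*
X = np.ones((20, 1))                  # constant drift
z_obs = rng.normal(size=8)            # stand-in for measured concentrations


def exponential_covariance(theta):
    """Assumed covariance model; theta is its correlation length."""
    return np.exp(-np.abs(t[:, None] - t[None, :]) / theta)


# Coarse grid search over the structural parameter (illustration only)
candidates = [0.05, 0.1, 0.2, 0.4, 0.8]
scores = [reml_objective(z_obs, H, X, exponential_covariance(th), 0.1)
          for th in candidates]
theta_best = candidates[int(np.argmin(scores))]
```

A grid search is used only to keep the sketch short; an iterative optimizer would normally replace it.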

Then, the best estimate \(\hat{\user2{s}}\) and its posterior covariance \({\varvec{V}}_{{\hat{\user2{s}}}}\) are derived by solving the following equation system

$$ \begin{array}{*{20}c} {\left( {\begin{array}{*{20}c} {{\varvec{\varSigma}}_{s} } & {{\varvec{H}}_{s}^{*} {\varvec{X}}_{s} } \\ {\left( {{\varvec{H}}_{s}^{*} {\varvec{X}}_{s} } \right)^{T} } & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {{\varvec{\varLambda}}_{s}^{T} } \\ {{\varvec{M}}_{s} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {{\varvec{H}}_{s}^{*} {\varvec{Q}}_{s} } \\ {{\varvec{X}}_{s}^{T} } \\ \end{array} } \right),} \\ \end{array} $$
(6)
$$ \begin{array}{*{20}c} {\hat{\user2{s}} ={\varvec{\varLambda}}_{s} {\varvec{z}}_{0}^{*} ,} \\ \end{array} $$
(7)
$$ \begin{array}{*{20}c} {{\varvec{V}}_{{\hat{s}}} = {\varvec{Q}}_{s} - {\varvec{Q}}_{s} {\varvec{H}}_{s}^{*T}{\varvec{\varLambda}}_{s}^{T} - {\varvec{X}}_{s} {\varvec{M}}_{s} ,} \\ \end{array} $$
(8)

where \({\varvec{\varLambda}}_{s} \in {\mathbb{R}}^{{m_{t} \times n_{z} }}\) and \({\varvec{M}}_{s} \in {\mathbb{R}}^{{p_{s} \times m_{t} }}\) are the weight matrix and the Lagrange multiplier, respectively. To enforce concentration non-negativity, a power transformation (Box and Cox 1964) is applied, such that

$$ \begin{array}{*{20}c} {\tilde{\user2{s}} = \alpha \left( {{\varvec{s}}^{1/\alpha } - 1} \right),} \\ \end{array} $$
(9)

where α is a positive number. Because Eq. (1) is not linear in the transformed space, \(\tilde{\user2{s}}\) and θ are solved iteratively using the quasi-linear approach (Snodgrass and Kitanidis 1997) in which α is chosen to be as small as possible while ensuring that \(\tilde{\user2{s}} > - \alpha\), so that the back-transformation remains well defined. After obtaining the best estimate and its covariance, the solutions are back-transformed into the original space by

$$ \begin{array}{*{20}c} {{\varvec {s}} = \left( {\frac{{\tilde{\user2{s}} + \alpha }}{\alpha }} \right)^{\alpha } .} \\ \end{array} $$
(10)
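The pair of transformations in Eqs. (9) and (10) can be checked numerically with a short sketch such as the one below. The function names and the sample values are hypothetical. The guard in the back-transformation reflects the fact that Eq. (10) needs a non-negative base, that is, a transformed value of at least −α.

```python
import numpy as np


def to_transformed(s, alpha):
    """Power transformation of Eq. (9); s must be non-negative."""
    s = np.asarray(s, dtype=float)
    if np.any(s < 0.0):
        raise ValueError("s must be non-negative")
    return alpha * (s ** (1.0 / alpha) - 1.0)


def to_original(s_tilde, alpha):
    """Back-transformation of Eq. (10)."""
    s_tilde = np.asarray(s_tilde, dtype=float)
    if np.any(s_tilde < -alpha):
        raise ValueError("back-transformation needs s_tilde >= -alpha")
    return ((s_tilde + alpha) / alpha) ** alpha


alpha = 3.0                                  # illustrative value only
s = np.array([0.0, 0.25, 1.0])               # hypothetical release intensities
s_tilde = to_transformed(s, alpha)
s_back = to_original(s_tilde, alpha)         # recovers s up to rounding
```

Any admissible value in the transformed space maps back to a non-negative release intensity, which is the purpose of the transformation.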

Once \(\hat{\user2{s}}\) and \({\varvec{V}}_{{\hat{s}}}\) are determined, the best estimate of \(\widehat{{{\varvec{z}}_{0} }} \in {\mathbb{R}}^{m \times 1}\) and its posterior covariance \({\varvec{V}}_{{\hat{z}}}\) can be solved as

$$ \begin{array}{*{20}c} {\widehat{{{\varvec{z}}_{0} }} = {\varvec{H}}_{s} \hat{\user2{s}},} \\ \end{array} $$
(11)
$$ \begin{array}{*{20}c} {{\varvec{V}}_{{\hat{z}}} = {\varvec{H}}_{s} {\varvec{V}}_{{\hat{s}}} {\varvec{H}}_{s}^{T} ,} \\ \end{array} $$
(12)

where m is the number of estimation points and \({\varvec{H}}_{s} \in {\mathbb{R}}^{{m \times m_{t} }}\) is the Jacobian matrix at all estimation points.
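For the linear case without the power transformation, Eqs. (4), (6) to (8), (11), and (12) reduce to a few dense linear-algebra steps. The sketch below is a minimal illustration under assumed inputs: the Jacobians are random stand-ins rather than outputs of a transport simulation, and the exponential covariance, the error variance, and the function name `geostat_estimate` are hypothetical.

```python
import numpy as np


def geostat_estimate(z_obs, H_obs, X, Q, R):
    """Best estimate and posterior covariance of s via Eqs. (4), (6)-(8)."""
    n = H_obs.shape[0]
    p = X.shape[1]
    Sigma = H_obs @ Q @ H_obs.T + R                          # Eq. (4)
    HX = H_obs @ X
    lhs = np.block([[Sigma, HX], [HX.T, np.zeros((p, p))]])  # Eq. (6), left side
    rhs = np.vstack([H_obs @ Q, X.T])                        # Eq. (6), right side
    sol = np.linalg.solve(lhs, rhs)
    Lambda = sol[:n].T                                       # weights, (m_t, n)
    M = sol[n:]                                              # multipliers, (p, m_t)
    s_hat = Lambda @ z_obs                                   # Eq. (7)
    V_s = Q - Q @ H_obs.T @ Lambda.T - X @ M                 # Eq. (8)
    return s_hat, V_s


# Hypothetical toy setup: 10 wells, 30 release times, 50 estimation points
rng = np.random.default_rng(0)
m_t, n, m = 30, 10, 50
t = np.linspace(0.0, 1.0, m_t)
Q = np.exp(-np.abs(t[:, None] - t[None, :]) / 0.2)   # assumed covariance model
X = np.ones((m_t, 1))                                # constant drift
R = 0.01 * np.eye(n)                                 # uncorrelated error
H_obs = rng.random((n, m_t))                         # stand-in for H_s^*
H_all = rng.random((m, m_t))                         # stand-in for H_s
s_true = 1.0 + np.linalg.cholesky(Q) @ rng.standard_normal(m_t)
z_obs = H_obs @ s_true + 0.1 * rng.standard_normal(n)

s_hat, V_s = geostat_estimate(z_obs, H_obs, X, Q, R)
z0_hat = H_all @ s_hat                               # Eq. (11)
V_z = H_all @ V_s @ H_all.T                          # Eq. (12)
```

The second block row of Eq. (6) enforces unbiasedness with respect to the drift, so data generated from a pure drift are reproduced exactly by the estimate.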

2.3 Principal Component Geostatistical Approach for Hydraulic Transmissivity Estimation

This study adopts the principal component geostatistical approach (PCGA: Kitanidis and Lee 2014; Lee and Kitanidis 2014), as reviewed below, to estimate the hydraulic transmissivity distribution. The observation \({\varvec{y}} \in {\mathbb{R}}^{n \times 1}\) can be expressed by the forward model h with \({\varvec{r}} \in {\mathbb{R}}^{m \times 1}\) and the observation error \({\varvec{v}} \in {\mathbb{R}}^{n \times 1}\) as

$$ \begin{array}{*{20}c} {{\varvec {y}} = h\left( {\varvec{r}} \right) + {\varvec {v}}.} \\ \end{array} $$
(13)

For the present case, y corresponds to only head data or to head and concentration data. r and v are assumed to follow the multivariate Gaussian distributions \({\varvec{r}}\sim N\user2{ }\left( {\user2{X\beta },{\varvec{Q}}\left( {\theta_{r} } \right)} \right)\) and \({\varvec{v}}\sim N\user2{ }\left( {0,{\varvec{R}}} \right)\), where \({\varvec{X}} \in {\mathbb{R}}^{m \times p}\) is a known matrix of basis functions; \({\varvec{\beta}} \in {\mathbb{R}}^{p \times 1}\) represents p unknown drift coefficients; \({\varvec{Q}}\left( {\theta_{r} } \right) \in {\mathbb{R}}^{m \times m}\) is a generalized covariance matrix of r; θr is the structural parameter of Q; and R is the error covariance matrix of y. As in the above release history, the best estimate \(\hat{\user2{r}}\) is obtained by maximizing \(p^{{{\prime \prime }}} \left( {{\varvec{r}},\user2{ \beta }} \right)\). Because Eq. (13) is not linear, the quasi-linear approach (Kitanidis 1995) is applied to approximate the true \(\hat{\user2{r}}\) with the latest estimate \(\overline{\user2{r}}\), such that

$$ \begin{array}{*{20}c} {h\left( {\hat{\user2{r}}} \right) = h\left( {\overline{\user2{r}}} \right) + {\varvec {H}}\left( {\hat{\user2{r}} - \overline{\user2{r}}} \right), {\varvec {H}} = \left. {\frac{\partial h}{{\partial {\varvec{r}}}}} \right|_{{{\varvec{r}} = \overline{\user2{r}}}} .} \\ \end{array} $$
(14)

The following equation system is solved to update \(\overline{\user2{r}}\) until it converges

$$ \left( {\begin{array}{*{20}c}{\varvec{\varSigma}}& {{\varvec{HX}}} \\ {\left( {{\varvec{HX}}} \right)^{T} } & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\overline{\user2{\xi }}} \\ {\overline{\user2{\beta }}} \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {{\varvec{y}} - h\left( {\overline{\user2{r}}} \right) + {\varvec {H}}\overline{\user2{r}}} \\ 0 \\ \end{array} } \right), $$
(15)
$$ \begin{array}{*{20}c} {{\varvec{\varSigma}} = {\varvec {HQ}}{\varvec{H}}^{T} + {\varvec {R}}}, \\ \end{array} $$
(16)
$$ \begin{array}{*{20}c} {\overline{\user2{r}} = {\varvec {X}}\overline{\user2{\beta }} + {\varvec {Q}}{\varvec{H}}^{T} \overline{\user2{\xi }}.} \\ \end{array} $$
(17)
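One way to picture the update of Eqs. (15) to (17) is the following sketch, which repeats the linearized solve until the estimate stops changing. The forward model `h` used here (a weighted sum of exp(r)) is a toy stand-in chosen for this example, not the groundwater simulator of the study, and the covariance, error variance, and function names are likewise hypothetical.

```python
import numpy as np


def quasi_linear_update(h, jac, r_bar, y, X, Q, R):
    """One update of the latest estimate r_bar following Eqs. (15)-(17)."""
    H = jac(r_bar)                                   # Jacobian at r_bar, Eq. (14)
    n, p = y.size, X.shape[1]
    Sigma = H @ Q @ H.T + R                          # Eq. (16)
    HX = H @ X
    lhs = np.block([[Sigma, HX], [HX.T, np.zeros((p, p))]])
    rhs = np.concatenate([y - h(r_bar) + H @ r_bar, np.zeros(p)])
    sol = np.linalg.solve(lhs, rhs)                  # Eq. (15)
    xi, beta = sol[:n], sol[n:]
    return X @ beta + Q @ H.T @ xi                   # Eq. (17)


# Hypothetical toy problem: 15 unknowns, 6 observations
rng = np.random.default_rng(1)
m, n = 15, 6
x = np.linspace(0.0, 1.0, m)
Q = 0.04 * np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)   # assumed prior covariance
X = np.ones((m, 1))                                         # constant drift
R = 1e-4 * np.eye(n)
A = rng.random((n, m)) / m                                  # toy averaging weights


def h(r):
    return A @ np.exp(r)                  # toy nonlinear forward model


def jac(r):
    return A * np.exp(r)[None, :]         # its exact Jacobian


r_true = np.linalg.cholesky(Q) @ rng.standard_normal(m)
y = h(r_true)

r_bar = np.zeros(m)                       # uniform initial field
for _ in range(20):
    r_next = quasi_linear_update(h, jac, r_bar, y, X, Q, R)
    converged = np.max(np.abs(r_next - r_bar)) < 1e-8
    r_bar = r_next
    if converged:
        break
```

For a linear forward model the right-hand side of Eq. (15) no longer depends on the latest estimate, so a single update already gives the final answer.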

Once the optimal solution \(\hat{\user2{r}}\) is obtained, its posterior covariance \({\varvec {V}}_{{\hat{r}}}\) can be calculated in the same way as in Eqs. (6) and (8).

PCGA was proposed to obtain \(\hat{\user2{r}}\) efficiently at a low computational cost by improving the conventional GA through two approaches. The first approach is the use of a Taylor expansion for the indirect expression of \({\varvec{H}}\)

$$ \begin{array}{*{20}c} {{\varvec {Ha}} \approx \frac{1}{{\delta_{r} }}\left[ {h\left( {\overline{\user2{r}} + \delta_{r} {\varvec{a}}} \right) - h\left( {\overline{\user2{r}}} \right)} \right],} \\ \end{array} $$
(18)

where a is the target vector, such as \(\overline{\user2{r}}\), and \(\delta_{r}\) is a finite difference interval that can be optimized as (Lee et al. 2016)

$$ \begin{array}{*{20}c} {\widehat{{\delta_{r} }} = \frac{{\sqrt {\varepsilon_{r} } }}{{\left\| {\varvec{a}} \right\|_{2}^{2} }}\max \left( {\left| {{\varvec{r}}^{T} {\varvec{a}}} \right|,\left| {\varvec{r}} \right|^{T} \left| {\varvec{a}} \right|} \right){\text{sign}}\left( {{\varvec{r}}^{T} {\varvec{a}}} \right),} \\ \end{array} $$
(19)

where εr is the relative machine precision depending on the precision of the forward model; \(\left| {\varvec{a}} \right| = \left( {\left| {a_{1} } \right|, \ldots , \left| {a_{m} } \right|} \right)^{T}\); and sign() indicates the sign of a value. The second approach is a low-rank approximation of Q as

$$ \begin{array}{*{20}c} {{\varvec {Q}} \approx {\varvec{Z}}_{Q} {\varvec{Z}}_{Q}^{T} = \mathop \sum \limits_{i = 1}^{K} {\varvec{\zeta}}_{i} {\varvec{\zeta}}_{i}^{T} , {\varvec{\zeta}}_{i} = \sqrt {\lambda_{i} } {\varvec{V}}_{i} ,} \\ \end{array} $$
(20)

where λi and \({\varvec{V}}_{i} \in {\mathbb{R}}^{m \times 1}\) are the ith eigenvalue and eigenvector of Q in descending order. The order K can be defined such that the relative error of the low-rank approximation, \(\lambda_{K + 1} /\lambda_{1}\), is sufficiently small. All of the above calculations are implemented by normalizing the drift and covariance of the prior model, as explained in the Appendix, following Kitanidis and Lee (2014).
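The two ingredients of PCGA described above can be sketched in a few lines. In this illustration the finite difference is taken about the current estimate r, which is how the Jacobian-vector product is read here; the guards for a zero inner product are additions of this sketch, and the linear toy forward model, the exponential covariance, and the function names are hypothetical. A dense eigendecomposition is used for brevity, whereas a large problem would use a truncated solver.

```python
import numpy as np


def fd_step(r, a, eps):
    """Finite-difference interval in the spirit of Eq. (19)."""
    norm_sq = float(a @ a)
    if norm_sq == 0.0:
        raise ValueError("direction a must be non-zero")
    ra = float(r @ a)
    scale = max(abs(ra), float(np.abs(r) @ np.abs(a)))
    if scale == 0.0:
        return np.sqrt(eps)              # fallback for r = 0 (assumption)
    sign = 1.0 if ra >= 0.0 else -1.0    # sign(0) taken as +1 (assumption)
    return np.sqrt(eps) / norm_sq * scale * sign


def jac_times_vector(h, r, a, eps):
    """Matrix-free product H a from two forward runs, cf. Eq. (18)."""
    delta = fd_step(r, a, eps)
    return (h(r + delta * a) - h(r)) / delta


def low_rank_factor(Q, K):
    """Z_Q with Q ~ Z_Q Z_Q^T from the K leading eigenpairs, Eq. (20)."""
    lam, V = np.linalg.eigh(Q)                   # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]               # reorder to descending
    Z = V[:, :K] * np.sqrt(np.clip(lam[:K], 0.0, None))
    rel_err = float(lam[K] / lam[0]) if K < lam.size else 0.0
    return Z, rel_err


# Hypothetical toy setup
rng = np.random.default_rng(2)
m = 50
x = np.linspace(0.0, 1.0, m)
Q = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)
Z_Q, rel_err = low_rank_factor(Q, K=10)

A = rng.normal(size=(5, m))
h_toy = lambda r: A @ r          # linear toy model, so the exact product is A @ a
r_bar = rng.normal(size=m)
a = Z_Q[:, 0]                    # one column of the low-rank factor
Ha = jac_times_vector(h_toy, r_bar, a, eps=5e-6)
```

Each column of the low-rank factor costs one extra forward run, so the number of forward runs per iteration scales with K rather than with the number of unknowns.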

2.4 Conditional Realizations of the Transmissivity and Initial Plume Distributions

The uncertainty of z(x,t) is assessed considering the uncertainties of both the initial contaminant plume and the transmissivity distributions by generating their conditional realizations. The conditional realization can be drawn from the posterior pdf using either the Cholesky decomposition of the posterior covariance (Harvey and Gorelick 1995; Nowak 2009; Troldborg et al. 2012) or the parametric bootstrapping sampling method (Kitanidis 1995; Kitanidis and Lee 2014). Because of its simplicity and low computational cost, the Cholesky approach is adopted here, where the ith conditional realization of the transmissivity distribution \(\widehat{{{\varvec{r}}_{c} }}_{i}\) is

$$ \begin{array}{*{20}c} {\widehat{{{\varvec{r}}_{c} }}_{i} = \widehat{{{\varvec{r}}_{u} }}_{i} + \left( {{\varvec{HQ}}} \right)^{T} {\varvec{\varSigma}}^{ - 1} \left( {{\varvec{y}} - h\left( {\widehat{{{\varvec{r}}_{u} }}_{i} } \right) + {\varvec{v}}_{i} } \right),} \\ \end{array} $$
(21)

where \(\widehat{{{\varvec{r}}_{u} }}_{i} \sim N\left( {\hat{\user2{r}}, V_{{\hat{r}}} } \right)\) is the ith unconditional realization of the transmissivity randomly sampled from the posterior pdf and \({\varvec{v}}_{i} \sim N\user2{ }\left( {0,{\varvec{R}}} \right)\) is the ith random measurement error of the head and concentration. In the same way, the ith realization of the initial contaminant plume distribution \({\widehat{{\varvec{z}}_0}}_{c_i}\) can be written as

$$ \widehat{{{\varvec{z}}_{0}}}_{c_i} = {\varvec{H}}_{s} \widehat{{{\varvec{s}}_{c} }}_{i} , \widehat{{{\varvec{s}}_{c} }}_{i} = \widehat{{{\varvec{s}}_{u} }}_{i} + \left( {{\varvec{H}}_{s}^{*} {\varvec{Q}}_{s} } \right)^{T} {\varvec{\varSigma}}_{s}^{ - 1} \left( {{\varvec{z}}_{0}^{{*}} - {\varvec{H}}_{s}^{*} \widehat{{{\varvec{s}}_{u} }}_{i} + {\varvec{v}}_{z_i}} \right), $$
(22)

where \(\widehat{{{\varvec{s}}_{u} }}_{i}\) is the ith unconditional realization of the release history and \({\varvec{v}}_{z_i} \sim N\user2{ }\left( {0,{\varvec{R}}_{z} } \right)\) is the ith random measurement error of the initial concentration. \(\widehat{{{\varvec{s}}_{u} }}_{i}\) can be inversely calculated from the ith realization of the release history in the transformed space, \(\widehat{{\tilde{\user2{s}}_{u} }}_{i} \sim N\left( {\widehat{{\tilde{\user2{s}}}}, V_{{\widehat{{\tilde{s}}}}} } \right)\).
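A compact sketch of the conditioning step in Eq. (21) is given below for a linear toy forward model. The covariance used to draw the unconditional realizations is only a stand-in (the prior covariance is reused in place of the posterior covariance), and the function name, sizes, and error level are assumptions of this example.

```python
import numpy as np


def conditional_realization(r_u, h, H, Q, R, y, rng):
    """Condition one unconditional realization r_u on the data y, Eq. (21)."""
    Sigma = H @ Q @ H.T + R
    # Random measurement error v_i ~ N(0, R), drawn through a Cholesky factor
    v = np.linalg.cholesky(R) @ rng.standard_normal(y.size)
    return r_u + (H @ Q).T @ np.linalg.solve(Sigma, y - h(r_u) + v)


# Hypothetical toy setup: 40 unknowns, 8 observations
rng = np.random.default_rng(3)
m, n = 40, 8
x = np.linspace(0.0, 1.0, m)
Q = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
H = rng.normal(size=(n, m))
h = lambda r: H @ r                      # linear toy forward model
R = 1e-8 * np.eye(n)                     # nearly error-free data
r_true = np.linalg.cholesky(Q) @ rng.standard_normal(m)
y = h(r_true)

r_hat = np.zeros(m)                      # stand-in for the best estimate
L_factor = np.linalg.cholesky(Q)         # stand-in for the factor of V_r
realizations = []
for _ in range(5):
    r_u = r_hat + L_factor @ rng.standard_normal(m)
    realizations.append(conditional_realization(r_u, h, H, Q, R, y, rng))
```

With a very small error variance, every conditioned field reproduces the observations almost exactly while still differing from the others away from the data.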

3 Numerical Experiment

3.1 Physical Model

The proposed geostatistical approach was tested via numerical experiments of contaminant transport under two-dimensional steady-state groundwater flow. Let a transmissivity field T in a confined aquifer be spatially variable but locally isotropic. The governing equation for groundwater flow in a saturated porous medium is expressed as

$$ {\varvec{\nabla}} \cdot {\varvec{u}} = Q_{f} \delta \left( {{\varvec{x}} - {\varvec{x}}_{f} } \right),\quad {\varvec{u}} = - T{\varvec{\nabla}} \varphi , $$
(23)

where u is the groundwater flow velocity; T is the transmissivity; φ is the hydraulic head; δ(x) is the Dirac delta function; and Qf is the pumping rate at a well location xf. Under this state, the contaminant transport is expressed by the advection–dispersion equation as

$$ \nabla \cdot \left( {{\varvec{D}}\nabla c - {\varvec{V}}c} \right) = R_{f} \frac{\partial c}{{\partial t}} + \lambda_{f} c, $$
(24)

where c is the dimensionless concentration; \({\varvec{V}} = {\varvec{u}}/\varepsilon\) is the actual groundwater velocity; D is the dispersion tensor; ε is the porosity; Rf is the retardation factor; and λf is the radioactive or first-order biochemical decay constant. Each component of D is formulated as

$$ \begin{array}{*{20}c} {D_{ij} = \alpha_{T} \left| {\varvec{V}} \right|\delta_{ij} + \left( {\alpha_{L} - \alpha_{T} } \right)\frac{{V_{i} V_{j} }}{{\left| {\varvec{V}} \right|}} + D_{m} \tau \delta_{ij} \left( {i,j = 1,2} \right),} \\ \end{array} $$
(25)

where αL and αT are the longitudinal and transverse dispersivities, respectively; Dm is the molecular diffusion coefficient; and τ is the tortuosity.
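Equation (25) can be evaluated directly for a given velocity vector, as in the following sketch. The function name and the sample velocities are hypothetical, and the zero-velocity branch is an added guard against division by zero.

```python
import numpy as np


def dispersion_tensor(V, alpha_L, alpha_T, D_m=0.0, tau=1.0):
    """2 x 2 dispersion tensor of Eq. (25) for a velocity vector V."""
    V = np.asarray(V, dtype=float)
    speed = np.linalg.norm(V)
    identity = np.eye(2)
    if speed == 0.0:
        return D_m * tau * identity      # only molecular diffusion remains
    return (alpha_T * speed * identity
            + (alpha_L - alpha_T) * np.outer(V, V) / speed
            + D_m * tau * identity)


# Flow along x: the tensor is diagonal with entries alpha_L|V| and alpha_T|V|
D_aligned = dispersion_tensor([1.0, 0.0], alpha_L=5.0, alpha_T=0.5)

# Oblique flow with unit speed: same principal values, rotated axes
D_oblique = dispersion_tensor([0.6, 0.8], alpha_L=5.0, alpha_T=0.5)
principal_values = np.linalg.eigvalsh(D_oblique)
```

The principal values depend only on the speed and the two dispersivities, while the flow direction sets the orientation of the principal axes.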

3.2 Settings of Two Cases

On the basis of reviews of groundwater contamination events caused by water-soluble pollutants, the contamination extent generally ranges from scales of 100 m to 1 km (e.g., for over 2000 sites in California, the median plume length was 270 m for 1,4-dioxane, 115 m for 1,1,1-trichloroethane, 95 m for trichloroethene, and 123 m for 1,1-dichloroethene; Adamson et al. 2014). At such a scale, contaminated water can be pumped from wells set at several locations. To simplify the present experiment, only one pumping well was set at (x, y) = (25, 0) m in a model domain of 100 m along the x-axis (the flow direction) × 50 m along the y-axis (Fig. 2).

Fig. 2 True value distributions of (a) the log-transmissivity and (b) the initial contaminant plume for cases 1 (left) and 2 (right)

Two cases of transmissivity fields with different degrees of heterogeneity were prepared by referring to the experimental model of Lee and Kitanidis (2014): case 1 had a smooth spatial change in the transmissivity and case 2 had a highly heterogeneous field with local changes in the transmissivity. The mean log-transmissivity (m²/d) of 2.4 was the same for both cases; this corresponds to the product of the assigned aquifer thickness, 10 m, with a typical hydraulic conductivity for porous sand, \(10^{-5}\) m/s (Zanini and Kitanidis 2009). The two cases differ in the spatial correlation range of the field, which is long for case 1 and short for case 2, as shown by the covariance functions in Table 1. The covariance functions were a generalized cubic covariance with a linear drift following Zanini and Kitanidis (2009), which is continuously differentiable and smooth (Kitanidis and Lee 2014), for case 1 and an isotropic exponential covariance with a constant drift for case 2. Constant-head boundaries were set at x = 0 (inflow) and 100 m (outflow) with a head difference of 0.2 m, and impermeable boundary conditions were set at both y edges (y = ± 25 m) (Fig. 3). The contaminant concentration at x = 0 m and the dispersive flux at x = 100 m were both zero. The longitudinal and transverse dispersivities were defined as 5.0 and 0.5, respectively, considering that \(\alpha_{L} \sim 0.1L_{p}\) (Lallemand-Barres and Peaudecerf 1978; Pickens and Grisak 1981; Spitz and Moreno 1996) or \(\alpha_{L} = 0.83\left[ {\log_{10} \left( {L_{p} } \right)} \right]^{2.414}\), where Lp is the plume length from the source [m] (Xu and Eckstein 1995) and αT is approximately 0.1αL (Gelhar et al. 1992; Wiedemeier et al. 1999). Rf and λf were not considered, and τ was set to 1.
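The two empirical rules quoted above for the longitudinal dispersivity are easy to compare numerically. In the sketch below, the plume length of 50 m is an illustrative value chosen for this example and is not a value stated in the text; the function names are hypothetical.

```python
import math


def alpha_L_linear(L_p):
    """Rule of thumb alpha_L ~ 0.1 L_p (L_p in metres)."""
    return 0.1 * L_p


def alpha_L_xu_eckstein(L_p):
    """alpha_L = 0.83 [log10(L_p)]^2.414 (Xu and Eckstein 1995), L_p in metres."""
    if L_p <= 1.0:
        raise ValueError("this sketch evaluates the formula only for L_p > 1 m")
    return 0.83 * math.log10(L_p) ** 2.414


def alpha_T_from_alpha_L(alpha_L):
    """Transverse dispersivity taken as roughly one tenth of alpha_L."""
    return 0.1 * alpha_L


L_p = 50.0                                   # hypothetical plume length [m]
a_lin = alpha_L_linear(L_p)
a_xe = alpha_L_xu_eckstein(L_p)
a_T = alpha_T_from_alpha_L(a_lin)
print(f"linear rule: {a_lin:.2f} m, Xu-Eckstein: {a_xe:.2f} m, transverse: {a_T:.2f} m")
```

For a plume of this assumed length the two rules give values of the same order, with the logarithmic formula being the smaller of the two.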

Table 1 Parameter settings for the two cases with smooth and heterogeneous transmissivity fields (cases 1 and 2, respectively)

Fig. 3 Setting of the boundary conditions and well locations in the model domain. Open circles and asterisks indicate the monitoring wells of the hydraulic head (all 35 wells) and the contamination (18 wells), respectively. The cross mark indicates the location of the pumping well and contaminant source. The meanings of the circles and the cross mark are the same in Figs. 5, 6, 10, 11, and 13

The contaminant plume distribution originated from a known source at (x, y) = (25, 0) m. The release of the contaminant started at t = −300 days (case 1) and −150 days (case 2) relative to the initial measurement time (t = 0) and ended at t = 0. The source intensity was assumed to increase linearly from 0 to 1. Hydraulic heads were measured at 35 monitoring wells under steady state for one pumping well at the source location (Fig. 4), and contaminant concentrations were measured monthly over 1 year (t = 0 to 1 year) at 18 monitoring wells located uniformly on the downstream side. In the calculation, the mean travel time was used instead of the transient concentration data, as suggested for tracer data (Harvey and Gorelick 1995; Ezzedine and Rubin 1996; Cirpka and Kitanidis 2000; Lee and Kitanidis 2014)

$$ \begin{array}{*{20}c} {\overline{{t_{{{\varvec{x}}_{i} }} }} = \frac{{\mathop \sum \nolimits_{{t = t_{0} }}^{{t_{end} }} tz\left( {{\varvec{x}}_{i} , t} \right)\Delta t}}{{\mathop \sum \nolimits_{{t = t_{0} }}^{{t_{end} }} z\left( {{\varvec{x}}_{i} , t} \right)\Delta t}},} \\ \end{array} $$
(26)

where \(\overline{{t_{{{\varvec{x}}_{i} }} }}\) is the mean travel time at position xi; \(t = (t_{0} , \ldots ,t_{end} )\) is the measurement time; and Δt is the measurement interval.
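Equation (26) is a concentration-weighted average of the measurement times. The sketch below evaluates it for a synthetic concentration record; the 30-day sampling grid and the bell-shaped record are hypothetical stand-ins for the monthly well data, and the function name is an assumption.

```python
import numpy as np


def mean_travel_time(t, z, dt=None):
    """Concentration-weighted mean time of Eq. (26) at one well."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    if dt is None:
        dt = t[1] - t[0]                 # uniform measurement interval assumed
    total = np.sum(z * dt)
    if total == 0.0:
        raise ValueError("no contaminant was detected at this well")
    return np.sum(t * z * dt) / total


# Synthetic record: 13 samples at 30-day spacing, peak arriving at day 180
t = np.arange(0.0, 361.0, 30.0)
z = np.exp(-0.5 * ((t - 180.0) / 60.0) ** 2)
t_mean = mean_travel_time(t, z)
```

Because the synthetic record is symmetric about its peak on a symmetric time grid, the mean travel time coincides with the peak time. With a uniform interval, Δt cancels between numerator and denominator.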

Fig. 4 Hydraulic head distribution under steady state when pumping at one well for cases 1 (left) and 2 (right). The location of the pumping well is shown in Fig. 3

Using the 35 head and 18 travel time data, the log-transmissivities at the 5,000 (100 × 50) cells at intervals of 1 m along the x- and y-axes were estimated. Assuming that the transmissivities at the 35 wells were known, the unknown structural parameters θr were estimated to be \(2.0 \times 10^{-5}\) for case 1 and 12.5 for case 2. In the contaminant plume estimation, the optimal values of both σR and θs were determined simultaneously. The standard deviations of the measurement errors of the head and concentration (or the mean travel time as mentioned above) were set to 0.05 m (approximately 5% of the maximum head change as a result of pumping) and 10%, respectively, following Lee and Kitanidis (2014). A Gaussian random error with zero mean and a corresponding standard deviation was added to all of the measurement data. The unknown release histories were recovered at ten-day intervals over the 1,350 days prior to the start of measurement, which is sufficiently long to express the Jacobian matrix for the contaminant plume evolution from the source to the model boundary.

3.3 Calculation Execution Conditions

The forward simulation of the groundwater flow and transport was executed using 3D-SEEP (Kimura and Muraoka 1986), based on the three-dimensional Galerkin finite element method. Singular value decompositions for the low-rank approximations were computed in parallel using the ScaLAPACK package (Blackford et al. 1997). The linear systems of Eqs. (6) and (15) were solved using the generalized minimal residual method with a criterion for the relative residual error of \(\le 1 \times 10^{-8}\). A PC with an Intel Core i9-11900K (3.50 GHz) CPU and 64 GB of memory was used for the numerical experiments.

For both cases, the initial transmissivity field was set to be uniform with a log mean of −9.0 m²/s. Following Lee and Kitanidis (2014), the rank of the low-rank approximation of Q was set to K = 96, for which the relative error of the approximation \(\lambda_{K + 1} /\lambda_{1}\) was \(3.1 \times 10^{-4}\)% for case 1 and 1.2% for case 2. Only for the joint inversion of case 2, which is a strongly nonlinear problem, was K changed to 350 with \(\lambda_{K + 1} /\lambda_{1} = 0.18\%\). To ensure the monotonic convergence of the nonlinear transmissivity estimation problem [Eq. (13)], the optimal solution was identified using a line search (Zanini and Kitanidis 2009)

$$ \hat{\boldsymbol{r}} = \boldsymbol{r}_{i} \delta_{ls} + \boldsymbol{r}_{i + 1} \left( 1 - \delta_{ls} \right), $$
(27)

where \(\boldsymbol{r}_{i}\) is the previous estimate, \(\boldsymbol{r}_{i + 1}\) is the updated estimate found using the Gauss–Newton procedure [Eq. (17)], and \(\delta_{ls}\) is a scalar. The range of \(\delta_{ls}\) was set to \(-0.1 \le \delta_{ls} \le 1.1\) following Zanini and Kitanidis (2009). Finally, the calculations of the transmissivity estimation converged within 18 iterations for all cases with εr = 5 × 10⁻⁶.
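The blending step of Eq. (27) can be illustrated with the following sketch. The way the optimal scalar is searched is an assumption made here (a uniform grid scan over the admissible range), and the quadratic objective and the two estimates are hypothetical stand-ins for the actual inversion objective and Gauss–Newton iterates.

```python
import numpy as np


def line_search(r_prev, r_new, objective, n_grid=121):
    """Scan delta in [-0.1, 1.1] and return the blend of Eq. (27)
    with the lowest objective value (grid scan: an assumed strategy)."""
    deltas = np.linspace(-0.1, 1.1, n_grid)
    candidates = [r_prev * d + r_new * (1.0 - d) for d in deltas]
    values = [objective(r) for r in candidates]
    best = int(np.argmin(values))
    return candidates[best], deltas[best]


# Hypothetical quadratic objective whose minimizer lies between the two estimates
target = np.array([1.0, 2.0])


def objective(r):
    return float(np.sum((r - target) ** 2))


r_prev = np.array([0.0, 0.0])
r_new = np.array([2.0, 4.0])      # a Gauss-Newton update that overshoots the minimizer
r_hat, delta = line_search(r_prev, r_new, objective)
```

In this toy setting the full update overshoots, so the search returns an intermediate blend; values of the scalar slightly outside [0, 1] allow a modest extrapolation beyond either estimate.
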

To obtain the final solution, the estimated transmissivity distributions were updated two and three times for cases 1 and 2, respectively. The uncertainty of the contaminant plume distributions was evaluated using the results of 10,000 (Nr × Nz0 = 100 × 100) realizations. Because of rounding errors, the eigenvalues of the posterior covariance included small negative values (approximately −10⁻⁷); all the negative eigenvalues were therefore replaced with 1 × 10⁻¹⁰.
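The eigenvalue repair described above can be sketched as follows: negative eigenvalues of a nominally positive semidefinite covariance are replaced with a small positive floor before Gaussian realizations are drawn. The function name, the 5 × 5 test covariance, and the number of samples are hypothetical; only the floor value of 1 × 10⁻¹⁰ is taken from the text.

```python
import numpy as np


def sample_from_covariance(mean, cov, n_samples, floor=1e-10, seed=0):
    """Draw Gaussian realizations after replacing negative eigenvalues with a floor."""
    eigvals, eigvecs = np.linalg.eigh((cov + cov.T) / 2.0)   # symmetrize before eigh
    eigvals = np.where(eigvals < 0.0, floor, eigvals)        # repair round-off negatives
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, mean.size))
    # Each row is mean + V * sqrt(Lambda) * z_i, written in row-vector form
    return mean + (z * np.sqrt(eigvals)) @ eigvecs.T


# Hypothetical covariance with a small negative eigenvalue injected on purpose
U, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((5, 5)))
cov = U @ np.diag([2.0, 1.0, 0.5, 0.0, -1e-7]) @ U.T
samples = sample_from_covariance(np.zeros(5), cov, n_samples=100)
```

Without the floor, the square root of a negative eigenvalue would produce NaN values in every realization.
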

4 Results

4.1 Hydraulic Transmissivity

The best estimates and estimation variances of the log-transmissivity distributions are shown in Figs. 5 and 6, respectively. Even for the results using only the head data, sufficient accuracy of the best estimates can be confirmed by the near agreement between the simulated and measured heads having small root mean square errors (RMSE) of 0.047 (case 1) and 0.054 (case 2) (Fig. 7a). However, because of the measurement error, the results are spatially much smoother than the true fields in both cases. In particular, large underestimates occurred at the relatively high transmissivity portions; these are continuously distributed from the upstream to the downstream (case 1) and are heterogeneously distributed on the downstream side (case 2). This smoothing effect was remarkable in case 2 with the highly heterogeneous field; furthermore, the estimation uncertainty significantly increased with distance from the pumping well.

Fig. 5 True (top) and best estimates of the log-transmissivity distributions for cases 1 and 2 using only the head data (middle) and using both the head and concentration data (bottom)

Fig. 6 Estimation variances of the log-transmissivity distributions for cases 1 and 2 using only the head data (top) and using both the head and concentration data (bottom)

Fig. 7 Simulated versus measured values for (a) the head and (b) the mean travel time using only the head data (closed circles) and using both the head and concentration data (open circles)

Conversely, through the joint use of the head and concentration data, large improvements in the estimation accuracy were confirmed for both cases; for example, the relatively high transmissivity portions were well reproduced. The measurement data of both the head and the mean travel time were adequately reproduced within their 95% confidence intervals (Fig. 7b). Even though the estimation accuracy was low in the portions outside the contaminant plume transport, such as in the vicinity of the domain boundary, the joint data use clearly decreased the estimation uncertainties along the contaminant plume evolutions for both cases compared with the results using only the head data (Fig. 6).

Table 2 Summary of statistics of the best estimates of the log-transmissivity distributions (m²/d) for cases 1 and 2

The estimation accuracies of the log-transmissivity distributions were assessed in the measurement area, x = [12, 87] (m), y = [−15, 15] (m), using the following descriptive statistics: the coefficient of determination (R²) between the estimated and true values (Fig. 8); the mean (μ0) and variance (ν0) of the log-transmissivity field; and the mean (μ2 = mean \(\left[ \left( \boldsymbol{r}_{\text{true}} - \hat{\boldsymbol{r}} \right)^{2} \right]\)) and variance (ν2 = var \(\left[ \left( \boldsymbol{r}_{\text{true}} - \hat{\boldsymbol{r}} \right)^{2} \right]\)) of the squared differences (Table 2). These statistics demonstrate the large improvement in the estimation accuracy via the joint use of the head and concentration data for both cases (e.g., μ2 decreased by 76% and 50% for cases 1 and 2, respectively).
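The descriptive statistics defined above can be computed as in the following sketch. Two points are assumptions made for this illustration: R² is taken as the squared Pearson correlation between the estimated and true values, and μ0 and ν0 are computed for the estimated field. The synthetic fields are hypothetical and only mimic a strongly smoothed estimate and a less smoothed one.

```python
import numpy as np


def accuracy_stats(r_true, r_est):
    """Return R2, mu0, nu0, mu2, and nu2 for estimated versus true log-transmissivities."""
    sq_diff = (r_true - r_est) ** 2
    r2 = np.corrcoef(r_true, r_est)[0, 1] ** 2     # assumed definition of R2
    return {
        "R2": r2,
        "mu0": r_est.mean(), "nu0": r_est.var(),   # mean and variance of the estimated field
        "mu2": sq_diff.mean(), "nu2": sq_diff.var(),
    }


# Hypothetical fields: a strongly smoothed estimate and a less smoothed one
rng = np.random.default_rng(0)
r_true = rng.normal(-9.0, 1.0, size=500)
r_head = -9.0 + 0.4 * (r_true + 9.0) + rng.normal(0.0, 0.3, size=500)
r_joint = -9.0 + 0.8 * (r_true + 9.0) + rng.normal(0.0, 0.3, size=500)

stats_head = accuracy_stats(r_true, r_head)
stats_joint = accuracy_stats(r_true, r_joint)
stats_perfect = accuracy_stats(r_true, r_true)
```
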

Fig. 8 Estimated versus true log-transmissivities in the measurement area for (a) case 1 and (b) case 2 using only the head data (closed circles) and both the head and concentration data (open circles)

4.2 Contaminant Plume Distribution

Using the optimal values of θs and \(\sigma_{R_{z}}\) (Table 3), the release histories for the best estimates of the log-transmissivity distributions were estimated for the two cases, as shown in Fig. 9. When using both the head and concentration data, the recovered release peaks were slightly closer to the true peaks for both cases than when using only the head data. This is due to the improvement in the estimated transmissivity distribution, in particular the decrease in the underestimation induced when using only the head data. The magnitude of the estimated variance of the release history depends on both the concentration measurement error and the heterogeneity of the transmissivity distribution. As reported by Butera and Tanda (2003), the model mismatch error of the hydraulic field can increase the estimation variance for two main reasons: errors in the flow direction and errors in the magnitude of the dispersion. The large confidence intervals in Fig. 9 are caused by the former factor in case 1 using only the head data and by the latter factor in case 2 because of the highly heterogeneous transmissivity.

Table 3 Optimal values of the structural parameters for the best estimates of the release history in the two cases

Fig. 9 Estimated release histories using only the head data (dashed line) and using both the head and concentration data (solid line) for cases 1 (left) and 2 (right). The red line indicates the true release history

The best estimates of the transmissivity distributions were applied to the prediction of the best estimate of the contaminant plume evolution, as shown in Figs. 10 and 11 for the plume distributions at t = 0, 1, and 2 years for cases 1 and 2. When using only the head data, the contaminant transport velocity was underestimated in both cases. This underestimation resulted in the maximum concentrations at t = 1 and 2 years being 1.4 (case 1) and 1.7 (case 2) times as large as the true concentrations. In addition, the center of the plume distribution differs from the true position. At t = 2 years, the true maximum concentrations are located at 85 m (case 1) and outside the model domain (case 2), whereas the estimated positions when using only the head data are 69 m (case 1) and 66 m (case 2). Conversely, given the joint use of the head and concentration data, the plume evolution is well reproduced in both cases. The advantage of using both forms of data is demonstrated by the significant increases in the mean of the linear correlation coefficients, evaluated at one-month intervals from t = 0 to 2 years, between the true and best estimated concentrations at the 18 measurement points (Fig. 12): the time-averaged correlation coefficient increased from 0.72 to 0.97 for case 1 and from 0.67 to 0.97 for case 2.
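The time-averaged correlation coefficient used above can be sketched as follows: a linear (Pearson) correlation coefficient is computed across the measurement points at each output time and then averaged over time. The array layout (25 monthly snapshots at 18 points) and the synthetic concentrations are hypothetical and are not the data of this study.

```python
import numpy as np


def time_averaged_correlation(c_true, c_est):
    """c_true, c_est: arrays of shape (n_times, n_points).

    Returns the correlation coefficient at each time and their mean.
    """
    r_t = np.array([np.corrcoef(ct, ce)[0, 1] for ct, ce in zip(c_true, c_est)])
    return r_t, r_t.mean()


# Hypothetical concentrations: 25 monthly snapshots (t = 0 to 24 months), 18 points
rng = np.random.default_rng(0)
c_true = rng.uniform(0.0, 1.0, size=(25, 18))
c_est = c_true + rng.normal(0.0, 0.01, size=(25, 18))   # small estimation error

r_t, r_mean = time_averaged_correlation(c_true, c_est)
```
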

Fig. 10 True (left) and best estimates of the contaminant plume distributions at t = 0 (the initial measurement time), 1, and 2 years later for case 1, obtained using only the head data (middle) and using both the head and concentration data (right)

Fig. 11 True (left) and best estimates of the contaminant plume distributions at t = 0 (the initial measurement time), 1, and 2 years later for case 2, obtained using only the head data (middle) and using both the head and concentration data (right)

Fig. 12 Time evolution of the correlation coefficients between the true and best estimated concentrations at the measurement points for (a) case 1 and (b) case 2 obtained using only the head data (open circles) and using both the head and concentration data (closed circles)

The uncertainties in the contaminant plume transport were quantified for the results obtained through the joint data use (Fig. 13). The resultant uncertainties were small compared with the best estimate values in Figs. 9 and 10. At t = 0, the uncertainties were relatively high as a result of the estimated variance of the release histories (Fig. 9); this was more conspicuous in case 2. However, these uncertainties decreased with time.

Fig. 13 Standard deviations of the contaminant plume distributions at t = 0, 1, and 2 years later for cases 1 (left) and 2 (right) using both the head and concentration data

5 Discussion

The effectiveness and high accuracy of the joint clarification of the initial contaminant plume and transmissivity distributions using both the head and concentration data were demonstrated for the prediction of the contaminant plume evolution. The proposed GA method is applicable to any water-soluble contaminant with or without retardation and/or radioactive or biochemical decay. However, the flow and transport simulation was implemented only under steady-state flow conditions. This assumption is not satisfied in cases that require the consideration of unsteady flow, typically caused by periodic pumping. The next step is to incorporate unsteady flow in both unconfined and confined aquifers.

As mentioned above, the two main novel points of this study that differ from previous studies are the non-use of tracer data and the consideration of the uncertainty of the initial contaminant plume distribution. Previous studies targeted a smaller model domain (40 m × 20 m) and a smaller head measurement error (5.0 × 10⁻⁴ m) than those used in this study; they demonstrated that the joint use of the head and tracer data with a defined anisotropic exponential covariance of the transmissivity improves the estimation accuracy of the log-transmissivity inside the tracer paths (Cirpka and Kitanidis 2000; Lee and Kitanidis 2014). For a similar exponential covariance model (case 2), the joint use of the head and tracer data was demonstrated to improve the estimation accuracy based on the RMSE (\(\sqrt {\mu_{2} }\)) of the log-transmissivity in the entire domain by 10% compared with that obtained using only the head data (Cirpka and Kitanidis 2000). Although this study did not use tracer data, the RMSE in the measurement area was improved by 29% via the joint use of the head and concentration data. This high accuracy contributes to accurately reproducing the contaminant plume evolution even if the release history is unknown. Another noteworthy advantage of the present method is its capability to evaluate the uncertainty of the contaminant plume evolution, considering the uncertainties of both the initial contaminant plume and the transmissivity distributions. With this feature, the estimation uncertainty of the contaminant plume evolution was found to be small relative to the best estimate values, even for the highly heterogeneous transmissivity field (case 2).
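As a consistency check, the 29% improvement in the RMSE agrees with the 50% decrease in μ2 reported for case 2 in Sect. 4.1, given that the RMSE is defined as \(\sqrt {\mu_{2} }\). The subscripts "joint" and "head" are labels introduced here for the two data settings:

$$ \frac{\mathrm{RMSE}_{\text{joint}}}{\mathrm{RMSE}_{\text{head}}} = \sqrt{\frac{\mu_{2,\text{joint}}}{\mu_{2,\text{head}}}} = \sqrt{1 - 0.50} \approx 0.71, \qquad 1 - 0.71 = 0.29. $$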

As an extension of the proposed method to practical applications, the following three points need to be considered. The first point is the need to estimate the three-dimensional permeability (i.e., the hydraulic conductivity). Although this extension is possibly straightforward given the proposed method, a problem is the large cost of acquiring a sufficient amount of multi-depth head and contaminant concentration data for the three-dimensional estimation. Therefore, a suitable amount and location of the measurement data should be specified depending on the hydrogeological features of the target area. The second point is the reduction of the smoothing effect that is inherent in spatial estimation by geostatistical methods. As shown in this study, even though the estimated transmissivity field is spatially smoother than the true field, the contaminant plume evolution can be reproduced well because the contaminant plume spreads over time primarily as a result of mechanical dispersion. However, reducing the smoothing effect is indispensable when a hotspot-shaped concentration anomaly much higher than the surroundings needs to be reproduced. This reduction may be possible by applying geostatistical simulations to conditional realizations, typically via sequential Gaussian simulations (e.g., Deutsch and Journel 1998), turning band simulations (e.g., de Sá et al. 2021a, b), and the incorporation of discontinuous geological structures such as lithological contacts and unconformities (Fienen et al. 2008, 2009; Koike et al. 2022) and fracture/fault distribution (Zha et al. 2017). The third point is to incorporate constraints on the release history other than non-negativity, for example, an upper limit on the concentration imposed by the contaminant solubility. This can be achieved via Gibbs sampling (Michalak 2008), a representative Markov chain Monte Carlo method whose effectiveness for contaminant plume estimation was demonstrated by Takai et al. (2022) using a set of field data.

6 Conclusions

This study developed a geostatistical method to achieve accurate estimation of contaminant plume evolutions via a joint clarification of the contaminant plume and hydraulic transmissivity distributions. One of the novelties of this method is the use of the contaminant concentration data in the consideration of the uncertainty of a contaminant plume distribution originating from an unknown release history. To verify the effectiveness and accuracy of the proposed method, two transmissivity fields with different spatial patterns were prepared: a high-contrast smooth field (case 1) and a highly heterogeneous field (case 2). The main obtained results are summarized as follows.

(1) Even though the contaminant plume distribution was unknown, a higher estimation accuracy of the hydraulic transmissivity distribution was achieved through the joint use of the head and concentration data than when using only the head data. For case 2, the estimation accuracy was improved similarly to that in a previous study using tracer data (Cirpka and Kitanidis 2000).

(2) Using both the head and concentration data, the release peaks were recovered more accurately than when using only the head data. This superiority was due to the improvement of the estimated transmissivity distribution by decreasing the smoothing effect.

(3) A large difference between the results with and without the concentration data, in addition to the head data, was highlighted in the predicted contaminant plume evolution. Using only the head data, the velocity of the estimated plume transport was almost half the true value. Conversely, using both the head and concentration data, the plume evolutions were well predicted, with a high time-averaged correlation coefficient of 0.97 between the true and predicted concentrations in both cases. Furthermore, the uncertainties of the predicted plume distributions based on the conditional realizations of the initial plume and transmissivity distributions were sufficiently smaller than the magnitudes of the best estimates.

Consequently, the effectiveness and accuracy of the proposed approach were demonstrated even when the initial contaminant plume distribution is uncertain. Any forward modeling method for the groundwater flow and the contaminant transport can be incorporated into the method. Accurate predictions of the contaminant plume transport are helpful for planning remediation effectively, in particular when choosing the number and location of pumping wells. Because the applicability was demonstrated only via numerical experiments in this study, our next step will be a practical application to actual contaminated fields using multi-depth head and concentration data, targeting three-dimensional space and considering additional uncertainties such as those in the boundary conditions of the flow and transport model.