Introduction

Protecting coastal aquifers requires not only a good understanding of their dynamics, but also a detailed knowledge of the variability of their parameters. Seawater intrusion (SWI) is especially sensitive to the sea-aquifer connection, usually associated with the presence of preferential flow paths. Management of coastal aquifers and design of protection and correction actions require identification of such paths. These goals demand that modeling takes full advantage of collected data, which can only be achieved in an inverse modeling framework (e.g., Poeter and Hill 1997).

Coastal aquifers would appear to be ideally suited to inversion, in the sense that highly informative and relatively easy to collect data are usually available. Aquifer response to sea level fluctuations (caused by tides and wind or barometric fluctuations) provides a range of aquifer-scale hydraulic data that cannot be matched by inland aquifers. Pollutants usually affect a small portion of inland aquifers, whereas salinity transport may occur along the whole coastline, bringing in information about large-scale properties. Moreover, salinity should be relatively easy to monitor by means of geophysical methods, so that extensive data can be collected at a moderate cost.

The concurrence of need and availability of informative data should lead to a routine application of inverse modeling techniques to coastal aquifers. Paradoxically, reports of fully fledged inversion in the literature are extremely scarce. It can be contended that this scarcity reflects conceptual and computational difficulties.

Conceptual difficulties start from the fact that SWI is an essentially three-dimensional (3D) problem and is very sensitive to the heterogeneity in hydraulic conductivity and to the presence of preferential flow paths (e.g., paleochannels, Mulligan et al. 2007). It is also highly sensitive to aquifer bathymetry (Abarca et al. 2007a). Moreover, head measurements are affected by density (Post et al. 2007). Salinity measurements in open wells may not reflect resident aquifer concentrations but flux-averaged concentrations. These difficulties are shared by all transport problems, but are particularly severe in SWI, where vertical fluxes are likely to occur within the borehole. Computational difficulties include the need to solve two coupled non-linear equations. Doing so in a 3D domain, while solving the inverse problem, requires a huge computational effort.

These difficulties often lead to questioning the wisdom of inversion. The opposite can also be contended: modeling difficulties highlight the need for inversion. Ironically, but not surprisingly, the literature on the inverse problem for SWI is scant. A number of reviews are available for conventional groundwater model inversion (Yeh 1986; Carrera 1987; McLaughlin and Townley 1996; Poeter and Hill 1997; de Marsily et al. 1999; Carrera et al. 2005), but none of them devotes any attention to SWI. The objective of this paper is to fill such a gap by analyzing the conceptual and computational aspects of the inverse problem that are specific to SWI modeling.

Basic inversion concepts

The basic issues of the groundwater inverse problem are fairly well established. A summary of them is included here for the sake of completeness and to define the terms that will be used later.

Problem statement: parameterization

An inverse problem can be stated as a process of finding the set of parameters that leads to an optimal fit between computed and measured values of aquifer state variables. These include both direct state variables such as head or concentration, and derived state variables such as electrical conductivity or flow rates. The term “parameter” is more difficult to define. In the context of inversion, parameters are a set of unknown scalars that allow for the definition, without ambiguity, of all aquifer properties (hydraulic conductivity, storativity, recharge, boundary heads and fluxes, porosity, dispersivity, aquifer geometry) at all points in space and, when applicable, time.

The process of expressing all aquifer properties in terms of parameters is termed parameterization. Many parameterization schemes can be used. The most popular ones are zonation, where parameters are associated with properties within a portion (zone) of the aquifer, and pilot points, where properties are obtained by interpolation between parameter values associated with those points (see McLaughlin and Townley 1996, or Alcolea et al. 2006, for discussions on this issue). Strictly speaking, parameterization is not required for the pure geostatistically based formulations of the inverse problem (e.g., Kitanidis and Vomvoris 1983; Rubin and Dagan 1987; Hernández et al. 2006). However, these formulations would be unaffordably expensive for SWI and will not be discussed here.
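For illustration only, the sketch below shows the mechanics of a pilot-point parameterization: a handful of log-conductivity values (the inversion parameters) are spread onto the model grid by interpolation. Inverse-distance weighting is used here as a stand-in for the kriging normally employed with pilot points (e.g., Alcolea et al. 2006); all names and numbers are hypothetical.

```python
import numpy as np

def pilot_point_field(pp_xy, pp_logk, grid_xy, power=2.0):
    """Interpolate log-K from pilot points to grid cells.

    Inverse-distance weighting is used only as a simple stand-in for the
    kriging interpolation normally employed with pilot points.
    """
    d = np.linalg.norm(grid_xy[:, None, :] - pp_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)              # avoid division by zero at pilot points
    w = 1.0 / d**power
    w /= w.sum(axis=1, keepdims=True)     # normalize weights per grid cell
    return w @ pp_logk                    # log-K at every grid cell

# Example: 3 pilot points define log10(K) over a small rectangular grid
pp_xy = np.array([[0.0, 0.0], [500.0, 0.0], [250.0, 400.0]])
pp_logk = np.array([-3.0, -4.5, -3.8])               # the inversion parameters
gx, gy = np.meshgrid(np.linspace(0, 500, 6), np.linspace(0, 400, 5))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
logk_field = pilot_point_field(pp_xy, pp_logk, grid_xy)
```

In a real inversion, only the pilot-point values change between iterations; the interpolation operator stays fixed, so densely gridded properties remain controlled by a modest number of parameters.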

Experience dictates that parameterization may be the most difficult conceptual step of inverse modelling. On one hand, it is desirable to keep the number of parameters as small as possible to reduce convergence difficulties and CPU time. On the other hand, it is clear that many parameters may be required for a proper identification of spatial variability patterns. As numerical methods and computer speed advance, there is a clear trend towards densely parameterized models (Alcolea et al. 2006; Hunt et al. 2007).

Objective function

Model calibration is usually performed manually by trial and error. However, the process is tedious and often incomplete (see, e.g., Carrera and Neuman 1986a; Poeter and Hill 1997). Automatic solution overcomes these difficulties. Automatic calibration is normally formulated as the minimization of an objective function. An alternative to this approach is the use of direct methods, which consist of substituting state variables, assumed to be known everywhere, into the governing equations and solving these for aquifer properties (e.g., Nelson 1960; Giudici et al. 2000). However, this approach does not appear feasible for coupled non-linear problems and will not be discussed here.

While a number of objective functions are feasible, the vast majority of authors use variations of

$$ F = \sum\limits_i \lambda_i F_i $$
(1)

where subindex i identifies the type of data (e.g., i = h for head, i = c for concentration, i = p for parameters, etc.), λ_i is the relative weight factor and F_i measures the fit between measurements and computations of type i data (including model parameters, that is, i = Y for log-K (hydraulic conductivity), i = r for recharge, etc.). A weighted sum of squared errors is usually adopted for F_i. For a generic type of data u (state variable or model parameter):

$$ F_u = \left( \mathbf{u}(\mathbf{p}) - \mathbf{u}^* \right)^t \mathbf{V}_u^{-1} \left( \mathbf{u}(\mathbf{p}) - \mathbf{u}^* \right) $$
(2)

where u* is the vector of measurements, u(p) is the vector of values of u computed with parameters p at the same locations and times as the measurements, and V_u is the covariance matrix of the u residuals, that is, of (u(p) − u*), which includes both measurement and model errors. This covariance matrix is never known accurately. Therefore, following Neuman and Yakowitz (1979), it is common to write it as C_u = τ_u V_u, where C_u is an improved estimate of the covariance matrix and τ_u is an unknown scalar. Note that, when u represents a given type of parameter (e.g., log-K), then u(p) is itself the vector of parameters of that type and u* is the vector of their prior estimates.

The rationale behind the objective function of Eq. (1) is diverse. It was originally proposed by Neuman (1973) with two terms (F_h + λ_p F_p), in a multiobjective optimization context, to obtain a good fit of heads while ensuring plausible parameters (i.e., computed parameters p close to their prior estimates, p*). Stated like this, F_p plays the role of a regularization term that stabilizes the solution (see the following section Uniqueness, stability, identifiability). However, this term appears naturally in statistically based objective functions such as the Bayesian formulation (Neuman and Yakowitz 1979) or maximum likelihood estimation (Carrera and Neuman 1986a) (see also Emsellem and de Marsily 1971). These approaches lead to sum-of-squared-errors objective functions, such as Eq. (2), when residuals are multinormal. They also provide optimal means to estimate the weight factors λ_i (e.g., Kitanidis and Vomvoris 1983; Medina and Carrera 2003). Therefore, the objective function of Eq. (1) is indeed optimal when residuals are multinormal. Moreover, minimization is easy when the dependence between observations and parameters is linear. Both requirements (normality and linearity) may be approached by appropriate transformation of the variables. For example, hydraulic conductivity is known to be log-normally distributed (Davis 1969). Therefore, the objective function for hydraulic conductivity K should be written in terms of Y = log K. As it turns out, this transformation may also help in improving the quadratic behaviour of F (Dagan 1985; Carrera and Neuman 1986b). A careful analysis of concentration errors prompted Knopman and Voss (1989) to also log-transform concentration.

The nature of the terms F_i, λ_i and V_i should be understood in a somewhat lax manner (the effect of varying λ_i is shown in Fig. 1). Several terms may be used for data of the same type that the modeller wishes to treat separately. For example, Rötting et al. (2006) or Alcolea et al. (2007, 2009) separate terms representing natural heads, typically independent at different wells, from head responses to pumping tests or river or sea level fluctuations, which are often autocorrelated in time, thus leading to a non-diagonal V_h (e.g., Carrera and Neuman 1986c). By the same token, a careful analysis of model errors is needed to properly define the error structure, which may be achieved either formally (Refsgaard et al. 2006) or subjectively (Sanz and Voss 2006). In short, there is a lot of room in the objective function for modellers to introduce their conceptual views and subjective judgement.
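To make Eqs. (1) and (2) concrete, the following minimal sketch evaluates a three-term objective function (heads, concentrations and prior log-K estimates) with diagonal covariance blocks. All vectors, variances and weights below are hypothetical and serve only to show where the λ_i and V_i enter.

```python
import numpy as np

def F_u(u_sim, u_obs, V_u):
    """Generic weighted sum of squared errors, Eq. (2)."""
    r = u_sim - u_obs
    return float(r @ np.linalg.solve(V_u, r))   # r^t V_u^{-1} r

# Hypothetical data: heads, concentrations and prior log-K estimates
h_obs, h_sim = np.array([2.1, 1.7, 0.9]), np.array([2.0, 1.8, 1.1])
c_obs, c_sim = np.array([0.3, 0.8]),      np.array([0.35, 0.70])
y_prior, y_est = np.array([-3.0, -4.0]),  np.array([-3.2, -3.9])   # log10 K

V_h = np.diag([0.05**2] * 3)      # head error variances
V_c = np.diag([0.05**2] * 2)      # concentration error variances
V_y = np.diag([0.5**2] * 2)       # prior log-K variances

lam = {"c": 1.0, "y": 0.1}        # relative weights (lambda_i in Eq. 1)
F = (F_u(h_sim, h_obs, V_h)
     + lam["c"] * F_u(c_sim, c_obs, V_c)
     + lam["y"] * F_u(y_est, y_prior, V_y))    # Eq. (1)
```

Raising or lowering λ_y in such a sketch reproduces, in miniature, the tradeoff between model fit and parameter plausibility illustrated in Fig. 1.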

Fig. 1

Tradeoffs between F_h (model fit) and F_p (parameter plausibility) when varying the relative weight factor, shown along with the transmissivity fields obtained by Alcolea et al. (2006)

Minimization algorithm

Minimizing F (Eq. 1) requires an iterative process, unless F is exactly quadratic, which is rarely the case. Numerous minimization methods are available. Discrete optimization methods, which rely solely on the computation of F, are the simplest to implement. Many of them are designed to find the global minimum. Examples include simulated annealing, genetic algorithms (e.g., Rao et al. 2003; Tsai et al. 2003), or the shuffled complex evolution method (Duan et al. 1992). They have been used to solve optimization problems in coastal aquifers (e.g., Benhachmi et al. 2003; Katsifarakis and Petala 2006; Yeh and Bray 2006; He et al. 2007). However, the cost of discrete optimization methods grows exponentially with the number of parameters. Moreover, discrete non-uniqueness is much less of an issue than often purported. Therefore, the focus here will be on continuous methods. Cooley (1985) showed that the most efficient of these are Gauss-Newton methods (the Marquardt method being the favourite). They are used routinely and will be the only ones discussed here. The algorithm proceeds as follows (see Fig. 2):

  1. Step 1.

    Initialization. Set k = 0 and define initial parameters p^0. Solve the direct problem to compute h(p^0) and other derived state variables. Compute F = F(p^0).

  2. Step 2.

    Compute the state variables u^k, the Jacobian J^k = ∂u^k/∂p^k, the first-order approximation to the Hessian, H^k = (J^k)^t V_u^{-1} J^k + λ_p V_p^{-1}, and the gradient g^k = ∂F^k/∂p^k.

  3. Step 3.

    Compute the updating direction d^k from H^k d^k = -2g^k.

  4. Step 4.

    Update the parameters: p^{k+1} = p^k + d^k

  5. Step 5.

    Solve the direct problem for p^{k+1} and compute F^{k+1} = F(p^{k+1}).

  6. Step 6.

    If converged (small ‖g^k‖, small ‖d^k‖, small |F^k - F^{k+1}|, etc.), stop. Otherwise, if F^{k+1} < F^k, set k = k + 1 and go to step 2; if F^{k+1} > F^k, either add a positive definite matrix to H^k (and return to step 3) or perform a line search to find the α that minimizes F(p^k + α d^k).

Fig. 2

Schematic description of a typical optimization procedure for inversion

There are numerous variations for the basic algorithm (see, e.g., Cooley 1985; Doherty 2002; Medina and Carrera 2003), but they will not be examined here.
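For readers who prefer code, the following is a minimal sketch of the loop described in steps 1-6, with Marquardt damping applied when the objective function increases. The `simulate` and `jacobian` functions are assumed to be supplied by the user for a generic model; names and tolerances are illustrative, and factor-of-2 conventions in the gradient and Hessian differ between presentations.

```python
import numpy as np

def gauss_newton_marquardt(simulate, jacobian, p0, u_obs, Vu_inv, p_prior, Vp_inv,
                           lam_p=1.0, max_iter=20, tol=1e-8):
    """Marquardt-stabilized Gauss-Newton loop following steps 1-6 of the text.

    simulate(p) returns model outputs at the observation points and
    jacobian(p) their derivatives with respect to p; both are user supplied.
    """
    def F(p, u):
        r, rp = u - u_obs, p - p_prior
        return float(r @ Vu_inv @ r + lam_p * (rp @ Vp_inv @ rp))

    p = np.asarray(p0, dtype=float)
    u = simulate(p)                                   # step 1: initialization
    Fk = F(p, u)
    mu = 0.0                                          # Marquardt damping parameter
    for _ in range(max_iter):
        J = jacobian(p)                               # step 2: sensitivities
        g = J.T @ Vu_inv @ (u - u_obs) + lam_p * Vp_inv @ (p - p_prior)  # gradient (up to a factor 2)
        H = J.T @ Vu_inv @ J + lam_p * Vp_inv         # first-order Hessian approximation
        accepted = False
        for _ in range(10):
            d = np.linalg.solve(H + mu * np.eye(p.size), -g)   # step 3: updating direction
            p_new = p + d                             # step 4: update parameters
            u_new = simulate(p_new)                   # step 5: solve the direct problem
            F_new = F(p_new, u_new)
            if F_new < Fk:                            # step 6: accept, relax damping
                mu /= 3.0
                accepted = True
                break
            mu = 10.0 * mu if mu > 0.0 else 1e-3      # step 6: increase damping, retry step 3
        if not accepted or abs(Fk - F_new) < tol * max(Fk, 1.0):
            return p_new
        p, u, Fk = p_new, u_new, F_new                # back to step 2
    return p
```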

Sensitivity, uncertainty and worth of data

In a broad sense, sensitivity refers to the dependence of model output on model input. As such, it can be evaluated globally to quantify the overall dependence of model outputs on input parameters (see, e.g., Saltelli et al. 2005). However, in the context of inverse modelling, sensitivity is computed locally: the sensitivity of a state variable u_m with respect to a parameter p_j simply expresses the rate of change of u_m per unit change in p_j at the current value of all parameters. That is:

$$ \left( J_u \right)_{mj} = \frac{\partial u_m}{\partial p_j} $$
(3)

This definition is not very useful for qualitative analysis, because (J_u)_{mj} depends on the relative magnitudes of u_m and p_j. For example, the sensitivity of a concentration expressed in mg/l is 1,000 times larger than the sensitivity of the same concentration expressed in g/l. It is clear that sensitivities need to be scaled. The most natural way to scale sensitivities in an inversion context is to decompose V_u^{-1} as W_u^t W_u and V_p as W_p^t W_p, so that the scaled sensitivity matrix becomes:

$$ \mathbf{S}_u = \mathbf{W}_u \mathbf{J}_u \quad \text{or} \quad \mathbf{SS}_u = \mathbf{W}_u \mathbf{J}_u \mathbf{W}_p $$
(4)

In the case of diagonal V_u and V_p, the components of SS_u are:

$$ \mathrm{ss}_{mj} = \frac{\sigma_{pj}}{\sigma_{um}} \frac{\partial u_m}{\partial p_j} $$
(5)

where σ_pj is the standard deviation of the jth parameter and σ_um is the standard deviation of the mth residual of type u measurements (σ_pj^2 and σ_um^2 are diagonal terms of V_p and V_u, respectively). Given the uncertainty on V_u and V_p (recall the need to find the scaling parameters τ or λ), this may still not be sufficient to properly assess the worth of different types of data. This is why Knopman and Voss (1989) substitute σ_um by a subjective magnitude and σ_pj by p_j (the latter choice is equivalent to assuming p_j log-normally distributed with σ_ln pj = 1).

Analyzing sensitivities allows one to understand how parameters affect results and to gain insight into model behaviour. Sensitivity is also used to evaluate uncertainty, either qualitatively or quantitatively. If ss_mj is large, small variations in p_j lead to large variations in u_m. If the state variable u_m has been measured, then the value of p_j is heavily constrained by the measurement. This is quantified by the covariance matrix of estimated parameters or by Fisher's information matrix. The latter expresses the information that the data contain about the parameters. It can be approximated by:

$$ \mathbf{I}_F = \sum\limits_i \lambda_i \mathbf{J}_i^t \mathbf{V}_i^{-1} \mathbf{J}_i = \sum\limits_i \lambda_i \mathbf{S}_i^t \mathbf{S}_i $$
(6)

I_F^{-1} gives a lower bound of the a posteriori covariance matrix, Σ_p. Σ_p is expected to be much smaller than the a priori covariance matrix V_p because it includes all the information contained in the observations. Several comments should be made about the covariance and Fisher's matrices. First, the covariance matrix of model parameters quantifies the uncertainty of estimated parameters (as measured by their variances and correlation coefficients). It is often stated that high correlations are undesirable. Actually (see Fig. 3), the opposite is true. Uncertainty in a parameter is quantified by its variance (or standard deviation, Fig. 3). A high correlation with another parameter means that the two parameters depend on each other. The correct reading of a high correlation is that something is known about the two parameters jointly (e.g., their ratio, if log-transformed parameters are used, as in Fig. 3), although not about each one separately. Since nothing of the sort is known when the parameters are uncorrelated, one is much better off with a high than with a low correlation.
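A small numerical illustration of Eqs. (4)-(6) may help fix ideas. The Jacobian, error variances and prior variances below are hypothetical; the sketch builds the scaled sensitivities, the information matrix and the resulting (lower-bound) covariance, correlations and eigenvalues.

```python
import numpy as np

# Hypothetical Jacobian of 4 head observations w.r.t. 2 parameters (log-K, recharge)
J_h = np.array([[0.8, 0.1],
                [0.6, 0.3],
                [0.4, 0.5],
                [0.2, 0.6]])
V_h = np.diag([0.05**2] * 4)                 # head error covariance
V_p = np.diag([0.5**2, 0.3**2])              # prior parameter covariance

W_h = np.diag(1.0 / np.sqrt(np.diag(V_h)))   # V_h^{-1} = W_h^t W_h
W_p = np.diag(np.sqrt(np.diag(V_p)))         # V_p = W_p^t W_p
S = W_h @ J_h                                # scaled sensitivities, Eq. (4)
SS = W_h @ J_h @ W_p                         # doubly scaled sensitivities (components of Eq. 5)

I_F = S.T @ S                                # Fisher information, Eq. (6), with lambda_h = 1
Sigma_p = np.linalg.inv(I_F)                 # lower bound of the posterior covariance
sd = np.sqrt(np.diag(Sigma_p))               # parameter standard deviations
corr = Sigma_p / np.outer(sd, sd)            # correlation matrix
eigval, eigvec = np.linalg.eigh(Sigma_p)     # a large eigenvalue flags a poorly identified combination
```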

Fig. 3

Schematic description of the stability problem. Narrow, elongated valleys of the objective function in the parameter space make convergence virtually impossible. The user notices that different initial sets of model parameters (squares) lead to different parameter estimates (circles) with similar values of the objective function. This effect can be characterized with the covariance matrix, which displays large differences between eigenvalues. The uncertainty ellipse (depicted for the central parameter estimate) is oriented along the eigenvectors and its axes are proportional to the square roots of the eigenvalues. In the case of two parameters, this implies large uncertainties (which are still underestimated) for the two parameters and a strong correlation

The second remark is that one needs a careful assessment of the relative weights (λ_i) to assess properly both uncertainty and information (see the statistical approaches of Kitanidis and Vomvoris 1983 or Carrera and Neuman 1986a). In practice, at least in the authors' experience, modellers tend to be optimistic about measurement and model errors (i.e., they tend to assign low V_i). Only after preliminary inversion runs does one become fully aware of model limitations and assign realistic V_i matrices (this is done automatically by the above statistical approaches). Avoiding this step will lead to improper weighting of the different types of data.

A third remark is that the covariance thus computed is too optimistic (Fig. 3). It must be viewed as a lower bound of the uncertainty (it is exactly the lower bound if model output is a linear function of model parameters). An evaluation of the degree of optimism was carried out by Carrera and Glorioso (1991), who showed that it is very problem dependent. Nonlinear confidence intervals can also be computed (e.g., Vecchia and Cooley 1987; Hill 1998), but they are beyond the scope of this section.

A final remark should be made regarding information. As quantified by Fisher's matrix, information is additive (recall Eq. 6). In fact, the information contained in the data can be quantified by different metrics of I_F (e.g., the determinant, the sum of diagonal terms, etc.; see Carrera and Neuman 1986c). A particularly popular metric of the information on model parameters is the cumulative scaled sensitivity (CSS; Knopman and Voss 1989), which is obtained from the diagonal terms of the information matrix (usually dividing by the number of measurements and taking the square root),

$$ \mathrm{css}_j = \sqrt{\frac{1}{N} \sum\limits_{m=1}^{N} \mathrm{ss}_{mj}^2} $$
(7)

where N is the total number of observations. In finely parameterized models, css_j can be mapped to show which parameters can be estimated with a given observation network and which cannot.

Cumulative scaled sensitivities can be used to assess the information content of all measurements about each parameter. However, to evaluate which measurements provide most information about all parameters, the contribution of each measurement to the information matrix should be used. This is obtained by simply adding the diagonal terms of such contribution (L.J. Slooten, IDAEA-CSIC, unpublished data, 2009). That is,

$$ I_m = \sum\limits_{j=1}^{N_p} \mathrm{ss}_{mj}^2 $$
(8)

where N_p is the number of parameters and I_m should be read as the information contained in the mth measurement about all parameters. I_m should be integrated in time in transient problems. I_m can be computed for every node and plotted to identify the areas where measurements are most informative.
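Continuing the hypothetical scaled sensitivity matrix of the earlier sketch, Eqs. (7) and (8) reduce to row and column sums of SS, as shown below.

```python
import numpy as np

def css(SS):
    """Cumulative scaled sensitivity of each parameter, Eq. (7)."""
    N = SS.shape[0]                       # number of observations
    return np.sqrt((SS**2).sum(axis=0) / N)

def obs_information(SS):
    """Information carried by each measurement about all parameters, Eq. (8)."""
    return (SS**2).sum(axis=1)

# Doubly scaled sensitivity matrix SS from the previous sketch
SS = np.array([[8.0, 0.6],
               [6.0, 1.8],
               [4.0, 3.0],
               [2.0, 3.6]])
print(css(SS))               # which parameters the observation network can resolve
print(obs_information(SS))   # which observations are most informative (I_m)
```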

Uniqueness, stability, identifiability

The inverse problem is often said to be ill-posed because its solution may be non-unique or unstable. Non-identifiability occurs when different parameter sets lead to the same solution of the direct problem. Non-uniqueness occurs when different parameter sets satisfy the minimum condition of the objective function of Eq. (1). Instability occurs when small changes in the observations lead to large changes in the estimated parameters. Carrera and Neuman (1986b) discuss these concepts extensively and show that they are closely related. They argue that the most frequent problem is instability. However, its effect (Fig. 3) is identical to that of non-identifiability or non-uniqueness: the solution depends on the initial parameters. The point to stress here is that the presence of this kind of problem can be detected and fixed.

Detection can be achieved by analyzing the covariance matrix of estimated parameters, or the information matrix I_F (Eq. 6). When the problem is restricted to two parameters, instability (or poor identifiability) is associated with a very high correlation, which is why high correlations are viewed as negative. If more than two parameters are involved, poor identifiability is linked to high eigenvalues of the covariance matrix (low eigenvalues of the information matrix). The corresponding eigenvector defines the combination of parameters that cannot be identified (see Fig. 3). Details of the procedure are described by Carrera and Neuman (1986c) and Medina and Carrera (1996).

The impact of these problems can be reduced by several means. The traditional option is regularization, which consists of including F_p terms in the objective function. These terms tend to smooth the solution and keep it close to the prior estimates. The risk is over-smoothing, which may cause a loss of resolution capacity (recall Fig. 1, where a too large λ_p led to a solution without channels). Actually, it is sufficient to increase the weight of the prior estimates only for the parameters associated with large eigenvalues. A second option is to reduce the number of parameters to be estimated. This can be done using subjective judgment, possibly aided by a sensitivity analysis (e.g., fixing the values of the most uncertain parameters). Formal techniques have also been developed, such as singular value decomposition (Chang and Yeh 1976; Hill and Østerby 2003), hybrid parameterization (Tonkin and Doherty 2005) or model reduction (Vermeulen et al. 2006). A third option is to increase the number and types of data or to optimize the observation scheme, by designing it to minimize parameter uncertainty and/or to increase the ability of the data to discriminate among alternative models (Knopman and Voss 1989; Usunoff et al. 1992), as discussed earlier.
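As an illustration of the parameter-reduction idea (in the spirit of, but not reproducing, the singular value decomposition and hybrid schemes cited above), the sketch below splits the parameter space into well- and poorly-constrained directions using the singular values of the scaled sensitivity matrix; the cutoff is an arbitrary assumption.

```python
import numpy as np

def estimable_combinations(SS, rel_cutoff=1e-2):
    """Split parameter space into well- and poorly-constrained directions.

    SS is the scaled sensitivity matrix; directions whose singular value
    falls below rel_cutoff * s_max are candidates for regularization or
    for being fixed at their prior values.
    """
    U, s, Vt = np.linalg.svd(SS, full_matrices=False)
    keep = s >= rel_cutoff * s[0]
    return Vt[keep], Vt[~keep], s      # estimable directions, weak directions, spectrum

SS = np.array([[8.0, 0.6], [6.0, 1.8], [4.0, 3.0], [2.0, 3.6]])
good_dirs, weak_dirs, spectrum = estimable_combinations(SS)
```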

Conceptual aspects

Model simplifications

The methodology outlined in the previous section has never been reported for a full 3D SWI problem in a strict sense (but see Dausman et al. 2009). Probably the closest to a full calibration is the case reported by Bauer et al. (2006a), who used PEST to solve the inverse problem in a 2D vertical cross section of the Okavango Delta, Botswana (where water density is controlled by salinity, although this is not really a SWI problem!). Excessive computer time prevented them from estimating more than four parameters, not to mention going on to calibrate the full 3D problem. Iribar et al. (1997) used head, chloride concentration and flow rate data to estimate 40 transmissivity values. Abarca et al. (2006) and Vázquez-Suñé et al. (2006) used some 100 transmissivity values, plus storativity values, boundary fluxes, porosity, dispersivity and time evolution data of river recharge at the Llobregat Delta (Spain). However, they had to neglect density effects, which they justified because of the small aquifer thickness and elevation gradients. Thus, they settled on a two-layer model. Bray et al. (2007) adopted an intermediate solution. They assumed hydraulic conductivity to be known from abundant point data interpolated by kriging and they calibrated dispersivity against concentration data. Leaving aside the question of whether point measurements of hydraulic conductivity are appropriate (Barlebo et al. 2004, among others, argue the opposite), it is worth noting that only two parameters were estimated.

Automatic calibration is often disregarded because of its excessive CPU time cost (Bauer et al. 2006a; Werner and Gallagher 2006). Sometimes, manual calibration is performed in conjunction with a formal sensitivity analysis (Person et al. 1998; Yakirevich et al. 1998). For example, Momii et al. (2005) used a sharp-interface model to manually calibrate against head, tidal head fluctuations and concentration data on a 2D plane model.

It is worth mentioning the work of Barazzuoli et al. (2008), who calibrated a 3D model using steady-state head to find hydraulic conductivity in each of the four layers of the model and used transient head data to find transient fluxes. Karahanoglu and Doyuran (2003) also calibrated a 2D vertical section in sequential phases (first steady state, then transient).

These efforts are clearly suboptimal. Sequential calibration is to be commended as practical, but it does not take full advantage of the information contained in the data. For example, if hydraulic conductivity is derived from steady-state head data alone, the information contained in transient head or concentration data is lost. Moreover, the solution of each of the sequential problems is more uncertain. Therefore, this type of approach must be viewed as a struggle by modellers to cope with the computational and conceptual difficulties discussed in the following.

Worth of data

Fisher's information matrix (Eq. 6) shows that the worth of an observation in an inverse problem context is determined by two main factors: the sensitivity of the (simulated) observation to the different parameters, and the variance of the associated measurement and model errors. Measurements of different observation types tend to inform about different parameters and to have different sources of error. This has led several authors to investigate which measurement types contain most information, and which measurement locations are optimal.

Flow related measurements (e.g., head) do not contain information about transport parameters in constant density models but they do in variable density ones. Shoemaker (2004) studied the capacity of observations of different types to constrain model parameters by computing scaled sensitivities (Eq. 5) and parameter correlations when using different data sets and different parameters. He found that using only head observations is not enough to identify flow and transport parameters. By combining head with salinity and flow rate observations, the parameters became much better constrained.

Sanz and Voss (2006) applied an analysis of the a posteriori parameter covariance matrix (recall the previous section Uniqueness, stability, identifiability) and of the correlation matrix to the Henry problem (Henry 1964). The solution depends on two dimensionless numbers, each one a function of the classical flow and transport parameters (permeability, diffusion coefficient, freshwater inflow rate, etc.). This dependence can be found from an eigenanalysis of the covariance matrix (see Medina and Carrera 1996, for the procedure) or from a qualitative analysis of the problem. Sanz and Voss (2006) found that head measurements are most informative deep inland, while concentration measurements are most informative around the toe of the seawater wedge. Their work also illustrates the importance of using an appropriate error structure for state variables and relative weighting of different types of data.

As mentioned at the beginning of this section, the worth of data is increased not only by seeking informative measurements, but also by minimizing the variance of measurement and model errors. Regarding the latter, careful scrutiny of the data and of large residuals may help in identifying outliers, a frequent source of trouble during automatic inversion, or deficiencies in the conceptual model. Error filtering and time averaging are especially recommended when long data records are available. This eliminates high-frequency errors and favours Gaussianity.

Use of head data

Using head data for the calibration of density-dependent flow models is much more delicate than for constant-density models (Post et al. 2007). For one thing, head is not a state variable in density-dependent flow; SWI models are solved in terms of either pressure or equivalent freshwater head. Yet head data are often gathered by measuring the water elevation in a well, which is only informative if the density along the piezometer water column is known (Fig. 4). To address this difficulty, one may either measure pressure directly at depth (e.g., Alcolea et al. 2009), which may imply a slight loss of accuracy, or monitor both water elevations and average salinity, which is costly.
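The standard conversion from a measured water level (point-water head) to equivalent freshwater head is sketched below; it assumes that the average density of the borehole water column is known, and the numbers are illustrative only (see Post et al. 2007 for the underlying relations).

```python
def freshwater_head(h_point, z_screen, rho_column, rho_f=1000.0):
    """Convert a water level measured in a piezometer (point-water head)
    to equivalent freshwater head at the screen elevation.

    h_point     water level in the piezometer (m, same datum as z_screen)
    z_screen    elevation of the screen midpoint (m)
    rho_column  average density of the water column in the piezometer (kg/m3)
    """
    return z_screen + (rho_column / rho_f) * (h_point - z_screen)

# Example: screen at -30 m, water level at +0.5 m, borehole full of seawater
h_f = freshwater_head(h_point=0.5, z_screen=-30.0, rho_column=1025.0)
# h_f is about 1.26 m; treating the column as freshwater would bias the head by ~0.8 m
```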

Fig. 4

Schematic description of potential problems with head measurements in the seawater intrusion problem displayed in (a). Measurements (b) depend on whether the piezometer is full of saltwater (A), full of freshwater (B), or open (C)

The situation is much more complex if the borehole is open. On the one hand, measured head is an average along the vertical weighted by the hydraulic conductivity. While this problem may affect all types of aquifers, it is relatively easy to deal with in constant density flow models (see, e.g., Martínez-Landa and Carrera 2006). On the other hand, a vertical flux should be expected as a result of the vertical pressure gradient created by the influence of the sea. This effect can be explicitly included in the inversion process. Two alternatives are available. First, the borehole can be explicitly modeled by using a string of one-dimensional elements connected to aquifer nodes. The conductances of these connections depend on the hydraulic conductivity of the node. Density-dependent flow and transport is then solved in the expanded grid, which includes both aquifer and borehole nodes. This option may be expensive because the short-circuit effect of the borehole causes large head and concentration gradients and, if tides are simulated, fast fluctuations. Therefore, this option is only recommended for highly detailed small-scale models. The second alternative consists of assuming that aquifer head and concentration will not be significantly affected by the borehole. Therefore, the model is solved without explicitly simulating the short-circuit effect. This effect needs to be taken into account only for computing head (or pressure) to be compared to measurements.

An additional source of uncertainty may be caused by sea level fluctuations. As discussed later in the section On the use of tidal data, high-frequency fluctuations (e.g., tides) are dampened close to the coast in unconfined aquifers and should not be a problem. However, in confined aquifers, the tidal signal may affect measurements deep inland and becomes an additional source of noise if not monitored properly. Addressing this issue requires averaging head over a long period, which is costly but may be useful.

In summary, head errors may be large. As described earlier, addressing them in detail may be costly. If the measurement process is not modeled explicitly, errors should be acknowledged in the head covariance matrix V_h (recall the previous section Objective function). It must be added that these errors, especially the ones caused by salinity within the borehole, are likely to be highly correlated, which requires a non-diagonal V_h. A simple way to account for auto-correlated noise is described in detail by Neuman and Carrera (1985).

Use of concentration data

The use of concentration data is not as simple as it might look. The most immediate difficulty is caused by saltwater circulation within the well (Fig. 4b, case C). Circulation causes measured salinity profiles to be much sharper than the actual width of the mixing zone (Tellam et al. 1986). Using pore water samples, as Tellam et al. (1986) did, can only be justified for a research project. Alternatives such as profiles deduced from induction in closed PVC wells (Lebbe 1999) should be explored further. It is clear, however, that (1) vertical salinity profiles should be used with care, and (2) the issue needs to be studied in much more detail (see, e.g., Shalev et al. 2009).

Concentration at pumping wells also needs close scrutiny. Ideally, mixing at the well can be represented in models, so that measured concentrations are comparable with computed concentrations. In practice, however, model simplifications may make this comparison non-trivial, e.g., when using a sharp interface model (as discussed in Mantoglou 2003).

A third source of concern is the difference between resident and flux-averaged concentration. Here, again, the issue is related to the type of model adopted. In general, measured concentration will be close to the flux-averaged concentration over the open portion of a pumping well screen. If this portion is long, the difference with the resident concentration can be quite large. During periods of intrusion, the flux-averaged concentration will be larger than the resident concentration; the opposite should occur during periods of retreat. As transport models are usually solved in terms of resident concentration, post-processing is required. Only models based on non-local transport formulations represent the difference between resident and flux-averaged concentration explicitly (see discussion by Willmann et al. 2008). To the authors' knowledge, there has not been any attempt to use this kind of model for SWI problems. These problems can be addressed by explicitly modelling the measurement borehole (as described previously); however, the solution is numerically difficult and computationally costly.
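The following toy calculation, with invented layer properties, illustrates why the concentration pumped from a long-screened well can differ strongly from the resident concentration at any given depth: inflow, and hence the mixture, is dominated by the most transmissive (and here also the most saline) layer.

```python
import numpy as np

def flux_averaged_concentration(K, b, c):
    """Transmissivity-weighted mixture entering a fully penetrating pumping well.

    K  layer hydraulic conductivities (m/d)
    b  layer thicknesses (m)
    c  resident concentrations in each layer (kg/m3)
    Assumes the drawdown-driven gradient is roughly uniform over the screen.
    """
    T = K * b                        # layer transmissivities
    return float((T * c).sum() / T.sum())

K = np.array([50.0, 5.0, 0.5])       # high-K layer dominates inflow
b = np.array([10.0, 10.0, 10.0])
c = np.array([20.0, 2.0, 0.2])       # the salty layer is also the most permeable
c_well = flux_averaged_concentration(K, b, c)   # ~18.2, far from the arithmetic mean (~7.4)
```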

Regarding the worth of concentration data, Fig. 5 shows that the concentration field is heavily dependent on hydraulic conductivity (sharp drops across areas of low transmissivity, the saltwater wedge lying below high-permeability zones, etc.). The problem is especially severe because aquifers affected by SWI salinize primarily along channels well connected to the sea (e.g., Iribar et al. 1997).

Fig. 5

Qualitative assessment of the impact of heterogeneity on steady-state seawater intrusion. The red lines represent seawater mixing fractions ranging from 0.1 (top) to 0.9 (bottom). Notice that the mixing zone flattens and narrows below high-permeability zones (Abarca et al. 2007b)

Nevertheless, some studies conclude that concentrations are not very informative about hydraulic parameters (e.g., Bray et al. 2007), whereas others conclude that the inclusion of concentration data significantly improves parameter estimation (Shoemaker 2004). A partial explanation may be that steady-state unpumped conditions such as the ones shown in Fig. 5 may not be comparable to SWI conditions observed during pumping. When pumping drives SWI, hydraulic gradients may override buoyancy forces, so that transport parameters become less important in explaining concentrations. Still, buoyancy forces may dominate in portions of the aquifer (see, e.g., Pool and Carrera 2009). In short, while concentration data should be used for calibration whenever possible, the issue clearly deserves further analysis.

Geophysical methods

In view of the difficulties associated with concentration data, it is not surprising that electrical conductivity (EC) measurements, typically derived from geophysics, have been used extensively. In fact, the whole suite of electromagnetic methods has been used in model calibration attempts: electrical resistivity tomography (ERT; Bauer et al. 2006b; Comte and Banton 2007), short- and long-offset transient electromagnetic measurements (SHOTEM and LOTEM; Kafri et al. 2007), and time-domain electromagnetic methods (TDEM; Yechieli et al. 2001). By providing extensive coverage, electrical conductivity measurements should allow a rather complete, albeit often blurry, picture of the interface shape. As already discussed, the interface shape and its time evolution should be sensitive to heterogeneity (Fig. 5) and, especially, to preferential flow paths connecting the aquifer to the coast (Mulligan et al. 2007).

Electrical geophysics is not free of problems. Resistivity maps cannot be compared directly to water salinity, but require a calibration of their own (Comte and Banton 2007). This does not preclude qualitative use, but hinders direct use for inversion. Moreover, connate saltwater in low-permeability areas may mask deeper resistivity measurements. Ironically, this would hinder qualitative use of resistivity maps, but could be overcome by joint inversion of SWI and geoelectrical model parameters. In summary, EC mapping is an extremely attractive option, but should be used in conjunction with flow and transport inversion.

On the use of tidal data

Sea level fluctuations, such as astronomical or wind-driven tides, represent a large-scale stress on the system. As such, they yield information about hydraulic parameters. As pointed out earlier, taking advantage of these data should improve parameter identifiability and inverse problem stability. More importantly, Knudby and Carrera (2006) showed that transport connectivity, which controls how fast SWI will contaminate an aquifer, correlates best with hydraulic diffusivity (T/S, T being transmissivity and S the storage coefficient). In fact, Carr and van der Kamp (1969) showed that the head response in homogeneous aquifers depends solely on the characteristic length:

$$ L = \sqrt{\frac{TP}{\pi S}} $$
(9)

where P is the period of the fluctuation. Equation (9) is not applicable to heterogeneous aquifers, but the sole dependence on diffusivity remains true. That is, the response to tides is not sufficient to identify T (or K) and S (or specific storage S_s) separately, but needs to be complemented by other data such as concentrations or hydraulic tests (Alcolea et al. 2007, 2009). Another advantage of tidal response is that it is cheap to measure and to simulate, because the equivalent freshwater head response is virtually insensitive to density variations (Ataie-Ashtiani et al. 2001; L.J. Slooten, IDAEA-CSIC, unpublished data, 2009). Therefore, the computations required for this type of data can be made with a constant-density flow model.

Tidal response can provide large-scale information. The characteristic length L can be quite large for confined aquifers. For example, with a tidal period of half a day, L equals about 1,260 m for a confined aquifer (S = 10^-4) of 1,000 m^2/day transmissivity. Obviously, this distance is much shorter for unconfined aquifers. Equation (9) is also valid when several fluctuations are superimposed. Typical tides are dominated by a half-day period, but longer components are also present. In fact, wind or barometric pressure fluctuations may contain modes with periods of several days. This implies that L (Eq. 9) can vary quite widely, so that aquifer fluctuations driven by sea level may penetrate significantly inland even in unconfined aquifers.
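The numbers quoted above follow directly from Eq. (9), as the sketch below shows; the unconfined storage value (specific yield of 0.1) and the 5-day barometric period are illustrative assumptions, not values from the text.

```python
import numpy as np

def characteristic_length(T, S, P):
    """Tidal propagation length L = sqrt(T*P / (pi*S)), Eq. (9).
    T in m2/day, P in days, L in metres."""
    return np.sqrt(T * P / (np.pi * S))

# Worked example from the text: T = 1000 m2/day, S = 1e-4, P = 0.5 day
L_confined = characteristic_length(1000.0, 1e-4, 0.5)     # ~1,260 m

# Same aquifer but unconfined (assumed specific yield ~0.1): L drops to ~40 m
L_unconfined = characteristic_length(1000.0, 0.1, 0.5)

# Longer-period (assumed ~5 day barometric) fluctuations penetrate much further
L_long_period = characteristic_length(1000.0, 1e-4, 5.0)  # ~4,000 m
```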

A sensitivity analysis of tidal response data, aimed at identifying optimal observation locations (Fig. 6), was performed using the methodology described in the previous section Sensitivity, uncertainty and worth of data (L.J. Slooten, IDAEA-CSIC, unpublished data, 2009). This analysis found that, if the aquifer is treated as homogeneous, maximum information is obtained at a distance L from the coast. However, if heterogeneity is acknowledged, maximum information is contained in heads measured at a distance of around L/2 from the coast. Yet, assuming a dense observation network, the parameters that can be best estimated are those right at the coast. This finding supports the earlier assertion about the identification of connectivity. Given that connectivity to the sea is important for coastal aquifer management, full advantage of aquifer response to sea level fluctuations should be taken whenever possible.

Fig. 6

Information in tidal response data on hydraulic conductivity parameters. a Overall information on model parameters (Eq. 6) per location. The optimal measurement location is closer to the coast in a finely parameterized model than in a homogeneous model. In a finely parameterized model (b), the parameters that can be best estimated (i.e., high composite scaled sensitivity) are those close to the coast. Further inland, composite scaled sensitivity decreases

Initial conditions: aquifer bathymetry

Specifying initial conditions is required for simulating any transient problem. When the history of pumping is well known, the best option usually consists of simulating that history while assuming that the aquifer is at an initial steady state. If this is not done, the model will generate spurious results until it accommodates the instabilities introduced by the specified, non-equilibrium, initial condition (e.g., Werner and Gallagher 2006; Doherty 2008).

It is generally believed that the initial steady state must be obtained as the result of a sufficiently long transient simulation. As it turns out, the nonlinear density-dependent flow and transport equations can also be solved directly under steady-state conditions, provided that a sufficiently close initial guess is available. Since such an initial guess is not easy to come up with, most codes do not provide the steady-state option. However, in an inverse modelling context, a good initial guess may be the steady-state solution obtained in the previous inverse problem iteration.

A problem with starting from a steady state is that it may be unrealistic: the time needed to reach steady state can be longer than the timescales on which changes in external forcing occur (Feseker 2007). The problem may occur in both directions (i.e., initial salinities larger than suggested by a steady-state simulation, and vice versa). On the one hand, connate saltwater is likely to be found in Holocene aquifers poorly connected to the sea (Gámez et al. 2009). It is also likely to be present in low-permeability areas (Custodio et al. 1971; Bridger and Allen 2006). On the other hand, sea level has been rising during the Holocene. Therefore, low-permeability zones may not yet have been reached by saltwater, although they would be under a steady state with current sea levels. In this regard, one should bear in mind that the last glacial maximum occurred “only” some 15,000 years ago. Therefore, it is very likely that initial salinities do not reflect the current sea level in poorly connected areas. This problem can be identified by performing two long-term simulations: one with initially salinized conditions, and one with initial freshwater conditions. If they lead to the same solution, then the problem can be ignored and initial steady-state conditions can be adopted.

Difficulties with initial conditions led to the development of an alternative approach discussed by Doherty (2008). In this work, the initial conditions are controlled by estimated parameters: “spreading parameters” that describe the width of the mixing zone around the interface, and “elevation parameters” that define the initial height of the interface above the aquifer bottom.

The issue of initial conditions also makes apparent the need for a careful assessment of aquifer elevations and of the connection to the sea. As illustrated in Fig. 7, initial conditions may be highly sensitive to the elevation of the discharge point (Gámez et al. 2009). Moreover, valleys of the aquifer bottom should coincide with regions of maximum inland penetration of seawater, even under steady-state conditions (Abarca et al. 2007a). Things can be worse if these valleys coincide with high-permeability regions, which should be expected if they correspond to paleochannels deposited during periods of low sea level. In such cases, deep portions will represent preferential flow paths, and their initial salinity makes them perfect candidates for fast SWI. The problem is especially severe in karstic regions, where flow along high-permeability channels may be turbulent, so that Darcy’s law is not valid.

Fig. 7

Sensitivity of initial concentrations to the elevation of the natural discharge outlet (z_0). a The confined aquifer is initially salinized if the freshwater head at the outlet (h_f0 = z_0/40) is lower than the inland head (h_u). b Otherwise, the confined aquifer will contain freshwater and inland wells will take much longer to be polluted by SWI. Improper accounting of this elevation or, in general, of initial conditions will lead to an unreliable inversion

The previous discussion points to the importance of characterizing aquifer elevation and connection to the sea. The most immediate option is to extend the parametrization of Doherty (2008) to the aquifer bottom and sea-aquifer connection. Parameters controlling aquifer bottom and sea connection can then be estimated during calibration. The fact that no efforts along this direction have been published in the scientific literature may reflect that either (1) the resulting inversion is too complex (in 3D models, the grid would have to be updated during calibration), (2) the problem is only truly relevant for unusually high variations in aquifer elevation, or (3) modelers are overcome by other difficulties. In any case, it is clear that the issue requires further analysis.

Computational aspects

Inversion of SWI problems is computationally costly. The high cost mainly reflects the need to compute the sensitivity matrix (Eq. 3) required by Gauss-Newton methods. In the following, the methods available to compute J_u are summarized and some possible improvements of computational performance are discussed.

Computation of sensitivities

Three methods can be used to compute sensitivities: the adjoint state method (Jacquard and Jain 1965; Townley and Wilson 1985), the influence coefficient method (Becker and Yeh 1972) and the sensitivity equation method (Distefano and Rath 1975). The adjoint state method is not well suited to SWI problems because it is most appropriate for linear problems; it can be used for non-linear problems, but it is no longer convenient, especially for transient ones. The influence coefficient method, also known as the incremental ratio or parameter perturbation method, approximates the sensitivity matrix using a finite difference scheme (i.e., the ratio of the change in computed state variables to a unit change in each component of the parameter set). This approach requires evaluating the direct model at least N_p + 1 times (once with the original parameter set and N_p times for the individual parameter perturbations). Therefore, the resulting cost is high (see Shoemaker 2004 for an example of the increase in calibration time). Moreover, an adequate choice of the magnitude of each parameter perturbation is required to obtain a good approximation of the sensitivity matrix. Inaccuracies in the sensitivity matrix may affect the computation of the gradient of the objective function, the covariance matrices and the determination of the correlation between parameters (Hill and Østerby 2003). The precision of the sensitivity matrix can be enhanced using a higher-order finite difference scheme, at the expense of an increase in CPU time. In spite of these disadvantages, the influence coefficient method is the most widely used method in seawater intrusion applications because of its simplicity and the availability of generic calibration tools such as UCODE (Poeter et al. 2005) or PEST (Doherty 2002), which facilitate solving the inverse problem with conventional simulation codes. Van Meir and Lebbe (2005), for example, used the parameter perturbation method to calibrate an axisymmetric density-dependent flow model.
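A minimal sketch of the influence coefficient (parameter perturbation) approximation is given below; `simulate` stands for any direct SWI model returning the computed observations, and the relative step size is an assumed value that, as noted above, controls the accuracy of the approximation.

```python
import numpy as np

def influence_coefficient_jacobian(simulate, p, rel_step=0.01, central=False):
    """Finite-difference approximation of the sensitivity matrix J = du/dp.

    Requires N_p + 1 direct-problem runs (2*N_p + 1 for central differences).
    """
    p = np.asarray(p, dtype=float)
    u0 = simulate(p)
    J = np.empty((u0.size, p.size))
    for j in range(p.size):
        dp = rel_step * max(abs(p[j]), 1e-8)          # perturbation magnitude
        p_plus = p.copy(); p_plus[j] += dp
        if central:
            p_minus = p.copy(); p_minus[j] -= dp
            J[:, j] = (simulate(p_plus) - simulate(p_minus)) / (2 * dp)
        else:
            J[:, j] = (simulate(p_plus) - u0) / dp
    return J
```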

The sensitivity equation method computes the sensitivity matrix by differentiating the direct problem equations, which leads to

$$ \begin{pmatrix} \dfrac{\partial \mathbf{f}_F}{\partial \mathbf{h}} & \dfrac{\partial \mathbf{f}_F}{\partial \mathbf{c}} \\[2ex] \dfrac{\partial \mathbf{f}_T}{\partial \mathbf{h}} & \dfrac{\partial \mathbf{f}_T}{\partial \mathbf{c}} \end{pmatrix} \begin{pmatrix} \dfrac{\partial \mathbf{h}}{\partial \mathbf{p}} \\[2ex] \dfrac{\partial \mathbf{c}}{\partial \mathbf{p}} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial \mathbf{f}_F}{\partial \mathbf{p}} \\[2ex] \dfrac{\partial \mathbf{f}_T}{\partial \mathbf{p}} \end{pmatrix} $$
(10)

where f_F and f_T are the (discretized) flow and solute transport equations, respectively. Solving this set of linear systems yields the sensitivities. Evaluating the coefficient matrix and the right-hand side of Eq. (10) requires tedious programming and verification, which has deterred modelers from implementing it. An alternative is to use automatic differentiation tools (Rall 1981; Griewank 2000) to generate the necessary code automatically. Rath et al. (2006) did so with the code SHEMAT (Clauser 2003) while calibrating coupled flow and heat transport. However, automatic differentiation requires the original code to follow some coding conventions (e.g., adapting the code to the Fortran 77 standard, avoiding implicit loops), which can make the process as arduous as the actual implementation of the derivatives. Furthermore, if not correctly implemented, it can worsen the performance of the original code.

Still, the exact computation of the sensitivity matrix yields benefits in calibration performance. The computational advantages of the sensitivity equation method can be seen by analyzing the cost of calibrating a given model. The costs of a single iteration of the inverse problem for the sensitivity equation and influence coefficient methods are

$$ \begin{array}{l} C_{\mathrm{IC}} = \left( a N_p + 1 \right) \times C_{\mathrm{DP}} \\ C_{\mathrm{SE}} = \left( C_{\mathrm{DP}} + C_{\mathrm{SM}} \right) + N_p \times C_{\mathrm{LSE}} \end{array} $$
(11)

where a is an integer that depends on the finite difference scheme used to approximate the derivatives (1 for backward or forward differences, 2 for central differences), C_IC is the cost of the influence coefficient method, C_SE the cost of the sensitivity equation method, C_DP the cost of solving the direct problem, C_SM the cost of computing the sensitivity matrix and C_LSE the cost of solving a linear system of equations of the form of Eq. (10). Equation (11) shows that, for the influence coefficient method, the calibration cost grows proportionally with the number of estimated parameters, with a slope equal to a·C_DP. The sensitivity equation method has an initial overhead because of the computation of the derivatives, but the growth rate of its cost is only C_LSE (<< a·C_DP).
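Plugging illustrative unit costs into Eq. (11) shows how quickly the two methods diverge as the number of parameters grows; all cost values below are arbitrary assumptions and serve only to display the scaling.

```python
def iteration_costs(N_p, C_DP=100.0, C_SM=30.0, C_LSE=2.0, a=1):
    """Cost of one inverse iteration, Eq. (11), in arbitrary CPU-time units.
    C_DP: direct problem, C_SM: building the sensitivity system,
    C_LSE: one linear solve of Eq. (10); all values are illustrative."""
    C_IC = (a * N_p + 1) * C_DP
    C_SE = (C_DP + C_SM) + N_p * C_LSE
    return C_IC, C_SE

for N_p in (5, 20, 100):
    print(N_p, iteration_costs(N_p))
# The influence coefficient cost grows with slope a*C_DP, the sensitivity
# equation cost only with slope C_LSE << C_DP.
```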

A comparison of the performance of the influence coefficient and sensitivity equation methods is shown in Fig. 8. The results correspond to the calibration of a Henry problem with a random Gaussian transmissivity field. Calibration was performed using the pilot point method for an increasing number of parameters. As can be seen, the cost of successful iterations was dramatically reduced with the sensitivity equation method. However, the overall calibration costs did not differ as much as suggested by Eq. (11). In the implementation adopted here, the influence coefficient method detects failed iterations, which do not require computation of the sensitivities, after the first simulation of that iteration, thus avoiding extra computations. The sensitivity equation method, instead, computes sensitivities in all iterations.

Fig. 8

CPU time consumed in calibrating a seawater intrusion model (the Henry problem with a heterogeneous transmissivity field) using the influence coefficient and sensitivity equation methods for an increasing number of parameters. a The time used for a single iteration on a fine mesh. b The time of the whole iterative process. The two examples correspond to different numbers of nodes and time steps but show a similar trend

Areas of improvement

The results of the previous example show that there is substantial room for improving computational performance. Inverse modelling codes may profit from the fact that, during calibration, similar problems are simulated repeatedly with slightly different parameters. Stored information on the state variables from previous calibration iterations can be used as an initial guess for solving the non-linear direct problem (Galarza et al. 1999), which reduces its cost.

Code parallelization can also improve the performance of the inversion process. Parallelization can be done at different levels. Adequate division and numbering of the model mesh result in a direct-problem sparse matrix suitable for parallel linear solvers (Canot et al. 2006). This process is straightforward in finite difference and regular finite element meshes, which, however, may not be appropriate for the geometry of real aquifers. Efficiency relies on the storage scheme and the linear solver. Parallelization can be generalized to all the computations in the problem to improve efficiency. It has been successfully applied to CO2 sequestration problems (Lu and Lichtner 2007), although the required technical resources may not be commonly affordable. Regarding the inverse problem, parameter perturbation methods can benefit greatly from parallelization. If the N_p + 1 direct-problem computations needed for each inverse problem iteration are distributed among N_p + 1 processors, the actual time required for computing the sensitivity matrix is comparable to that of a single direct simulation. This functionality is included in the UCODE and PEST suites. In the same manner, genetic algorithms can benefit from parallel processing, as shown by Bray and Yeh (2008).
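A minimal sketch of distributing the N_p + 1 direct-problem runs of the perturbation method over several processes is given below; `run_direct_problem` is a placeholder for an actual SWI simulation, and the example is not tied to the specific parallel facilities of UCODE or PEST.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def run_direct_problem(p):
    """Placeholder for one forward SWI simulation returning values at observation nodes."""
    return np.tanh(p).sum(keepdims=True)           # stand-in computation

def parallel_perturbation_jacobian(p, rel_step=0.01, workers=4):
    p = np.asarray(p, dtype=float)
    runs = [p.copy()]
    for j in range(p.size):                        # N_p perturbed parameter sets
        pj = p.copy()
        pj[j] += rel_step * max(abs(p[j]), 1e-8)
        runs.append(pj)
    with ProcessPoolExecutor(max_workers=workers) as ex:
        outputs = list(ex.map(run_direct_problem, runs))   # N_p + 1 runs in parallel
    u0 = outputs[0]
    J = np.column_stack([(outputs[j + 1] - u0) /
                         (runs[j + 1][j] - p[j]) for j in range(p.size)])
    return J

if __name__ == "__main__":
    J = parallel_perturbation_jacobian(np.array([1.0, 2.0, 0.5]))
```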

Conclusions

The discussion presented here points out that the full inversion formalism has not yet been applied to seawater intrusion (but see Dausman et al. 2009). Automatic calibration efforts reported so far are based on numerous simplifications: 2D modeling, ignoring density dependence, neglecting mixing, splitting the problem (separate inversion of different data sets), disregarding variations in aquifer elevation, or combinations of these. These simplifications reflect both conceptual and computational difficulties.

From a computational point of view, the inversion of two non-linear coupled equations on a 3D domain is challenging. Computer cost can be significantly reduced by analytical evaluation of sensitivities or by taking advantage of the fact that similar problems have to be solved, varying only model parameters. However, these kinds of improvements require tedious and costly programming. Instead, recent trends appear to point in the direction of generic inversion codes such as PEST or UCODE, whose performance can be greatly enhanced by parallelization.

It can be contended, however, that the main difficulties reflect conceptual shortcomings. Moreover, SWI inversion is complex because SWI models depend on many factors that can be neglected in conventional freshwater aquifers. The use and meaning of measured heads and concentrations are sensitive to borehole construction (length of the open interval) and history (whether the borehole is full of freshwater or saltwater). These problems can be addressed by explicitly modeling the measurement process, which is feasible, but represents an added source of complexity.

Seawater intrusion is sensitive to aquifer bathymetry and initial conditions. The latter can be obtained numerically if a steady state is chosen as initial state. However, the solution may be difficult because it requires a good initial guess. Fortunately, such a guess can be obtained from previous iterations in the context of automatic inversion. Unfortunately, initial conditions may not be at steady state because actual salinization prior to pumping may not reflect current sea level. In such cases, an option is to parameterize initial salinities, which are then estimated during model calibration. In fact, the same can be done regarding aquifer bathymetry (especially the elevation of the discharge point in confined aquifers). Obviously, these options represent a marked increase in model complexity. Further analysis is needed to find out whether and when they are sensible.

These difficulties are partially overcome by the availability of informative extra data sets, notably electromagnetic geophysics and tidal response. These data are highly informative, relatively easy to obtain, and they provide extensive areal coverage. Taking advantage of them increases computational cost and conceptual complexity of inversion, but is likely to be worth the effort.

In all, the time is ripe. The number of publications on the conceptual aspects of SWI has grown exponentially in recent years. Therefore, most of the difficulties addressed here should be overcome soon. As a result, a surge in SWI inversion should be expected.