1 Introduction

Predicting rock and fluid properties in the layer of soil and weathered rock at Earth’s surface is important across a broad range of cross-disciplinary problems, from understanding the processes that drive subsurface weathering (e.g., St. Clair et al. 2015; Riebe et al. 2017, 2021; Hayes et al. 2019) to quantifying effects of subsurface water storage on forest productivity and drought response (e.g., Hahm et al. 2019; Dawson et al. 2020; McCormick et al. 2021). Predictions of porosity and fluid (air and water) saturation are especially important to understanding the factors that regulate subsurface water flow and storage and thus to quantifying hydrological processes (Rempe and Dietrich 2018; Harman and Cosans 2019). Hence, predictions of petrophysical properties such as porosity, mineral volume, and fluid saturation are central to advancing the science of the critical zone, the layer that extends from treetop to bedrock and is so named because of its importance to life at Earth’s surface (Anderson et al. 2005; Brantley et al. 2007).

Direct measurements of petrophysical properties can be obtained from outcrops, well logs, and core samples, but such measurements are often sparse in the critical zone. To overcome this limitation, the spatial distributions of petrophysical properties are often estimated from surface-based geophysical images, such as electrical resistivity tomography and seismic refraction tomography (Holbrook et al. 2014; Parsekian et al. 2015). Seismic velocity and electrical resistivity in the subsurface depend on petrophysical properties; hence, they can be combined with plausible or measured ranges in the properties of weathered rock and soil to estimate the thicknesses of soil, saprolite, and weathered rock and to quantify how both porosity and fluid content vary in each layer (Flinchum et al. 2018a, b; Callahan et al. 2020). For example, low velocities are observed in weathered rocks with low water content, whereas high velocities are measured in unweathered rocks with high water content. High resistivity may be indicative of rocks with low porosity and water content, whereas low resistivity may be indicative of highly porous, water-rich soil.

Rock physics models (Mavko et al. 2020) generally provide the physical relations between geophysical data (e.g., elastic and electrical properties) and petrophysical variables (e.g., rock and fluid properties). Examples of rock physics models include empirical relations, granular media models based on Hertz–Mindlin grain contact theory, and inclusion models. These models are used to estimate the elastic response (P- and S-wave velocities) and electrical response (resistivity) of a saturated porous rock with known porosity, mineralogy, and fluid saturations. Rock physics models have largely been developed for hydrocarbon-saturated porous rocks in oil and gas reservoirs (Mavko et al. 2020) and have been heuristically extended to groundwater modeling and near-surface geophysics applications (Knight et al. 1998; Knight and Endres 2005; Moysey et al. 2005; Singha and Moysey 2006; Nenna et al. 2011, 2013; Holbrook et al. 2014; Flinchum et al. 2018a; Gu et al. 2020).

At the field scale, the prediction of the spatial distribution of rock and fluid properties from the measured elastic and electrical data in the near subsurface requires the solution of a geophysical inverse problem (Tarantola 2005; Grana et al. 2021). Such predictions are then used in near-surface geophysics studies to make quantitative interpretations of water storage and weathering in the critical zone. For example, geophysical measurements in mountain watersheds, conducted at the hillslope scale, have been used to predict porosity to quantify subsurface water-holding capacity (e.g., Robinson et al. 2008; Holbrook et al. 2014; Flinchum et al. 2018a; White et al. 2019) and weathering in the critical zone (Hayes et al. 2019; Callahan et al. 2020). From a mathematical point of view, calculating the solution of geophysical inverse problems is challenging due to the uncertainty in the measurements and the non-uniqueness of the solution. Estimating petrophysical properties is also complicated by the fact that they are spatially correlated, usually with longer correlation lengths in the lateral direction than in the vertical direction (Isaaks and Srivastava 1989; Kitanidis 1997; Chiles and Delfiner 2009).

Deterministic methods, such as gradient-based algorithms, and probabilistic approaches, such as Bayesian inversion, are commonly adopted in geophysical inverse problems (Tarantola 2005; Aster et al. 2018; Menke 2018). Joint inversion of geophysical data from multiple sources has been presented in exploration geophysics (Doyen 2007; Grana et al. 2021) as well as near-surface geophysics (Gallardo and Meju 2003; Meju et al. 2003; Linde and Doetsch 2016), mining (Astic et al. 2020), and CO2 sequestration (Grana et al. 2020; Tveit et al. 2020). Statistical methods for flow unit classification and zonation problems have been used in Doetsch et al. (2010), Hachmöller and Paasche (2013), Hermans and Irving (2017), and Parsekian et al. (2021); however, the extension of these methods to continuous variables for the prediction of petrophysical properties is still an ongoing research topic. Bayesian inversion for petro-elastic characterization is commonly used in exploration geophysics for the estimation of elastic properties (Tarantola and Valette 1982; Ulrych et al. 2001; Scales and Tenorio 2001; Buland and Omre 2003; Tarantola 2005) and petrophysical properties (Eidsvik et al. 2004; Bachrach 2006; Larsen et al. 2006; Grana and Della Rossa 2010; Grana 2016; Grana et al. 2017). Reviews of Bayesian inversion methods for reservoir characterization can be found in Doyen (2007), Azevedo and Soares (2017), and Grana et al. (2021). Bayesian inversion methods have also been applied to electromagnetic inversion (Minsley 2011; Ray and Key 2012; Buland and Kolbjørnsen 2012; Blatter et al. 2019), ground-penetrating radar tomography (Gloaguen et al. 2007; Dubreuil-Boisclair et al. 2011; Brunetti and Linde 2018; Hunziker et al. 2019), and refraction seismic data (Huang et al. 2021). Geostatistical sampling and stochastic optimization approaches have been applied to multiple geophysical datasets in different earth and environmental science applications (Allard et al. 2021; Athens and Caers 2021; Goodwin et al. 2021; Loe et al. 2021; Miltenberger et al. 2021; Redoloza and Li 2021).

The focus of this work is the rock physics inversion (i.e., a geophysical inverse problem where the physics is approximated using a rock physics model) for the estimation of the spatial distribution of the petrophysical properties in unconsolidated, weathered, and unweathered rocks, partially saturated with water and air, in the critical zone. In particular, this work introduces a geostatistical inversion method to predict petrophysical properties, specifically porosity and water saturation, from P-wave velocity and electrical resistivity. The advantage of the proposed inversion is that it generates geologically realistic realizations of petrophysical properties using geostatistical algorithms that account for spatial correlation models, computes the geophysical response by applying nonlinear rock physics relations, and updates the initial realizations using stochastic updating conditioned on the measured geophysical data in a Bayesian framework. The result of the inversion is a set of model realizations of petrophysical properties of the critical zone representing the posterior probability distribution of the properties of interest. The proposed implementation is applied in the joint velocity and resistivity domain and assumes that preliminary seismic and electrical inversion results are available. The objectives herein are as follows: (i) illustrate the proposed mathematical framework, (ii) demonstrate the approach on plausible synthetic subsurface models with “known” characteristics of the near-surface critical zone, and (iii) apply the methodology to real field data with unknown subsurface characteristics and measurements affected by noise, limited resolution, and preprocessing uncertainty.

2 Method

A rock physics model is a physical relation between the petrophysical properties of interest (\({\varvec{m}}\)), for example porosity and water saturation, and the available geophysical data (\({\varvec{d}}\)), for example P-wave velocity and resistivity. The rock physics model can be extended to include other model variables, such as mineral volumetric fractions, and multiple geophysical data, such as S-wave velocity and density. The rock physics model might take different formulations in different lithologies depending on the mineral composition and structure of the porous rocks (Mavko et al. 2020).

Generally, the input of the rock physics model is a set of volumetric fractions (i.e., defined between 0 and 1) and the output is a set of geophysical data. Hence, mathematically, the rock physics model can be written as a function \({\varvec{f}}:\left[ {0,1} \right]^{{n_{m} }} \to {\mathbb{R}}^{{n_{d} }}\) that approximates the physical relation between \({\varvec{m}}\) and \({\varvec{d}}\) as

$$ {\varvec{d}} = {\varvec{f}}\left( {\varvec{m}} \right) + {\varvec{\varepsilon}}, $$
(1)

with measurement error \({\varvec{\varepsilon}}\), where \(n_{m}\) is the number of model variables (e.g., \(n_{m} = 2\), for porosity and water saturation) and \(n_{d}\) is the number of geophysical variables (e.g., \(n_{d} = 2\), for P-wave velocity and electrical resistivity). The prediction of the model variables \({\varvec{m}}\) from the available data \({\varvec{d}}\) is an inverse problem. Several methods have been proposed for the solution of inverse problems associated with Eq. (1), including deterministic and stochastic methods (Tarantola 2005).

In this work, a Bayesian approach is adopted. The solution of the inverse problem is the probability distribution \(P({\varvec{m}}|{\varvec{d}})\) of the model variables conditioned on the available data, and it is calculated according to Bayes’ rule

$$ P\left( {{\varvec{m}}{|}{\varvec{d}}} \right) = \frac{{P({\varvec{d}}|{\varvec{m}})P\left( {\varvec{m}} \right)}}{{P\left( {\varvec{d}} \right)}}, $$
(2)

where \(P\left( {\varvec{m}} \right)\) is the prior distribution of the model variables, \(P({\varvec{d}}|{\varvec{m}})\) is the likelihood function, and \(P\left( {\varvec{d}} \right)\) is a normalizing term such that the posterior distribution \(P({\varvec{m}}|{\varvec{d}})\) is a valid probability density function (PDF) with integral equal to 1. In some special cases, for example for a linear function \({\varvec{f}}\) with a Gaussian prior distribution of \({\varvec{m}}\) and Gaussian measurement errors \({\varvec{\varepsilon}}\), the solution can be computed analytically (Tarantola 2005). However, rock physics models are generally nonlinear. A further challenge in petrophysical characterization is that petrophysical properties are spatially correlated, so the prior distribution should include a spatial correlation model (Isaaks and Srivastava 1989; Kitanidis 1997; Chiles and Delfiner 2009).
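To make Eq. (2) concrete at a single location, the sketch below evaluates the posterior of porosity and water saturation on a regular grid, with a Gaussian likelihood and a uniform prior, and normalizes it so that it integrates to 1. The forward model `toy_forward` is hypothetical: the velocity term is a made-up linear proxy, and the resistivity term is Archie's law with assumed \(R_{\text{w}} = 10\), \(a = 1\), and \(m = n = 2\). This illustrates Bayes' rule only and does not represent the spatially correlated inversion proposed in this work.

```python
import numpy as np

def toy_forward(phi, sw, rw=10.0):
    """Hypothetical forward model f(m): returns (Vp [km/s], log10 resistivity)."""
    vp = 5.0 - 8.0 * phi + 0.3 * sw              # hypothetical linear velocity proxy
    log_r = np.log10(rw / (phi**2 * sw**2))       # Archie's law with a = 1, m = n = 2
    return vp, log_r

def grid_posterior(d_obs, sigma, phi_grid, sw_grid):
    """Evaluate P(m|d) of Eq. (2) on a (phi, Sw) grid with a uniform prior."""
    phi, sw = np.meshgrid(phi_grid, sw_grid, indexing="ij")
    vp, log_r = toy_forward(phi, sw)
    # Gaussian likelihood with independent errors on the two data types
    misfit = ((vp - d_obs[0]) / sigma[0])**2 + ((log_r - d_obs[1]) / sigma[1])**2
    like = np.exp(-0.5 * misfit)
    prior = np.ones_like(like)                    # uniform prior over the grid
    post = like * prior
    # Normalizing term P(d): scale so the posterior integrates to 1 on the grid
    dphi, dsw = phi_grid[1] - phi_grid[0], sw_grid[1] - sw_grid[0]
    return post / (post.sum() * dphi * dsw)

# Usage: noise-free synthetic datum generated at (phi, Sw) = (0.30, 0.60)
phi_grid = np.linspace(0.01, 0.60, 60)
sw_grid = np.linspace(0.01, 1.00, 100)
d_obs = np.array(toy_forward(0.30, 0.60))
post = grid_posterior(d_obs, np.array([0.05, 0.05]), phi_grid, sw_grid)
i, j = np.unravel_index(np.argmax(post), post.shape)   # maximum a posteriori grid cell
```

Grid evaluation becomes impractical once the prior must couple many spatially correlated cells. This is one motivation for the sampling and ensemble approach described next.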

Here, an approach based on stochastic sampling and optimization is adopted. In the proposed approach, an initial ensemble of geostatistical realizations of the model variables porosity and water saturation is first generated using the probability field simulation (PFS) method (Srivastava 1992). Then the ensemble of models is updated using the ensemble smoother multi-data assimilation (ES-MDA) algorithm (Emerick and Reynolds 2013) according to the available measurements of P-wave velocity and resistivity to minimize the mismatch between the rock physics model predictions and the available data. Ensemble-based methods are generally faster than traditional stochastic sampling algorithms such as Markov chain Monte Carlo (Evensen 2009; Posselt and Bishop 2012). Ayani et al. (2020) show the advantage of ensemble-based methods for a geophysical inverse problem in carbon dioxide storage monitoring: starting from the same initial geostatistical model, the ensemble smoother yielded higher accuracy and lower uncertainty than a traditional deterministic inversion. In the following sections, the prior model used to generate model realizations of porosity and water saturation is first introduced; then the rock physics model used to compute the predicted data of P-wave velocity and resistivity is described, and the inversion algorithm to obtain the posterior realizations of the model variables is presented.

2.1 Geostatistical Simulations

In stochastic sampling and optimization methods for geophysical inverse problems, model realizations of the variables of interest are sequentially simulated and updated until their geophysical response (in this approach, the rock physics model predictions) matches the available data. Prior realizations of porosity and saturation are generated using geostatistical algorithms (Grana et al. 2021) such as sequential Gaussian simulation (SGS), truncated Gaussian simulation (TGS), pluri-Gaussian simulation (PGS), or PFS. Geostatistical simulations are stochastic algorithms that generate realizations of the properties of interest by sampling multidimensional random fields with a spatial correlation model to mimic the expected spatial variability. Most of these methods assume that the model variables follow Gaussian prior distributions. Petrophysical volumetric fractions are, by definition, bounded between 0 and 1 and are generally non-stationary and non-ergodic; therefore, a normal score transformation may be necessary before traditional geostatistical methods can be applied. In this case, the simulation is performed in the standard Gaussian domain obtained by matching the quantiles of the cumulative distribution function of the initial distribution, and the results are then transformed back. Alternatively, truncated Gaussian distributions can be adopted; however, this approach is recommended only when the probability of values outside the physical range is negligible.
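The normal score transformation can be sketched with an empirical CDF, assuming plotting positions \((r + 0.5)/n\) for the rank \(r\) and a piecewise-linear back-transformation table. Both choices are illustrative assumptions, not necessarily the implementation used in this work. Because the back-transformation interpolates within the observed data range, back-transformed values stay inside the physical bounds of the input samples.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # standard Gaussian, mean 0 and standard deviation 1

def normal_score(x):
    """Map samples to standard Gaussian scores through their empirical CDF."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))             # 0 = smallest sample
    p = (ranks + 0.5) / x.size                     # plotting positions strictly inside (0, 1)
    z = np.array([_STD_NORMAL.inv_cdf(pi) for pi in p])
    table = (np.sort(z), np.sort(x))               # score-to-data lookup table
    return z, table

def back_transform(z, table):
    """Map Gaussian values to the original domain by interpolating the table."""
    z_ref, x_ref = table
    # np.interp clamps values outside the table to the smallest/largest sample
    return np.interp(z, z_ref, x_ref)

# Usage with hypothetical porosity samples bounded in [0, 1]
rng = np.random.default_rng(0)
phi = rng.beta(4.0, 8.0, size=500)
z, table = normal_score(phi)
phi_back = back_transform(z, table)
```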

In the proposed approach, prior realizations of porosity and water saturation are generated using the PFS method. The PFS method is chosen for its computational efficiency in generating conditional Gaussian realizations and its ability to account for spatially variable mean and variance values (Srivastava 1992); however, the methodology could be applied to several geostatistical algorithms including two-point and multi-point statistics methods (Caers 2011). To simulate a spatial realization \({\varvec{x}}\) of a random variable, a spatially correlated realization of a Gaussian random field \({\varvec{z}}\sim {\mathcal{N}}\left( {{\varvec{z}};0,{\mathbf{C}}} \right)\) is first generated assuming a spatial correlation matrix \({\mathbf{C}}\), for example, using the fast Fourier transform with moving average; then, at each location, the correlated realization \({\varvec{z}}\) is multiplied by the local standard deviation \({\varvec{\sigma}}_{x}\) and the local mean \({\varvec{\mu}}_{x}\) is added

$$ {\varvec{x}} = {\varvec{\mu}}_{x} + {\varvec{\sigma}}_{x} {\varvec{z}}. $$
(3)

The local mean \({\varvec{\mu}}_{x}\) and standard deviation \({\varvec{\sigma}}_{x}\) can be obtained from a prior trend or computed from available direct measurements using a kriging approach. If direct measurements are available, for example at a borehole location, the geostatistical realizations can be conditioned on them by setting \({\varvec{\mu}}_{x}\) equal to the measured value and \(\sigma_{x} = 0\) at the measurement locations.
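Equation (3) can be sketched for a single vertical profile as follows. One assumption differs from the text: a Cholesky factorization of \({\mathbf{C}}\) replaces the FFT moving-average generator, which is only practical for small grids. The depth trend, the 10 m correlation length, and the "core" value at 10 m depth are hypothetical. Conditioning follows the simple rule stated above (\(\sigma_{x} = 0\) and \(\mu_{x}\) equal to the measurement). A full conditional simulation would also use kriged means and standard deviations near the measurement.

```python
import numpy as np

def spherical(h, corr_len):
    """Isotropic spherical correlation, Eq. (4) with a constant correlation length."""
    r = np.minimum(np.abs(h) / corr_len, 1.0)
    return 1.0 - 1.5 * r + 0.5 * r**3

def simulate_profile(depth, mu, sigma, corr_len, n_real, seed=0):
    """Eq. (3): x = mu + sigma * z, where z is a correlated standard Gaussian field."""
    rng = np.random.default_rng(seed)
    corr = spherical(depth[:, None] - depth[None, :], corr_len)
    chol = np.linalg.cholesky(corr + 1e-8 * np.eye(depth.size))   # jitter for stability
    z = chol @ rng.standard_normal((depth.size, n_real))          # z ~ N(0, C)
    return mu[:, None] + sigma[:, None] * z                        # one realization per column

# Usage: hypothetical porosity trend that decreases with depth
depth = np.arange(0.0, 40.0, 1.0)                  # 40 cells with 1 m spacing
mu_phi = 0.45 - 0.006 * depth
sd_phi = np.full(depth.size, 0.03)
mu_phi[10], sd_phi[10] = 0.35, 0.0                 # hypothetical measurement at 10 m depth
phi = simulate_profile(depth, mu_phi, sd_phi, corr_len=10.0, n_real=100)
```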

The prior distribution of the petrophysical properties, porosity and water saturation, is defined based on direct measurements (e.g., core samples), nearby outcrop data, and prior geological information. A prior trend of porosity and water saturation is adopted, where porosity is a monotonically non-increasing function of depth, and water saturation is a monotonically non-decreasing function of depth. The prior model also includes a spatial correlation function that describes how the spatial correlation of each model variable varies with respect to the distance. In the proposed approach, the same spatial correlation model is assumed for porosity and water saturation, and it is a two-dimensional anisotropic spherical correlation function

$$ \rho_{sph} \left( {h,\theta } \right){ } = \left\{ {\begin{array}{*{20}l} {1 - \frac{3h}{{2l\left( \theta \right)}} + \frac{{h^{3} }}{{2l\left( \theta \right)^{3} }}} \hfill & {h \le l\left( \theta \right)} \hfill \\ 0 \hfill & {h > l\left( \theta \right),} \hfill \\ \end{array} } \right. $$
(4)

where \(h\) is the distance, \(\theta\) is the angular coordinate (i.e., the counterclockwise angle from the Cartesian horizontal axis), and \(l\left( \theta \right)\) is the correlation length defined as

$$ l\left( \theta \right) = \frac{{l_{{{\text{max}}}} l_{{{\text{min}}}} }}{{\sqrt {l_{{{\text{max}}}}^{2} \sin^{2} \left( {\alpha - \theta } \right) + l_{{{\text{min}}}}^{2} \cos^{2} \left( {\alpha - \theta } \right)} }}. $$
(5)

Equation (5) describes the radius of an ellipse parameterized by the maximum and minimum correlation lengths, \(l_{{{\text{max}}}}\) and \(l_{{{\text{min}}}}\), and by the azimuth angle \(\alpha\), such that \(l\left( \theta \right) = l_{{{\text{max}}}}\) when \(\theta = \alpha\), and \(l\left( \theta \right) = l_{{{\text{min}}}}\) when \(\theta = \alpha + \pi /2\). Stochastic realizations are then generated using PFS and transformed back to the original domain according to the inverse cumulative distribution function of the prior distribution. This transformation might affect the spatial correlation length of the realizations in the vertical direction; nevertheless, it produces an ensemble of realistic petrophysical realizations with large variability between the realizations.
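Equations (4) and (5) translate directly into a correlation function for a two-dimensional lag \((\Delta x, \Delta z)\). The sketch below is a minimal version that works in radians. Its usage values (75 m, 25 m, and 16 degrees) are the ones adopted later for the synthetic hillslope. Evaluating the function at all pairs of grid cells would give a correlation matrix \({\mathbf{C}}\) for Eq. (3).

```python
import numpy as np

def corr_length(theta, l_max, l_min, alpha):
    """Eq. (5): direction-dependent correlation length (radius of an ellipse)."""
    return l_max * l_min / np.sqrt(l_max**2 * np.sin(alpha - theta)**2
                                   + l_min**2 * np.cos(alpha - theta)**2)

def spherical_aniso(dx, dz, l_max, l_min, alpha):
    """Eq. (4): anisotropic spherical correlation for a lag (dx, dz)."""
    h = np.hypot(dx, dz)
    theta = np.arctan2(dz, dx)            # counterclockwise angle from the horizontal axis
    r = h / corr_length(theta, l_max, l_min, alpha)
    return np.where(r <= 1.0, 1.0 - 1.5 * r + 0.5 * r**3, 0.0)

# Usage: correlation 30 m along the azimuth versus 30 m perpendicular to it
alpha = np.deg2rad(16.0)
rho_along = spherical_aniso(30 * np.cos(alpha), 30 * np.sin(alpha), 75.0, 25.0, alpha)
rho_across = spherical_aniso(30 * np.cos(alpha + np.pi / 2), 30 * np.sin(alpha + np.pi / 2),
                             75.0, 25.0, alpha)   # 30 m exceeds l_min = 25 m, so this is 0
```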

2.2 Rock Physics Model

To predict P-wave velocity and resistivity in the critical zone, an elastic rock physics model and an electrical rock physics relation are adopted. Several rock physics models have been developed to compute the elastic and electrical response of fluid-saturated porous rocks; an overview of rock physics theory is given in Mavko et al. (2020). One mineral phase (for example, quartz) and two fluid components, namely water and air, are initially assumed. Therefore, the variables of interest are porosity \(\phi\), water saturation \(S_{{\text{w}}}\), and air saturation \(1 - S_{{\text{w}}}\). The P-wave velocity \(V_{{\text{P}}}\) of a fluid-saturated porous rock is computed as

$$ V_{{\text{P}}} = \sqrt {\frac{{K_{{{\text{sat}}}} \left( {\phi ,S_{{\text{w}}} } \right){ } + \frac{4}{3}G_{{{\text{sat}}}} \left( {\phi } \right)}}{{\rho \left( {\phi ,S_{{\text{w}}} } \right)}}} , $$
(6)

where \(K_{{{\text{sat}}}} \left( {\phi ,S_{{\text{w}}} } \right){ }\) and \(G_{{{\text{sat}}}} \left( {\phi } \right){ }\) are the bulk and shear moduli of the saturated rock, and \(\rho \left( {\phi ,S_{{\text{w}}} } \right)\) is the density of the saturated rock. Density \(\rho \left( {\phi ,S_{{\text{w}}} } \right)\) can be computed as

$$ \rho \left( {\phi ,S_{{\text{w}}} } \right) = \left( {1 - \phi } \right)\rho_{{\text{m}}} + \phi \rho_{{\text{f}}} = \left( {1 - \phi } \right)\rho_{{\text{m}}} + \phi \left[ {\left( {1 - S_{{\text{w}}} } \right)\rho_{{\text{a}}} + S_{{\text{w}}} \rho_{{\text{w}}} } \right], $$
(7)

where \(\rho_{{\text{m}}}\) is the density of the mineral phase, and \(\rho_{{\text{f}}}\) is the density of the fluid mixture that depends on water saturation \(S_{{\text{w}}}\) and the densities of air and water, \(\rho_{{\text{a}}}\) and \(\rho_{{\text{w}}}\). The bulk and shear moduli of the saturated rock, \(K_{{{\text{sat}}}} \left( {\phi ,S_{{\text{w}}} } \right)\) and \(G_{{{\text{sat}}}} \left( {\phi } \right)\), are generally computed using Gassmann’s equations (Mavko et al. 2020) and are functions of the bulk and shear moduli of the mineral, porosity, the bulk moduli of air and water, and water saturation. Several formulations have been proposed to link the dry rock elastic moduli to porosity and mineral elastic moduli, including granular media models such as the soft and stiff sand models (Dvorkin and Nur 1996; Gal et al. 1998; Dvorkin et al. 2014), and inclusion models such as self-consistent approximation and differential effective medium models (Mavko et al. 2020). The details of the rock physics models are given in the Appendix.
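The sketch below chains Eqs. (6) and (7). It assumes that the dry-rock moduli come from the soft-sand model (Hertz–Mindlin contact moduli at critical porosity, connected to the mineral point by a modified Hashin–Shtrikman lower bound, as summarized in Mavko et al. 2020) and that fluid substitution follows Gassmann's equation. Because the Appendix is not reproduced here, whether this matches the exact "Dvorkin's model" formulation used in the paper is an assumption. The mineral moduli (30 GPa), critical porosity (0.6), and coordination number (4) follow the synthetic example. The effective pressure, mineral density, and fluid properties are hypothetical defaults. With moduli in GPa and densities in g/cm³, the velocity is in km/s.

```python
import numpy as np

def hertz_mindlin(k_min, g_min, phi_c, coord, pressure):
    """Hertz-Mindlin moduli of a random grain pack at critical porosity."""
    nu = (3 * k_min - 2 * g_min) / (2 * (3 * k_min + g_min))   # mineral Poisson's ratio
    base = coord**2 * (1 - phi_c)**2 * g_min**2 * pressure / (np.pi**2 * (1 - nu)**2)
    k_hm = (base / 18) ** (1 / 3)
    g_hm = (5 - 4 * nu) / (5 * (2 - nu)) * (3 * base / 2) ** (1 / 3)
    return k_hm, g_hm

def soft_sand_dry(phi, k_min, g_min, phi_c, coord, pressure):
    """Soft-sand dry moduli (modified lower Hashin-Shtrikman bound), 0 < phi < phi_c."""
    k_hm, g_hm = hertz_mindlin(k_min, g_min, phi_c, coord, pressure)
    f = phi / phi_c
    k_dry = 1 / (f / (k_hm + 4/3 * g_hm) + (1 - f) / (k_min + 4/3 * g_hm)) - 4/3 * g_hm
    zeta = g_hm / 6 * (9 * k_hm + 8 * g_hm) / (k_hm + 2 * g_hm)
    g_dry = 1 / (f / (g_hm + zeta) + (1 - f) / (g_min + zeta)) - zeta
    return k_dry, g_dry

def p_velocity(phi, sw, k_min=30.0, g_min=30.0, rho_min=2.65, phi_c=0.6, coord=4.0,
               pressure=0.001, k_w=2.25, k_a=1e-4, rho_w=1.0, rho_a=0.0012):
    """Eqs. (6)-(7) with Gassmann fluid substitution (GPa, g/cm^3 -> km/s)."""
    k_dry, g_dry = soft_sand_dry(phi, k_min, g_min, phi_c, coord, pressure)
    k_fl = 1 / (sw / k_w + (1 - sw) / k_a)            # Reuss average of water and air
    k_sat = k_dry + (1 - k_dry / k_min)**2 / (phi / k_fl + (1 - phi) / k_min
                                              - k_dry / k_min**2)
    rho = (1 - phi) * rho_min + phi * ((1 - sw) * rho_a + sw * rho_w)   # Eq. (7)
    return np.sqrt((k_sat + 4/3 * g_dry) / rho)        # Eq. (6); G_sat equals G_dry

# Usage: velocity of a partially saturated rock at two porosities
v_low_phi = p_velocity(0.1, 0.5)
v_high_phi = p_velocity(0.4, 0.5)
```

Because the shear modulus is unaffected by the pore fluid in Gassmann's theory, \(G_{\text{sat}}\) depends only on porosity, which is consistent with the notation \(G_{\text{sat}}(\phi)\) in Eq. (6).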

The resistivity \(R\) of the fluid-saturated rock can be computed using modified Archie's law (Archie 1942) for sandstone

$$ R = a\frac{{R_{{\text{w}}} }}{{\phi^{m} S_{{\text{w}}}^{n} }}, $$
(8)

or the Simandoux equation (Simandoux 1963) for shaley sandstone

$$ R = \frac{a}{{\left( {\frac{{\phi^{m} }}{{R_{{\text{w}}} }} + \frac{{v_{{\text{c}}} }}{{R_{{\text{c}}} }}} \right)S_{{\text{w}}}^{n} }}, $$
(9)

where \(R_{{\text{w}}}\) is the resistivity of water, \(m\) is the cementation exponent, \(n\) is the saturation exponent, \(a < 1\) is an empirical constant, \(v_{{\text{c}}}\) is the volume of clay, and \(R_{{\text{c}}}\) is the resistivity of clay (Mavko et al. 2020). The Archie and Simandoux equations are often applied empirically to other lithologies by fitting the model parameters to laboratory measurements.
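Equations (8) and (9) are simple closed-form relations. The sketch below implements them as written, with default exponents \(m = n = 2\) as in the synthetic example. The value \(a = 0.81\), the water resistivity, and the clay parameters in the usage lines are hypothetical. Setting \(v_{\text{c}} = 0\) in Eq. (9) recovers Eq. (8), which is a quick consistency check. Adding a clay conduction term lowers the predicted resistivity.

```python
import numpy as np

def archie(phi, sw, rw, a, m=2.0, n=2.0):
    """Eq. (8): resistivity of a clean (clay-free) rock."""
    return a * rw / (phi**m * sw**n)

def simandoux(phi, sw, rw, vc, rc, a, m=2.0, n=2.0):
    """Eq. (9): resistivity with an additional clay conduction term."""
    return a / ((phi**m / rw + vc / rc) * sw**n)

# Usage with hypothetical parameters (resistivities in ohm-m)
sw = np.linspace(0.2, 1.0, 5)
r_clean = archie(0.25, sw, rw=10.0, a=0.81)
r_shaly = simandoux(0.25, sw, rw=10.0, vc=0.2, rc=5.0, a=0.81)
```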

2.3 Inverse Method

The inversion workflow combines geostatistical algorithms for the generation of the prior realizations of porosity and water saturation, rock physics models for the prediction of the elastic and electrical response of the prior models, and stochastic inverse theory for the updating of the models. The inverse method is based on the ES-MDA (Emerick and Reynolds 2013), in which an ensemble of prior realizations is first generated and then updated using a Bayesian updating step based on the Kalman filter equations.

In the proposed approach, an ensemble of \(N_{{\text{e}}}\) porosity and water saturation models \({\varvec{m}}_{j}\) for \(j = 1, \ldots ,N_{{\text{e}}}\) is updated to match the available P-wave velocity and resistivity data \({\varvec{d}}\). The posterior mean (i.e., the mean of the updated realizations) is taken as the best estimate of porosity and water saturation.

The ES-MDA algorithm is iterative and includes the following steps:

  1. For the first iteration \(i = 1\), an ensemble of \(N_{{\text{e}}}\) realizations \(\left\{ {{\varvec{m}}_{j}^{i} } \right\}_{{j = 1, \ldots ,N_{e} }}\) of the model variables is generated.

  2. The rock physics model is then applied to predict the data \(\left\{ {{\varvec{d}}_{j}^{i} } \right\}_{{j = 1, \ldots ,N_{e} }}\), and the available data are perturbed as \({\varvec{d}}_{p}^{i} = {\varvec{d}} + \sqrt {\alpha_{i} } {{\varvec{\Sigma}}}_{{\varvec{e}}}^{1/2} {\varvec{z}}_{{\varvec{d}}}\) (a new perturbation is drawn for each realization), where \({\varvec{z}}_{{\varvec{d}}} \sim N\left( {0, {\mathbf{I}}_{n} } \right)\) is a vector sampled from a multivariate Gaussian distribution with \(0\) mean and covariance matrix equal to \({\mathbf{I}}_{n}\) (the identity matrix of size \(n \times n\), with \(n\) being the number of data points), \({{\varvec{\Sigma}}}_{{\varvec{e}}}^{1/2}\) is the square root of the covariance matrix of the data errors, and \(\alpha_{i} \ge 1\) is the inflation factor at iteration \(i\).

  3. The model ensemble is updated according to the Bayesian updating equation

    $$ {\varvec{m}}_{j}^{i + 1} = {\varvec{m}}_{j}^{i} + {{\varvec{\Sigma}}}_{{{\varvec{m}},{\varvec{d}}}}^{i} \left( {{{\varvec{\Sigma}}}_{{{\varvec{d}},{\varvec{d}}}}^{i} + \alpha_{i} {{\varvec{\Sigma}}}_{{\varvec{e}}} } \right)^{ - 1} \left( {{\varvec{d}}_{p}^{i} - {\varvec{d}}_{j}^{i} } \right), $$
    (10)

    for \(j = 1, \ldots ,N_{e}\), where \({{\varvec{\Sigma}}}_{{{\varvec{m}},{\varvec{d}}}}^{i}\) is the cross-covariance matrix of models \({\varvec{m}}^{i}\) and prior data predictions \({\varvec{d}}^{i}\), and \({{\varvec{\Sigma}}}_{{{\varvec{d}},{\varvec{d}}}}^{i}\) is the \(n \times n\) covariance matrix of the data predictions \({\varvec{d}}^{i}\).

  4. The prediction and updating steps 2 and 3 are repeated for \(N_{a}\) iterations, with inflation factors satisfying \(\sum\nolimits_{i = 1}^{N_{a}} 1/\alpha_{i} = 1\).

Due to the non-linearity of the rock physics model, the covariance matrices cannot be computed analytically and are estimated from the ensemble of models and predictions. In the proposed applications, the data include measurements of P-wave velocity and resistivity obtained after preprocessing of geophysical data. The data errors depend on the noise and resolution of the measured data as well as the uncertainty associated with the preprocessing step, such as inversion parameters and regularization terms. Generally, the data errors are assumed to be spatially independent, and \({{\varvec{\Sigma}}}_{{\varvec{e}}}\) is assumed to be diagonal. The diagonal elements of the matrix, that is, the variances of the measurement errors, represent the variability of the data and account for the different resolution, noise, and preprocessing uncertainty of the various geophysical sources, in this case seismic and electrical data.
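Steps 1 to 4 and Eq. (10) can be condensed into a short NumPy sketch. It uses a diagonal \({{\varvec{\Sigma}}}_{{\varvec{e}}}\), perturbs the data of each member with inflated noise, and estimates \({{\varvec{\Sigma}}}_{{\varvec{m}},{\varvec{d}}}\) and \({{\varvec{\Sigma}}}_{{\varvec{d}},{\varvec{d}}}\) from ensemble anomalies. The function name `es_mda` and its arguments are hypothetical. In the paper's setting, `m` would stack porosity and water saturation over all grid cells, and `forward` would apply the rock physics model cell by cell. The usage lines swap in a hypothetical linear forward operator so that the result is easy to check. Large real problems typically need additional safeguards (e.g., keeping updated fractions inside [0, 1]) that this sketch omits.

```python
import numpy as np

def es_mda(m_prior, forward, d_obs, sigma_e, alphas, seed=0):
    """ES-MDA with a diagonal data-error covariance (sketch of steps 1-4).

    m_prior : (n_m, N_e) prior ensemble, one realization per column
    forward : callable mapping an (n_m, N_e) ensemble to (n_d, N_e) predictions
    d_obs   : (n_d,) measured data;  sigma_e : (n_d,) data-error standard deviations
    alphas  : inflation factors alpha_i >= 1 whose reciprocals sum to 1
    """
    alphas = np.asarray(alphas, dtype=float)
    if not np.isclose(np.sum(1.0 / alphas), 1.0):
        raise ValueError("the reciprocals of the inflation factors must sum to 1")
    rng = np.random.default_rng(seed)
    d_obs = np.asarray(d_obs, dtype=float)
    sigma_e = np.asarray(sigma_e, dtype=float)
    cov_e = np.diag(sigma_e**2)
    m = np.array(m_prior, dtype=float)
    n_e = m.shape[1]
    for alpha in alphas:
        d_pred = forward(m)                                      # step 2: predictions
        z = rng.standard_normal(d_pred.shape)                    # one draw per member
        d_pert = d_obs[:, None] + np.sqrt(alpha) * sigma_e[:, None] * z
        dm = m - m.mean(axis=1, keepdims=True)                   # ensemble anomalies
        dd = d_pred - d_pred.mean(axis=1, keepdims=True)
        cov_md = dm @ dd.T / (n_e - 1)                           # Sigma_{m,d}
        cov_dd = dd @ dd.T / (n_e - 1)                           # Sigma_{d,d}
        m = m + cov_md @ np.linalg.solve(cov_dd + alpha * cov_e, d_pert - d_pred)  # Eq. (10)
    return m

# Usage with a hypothetical linear forward operator and four equal inflation factors
G = np.array([[1.0, 0.5], [0.2, 1.0]])
m_true = np.array([0.30, 0.60])
rng = np.random.default_rng(1)
m0 = np.array([[0.20], [0.40]]) + 0.1 * rng.standard_normal((2, 500))
post = es_mda(m0, lambda m: G @ m, G @ m_true, [0.01, 0.01], [4, 4, 4, 4])
```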

3 Application

The proposed methodology is first validated on two synthetic models of the near subsurface and then applied to a real dataset. The first synthetic example represents a plausible critical zone structure of a two-dimensional section of a mountain hillslope with soils and highly weathered regolith near the surface and fractured rock at depth. The second example represents a synthetic two-dimensional section of the critical zone across a forested slope and swampy meadow, conceptually reproducing conditions similar to those found in the P301 catchment at the Southern Sierra Critical Zone Observatory (Holbrook et al. 2014). In the first example, the same rock physics model is adopted for the entire dataset, whereas in the second example the rock physics model formulation varies across the section based on the spatial distribution of the facies (i.e., saprolite versus weathered bedrock).

The first synthetic dataset of a section of mountain hillslope (Fig. 1) is modified after the model presented in Parsekian et al. (2021). The model represents a 40 m-thick and 144 m-long section and includes four main litho-fluid facies (i.e., geobodies with specific petrophysical properties), namely dry soft rock, dry stiff rock, wet soft rock, and wet stiff rock, where wet indicates predominant water saturation and dry indicates predominant air saturation; soft indicates unconsolidated rocks and stiff indicates consolidated rocks (Fig. 1). In this application, all rocks across the section are assumed to be sandstones with known mineral composition, and their stiffness depends only on compaction and is hence a function of porosity. Synthetic models of porosity and water saturation are then generated by sampling from facies-dependent Gaussian distributions. A two-dimensional Gaussian smoothing function with a correlation length of 3 m is used as a convolutional filter to mimic realistic geological continuity (Fig. 1). The average porosity is 0.41 in dry soft rocks, 0.37 in wet soft rocks, 0.22 in dry stiff rocks, and 0.20 in wet stiff rocks. The average water saturation is 0.25 in dry soft rocks, 0.69 in wet soft rocks, 0.33 in dry stiff rocks, and 0.77 in wet stiff rocks. Porosity decreases as a function of depth, whereas water saturation increases, due to compaction and gravity, respectively. The variability of porosity and water saturation within the same litho-fluid facies is smaller than between different litho-fluid facies. The gradual transitions at the facies boundaries are due to the application of the spatial filter to mimic realistic data resolution. The geophysical response of the synthetic model is shown in Fig. 2. P-wave velocity is computed according to Dvorkin’s model (Dvorkin and Nur 1996; Appendix), and electrical resistivity is calculated according to Archie’s law (Archie 1942). Both properties include a random error corresponding to a signal-to-noise ratio of 10. The mineral composition is 0.50 quartz, 0.25 feldspar, and 0.25 clay, corresponding to a bulk modulus of 30 GPa and a shear modulus of 30 GPa. The mineral fractions are assumed to be homogeneous across the section. The critical porosity (i.e., the porosity above which the rock becomes a suspension) is assumed to be 0.6, and the coordination number (i.e., the average number of contacts per grain) is 4. In Archie’s law, the cementation and saturation exponents are equal to 2. In real applications, these parameters should be calibrated against real data (Mavko et al. 2020). The rock physics relations are shown in Fig. 3.

Fig. 1

Synthetic model of a mountain hillslope (modified from Parsekian et al. 2021): a litho-fluid facies model, b porosity, and c water saturation

Fig. 2

Geophysical response of the synthetic model of a mountain hillslope: a P-wave velocity and b resistivity (in logarithmic domain)

Fig. 3

Rock physics model and prior distribution of petrophysical properties: a P-wave velocity versus porosity (color-coded by water saturation); b log-resistivity versus water saturation (color-coded by porosity); c log-resistivity versus P-wave velocity (color-coded by porosity); d contour plot of the prior bivariate distribution of porosity and water saturation

The rock physics inversion method is applied to the dataset of P-wave velocity and resistivity. For the prior distribution of porosity and water saturation, a bivariate truncated Gaussian model is assumed (Fig. 3). For the prior spatial correlation model, a two-dimensional spherical correlation function is assumed, with a correlation range of 75 m in the horizontal direction, 25 m in the vertical direction, and an azimuth angle of 16 degrees, based on the available prior geological information. A set of 1,000 prior realizations of porosity and water saturation is generated using PFS; Figure 4 shows a subset of five randomly selected realizations. All the prior realizations honor the vertical trends and the prior means and variances, and approximately honor the spatial correlation function. The ES-MDA method is applied with four data assimilations, with inflation factors equal to 0.25 at each iteration.

The posterior mean models of porosity and water saturation are computed as the average of the 1,000 updated simulations and are shown in Fig. 5. Overall, the posterior mean models show a good match with the actual models in Fig. 1. The correlation coefficient between predicted and actual porosity is 0.97, and that between predicted and actual water saturation is 0.93. The root-mean-square error is 0.017 for porosity and 0.021 for water saturation. The average coverage ratio of the 0.90 confidence interval (i.e., the fraction of actual samples that fall in the 0.90 confidence interval) is 0.84 for porosity and 0.88 for water saturation, indicating a slight underestimation of the uncertainty, possibly due to the normal score transformation of the prior distributions. Figure 5 also shows a litho-fluid facies classification for validation purposes. The classification is obtained using linear discriminant analysis with a training dataset extracted from a subsample consisting of 25% of the reference model. The predicted litho-fluid model matches the conceptual model in Fig. 1 with a success rate of 0.92. Consistent results are also obtained using standard cutoff classifications and unsupervised cluster analysis. A set of five realizations randomly selected from the ensemble of 1,000 updated realizations is shown in Fig. 6.

The posterior uncertainty is analyzed by applying multidimensional scaling (MDS, Caers 2011), with the Euclidean distance, to the prior and posterior realizations and comparing the variability of the first two components (Fig. 7). The MDS plot shows that the posterior uncertainty is reduced compared to the prior uncertainty. The first two components of the prior models explain 70% of the total variance, and the first three components 91%, whereas the first two components of the posterior models explain 30% of the total variance, and the first three components 43%, possibly due to the noise, resolution, and spatial structure of the geophysical data. The results of a traditional Bayesian rock physics inversion (Grana et al. 2021) without a spatial correlation prior model or geostatistical sampling are shown in Fig. 8. The traditional Bayesian rock physics inversion is performed with and without a prior vertical trend. The results without the prior trend do not capture the vertical variations of the petrophysical properties, and the predictions regress towards the mean values of the prior distribution. The results with the prior trend capture the overall spatial behavior of the petrophysical properties but fail to accurately predict the petrophysical values in the intermediate litho-fluid facies, which have similar elastic and electrical properties. To investigate the effect of data resolution on the inversion results, a spatial filter is applied to the geophysical data in Fig. 2 to reproduce a realistic resolution of the geophysical measurements (Fig. 9). The results of the rock physics inversion applied to the low-resolution dataset are shown in Fig. 9; they still capture the main structure of the critical zone, despite smoother transitions between the litho-fluid facies due to the smoothing effect in the measured data.
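The ES-MDA update used in this example can be sketched as follows. This is a minimal NumPy sketch of the standard formulation of Emerick and Reynolds (2013), not the authors' implementation; the function name `es_mda`, the argument layout, and the linear toy problem are illustrative assumptions. In that formulation the inflation factors α_i must satisfy Σ 1/α_i = 1, so four equal assimilations correspond to α_i = 4; the sketch assumes that the value 0.25 quoted above refers to the per-step weight 1/α_i.

```python
import numpy as np


def es_mda(m_prior, forward, d_obs, C_D, alphas, rng=None):
    """Ensemble smoother with multiple data assimilation (Emerick and Reynolds 2013).

    m_prior : (n_m, n_e) array, one prior realization per column
              (e.g., normal-score porosity and saturation stacked per cell).
    forward : callable mapping one model vector (n_m,) to predicted data (n_d,).
    d_obs   : (n_d,) observed data (e.g., P-wave velocity and log-resistivity).
    C_D     : (n_d, n_d) data-error covariance.
    alphas  : inflation factors; ES-MDA requires sum(1 / alpha) == 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    alphas = np.asarray(alphas, dtype=float)
    if not np.isclose(np.sum(1.0 / alphas), 1.0):
        raise ValueError("inflation factors must satisfy sum(1/alpha) = 1")
    m = np.array(m_prior, dtype=float)
    n_e = m.shape[1]
    L = np.linalg.cholesky(C_D)  # used to draw correlated data perturbations
    for alpha in alphas:
        # Forward-model every ensemble member with the current models.
        d = np.column_stack([forward(m[:, j]) for j in range(n_e)])
        # Ensemble anomalies and sample (cross-)covariances.
        dm = m - m.mean(axis=1, keepdims=True)
        dd = d - d.mean(axis=1, keepdims=True)
        C_md = dm @ dd.T / (n_e - 1)
        C_dd = dd @ dd.T / (n_e - 1)
        # Perturb the observations with noise inflated by alpha.
        z = rng.standard_normal((d_obs.size, n_e))
        d_pert = d_obs[:, None] + np.sqrt(alpha) * (L @ z)
        # Kalman-type update: m <- m + C_md (C_dd + alpha C_D)^-1 (d_pert - d).
        m = m + C_md @ np.linalg.solve(C_dd + alpha * C_D, d_pert - d)
    return m


# Toy linear check (hypothetical, not the paper's data): two model
# parameters observed through an invertible 2x2 operator.
rng = np.random.default_rng(0)
G = np.array([[1.0, 1.0], [1.0, -1.0]])
m_true = np.array([1.0, 2.0])
d_obs = G @ m_true
C_D = 0.01 * np.eye(2)
m_prior = rng.standard_normal((2, 500))
m_post = es_mda(m_prior, lambda m: G @ m, d_obs, C_D, alphas=[4, 4, 4, 4], rng=rng)
print(m_post.mean(axis=1))
```

In the inversion described here, `forward` would be the rock physics model applied cell by cell to back-transformed porosity and saturation; the toy operator only checks that the update pulls the ensemble toward the data and shrinks its spread.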

Fig. 4

Five randomly selected prior geostatistical realizations of porosity (left) and water saturation (right) along the two-dimensional section of a synthetic mountain hillslope

Fig. 5

Posterior mean models of (a) litho-fluid facies classification, (b) porosity, and (c) water saturation along the two-dimensional section of a synthetic mountain hillslope

Fig. 6

Five randomly selected posterior geostatistical realizations of porosity (left) and water saturation (right) along the two-dimensional section of a synthetic mountain hillslope

Fig. 7

Analysis of the posterior variance in the multidimensional scaling coordinate domain for the synthetic mountain hillslope: black crosses represent the prior models and red crosses represent the posterior models

Fig. 8

Posterior mean models of litho-fluid facies classification, porosity, and water saturation obtained by applying a traditional Bayesian inversion method without prior vertical trends (a–c) and with prior vertical trends (d–f)

Fig. 9

Geostatistical rock physics inversion results with low-resolution data: (a) P-wave velocity and (b) resistivity (in the logarithmic domain); and posterior models of (c) litho-fluid facies classification, (d) porosity, and (e) water saturation along the two-dimensional section of a synthetic mountain hillslope

The proposed method is then validated on a conceptual model developed for a weathered hillslope in the P301 catchment in the Southern Sierra Critical Zone Observatory and inferred from seismic velocity and electrical resistivity. The conceptual model is modified after Holbrook et al. (2014). The model includes saprolite, moderately weathered bedrock, and relatively unweathered bedrock (Fig. 10). The two-dimensional section is 40 m thick and 151 m long. Synthetic models of porosity and water saturation are sampled from facies-dependent Gaussian distributions with parameters estimated from Holbrook et al. (2014). The hillslope section is divided into three facies: saprolite, where the rock is chemically weathered but retains the fabric of the underlying rock; moderately weathered bedrock, where the rock is fractured and chemical weathering occurs along fracture surfaces; and relatively unweathered bedrock with limited fracturing and chemical weathering. In saprolite, the average porosity is 0.37 and the average water saturation is 0.34; in moderately weathered bedrock, the average porosity is 0.19 and the average water saturation is 0.88; and in relatively unweathered bedrock, the average porosity is 0.14 and the average water saturation is 1. Porosity decreases with depth, whereas water saturation increases. The variability of porosity and water saturation is higher within saprolite than in bedrock. The elastic response is computed using a facies-dependent rock physics model, in which the model for each facies is chosen to best represent that medium. For example, Dvorkin’s model (Dvorkin and Nur 1996) is chosen for saprolite, which should behave like a granular medium, and Berryman’s inclusion model (Berryman 1995) is chosen for weathered bedrock, where most of the porosity is generated by fracturing.
In saprolite, Dvorkin’s model with coordination number 4 and critical porosity 0.6 is adopted; in bedrock, Berryman’s inclusion model with aspect ratio 0.2 is adopted (Appendix). Electrical resistivity is calculated according to Archie’s law, with cementation and saturation exponents equal to 2. The parameters were calibrated such that the rock physics models predict elastic and electrical properties consistent with Holbrook et al. (2014). The reference P-wave velocity and resistivity models, computed assuming a signal-to-noise ratio of 10, are shown in Fig. 11.
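Archie’s law with the exponents stated above (m = n = 2) takes only a few lines. The sketch below assumes a tortuosity factor a = 1 and a brine resistivity `rw` that is not reported in this section; the function names and the value rw = 10 ohm-m are hypothetical, chosen only for illustration.

```python
import math


def archie_resistivity(phi, sw, rw, a=1.0, m=2.0, n=2.0):
    """Archie's law: R = a * Rw * phi**(-m) * Sw**(-n), in ohm-m.

    phi : porosity (fraction, > 0); sw : water saturation (fraction, > 0)
    rw  : brine resistivity in ohm-m (site-specific; assumed here)
    a   : tortuosity factor (assumed 1); m, n : cementation and saturation exponents
    """
    if phi <= 0 or sw <= 0:
        raise ValueError("porosity and water saturation must be positive")
    return a * rw / (phi ** m * sw ** n)


def log10_resistivity(phi, sw, rw, **kwargs):
    """Resistivity in the logarithmic domain, as displayed in the figures."""
    return math.log10(archie_resistivity(phi, sw, rw, **kwargs))


# Facies-average values quoted above, with an assumed rw = 10 ohm-m.
for facies, phi, sw in [("saprolite", 0.37, 0.34),
                        ("moderately weathered bedrock", 0.19, 0.88),
                        ("relatively unweathered bedrock", 0.14, 1.0)]:
    print(f"{facies}: log10(R) = {log10_resistivity(phi, sw, rw=10.0):.2f}")
```

Under the assumed `rw`, the three facies averages fall within the same order of magnitude of resistivity, because the lower water saturation of saprolite offsets its higher porosity; this suggests why velocity is needed alongside resistivity to separate them.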

Fig. 10

Conceptual model of a weathering zone in the Southern Sierra Critical Zone Observatory (modified from Holbrook et al. 2014): (a) rock type (saprolite, moderately weathered bedrock, and relatively unweathered bedrock), (b) porosity, and (c) water saturation

Fig. 11

Geophysical response of the conceptual model of a weathering zone: (a) P-wave velocity and (b) resistivity (in the logarithmic domain)

The stochastic inversion is then applied to P-wave velocity and resistivity. The prior distribution of porosity and water saturation is a bivariate truncated Gaussian mixture model with three Gaussian components, one each for saprolite, moderately weathered bedrock, and relatively unweathered bedrock. The prior spatial correlation model is a two-dimensional spherical correlation function with a correlation range of 75 m in the horizontal direction, 10 m in the vertical direction, and an azimuth angle of −14 degrees, according to prior geological knowledge. A set of 1,000 prior realizations of porosity and water saturation is generated using PFS. Five randomly selected realizations are shown in Fig. 12. The ES-MDA method is applied with four data assimilations, with inflation factors equal to 0.25 at each iteration. Because the rock physics model formulation depends on the lithology, at each step of the inversion the geostatistical realizations are classified into saprolite, moderately weathered bedrock, and relatively unweathered bedrock using linear discriminant analysis, and the appropriate rock physics model is applied to each class.
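The anisotropic spherical correlation model described above can be illustrated with a short function. This is a generic sketch, not the authors' code: it assumes the azimuth is measured counterclockwise from the horizontal axis of the section, with the major range along the rotated axis and the minor range perpendicular to it (the convention is not stated in the text), and the function names are hypothetical.

```python
import math


def spherical_correlation(dx, dz, range_major, range_minor, azimuth_deg):
    """Anisotropic 2-D spherical correlation for a lag (dx, dz) in metres.

    Assumed convention: the major axis is rotated by azimuth_deg counterclockwise
    from the horizontal (x) axis; range_major applies along it and range_minor
    perpendicular to it.
    """
    theta = math.radians(azimuth_deg)
    # Rotate the lag into the principal axes of anisotropy.
    u = dx * math.cos(theta) + dz * math.sin(theta)
    v = -dx * math.sin(theta) + dz * math.cos(theta)
    # Normalized (isotropic-equivalent) lag distance.
    h = math.hypot(u / range_major, v / range_minor)
    if h >= 1.0:
        return 0.0
    return 1.0 - 1.5 * h + 0.5 * h ** 3


def correlation_matrix(points, range_major, range_minor, azimuth_deg):
    """Correlation matrix between grid nodes given as (x, z) tuples."""
    return [[spherical_correlation(xi - xj, zi - zj,
                                   range_major, range_minor, azimuth_deg)
             for (xj, zj) in points]
            for (xi, zi) in points]


# Parameters of the second example: 75 m horizontal, 10 m vertical, -14 degrees.
R = correlation_matrix([(0.0, 0.0), (20.0, 0.0), (0.0, 5.0)], 75.0, 10.0, -14.0)
```

A matrix like `R` would serve as the covariance of the correlated Gaussian fields from which realizations are drawn; on large grids a dense matrix becomes impractical, and FFT-based or sequential simulation methods are typically used instead.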

Fig. 12

Five randomly selected prior geostatistical realizations of porosity (left) and water saturation (right) along the two-dimensional section of the conceptual model of a weathering zone

The posterior mean models of porosity and water saturation, computed from the 1,000 updated simulations, are shown in Fig. 13 and match the conceptual model in Fig. 10 relatively accurately. The trend, the mean, and the variances of the petrophysical properties are accurately predicted; however, some misclassifications are present at the boundaries between rock types, in particular between moderately weathered bedrock and relatively unweathered bedrock, possibly due to the similar elastic properties of the two rock types. The litho-fluid facies classification obtained by applying linear discriminant analysis to the predicted mean models has a success rate of 0.81. The correlation coefficient between predicted and actual porosity is 0.94, and that between predicted and actual water saturation is 0.92. The root-mean-square error is 0.021 for porosity and 0.024 for water saturation. The average coverage ratio of the 0.90 confidence interval is 0.88 for porosity and 0.89 for water saturation. Compared with the first example, where only one rock physics model is used, the variability of the realizations is slightly larger, leading to misclassifications of lithologies with similar geophysical responses. Figure 14 shows a set of five randomly selected updated realizations. The MDS plot of the first two components of the prior and posterior realizations shows a clear reduction in the posterior uncertainty (Fig. 15).
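The validation scores reported above (correlation coefficient, root-mean-square error, and coverage ratio of the 0.90 interval) can be computed from an ensemble as in the following sketch. It assumes a central interval bounded by the empirical 5th and 95th percentiles in each cell, which is one common reading of a confidence interval for an ensemble; the function name and the synthetic data are illustrative only.

```python
import numpy as np


def validation_metrics(truth, ensemble, level=0.90):
    """Correlation, RMSE, and interval coverage of an ensemble against a reference.

    truth    : (n_cells,) reference values (e.g., actual porosity)
    ensemble : (n_cells, n_realizations) posterior realizations
    level    : nominal probability of the central interval (0.90 in the text)
    """
    truth = np.asarray(truth, dtype=float)
    ensemble = np.asarray(ensemble, dtype=float)
    mean = ensemble.mean(axis=1)                      # posterior mean model
    corr = np.corrcoef(mean, truth)[0, 1]
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    tail = (1.0 - level) / 2.0
    lower = np.quantile(ensemble, tail, axis=1)       # per-cell lower bound
    upper = np.quantile(ensemble, 1.0 - tail, axis=1)  # per-cell upper bound
    coverage = np.mean((truth >= lower) & (truth <= upper))
    return corr, rmse, coverage


# Synthetic, well-calibrated example: each cell's posterior is centered on a
# noisy estimate of the truth, with a spread equal to that noise level.
rng = np.random.default_rng(1)
truth = rng.uniform(0.05, 0.40, size=2000)
center = truth + rng.normal(0.0, 0.02, size=truth.size)
ensemble = center[:, None] + rng.normal(0.0, 0.02, size=(truth.size, 500))
corr, rmse, coverage = validation_metrics(truth, ensemble)
print(f"corr={corr:.2f} rmse={rmse:.3f} coverage={coverage:.2f}")
```

In this well-calibrated setting the coverage is close to the nominal 0.90; a coverage below the nominal level indicates that the posterior spread is too narrow.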

Fig. 13

Posterior mean models of (a) rock type classification (saprolite, moderately weathered bedrock, and relatively unweathered bedrock), (b) porosity, and (c) water saturation along the two-dimensional section of the conceptual model of a weathering zone

Fig. 14

Five randomly selected posterior geostatistical realizations of porosity (left) and water saturation (right) along the two-dimensional section of a conceptual model of a weathering zone

Fig. 15

Analysis of the posterior variance in the multidimensional scaling coordinate domain for the conceptual model of a weathering zone: black crosses represent the prior models and red crosses represent the posterior models

Finally, the proposed method is applied to a real dataset acquired on a mountain hillslope near Laramie, Wyoming, detailed in Kotikian et al. (2019). The spatial distribution of the water volume depends on the porosity of the rocks and on the fluid saturations. The acquired geophysical dataset includes seismic refraction and electrical resistivity tomography data that have been preprocessed (inverted using the software Geogiga Seismic Pro for the seismic data and R2 for the electrical data) to map P-wave velocity and resistivity along a 60 m section of the hillslope (Fig. 16). The relation between the properties of interest (i.e., porosity and water saturation) and the geophysical data is described by a multivariate rock physics model based on Dvorkin’s model for the elastic component and Archie’s equation for the electrical component (Fig. 17). The mineral composition and the rock physics model parameters are defined according to nearby field sites (Flinchum et al. 2018a). Because the dataset is small, the rock physics parameters are assumed constant: the critical porosity is 0.6, the coordination number is 4, the cementation exponent is 2, and the saturation exponent is 2. However, spatial trends of the parameters could also be adopted and integrated in the forward rock physics operator. The prior distribution of porosity and water saturation is a bivariate truncated Gaussian model with a prior correlation of −0.2 between the two properties, and the prior spatial correlation model is a two-dimensional spherical correlation function with a correlation range of 50 m in the horizontal direction, 7.5 m in the vertical direction, and an azimuth angle of 25 degrees, according to prior geological knowledge (Kotikian et al. 2019) and nearby sites (Flinchum et al. 2018b).
The error variance of the seismic velocities is assumed to be 10% of the velocity variance, whereas that of the electrical resistivity is assumed to be 15% of the resistivity variance, based on the sensitivity analysis of seismic and electrical inversion (Parsekian et al. 2021). The predicted models of porosity and water saturation along the two-dimensional section are shown in Fig. 18. The inversion results show relatively high porosity at the top of the section that gradually decreases with depth. The porosity values of around 0.4 in the unconsolidated rocks near the surface and of around 0.05 in the fractured bedrock at depth are consistent with the available geological information. The inversion results also show relatively low water saturation at the top of the section, which gradually increases with depth. The saturation values of around 0.2 in the relatively dry rock near the surface are consistent with the soil moisture measurements (Kotikian et al. 2019). The results in the upper part of the section are also consistent with the nuclear magnetic resonance observations of water volume in a nearby borehole (Parsekian et al. 2021). The posterior uncertainty in the real case is larger (approximately 20%) than in the previous synthetic cases due to the noise and resolution of the measured data (Fig. 19).
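The elastic part of the rock physics model used for the field data can be sketched as below: Hertz–Mindlin moduli at critical porosity, the modified Hashin–Shtrikman lower bound of the uncemented (soft-sand) formulation associated with Dvorkin and Nur (1996), and Gassmann fluid substitution with a uniformly mixed air-water fluid. Only the critical porosity (0.6) and the coordination number (4) come from the text; the quartz, water, and air properties, the effective pressure, and the uniform-mixing assumption are placeholders chosen for illustration, and the function names are hypothetical.

```python
import math

# Assumed end-member properties (GPa, g/cm3); not reported in this section.
K_MIN, G_MIN, RHO_MIN = 36.6, 45.0, 2.65      # quartz
K_W, RHO_W = 2.25, 1.0                        # water
K_A, RHO_A = 1.0e-4, 0.0012                   # air


def hertz_mindlin(phi_c, coord, p_eff):
    """Dry moduli (GPa) of a random pack of identical spheres at critical porosity."""
    nu = (3 * K_MIN - 2 * G_MIN) / (2 * (3 * K_MIN + G_MIN))   # mineral Poisson ratio
    base = (coord ** 2 * (1 - phi_c) ** 2 * G_MIN ** 2 * p_eff
            / (math.pi ** 2 * (1 - nu) ** 2))
    k_hm = (base / 18.0) ** (1 / 3)
    g_hm = (5 - 4 * nu) / (5 * (2 - nu)) * (3 * base / 2.0) ** (1 / 3)
    return k_hm, g_hm


def soft_sand_vp(phi, sw, phi_c=0.6, coord=4.0, p_eff=5.0e-4):
    """P-wave velocity (km/s) from the soft-sand model plus Gassmann substitution.

    phi_c and coord follow the text; p_eff (effective pressure, GPa) is assumed.
    """
    if not 0.0 < phi < phi_c:
        raise ValueError("porosity must lie in (0, phi_c)")
    k_hm, g_hm = hertz_mindlin(phi_c, coord, p_eff)
    f = phi / phi_c
    # Modified Hashin-Shtrikman lower bound between the sphere pack and the mineral.
    k_dry = 1.0 / (f / (k_hm + 4 / 3 * g_hm)
                   + (1 - f) / (K_MIN + 4 / 3 * g_hm)) - 4 / 3 * g_hm
    z = g_hm / 6.0 * (9 * k_hm + 8 * g_hm) / (k_hm + 2 * g_hm)
    g_dry = 1.0 / (f / (g_hm + z) + (1 - f) / (G_MIN + z)) - z
    # Uniform air-water mixing (Reuss average) and bulk density.
    k_fl = 1.0 / (sw / K_W + (1 - sw) / K_A)
    rho = (1 - phi) * RHO_MIN + phi * (sw * RHO_W + (1 - sw) * RHO_A)
    # Gassmann's equation for the saturated bulk modulus; shear is unaffected.
    k_sat = k_dry + (1 - k_dry / K_MIN) ** 2 / (
        phi / k_fl + (1 - phi) / K_MIN - k_dry / K_MIN ** 2)
    return math.sqrt((k_sat + 4 / 3 * g_dry) / rho)   # GPa / (g/cm3) -> km/s


for phi, sw in [(0.40, 0.2), (0.20, 0.6), (0.05, 0.9)]:
    print(f"phi={phi:.2f}, Sw={sw:.1f}: Vp = {soft_sand_vp(phi, sw):.2f} km/s")
```

In the inversion, a function like this would be evaluated for every cell of every realization at each assimilation, which is consistent with the text's point that the cost of the forward operator remains limited.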

Fig. 16

Geophysical attributes estimated along a 60 m section of a mountain hillslope near Laramie, Wyoming: P-wave velocity computed from first-arrival travel-time tomography and resistivity from electrical resistivity tomography

Fig. 17

Rock physics model based on Dvorkin’s model for the elastic component and Archie’s equation for the electrical component: P-wave velocity versus porosity color-coded by water saturation and resistivity versus water saturation color-coded by porosity

Fig. 18

Predicted models of porosity and water saturation along the mountain hillslope near Laramie, Wyoming

Fig. 19

Analysis of the posterior variance in the multidimensional scaling coordinate domain for the mountain hillslope near Laramie, Wyoming: black crosses represent the prior models and red crosses represent the posterior models

4 Discussion

The three examples demonstrate the applicability and validity of the proposed geostatistical inversion methodology. The first synthetic example validates the method under the assumption of a single rock physics model for the entire dataset and investigates the effect of data resolution. The second synthetic example demonstrates the methodology for a complex structure with a different rock physics model for each rock type. The real case application illustrates the applicability of the method to real data that have low resolution and are affected by noise.

The proposed method enables the prediction of the spatial distribution of porosity and water saturation. The volumetric fractions of the minerals are assumed to be known and constant in each lithology. The extension to petrophysical problems with variable mineralogy is straightforward, because the formulation of the inverse problem is not limited by the number of model variables and can be applied to any finite number of volumetric fractions of solid and fluid phases. However, the inverse problem becomes underdetermined, and it would require additional geophysical data, such as S-wave velocity or density, to obtain accurate predictions of the rock and fluid volumetric fractions (Doyen 2007; Grana et al. 2021). If these properties are available, and the rock physics model can be adequately calibrated with direct measurements, one could predict porosity, fluid saturations, and the volume of the main mineral component, such as quartz, feldspar, or clay.

One of the main challenges in applying the proposed method to real data is the calibration of the parameters of the rock physics model. Indeed, the rock physics relations generally require laboratory experiments to measure rock and fluid parameters, such as the elastic moduli and density of mineral and fluid components (Mavko et al. 2020). Model parameters such as critical porosity, coordination number, or aspect ratio should be calibrated by fitting a statistically significant number of direct measurements from core samples or borehole data (Mavko et al. 2020). Similarly, the spatial correlation models must be calibrated using available data and nearby outcrop information. In practical applications, the horizontal and vertical correlation functions are often assumed a priori based on prior geological knowledge of the area, for example based on outcrop data and analogues (Isaaks and Srivastava 1989; Grana et al. 2021). In theory, it is possible to estimate the spatial correlation model from the geophysical data, but the limited resolution of the measured data might lead to overestimation of the spatial correlation parameters of the petrophysical properties, due to the smoothing effects associated with the data as well as regularization methods used in data processing. Spatial correlation parameters can also be assumed unknown and stochastically generated in the geostatistical simulations; in this case, a larger number of realizations should be generated to capture the larger variability of the model space.
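As an illustration of calibrating a rock physics parameter against direct measurements, the sketch below fits Archie’s tortuosity factor a and cementation exponent m by linear regression in log-log space, assuming formation factors F = R0/Rw measured on fully water-saturated core plugs. The function name and the synthetic inputs are hypothetical; parameters such as critical porosity or coordination number enter the models nonlinearly and would require nonlinear least squares instead.

```python
import math


def fit_archie_cementation(porosity, formation_factor):
    """Least-squares fit of log10(F) = log10(a) - m * log10(phi).

    porosity         : porosities measured on fully water-saturated core plugs
    formation_factor : F = R0 / Rw for the same plugs
    Returns (a, m).
    """
    x = [math.log10(p) for p in porosity]
    y = [math.log10(f) for f in formation_factor]
    n = len(x)
    if n < 2:
        raise ValueError("at least two measurements are required")
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # equals -m
    intercept = y_mean - slope * x_mean   # equals log10(a)
    return 10 ** intercept, -slope


# Synthetic "measurements" generated with a = 1 and m = 2 (illustration only).
phi_core = [0.05, 0.10, 0.20, 0.30, 0.40]
F_core = [p ** -2.0 for p in phi_core]
a_fit, m_fit = fit_archie_cementation(phi_core, F_core)
```

With real core data the fitted values carry uncertainty that shrinks only with the number and porosity spread of the samples, which is why a statistically significant set of measurements is needed.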

In general, any prior distribution can be used; however, for non-Gaussian distributions, normal-score transformations must be applied to perform the inversion in the transformed domain (Grana et al. 2021). Examples of non-Gaussian distributions include Gaussian mixture models for multimodal distributions, generalized Gaussian models for skewed distributions, and beta or Kumaraswamy distributions for double-bounded concave and convex distributions. Stochastic inversion methods with complex prior models based on Markov chain Monte Carlo methods for spatially correlated variables were also proposed in Hansen et al. (2012) but are generally applied to discrete random variables.
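A normal-score transformation and its inverse can be sketched with the Python standard library. This rank-based version, with plotting positions (k + 0.5)/n, linear interpolation between reference quantiles, and clamped tails, is one common choice and is not necessarily the one used by the authors; function names are hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist

_STD = NormalDist()  # standard normal, mean 0 and standard deviation 1


def normal_scores(n):
    """Standard-normal quantiles at plotting positions (k + 0.5) / n."""
    return [_STD.inv_cdf((k + 0.5) / n) for k in range(n)]


def normal_score_transform(values):
    """Map each value to the standard-normal quantile of its rank."""
    n = len(values)
    scores = normal_scores(n)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order):
        z[i] = scores[rank]
    return z


def back_transform(z_values, reference_values):
    """Invert the transform by linear interpolation between reference quantiles.

    Scores beyond the extreme reference scores are clamped to the reference
    minimum or maximum (a simple tail assumption).
    """
    xs = sorted(reference_values)
    zs = normal_scores(len(xs))
    out = []
    for z in z_values:
        if z <= zs[0]:
            out.append(xs[0])
        elif z >= zs[-1]:
            out.append(xs[-1])
        else:
            k = bisect_left(zs, z)              # zs[k - 1] < z <= zs[k]
            t = (z - zs[k - 1]) / (zs[k] - zs[k - 1])
            out.append(xs[k - 1] + t * (xs[k] - xs[k - 1]))
    return out


porosity = [0.30, 0.10, 0.20, 0.35, 0.05]   # illustrative values only
z = normal_score_transform(porosity)
recovered = back_transform(z, porosity)
```

The inversion would then update the normal scores, and the updated scores would be back-transformed with the reference distribution to obtain petrophysical values.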

The proposed method is efficient due to the limited computational cost of the forward operator (i.e., the rock physics model). The computational cost of the inversion is of the order of minutes for datasets with fewer than 10⁴ data points, and the inversion can be efficiently applied to datasets with a large number of data, for example of the order of 10⁵. For larger datasets, dimensionality-reduction methods based on statistical approaches (e.g., principal component analysis or multidimensional scaling) or deep learning (e.g., convolutional autoencoder and variational autoencoder) could be applied to compress the dimension of the data and reduce the computational time (Liu and Grana 2020). A sensitivity analysis on the size and variability of the initial ensemble should generally be performed to ensure that the ensemble is large enough to avoid ensemble collapse (Emerick and Reynolds 2013). The choice of the covariance matrix of the error affects the posterior predictions. If the error variances are too large, the inversion tends to predict models that are similar to the prior realizations; if they are too small, the inversion might predict unphysical values of the model variables in order to match the data within the error variances. The data errors are generally assumed to be spatially uncorrelated, resulting in a diagonal error covariance matrix; however, in many practical applications, errors in geophysical data are spatially correlated due to the preceding geophysical modeling steps, such as seismic and electrical inversion.
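Both choices of error covariance can be assembled as in the sketch below, using error variances set to fractions of the data variances (10% for velocity and 15% for resistivity, as in the field example). The exponential correlation model, its correlation length, the treatment of resistivity in the log domain, and the independence of velocity and resistivity errors are assumptions; the text does not specify a model for spatially correlated errors, and the function name and data are hypothetical.

```python
import numpy as np


def data_error_covariance(vp, log_res, frac_vp=0.10, frac_res=0.15,
                          coords=None, corr_length=None):
    """Covariance of the errors of stacked data d = [vp; log_res].

    Error variances are fractions of the data variances. With coords and
    corr_length, errors within each data type get an exponential spatial
    correlation (an assumed model); otherwise the matrix is diagonal.
    Velocity and resistivity errors are taken as mutually independent.
    """
    vp = np.asarray(vp, dtype=float).ravel()
    log_res = np.asarray(log_res, dtype=float).ravel()
    n = vp.size
    if coords is None or corr_length is None:
        corr = np.eye(n)
    else:
        coords = np.asarray(coords, dtype=float)
        dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        corr = np.exp(-dist / corr_length)
    C_D = np.zeros((2 * n, 2 * n))
    C_D[:n, :n] = frac_vp * np.var(vp) * corr
    C_D[n:, n:] = frac_res * np.var(log_res) * corr
    return C_D


rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 60.0, size=(30, 2))     # hypothetical (x, z) positions, m
vp = rng.uniform(0.5, 4.0, size=30)               # hypothetical velocities, km/s
log_res = rng.uniform(1.0, 3.0, size=30)          # hypothetical log10 resistivities
C_diag = data_error_covariance(vp, log_res)
C_corr = data_error_covariance(vp, log_res, coords=coords, corr_length=5.0)
```

Using the correlated matrix in an ensemble update leaves the update formula unchanged; only the data perturbations and the weighting of the data misfit change.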

The main limitation of the proposed inversion is the implementation in the joint velocity and resistivity domain rather than in the domain of the measured seismic and electrical data. If a Bayesian approach to seismic and electrical inversions is applied, as in Huang et al. (2021) and Minsley (2011), then uncertainty in the posterior distribution of velocity and resistivity can be integrated in the proposed inversion using the Chapman–Kolmogorov approach (Grana et al. 2021). Alternatively, the proposed approach can be extended to the seismic and electrical domain by combining the rock physics model with seismic and electrical wave propagation models. However, geophysical models based on wave propagation are more computationally demanding than the rock physics model, leading to increased computational time for stochastic inversion approaches. A possible solution is to apply dimensionality reduction to perform the inversion in lower-dimensional spaces (Liu and Grana 2020). Rock physics and spatial correlation parameters are assumed to be constant, but they could also be assumed spatially variable or unknown and be stochastically simulated in the geostatistical algorithms. The main challenge in rock physics inverse modeling in the critical zone is the limited number of direct measurements for validation, in the absence of borehole data. A potential research direction is combining geophysical and hydrological models to predict the spatial distribution of the subsurface water that flows and is stored in mountain watersheds. Additionally, model validation would benefit from laboratory measurements of petrophysical, elastic, and electrical properties on core samples, to improve the accuracy of the rock physics model calibration and reduce the uncertainty in the posterior distribution.

5 Conclusions

An innovative stochastic methodology is presented for predicting petrophysical properties such as porosity and fluid saturation from geophysical attributes, such as P-wave velocity and resistivity, that are commonly estimated from seismic and electrical data. The proposed approach is based on the following steps: geostatistical sampling of prior models of petrophysical properties, application of a rock physics model to compute the geophysical response, and updating of the prior models according to the likelihood of the available data. The result is a set of updated realizations that honor the prior model and match the available data. The inverted petrophysical models provide more realistic images of the critical zone than traditional inversion methods, in which the predictions regress towards the mean. The proposed approach integrates stochastic sampling with rock physics and spatial correlation models and generates spatially correlated realizations of petrophysical properties conditioned on geophysical data. The spatial correlation model is included in the prior model of the petrophysical properties. In the examples presented herein, the proposed method is applied to evaluate the water-holding capacity of weathering zones in mountain watersheds. Such methods can be extended to other near-surface geophysics applications if adequate geophysical data and rock physics models are available, and their results can be directly integrated with hydrological models. The results of the rock physics inversion workflow can then be used to improve the understanding of subsurface weathering and how it relates to ecosystem and hydrological processes, thus contributing to advances in critical zone science.