Introduction

Over 200 volcanoes around the world are known to be actively exhibiting ground deformation (Ebmeier et al. 2018). This number will continue to rise from expansion of ground-based monitoring networks, increases in the amount and types of satellite imagery available, and improvements in data processing and analysis methods (Poland and Zebker 2022). Ground deformation over timescales not captured in previous data will also substantially contribute to increasing the number of known actively deforming volcanoes (Grapenthin et al. 2022).

Ground deformation in volcanic settings can arise from a variety of causes. Deformation can reflect pressurizing/depressurizing subsurface magma bodies, which could have geometries ranging from spheroid-like fluid-filled reservoirs to complex networks of dikes, sills, and crystal mush regions (Alshembari et al. 2023; Bato et al. 2021; Ebmeier et al. 2018; Grapenthin et al. 2022; Liao et al. 2021; Montgomery-Brown and Miklius 2021; Mullet and Segall 2022). Deformation can also reflect processes such as flank slip or other volcanic/tectonic faulting (Dumont et al. 2022; Poland et al. 2017) and hydrothermal activity (Fournier and Chardot 2012). Having accurate forward models of deformation from these various processes is important for resolving magma storage geometries and understanding stress states that influence flank stability (Gonzalez-Santana and Wauthier 2021), eruption triggering (Gregg et al. 2013), dike trajectories (Karlstrom et al. 2009; Sigmundsson et al. 2015), caldera collapse onset and expected eruptive volumes (Anderson et al. 2019; Sigmundsson et al. 2020), and overall eruptive cycles (Townsend 2022). To resolve these processes and detect signs of unrest, it is important to be able to relate deformation models to monitoring data (Fernández et al. 2017).

A wide variety of forward models and inversion methods are in use for studying volcano deformation. Intercomparisons between particular models have been made previously (Battaglia et al. 2013; Hickey and Gottsmann 2014; Novoa et al. 2019; Segall 2010; Taylor et al. 2021), but there has not yet been a comprehensive community-wide model and inversion intercomparison initiative. Model intercomparison has been conducted for other aspects of volcano science including conduit flow (Sahagian 2005), lava flows (Dietterich et al. 2017), plumes (Costa et al. 2016), and pyroclastic density currents (Esposti Ongaro et al. 2020). Particularly successful initiatives for earthquake science have been carried out over the last two decades by the Southern California Earthquake Center (SCEC), including the Spontaneous Rupture Code Verification project and the Sequences of Earthquakes and Aseismic Slip (SEAS) project (Barall and Harris 2015; Erickson et al. 2020; Harris et al. 2018, 2011, 2009; Jiang et al. 2022; Mai et al. 2016). The SCEC exercises have developed into a multi-pronged community initiative that helps push the boundaries of earthquake modeling and has been influential in establishing standards of reproducibility.

We follow the SCEC blueprint to introduce an initial phase of volcano deformation community exercises for verification and validation. In general, verification refers to testing that a model is implemented correctly given the assumptions behind it, while validation refers to testing that a model represents reality and/or matches data (Gonnermann and Anderson 2021). Here, we verify forward models using intercomparison, which is termed benchmarking when exact analytical solutions are available for comparison. We then test inversions for noisy synthetic data; this can be considered a first step towards more comprehensive validation efforts, which would involve real data and are left for future exercises.

These exercises were initiated by a steering committee (Karlstrom, Montgomery-Brown, Crozier, Bato, Cayol) formed in fall 2021 in partnership with CONVERSE (Converging on Eruption Science with Equity), the MCS (Modelling Collaboratory for Subduction) (Gonnermann and Anderson 2021), and the IAVCEI (International Association of Volcanology and Chemistry of the Earth’s Interior) Geodesy Commission (no financial support). A virtual planning workshop was advertised to the community and held in October 2021 with 32 attendees. The steering committee used community suggestions to finalize exercise design and hired a student (Angarita) to develop a website (www.driversofvolcanodeformation.org, Fig. 1) which provides complete problem specifications and the ability to download or interactively plot and compare submissions. The website was published in February 2022, and was advertised through email listservs (Arizona State University Volcano Digest, IAVCEI Geodesy Commission, MCS) and presentations/meetings at multiple conferences: American Geophysical Union (Montgomery-Brown et al. 2022), European Geosciences Union, Cities on Volcanoes, and International Union of Geodesy and Geophysics (Cayol et al. 2023). Virtual office hours for participants were held in various time zones.

Fig. 1

Screenshot showing interactive plotting on the project website (http://www.driversofvolcanodeformation.org)

The exercise website is intended to remain open for continuing submissions so that it can serve as a general resource for method selection, verification, and validation. Twenty-six researchers from multiple continents and career stages participated as of summer 2023, when results were compiled and participants were asked to provide feedback. This article outlines the exercises, presents key results, and discusses future initiatives.

Forward model (verification) exercises

We kept the scope of this first phase of exercises limited by focusing on static elastic displacement, which is a necessary step for community consistency before considering more complex and time-dependent modeling (e.g., thermo-poro-viscoelasticity of host rocks or treatment of multiphase magma dynamics) and inversions (e.g., time-series data processing methods or sensor network design). We also focused on inflating/deflating spheroidal magma reservoirs, which are the most common magma system geometries used in both inversions and general magma reservoir models. Approaches that seek realistic spatial variability in material properties with constraints from geophysical images are promising (Hickey et al. 2016). However, since there are limits to how uniquely complex source geometries and rheology can be resolved (Segall 2019), linear elastic spheroidal reservoir models will remain valuable and widely used even as more sophisticated methods continue to be developed.

For forward modeling of elastic spheroidal reservoirs, a wide variety of both analytical and numerical methods are in use. Analytical models are widely used due to their simplicity and low computational cost, which is beneficial for inversions (Lisowski 2006; Taylor et al. 2021). Most analytical models assume a homogeneous elastic half space (McTigue 1987; Mogi 1958; Yang et al. 1988), and are approximations that become less accurate for small depth/radius ratios and/or eccentricities. A recently derived series expansion model for a spherical reservoir can be arbitrarily accurate but at greater computational cost (Zhong et al. 2019). Several approximate corrections for topography with varying computational cost and accuracy can be used with analytical models (McTigue and Segall 1988; Williams and Wadge 2000). There are multiple different implementations in use for many of the analytical models and corrections, and implementation errors have previously been found in some published versions.

Numerical models can be more general than analytical models (Masterlark 2007), but also typically involve higher computational cost and have accuracy that depends upon user choices such as mesh and domain sizes. Volume discretized approaches such as finite element methods (FEMs) are robust and general, with the ability to account for various rheologies and material heterogeneity. Many different types of FEMs are commonly used in volcano geodesy, and all are sensitive to choices for domain size, boundary conditions, interpolation functions, and mesh generation (Novoa et al. 2019; Zienkiewicz 2005). Some widely used FEMs are commercial, although open source options are also available (Aagaard et al. 2013; Bodart et al. 2022; Garg et al. 2021; Lindsay et al. 2022; Longo et al. 2012; Rucker et al. 2022). Boundary integral or boundary element methods (BEMs) only discretize boundaries where stress or displacement conditions are applied, and several types are used (Cayol and Cornet 1997; Crouch and Starfield 1983). BEMs produce denser system matrices than FEMs, but typically have fewer degrees of freedom and thus are often more computationally efficient. BEMs also typically require fewer user choices than FEMs. However, BEMs are generally limited to homogeneous materials, and the commonly used constant dislocation BEM converges only to within a few percent unless dislocation edge singularities are given additional treatment (Liu 2016). Other numerical approaches such as finite difference (Coco et al. 2014) or finite volume methods are at present less commonly used for volcano deformation, but are common in the similar problems associated with fault mechanics (Erickson et al. 2020).

We present a series of verification exercises. In each exercise, participants used models of their choice to submit predictions of displacements and stresses along two radial transects: a surface transect and a subsurface transect halfway between the top of the reservoir and the surface. The transects are 5 km long, since for most of these exercises deformation has decayed significantly by 5 km. We asked participants to provide metadata about mesh resolution, domain size, and domain boundary conditions. We deliberately did not provide guidelines on these choices or on convergence testing, so that the submissions would give a sense of the accuracy of models as they are used in practice. Submissions following the format specifications given on the community website could be uploaded in either “guest” mode for testing or “permanent” mode linked to a user account (permanent submissions could still be updated later by the submitter).

All problems solve the equations of static linear elasticity without a body force (no gravity)

$$\frac{\partial {\sigma }_{ij}}{\partial {x}_{j}}=0$$
(1)

along with appropriate boundary and standard compatibility conditions. Cartesian coordinates (\(x,y,z\)) are indicated by \({x}_{j}\), and \({\sigma }_{ij}\) is the stress tensor using “Einstein” notation in which repeated indices are summed. In an isotropic (although possibly inhomogeneous) linear elastic solid following Hooke's law

$${\sigma }_{ij} = \lambda {e}_{kk}{\delta }_{ij}+2G{e}_{ij}$$
(2)

where \({e}_{ij}\) is the strain tensor (repeated indices are summed), \({\delta }_{ij}\) is the Kronecker delta, and \(\lambda\) and \(G\) are elastic constants (Lamé’s first parameter and shear modulus, respectively) that may be spatially variable in inhomogeneous scenarios. Hooke’s law is often written in terms of Poisson’s ratio \(\nu =\lambda /(2(\lambda +G))\) and Young’s modulus \(E=2G(1+\nu )\) as \(E{e}_{ij}=(1+\nu ){\sigma }_{ij}-\nu {\sigma }_{kk}{\delta }_{ij}\).
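Because some exercises specify the shear modulus \(G\) and others specify Young’s modulus \(E\), it can help to convert between elastic constants explicitly. The following is a minimal sketch using the standard isotropic relations \(\lambda =2G\nu /(1-2\nu )\) and \(E=2G(1+\nu )\); the function name is illustrative and is not taken from any submitted code.

```python
def lame_and_young(shear_modulus, poisson_ratio):
    """Return (lambda, E) for an isotropic linear elastic solid.

    shear_modulus : G, in Pa
    poisson_ratio : nu, dimensionless (must satisfy -1 < nu < 0.5)
    """
    if not -1.0 < poisson_ratio < 0.5:
        raise ValueError("Poisson's ratio must lie in (-1, 0.5)")
    lam = 2.0 * shear_modulus * poisson_ratio / (1.0 - 2.0 * poisson_ratio)
    young = 2.0 * shear_modulus * (1.0 + poisson_ratio)
    return lam, young


# Values used in exercise 1: G = 10 GPa, nu = 0.25
lam, young = lame_and_young(10e9, 0.25)
# Round trip back to Poisson's ratio via nu = lambda / (2 (lambda + G))
nu_check = lam / (2.0 * (lam + 10e9))
```

For \(G\) = 10 GPa and \(\nu\) = 0.25 this gives \(\lambda\) = 10 GPa and \(E\) = 25 GPa.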

We present all forward models by showing the normalized root mean squared difference (NRMSD) of each submission component (displacement or stress) \({s}_{i}\) from reference component \({s}_{i}^{ref}\):

$${NRMSD}_{i} =\frac{\sqrt{{\sum }_{n=1}^{N}{\left({s}_{i}(n) - {s}_{i}^{ref}(n)\right)}^{2}}}{\sqrt{{\sum }_{n=1}^{N}{\left({s}_{i}^{ref}(n)\right)}^{2}}}$$
(3)

for \(N\) points along the radial transects. For exercises without surface topography, shear stress components of surface transects should be zero to satisfy the free surface condition. For these components we instead use:

$${NRMSD}_{i} = \frac{\sqrt{{\sum }_{n=1}^{N}{\left({s}_{i}(n)\right)}^{2}}}{P{N}^{1/2}}$$
(4)

where the prescribed pressure \(P\) is included to nondimensionalize the error expression. For exercises where an exact analytical solution is known, we use this as the reference so that the NRMSD metric directly measures error. For exercises where an exact analytical solution is not known, we use a mean of multiple submissions as the reference so that the NRMSD metric provides information about the variation between submissions. The NRMSD metric provides a compact way to examine model accuracy or variance, but it does not reveal the spatial distribution of model error/variance. We thus also plot surface displacements for all exercises, and more detail can be seen in the full sets of displacement and stress plots in supplemental figures S1-S12 or through the exercise website.
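The two NRMSD expressions (Eqs. 3 and 4) can be sketched in a few lines of Python. This is an illustrative implementation under the assumption that a submission and its reference are sampled at the same \(N\) transect points; the function names are hypothetical and this is not the code used by the exercise website.

```python
import math


def nrmsd(submission, reference):
    """Eq. 3: root summed squared difference normalized by the reference."""
    if len(submission) != len(reference) or len(reference) == 0:
        raise ValueError("submission and reference must be non-empty and equal length")
    num = sum((s - r) ** 2 for s, r in zip(submission, reference))
    den = sum(r ** 2 for r in reference)
    if den == 0.0:
        raise ValueError("reference is identically zero; use nrmsd_zero_reference")
    return math.sqrt(num / den)


def nrmsd_zero_reference(submission, pressure):
    """Eq. 4: for components that should vanish, normalize by P * sqrt(N)."""
    n = len(submission)
    if n == 0:
        raise ValueError("submission must be non-empty")
    return math.sqrt(sum(s ** 2 for s in submission)) / (pressure * math.sqrt(n))
```

A submission identical to the reference gives an NRMSD of 0 under Eq. 3, and a submission that is twice the reference everywhere gives an NRMSD of 1.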

Forward model exercise 1: sphere in a homogeneous half space

Exercise 1 considers a spherical reservoir at three different depths in a homogeneous half space (Fig. 2); this can be considered a benchmark against the Zhong series expansion model that converges to the exact solution (Zhong et al. 2019). We ask participants to solve the quasistatic linear elastic governing equations on a semi-infinite Cartesian (\(x,y,z\)) domain with coordinate origin centered on a flat, stress-free interface above a spherical magma reservoir surface \(\Omega\) defined by \({x}^{2}+{y}^{2}+{\left(z+D\right)}^{2}={R}^{2}\) with reservoir radius \(R\) = 1 km and reservoir centroid depth below the free surface \(D\). With the domain defined, boundary conditions for Cauchy and Hooke equations are

$$\begin{array}{c}{\sigma }_{ij}{(\Omega )n}_{j}{n}_{i}=-P\\ {\sigma }_{zz}(x,y,z=0)=0\\ {\sigma }_{rz}(x,y,z=0)=0\end{array}$$
(5)

with \(n\) an inward pointing normal vector, and \(P\) the reservoir pressure change (relative to lithostatic pressure). It is understood that traction goes to zero far from the reservoir in the half space. Exercises 1A, 1B, and 1C use reservoir centroid depth \(D\) of 1.25 km, 2 km, and 4 km, respectively. All three scenarios use uniform \(\nu\) = 0.25, \(G\) = 10 GPa, and \(P\) = 10 MPa.

Fig. 2

Exercise 1: sphere in a homogeneous half space. a–c Problem geometry, with dashed output lines indicating the transects along which displacements and stresses are reported at 10 m intervals out to 5 km. d–i Surface displacement submissions; Table 1 provides details about each model and additional plots can be found in the supplemental material and exercise website

We use a Zhong series expansion model calculated to 64th order (Zhong et al. 2019), which testing shows converges to near machine precision for the reservoir depth/radius ratios considered in these exercises, as the reference for comparing other submissions against (Figs. 2, 3, 4, and 5). We note that these exercises identified an error in the published implementation of the Zhong model.

Fig. 3

Convergence of surface displacements from approximate analytical Mogi and McTigue models relative to a Zhong series expansion model for different radius/depth (\(R/D\)) ratios. Several different order polynomials are shown for comparison. Error at a single location, rather than the aggregate normalized root mean squared difference (NRMSD) metric, would generally more closely follow the theoretical accuracy of \({(R/D)}^{3}\) for the Mogi model and \({(R/D)}^{6}\) for the McTigue model

Fig. 4

Normalized root mean squared difference (NRMSD) for all exercise 1 (sphere in a homogeneous half space) submissions compared to a Zhong series expansion model. Values less than the x-axis bounds are plotted at the bounds. The modified NRMSD metric normalized by reservoir pressure is used for \({\sigma }_{zz}\) and \({\sigma }_{xz}\) at the free surface (ae)

Fig. 5

Convergence of the normalized root mean squared difference (NRMSD) of selected numerical methods compared to a Zhong series expansion model for different mesh/boundary treatments in exercise 1B (sphere in a homogeneous half space, centroid depth 2 km). Values < \({10}^{-6}\) are plotted at \({10}^{-6}\). The modified NRMSD metric normalized by reservoir pressure is used for \({\sigma }_{zz}\) and \({\sigma }_{xz}\) at the free surface (ae). Mesh resolution is shown by the average distance per degree of freedom (m/DOF) along the reservoir boundary; this is equal to average element size divided by element order (1 for both BEMs, 4 for NGSOLVE, and 2 for COMSOL®). Marker sizes indicate domain radius and shapes indicate outer boundary conditions (BC): diamonds for half space BEMs with no outer boundary, squares for “fixed” BC (zero displacement), circles for “roller” BC (zero normal displacement plus zero tangential traction), and triangles for “infinite element” BC (using coordinate transformations to approximate an infinite domain)

Multiple other analytical approximations are available for spherical reservoirs in a homogeneous half space (Table 1). Figure 3 shows the convergence of the commonly used Mogi and McTigue models with increasing depth (i.e., depth/radius ratio), illustrating convergence of NRMSD at near-expected rates (\({(R/D)}^{3}\) for Mogi and \({(R/D)}^{6}\) for McTigue). Implementation errors were also fixed in the dMODELS and VSM McTigue models as a result of these exercises (Battaglia et al. 2013; Trasatti 2022), although some difference remains between the McTigue submissions in these exercises. However, as expected, all of the McTigue models become more accurate with increasing reservoir depth (i.e., depth/radius ratio). For the 1.25 km deep reservoir, the Mogi and McTigue models show significant surface displacement error (0.47 NRMSD for Mogi and 0.23 NRMSD for McTigue, Fig. 4). For the 2 km deep reservoir, the Mogi model still shows significant error (0.11 NRMSD) but the McTigue models are reasonably accurate (0.01 NRMSD). For the 4 km deep reservoir, both types of models are reasonably accurate (NRMSD ≲ 0.01). Stresses are generally inaccurate for these models (NRMSD ~ 1). All of the analytical models have computation times << 1 s, except for the Zhong model, which can take multiple seconds if many terms (> 50) in the series expansion are computed.
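To make the simplest of these approximations concrete, the sketch below evaluates the standard point-source (Mogi 1958) surface displacements for a pressurized sphere in a homogeneous half space, \({u}_{z}=(1-\nu )P{R}^{3}D/\left(G{({\rho }^{2}+{D}^{2})}^{3/2}\right)\) and \({u}_{\rho }=(1-\nu )P{R}^{3}\rho /\left(G{({\rho }^{2}+{D}^{2})}^{3/2}\right)\), with \(\rho\) the horizontal distance from the point above the reservoir center. This is a textbook form written for illustration, with hypothetical function and variable names; it is not any of the implementations submitted to the exercises.

```python
def mogi_surface_displacement(rho, depth, radius, pressure,
                              shear_modulus, poisson_ratio):
    """Radial and vertical surface displacement (m) of a Mogi point source.

    rho           : horizontal distance from the source axis (m)
    depth         : centroid depth D below the free surface (m)
    radius        : reservoir radius R (m)
    pressure      : pressure change P (Pa)
    shear_modulus : G (Pa)
    poisson_ratio : nu (dimensionless)
    """
    scale = (1.0 - poisson_ratio) * pressure * radius ** 3 / shear_modulus
    dist_cubed = (rho ** 2 + depth ** 2) ** 1.5
    u_radial = scale * rho / dist_cubed
    u_vertical = scale * depth / dist_cubed
    return u_radial, u_vertical


# Exercise 1B parameters: R = 1 km, D = 2 km, nu = 0.25, G = 10 GPa, P = 10 MPa
transect = [10.0 * i for i in range(501)]  # 0 to 5 km at 10 m spacing
profile = [mogi_surface_displacement(r, 2000.0, 1000.0, 10e6, 10e9, 0.25)
           for r in transect]
u_r_center, u_z_center = profile[0]
```

With the exercise 1B parameters, the peak vertical displacement directly above the source is 0.1875 m under this approximation; as discussed above, the approximation degrades as \(R/D\) increases.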

Table 1 Forward model submission information. “2D” indicates 2D axi-symmetric models. For outer domain edge boundary conditions, “fixed” is zero displacement, “roller” is zero normal displacement plus zero tangential traction, “inf. el.” (infinite element) uses coordinate transformations to approximate an infinite domain, and “free” is zero traction. Mesh and domain size values are approximate representative values; some participants used different discretization orders, used meshes with variable element sizes along the reservoir, and/or used slightly different mesh and domain sizes for different exercises. Only order of magnitude computation times are shown as participants reported computation times on different hardware

Several 3D BEMs were submitted (Table 1). Two dislocation BEMs using triangular elements (Nikkhoo and Walter 2015) were submitted with different numbers of reservoir mesh elements (320 and 4972). Both show similar surface displacement accuracy to the McTigue models (up to 0.21 NRMSD) for the 1.25 km deep reservoir, but their accuracy does not increase as quickly with depth. The dislocation BEMs provide generally more accurate stresses than McTigue models, but still exhibit up to 1 NRMSD for subsurface \({\sigma }_{zz}\). Convergence testing for a constant dislocation BEM demonstrates that it does not actually converge as the mesh is refined (Fig. 5) due to element edge singularity effects. A mixed BEM (MBEM) (Cayol and Cornet 1997) was submitted with up to 5120 reservoir mesh elements; this provides reasonably accurate stresses and displacements for all depths (< 0.05 NRMSD) and converges with reservoir mesh refinement (Fig. 5). The BEMs have reported computation times of tens of seconds to tens of minutes per simulation, appreciably longer than all the analytical models.

Multiple FEMs were submitted (Table 1); some with commercial software COMSOL® and Marc®, and others with open source codes or libraries: PyLith (Aagaard et al. 2013), GALES (Garg et al. 2021; Longo et al. 2012), MOOSE (Lindsay et al. 2022), NGSOLVE (Rucker et al. 2022), and DEFVOLC fictitious domain based on getFEM++ (Bodart et al. 2022, 2020). All the FEMs have < 0.1 NRMSD in displacements and stresses for all reservoir depths. Some of the outlier FEMs are 3D and thus likely used coarser meshes for computational reasons. We show convergence testing from two models: NGSOLVE using 4th order elements and COMSOL® using 2nd order elements (Fig. 5). We do not address factors such as mesh size and element order in detail but note that for quadratic elements even a fairly coarse mesh size of 270 m can yield error < 0.01 NRMSD given a large enough domain size. However, domain sizes of 20 km (or 10 times the reservoir depth) are needed to obtain error < 0.01 NRMSD, whether “fixed” (zero displacement) or “roller” (zero normal displacement plus zero tangential traction) domain edge boundary conditions are used. In contrast, the “infinite element” coordinate transform approach enables high accuracy to be achieved with a domain size only slightly larger than the region of interest. Reported computation times per simulation vary from seconds to minutes for 2D models to tens of minutes for 3D models, suggesting that for 3D computations BEMs will often be faster than FEMs.

One submission used a Gaussian process emulator trained on around 500 FEM simulations (Anderson and Gu 2022; Anderson et al. 2019). The emulator has similar accuracy to the best FEMs for surface displacements for all scenarios. Computation times per emulator prediction are on the order of 0.001 s, which is comparable to most analytical models.

Forward model exercise 2A: sphere in a homogeneous half space with topography

Exercise 2A considers a spherical reservoir in a homogeneous half space overlain by surface topography consisting of a Gaussian “volcano” (Fig. 6). The setup is the same as exercise 1B, but with boundary conditions given by

$$\begin{array}{c}{\sigma }_{ij}{(\Omega )n}_{j}{n}_{i}=-P\\ {\sigma }_{zz}(x,y,z=h(\rho ))=0\\ {\sigma }_{rz}(x,y,z=h(\rho ))=0\end{array}$$
(6)

where \(\rho =\sqrt{{x}^{2}+{y}^{2}}\) is a 2D (cylindrical) distance and

$$h(\rho )=H\exp\left(-\frac{{\rho }^{2}}{2{R}_{e}^{2}}\right)$$
(7)

defines a Gaussian volcanic edifice overlying the magma reservoir. Parameters are \(R\) = 1 km, \(D\) = 2 km (beneath the flat surface), \({R}_{e}\) = 1 km, \(H\) = 1.5 km, \(\nu\) = 0.25, \(G\) = 10 GPa, and \(P\) = 10 MPa.

Fig. 6

Exercise 2A: sphere in a homogeneous half space with Gaussian topography. a Problem geometry, with dashed output lines indicating the transects along which displacements and stresses are reported at 10 m intervals out to 5 km. b, c Surface displacement submissions; Table 1 provides details about each model and additional plots can be found in the supplemental material and exercise website

In exercise 2A, exact solutions are not known so we use the average of all submitted FEMs as a reference against which to examine the variance in model results, after verifying that there are no visibly large FEM outliers (Figs. 6 and 7). Variance between all numerical models (FEM + MBEM) is < 0.03 NRMSD for displacements and < 0.07 NRMSD for stresses, and some of the outlier FEMs are 3D and thus likely used coarser meshes to reduce computational cost.

Fig. 7

Normalized root mean squared difference (NRMSD) for all exercise 2A (sphere in a homogeneous half space with Gaussian topography) submissions relative to the average FEM submission. In this case, the NRMSD metric indicates variability between models rather than error or accuracy, and the FEM average may be biased (e.g., due to overrepresenting COMSOL®). For this exercise the NRMSD expression in Eq. 3 is used for all stresses since the topography makes \({\sigma }_{zz}\) and \({\sigma }_{xz}\) non-zero at the free surface. The label “varying depth” indicates a zeroth-order topographic correction (Williams and Wadge 1998), and “small slope” indicates a first-order topographic correction (Williams and Wadge 2000) (for which two different implementations were tested)

In this exercise, analytical models that neglect topography or only apply a varying depth (zeroth-order) correction (Williams and Wadge 1998) produce appreciably different surface displacements (by at least 0.2 NRMSD) from numerical models, highlighting the importance of accounting for topography in this scenario. Two existing implementations of the Williams and Wadge (2000) small slope (first-order) topographic corrections produce different surface displacements from each other, and also differ from numerical models. This suggests that there is an error in at least one of the implementations, but it also highlights the limits of the correction for large topography. The first-order corrections require computation times of seconds for a McTigue model, so could be faster than numerical models and thus useful for smaller topography.
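One way to see why a small slope correction is strained in this scenario is to evaluate the slope of the Gaussian edifice in Eq. 7 directly. The sketch below computes \(h(\rho )\) and its derivative \(dh/d\rho =-(\rho /{R}_{e}^{2})h(\rho )\); the steepest slope of a Gaussian occurs at \(\rho ={R}_{e}\). Function names are illustrative only.

```python
import math


def edifice_height(rho, peak_height, edifice_radius):
    """Eq. 7: Gaussian topography h(rho), same length units as the inputs."""
    return peak_height * math.exp(-rho ** 2 / (2.0 * edifice_radius ** 2))


def edifice_slope(rho, peak_height, edifice_radius):
    """Derivative dh/drho of the Gaussian topography (dimensionless)."""
    return -rho / edifice_radius ** 2 * edifice_height(rho, peak_height,
                                                       edifice_radius)


# Exercise 2A parameters: H = 1.5 km, R_e = 1 km (here in meters)
H, R_E = 1500.0, 1000.0
max_slope = abs(edifice_slope(R_E, H, R_E))      # steepest point is rho = R_e
max_slope_deg = math.degrees(math.atan(max_slope))
```

For the exercise 2A edifice the maximum slope magnitude is \((H/{R}_{e}){e}^{-1/2}\approx 0.91\), or about 42°, which is far from a gentle slope.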

Forward model exercise 2B: spheroid in a homogeneous half space

Exercise 2B considers an oblate (vertically shortened) sill-like spheroidal reservoir in a homogeneous half space (Fig. 8). The setup is the same as exercise 1, but with the reservoir defined by

$$\frac{{\rho }^{2}}{{R}_{\rho }^{2}}+\frac{{\left(z+D\right)}^{2}}{{R}_{z}^{2}}=1$$
(8)

Parameters are reservoir horizontal semi-diameter \({R}_{\rho }\) = 1 km, reservoir vertical semi-diameter \({R}_{z}\) = 0.1 km, \(D\) = 1.1 km, \(\nu\) = 0.25, \(G\) = 10 GPa, and \(P\) = 10 MPa.

Fig. 8

Exercise 2B: oblate spheroid in a homogeneous half space. a Problem geometry, with dashed output lines indicating the transects along which stresses and displacements are reported at 10 m intervals out to 5 km. b, c Surface displacement submissions; Table 1 provides details about each model and additional plots can be found in the supplemental material and exercise website

In exercise 2B, exact solutions are not known so we use the average of all submitted FEMs as a reference against which to examine the variance in model results, after verifying that there are no visibly large FEM outliers (Figs. 8 and 9). Variance in all FEMs is less than 0.03 NRMSD, and some outlier FEMs are 3D and thus likely used coarser meshes for computational reasons. The dislocation BEMs and the MBEM are similar to FEMs (< 0.10 NRMSD), and the FEM-trained emulator is also similar to FEMs (< 0.07 NRMSD).

Fig. 9

Normalized root mean squared difference (NRMSD) for all exercise 2B (oblate spheroid in a homogeneous half space) submissions relative to the average FEM submission. In this exercise, the NRMSD metric indicates variability between models rather than error or accuracy, except for \({\sigma }_{zz}\) and \({\sigma }_{xz}\) at the free surface where the modified NRMSD metric normalized by reservoir pressure is used (ae). The FEM average may be biased (e.g., due to overrepresenting COMSOL®). Values less than the x-axis bounds are plotted at the bounds

We show several approximate analytical models for comparison. One finite ellipsoidal model (Cervelli 2013) is extended from a previous model (Yang et al. 1988) and handles prolate or oblate dipping ellipsoids with accuracy that increases with the ratio of the shallowest depth along the spheroid over the minimum semi-diameter. Another finite spheroidal model (Nikkhoo and Rivalta 2022) offers similar accuracy and can consider more general geometries, but requires more computation time (order 0.1 s compared to 0.01 s). Both models differ appreciably from FEMs (by 0.22 NRMSD) in this scenario. A point spheroid model (Nikkhoo et al. 2017) is less accurate, differing from FEMs by 0.46 NRMSD. Another approximate analytical model for an ellipsoidal reservoir has been derived (Amoruso and Crescentini 2013, 2011) but is not publicly available. We also show an analytical model for a penny-shaped crack (Fialko et al. 2001), which for this highly oblate reservoir provides a reasonable approximation, differing from FEMs by 0.33 NRMSD.

Forward model exercise 2C: sphere in a heterogeneous half space

Exercise 2C considers a spherical reservoir in a heterogeneous half space where elastic moduli vary with distance from the reservoir to approximate a thermal gradient (Fig. 10). The setup is the same as exercise 1B, but with spatially variable elastic coefficients. Poisson’s ratio and Young’s modulus are assumed to be temperature dependent (Bakker et al. 2016) and vary in a radial direction away from the reservoir, with \({r}^{2} = {x}^{2}+{y}^{2}+{\left(z+D\right)}^{2}\):

$$E(r) = {E}_{0}\left[1 - \frac{1}{2}\left(\exp\left(\frac{T(r)}{{T}_{R}}\right)-1\right)\right]$$
(9)

$$\nu (r) = \left(1 - \frac{E(r)}{{E}_{0}}\right)\left({\nu }_{R}-{\nu }_{0}\right)+{\nu }_{0}$$
(10)

Fig. 10

Exercise 2C: sphere in a heterogeneous half space. a, b Problem geometry and elastic moduli, with dashed output lines indicating the transects along which displacements and stresses are reported at 10 m intervals out to 5 km. c, d Surface displacement submissions; Table 1 provides details about each model and additional plots can be found in the supplemental material and exercise website

Temperature distribution (in degrees Celsius) is given by the infinite space conduction solution outside of \(r=R\) as \(T(r) = ({T}_{R}-{T}_{0})\frac{R}{r}+{T}_{0}\). Parameters are \(R\) = 1 km, \(D\) = 2 km, far-field Poisson’s ratio \({\nu }_{0}\) = 0.25, near-reservoir Poisson’s ratio \({\nu }_{R}\) = 0.4, far-field temperature \({T}_{0}\) = 100 °C, near-reservoir temperature \({T}_{R}\) = 1000 °C, far-field Young’s modulus \({E}_{0}\) = 10 GPa, and \(P\) = 10 MPa. We note that elastic moduli generally also vary with depth but leave analysis of this for future exercises.
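The radial property profiles can be evaluated with a short script. The sketch below transcribes Eqs. 9 and 10 and the conduction temperature profile exactly as written above, using the exercise 2C parameter values; the function and constant names are illustrative, and the profiles are only defined here for \(r\ge R\) (outside the reservoir).

```python
import math

R = 1000.0      # reservoir radius (m)
DEPTH = 2000.0  # reservoir centroid depth D (m)
T_0 = 100.0     # far-field temperature (deg C)
T_R = 1000.0    # near-reservoir temperature (deg C)
E_0 = 10e9      # far-field Young's modulus (Pa)
NU_0 = 0.25     # far-field Poisson's ratio
NU_R = 0.4      # near-reservoir Poisson's ratio


def radial_distance(x, y, z):
    """Distance from the reservoir center at (0, 0, -D); z is positive up."""
    return math.sqrt(x ** 2 + y ** 2 + (z + DEPTH) ** 2)


def temperature(r):
    """Infinite-space conduction profile T(r) for r >= R (deg C)."""
    if r < R:
        raise ValueError("profile is only defined outside the reservoir")
    return (T_R - T_0) * R / r + T_0


def youngs_modulus(r):
    """Eq. 9: temperature-dependent Young's modulus (Pa)."""
    return E_0 * (1.0 - 0.5 * (math.exp(temperature(r) / T_R) - 1.0))


def poissons_ratio(r):
    """Eq. 10: Poisson's ratio tied to the reduction in Young's modulus."""
    return (1.0 - youngs_modulus(r) / E_0) * (NU_R - NU_0) + NU_0
```

Evaluating `youngs_modulus` and `poissons_ratio` at `radial_distance(x, y, z)` gives the moduli at any point of the half space, which is how such a profile would typically be supplied to a FEM as a spatially variable material property.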

In exercise 2C, exact solutions are not known so we use the average of all submitted FEMs as a reference against which to examine the variance in model results, after verifying that there are no visibly large outliers in FEMs (Figs. 10 and 11). Of the models used in these exercises, only FEMs can directly address material heterogeneity. Variance in all FEMs is < 0.02 NRMSD, and some outlier FEMs are 3D so likely used coarser meshes for computational reasons.

Fig. 11

Normalized root mean squared difference (NRMSD) for all exercise 2C (sphere in a heterogeneous half space) submissions relative to the average FEM submission. In this exercise, the NRMSD metric indicates variability between models rather than error or accuracy, except for \({\sigma }_{zz}\) and \({\sigma }_{xz}\) at the free surface where the modified NRMSD metric normalized by reservoir pressure is used (ae). The FEM average may be biased (e.g., due to overrepresenting COMSOL®). Values less than the x-axis bounds are plotted at the bounds

We next examine how well homogeneous models can approximate the heterogeneous models. A Zhong series expansion model that uses the far-field moduli values (\(\nu\) = 0.25, \(E\) = 10 GPa) yields surface displacements that differ from the heterogeneous FEMs by 0.9 NRMSD, showing that the heterogeneity has a large impact in this scenario. We then determine what moduli values yield the best match (i.e., minimum NRMSD) to the heterogeneous FEM “FEM 2D COMSOL J” by using a Nelder-Mead simplex inversion with bounds 0.25 < \(\nu\) < 0.4, 0 < \(E\) < 10 GPa, and all other parameters fixed according to the exercise specifications. A model predicting surface displacements that differ from the heterogeneous FEM by only 0.1 NRMSD can be found with \(\nu\) = 0.4 (the value at the reservoir boundary) and \(E\) = 3.82 GPa (the value ~ 200 m outside the reservoir boundary), showing that in this scenario surface deformation is most sensitive to the elastic moduli very near the reservoir. This suggests that a homogeneous model with elastic moduli representing those near the reservoir could provide a good enough approximation to be useful for some applications, since uncertainty in model parameters such as elastic moduli often exceeds 10% (Masterlark et al. 2016).

Inversion (validation) exercises

Reservoir parameters inverted from surface deformation data can differ due to variability in the types of data used, data processing methods (e.g., downsampling strategies), inversion methods (e.g., gradient descent or Monte Carlo sampling), forward model choice, and inherent tradeoffs between parameters (Anderson and Segall 2013; Bagnardi and Hooper 2018; Bato et al. 2018; Parks et al. 2012). Ground-based measurements include Global Navigation Satellite System (GNSS), tiltmeter, and strainmeter data. These have limited spatial coverage but high temporal resolution and accuracy (e.g., GNSS can provide mm accuracy for static positions and record tens of samples per second). Remote sensing products include satellite interferometric synthetic aperture radar (InSAR), SAR amplitude correlation, airborne lidar, and structure-from-motion data. These typically only have temporal resolution of days but good spatial coverage (e.g., InSAR provides up to sub-cm accuracy in time-series analysis and m-scale pixels for some sensor platforms). Each type of data has various sources of noise acting over different temporal and spatial scales. Inversions take a range of approaches for estimating solutions and/or uncertainty, for example, using methods based on linearization or methods based on various types of parameter searches (e.g., Monte Carlo sampling) (Aster et al. 2018; Bagnardi and Hooper 2018; Menke 2018; Tarantola 2004).

Inversion exercise data

For this first phase of inversion exercises (exercise 3), we focus on static elastic displacements from spherical reservoirs. We provided synthetic ground displacement data in the form of ascending and descending line-of-sight (LOS) unwrapped InSAR interferograms (Fig. 12 and Table 2). We also provided 3-component (east, north, vertical) data from 400 (20 × 20) regularly spaced observation points. This can be considered analogous to a GNSS survey although it is unrealistically dense; for comparison, even the best monitored volcanoes such as Kīlauea and Piton de la Fournaise have on the order of tens of measurement locations at a given time from permanent and/or temporary stations.
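Relating the 3-component data to the LOS interferograms requires projecting each displacement vector onto the satellite look direction. A minimal sketch of that projection follows. It assumes that azimuth is measured clockwise from north, that incidence is measured from vertical, and that the unit vector points from the ground toward the satellite, so positive LOS values mean motion toward the satellite. The function names, angles, and displacement values are hypothetical and are not taken from Table 2.

```python
import math

def los_unit_vector(azimuth_deg, incidence_deg):
    """Unit vector (east, north, up) pointing from the ground toward the satellite.

    Assumed convention: azimuth clockwise from north, incidence from vertical.
    """
    az = math.radians(azimuth_deg)
    inc = math.radians(incidence_deg)
    return (math.sin(inc) * math.sin(az),
            math.sin(inc) * math.cos(az),
            math.cos(inc))

def project_to_los(displacement_enu, los):
    """Project an (east, north, up) displacement onto the LOS unit vector.

    Positive values mean motion toward the satellite (assumed sign convention).
    """
    return sum(d * l for d, l in zip(displacement_enu, los))

# Hypothetical look direction and displacement (metres); illustrative only.
los = los_unit_vector(azimuth_deg=260.0, incidence_deg=35.0)
u_enu = (0.010, -0.005, 0.030)
u_los = project_to_los(u_enu, los)
print(f"LOS unit vector: {los}")
print(f"LOS displacement: {u_los:.4f} m")
```

Because a single look direction collapses three components into one, ascending and descending geometries together still leave one displacement direction poorly resolved (typically the north component for near-polar orbits).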

Fig. 12

Synthetic data for exercise 3 inversions. a, b Low noise InSAR data. c–e Low noise GNSS data. f, g High noise InSAR data. h–j High noise GNSS data

Table 2 Synthetic InSAR properties for both high noise and low noise datasets. The directions to satellites are given in terms of azimuth and incidence angles (from vertical) or line-of-sight (LOS) unit vectors

Participants could use either or both types of data. Spatially uncorrelated noise was added to the modeled GNSS displacements, and spatially correlated noise (Fukushima et al. 2005) was added to the InSAR data, where the covariance decays with distance \(r\) as \(C(r) = V\exp(-r/\lambda )\) for variance \(V\) and correlation length \(\lambda\). We provided two datasets with different reservoir parameters and noise levels: a low noise set with better signal/noise ratio than most real data, and a high noise set (Table 3). However, we emphasize that these exercises will not directly indicate how effectively reservoir parameters are estimated in real settings, since we use simple forward models and noise sources, provide exact a priori information about these factors, and provide displacements rather than having participants infer these from raw data (for which processing methods can differ). We note that all displacement data and InSAR LOS direction information were given to participants at limited precision (i.e., rounded), which introduces an additional source of error.
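To illustrate how spatially correlated noise of this kind can be generated, the sketch below builds a covariance matrix from a decaying exponential \(C(r) = V\exp(-r/\lambda )\) and colors white Gaussian noise with its Cholesky factor. This is one common approach and not necessarily the procedure used to produce the exercise data. The pixel grid, variance, and correlation length are hypothetical values, and the pure-Python Cholesky routine is only practical for small grids.

```python
import math
import random

def exponential_covariance(points, variance, corr_length):
    """Covariance matrix with C(r) = V * exp(-r / lambda) between all point pairs."""
    n = len(points)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = math.dist(points[i], points[j])
            cov[i][j] = variance * math.exp(-r / corr_length)
    return cov

def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix (positive definite input)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low

def correlated_noise(cov, rng):
    """Draw one zero-mean Gaussian noise realization with covariance cov."""
    low = cholesky(cov)
    white = [rng.gauss(0.0, 1.0) for _ in range(len(cov))]
    return [sum(low[i][k] * white[k] for k in range(i + 1))
            for i in range(len(cov))]

# Hypothetical 5 x 5 pixel grid with 1 km spacing (coordinates in metres).
pixels = [(1000.0 * ix, 1000.0 * iy) for ix in range(5) for iy in range(5)]
cov = exponential_covariance(pixels, variance=1e-4, corr_length=2000.0)  # V in m^2
noise = correlated_noise(cov, random.Random(0))
print([round(v, 4) for v in noise[:5]])
```

The same matrix serves as the data covariance in an inversion, which is why knowing \(V\) and \(\lambda\) exactly, as in these exercises, is more favorable than the usual situation where they must be estimated.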

Table 3 Synthetic data parameters for inversion exercises (exercise 3). Only the first two columns (noise parameters) were given to participants during the exercises

Participants were informed that the forward model used to generate data was the Zhong series expansion model for a pressurized spherical reservoir in a homogeneous elastic half space (Zhong et al. 2019), but other forward models could be used in inversions (e.g., faster approximate models). We provided the elastic moduli (\(\nu\) = 0.25, \(G\) = 10 GPa) and asked participants to estimate the reservoir position (east, north, depth), radius, volume change, and pressure change.

Inversion exercise parameter estimates

Exercise 3 results for the low and high noise datasets are shown in Figs. 13 and 14 and are grouped by the types of data and forward models used (Table 4). Parameter estimates were submitted using Mogi and McTigue models, a dislocation BEM, an MBEM, a Gaussian process emulator, and a Zhong series expansion model (Zhong et al. 2019) (i.e., the true forward model). Participants were not given instructions on what cost functions to use or on how to report their best estimates of parameters (e.g., minimum cost, maximum likelihood, or maximum a posteriori) or confidence intervals, so that the submissions would reflect the variability in how these measures are used in practice.

Fig. 13

Exercise 3 low noise parameter estimates. Vertical black lines are true parameters that correspond to a depth/radius ratio of ~ 2.3. Black and colored bars indicate 64% and 95% confidence bounds, respectively. Some participants did not report bounds or only reported one set of bounds. For forward models except the Mogi model and Gaussian process emulator, volume change (d) is not inverted directly. Estimates of InSAR offsets are not shown

Fig. 14

Exercise 3 high noise parameter estimates. Vertical black lines are true parameters that correspond to a depth/radius ratio of ~ 1.5. Black and colored bars indicate 64% and 95% confidence bounds, respectively. Some participants did not report bounds or only reported one set of bounds. For forward models except the Mogi model and Gaussian process emulator, volume change (d) is not inverted directly. Estimates of InSAR offsets are not shown

Table 4 Inversion submission information

We first consider the dependence of parameter estimates on the data types that participants chose to use. Three participants conducted otherwise identical inversions for just InSAR data, just GNSS data, and both data types. There are appreciable differences in parameter estimates depending upon the data type used, which emphasizes that care should be taken when comparing results for these exercises when different data types were used. Parameter estimates generally show the lowest accuracy and highest reported uncertainty for just InSAR data; this occurs because the InSAR data only resolve line-of-sight displacements and contain spatially correlated noise.

East and north locations are generally reported to be the best constrained parameters; estimates are roughly similar across all submissions but still have a range of ~ 300 m for both low and high noise datasets. We expect the overestimation of north coordinates in most submissions partly reflects a bias introduced by the added noise and/or data round-off error. Depth estimates have a range of ~ 400 m (or 20%) for low noise data and 800 m (or 35%) for high noise data. Volume change estimates have a range of 30% for low noise data and 50% for high noise data, pressure change estimates have a range of two orders of magnitude for both low and high noise data, and radius estimates have a range of one order of magnitude for both low and high noise data. With a Mogi model, pressure change and radius cannot be separately resolved, while for many other forward models pressure change and radius were directly inverted for and then volume changes were calculated given a formulation of reservoir elastic compressibility (often a full space approximation). However, for a spherical reservoir, strong tradeoffs are generally expected between pressure change and reservoir radius, particularly for higher depth/radius ratios (McTigue 1987; Parks et al. 2012; Segall 2010). One participant tested the effect of using MBEMs with three different upper bounds on radius and found similar results for all parameters except pressure change. This highlights the need to use additional constraints in order to robustly constrain pressure changes and reservoir radius (Anderson et al. 2019).
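The tradeoff between pressure change and radius can be made concrete with the Mogi point-source expressions, in which surface displacements depend on pressure change \(\Delta P\) and radius \(a\) only through the product \(\Delta P{a}^{3}\). The sketch below uses the elastic moduli from the exercise specification (\(\nu\) = 0.25, \(G\) = 10 GPa) and the full space relation \(\Delta V=\pi \Delta P{a}^{3}/G\). The two reservoirs, the depth, and the function names are hypothetical choices for illustration.

```python
import math

def mogi_surface_displacement(r, depth, radius, dP, G=10e9, nu=0.25):
    """Radial and vertical surface displacement (m) of a Mogi point source.

    r: horizontal distance from the source axis (m); depth: source depth (m);
    radius: reservoir radius (m); dP: pressure change (Pa).
    """
    strength = (1.0 - nu) * dP * radius**3 / G  # only this product is resolvable
    R3 = (r**2 + depth**2) ** 1.5
    return strength * r / R3, strength * depth / R3

def full_space_volume_change(radius, dP, G=10e9):
    """Volume change of a pressurized sphere in a full space: pi * dP * a^3 / G."""
    return math.pi * dP * radius**3 / G

# Two hypothetical reservoirs with the same dP * a^3 (illustrative values only).
small = dict(radius=500.0, dP=40e6)
large = dict(radius=1000.0, dP=5e6)
for r in (0.0, 1000.0, 5000.0):
    print(r,
          mogi_surface_displacement(r, 3000.0, **small),
          mogi_surface_displacement(r, 3000.0, **large))
print(full_space_volume_change(**small), full_space_volume_change(**large))
```

Both reservoirs produce identical surface displacements and identical volume changes despite an eightfold difference in pressure change, which is why volume change is comparatively well constrained while pressure change and radius individually are not. Finite-source models such as McTigue or Zhong break this degeneracy only weakly when depth/radius ratios are large.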

Similar east and north location estimates were obtained with each type of forward model, suggesting that these parameters are not sensitive to forward model choice. Depth, volume change, and pressure change show more sensitivity to forward model choice, as expected (Dieterich and Decker 1975), although the relative variation between different forward models depends on the type of data and inversion methods used. For the same inversion method and type of data, FEM emulator parameter estimates nearly exactly match series expansion parameter estimates. Both parameter estimates also almost exactly match the true parameters in low noise data when InSAR and GNSS data are combined. We do not make one-to-one comparisons between other forward models given that different inversion methods were used, but in most cases McTigue and Mogi models underestimate volume change. This suggests that when inversion methods are well calibrated, forward model choice can appreciably impact results for reservoirs with moderate depth/radius ratios such as in the low noise scenario (with ratio ~ 2.3). However, there is generally more variation between parameter estimates using the same forward model than between parameter estimates using different forward models, which suggests that much of the variance between estimates is dominated by inversion methods rather than by forward model choice or even data types.

Inversion exercise uncertainty estimates

Importantly, the differences between parameter estimates and/or the true parameter values often exceed reported uncertainties, and reported uncertainty estimates differ significantly between submissions, which could be due to a variety of factors. Inversions require defining misfit between data (vector \({u}^{obs}\)) and model predictions (vector \({u}^{mod}\)), commonly using the reduced chi-square metric \({\chi }^{2}={\left({u}^{obs}- {u}^{mod}\right)}^{T}{C}^{-1}{\left({u}^{obs}- {u}^{mod}\right)}\), which is weighted by the data covariance matrix \(C\). In these exercises, we specified \(C\), but in practice, \(C\) is often estimated. Misfit can then be converted to likelihood \(L\) following the general relation \(L\sim \exp(-{\chi }^{2})\), which can be multiplied by potentially non-uniform prior distributions or terms accounting for additional sources of uncertainty (Aster et al. 2018). Next, a sampling method is used to explore the parameter space; this can range from grid searches to iterative methods such as neighborhood algorithms or MCMC (Markov Chain Monte Carlo) methods that attempt to more densely sample regions where likelihood is higher and/or varying more abruptly (Sambridge 1999). Different instances of random noise can also be added to the data, and parameters that have a linear relation to data (e.g., reservoir pressure) can either be included in the nonlinear parameter searches or solved for separately given each parameter combination. Given a set of samples, the misfit or likelihood values can provide insight into the parameter space, and probability density functions (PDFs) can be calculated by integrating either over the likelihood function or over sample density (for likelihood-based sampling).
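The workflow described above can be sketched for a deliberately simplified case: a grid search over source depth, with the linear source strength solved for separately at each grid point and the misfit converted to a normalized PDF. The sketch assumes a Mogi vertical displacement kernel, uncorrelated noise (so that \(C\) is diagonal), a uniform prior, and the Gaussian form of the likelihood, \(\exp(-{\chi }^{2}/2)\). All station positions, source values, and noise levels are hypothetical. Because strength is optimized rather than integrated out, the result is a profile over depth and not a full marginal PDF.

```python
import math
import random

def mogi_uz_kernel(r, depth):
    """Vertical displacement per unit source strength for a Mogi point source."""
    return depth / (r**2 + depth**2) ** 1.5

def chi_square(obs, mod, sigma):
    """(u_obs - u_mod)^T C^-1 (u_obs - u_mod) for diagonal C = sigma^2 * I."""
    return sum((o - m) ** 2 for o, m in zip(obs, mod)) / sigma**2

def best_strength(obs, kernel):
    """Least-squares source strength for a fixed depth (the linear parameter)."""
    return sum(o * k for o, k in zip(obs, kernel)) / sum(k * k for k in kernel)

# Synthetic data from a hypothetical source with uncorrelated Gaussian noise.
rng = random.Random(1)
stations = [500.0 * i for i in range(21)]  # radial distances (m)
true_depth, true_strength, sigma = 3000.0, 7.5e5, 0.002  # m, m^3, m
obs = [true_strength * mogi_uz_kernel(r, true_depth) + rng.gauss(0.0, sigma)
       for r in stations]

# Grid search over depth; strength is solved for separately at each grid point.
depths = [1500.0 + 25.0 * i for i in range(121)]  # 1500 m to 4500 m
chi2 = []
for d in depths:
    kernel = [mogi_uz_kernel(r, d) for r in stations]
    s = best_strength(obs, kernel)
    chi2.append(chi_square(obs, [s * k for k in kernel], sigma))

# Convert misfit to a normalized PDF over depth (uniform prior assumed).
chi2_min = min(chi2)
like = [math.exp(-0.5 * (c - chi2_min)) for c in chi2]
step = depths[1] - depths[0]
norm = sum(like) * step
pdf = [v / norm for v in like]
best_depth = depths[chi2.index(chi2_min)]
print(f"best-fitting depth: {best_depth:.0f} m, minimum chi-square: {chi2_min:.1f}")
```

Subtracting the minimum chi-square before exponentiating avoids numerical underflow without changing the normalized PDF. Choices such as the grid bounds, the grid spacing, and whether linear parameters are profiled or sampled are exactly the kind of configuration differences that can lead to different reported uncertainties.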

Figure 15 shows PDFs from a subset of the low noise inversions, as well as minimum reduced chi-square \({\chi }^{2}\) misfits from a uniform grid search with a McTigue model. The \({\chi }^{2}\) plots show relatively pronounced global minima for east location, north location, and depth, but much broader minima for radius, volume change, and pressure change. Most submitted PDFs from MCMC methods show comparatively narrow global maxima for all parameters, which is expected given the dense data we provided (which is typical for real InSAR data but not real GNSS networks), although some MCMC PDFs are much narrower than others. Many submitted PDFs from neighborhood algorithms used a smaller number of samples than the MCMC methods and show much broader PDFs, which could indicate that some of these PDFs are under-resolved. However, some of the neighborhood algorithm parameter estimates are still relatively accurate, and so in some cases (e.g., with computationally expensive forward models) there may be advantages to such methods that can use a smaller number of samples. Overall, there is appreciable variability between different PDF submissions, even when accounting for the use of different types of data and forward models. Some of these discrepancies might arise from using different downsampling methods, different (and potentially biased or overly restrictive) prior distributions/parameter bounds, and/or from using different (or differently configured) methods for searching the parameter space.
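The dependence of PDF resolution on sample count can be explored with a minimal random-walk Metropolis sampler, in which the PDF is estimated from sample density rather than from a grid. The sketch below samples source depth only, with the linear source strength solved for at each step, and again assumes a Mogi vertical displacement kernel, uncorrelated Gaussian noise, and a uniform prior within fixed bounds. The starting point, step size, chain length, and burn-in are hypothetical tuning choices, not values used in any submission.

```python
import math
import random

def log_likelihood(depth, obs, stations, sigma):
    """Gaussian log-likelihood (up to a constant), source strength profiled out."""
    kernel = [depth / (r**2 + depth**2) ** 1.5 for r in stations]
    strength = sum(o * k for o, k in zip(obs, kernel)) / sum(k * k for k in kernel)
    chi2 = sum((o - strength * k) ** 2 for o, k in zip(obs, kernel)) / sigma**2
    return -0.5 * chi2

def metropolis(log_like, start, step, n_samples, bounds, rng):
    """Random-walk Metropolis sampler for one parameter, uniform prior on bounds."""
    current, current_ll = start, log_like(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        if bounds[0] <= proposal <= bounds[1]:
            proposal_ll = log_like(proposal)
            # Accept with probability min(1, L_proposal / L_current).
            if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
                current, current_ll = proposal, proposal_ll
        samples.append(current)  # a rejected proposal repeats the current state
    return samples

# Synthetic data from a hypothetical source with uncorrelated Gaussian noise.
rng = random.Random(2)
stations = [500.0 * i for i in range(21)]  # radial distances (m)
true_depth, true_strength, sigma = 3000.0, 7.5e5, 0.002  # m, m^3, m
obs = [true_strength * true_depth / (r**2 + true_depth**2) ** 1.5
       + rng.gauss(0.0, sigma) for r in stations]

chain = metropolis(lambda d: log_likelihood(d, obs, stations, sigma),
                   start=2000.0, step=75.0, n_samples=6000,
                   bounds=(500.0, 6000.0), rng=rng)
posterior = chain[1000:]  # discard burn-in
mean_depth = sum(posterior) / len(posterior)
std_depth = math.sqrt(sum((d - mean_depth) ** 2 for d in posterior) / len(posterior))
print(f"depth = {mean_depth:.0f} +/- {std_depth:.0f} m from {len(posterior)} samples")
```

Rerunning with a much shorter chain, a poorly chosen step size, or tighter bounds changes the apparent width of the PDF, which is one way that differently configured samplers can report different uncertainties for the same data.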

Fig. 15

Probability density functions (PDFs) from a subset of the exercise 3 low noise submissions. For forward models except the Gaussian process emulator, volume change (d) is not inverted directly. Coarser parameter resolution provided with some submissions causes angular PDF appearances. Brown lines (right axes) show the minimum reduced chi-square \({\chi }^{2}\) misfit (normalized so that the minimum is 1) that can be obtained with a McTigue model based on grid searches over the parameter space

Discussion

The results of these exercises emphasize some important considerations for volcano deformation modeling. Even points that may seem obvious to some readers are worth discussing, as these exercises highlighted the diversity of modeling practices that are used across the community. We do not attempt to provide comprehensive guidelines, given the limited scope of this first phase of exercises and the wide variety of volcano deformation scenarios. Rather, we focus on general forward and inverse modeling insights, and on identifying some promising avenues for further development.

Considerations for verifying forward models

While it is generally acknowledged that verification is an important part of using numerical models, these exercises identified several discrepancies in both analytical and numerical forward models which suggests that the volcano geodesy community could benefit from more systematic model verification practices. Analytical forward models might not always be tested thoroughly on the assumption that they were tested by previous users and/or are too simple to need extensive testing. However, the implementation errors we identified in several spherical reservoir forward models and topographic corrections show that verification is still important, with these exercises providing one way to check that a particular model implementation has been verified. For numerical forward models, convergence testing should be common practice. However, our results emphasize the importance of considering both mesh resolution and far-field boundary treatment (e.g., domain size and boundary conditions); for many submissions accuracy was limited primarily by domain size, which often needs to be an order of magnitude or more larger than the region of interest (Figs. 4 and 5). These exercises can provide rough indications of appropriate parameter choices, but additional testing should be conducted for the conditions of each application.

Advantages of different forward models

Most analytical models have computation times that are orders of magnitude faster than all of the numerical models (Table 1). However, the accuracy of available analytical models can be limited in cases with shallow or non-spherical reservoirs, steep topography, or heterogeneous rock properties, all of which are common in volcanic settings. For spherical reservoirs in homogeneous half spaces, when depth/radius ratios are greater than ~ 2, it is reasonable to use the approximate Mogi or McTigue models (Figs. 3, 4, 13, and 14) (Segall 2010; Taylor et al. 2021), but when depth/radius ratios are lower these models can start to yield appreciable error in estimates of volume change and depth. The Zhong model is arbitrarily accurate, but at the expense of much larger computation times. For spheroidal reservoirs with low depth/radius ratios, all available analytical models are approximations that exhibit appreciable error, as is demonstrated for the sill-like reservoir in exercise 2B (Figs. 8 and 9). When topographic features are present that have height/width ratios greater than ~ 0.3, approximate topography corrections are generally not accurate (Segall 2010), as is demonstrated with the Gaussian hill topography in exercise 2A (Figs. 6 and 7). Such edifices would generally lead to overestimates of source elevation and pressure change (Cayol and Cornet 1998). Crustal rock in volcanic settings can exhibit complicated heterogeneity, including vertical stratification (Currenti et al. 2007; Masterlark 2007), or radial variation around magma reservoirs due to the impact of temperature on both elastic moduli (Bakker et al. 2016) and viscoelasticity (Dragoni and Magnanensi 1989). We only consider radial heterogeneity in elastic moduli from a steady-state conduction temperature profile, but exercise 2C shows that even just this one source of heterogeneity can appreciably impact predicted deformation (Figs. 10 and 11).
For this scenario, deformation can be approximated reasonably well by a model using homogeneous elastic moduli with near-reservoir moduli (i.e., high-temperature) values. This suggests that when homogeneous elastic moduli are used, if they are prescribed a priori then near-reservoir values may be most appropriate, and if they are estimated from geodetic inversions then the estimates will likely reflect near-reservoir values (Anderson and Poland 2016). More studies are needed to address how well different types of heterogeneity can be approximated with analytical models; for example, vertical stratification can amplify surface displacements and lead to source elevation and pressure being overestimated (Currenti et al. 2007; Masterlark 2007).

BEMs can be more accurate than analytical models for considering topography and non-spherical reservoirs (Figs. 6, 7, 8, and 9), but are less flexible than FEMs since most cannot include heterogeneity. Care should be taken when using constant dislocation BEMs, since element edge singularity effects limit their accuracy (Fig. 5). While BEMs will generally have faster computation times than FEMs, making quantitative comparisons from these exercises is difficult given the variable dimensionality and mesh/domain sizes of submissions (Table 1).

All the FEMs were reasonably accurate for all exercises, despite the presence of some discrepancies due to the use of different mesh and domain sizes. Reported computation times for FEMs ranged widely (Table 1); this is partly due to differences in the discretizations and computational resources used but may also reflect the efficiencies of different codes. The computational metrics that are most important (e.g., memory use, speed, parallelizability) will depend on the application, and may evolve over time as most of these models are being actively developed. Importantly, several open source codes performed well in comparison to commercial software, although participants indicated that some FEMs do require more time to learn and/or use than others.

While it is feasible to conduct probabilistic inversions with FEMs or BEMs that have computation times of seconds-minutes, doing so requires using more computational resources and/or fewer samples than could be used with faster forward models, which our exercises indicate can result in less accurate PDFs. However, several submissions represent promising approaches for producing fast and accurate forward model predictions in inversions. Fictitious domain approaches allow increased efficiency when conducting multiple simulations with different model parameters (Bodart et al. 2022, 2016), while emulators trade an initial training cost for very fast subsequent predictions. Exercises 1 and 2B demonstrated the utility of a Gaussian process emulator for spherical or spheroidal reservoirs in a homogeneous half space, but further work is needed to develop emulators that consider more complex magma system geometries, topography, and heterogeneous rheology. This will require larger training times and/or building emulators specific to particular volcanoes over particular time periods but could be valuable long term.

Discrepancies between inversions

Ultimately, we found that inversion methods contributed much more than forward model choice to differences between parameter estimates, even in the unrealistically ideal scenario of having dense, low noise datasets and a known true forward model. For spherical reservoirs, some tradeoffs between pressure change, radius, volume change, and depth are expected (Parks et al. 2012), although submissions also exhibited differences in east and north locations. Importantly, many reported confidence bounds did not overlap with other estimates or with the true parameter values. This shows that even for inversions with only a few free parameters, it is still vital to test inversion methods with synthetic data and ensure that both parameter estimations and uncertainty estimations are accurate. These exercises provide a starting point, but additional problem-specific testing is important since parameter space exploration can be more difficult for more complex forward models and sparser or noisier data. Finally, these exercises emphasize the importance of comprehensively reporting uncertainty and carefully interpreting reported uncertainties. This can be facilitated by clear specification of what uncertainty sources (e.g., model error, data variance) were considered, using approaches such as hyperparameters to account for model uncertainty, and including plots of misfits and/or PDFs.

For all inversion methods, there will be tradeoffs between accuracy, robustness, and computational cost. Neighborhood algorithms typically used far fewer samples than MCMC sampling or grid searches to produce roughly similar parameter estimates. However, neighborhood algorithms produced broader PDF peaks than MCMC sampling, suggesting that care needs to be taken that neighborhood algorithms are not under-resolving the parameter space. There is also discrepancy between PDF peaks in different MCMC inversions, indicating that similar care needs to be taken when calibrating these methods. In some cases, combining different inversion methods and/or forward models could be optimal, as in one submission that followed broad parameter searches using an approximate analytical forward model with more focused searches using a numerical model.

Future directions

These exercises only considered static elastic displacements from spheroidal reservoirs, and only tested inversions for synthetic and already processed data with a known spherical reservoir source model. Since discrepancies between submissions were found even for these simple scenarios, this project has demonstrated the importance of community verification and validation exercises. Feedback from participants also indicates that these exercises have been a useful learning tool. Using the SCEC earthquake verification and validation exercises as a template, we anticipate benefits from expanding these exercises to more complex volcano deformation problems. Important forward modeling scenarios include time-dependent poro-viscoelastic-plastic rock response, sheet intrusions, multiple deformation sources, hydrothermal fluid circulation, and surface loading. Important inversion scenarios include providing raw time-series data to test all stages of data processing workflows, providing complicated deformation sources with minimal a priori information, and providing data from real volcanoes with additional constraints from other data types beyond deformation. Like the SCEC exercises, this will likely require multiple concurrent efforts. Such efforts will have great potential for community building, forming standards of reproducibility, advancing methods development, and gaining new science insights. We also expect that model verification and validation exercises could be expanded upon or introduced in several other aspects of volcano science with great effect. Such efforts could be extended in common, rigorous, and sustained frameworks to advance volcano science, monitoring, and hazard forecasting.