6.1 Introduction

Capturing epistemic uncertainty within probabilistic seismic hazard assessments (PSHAs) has become a topic of increasing interest over the past couple of decades, especially since the publication of the SSHAC approach (Budnitz et al. 1997). Put simply this means that the seismic hazard model, comprising a characterisation of the seismic sources (locations, magnitude-frequency relations, maximum magnitudes) and the ground-motion model (median ground motion for a given magnitude and distance and its aleatory variability, characterising the probability distribution around this median), need to capture our knowledge and also our doubt about earthquakes and their associated shaking in the region of interest.

Epistemic uncertainty is generally quantified by constructing a logic tree with weighted branches modelling our degrees of belief in different inputs (Kulkarni et al. 1984), e.g.: what is the largest earthquake that could occur along a fault (maximum magnitude)? For this article I am using the terms “aleatory variability” and “epistemic uncertainty” in the way they are commonly used in the engineering seismology community, i.e. aleatory variability is accounted for in the hazard integral whereas epistemic uncertainty is capture within a logic tree. Stafford (2015) proposes a different framework and terminology. Because a given structure may only need to withstand a single potentially-damaging earthquake during its lifetime, Atkinson (2011) proposes that all between-event uncertainties (i.e. including some of those currently modelled as aleatory variability) should be considered epistemic. This viewpoint is not considered in the following as it is not (yet) standard practice.

The comparisons of the uncertainties captured in various recent PSHAs shown by Douglas et al. (2014) suggest potential inconsistencies in some logic trees. Douglas et al. (2014) report, for various studies and locations, a measure of the width of the fractiles/percentiles of the seismic hazard curves equal to 100[log(y84)−log(y16)] where y84 and y16 are the 84th and 16th percentiles of peak ground acceleration (PGA) or response spectral acceleration (SA) for a natural period of 1s. This measure indicates how much uncertainty there is in the assessed hazard. Douglas et al. (2014) argue that this measure should vary geographically with the level of knowledge of engineering-significant ground motions, specifically areas with limited data (generally stable regions) showing high values (large uncertainty) and areas with considerable data showing lower uncertainties. This behaviour was generally seen in Europe [e.g. within the European Seismic Hazard Model (ESHM, Woessner et al. 2015)] but not in all studies or for all locations. Comparing the ESHM with site-specific studies for the same locations suggests that the overall uncertainty modelled in the ESHM is too low or alternatively the fractiles of the site-specific studies too wide. The key driver of the modelled uncertainty in PSHAs is often the ground-motion logic tree (e.g. Toro 2006) and hence this is the first place to start when seeking a method to construct PSHAs that reflect the underlying level of knowledge.

The next section of this article assesses the level of uncertainty captured within some typical ground-motion logic trees. The following section discusses the various approaches applied in the past decade to construct ground-motion logic trees. The main focus of this article is to propose a new approach, which is presented in the subsequent section and then applied in the penultimate section for three European countries. The article ends with some conclusions.

6.2 Uncertainties Captured in Logic Trees

Toro (2006) presents approximate results to quantify the impact of epistemic uncertainty in the median ground motion on the mean hazard curve. He states that the ratio, R, of the mean to median ground motion (e.g. PGA) for a given annual frequency of exceedance roughly equals:

$$ \mathrm{R}=\exp \kern0.5em \left(0.5\kern0.5em k\kern0.5em {\sigma}_{\mu}^2\right), $$
(6.1)

where k is the slope of the hazard curve in log-log space and σμ is the lognormal epistemic uncertainty in the median ground motion (in terms of natural logarithmsFootnote 1). He notes that this increase in ground motion is the same as would be caused by an increase in the aleatory variability from σ0 to σ1, where σ1 2 = σ0 2+ σμ 2.

The slope of the hazard curve, k, generally is between 1 (generally, areas of low seismicity) and 4 (generally, areas of high seismicity) but k depends on the range of annual frequencies of exceedance considered (e.g. Weatherill et al. 2013; H. Bungum, written communication, 2017). Douglas (2010b) reports values of σμ between 0.23 (for well-studied areas such as western North America) and 0.69 (for the largest subduction events) from comparisons of median PGAs from many published ground motion prediction equations (GMPEs) for a few scenarios. Toro (2006) reports similar values from logic trees used in various US site-specific studies. Using Eq. (6.1) leads to the percentage increases in the median hazard given in Table 6.1. These calculations show that the effect of large epistemic uncertainties in the ground-motion model can be dramatic, e.g. more than doubling of the median ground motion (105% increase) for σμ = 0.69 and k = 3 and the effect is highly sensitive to the exact level of uncertainty modelled. Consequently there is a need for a rigorous method to assess what uncertainty should be captured.

Table 6.1 Percentage increase when going from median to mean ground motion due to epistemic uncertainty in the ground-motion model of σμ for hazard curves with a slope k

Rearranging Eq. (6.1) and assuming that the epistemic uncertainty in the hazard results is entirely due to the ground-motion logic tree, allows σμ to be estimated for published PSHAs. Table 6.2 reports the σμ obtained by this approach for a representative selection of the PSHAs considered by Douglas et al. (2014). The k value for each site and PSHA is estimated from the slope of the hazard curve computed using the ground-motion amplitudes for annual frequency of exceedance of 1/475 and 1/2475. This table suggests that the epistemic uncertainty captured in the ground-motion logic tree of regional PSHAs such as ESHM are too low, particularly for stable areas, as they are far below those captured in site-specific studies (e.g. Toro 2006) and those reported by Douglas (2010b) based simply on comparing predictions from GMPEs for some scenarios.

Table 6.2 Estimated epistemic uncertainty in the ground-motion logic tree for some representative sites and PSHAs (for 475 year return period and PGA, except for Thyspunt where the spectral acceleration at 0.01 s is used)

A quick check that could be performed when developing ground-motion logic trees for regional applications is to compute the σμ for a few key scenarios (see below for guidance from disaggregation on what earthquakes are likely to dominate the hazard) and compare it to the values reported in Table 6.2 for site-specific studies. Values much lower than these values would need to be carefully justified. Also the σμ for ground-motion logic trees for application in active areas should generally be lower than the σμ for stable areas, because of the lack of ground-motion data of engineering significance from such areas to constrain the GMPEs.

As stated by USNRC (2012), the aim of any PSHA should be to capture the “centre, body and range of technical defensible interpretations”, even if the PSHA is regional and not site-specific and even if it is being conducted at a low SSHAC level (1 or 2). The results of PSHAs should also be stable with time, i.e. if a new PSHA was conducted for the sample location in the future the results of the new study should not be greatly different to the original results. This requirement means that sufficient uncertainty has to be modelled so that the fractiles of the assessed hazard from the original and subsequent PSHAs broadly overlap.

Bommer and Scherbaum (2008) note that the different models on the branches of logic trees should be mutually exclusive and collectively exhaustive (MECE) so that the branch weights can be considered as probabilities, an implicit assumption when computing mean hazard curves and those for different fractiles. This means that one of the models (although we do not know which) is the true model and that all the models are independent. This criterion is likely not to hold for logic trees developed using GMPEs derived from overlapping datasets.

This study discusses how to populate logic trees for the ground-motion component of the seismic hazard model to capture epistemic uncertainty within national or continental-scale seismic hazard assessments, where our knowledge of ground motions in moderate and large earthquakes varies. The focus is not on site-specific studies (e.g. those conducted for critical infrastructure), which have different challenges (e.g. higher regulatory scrutiny) and opportunities (e.g. much smaller geographical range and more resources per km2 covered). The approach proposed in this article may, however, be of interest for these studies, particularly in regions with limited data. The focus of this study is on models for the prediction of the median ground motions rather than the, equally important, models for the aleatory variability (sigma).

6.3 Current State of Practice

There are three principal ways in which geographically-varying logic trees could be constructed, if the approach of Savy et al. (2002) using point-based estimates from expert judgement for various magnitude-distance-period scenarios is excluded as being too cumbersome for regional-scale PSHA. These are summarised in this section, focussing on the relatively new approach of backbone models. Goulet et al. (2017, Chapter 2) provide a recent comprehensive review of the development of ground-motion logic trees to capture epistemic uncertainty, particularly within US projects.

6.3.1 Multiple Ground Motion Prediction Equations

The most common way of constructing ground-motion logic trees is to populate the branches with a selection of previously published GMPEs. A recent example of such a logic tree was that used in the ESHM and presented in Delavaud et al. (2012). A recent overview of this approach and ways of selecting and weighting the GMPEs is provided by Kale and Akkar (2017) in the context of the Earthquake Model for the Middle East (EMME, Danciu et al. 2018).

For areas with a set of recently-published and robust GMPEs (e.g. California, Japan and Italy) this approach appears relatively straightforward. Nevertheless, for these areas there are difficulties in deciding which of the many available models to choose and how many models should be included. For California two recent projects (NGA-West1 and 2) have developed a set of five GMPEs using a consistent database and independent and dependent variables (Power et al. 2008; Bozorgnia et al. 2014). Therefore, it would appear that a logic tree for California should comprise the most recent versions of these five GMPEs. However, there are doubts that such a logic tree would capture all the epistemic uncertainty concerning earthquake ground motions in future earthquakes because the models may be too similar. In this situation it has been proposed that additional logic-tree branches equivalent to a backbone approach are required (see below).

For many areas, however, there are no indigenous GMPEs, or those that are available are based on extrapolations from weak-motion data, often using the stochastic method (e.g. Rietbrock et al. 2013). When adopting a multiple GMPE approach for such areas, the problem is which of the models published for other regions should be included and will these GMPEs truly model ground motions in future earthquakes. Delavaud et al. (2012) were uncertain about whether ground motions in much of northern Europe (continental crust) were similar to those in the Mediterranean region, for which many indigenous models exist, or closer to those in the Scandinavian shield, for which they believed GMPEs developed for tectonically-similar eastern North America could be used. Therefore, their ground-motion logic tree included GMPEs for both tectonic regimes for that part of the continent. The logic tree proposed by Stewart et al. (2015) for the Global Earthquake Model discusses the considerable epistemic uncertainty in assessing ground motions for the majority of the world and they seek to propose a logic tree that captures this uncertainty by choosing robust models that displaying differing characteristics, e.g. decay rates in subduction earthquake ground motions. However, this is not an objective procedure nor was the resulting logic tree checked to see whether it models sufficient (or what appears to be sufficient) uncertainty. EMME sought to combine and improve on both these procedures for the construction of its ground-motion logic tree (Danciu et al. 2018).

Musson (2012) proposes that the weights on the logic tree are the probability of each GMPE being “the best model available”. As pointed out by Bommer (2012), this implies that if there is only a single model available for a tectonic regime, this GMPE would automatically get a weight of unity, implying no epistemic uncertainty. In fact the uncertainty may be high, particularly as a single model often implies a lack of data from which to build more.

6.3.2 Hybrid Empirical Composite Ground-Motion Model

To create ground-motion models that are more regionally-specific, Campbell (2003) proposed the hybrid-empirical method where existing empirical GMPEs are adjusted based on the ratio of stochastic models for the target and host regions. In Campbell (2003) this method is applied to adjust GMPEs for California to make them applicable for eastern North America. Douglas et al. (2006) developed this method to account for uncertainties in developing stochastic models for the target region, where invariably there are fewer recorded data than in the host region (because otherwise robust GMPEs for the target region could have been proposed directly). These uncertainties in the various parameters of the stochastic model, e.g. stress (drop) parameter, are accounted for using a logic tree so that many stochastic models are created and applied when computing the ratios between target and host regions. In addition, Douglas et al. (2006) apply the technique to GMPEs from various host regions, again to capture uncertainty in the final logic tree of the adjusted models.

This approach appears more rigorous and transparent than the multiple GMPE approach using previously-published models but it is a time-consuming approach, particularly for a continent containing many target regions. In addition, there is subjectivity in deciding on the branches and weights for the stochastic models for the target region. Finally, although the uncertainties are propagated to the final adjusted GMPEs there is only a single (mean) model for each host GMPE and, consequently, a logic tree comprised simply of the adjusted GMPEs would not correctly model the uncertainty. However, the uncertainties could be tracked throughout the adjustment [e.g. Figure 15 of Douglas et al. 2006] to obtain a logic tree with branches modelling this uncertainty. For the site parameters Vs and kappa this is commonly done for site-specific studies and hence it could be extended to other parameters. This leads to the final main approach: backbone GMPEs, as discussed in the following section.

6.3.3 Backbone GMPEs

In the past decade and often in the context of site-specific PSHAs for critical facilities the backbone approach (Atkinson et al. 2014) has been used to construct ground-motion logic trees. In this approach a single or a handful of existing GMPEs are scaled up and down to account for uncertainty in the median motion. The scaling factors employed are generally related to uncertainty in the average stress (drop) parameter in the region, as well as other inputs to the stochastic model, e.g. anelastic attenuation. Using this approach leads to multiple GMPEs that are explicitly MECE. The level of uncertainty modelled is also made transparent.

Starting in 2008 and continuing in 2014, the US National Seismic Hazard Model applies a simple backbone approach to increase the modelled epistemic uncertainty in the ground-motion logic tree for shallow crustal seismicity in the western states (Petersen et al. 2014). Petersen et al. (2014) argue that the selected GMPEs for this tectonic regime (all five of the NGA-West2 models) show too much similarity because they were derived using similar data and approaches following considerable interaction between the GMPE developers. Therefore, the epistemic uncertainty captured by these models is too low. To overcome this, for each original GMPE branch in the logic tree they add a higher and lower branch equal to the original GMPE shifted up or down by a factor that varies in nine magnitude (M 5-6, 6-7 and 7+) and distance bins (<10 km, 10–30 km and >30 km). This factor is given by: exp[0.4√(n/N)], where n and N equal the number of earthquakes used to derive the GMPE within the M 7+ and R < 10 km bin and the number in the specific magnitude-distance bin, respectively. The M 7+ and R < 10 km bin is generally the one with the smallest number of earthquakes for which an epistemic uncertainty of 50% [i.e. exp.(0.4)] was assumed. The uncertainties for all other bins are scaled with respect to this bin.

The Petersen et al. (2014) approach recognizes that a lack of data is the reason for epistemic uncertainty and tries to capture this in a relatively simple manner. The basis of the function used to scale the uncertainty factor with the number of earthquakes is not given in Petersen et al. (2014) but it is likely related to the equation for the standard error of the mean, where the standard deviation is divided by √N where N is the number of values used to compute the standard deviation. This is a reasonable basis for the function. Also reasonable is the use of the number of earthquakes in each bin rather than the number of records because uncertainty in what the average source characteristics (e.g. stress drop) are for a region (related to the number of earthquakes observed from which to assess this) is often more important than what the average path or site characteristics are (related to the number of records).

The principal criticism of the factor of Petersen et al. (2014) is the apparently arbitrary decision to assume a 50% uncertainty for the M 7+ and R < 10 km bin from which to scale all others. This value should be related to how well average ground motions are known for that bin, which itself should be controlled by the available data. If the value was explicitly defined by the available data, over time the epistemic uncertainties modelled by this approach would reduce (as data are collected). Also it would allow the approach to be transportable to other regions or for other GMPEs. Currently if a GMPE is based on a single event for M > 7 then the additional uncertainty is the same as if a GMPE is based on many hundreds of earthquakes for that magnitude range. A recent study by Douglas and Boore (2017) suggests that the epistemic uncertainty in ground motions in the M 7+ bins is lower than could have been thought given the limited data used to constrain GMPEs in that magnitude range.

A more sophisticated method has been proposed by Al Atik and Youngs (2014) to add additional branches to model the statistical uncertainty characterized by the confidence limits from regression analysis based on a finite dataset [see Douglas 2007, 2010a for estimates of these confidence limits for other GMPEs]. Generally this additional uncertainty is smaller than the uncertainty coming from model-to-model differences. As Al Atik and Youngs (2014) show it is also smaller than the additional uncertainty added to the US National Seismic Hazard Model using the approach discussed above.

Atkinson and Adams (2013) use a backbone approach to develop ground-motion logic trees for the PSHA underlying the Canadian National Building Code. For crustal seismicity in western Canada, they examine the spread of the NGA-West1 models for magnitudes and distances critical for the PSHA of this region and averages of data to define a representative GMPE [in this case the GMPE of Boore and Atkinson 2008] and an upper and lower GMPE to cover the observed range of median predictions. They find that weakly distance-dependent additive and subtractive terms of between 0.23 and 0.69 in terms of natural logarithms can envelope the observed spread in the models. [Atkinson 2011 also develops a simple ground-motion logic tree using a similar approach, although as noted above she adopts an unconventional split between epistemic and aleatory components]. Because this western region is the best studied area of Canada, and epistemic uncertainties for active crustal GMPEs should be the lowest, this uncertainty is assumed to be a lower limit when constructing logic trees for other Canadian regions. For example, for eastern Canada, which is a stable continental region, Atkinson and Adams (2013) found that applying a similar approach led to lower uncertainties for some magnitudes and distances than in the western Canadian logic tree and hence they added uncertainty to take account of the fewer ground-motion observations from eastern Canada.

Gehl (2017) applies this approach to produce a pan-European representative GMPE. Kale and Akkar (2017) propose a similar technique for the selection of multiple GMPEs that cover the centre, body and range but they also include calculation of the seismic hazard to check that no particular model dominates.

An example of a backbone model from a site-specific study is that developed by Bommer et al. (2015) for the Thyspunt (South Africa) nuclear power plant hazard assessment. In this approach, as well as adjusting for Vs and kappa (and accounting for uncertainty in these site parameters), they also add branches to scale the predictions from three existing GMPEs to account for uncertainty in the median stress drop for earthquakes in the surrounding region. Four branches are considered: one for the chance that average stress drops in the region are lower than average (because the tectonics are extensional), one that the average stress drops are the same as in the original GMPEs (i.e. a scale factor of unity) and two for higher stress drops due to the area being part of a stable continental region.

In the procedure proposed by Goulet et al. (2017) to develop ground-motion logic trees for central and eastern North America, the suite of seed ground-motion models are extended, making sure that certain physical criteria are met, to a continuous distribution of ground-motion models so that the set is then MECE. The ground-motion space modelled by this continuous distribution is visualized using the mapping approach of Sammon (1969), which approximates the high-dimensional (magnitude, distance, period) ground-motion space on a 2D map (Scherbaum et al. 2010). To obtain a continuous distribution the expected epistemic uncertainty in ground motions at different distances is imposed based on analogies to western North America and understanding where the seed GMPEs are best constrained. The continuous distribution of ground-motion models is then discretized to a representative set that is easier to handle computationally within PSHA. Weights are then assigned to this set based on prior knowledge and residuals between each models predictions and strong-motion records from the considered region. This rigorous approach, although scientifically appealing, requires considerable computational effort and choices to be made. Therefore, in this article I am seeking a more straightforward approach but using some of the ideas from the procedure of Goulet et al. (2017).

6.4 Retrospective Test of Logic Trees for California

Because it is not possible to test objectively whether a ground-motion logic tree developed today will correctly capture observations of future earthquakes, Douglas (2016) undertakes a retrospective analysis using the 35 years following 1981 as a basis. As the analysis of Douglas (2016) was only presented within an oral presentation at the 35th General Assembly of the European Seismological Commission, I include a summary here.

Douglas (2016) chose 1981 as a basis for his analysis because, if a PSHA for horizontal PGA had been conducted at that time following current practiceFootnote 2 for shallow crustal seismicity (particularly in California), it is likely that the ground-motion logic tree would have included the three robust GMPEs that had only recently been published: Trifunac (1976), Campbell (1981) and Joyner and Boore (1981). Therefore, using the data in the NGA-West2 database (Ancheta et al. 2014) from 1981 onwards (the database ends in 2011) enables a comparison between the predictions from such a ground-motion logic tree and the observed PGAs. In particular, a check can be made of whether the right level of uncertainty was captured by this simple logic tree consisting of GMPEs that would no longer be considered state-of-the-art. The observation that inspired this approach is that made in Douglas (2010b, p. 1519), who notes that the scatter in predictions of median PGAs from many dozens of GMPEs (a proxy for the epistemic uncertainty) is wider than the confidence limits in the average observed PGA for narrow magnitude-distance bins from a large strong-motion database.

The ground-motion logic tree considered is one comprised of the three GMPEs each given a weight of one-third. The basis of the comparison is to compare, for all magnitude-distance bins, the predicted median PGAs from the logic tree, both in terms of the weighted average and the upper and lower PGAs branches (corresponding to the median PGA from one of the three GMPEs but which one will vary with M and R), with the median PGAs (and its 5 and 95% confidence limits, which are assumed proxies for the uncertainty due to the lack of data for that bin) computed from the PGAs observed in post-1981 earthquakes. Ideally the PGA from the lower branch of the logic tree should equal the 5% confidence limit from the observations, and weighted average from the logic tree should equal the median from the observations and the upper branch should equal the 95% confidence limit from the observations. If the logic tree’s branches are wider than the 5–95% confidence limits from the observations then the logic tree is capturing too much uncertainty whereas if they are narrower sufficient uncertainty is not being modelled.

The 100 bins used for the analysis were constructed using 10 intervals 0.5 units wide between M 3 and 8 and 10 logarithmically-spaced Joyner-Boore distance intervals between 0 and 300 km. The variability in a single observation (i.e. the standard deviation) as well as the uncertainty in a median observation (i.e. the standard error) from the database are shown in Fig. 6.1. From this figure it can be seen that despite low apparent aleatory variability at large magnitudes (around 0.5) the uncertainty in the median PGA is still quite high (between 0.1 and 0.15, i.e. factors of 10–15%) and despite high apparent aleatory variability at low magnitudes and large distances (around 1.0) the uncertainty in the median PGA is low (between 0.04 and 0.08). For a much larger database the graph on the left (aleatory variability) would likely show a similar trend because it is related to ground-motion variability, which cannot be modelled simply using magnitude and distance (e.g. Douglas and Smit 2001), whereas the graph on the right would approach zero throughout as the true median PGA given a magnitude and distance would be known exactly.

Fig. 6.1
figure 1

Standard deviations of observed PGAs in 10 × 10 bins (left) and standard errors of observed PGAs (right) from NGA-West2 database for earthquakes that occurred since 1981

Figure 6.2 shows the comparison between the predictions from the logic tree and the observations. Ideally the bottom row of graphs would show contours around unity, meaning that the correct amount of uncertainty is being captured. This is roughly the case for the upper branch of the logic tree (bottom middle and right graphs). Contours higher than unity mean that too much uncertainty is being captured. This is true for the lower branch of the logic tree (bottom left graph)

Fig. 6.2
figure 2

Predicted median PGAs from the lower (left), median (middle) and upper (right) branches of a logic tree comprised of Trifunac (1976), Campbell (1981) and Joyner and Boore (1981) with equal weighting (top row); 5% confidence limit, median and 95% confidence limits for the median observed PGAs from the NGA-West2 database (middle row); and the ratio of the middle row to the top row (observations/logic tree) (bottom row)

Figure 6.3 shows the standard deviation of the logic tree (cf. Toro 2006). This graph shows that the epistemic uncertainty captured by this logic tree is quite low. Referring to Table 6.1 indicates that the difference between the mean and median PGA for a given annual frequency of exceedance will hence also be small (less than 50% for typical values of k).

Fig. 6.3
figure 3

Standard deviation of the logic tree in natural logarithm units

Finally, Fig. 6.4 shows the Sammon’s map (Sammon 1969; Scherbaum et al. 2010) of the three original GMPEs, the GMPEs divided by two and multiplied by two (to simulate a simple backbone approach) and the binned observed PGAs. The observations are surrounded by the GMPEs, although this map suggests that the Joyner and Boore (1981) GMPE could be removed from the logic tree as it is further from the observations than the other two models.

Fig. 6.4
figure 4

Sammon’s map of the three unadjusted GMPEs, the GMPEs multiplied by two, the GMPEs divided by two and the observed PGAs. Note that the absolute positions are arbitrary – only the relative positions are meaningful

In conclusion, this simple logic tree would have been appropriate for a seismic hazard assessment conducted in 1981 (at least until 2011, the end of the database) because the epistemic uncertainty captured is roughly the same as observed in the data, although it is slightly too wide at the lower end. This suggests that perhaps we do not need more sophisticated GMPEs or logic trees. We should be wary of the limitations of this analysis, however. Firstly, this analysis was only for simple GMPEs that did not account for site effects (or, if they did, only in a crude way) and only for PGA. Secondly, the analysis relies on making the strong assumption of no regional dependency in earthquake ground motions (i.e. we can combine all the strong-motion data together to assess the medians and confidence limits). Thirdly, it assumes that the data available from the period 1981–2011 are sufficient to obtain robust statistics and that data from future events will not significantly change the assessed medians and confidence limits. A few well-recorded M 7+ earthquakes with apparently ‘abnormal’ ground motions could significantly change the analysis for that magnitude range; although, as noted above, Douglas and Boore (2017) suggest that current predictions for this magnitude range appear robust. Fourthly, and probably most importantly, this type of analysis cannot be conducted for areas with little or no observations without invoking the assumption of no regional dependency.

6.5 Proposed New Approach

When developing ground-motion models for induced seismicity Douglas et al. (2013) noted large differences in observed ground motions amongst the sites they considered. They relate this principally to differences in the average stress (drop) parameter for earthquakes near each site. Before a scheme that may induce seismicity is begun it is not possible to know what the average stress parameter in future earthquakes would be. Hence, Douglas et al. (2013) propose that the logic tree used for the initial hazard assessment for the scheme is populated by the 36 GMPEs they derive from stochastic models that cover the possible range of key parameters (stress parameter and attenuation modelled by Q and kappa). If information on what the average values of these parameters are for the site in question then the logic-tree weights can be tuned to reflect this. If and when ground-motion data are collected from the site then again the weights can be modified to emphasis more probable GMPEs and reduce the modelled uncertainty. Edwards and Douglas (2013) showed that this approach worked in practice by using data from the Cooper Basin (Australia) geothermal site, which were not used to develop the original 36 models.

It is proposed here that a similar approach could be applied for natural seismicity. The idea is the opposite of taking a set of GMPEs and then widening out the branches to capture uncertainty. Rather, many branches with default weights are considered and, when data are available, the weights adjusted to reflect our improved knowledge. This provides a framework where the reduction in epistemic uncertainty through the collection of new information is explicitly captured.

The philosophy of this approach is the same as employed by Douglas et al. (2009) to develop potential mean shear-wave velocity profiles and their uncertainty by starting with all possible profiles (generated using a Monte Carlo approach with underlying distributions based on an analysis of a large set of observed profiles) and then applying the available constraints to obtain a set of profiles that accounts for what you know and what you do not know about site conditions.

When developing this approach the simplest possible logic trees are sought because extra complexity is probably not justified for national or continental PSHAs given the large uncertainties and the need to make the hazard calculations for many locations tractable. As noted above, the intended use of this proposal is not site-specific seismic hazard assessments where time and resources would be available for analysis of all data. The focus is broad-brush PSHAs, which would be more typical of those with a wide geographical scope. Therefore, although the procedure of Goulet et al. (2017) is appealing, it is perhaps too complex for application beyond projects with considerable resources.

Disaggregation of PSHAs (Bazzurro and Cornell 1999) allows the earthquake scenarios that contribute most to the seismic hazard at a site to be determined. This information is useful for this study because it provides guidance on what magnitude-distance range needs to be the principal focus of the ground-motion model. Although it must be recalled that all magnitudes and distances that are not precluded by the seismic source model will influence the hazard and hence, even if some scenarios dominate, the ground-motion model should be accurate for all scenarios.

As an example of a low-to-moderate-seismicity European country, Table 6.3 reports the mean magnitude, distance and epsilon (i.e. number of standard deviations above the median ground-motion) of the disaggregated scenarios reported by Goda et al. (2013) for the UK. As an example for a moderate-to-high-seismicity European country, Fig. 6.5 shows the distribution of mean magnitude, distance and epsilons of the disaggregated scenarios reported by Barani et al. (2009) for Italy. These results indicate that generally the most important earthquake scenarios for return periods used for seismic hazard mapping are magnitudes between about 5.0 and 6.5 and distances up to about 60 km for PGA and SA(0.2 s) and between about 5.5 and 7.0 and distances up to about 100 km for SA(1.0 s). Epsilons are generally between 1.0 and 2.0 with higher values as the return period increases.

Table 6.3 Mean magnitude and distance of the dominant earthquake scenario and the value of epsilon for 2500 year return period for two UK cities (Goda et al. 2013)
Fig. 6.5
figure 5

Mean magnitude, distance and epsilon of the disaggregated scenarios for the 19 Italian cities considered by Barani et al. (2009). Top: 475 year return period, bottom: 2475 year return period, left: SA(0.2 s) and right: SA(1.0 s)

In the following subsection, a full ground-motion logic tree for shallow non-subduction earthquakes in Europe and the Middle East is proposed. This is followed by a subsection adjusting the weights as discussed previously for two countries with limited ground-motion data (Georgia and Iran) as well as one with much data (Italy).

6.5.1 Development of a Full Ground-Motion Logic Tree

The aim of this section is to generate a full population of possible GMPEs, one of which (although which one is currently unknown) is the correct ground-motion model for a given location in Europe and the Middle East. Following Bommer and Scherbaum (2008), I seek to create a family of MECE GMPEs. Only PGA and SA(1.0 s) are considered in the following due to space limitations. Also the analysis is conducted assuming VS30 = 800 m/s (Eurocode site class A/B boundary).

Considering just the principal inputs to a GMPE, i.e. magnitude and distance, the available strong-motion records for each magnitude-distance pair can be seen as samples from an underlying distribution for which we do not know the mean (when using logarithms of the intensity measure). (Its standard deviation is also unknown but as noted above this study is focused on the median ground motions not the variability). Using published GMPEs and well-recorded earthquakes it is possible to assess a possible range for the mean, which can then be related to the epistemic uncertainty that should be captured within the ground-motion logic tree. This part of the procedure is similar to the approach of Atkinson and Adams (2013) in developing a suite of models from a backbone model.

For active regions where strong-motion data have been collected over the past few decades, it is possible to assess how GMPEs are likely to change with the accumulation of new data. A study that provides guidance on this issue is by Bindi et al. (2009) who re-derive the Sabetta and Pugliese (1987) GMPE for Italy, which was derived using only 95 PGAs from 17 earthquakes (from before 1985), using the same functional form but 235 PGAs from 27 earthquakes from the period 1972–2002. Bindi et al. (2009) find that the 1987 model overpredicts median PGAs by less than 5% for magnitude 4.5 and short distances but by more than 50% for magnitude 7 and distances around 100 km (their Figure 5). Therefore, GMPEs based on sparse data are susceptible to significant change when updated. Hence, there is a need to recognize that what we think we know now may change. This doubt should be reflected in the epistemic uncertainty captured in the ground-motion logic tree.

The causes of epistemic uncertainty can be divided into two. Firstly the statistical uncertainty: even for regions with much strong-motion data (e.g. Italy) variability in the ground motions and a finite sample means that the median ground motion for a given magnitude and distance is not known precisely (cf. the formula for the standard error with √n on the denominator). This is modelled by the confidence limits of the regression analysis, which are only rarely published (e.g. Douglas 2007, 2010a; Al Atik and Youngs 2014; Bindi et al. 2017). Secondly, the ‘regional’ uncertainty: for regions with little strong-motion data from large earthquakes (e.g. much of northern Europe) it is not known whether ground motions show significant differences to those in well-observed regions (for which native GMPEs exist), e.g. because of differences in median stress drop. Calibrating the width of these additional branches is challenging as we need an assessment of our unknown knowledge (i.e. how much we do not know) – it is often hoped that selecting GMPEs from various regions covers this uncertainty.

6.5.2 Statistical Uncertainty

To estimate the statistical uncertainty due to regression analysis using relatively complex functional forms on finite datasets Figure 3 of Douglas (2007), Figure 6 of Douglas (2010a), results from Al Atik and Youngs (2014) and Figure 6 of Bindi et al. (2017) are used to assess the component of σμ coming only from this factor: σstatistical. The 95% confidence limits shown on Figure 3 of Douglas (2007) for seven GMPEs and the ratios of the 95% to 50% confidence limits of Douglas (2010a) for six GMPEs can be converted to obtain σstatistical. For the most poorly constrained models at the edges of their magnitude-distance range of applicability (e.g. Ambraseys et al. 2005; Sabetta and Pugliese 1987) σstatistical approaches 0.3 (natural logarithms). Better constrained models and within the magnitude-distance ‘comfort zone’ (Bommer et al. 2010) of GMPEs σstatistical from the models considered by Douglas (2007, 2010a), the NGA-West2 models considered by Al Atik and Youngs (2014) and the GMPE of Bindi et al. (2017) are similar with values around 0.1. For magnitudes larger than 7 there is an increase in σstatistical, which is expected because of the sparsity of data from large earthquakes.

6.5.3 ‘Regional’ Uncertainty

Here the potential that median ground motions in a region are different to those in well-observed regions is assessed by comparing median ground motions for scenarios and regions for which the median ground motions are well-known. We are using the spread in average ground motions in countries with extensive strong-motion databases as a proxy for what could be the spread for countries for which observations are currently sparse. Subsequently this information is used to assess the ‘regional’ uncertainty. Rather than potentially double count the statistical uncertainty, poorly-sampled areas are not considered when assessing this component of uncertainty.

As shown by the graphs of statistical uncertainty referred to in the previous subsection, GMPEs are best constrained for the distance range from roughly 20 to 60 km. This is also the range where anelastic attenuation is unlikely to be having a large influence and hence it is possible to identify the difference between average ground motions in various regions that is due predominantly to differences in average stress drop. Using the same assumption as Yenier and Atkinson (2015) that differences in stress drop are present in ground motions for all distances makes it possible to apply the factors derived from abundant data from this restricted distance range to all distances. As shown above, seismic hazard is often dominated by earthquakes within 100 km for the return periods of most interest for national mapping and, therefore, the range of distances used to estimate differences in average regional stress drop overlap. As shown by Figures 4 and 5 of Douglas and Jousset (2011), changing the stress drop does not significantly change the magnitude scaling so again it is an acceptable first-order solution to apply the derived adjustments for the limited magnitude range 5–6 to all magnitudes.

At distances beyond about 60 km (for short oscillator periods) regional differences in anelastic attenuation will make ground-motion predictions for different regions diverge (e.g. Kotha et al. 2016, Figure 5). As a first-order solution to account for this potential uncertainty within the population of possible GMPEs, the three regional models of Kotha et al. (2016) (i.e. the ‘Italy’, ‘Turkey’ and ‘Other’ models) are used as the backbone models that are then branched out to account for potential differences in the average stress drop. This makes the assumption that the anelastic attenuation in these three regions is an adequate sample of the population of all anelastic attenuation rates in Europe and the Middle East — this is probably untrue but it is assumed for convenience. As noted above seismic hazard is often dominated by earthquakes within 100 km and hence the effect of potential variations in anelastic attenuation is unlikely to be particularly important. The backbone models assumed here could be improved in future applications of the approach. Boore et al. (2014), for example, also provide terms to account for variations in anelastic attenuations between regions, which could be used instead.

An effect that could lead to regional dependency in ground motions, but which is neglected here because of a lack of a simple approach for its incorporation, is the influence of crustal structure. Previous studies (e.g. Dahle et al. 1990; Somerville et al. 1990; Douglas et al. 2004, 2007) have shown that effects such as wave reflections off the Moho can have a strong influence on ground motions at intermediate source-to-site distances (>50 km). This effect could potentially be incorporated into the approach proposed here by developing backbone models for different typical crustal structures, perhaps using simulations and the equivalent hypocentral distance technique of Douglas et al. (2004, 2007).

Any regional dependency at short distances (<10 km) is not possible to currently assess due to sparse datasets at such distances even when they are not separated by region (Fig. 6.6). Therefore, I have not attempted to account for any ‘regional’ uncertainty here. The statistical uncertainty discussed above, combined with the ‘regional’ uncertainty due to differences in average stress drop, is expected to account for sufficient uncertainty in this distance range. Although, again, this is a topic where additional work may be warranted.

Fig. 6.6
figure 6

Magnitude-distance distribution of data in the ESM strong-motion flat-file 2017 (Lanzano et al. 2017) used for the analysis in this section. The red box indicates the magnitude-distance interval used to assess the ‘regional’ uncertainty

For the calculations made in this section the Engineering Strong-Motion (ESM) flat-file 2017 (Lanzano et al. 2017) is used. The distribution of data in this flat-file with respect to magnitude, distance and various countries is shown in Fig. 6.6. Data from three countries with many strong-motion records (Italy, Turkey and Greece) are used to assess empirically the possible size of regional dependency due to average stress drop differences. Data within the magnitude-distance range of 5 ≤ Mw ≤ 6 and 20 ≤ rJB ≤ 60 km are used for these calculations for the reasons given above and to obtain statistically robust estimates of the ‘regional’ uncertainty without relying on weak-motion data (Mw < 5), whose relevance to the adjustment of empirical GMPEs is unclear.

The residuals with respect to the generic Kotha et al. (2016) GMPE, i.e. the model where the regional terms are turned off, for each record are computed. The equations for problem 1 of appendix of Spudich et al. (1999) are used to evaluate the average bias and its uncertainty (for all the data and for each of the three countries separately) to account for the correlations between data from the same earthquake. In addition, the average differences for 5 ≤ Mw ≤ 6 and 20 ≤ rJB ≤ 60 km between the Kotha et al. (2016) GMPE and recent country-specific GMPEs that are robust, at least for this restricted magnitude-distance range, are computed as an additional constraint. The country-specific GMPEs are: Akkar and Çağnan (2010) (Turkey), Bindi et al. (2011) (Italy), Danciu and Tselentis (2007) (Greece), Sedaghati and Pezeshk (2017) (Iran) and Yenier and Atkinson (2015) (central and eastern North America). The model for central and eastern North America is included as an example of a GMPE for a stable continental region so as to include within the ‘regional’ uncertainty the possibility that ground motions in an area may show similarities to such regions. This is potentially important for much of northern Europe but for which robust native GMPEs are lacking. Both approaches to evaluate this ‘regional’ uncertainty should be equivalent but that based on country-specific GMPEs could be more robust as predictions for this restrictive magnitude-distance range borrow robustness from neighbouring magnitudes and distances.

The results of these residual analyses are shown in Fig. 6.7. From these average residuals it can be seen that ground motions for this magnitude-distance range from some countries (e.g. Italy) are on average below that predicted by the Kotha et al. (2016) GMPE whereas ground motions from some countries (e.g. Greece) are higher than predicted by this pan-European model. Using this set of averages as a sample from the population of average deviations for each country/region in Europe and the Middle East, a simple logic tree can be proposed to capture the ‘regional’ uncertainty that these averages seek to model. This approach is similar to studies that develop a suite of stochastic models accounting for epistemic uncertainty in the average stress drop in a region (e.g. Douglas et al. 2013; Bommer et al. 2017).

Fig. 6.7
figure 7

Mean residuals and their 5–95% confidence limits for all ESM data and for data from three countries (the number of records used to compute the averages are indicated) as well as the average residuals between country-specific GMPEs and the GMPE of Kotha et al. (2016) for 5 ≤ Mw ≤ 6 and 20 ≤ rJB ≤ 60 km and for PGA (left) and SA(1 s) (right)

For PGA, symmetrical lower, middle and upper branches equal to predictions from: the Kotha et al. (2016) model × 0.6 [i.e. exp(-0.5)], the Kotha et al. (2016) model × 1.2 [i.e. exp(0.2)], and the Kotha et al. (2016) model × 2.5 [i.e. exp(0.9)], with weights using a standard three-point distribution of 0.185, 0.63 and 0.185, respectively, would roughly capture the spread in these average residuals. For SA(1 s), symmetrical lower, middle and upper branches equal to predictions from: the Kotha et al. (2016) model × 0.7 [i.e. exp(-0.4)], the Kotha et al. (2016) model × 1.1 [i.e. exp(0.1)], and the Kotha et al. (2016) model × 1.8 [i.e. exp(0.6)], with the same weights again would roughly capture the spread in these average residuals. As discussed above, it is assumed that these adjustments apply for all magnitudes and distances.

6.5.4 Final Logic Tree

The first set of branches is the three regional models of Kotha et al. (2016) accounting for variations in anelastic attenuation, each with equal weights of 1/3. The second set of branches is the lower, middle and upper branches that model the effect of uncertainty in the average stress drop for a given region. The third set of branches of the logic tree is those proposed by Al Atik and Youngs (2014) to account for the statistical uncertainty component σstatisical, where the upper and lower branches equal the 95% confidence limits using this standard deviation [although the values of Al Atik and Youngs 2014 for normal and reverse faulting are switched because Kotha et al. 2016‘s GMPE is better constrained for normal and strike-slip than reverse]. As noted above the model expressed in equations 9 to 11 of Al Atik and Youngs (2014) is adopted to account for this component as we are adopting well-constrained GMPEs, which are branched out to account for potential regional dependency. If less well-constrained models were used then a larger value for σstatisical should be used.

This ground-motion logic tree implies the values of σμ, characterising the level epistemic uncertainty, shown in the contour plot on Fig. 6.8. The epistemic uncertainty is independent of magnitude except for M > 7, where the statistical uncertainty from the model of Al Atik and Youngs (2014) increases slightly. The effect of the three models for anelastic attenuation is to increase the epistemic uncertainty at larger distances (>70 km). The overall epistemic uncertainty is similar to those implied by the site-specific logic trees listed in Table 6.2, giving confidence that roughly the right level of uncertainty is being captured. For SA(1 s) the graph of σμ is similar but the values are slightly lower (0.39 for short distances increasing to 0.66 at 300 km).

Fig. 6.8
figure 8

σy implied by the proposed ground-motion logic tree for PGA

In total 3 × 3 × 3 = 27 GMPEs make up the population that it is assumed to represent all possible ground-motion models for application in Europe and the Middle East. When there is no additional information on earthquake ground motions in a region this complete population should be used with the weights noted above. When observations are available the default weights can be altered to reflect this additional information. This is demonstrated in the next section for Georgia, Iran and Italy.

6.5.5 Pruning the Ground-Motion Logic Tree for Three Countries

Georgia and Iran are chosen here as examples of two countries with considerably less strong-motion data in the ESM database (Fig. 6.6). All available data from each country in the interval Mw ≥ 4 and 0 ≤ rJB ≤ 300 km are used to adjust the weights of the full logic tree derived in the previous section. Because the statistical uncertainty in the family of potential GMPEs remains, those branches of the logic tree are left unchanged and the testing only alters the weights of the first two sets of branches that account for potential regional dependency.

The log-likelihood approach of Scherbaum et al. (2009) and the data from Georgia and Iran are used to update the original weights of the 9 branches related to the regional dependence. Combining these branches with the set related to the statistical uncertainty leads to the predicted PGAs shown in Fig. 6.9. Also shown are predicted PGAs applying the same technique for Italy as a demonstration for a country with much data. The result of the adjusted weights is to decrease the width of the confidence limits particularly at moderate and long distances because there are many records to modify the weights of the three models of anelastic attenuation. However, the effect of the weighting on the level of epistemic uncertainty modelled is small even for Italy (σμ only decreases from about 0.45 to 0.41 at close distances although by larger amounts at greater distances) because of the large uncertainty in average bias even when there are many records (Fig. 6.7). Therefore, the changes to the weights for the lower, middle and upper branches of the ‘regional’ uncertainty are limited. Despite this there is a change in the predicted PGA from the original logic tree. Similar results are obtained for SA(1.0 s).

Fig. 6.9
figure 9

Median predicted PGA and its 5–95% confidence limits (dashed lines) for an M 6 earthquake with respect to Joyner-Boore distance. Black curves correspond to the original logic tree and the other colours correspond to logic trees for specific countries: Georgia (red), Iran (blue) and Italy (green)

Rather than simply using all the strong-motion data available from a region to adjust the weights it may be more appropriate to use only those records from the magnitude-distance range likely to be relevant from the point of seismic hazard. Also other information, e.g. tectonic analogies, independent estimates of anelastic attenuation, stochastic models derived from weak-motion data, could be used to modify the weights. However, we should be humble about our knowledge of a region with little strong-motion data.

6.6 Conclusions

In this article I have reviewed previous approaches to develop ground-motion logic trees that account for geographically-varying epistemic uncertainty. As demonstrated by the relatively low epistemic uncertainty implied by some recent continental seismic hazard assessments, the classic approach of selecting a handful of ground-motion models from the literature can lead to inconsistencies when compared with site-specific studies. Therefore, the backbone approach is attractive as it allows epistemic uncertainty to be more easily and transparently modelled. However, the principal difficulty is calibrating this approach when lacking observations, e.g. how much do we not know about earthquake ground motions in country X where an M > 5 earthquake has never been recorded within 50 km?

To provide guidance on applying the backbone approach in national and continental scale hazard assessments I have proposed a relatively simple ground-motion logic tree to account for potential variations in average stress drop and anelastic attenuation between regions as well as the statistical uncertainty inherent in regression-based models. It is based on assuming that the variation amongst regions that are currently poorly-observed will be similar to the differences amongst regions with relatively large strong-motion databases. This is a potential weakness of the proposal because these regions are also generally those with the highest seismicity and consequently, for target areas that are tectonically stable, the range of average ground motions modelled could be too narrow. The final step in the proposed procedure is to modify the weights of the logic tree by making use of any available strong-motion data from the target region. This step slightly reduces the epistemic uncertainty and adjusts the predictions to make them more applicable to the region. When such data are not available, the full uncertainty implied by the ground-motion model is incorporated into seismic hazard assessments.

When developing ground-motion models for use within national or continental hazard assessments there is a balance to be struck about the resolution captured. At one extreme, a single logic tree could be used for all locations, while at the other, each individual seismic source (e.g. a given fault) could have its own model. In this study, each country was assumed to have its own model but as tectonics do not generally follow national boundaries this was only done for convenience. The higher the resolution the smaller the databases available to calibrate the models. These smaller databases lead to higher standard errors in the averages and the risk of modelling an event-specific (or sequence-specific) rather than a regional-specific effect. For example, do the large differences between observed ground motions between two areas of central Italy (Molise and Umbria-Mache) as evidenced by Douglas (2007), for example, mean that these areas require separate ground-motion models? Or are the relatively small sets of observations from these two regions from one or two earthquake sequences insufficient to draw conclusion?

In addition to the potential problems mentioned above concerning the calibration of the average stress drop branches as well as the spatial resolution of the models, other parts of the procedure require additional work, e.g. the use of the Kotha et al. (2016) models to capture regionally dependence in anelastic attenuation, the updating of the weights using the log-likelihood procedure and how to incorporate knowledge gained from weak-motion data. In conclusion, the approach proposed here aims to be a first-order procedure to develop ground-motion logic trees capturing geographically-varying uncertainty for use in seismic hazard assessments covering a large area (i.e. not site-specific studies).