1 Introduction

It has become increasingly evident that the conventional probabilistic methodology for calculating the risks and losses from earthquakes leads to unexpected, fatal errors. A systematic analysis shows that the actual occurrence of strong earthquakes contradicts the results of the Global Seismic Hazard Assessment Program (GSHAP, 1992–1999) (Kossobokov and Nekrasova 2012). In particular, contrary to “a 10 % chance of exceedance in 50 years”, more than 40 % of the 2,200 strong earthquakes with M ≥ 6 during 1990–2009 turned out to exceed the GSHAP peak ground acceleration values, with 94 % of such “surprises” among the 242 significant earthquakes with M ≥ 7, and 100 % for M ≥ 7.5 earthquakes. The errors in the estimation of the peak ground acceleration with a 10 % probability of being exceeded in 50 years (Giardini et al. 1999) propagate non-linearly into the errors in the expected numbers of fatalities, settlements, and population affected (Wyss et al. 2012). Regrettably, each of the top twelve deadliest earthquakes in 2000–2011 (total number of deaths exceeding 700,000 people) is tragic evidence that the results of GSHAP, as well as the underlying methodologies, are deeply flawed and, evidently, unacceptable as the basis for any kind of critical risk assessment intended to prevent earthquake disasters. The inadequacy of the GSHAP PGA map could have been established before the publication of the resulting maps in 1999 (Kossobokov and Nekrasova 2012). By analogy with testing in medicine, such a control should have been performed by the participants of this demonstration Program as a primary test of the reliability of the GSHAP results (Stein et al. 2011).

A possible alternative to the conventional probabilistic approach is provided by the Neo-Deterministic Seismic Hazard Assessment (Panza et al. 2001, 2012; Peresan et al. 2011). Notably, in the case of the recent 20 May 2012, Emilia, M5.9 earthquake (Northern Italy), the NDSHA map turned out to be more realistic than the official one, which provides the basis for the Italian building code (Peresan and Panza 2012). In order to assess whether this is just a sporadic case or a paradigmatic example, we perform a systematic analysis aimed at better understanding the performance and possible limits of the different methods with respect to past earthquakes. Specifically, the seismic hazard maps for the territory of Italy obtained with the two different approaches to seismic hazard assessment (SHA) are compared with the real seismicity.

The usefulness of any seismic hazard map, whether probabilistic or deterministic, can be determined by a deterministic comparison with reality, namely by comparing the map estimates with the actually observed earthquake effects (i.e., macroseismic intensity, PGA, PGV, etc.). In this paper, we aim to test the model estimates of ground motion given on a seismic hazard map against the actual effects of many earthquake occurrences. We are not trying to estimate the intrinsic parameters of a given model, but confine ourselves to finding out how often the value on the map was exceeded in reality, before and after its publication. This count evaluates the map’s performance, i.e., its capability of anticipating the observed ground shaking, although with a bias related to testing mainly on the learning sample. Note that, according to Kossobokov and Nekrasova (2012), in the case of testing the GSHAP map on data from 1990–1999 (before publication) and from 2000–2009 (after publication), this bias is negligibly small compared to the observed misfit. As shown by Albarello and D’Amico (2008) and Mucciarelli et al. (2008), such a counting procedure allows checking some internal parameters of the model used to compile the map, e.g., the expected repeat time of exceedance, and, in principle, rejecting the model if this number contradicts expectations. Albarello and D’Amico (2008) have clearly demonstrated that “a significant mismatch exists between peak ground acceleration values characterized by an exceedance probability of 10 % in 30 years and what has actually been observed at 68 accelerometric stations located on stiff soil, where continuous seismicity monitoring has been performed in the last 30 years.” In this study, we do not go that far, but only compare the performances of the different maps with each other.
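As an illustration of the counting procedure described above, the following Python sketch computes the share of grid nodes at which the observed value exceeds the map value. It assumes, hypothetically, that both the map and the observations are already expressed on the same scale (e.g., integer MCS intensity) and keyed by the same grid-node coordinates; the function name `exceedance_rate` and the toy data are ours, not part of any published tool.

```python
from typing import Dict, Tuple

Node = Tuple[float, float]  # (latitude, longitude) of a grid node, in degrees


def exceedance_rate(model: Dict[Node, int], observed: Dict[Node, int]) -> float:
    """Share of nodes covered by both inputs where the observed value exceeds the map value."""
    common = model.keys() & observed.keys()  # nodes present in both dictionaries
    if not common:
        raise ValueError("the map and the observations share no grid nodes")
    exceeded = sum(1 for node in common if observed[node] > model[node])
    return exceeded / len(common)


# Hypothetical toy data: map intensities and observed intensities at three nodes.
model_map = {(42.0, 13.0): 8, (42.2, 13.0): 9, (42.4, 13.0): 7}
observations = {(42.0, 13.0): 9, (42.2, 13.0): 8, (42.4, 13.0): 7}
print(exceedance_rate(model_map, observations))
```

Only nodes present in both inputs are counted, which mirrors the fact that some maps do not cover every node.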

We would like to clearly distinguish the expected repeat time of exceedance, which is an internal parameter of a special class of models, from the time period over which the model map is applicable. In fact, a probability of 10 % in 50 years corresponds to a return period of 475 years under a specific probabilistic assumption (i.e., a Poisson distribution), and this number is a control parameter of the model PGA values on the map. The validity of the underlying probabilistic model could be tested by checking whether the observed rate of exceedance agrees with or contradicts the model expectation over a period of time long enough to make a statistically reliable judgment (Albarello and D’Amico 2008; Kossobokov and Nekrasova 2012).
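Under the Poisson assumption mentioned above, a probability p of at least one exceedance in an exposure time t corresponds to a return period T = −t / ln(1 − p). The short sketch below (function name hypothetical) reproduces the 475- and 2,475-year figures used for the 10 % and 2 % in 50 years maps; it illustrates the model's control parameter only and says nothing about the period over which a map is applicable.

```python
import math


def poisson_return_period(p_exceed: float, exposure_years: float) -> float:
    """Return period T satisfying p = 1 - exp(-t / T) for a Poisson process."""
    if not 0.0 < p_exceed < 1.0:
        raise ValueError("the exceedance probability must lie strictly between 0 and 1")
    # log1p(-p) = ln(1 - p), computed accurately for small p
    return -exposure_years / math.log1p(-p_exceed)


print(round(poisson_return_period(0.10, 50.0)))  # 475
print(round(poisson_return_period(0.02, 50.0)))  # 2475
```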

We presume that any hazard map is applicable to, and fits, the past evidence and, hopefully, predicts hazardous events in the future. Probabilistic estimates depend on an accepted probability model, which might be inadequate for practical applications. The dependence of the estimated ground shaking on the chosen control parameters is one of the basic disadvantages of SHA. The outcomes of the strongest earthquakes, in particular, may be stochastic, but they happen with certainty (i.e., 100 %), despite being infrequent.

2 Data

In this study, we consider two official seismic hazard maps for Italy (available at http://zonesismiche.mi.ingv.it/mappa_ps_apr04/italia.html) within the boundaries from 36°N to 48°N and from 6°E to 20°E. The values of peak ground acceleration (PGA), sampled at the grid points of a regular 0.2° × 0.2° mesh, are obtained by probabilistic seismic hazard assessment (PSHA), as defined for a probability of exceedance of 10 % in 50 years (PGA10 % map), which is associated with a return period of 475 years, and for a probability of exceedance of 2 % in 50 years (PGA2 % map), which is associated with a 2,475-year return period (Meletti and Montaldo 2007). Both maps were compiled at the Istituto Nazionale di Geofisica e Vulcanologia (INGV) based on the catalog CPTI04 (Gruppo di Lavoro 2004).

The other three maps that we consider for an overall comparison, over the same grid settings, are those of the design ground acceleration (DGA) based on the neo-deterministic seismic hazard assessment, NDSHA (Panza et al. 2001). In particular, these are the map provided by the standard NDSHA method (maximum DGA map) and the two maps of ground shaking associated with the same return periods of 475 and 2,475 years as the PGA maps (referred to as the DGA10 % and DGA2 % maps, respectively). These last two maps were obtained by integrating earthquake recurrence into the NDSHA method (Folladore 2010); specifically, the recurrence of each NDSHA source acts as a weight that accounts for the average annual number of occurrences of that source. A synthetic seismogram is calculated for every source–receiver pair within a maximum distance of 150 km. Each synthetic seismogram carries the information about the recurrence of the corresponding source, giving the yearly number of times the receiver is likely to experience that particular shaking. All the synthetic events are computed in the same way as in the traditional NDSHA procedure, but in addition each is associated with its recurrence, a temporal parameter indicating the frequency of the synthetic events in time (Magrin et al. 2013; Peresan et al. 2013). At each receiver site, the synthetic seismograms from the different sources are considered together with their estimated recurrence; thus, it is possible to assign to a specified range of a ground shaking parameter the corresponding recurrence, obtained by summing up the contributions from the different sources. Accordingly, besides the standard NDSHA maps, maps of ground shaking associated with a specified average recurrence are defined; by analogy with the traditional PSHA maps, return periods of 475 and 2,475 years are considered.
The map with a return period of 475 years has, in general, fewer points than that with T = 2,475 years because it refers to lower values of ground shaking, and hence it may involve signals from farther sources, which have no assigned recurrence and therefore could not be included in the map. Thus, at each grid node of the same regular 0.2° × 0.2° mesh covering the Italian territory, there are up to five seismic hazard estimates given by the five maps: PGA10 %, PGA2 %, DGA, DGA10 %, and DGA2 %.
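How the summed recurrences are turned into a single map value for a given return period is not detailed here, so the following Python sketch rests on an assumed rule: at one receiver, each source contributes one ground shaking value and an annual rate; rates are summed per shaking range, accumulated from the strongest range downwards to obtain an annual exceedance rate, and the map value is taken as the lower bound of the highest range whose cumulative rate reaches 1/T. Sources without an assigned recurrence are skipped, as described above. The function name, the range bounds, and the rates in the example are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def shaking_for_return_period(
    contributions: Sequence[Tuple[float, Optional[float]]],
    range_lower_bounds: Sequence[float],
    return_period_years: float,
) -> Optional[float]:
    """Lower bound of the highest shaking range whose cumulative annual rate >= 1/T.

    contributions: (ground shaking value, annual recurrence rate or None) per source.
    range_lower_bounds: increasing lower bounds of the ground shaking ranges.
    Returns None if no range reaches the target rate at this receiver.
    """
    rates: List[float] = [0.0] * len(range_lower_bounds)
    for shaking, rate in contributions:
        if rate is None:  # sources without an assigned recurrence are skipped
            continue
        idx = bisect_right(range_lower_bounds, shaking) - 1
        if idx >= 0:  # shaking below the lowest range is ignored
            rates[idx] += rate
    target = 1.0 / return_period_years
    cumulative = 0.0
    # Walk from the strongest range down, accumulating annual exceedance rates.
    for idx in range(len(range_lower_bounds) - 1, -1, -1):
        cumulative += rates[idx]
        if cumulative >= target:
            return range_lower_bounds[idx]
    return None


# Hypothetical ranges (g) and contributions (shaking in g, annual rate or None).
BOUNDS = [0.02, 0.04, 0.08, 0.16, 0.32]
CONTRIBUTIONS = [(0.05, 1 / 1000), (0.10, 1 / 3000), (0.20, 1 / 10000), (0.03, None)]
for T in (475, 1000, 2475):
    print(T, shaking_for_return_period(CONTRIBUTIONS, BOUNDS, T))
```

With these toy rates no value is defined for T = 475 years simply because the hypothetical sources are too infrequent; the example is not meant to reproduce the far-source mechanism described in the text.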

To characterize the real seismic activity, we use information from the CPTI04 catalog (Gruppo di Lavoro 2004) of historical seismic events with intensity VI or more, from 217 B.C. up to 2002 A.D., for the territory of investigation. For each grid point of the same regular 0.2° × 0.2° mesh on the territory of Italy, including the Apennines and the Alpine region, we find the maximal observed intensity I 0 (CPTI04) in the 1/4° square centered at this point. Since the values of intensity in CPTI04 can be semi-integers, we round them down for our analysis. As a result, we get the map of the observed intensity I obs = max{I 0 (CPTI04)} to be used in comparison with the five seismic hazard estimates for the Italian territory.
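The construction of I obs at one grid node can be sketched as follows, assuming the catalog has been reduced to (latitude, longitude, I0) triples. The square is taken as ±0.125° around the node (side 1/4°), events on its border are counted, and I0 values are rounded down before the threshold of VI is applied; these details, like the function name and the toy events, are our assumptions.

```python
import math
from typing import Iterable, Optional, Tuple


def observed_intensity(
    node: Tuple[float, float],
    events: Iterable[Tuple[float, float, float]],
    square_side_deg: float = 0.25,
    min_intensity: int = 6,
) -> Optional[int]:
    """Maximum epicentral intensity I0, rounded down, within a square centred on a node.

    events: (latitude, longitude, I0) triples; I0 may be semi-integer (e.g. 7.5).
    Returns None if no event with rounded I0 >= min_intensity falls inside the square.
    """
    lat0, lon0 = node
    half = square_side_deg / 2.0
    best: Optional[int] = None
    for lat, lon, i0 in events:
        if abs(lat - lat0) <= half and abs(lon - lon0) <= half:
            i_floor = math.floor(i0)  # semi-integer values are rounded down
            if i_floor >= min_intensity and (best is None or i_floor > best):
                best = i_floor
    return best


# Hypothetical events: two inside the square around (42.0, 13.0), two outside.
catalog = [(42.05, 13.10, 7.5), (42.10, 12.95, 8.0), (42.30, 13.00, 10.0), (41.00, 13.00, 9.0)]
print(observed_intensity((42.0, 13.0), catalog))
```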

3 Method

Table 1 presents the relation between the intensity in the Mercalli–Cancani–Sieberg (MCS) scale and the ground acceleration values, derived from the comparison between the map of maximum felt intensities in Italy (Boschi et al. 1995) and the maximum DGA obtained from modeling the ground motion generated by past seismicity (Panza et al. 2001). These relations are used for converting the ground motion data from PGA10 %, PGA2 %, DGA, DGA10 %, and DGA2 % into MCS scale values. The five pairs formed by the MCS values on a ground motion model map, I mGA (i.e., I PGA10%, I PGA2%, I DGA, I DGA10%, I DGA2%), and the actual one, I obs, are analyzed to find inconsistencies.

Table 1 Relation between I MCS and model ground motion, mGA, corresponding either to PGA(g) or DGA(g) for the territory of Italy (after Indirli et al. 2011)

According to Indirli et al. (2011), the link between PGA and MCS over the Italian territory, summarized in Table 1, has survived rigorous testing against empirical seismological evidence since its publication (Panza et al. 1997). Moreover, these conversion rules have been found applicable not only in Italy but also in other seismic regions worldwide. Note that the robust relationship between PGA and MCS given in Table 1 allows for an uncertainty of a factor of two in the ranges of PGA attributed to MCS integers (specifically, the upper limit of each range is twice its lower limit). Naturally, this kind of uncertainty is intrinsic to sizing earthquakes, either in terms of energy or of macroseismic effect (Bormann 2012).
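Since the values of Table 1 are not reproduced here, the conversion can only be sketched generically. Assuming, as stated above, that each MCS range spans a factor of two in acceleration (upper limit twice the lower limit), and further assuming that consecutive ranges are contiguous, the degree follows from the lower limit of the lowest tabulated range, which is left as a placeholder parameter (in g, following the Table 1 caption). Where Table 1 deviates from exact doubling, a direct lookup of its ranges should be used instead; the function name and the bound used in the test are hypothetical.

```python
from typing import Optional


def mcs_from_acceleration(
    acc: float,
    lowest_bound: float,
    lowest_intensity: int = 6,
    max_intensity: int = 12,
) -> Optional[int]:
    """MCS degree for a ground acceleration, assuming each range doubles the previous one.

    lowest_bound is a placeholder for the lower limit of the lowest tabulated
    range in Table 1; acc must be in the same unit (e.g., g).
    """
    if acc < lowest_bound:
        return None  # below the lowest tabulated range
    intensity = lowest_intensity
    upper = 2.0 * lowest_bound  # upper limit = twice the lower limit
    while acc >= upper and intensity < max_intensity:
        intensity += 1
        upper *= 2.0  # the next range starts where the previous one ends
    return intensity
```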

As already mentioned in the Introduction, we compare the maps by finding out whether or not the macroseismic effects of real earthquakes exceed the values on the model maps. Our counting procedure is rather general and has already been used to find a significant mismatch “between peak ground acceleration values characterized by an exceedance probability of 10 % in 30 years and what has actually been observed at 68 accelerometric stations located on stiff soil” (Albarello and D’Amico 2008).

4 Results of comparison with intensity at epicenters

Figure 1 presents the six intensity maps obtained (1) from the real seismicity, I obs (Fig. 1a), and (2) from the ground motion estimates I PGA10%, I PGA2%, I DGA, I DGA10%, I DGA2% (Fig. 1b–f, respectively). The percentage of points with intensity VIII or more for each of these maps is summarized in Table 2. One can see that the PGA10 %, PGA2 %, DGA, and DGA2 % model maps assign intensity VIII or larger to more than 90 % of the territory of investigation, whereas the I obs map of intensities (based on about 2,000 years of observations) does so for just 38 %.

Fig. 1 The intensity maps in comparison: (a) I obs, obtained from the real seismicity (CPTI04 catalog); PSHA method maps: (b) 475-year return period map, I PGA10%; (c) 2,475-year return period map, I PGA2%; NDSHA method maps: (d) standard, I DGA; (e) 475-year return period map, I DGA10%; (f) 2,475-year return period map, I DGA2%

Table 2 The percentage of I MCS from different ranges in the six intensity maps

There are 564 grid points with I obs ≥ VI that have at least one model intensity value I mGA, where I mGA is one of I PGA10%, I PGA2%, I DGA, I DGA10%, and I DGA2%. Among them, there are 564 I PGA10% values, 564 I PGA2% values, 564 I DGA values, 372 I DGA10% values, and 458 I DGA2% values, out of 800, 800, 800, 448, and 590, respectively.

The empirical density distributions of the difference, ΔI = I mGA − I obs, derived by comparison of the model and observed intensity maps, for each of the five methods under consideration, are shown in Fig. 2a. The density distributions of ΔI are presented as functions of I obs. Color indicates one of the three intervals of the observed intensity: VI ≤ I obs ≤ VII (blue), VIII ≤ I obs ≤ IX (green), and I obs ≥ X (red).
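The densities shown in Fig. 2a can be sketched as follows, assuming the model and observed intensities have already been paired at the grid nodes where both exist. The three classes of I obs follow the text; the function names and the example pairs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def obs_class(i_obs: int) -> str:
    """The three classes of observed intensity used in Fig. 2a."""
    if i_obs <= 7:
        return "VI-VII"
    if i_obs <= 9:
        return "VIII-IX"
    return ">=X"


def delta_i_densities(pairs: Iterable[Tuple[int, int]]) -> Dict[str, Dict[int, float]]:
    """Empirical density of dI = I_model - I_obs within each class of I_obs.

    pairs: (I_model, I_obs) at grid nodes where both exist, with I_obs >= 6.
    """
    counts: Dict[str, Counter] = {}
    for i_model, i_obs in pairs:
        counts.setdefault(obs_class(i_obs), Counter())[i_model - i_obs] += 1
    densities: Dict[str, Dict[int, float]] = {}
    for cls, c in counts.items():
        total = sum(c.values())
        densities[cls] = {d: n / total for d, n in sorted(c.items())}
    return densities


# Hypothetical (I_model, I_obs) pairs
example = [(9, 6), (8, 6), (9, 7), (9, 8), (8, 9), (10, 10)]
print(delta_i_densities(example))
```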

Fig. 2 Intensity difference, ΔI = I mGA − I obs: (a) as a function of I obs, where I mGA is the intensity from a model map mGA and I obs is the observed macroseismic intensity; (b) the balance between under- and overestimation of ground shaking by each model

In Fig. 2a, we clearly observe a systematic shift of the density distribution, with the maximum moving from positive towards negative values of ΔI for increasing I obs; in particular, the maximum of ΔI shifts from 3 to 1 and 0 for PGA10 %, from 4 to 2 and 1 for PGA2 %, from 3 to 1 and 0 for DGA, from 1 to 0 and −1 for DGA10 %, and from 2 to 1 and 0 for DGA2 %. For each SHA model map, Fig. 2b shows the balance of positive and negative values in the distribution of ΔI. According to the sum of under- and overestimation errors illustrated in Fig. 2b, the five maps can be ordered as follows: 80.38 % for DGA10 %, 84.50 % for DGA2 %, 89.54 % for DGA, 91.49 % for PGA10 %, and 95.92 % for PGA2 %. The statistical significance of the observed differences and their interpretation should be considered in further investigations, which will require a special analysis of the basic complexity and dependencies of the observed ground shaking and earthquake locations, beyond a straightforward comparison of the distribution functions. However, it can be observed that both the PGA10 % and PGA2 % maps provide a higher rate of overestimates (ΔI > 0) and a lower rate of presumably correct estimates (ΔI ≤ 0) than the DGA maps. Specifically, the ratio between the two rates equals 7.06, 21.56, 6.83, 1.66, and 4.09 for PGA10 %, PGA2 %, DGA, DGA10 %, and DGA2 %, respectively.
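The two summary measures used above can be computed from a list of ΔI values as sketched below. We assume, as our reading of the text, that the “sum of under- and overestimation errors” is the percentage of nodes with ΔI ≠ 0, and that the “ratio between the two rates” divides the number of nodes with ΔI > 0 by the number with ΔI ≤ 0; the function name and the example values are hypothetical.

```python
from typing import Iterable, Tuple


def error_balance(delta_i: Iterable[int]) -> Tuple[float, float, float, float]:
    """Return (% over, % under, % over + % under, over / non-over ratio) for dI values."""
    values = list(delta_i)
    n = len(values)
    if n == 0:
        raise ValueError("no dI values")
    over = sum(1 for d in values if d > 0)   # model intensity above the observed one
    under = sum(1 for d in values if d < 0)  # model intensity below the observed one
    over_pct = 100.0 * over / n
    under_pct = 100.0 * under / n
    # Ratio between the rate of dI > 0 and the rate of dI <= 0
    ratio = over / (n - over) if over < n else float("inf")
    return over_pct, under_pct, over_pct + under_pct, ratio


print(error_balance([2, 1, 0, -1, 3, 0, 1, 1]))
```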

Figure 3a shows the empirical cumulative distributions F i (I) of intensity VI or more from the CPTI04 catalog, I obs, and from the five model estimates I PGA10%, I PGA2%, I DGA, I DGA10%, and I DGA2%. The fit between the F i related to DGA10 % and the CPTI04 catalog data is remarkable at intensities IX and X when compared with the other four models. Figure 3b shows the five curves of the difference F i (I) − F 0(I) between each of the models and the real data. The maximum absolute difference of the empirical distributions is commonly used in the Kolmogorov–Smirnov two-sample criterion to decide whether or not the values from the two samples are drawn from the same statistical distribution of independent variables. The two-sample Kolmogorov–Smirnov statistic λ K−S applied to a model and the real catalog is defined as

$$\lambda_{K-S}(D, n, m) = \left[\frac{nm}{n + m}\right]^{1/2} D,$$

where D = max |F i (I) − F 0(I)| is the maximum absolute difference between the empirical distributions F i (I) and F 0(I), I = VI, VII, VIII, IX, X, XI, of the two samples, whose sizes are n and m, respectively. Table 3 summarizes the results of the comparison for each of the five model maps in terms of D and λ K−S.
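The two-sample statistic defined above can be sketched for integer intensity samples as follows, together with the standard asymptotic Kolmogorov tail probability. Note that this asymptotic distribution is derived for continuous variables; with discrete intensities the test is known to be conservative, so the resulting p-values should be read as upper bounds. Function names and the toy samples are hypothetical.

```python
import math
from typing import Sequence, Tuple

INTENSITIES = range(6, 12)  # I = VI, VII, VIII, IX, X, XI, as in the text


def ecdf(sample: Sequence[int], intensity: int) -> float:
    """Empirical cumulative distribution: share of the sample with value <= intensity."""
    return sum(1 for x in sample if x <= intensity) / len(sample)


def ks_two_sample(model: Sequence[int], observed: Sequence[int]) -> Tuple[float, float]:
    """Return (D, lambda_KS) with D = max |F_i(I) - F_0(I)| over the intensity grid."""
    d = max(abs(ecdf(model, i) - ecdf(observed, i)) for i in INTENSITIES)
    n, m = len(model), len(observed)
    return d, math.sqrt(n * m / (n + m)) * d


def kolmogorov_sf(lam: float, terms: int = 100) -> float:
    """Asymptotic P(lambda_KS > lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2).

    The truncated series is accurate for lam above roughly 0.2.
    """
    if lam <= 0.0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


d, lam = ks_two_sample([8, 8, 9, 9, 10], [6, 7, 8, 8, 9])
print(d, lam, kolmogorov_sf(lam))
```

As a check on the series, λ = 1.36 gives a tail probability close to 5 % and λ = 1.63 close to 1 %, the familiar critical values of the asymptotic test.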

Fig. 3 The empirical probability functions of the macroseismic intensity (a) and the difference between a model and the real intensities, F i (I) − F 0(I) (b)

Table 3 The Kolmogorov–Smirnov two-sample statistic λ K−S applied to a model versus the real seismic intensity maps

The K–S test results quantitatively confirm the observations from Fig. 3: the values of seismic intensity attributed by any of the models considered and those reported by CPTI04 are unlikely to come from the same distribution (the significance level is by far below 1 %, i.e., the confidence exceeds 99 %). On the other hand, the DGA10 % map appears to be “the best fit” among the five models available. When looking at Fig. 3a, one may think that the distribution based on PGA10 % also fits the observed one well. However, this holds only for the much smaller sample of intensity X or higher (which is just 3 % of the total), while PGA10 % is evidently outperformed by each of the three DGA-based models at intensities VI–IX. This is better understood when looking at Fig. 3b, where the quantified value of the fit (whose maximum deviation from 0 determines the Kolmogorov–Smirnov statistic) is plotted for each model.

Finally, we compare the hazard maps I PGA10%, I PGA2%, I DGA, I DGA10%, and I DGA2% with the locations of seismic events with maximum epicentral intensity VIII or larger. According to the CPTI04 catalog, there are 204 earthquakes of that size that can be associated with the grid points of the model intensity maps for PGA10 %, PGA2 %, and DGA, whereas their number reduces to 153 for DGA10 % and 177 for DGA2 %. The results are summarized in Table 4. As a reference observational data set, the comparative analysis is also extended to the macroseismic database DBMI04, used in the compilation of the Italian catalog CPTI04 (Stucchi et al. 2007). Table 4a lists the empirical counts required for the calculation of the statistical significance P given in Table 4b, estimated as:

$$P = 1 - B\left( {N_{s + } - 1,\,N_{s} ,\,N_{I + } /N_{all} } \right)$$
(1)

where B(m, n, p) is the standard binomial distribution function, which provides the probability of m or fewer successes at random in n trials, with probability p of success in a single trial; N s+ is the number of strong seismic events in agreement with the intensity map, and N s is the total number of such events in the territory under investigation; N all is the total number of grid nodes of an intensity map; and N I+ is the number of nodes with intensity I or more.
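Eq. (1) can be evaluated with the standard library alone, as in the following sketch (function names hypothetical). P is the probability of obtaining N s+ or more agreements by chance if the strong events fell on grid nodes at random with probability N I+/N all; a map that assigns intensity I or more to almost the whole territory therefore yields P close to 1.

```python
import math


def binomial_cdf(m: int, n: int, p: float) -> float:
    """B(m, n, p): probability of m or fewer successes in n independent trials."""
    if m < 0:
        return 0.0
    if m >= n:
        return 1.0
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(m + 1))


def significance(n_s_plus: int, n_s: int, n_i_plus: int, n_all: int) -> float:
    """Eq. (1): P = 1 - B(N_s+ - 1, N_s, N_I+ / N_all), i.e. P(at least N_s+ hits)."""
    return 1.0 - binomial_cdf(n_s_plus - 1, n_s, n_i_plus / n_all)


# Hypothetical counts: 8 of 10 events agree with a map covering 90 % of the nodes.
print(significance(8, 10, 9, 10))
```

If SciPy is available, `scipy.stats.binom.sf(n_s_plus - 1, n_s, p)` returns the same tail probability.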

Table 4 The binomial test of the five hazard maps and macroseismic observations (last row) against earthquakes from the four intensity ranges

One can see that, according to the binomial probability test P, for the intensity range VIII, the correspondence between the reported intensities and all hazard maps can be attributed to random coincidence. As concerns the fit in the other three, higher ranges of intensity, the test P might be indicative of non-randomness for each of the five model maps. Here, we must emphasize that this is an application of the binomial test to earthquake data that were available and presumably used for the seismic hazard estimates (i.e., not independent data); therefore, the obtained probabilities are by no means unbiased values, although they might be used as a quantitative indicator of the model fit to the real data.

5 Results of comparison with improved seismic and reported macroseismic data

For the three basic maps PGA10 %, PGA2 %, and DGA (the other two DGA maps are omitted here due to their incomplete coverage of the territory considered), we make an additional comparison with an updated, presumably improved, compilation of seismic data. For this purpose, we used the catalog of earthquakes of moderate or larger size (magnitude 5 or larger) with depth of 70 km or less, reported in the up-to-date version of UCI2001 (Peresan and Panza 2002) for the Italian territory. In addition to the comparison at epicenters, we also consider a comparison with direct macroseismic observations. Specifically, we used the database of direct macroseismic observations DBMI04 (Stucchi et al. 2007), rounded to 0.5 units of intensity. The corresponding control maps are shown in Fig. 4.

Fig. 4 The intensity maps in comparison: (a) I obs, same as Fig. 1a; (b) I DBMI04, database of reported macroseismic data

Each of the 233 earthquakes in UCI2001 from 1900 to July 2012 has a value of mGA at a distance of 20 km or less from its epicenter (ϕ, λ). The relation used by Wyss et al. (2012)

$${\text{M}}_{mGA} = 0.85 \times {\text{Ln}}\left( {\text{mGA}} \right) + 5.59 \pm 0.17$$
(2)

provides the necessary link for comparing the model ground shaking, estimated by mGA, with the maximum magnitude (M max) reported in UCI2001. Note that the accuracy of the magnitude determination in Eq. (2) is comparable to the intrinsic uncertainty of magnitude determination from seismometric data (Bormann 2012).
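A minimal sketch of the magnitude link of Eq. (2) and of the ΔM classification follows. It takes mGA in whatever unit Eq. (2) was calibrated for by Wyss et al. (2012), which this section does not state, so no unit conversion is attempted; the ±0.17 uncertainty is not propagated, and the 0.5 threshold for “large” violations follows the text. Function names are hypothetical.

```python
import math


def magnitude_from_ground_acceleration(mga: float) -> float:
    """Central value of Eq. (2): M_mGA = 0.85 ln(mGA) + 5.59 (the +/- 0.17 is not propagated)."""
    if mga <= 0.0:
        raise ValueError("mGA must be positive")
    return 0.85 * math.log(mga) + 5.59


def magnitude_violation(m_max: float, mga: float, large: float = 0.5) -> str:
    """Classify dM = M_max - M_mGA as 'expected' (dM <= 0), 'surprise' (dM > 0) or 'large'."""
    dm = m_max - magnitude_from_ground_acceleration(mga)
    if dm > large:
        return "large"
    if dm > 0.0:
        return "surprise"
    return "expected"


print(magnitude_violation(6.0, 1.0), magnitude_violation(6.2, 1.0), magnitude_violation(5.5, 1.0))
```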

Table 5 lists the number of earthquakes that violate mGA-map predictions, i.e., those with a positive difference ΔM = M max − M mGA for the pairs (M max, M mGA). The obtained results are qualitatively the same as for the global comparison (Kossobokov and Nekrasova 2012). Specifically, none of the major earthquakes with magnitude 7 or more was anticipated by any of the three analyzed maps. However, for the PGA2 % and DGA maps, a difference ΔM above half a magnitude unit appears only once in the four trials, which might be acceptable, taking into account the intrinsic limitations of magnitude determination. For strong earthquakes with magnitude 6 or more, 18 events were not anticipated by the PGA10 % map, 8 by the PGA2 % map, and 11 by the DGA map, which is 69.23, 30.77, and 42.31 % of the total number of these seismic events, respectively. The corresponding percentages of ΔM > 0.5 in this case are 30.77, 3.85, and 11.54 % for PGA10 %, PGA2 %, and DGA, respectively.

Table 5 Number of shallow earthquakes that violate mGA-map predictions

Thus, in terms of efficiency, measured by the sum of the error percentages (a smaller value corresponds to better efficiency), i.e., 102.86 % = 33.63 % + 69.23 % for PGA10 %, 99.65 % = 68.88 % + 30.77 % for PGA2 %, and 81.81 % = 39.50 % + 42.31 % for DGA, the neo-deterministic model appears to outscore the probabilistic ones. Similarly, the efficiency measured accounting only for “large errors” in expectation (ΔM > 0.5) is estimated as 64.40 % = 33.63 % + 30.77 % for PGA10 %, 72.73 % = 68.88 % + 3.85 % for PGA2 %, and 51.04 % = 39.50 % + 11.54 % for DGA.
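The efficiency sums above can be reproduced with a few lines of arithmetic. We read the first term of each sum as a territory percentage reported for each map (presumably the share of territory where the map anticipates events of that size) and the second as the percentage of missed events; this reading, the function name, and the total of 26 events with M ≥ 6 (implied by 18/26 = 69.23 %, 8/26 = 30.77 %, and 11/26 = 42.31 %) are our assumptions.

```python
def error_sum_percent(territory_pct: float, missed: int, total_events: int) -> float:
    """Sum of the two error percentages: territory share plus share of missed events."""
    return territory_pct + 100.0 * missed / total_events


TOTAL_M6 = 26  # events with M >= 6, implied by the reported percentages
MAPS = [("PGA10%", 33.63, 18, 8), ("PGA2%", 68.88, 8, 1), ("DGA", 39.50, 11, 3)]
for name, territory_pct, missed, missed_large in MAPS:
    all_errors = error_sum_percent(territory_pct, missed, TOTAL_M6)
    large_only = error_sum_percent(territory_pct, missed_large, TOTAL_M6)  # dM > 0.5 only
    print(name, round(all_errors, 2), round(large_only, 2))
```

The printed values match the sums reported above (102.86, 99.65, and 81.81 %; 64.40, 72.73, and 51.04 %).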

Table 6 lists the number of intensity records that violate mGA-map predictions, i.e., those with a positive difference ΔI = I DBMI04 − I mGA, where I mGA is the value attributed to the nearest SHA-map grid point within 20 km of the I DBMI04 data point location (Fig. 4b). The degree of violation is qualitatively the same as in Tables 4c and 5, obtained using the CPTI04 and UCI2001 catalogs. In particular, all 65 reports of intensity XI or more in DBMI04 are “surprises” for PGA10 %, whereas all of them are expected by the PGA2 % map in 33.75 % of the territory considered. The performance of the DGA map is characterized by 12.31 % of “surprises” that fall outside 15.75 % of the territory. For intensity X or more, there are only 27 points out of the total 535 with ΔI > 0 for the DGA map, which is 5.05 %. Similarly, there are 37.57 % of “surprises” for the PGA10 % map and 0.37 % for the PGA2 % map. For intensity VIII or more, ΔI > 0 in 208 out of 4,421 cases for PGA10 %, in 3 for PGA2 %, and in 56 for DGA, i.e., 4.70, 0.07, and 1.27 %, in 97.88, 100, and 96.88 % of the territory, respectively.
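The matching of DBMI04 reports to the nearest SHA-map grid point within 20 km can be sketched as below. The great-circle (haversine) distance on a spherical Earth of radius 6371 km is our assumption, since the text does not say how distances were computed; function names and the toy data are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_node_value(
    point: Tuple[float, float], grid: Dict[Tuple[float, float], int], max_km: float = 20.0
) -> Optional[int]:
    """Map value at the nearest grid node within max_km of the point, or None."""
    best, best_d = None, math.inf
    for (lat, lon), value in grid.items():
        d = haversine_km(point[0], point[1], lat, lon)
        if d <= max_km and d < best_d:
            best, best_d = value, d
    return best


def count_surprises(
    reports: Iterable[Tuple[float, float, float]], grid: Dict[Tuple[float, float], int]
) -> Tuple[int, int]:
    """(reports with I_DBMI04 - I_mGA > 0, reports matched to a node within 20 km)."""
    surprises = matched = 0
    for lat, lon, intensity in reports:
        i_map = nearest_node_value((lat, lon), grid)
        if i_map is None:
            continue  # no grid node within 20 km
        matched += 1
        if intensity - i_map > 0:
            surprises += 1
    return surprises, matched


grid = {(42.0, 13.0): 8, (42.2, 13.0): 9}
reports = [(42.05, 13.0, 9), (42.18, 13.0, 9), (43.0, 13.0, 10)]
print(count_surprises(reports, grid))
```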

Table 6 Number of DBMI04 intensity that violate mGA-maps prediction

Thus, once again, in terms of the sum of the error percentages (Tables 4c, 5, and 7), the neo-deterministic model appears to outscore the probabilistic ones.

Table 7 Sum of errors for the three model maps compared to DBMI04

It could be argued that the comparison of macroseismic observations with model macroseismic parameters presented in this section is sufficient and the only one to be considered. However, a single earthquake may invalidate more than one value on a given map, so that testing the model hypothesis would be biased by dependencies in the control sample. That is why, in the main part of our study, we attribute to each seismic event only one value of macroseismic intensity, namely the empirical MCS intensity at the epicenter, I 0, provided in the CPTI04 catalog.

6 Conclusions

The comparison of the model intensity maps against the real seismic activity in Italy reveals many discrepancies regarding several aspects of the modeled distribution of seismic ground shaking in space and size. We repeated our analysis using a coarser 0.5° × 0.5° mesh, as well as the Italian macroseismic database DBMI04 (Stucchi et al. 2007), confirming the stability of the results and the following conclusions:

  • the estimates of seismic intensity attributed by any of the five models considered, including the official seismic hazard map, and those reported in the Italian databases of empirical observations could hardly arise from the same distribution;

  • models (except for PGA10 %, at the cost of a large number of “surprises”) generally provide rather conservative estimates, which tend to overestimate the hazard, particularly for the lower intensity events, and yet do not guarantee avoiding errors;

  • probabilistic maps have a higher tendency to overestimate the hazard, with respect to the corresponding deterministic maps;

  • in terms of efficiency in anticipating ground shaking, measured by the sum of the error percentages, the neo-deterministic models appear to outscore the probabilistic ones and might be a better fit to the real seismicity.

The statistical significance of the detected inconsistencies between model and observed intensities and their interpretation should be addressed in further investigations of the earthquake phenomenon, in particular of the predictability of the maximum ground shaking.

The sum of errors appears to be rather high for any of the models considered; e.g., for intensity VIII it is about 100 %, and for intensity IX it exceeds 80 %. As already mentioned, this might be indicative of the limited predictive understanding of ground shaking phenomena, due to the short history of instrumental seismology and, in particular, to the rules accepted for extrapolation. Furthermore, the observed “best fit” of the empirical distribution function of the DGA10 % map to the I obs and DBMI04 ones (Fig. 3a) and the simultaneous non-randomness of this result (Table 4b) also suggest that maps associated with a short return period (i.e., 475 years) may well describe past seismicity but might fail in predicting future hazard, as happens with the PGA10 % map. Thus, the obtained results might also be indicative of a fundamental misfit of the generally accepted uniform rules of homogeneous smoothing applied to observations on top of the naturally fractal system of blocks-and-faults with evidently heterogeneous structure and rheology.

Because seismic hazard maps seek to predict the shaking that would actually occur, testing the available models for SHA, including earthquake forecasts and predictions (Jordan et al. 2011; Peresan et al. 2012), against the real seismic activity must become a necessary precondition of any responsible seismic hazard and risk estimation. Otherwise, the use of untested maps may lead to the crime of negligence, similar to medical malpractice, although at a much higher level of simultaneous losses (Wyss et al. 2012).