1 Introduction

Operational issues are a pressing concern in earthquake forecasting as well as in seismic hazard assessment, as fatally evidenced by recent destructive events, including the Tohoku (Japan, M 9.0, 11 March 2011), Haiti (M 7.3, 12 January 2010), and L’Aquila (M 6.3, 6 April 2009) earthquakes. To be effective, any forecasting tool must demonstrate its capability to anticipate large earthquake occurrences and the related ground shaking, a result that can be attained only through a rigorous verification and validation process. The “usefulness” of a method can then be judged in relation to the specific mitigation actions, of different levels (from low-key actions to evacuation), that it can support.

The L’Aquila earthquake, which was preceded by a remarkable swarm and by claims of prediction, eventually motivated the setting up of the International Commission on Earthquake Forecasting (ICEF), with the task of providing an overview of current knowledge and consistent recommendations for the development and use of operational earthquake forecast/prediction. The document produced by the ICEF Commission (ICEF Report, Jordan et al. 2011) certainly represents a relevant effort in collecting information, and it deserves great attention as a starting point for a critical discussion of the different forecast/prediction methodologies.

With this paper, we aim to open a discussion and to complement the ICEF summary and recommendations, focussing on (a) the definition of forecast/prediction, (b) the validation of forecasts/predictions, (c) the use of information from earthquake forecast/prediction, and (d) the existing operational practice in Italy. We argue for the use of earthquake prediction methods that have been verified and validated by long-run statistical testing, as opposed to the ad hoc application of probabilistic forecasting.

2 Prediction or forecasting?

The United States National Research Council, Panel on Earthquake Prediction of the Committee on Seismology, suggested the following consensus definition (1976, p. 7):

“An earthquake prediction must specify the expected magnitude range, the geographical area within which it will occur, and the time interval within which it will happen with sufficient precision so that the ultimate success or failure of the prediction can readily be judged. Only by careful recording and analysis of failures as well as successes can the eventual success of the total effort be evaluated and future directions charted.”

The ICEF Report (IR) attempts to enrich this definition with the following distinction between prediction and forecast: “A prediction is defined as a deterministic statement that a future earthquake will or will not occur in a particular geographic region, time window, and magnitude range, whereas a forecast gives a probability (greater than zero but less than one) that such an event will occur.” (IR, p. 319).

However, when it comes to operational decision making, one has to follow:

“Recommendation G2: Quantitative and transparent protocols should be established for decision-making that include mitigation actions with different impacts that would be implemented if certain thresholds in earthquake probability are exceeded.” (IR, p. 363)

Thus, according to the IR definition and conclusions, forecasting may become useful when, and only when, it is formulated as an operational prediction.

Many researchers concentrate their efforts on assigning probability values. However, it is well known that providing reliable and/or detailed probability information, particularly for large and infrequent events, requires sufficiently long-term and accurate information, which is not available nowadays for destructive earthquakes. Resorting to subjective probability models and estimates, e.g. by expert elicitation, produces batches of un-validated numbers that convey a misleading impression of detailed knowledge. Regrettably, the subjective choice of the earthquake probability model is in most cases wrong, as in the case of the widely advertised Global Seismic Hazard Assessment Program (GSHAP) maps (Panza et al. 2004; Kossobokov and Nekrasova 2010, 2011) and of the Short-Term Earthquake Probabilities (STEP) forecasts (Kossobokov 2005, 2008, 2009).

There are natural, unavoidable problems in assigning to earthquake occurrence any specific, mathematically acceptable value of probability intended for responsible practical use. Therefore, since the design of the prototypes of the M8 and CN algorithms in 1984 (Keilis-Borok and Kossobokov 1990; Keilis-Borok and Rotwain 1990), a qualitative category, i.e. Times of Increased Probability, has been preferred. Why? In point of fact, making quantitative probabilistic claims within the framework of the most popular, objectivistic viewpoint on probability theory requires a long series of recurrences, which cannot be obtained at the local scale from the existing earthquake catalogs. Given a 0.1° × 0.1° geographic cell (e.g. Gerstenberger et al. 2005; Giardini et al. 1999), how many trials would one need to distinguish, at the 95 % confidence level, a daily forecast probability of 1/1,000 from the “optimist” forecast strategy of a constant 0? What about 1/20,000? Of course, the answer heavily depends on the choice of the accepted probability model, and all agree that nowadays the epistemic uncertainties of earthquake probability models are large and their limits are yet unknown. However, decision makers do not request any specific probability value, but rather an authoritative opinion on the increased likelihood of an incipient disaster (Guidelines for Earthquake Predictors 1983).
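
To give a feeling for the orders of magnitude involved, the following minimal sketch (under the simplifying assumptions, introduced here only for illustration, of independent daily trials in a single cell and of a naive detection criterion, namely observing at least one target event) estimates how many days of observation such a discrimination would require.

```python
import math

def days_to_distinguish(p_daily, confidence=0.95):
    """Smallest number of independent daily trials n such that, if the daily
    probability is really p_daily, at least one target event is observed with
    the requested confidence: 1 - (1 - p_daily)**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_daily))

for p in (1 / 1000, 1 / 20000):
    n = days_to_distinguish(p)
    print(f"p = {p:.5g} per day -> about {n} days (~{n / 365:.0f} years) in a single cell")
```

Even under these generous assumptions, a single cell would require on the order of 3,000 days (about 8 years) for 1/1,000 and about 60,000 days (more than 160 years) for 1/20,000, which illustrates why long series of recurrences are unattainable at the local scale.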

The IR definition overlooks the classification of forecast/prediction claims about earthquakes based on their spatial and energy uncertainties, which has important implications. In fact, (1) predicting the “exact” fault segment to rupture is by far more difficult and might be an unsolvable problem, while (2) the Gutenberg-Richter law suggests limiting the magnitude range of a prediction to about 1 unit (the statistics of recurrence are essentially related to the dominating smaller earthquakes). Although the report recognizes the fractal distribution of faults, it does not pay attention to earthquake forecast/prediction approaches based on hierarchical, step-by-step prediction techniques, which account for the multi-scale escalation of seismic activity to the main rupture (Keilis-Borok 1990; Kossobokov et al. 1999; Keilis-Borok and Soloviev 2003). According to Kossobokov and Shebalin (2003), such an approach starts with the recognition of earthquake-prone zones for earthquakes in a number of magnitude ranges, then it determines long- and intermediate-term areas and times of increased probability, and, finally, it may end up with an exact short-term or immediate alert. Note that this approach, at any stage, including the short-term or immediate ones, may benefit from independent complementary evidence from space geodesy, geochemical, hydrologic and other geophysical observations (Kanamori 2003; Bormann 2011; Panza et al. 2011b).
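
As a minimal numerical illustration of point (2), the Gutenberg-Richter relation log10 N(≥M) = a − b·M with b ≈ 1 implies that about 90 % of the events above a magnitude threshold fall within one magnitude unit of it, so the recurrence statistics of any wider range are controlled by its smallest events. The sketch below, with a purely hypothetical a-value, makes the arithmetic explicit.

```python
# Gutenberg-Richter relation: log10 N(>=M) = a - b*M, with b ~ 1.
# The a-value is hypothetical and serves only to fix the arithmetic.
a, b = 5.0, 1.0

def n_at_or_above(M):
    """Expected number of events with magnitude >= M."""
    return 10 ** (a - b * M)

M0 = 6.0
in_one_unit = n_at_or_above(M0) - n_at_or_above(M0 + 1.0)   # events in [M0, M0+1)
print(in_one_unit / n_at_or_above(M0))   # ~0.9: 90 % of M>=6 events are below M7
```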

3 Validation of forecast/prediction methods

For a few decades it has been well recognized that “Forecasting methods intended for operational use should be scientifically tested against the available data for reliability and skill, both retrospectively and prospectively. All operational models should be under continuous prospective testing.” (IR, p. 362). Nevertheless, “probabilistic” approaches to earthquake forecast/prediction are sometimes recommended without any formal and rigorous validation supporting their reliability in the prediction of large earthquakes. For instance, in the case of STEP (Gerstenberger et al. 2005), the best documented evidence used to validate the generic clustering model for California is, in fact, the unarguable rejection of the model itself (Kossobokov 2005, 2008, 2009). Testing results from models forecasting aftershocks and low-magnitude events are frequently mentioned as an argument in favor of probabilistic approaches, although this is basically not correct. The Epidemic Type Aftershock Sequence (ETAS) forecasts, which account for a large share of the models joining the Collaboratory for the Study of Earthquake Predictability (CSEP) group, are based on rates of activity. However, Rundle et al. (2011) show that rate models such as ETAS, while possibly useful in forecasting aftershocks, are basically useless for forecasting mainshocks. Very few attempts are made to evaluate the spatial extent of the “alarmed” area, which is implied by introducing a probability threshold that qualifies a forecast as “successful”. This kind of testing is necessary when dealing with the operational use of forecasts: to guarantee an acceptable score of forecasted earthquakes, in fact, the “alarmed area” (i.e. the territory where the probability exceeds the specified value) may turn out to be very large.
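
The dependence of the alarmed territory on the chosen probability threshold can be illustrated with a toy calculation; the gridded probabilities below are synthetic (drawn from a lognormal distribution purely for the sake of the example) and do not reproduce any actual STEP or ETAS output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-cell daily probabilities on a forecast grid (illustration only):
# a broad lognormal spread of low values.
probs = rng.lognormal(mean=-9.0, sigma=1.5, size=10000)

def alarmed_fraction(probs, threshold):
    """Fraction of grid cells whose forecast probability exceeds the threshold,
    i.e. the relative size of the 'alarmed' territory implied by that threshold."""
    return float(np.mean(probs > threshold))

for thr in (1e-2, 1e-3, 1e-4, 1e-5):
    print(f"threshold {thr:.0e} per day -> alarmed fraction {alarmed_fraction(probs, thr):.3f}")
```

Lowering the threshold so as to capture a larger fraction of target earthquakes rapidly inflates the alarmed fraction; this space–time cost should be reported alongside any claimed success rate.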

On the other hand, “deterministic” approaches to earthquake forecast/prediction based on the diagnosis of formally defined premonitory seismicity patterns have demonstrated effective and statistically significant results in rigorous real-time testing, ongoing for more than a decade over the Italian territory (Peresan et al. 2005, 2011) and for about two decades on the global scale (Kossobokov et al. 1999; Ismail-Zadeh and Kossobokov 2011). Since Utsu (1977) and Aki (1981), the probability gain has been considered for the estimation of forecast/prediction performance. Due to a fundamental property of probability gain scoring (Molchan 2003), the maximum is reached when the alarm time goes to zero: as a result, a single success of infrequent predictions, based on random guessing, may provide a very high probability gain score. Thus, choosing the best forecast/prediction from a large collection of competing methods according to the best probability gain achieved in a rather short testing period could be very misleading. Moreover, the probability gains obtained in decades of prospective testing of earthquake predictions, arising from reproducible intermediate-term middle-range diagnosis, are very different from “the nominal probability gain factors in regions close to the epicenters of small-magnitude events”, which are clearly recognized as “highly uncertain and largely unvalidated” (ICEF Report, Jordan et al. 2011).
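
The degenerate behavior of the probability gain for very short alarms can be shown with a simple computation; the event counts and durations below are invented for illustration.

```python
def probability_gain(hits, total_events, alarm_time, total_time):
    """Ratio of the event rate inside alarms to the average event rate:
    G = (hits / alarm_time) / (total_events / total_time)."""
    return (hits / alarm_time) / (total_events / total_time)

# A single lucky alarm that happens to catch 1 of 100 events in a 10-year test
# (all numbers are hypothetical): the shorter the alarm, the larger the gain.
total_time = 10 * 365.25   # days
for alarm_days in (30.0, 3.0, 0.3):
    g = probability_gain(hits=1, total_events=100,
                         alarm_time=alarm_days, total_time=total_time)
    print(f"alarm of {alarm_days:g} days -> probability gain ~ {g:.1f}")
```

A single lucky 0.3-day alarm scores a gain above 100 even though the underlying “strategy” is pure guessing, which is why the probability gain alone, over a short testing period, is a misleading selection criterion.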

The most frequent inaccuracies in the comparative analysis of prediction/forecast models, found in the IR as well as in some related papers (e.g. Marzocchi 2008), can be summarized as follows:

  • Inadequate or missing methods/criteria to compare different alarm-based models with probability-based models (Molchan and Romashkova 2011; Molchan 2011);

  • Comparison of the statistics achieved in real-time testing with those of models whose parameters were adjusted a posteriori;

  • Neglecting evident failures, by “cherry-picking” and discussing the most favorable cases, which may create the illusion of high efficiency for some models;

  • Failing to evaluate the space–time volume of alarms (e.g. the areal extent of the alerted territory, given a specific probability threshold) associated with probabilistic forecasts (a minimal illustration is sketched after this list).
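
One simple way to put probability-based forecasts on the same footing as alarm-based predictions is to sweep a threshold over the forecast grid and record, for each threshold, the space–time fraction covered by the alarm and the miss rate, i.e. the coordinates of a Molchan-type error diagram. The sketch below uses synthetic probabilities and synthetic target events, so the numbers are purely illustrative.

```python
import numpy as np

def molchan_point(prob_map, events_mask, threshold):
    """Turn a gridded probability forecast into an alarm (cells above threshold)
    and return the Molchan-diagram coordinates: the fraction of space-time
    covered by the alarm and the fraction of target events missed."""
    alarm = prob_map > threshold
    tau = float(alarm.mean())
    hits = int(np.sum(alarm & events_mask))
    nu = 1.0 - hits / max(int(events_mask.sum()), 1)
    return tau, nu

# Synthetic example (not a real CSEP forecast): toy probabilities and toy targets.
rng = np.random.default_rng(1)
prob_map = rng.lognormal(mean=-9.0, sigma=1.5, size=5000)
events_mask = rng.random(5000) < prob_map * 50.0   # crude: targets follow the forecast
for thr in (1e-3, 1e-4, 1e-5):
    tau, nu = molchan_point(prob_map, events_mask, thr)
    print(f"threshold {thr:.0e}: alarmed fraction {tau:.2f}, miss rate {nu:.2f}")
```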

4 What can we do with earthquake forecast/predictions?

The operational relevance of forecast/prediction models is generally assessed in terms of probability gain. Methods characterized by a low probability gain and/or large space–time uncertainty, although statistically significant, are sometimes arbitrarily discarded because of their claimed limited use. The uncertainties of a forecast/prediction of course limit its operational use when making the appropriate choice of disaster mitigation; these might be essential, unavoidable limitations for methodologies that predict the predictable. There are a number of low-key actions, however, which can be taken to mitigate the damage from an earthquake, based on formally defined earthquake forecasts/predictions.

In general, the prediction of an earthquake of a certain magnitude may extend, in time, from the zero-approximation of seismic zoning (no time information), through the long-term (decades), intermediate-term (months to years) and short-term ones (hours to days), while, in space, it may vary from long-range territories (thousands of kilometres) to the exact location of the earthquake source (tens of kilometres). Accordingly, the preparedness measures range from the definition of adequate building codes, to intermediate-term alarm declaration and reinforcement of high-risk facilities, to an imminent “red alert”. Different time intervals, from decades to seconds, are required to undertake the different measures. A list of possible low-key actions was given in Keilis-Borok and Primakov (1997); the basic concepts were analyzed in detail by Kantorovich and Keilis-Borok (1991). Having different costs, such actions can be realistically maintained during different time periods and over territories of different size. The key to damage reduction in an area of concern is the timely escalation or de-escalation of the preparedness measures, depending on the current state of alert. A list of possible low-key actions is provided below, as reported at the International Framework for Development of Disaster Reduction Technology List on Implementation Strategies (“Disaster Reduction Hyperbase”, NIED, Tsukuba, Japan, 27–28 February 2006).

The listed safety measures are not independent, but form an obvious hierarchy: they make sense if activated in a certain set and given order, as part of a scenario of response to prediction.

  (a) Permanent safety measures, maintained over decades:

    • Restriction of land use, especially for high-risk objects and earthquake-inducing activities.

    • Building codes, demanding the reinforcement of constructions.

    • Tightening of general safety regulations.

    • Enforced public safety services.

    • Insurance and special taxation.

    • Observations and data analyses to estimate seismic risk and to monitor earthquake precursors.

    • Preparation of the response to a time prediction, and of post-disaster activities: planning; establishment of the legal background; accumulation of stand-by resources; simulation of alarms; training of the population, etc.

  (b) Temporary safety measures, activated in response to a time prediction:

    • Enhancement of the permanent measures (see the list of permanent safety measures above).

    • Emergency legislation (up to martial law), to facilitate the rational response to the prediction.

    • Mandatory regulation of the economy.

    • Neutralization of the sources of high risk: lifelines; nuclear power plants; chemical plants; unsafe buildings, up to suspension of operation, partial disassembly, demolition, etc.

    • Evacuation of population and highly vulnerable objects (e.g., schools and hospitals).

    • Mobilization of post-disaster emergency services.

    • Preparation of measures for long-term post-disaster relief (restoration of dwellings, jobs, production, credit, etc.).

    • Monitoring of socio-economic changes, and prevention of prediction-induced hazards.

Some additional low-key actions could be:

  • Develop a retrofitting plan for strategic buildings in the alerted area.

  • Check that the rescue plan is ready to start with minimal delay.

  • Check the maintenance state of the temporary housing stored in the civil defence centers, and guarantee its timely mobilization.

  • Intensify preparedness practice, increasing the frequency of actions involving students and civil defence.

  • Diffuse in a systematic way, through the media, simple instructions, such as establishing small restoration corners in the strongest parts of the building, with basic supplies (water, emergency food, basic tools, etc.).

The listed measures are, in different forms, applicable to the international, national, regional, provincial and local levels.

As far as the practical use of forecasts/predictions is concerned, the IR includes some apparently contradictory statements and conclusions, as illustrated in the following.

“At the present time, earthquake probabilities derived from validated models are too low for precise short-term predictions of when and where big quakes will strike; consequently, no schemes for “deterministic” earthquake prediction have been qualified for operational purposes. However, the methods of probabilistic earthquake forecasting are improving in reliability and skill, and they can provide time-dependent hazard information potentially useful in reducing earthquake losses and enhancing community preparedness and resilience” (p. 320)

“Properly applied, short-term forecasts have operational utility; for example, in anticipating aftershocks that follow large earthquakes. Although the value of long-term forecasts for ensuring seismic safety is clear, the interpretation of short-term forecasts is problematic, because earthquake probabilities may vary over orders of magnitude but typically remain low in an absolute sense (<1 % per day). Translating such low-probability forecasts into effective decision-making is a difficult challenge.” (p. 319)

Although the focus is frequently shifted from big quakes to short-term aftershock forecasting, there is no answer to the following question: are the existing probabilistic forecasts useful for operational purposes or not? According to the IR, the (necessarily) low probability estimates are not really helpful for decision making. Rescaling the probabilities to the larger territory of preparation of a strong earthquake thus appears more realistic from the physical, statistical, and operational points of view.
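
A back-of-the-envelope rescaling shows why; the figures below (per-cell daily probability, number of cells covering the preparation area of a strong earthquake, duration of an intermediate-term alarm) are purely hypothetical, and independence across cells and days is assumed only for simplicity.

```python
# Back-of-the-envelope rescaling with purely hypothetical figures,
# crudely assuming independence across cells and days.
p_cell_day = 1e-5   # hypothetical daily probability in one small grid cell
n_cells = 500       # hypothetical number of cells covering the preparation area
n_days = 60         # hypothetical duration of an intermediate-term alarm

p_aggregate = 1.0 - (1.0 - p_cell_day) ** (n_cells * n_days)
print(f"{p_cell_day:.0e} per day per cell -> ~{p_aggregate:.0%} over the whole alarm volume")
```

With these numbers the aggregated probability is on the order of 25 %, a figure that is much easier to relate to operational decision thresholds than a fraction of a percent per day in a single cell.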

5 Existing operational practice in Italy

On a global scale, the existing practices of widely advertised “long-term time-independent earthquake forecasting models” (e.g., the GSHAP PGA map) and of the “routine use […] of operational earthquake forecasting” (i.e., STEP) in California (IR, pp. 358, 359), when compared against the actual seismic activity, have proved to be severely misleading and therefore useless for any kind of responsible seismic risk evaluation and knowledgeable disaster prevention (Kossobokov and Nekrasova 2010, 2011; Kossobokov 2005, 2008, 2009), even if they keep being advocated as promising (Jordan et al. 2011; Lee et al. 2011). On the other hand, the established routine practice of the intermediate-term middle-range diagnosis of Times of Increased Probability for the great (M8.0+) and the major (M7.5+) earthquakes worldwide (see http://mitp.ru/en/default.html), i.e. the on-going real-time prediction experiment based on seismicity patterns and started in 1992 (Healy et al. 1992), is ranked as statistically significant in the IR, thus contradicting the erroneous conclusions by Marzocchi et al. (2003).

A similar experiment started in Italy in 2002 (Kossobokov et al. 2002; Peresan et al. 2005), aimed at the real-time testing of M8S and CN predictions for earthquakes with magnitude larger than a given threshold (namely 5.4 and 5.6 for the CN algorithm, and 5.5 for the M8S algorithm). Predictions are regularly updated every 2 months, and a complete archive of predictions is made available online (http://www.ictp.trieste.it/www_users/sand/prediction/prediction.htm), thus allowing for a rigorous testing of the predictive capability of the algorithms. The results obtained during almost a decade of real-time monitoring already permitted a preliminary assessment of the significance of the issued predictions (Peresan et al. 2011). The prediction experiment by the CN and M8S algorithms, so far the only formally validated tools for anticipating the occurrence of strong Italian earthquakes, started much earlier than the model testing within the Collaboratory for the Study of Earthquake Predictability—Testing Region Italy (August 2009; http://cseptesting.org/regions/italy).

The prediction of ground shaking is considered in the IR an issue of high importance:

“From an operational perspective, the demonstration of forecasting value may best be cast in terms of ground motions. In other words, the evaluation of earthquake forecasts is best done in conjunction with the testing of seismic hazard forecasts against observed ground motions”

This is precisely what is done in the existing operational practice of the definition of time-dependent scenarios for the territory of Italy, carried out routinely in the framework of the Agreement between the Friuli Venezia Giulia Civil Defence and the Abdus Salam International Centre for Theoretical Physics (“Convenzione PCFVG-ICTP: aggiornamento delle previsioni CN ed M8S e scenari di moto del suolo”; DGR 2226 dd. 14.9.2005 and DGR 1459 dd. 24.6.2009; http://www.regione.fvg.it/asp/delibereinternet/reposit/DGR2226_9_20_05_12_53_12_PM.zip). In particular, every 2 months since 2005, the intermediate-term middle-range predictions and the related estimates of the ground motion from the expected earthquakes are routinely updated, following an integrated neo-deterministic approach (Panza et al. 2001; Peresan et al. 2002, 2011), and reported to the Civil Defence. Algorithm CN failed to predict the L’Aquila (M = 6.3, 2009) event, since its epicenter was located about 10 km away from the alarmed area; nevertheless, the ground motion scenarios computed in accordance with the rules of the PCFVG-ICTP Agreement did predict quite well the maximum intensities observed in L’Aquila (Panza et al. 2009; Peresan et al. 2011).

Moreover, the current operational implementation of forecast/prediction methods integrating Earth Observations and Seismological Data, represented by the SISMA Prototype System (http://sisma.galileianplus.it/) developed for the Italian Space Agency (ASI; http://www.asi.it/), fits perfectly one of the IR Recommendations:

“Sustain the development and implementation of capabilities to integrate seismic and geodetic data streams collected by different organizations to provide a real-time processing infrastructure, so that basic data and information derived from it can be provided consistently and quickly.” (p. 364)

In the framework of ASI-SISMA project (Crippa et al. 2008; Panza et al. 2011b), an integrated prototype system for real-time joint processing of seismic and geodetic data streams is made available to the Civil Defence of the Friuli Venezia Giulia Region for independent testing. The SISMA prototype is fully formalized and highly automated (including version control for software and products), so as to provide a reliable tool for systematic real-time monitoring of deformations and seismicity patterns.

Surely, the verification and validation of the different forecast/prediction methods is the necessary step before the implementation process starts, and the Rules of the Game for the CSEP Testing Region Italy (see http://www.cseptesting.org/sites/default/files/Rules_of_the_Game_Italy.20090506.pdf) should be improved to remove the following basic shortcomings:

  • Errors in the input data: the Rules of the Game (Points 6 and 8) state that “Models will be evaluated against the authoritative observed data supplied by INGV” and that “The official bulletin for future earthquakes that will be used for evaluation of the forecasts is the INGV bulletin […] The INGV ML magnitude scale will be considered the reference scale for model development and testing”. Unfortunately, the authoritative data set for prediction testing spans just a few years (since April 2005), while the remaining proposed data are discontinuous (in the period 2003–April 2005), or insufficiently complete (e.g. the declustered catalogue CPTI, Rovida et al. 2008) and possibly heterogeneous (Romashkova et al. 2009);

  • Short testing time interval: a 5-year testing period is too short to reach any conclusion about the effectiveness and reliability of predictions of large earthquakes;

  • Violation of real-time testing: “Tests are performed with a delay of 30 days relative to real-time, in order for the authoritative data to be manually revised and published.” (Point 8). The time delay makes the forecasts/predictions retrospective.

6 Conclusions

The issues and decision-making problems related to seismic hazard assessment have been explored by Stein (2010) and have provided matter for significant debate in recent years (Panza et al. 2011a). Surprisingly enough, when discussing seismic hazard, the IR refers just to PSHA, completely ignoring other classical and recent deterministic approaches that have been comprehensively described in the literature since the early review by Reiter (1990) and have relevant applications also in Italy (e.g. Zuccolo et al. 2011). This is particularly critical in view of the performances of PSHA (Stein et al. 2011), proved very unsatisfactory by Kossobokov and Nekrasova (2011) at the occurrence of the recent destructive earthquakes. Accordingly, the Italian Parliament Resolution 8/00124 on “Recommended modifications of the Italian design rules for seismically isolated structures”, approved on 8 June 2011 (Camera dei Deputati 2011), explicitly mentions the need to resort to physically sound deterministic methods for seismic hazard assessment, like the NDSHA (Panza et al. 2011b).

The IR provides a quite extensive overview of several existing forecast/prediction methods, but it seems somehow incomplete, in that it generally recommends the use of probabilistic forecasts while implicitly discouraging deterministic ones, which is in contrast with IR Recommendation G2. Any forecasting/prediction tool, to be effective, must demonstrate its capability to anticipate large earthquake occurrences and the related ground shaking, a result that can be attained only through a rigorous verification and validation process. The retrospective testing suggested by the IR is not performed on independent data and, therefore, may look encouraging but is insufficient for operational use. So far, prospective evidence, including that from Italy, is given only for “deterministic” forecast/prediction methods based on seismicity patterns. The effectiveness of probabilistic forecasts in operational procedures is hampered by: (a) the very low probability estimates associated with large earthquakes in small areas; (b) the need to define and use a probability threshold that, according to the IR definition, turns a forecast into a prediction.

Some critical comments about the conclusions that nowadays no short-term prediction is possible and that any proposed forecasting is not scientifically based are given by Grandori and Guagenti (2009). They treat in some detail the drawbacks of decisions taken under uncertain conditions, discussing the specific case of the 2009 L’Aquila earthquake. Thus, the paper by Grandori and Guagenti (2009) and the recommendations of the IR contradict the conclusions of the Committee of Experts (Commissione Grandi Rischi) formulated at the time of the L’Aquila event (Memoria del Pubblico Ministero 2010). A more specific discussion of the IR and of its relation to the L’Aquila event is outside the purpose of this general-scope paper and will be the subject of a forthcoming paper.