1 Introduction

Establishing the relationships between the observable effects of an earthquake and instrumentally recorded data is a difficult task, especially because macroseismic data are the combined result of strong-motion effects, building vulnerability, site amplification, and other factors (Caprio et al. 2015; Faenza and Michelini 2011; Cua et al. 2010; Allen and Wald 2009). In turn, strong-motion recordings can be affected by several factors, such as the quality of the recording instrument and of its installation, the proximity to civil structures, and site amplification phenomena. Therefore, calibrating these relationships requires a large quantity of high-quality macroseismic data and strong-motion recordings.

This paper demonstrates the usefulness of bridging the macroseismic and accelerometric databases run at the Istituto Nazionale di Geofisica e Vulcanologia (INGV) in coordination with the Italian Department of Civil Protection (DPC) by including external identifiers directly in their database schemas. A web tool called Rosetta (a name recalling the stele bearing a decree written in ancient Egyptian hieroglyphs, Demotic, and ancient Greek) was set up as an experimental proof of concept to test linking procedures and user-friendly solutions among the considered databases and their heterogeneous data types. Finally, Rosetta was tested by non-IT experts to derive a preliminary relationship between Peak Ground Acceleration (PGA) and macroseismic intensity expressed in the MCS scale (Mercalli-Cancani-Sieberg; Sieberg 1930).

2 Data

Italy has a very long tradition of collecting, handling, and producing seismological data (Camassi 2004), both in terms of earthquake effects documented by observers and of instrumentally recorded data. With the advent of the Internet and the Web, retrieving data became a trivial task, especially for natively digital data. As the data available on the Web have grown exponentially, it is important to be able to identify which data sources can be considered trustworthy.

Currently, macroseismic data are made available by the Italian Macroseismic Database (DBMI), which is updated on a multiannual basis and provides intensity data related to the earthquakes contained in the parametric earthquake catalog of Italy (CPTI). Accelerometric data, on the other hand, are made available by the Italian Accelerometric Archive (ITACA), an all-in-one solution for accessing strong-motion recordings, accelerometric station information, and the related list of earthquakes. The 2011 version of DBMI and CPTI covers the time-window 1000–2006, whereas version 2 of ITACA reports events from 1972 to 2014. The present study considers the overlapping period, from 1972 to 2006.

2.1 The database of the Italian macroseismic observations, DBMI

Earthquake effects are classified through macroseismic observations that can be determined without the use of instruments (Musson and Cecić 2012). Effects are expressed in terms of intensities, a classification (not a measure) of the strength of shaking at a place during an earthquake. An intensity classification system is called a “scale”, and since the first one, introduced by Rossi-Forel in 1883, multiple scales have been developed (see Musson et al. 2009 for a comprehensive description). One of the most recent is the EMS-98 scale (European Macroseismic Scale; Grünthal 1998), an evolution of the previous MSK scale (Medvedev-Sponheuer-Karník; Medvedev et al. 1964). Although there have been multiple attempts to introduce the EMS-98 scale in Italy (e.g., Molin 2003; Molin et al. 2010; Tertulliani and Galli 2012), the most widely used scale in Italy remains the MCS scale.

A Macroseismic Data Point (MDP) is made of three components: (1) the date of the event, (2) a location, expressed in terms of geographical coordinates usually associated with a place name, and (3) the assessed intensity. Old macroseismic scales clearly define what an intensity is, but they do not usually provide a precise and concise definition of what is meant by the interchangeable terms location, place, or locality, apparently taking for granted that such terms cannot be misinterpreted. However, real-life examples show that defining precisely what a locality is can be difficult (e.g. Fig. 1).

Fig. 1 Examples of the distribution of localities (dots) with respect to municipality boundaries (dotted lines) and province boundaries (continuous lines): (a) localities in the municipality of Cento and surroundings; (b) detail of the locality of Casumaro, which is administratively subdivided between two municipalities and two provinces, and is considered one locality in DBMI for assessing a macroseismic intensity

For earthquakes that occurred prior to the twentieth century, some European countries (e.g., France: Scotti et al. 2004; Switzerland: Fäh et al. 2011) have only one source of macroseismic data. Conversely, in Italy there are multiple sources of macroseismic intensity data, most of them available as published articles or internal reports written by a multitude of authors from different institutions. The data sources considered by DBMI can be subdivided into three categories:

  • Studies based on historical research, relying on the investigation of non-seismologically oriented written sources (Stucchi and Albini 1991; Guidoboni and Stucchi 1993);

  • Studies based on field surveys, conducted by teams of experts (Cecić and Musson 2004; Camassi et al. 2009);

  • Macroseismic Bulletins, a periodic publication issued by the Istituto Nazionale di Geofisica between 1980 and 1999, and by the Istituto Nazionale di Geofisica e Vulcanologia from 2000 to 2005; it is based on a questionnaire sent out after earthquakes with magnitude 3 or larger, and filled in by selected correspondents such as municipality representatives, military divisions, or civil corps, depending on the area (Gasparini et al. 1992; De Rubeis et al. 1992).

Users of macroseismic data face a very complicated situation due to the existence of multiple sources of data, sometimes related to the same earthquake, and sometimes providing conflicting interpretations. In addition, knowledge keeps evolving: as new research is performed on historical earthquakes, new publications may dramatically revise the earthquake scenarios depicted by previous ones. As taking every possible publication into consideration is a necessary condition for compiling a comprehensive and reliable national catalog of earthquakes, DBMI collects each published article, parses its text (or map) content in order to retrieve intensity data, establishes relationships among different sources, and eventually selects the most reliable interpretation. Additionally, each considered observation is geographically homogenized using a common Gazetteer, thus fixing potential georeferencing errors in the original data source. The georeferencing activity is particularly important for publications made roughly prior to 2005, when free and easy-to-use Web mapping solutions such as Google Maps or Bing Maps were not available.

INGV releases a DBMI version on a multiannual basis: an updated snapshot of the collected intensity data aimed at providing input data to the official catalog of earthquakes, CPTI. The first version of DBMI (“DOM”: Monachesi and Stucchi 1997) covered the time-window 1000–1992, for a total of 935 earthquakes and 30,119 MDPs. The second version (“DBMI04”: Stucchi et al. 2007) covers the time-window from 217 BC to 2002 AD, for a total of 1041 earthquakes and 58,146 MDPs. The latest version (“DBMI11”: Locati et al. 2011) spans the time-window 1000 AD–2006 AD, for a total of 1681 earthquakes and 86,071 MDPs. The maximum intensities of DBMI11 are plotted in Fig. 2a, whereas Fig. 2b shows the time-window 1972–2006.

Fig. 2 Plot of the maximum observed intensities in DBMI11: (a) 1681 earthquakes and 86,071 MDPs distributed over 14,150 localities in the time-window 1000–2006; (b) 341 earthquakes and 41,933 MDPs distributed over 9446 localities in the time-window 1972–2006

Each release of DBMI can be accessed via its official website (http://emidius.mi.ingv.it), and it can be queried either by earthquake or by locality. If the number of MDPs reported at a place is greater than two, DBMI allows the user to retrieve the so-called “place seismic history”, that is, the list of observed intensities reported at that place.
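To make the MDP structure and the “place seismic history” query more concrete, the following minimal Python sketch groups MDPs by place and keeps only places with more than two entries. It is an illustration under stated assumptions, not the DBMI implementation: the class name `MDP`, its field names, and the function `place_seismic_histories` are hypothetical, and the intensity is stored as text only so that uncertain values such as "5-6" are preserved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """Hypothetical Macroseismic Data Point: event date, location, intensity."""
    event_date: str   # ISO date of the earthquake, e.g. "1980-11-23"
    place_name: str   # place name associated with the coordinates
    lat: float        # latitude of the place, decimal degrees
    lon: float        # longitude of the place, decimal degrees
    intensity: str    # assessed intensity, kept as text (e.g. "5-6")


def place_seismic_histories(mdps, min_count=3):
    """Group MDPs by place and return only places with at least min_count MDPs.

    The default of 3 mirrors the "greater than two" rule stated in the text.
    Each history is sorted chronologically by event date.
    """
    by_place = defaultdict(list)
    for mdp in mdps:
        by_place[mdp.place_name].append(mdp)
    return {
        place: sorted(items, key=lambda m: m.event_date)
        for place, items in by_place.items()
        if len(items) >= min_count
    }
```

Grouping by place name is a simplification: as the Casumaro example shows, a real implementation would key on a locality identifier from the reference Gazetteer rather than on a name.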

2.2 The parametric catalogue of the Italian earthquakes, CPTI

CPTI makes available parameters (epicentral location, magnitude, and their uncertainties) for earthquakes of interest for the Italian territory that have occurred since the year 1000 and are documented in the scientific literature. CPTI is the national reference catalog and is therefore used in a wide range of activities, such as seismic hazard assessment (MPS Working Group 2004), natural hazard emergency plans for Italian municipalities, public awareness teaching courses, and many others.

Covering such a long seismic history requires taking advantage of heterogeneous sources of information, both macroseismic and instrumentally recorded, which CPTI blends together in a single comprehensive solution. Parameters related to historical earthquakes are based only on the macroseismic data released by DBMI, whereas for more recent earthquakes, macroseismic observations are integrated with instrumental data from a selection of instrumental earthquake catalogs released in Italy and catalogs from the surrounding countries. Indeed, not all the known earthquakes are described by a number of MDPs, and many recent earthquakes have parametric data derived from instrumental recordings only.

CPTI is released on a multiannual basis, in step with the corresponding release of DBMI. The latest version is CPTI11 (Rovida et al. 2011), which contains 3182 events (Fig. 3a), covers the time-window 1000–2006 AD, and has a reference threshold of Mw ≥ 4.5 or an epicentral intensity ≥ 5–6 MCS. CPTI11 reports 751 events in the time-window 1972–2006 (Fig. 3b). Around twenty instrumental earthquake catalogs were selected for compiling CPTI11, among which the most important in terms of contributed data are the ISC Bulletin (ISC, International Seismological Centre), CSI 1.1 (Castello et al. 2006), CSTI 1.1 (CSTI Working Group 2005), the Bollettino strumentale (Istituto Nazionale di Geofisica 1982–1996; INGV 1983–2008), and the PDE Bulletin (NEIC, National Earthquake Information Center). When both macroseismic and instrumental parameters are available, both determinations are provided together with a default one (in this case, the epicenter is selected according to expert judgement, while Mw is obtained as a weighted mean). For some events with only poor macroseismic data, the earthquake is listed, but no macroseismic parameters have been determined.

Fig. 3 Plot of the epicenters in CPTI11: (a) 3182 earthquakes of the entire time-window 1000–2006; (b) 751 earthquakes of the time-window 1972–2006

2.3 The Italian accelerometric archive, ITACA

Italian accelerometric data are released by multiple institutions, according to different aims, standards, and formats. Compared with macroseismic observations, the main difference in the data collection procedure is that accelerometric data are already digitally structured and mostly released through the Web, which makes data retrieval easier.

The Italian Strong Motion Database, ITACA (Luzi et al. 2008; Pacor et al. 2011), has been developed since 2005 in the framework of the agreements between the Italian Department of Civil Protection (Dipartimento della Protezione Civile, DPC) and the Istituto Nazionale di Geofisica e Vulcanologia (INGV). The majority of strong-motion data have been recorded by the Italian Strong-motion Network (RAN), operated by the DPC, and by the National Seismic Network, operated by INGV. Version 2.0 of ITACA, released in January 2015, contains data from more than a dozen seismic networks, with about 7695 processed three-component waveforms generated by 377 M > 4 earthquakes (Fig. 4a) that occurred in the period 1972–2015. ITACA reports 529 events in the time-window 1972–2006 (Fig. 4b). It is worth mentioning that ITACA reports many analog recordings in this time-window, as digital accelerometers started spreading in Italy only in the late 1990s, and it took about 10 years to upgrade most of the RAN network. With the transition to the digital era, the accuracy of the strong-motion parameters increased dramatically, as limitations such as late triggering and limited bandwidth were overcome (Paolucci et al. 2011).

Fig. 4 Plot of the epicenters in ITACA 2.0: (a) 1214 earthquakes of the entire time-window 1972–2014; (b) 529 earthquakes of the time-window 1972–2006

Since accelerometric records may contain instrumental or environmental noise in both the time and frequency domains, data need to be processed before being used, so that only the part of the recording that represents the seismic signal is considered. Using unprocessed waveforms may introduce errors in the time series or in the derived intensity measures (e.g. peak ground acceleration or velocity, Housner intensity, etc.). ITACA adopts a robust processing scheme developed by Paolucci et al. (2011), which is applied to each waveform after visual inspection by an analyst.

ITACA reports recordings from a total of 967 accelerometric stations (Fig. 5a), and for some of them it provides a site characterization in terms of metadata describing the location, housing, geological and geomorphological details, shear-wave velocity profiles, and resonance frequencies obtained from the analysis of ambient noise, where available. This information may be useful to better understand the peculiarities of a waveform, and to establish a more appropriate link between localities with reported macroseismic intensities and accelerometric stations.

Fig. 5 Plot of the stations reported in ITACA 2.0: (a) all 967 stations; (b) the 77 stations operating since 1975 or, at least, for 25 years, selected in this study

3 “Rosetta”: linking localities, stations, and events

DBMI11 reports 41,933 MDPs in the period shared with ITACA (1972–2006), covering 9446 localities (Fig. 2b) out of the 70,000 localities reported in the reference Gazetteer. At the present development stage, the aim of Rosetta is to investigate the possibility of establishing a direct link between the localities quoted in DBMI and 77 stations in ITACA (Fig. 5b), selected out of the 967 available stations (Fig. 5a) because they have been operating since 1975 or for at least 25 years (Gomez Capera et al. 2015). These 77 stations were identified in the framework of the site-specific characterization activities of the project S2 of the 2014 agreement between DPC and INGV (Luzi et al. 2015; Felicetta et al. 2016).

Before establishing links among DBMI, CPTI, and ITACA, we compared the respective database schemas and their most relevant identifiers:

  • DBMI has identifiers for earthquakes, MDPs, and localities;

  • CPTI has identifiers for earthquakes;

  • ITACA has identifiers for earthquakes, networks, stations, locations, instruments, and streams.

While the current versions of CPTI and ITACA make their identifiers public, DBMI does not publish its internal identifiers through the website, although this might change in the future. At the time of developing Rosetta, all databases could be accessed only through their respective websites, as no machine-friendly interfaces or application programming interfaces (APIs) were available for handling automatically generated queries. The situation will change soon, as all the Working Groups involved have decided to implement web services following the standards proposed by the International Federation of Digital Seismograph Networks (FDSN, http://www.fdsn.org/), although these standards raise issues that are not easily solved. For instance, the FDSN-event service adopts QuakeML v1.2 (Schorlemmer et al. 2004, 2011), an XML dialect describing event parameters that does not support macroseismic data types, while FDSN-dataselect is designed for continuous streams rather than for event-related waveforms. Because of these issues, the development of web services is temporarily postponed and, consequently, so is a system able to automatically download data, compare them using complex queries, and establish links autonomously. For the exercise presented in this study, the coordinates of the localities and of the stations were linked manually in a geographic information system (GIS). A simple geographic approach was adopted, as a more complex one (e.g., taking into account site-specific conditions) would require an amount of work not planned for this exercise. As a result of the matching procedure, all stations have at least one locality within 3 km (Fig. 6a), but when only the localities quoted in DBMI11 are taken into account, the maximum distance reaches 6 km (Fig. 6b). The resulting list of locality-station pairs is reported in Online Appendix 1.

Fig. 6 Distance between stations and localities: (a) considering all existing localities; (b) considering those localities quoted in DBMI11 only
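The purely geographic matching of station and locality coordinates described above was carried out by hand in a GIS. As a hedged illustration of what an automated equivalent might look like, the Python sketch below finds the nearest locality to a station using the great-circle (haversine) distance. The function names, the tuple layout `(id, lat, lon)`, and the spherical Earth radius of 6371 km are assumptions made for this example and do not describe the procedure actually used in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_locality(station, localities):
    """Return (locality_id, distance_km) of the locality closest to the station.

    station is a tuple (station_id, lat, lon); localities is a non-empty
    sequence of tuples (locality_id, lat, lon).
    """
    _, s_lat, s_lon = station
    best = min(localities,
               key=lambda loc: haversine_km(s_lat, s_lon, loc[1], loc[2]))
    return best[0], haversine_km(s_lat, s_lon, best[1], best[2])
```

Running `nearest_locality` once against the full Gazetteer and once against only the localities quoted in DBMI11 would yield the two distance distributions compared in Fig. 6.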

Earthquake entries were associated between CPTI11 and ITACA, as the earthquakes in DBMI11 are a subset of those reported in CPTI11. In the overlapping period 1972–2006, CPTI11 contains 751 earthquakes, while ITACA reports data for 529 earthquakes. In this case too, the linking was performed manually, because of the current lack of web services and given the limited number of events. There are 265 common events, that is, 35 % of the CPTI11 events or 50 % of the ITACA events. The list of the 265 common events is reported in Online Appendix 2.
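The event association above was done manually. Purely as an illustration, an automated first pass could pair events whose origin times and epicenters agree within given tolerances, leaving ambiguous cases for manual review. In the Python sketch below, the function names, the dictionary keys, and the default tolerances (60 s and 50 km) are hypothetical choices made for the example, not values taken from CPTI11, ITACA, or this study.

```python
import math
from datetime import datetime, timedelta


def _distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km, spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def match_events(catalog_a, catalog_b,
                 max_dt=timedelta(seconds=60), max_km=50.0):
    """Pair events of two catalogs that agree in origin time and epicenter.

    Each event is a dict with keys 'id', 'time' (datetime), 'lat', 'lon'.
    Only unambiguous matches (exactly one candidate) are returned, as a list
    of (id_a, id_b) tuples; everything else is left for manual review.
    """
    pairs = []
    for ev_a in catalog_a:
        candidates = [
            ev_b for ev_b in catalog_b
            if abs(ev_a["time"] - ev_b["time"]) <= max_dt
            and _distance_km(ev_a["lat"], ev_a["lon"],
                             ev_b["lat"], ev_b["lon"]) <= max_km
        ]
        if len(candidates) == 1:
            pairs.append((ev_a["id"], candidates[0]["id"]))
    return pairs
```

The shares quoted in the text follow directly from the counts: 265/751 ≈ 0.353 and 265/529 ≈ 0.501.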

The solution adopted for storing the resulting associations between localities and stations, and among earthquake entries, is to accommodate the corresponding external identifier in each database schema. By including these identifiers, the database maintainer keeps precise control over the linked elements, whatever linking procedure is implemented.
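The idea of accommodating external identifiers in a database schema can be sketched with an in-memory SQLite database. The table and column names below (`locality`, `earthquake`, `itaca_station_id`, `itaca_event_id`) and the sample rows are hypothetical and do not reproduce the actual DBMI, CPTI, or ITACA schemas; the point is only that each record carries a nullable column holding the identifier used by the other database.

```python
import sqlite3

# Hypothetical schema: each table keeps its own identifier plus a nullable
# column holding the identifier of the linked record in the external database.
SCHEMA = """
CREATE TABLE locality (
    locality_id      TEXT PRIMARY KEY,  -- internal macroseismic identifier
    name             TEXT NOT NULL,
    itaca_station_id TEXT               -- external identifier, NULL if unlinked
);
CREATE TABLE earthquake (
    earthquake_id    TEXT PRIMARY KEY,  -- internal catalog identifier
    itaca_event_id   TEXT               -- external identifier, NULL if unlinked
);
"""


def linked_localities(conn):
    """Return (locality_id, itaca_station_id) for every linked locality."""
    rows = conn.execute(
        "SELECT locality_id, itaca_station_id FROM locality "
        "WHERE itaca_station_id IS NOT NULL ORDER BY locality_id"
    )
    return rows.fetchall()


def demo():
    """Build a tiny in-memory database with invented rows and query the links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO locality VALUES (?, ?, ?)",
        [("LOC_1", "Place A", "STA_1"), ("LOC_2", "Place B", None)],
    )
    result = linked_localities(conn)
    conn.close()
    return result
```

Because the link is stored as plain data, it can be filled in manually, as in this study, or by any future automated procedure, and the maintainer can review or override each association individually.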

During the implementation of Rosetta, the authors decided to develop a prototype website and make it openly available (http://emidius.mi.ingv.it/DPC/S2-2014). Thanks to the extensive use of identifiers, all elements (stations, places, events, waveforms, and intensities) are directly linked to the relevant resource on their respective websites. In addition to traditional HTML-based tables, the Web interface of Rosetta allows users to display the surroundings of each station using an interactive 3D model of a DTM (Digital Terrain Model) covering a square area of 10 × 10 km (Fig. 7). The technical implementation is based on a WebGL solution (Sandvi 2013). The DTM is derived from the TINITALY dataset (Tarquini et al. 2007, 2012), and can be rendered as a flat-shaded surface or draped with publicly available satellite images (MATTM 2012).

Fig. 7 Example of the Rosetta Web interface showing the surrounding area (10 km × 10 km) of the “Atina” (ATN) station. The interactive scene is centered on the station (red label); the closest place (green label) and other places (black labels) from DBMI11 are also shown

4 Investigating the relationships between PGA and intensity

An empirical relationship between MCS intensity and PGA was derived by coupling each MDP with the strong-motion stations located within a radius of 7 km. The associated dataset consists of 118 site intensity-PGA pairs (Fig. 8a; Online Appendices 3, 4) from 53 Italian earthquakes in the time-window 1976–2003. Although not included in DBMI11, the 6 April 2009 Aquila earthquake was also considered, as both MDPs and Strong-Motion Recordings (SMRs) are available, from Galli et al. (2009) and Chiarabba et al. (2009), respectively. Mw ranges from 3.9 to 6.9 (Fig. 8b), and site intensity ranges between 3–4 and 8–9 MCS.

Fig. 8 Plot of the 53 earthquakes in the dataset of this study: (a) the circle size is proportional to the number of strong-motion stations that recorded the seismic event; as an example, the purple circles represent the 2009 Aquila earthquake (Mw 6.3), which is associated with 13 Intensity-PGA pairs, and the 1980 Irpinia-Basilicata earthquake (Mw 6.89), associated with 12 pairs; (b) the circle size is proportional to the magnitude

The methodology consists of a simple predictive linear relationship between site intensity and the logarithm of PGA, taken as the only independent variable. To account for the uneven distribution of ground-motion parameter (GMP) values corresponding to a single macroseismic intensity, a mean PGA is assigned to each macroseismic intensity class from the 118 macroseismic intensity-GMP pairs. The same approach was applied, among others, by Bilal and Askan (2014) for Turkey and by Tselentis and Danciu (2008) for Greece.
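The class-mean approach can be sketched in a few lines of Python. The sketch makes several assumptions that the text does not confirm: the logarithm is taken as base 10, the class mean is computed on log PGA (as suggested by the error bars of Fig. 9) rather than on PGA itself, the line is fitted by unweighted ordinary least squares, and uncertain intensities such as 5–6 are encoded as 5.5. The function name and input layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean


def fit_intensity_vs_log_pga(pairs):
    """Fit intensity = a + b * log10(PGA) on per-class means.

    pairs is an iterable of (intensity, pga_cm_s2) with numeric intensity.
    Returns the tuple (a, b). At least two intensity classes are required.
    """
    # 1. Collect log10(PGA) values for each macroseismic intensity class.
    logs_by_class = defaultdict(list)
    for intensity, pga in pairs:
        logs_by_class[intensity].append(math.log10(pga))

    # 2. One point per class: (mean log10 PGA, intensity).
    y = sorted(logs_by_class)
    x = [fmean(logs_by_class[i]) for i in y]

    # 3. Ordinary least squares with log10(PGA) as the independent variable.
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b
```

Averaging within each class before fitting gives every intensity class the same weight, so that classes with many recordings do not dominate the regression.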

The resulting regression of macroseismic intensity on the mean of log PGA is:

$${\text{Intensity}} = - 0.64 + 3.58\,{\text{LogPGA}}({\text{cm/s}}^{2} )\quad{\text{with}}\,\,\sigma = 0.69$$

The PGA range in the regression is 5.36 ≤ PGA ≤ 644.2 cm/s² (Fig. 9).
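As a worked example of how the relationship might be applied, the short Python sketch below evaluates the regression for a given PGA and refuses values outside the range covered by the data. The coefficients, σ, and the PGA limits are those reported above; the base-10 logarithm, the function name, and the choice to raise an error outside the range are assumptions of this example.

```python
import math

A, B = -0.64, 3.58               # regression coefficients reported in the text
SIGMA = 0.69                     # standard deviation reported in the text
PGA_MIN, PGA_MAX = 5.36, 644.2   # validity range of the regression, cm/s^2


def intensity_from_pga(pga_cm_s2):
    """Return the MCS intensity predicted from PGA (cm/s^2).

    Assumes a base-10 logarithm. Raises ValueError outside the PGA range
    covered by the regression, where the relationship is not supported.
    """
    if not PGA_MIN <= pga_cm_s2 <= PGA_MAX:
        raise ValueError(
            f"PGA {pga_cm_s2} cm/s^2 is outside [{PGA_MIN}, {PGA_MAX}]"
        )
    return A + B * math.log10(pga_cm_s2)
```

For instance, a PGA of 100 cm/s² gives −0.64 + 3.58 × 2 = 6.52, which, given σ = 0.69, corresponds to an interval of roughly 5.8 to 7.2 at one standard deviation.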

Fig. 9 Distribution of the mean and standard deviation error bars for Log(PGA) for the macroseismic intensity-PGA relationship obtained in this study. The dashed lines represent the σ

The regression proposed in the present study is compared to three alternative regressions:

  • Faccioli and Cauzzi (2006) based on a dataset of 75 pairs, related to 26 earthquakes;

  • Gomez Capera et al. (2007) based on a combined dataset by Faccioli and Cauzzi (2006), and the dataset by Margottini et al. (1992) that is based on 56 pairs, related to 26 earthquakes;

  • Faenza and Michelini (2010) based on 141 pairs, related to 40 earthquakes.

The comparison between the proposed regression and the three selected studies is shown in Fig. 10, together with the uncertainty of each regression, expressed as standard deviation (±σ). The proposed regression has the same slope coefficient as the Gomez Capera et al. (2007) regression. The regression proposed by Faenza and Michelini (2010) is the closest to the proposed model, falling within the uncertainty (±0.7) in the macroseismic intensity range 6 ≤ I ≤ 8. For I < 6, the Faenza and Michelini (2010) and Faccioli and Cauzzi (2006) regressions overestimate the intensity values by approximately one intensity degree. For I ≥ 7, Fig. 10 indicates that Faccioli and Cauzzi (2006) overestimates the PGA values. The comparison among these studies is difficult because of the different numbers of observations and earthquakes and the different regression techniques adopted to fit the data.

Fig. 10 Comparison of the macroseismic intensity-PGA relationship obtained in the present study with similar relations from previous studies in Italy

The Rosetta website supported and sped up many aspects of the analysis, but additional features could be implemented, such as making data downloadable and allowing the user to customize the default associations between station and locality. Nevertheless, the use of ready-made associations between macroseismic and accelerometric data greatly simplified the building of the dataset and, once the upcoming releases of ITACA and DBMI are available, it will allow a newer, more reliable study of the relationship between PGA and intensity.

5 Conclusions

The Rosetta prototype website links macroseismic and accelerometric data in an integrated and user-friendly solution. A test phase aimed at investigating the relationships between intensity and accelerometric data was performed, and the ready-made intensity-accelerometric data pairs were particularly useful for avoiding the task of searching, downloading, re-formatting, and establishing links among raw data. During the test, new features were suggested and implemented. New ideas for improving Rosetta were also drafted, such as hints for highlighting macroseismic data that may be missing where accelerometric data exist (or vice versa), and solutions for using FDSN-based web services.

During the development of Rosetta, various methods were tested for linking data in DBMI, CPTI, and ITACA, each containing a different data type, and structured using a different logic. Rosetta demonstrated the usefulness of external identifiers for enabling the finest level of control over cross-database bridging solutions.

New releases of CPTI, DBMI, and ITACA will also benefit from this experience, as new links among these databases will be implemented directly in their Web user interfaces, together with new links to external online resources such as earthquake catalogues, scientific papers, or other types of supplementary datasets. In fact, an increasing amount of data related to site-specific conditions is being made available through station-books or microzonation activities performed in Italy at regional level. These very detailed data, once combined with their associated web services, will allow an easier connection of the available scientific knowledge, greatly extending possible cross-disciplinary research activities.