1 INTRODUCTION

In the last several years, approaches to seismic hazard assessment based on the principles of pattern recognition and system analysis have considerably improved. They include both modern versions of classical methods developed back in the 1970s and the fundamentally new methods of system analysis and their corresponding algorithms. Studies in this field have largely been supported by the Russian Science Foundation project no. 15-17-30020 “Application of System Analysis for Estimation of Seismic Hazard in the Regions of Russia, Including the Caucasus–Crimea and the Altai–Sayany–Baikal,” which was implemented in 2015–2017 by the Geophysical Center of the Russian Academy of Sciences (GC RAS) jointly with the Institute of Earthquake Prediction Theory and Mathematical Geophysics, Russian Academy of Sciences (IEPT RAS). In the framework of this project, new pattern recognition methods were developed and the classical approaches to recognition of seismic zones were refined. All of them were successfully applied to assess seismic hazard in several Russian regions, including the Caucasus–Crimea and the Altai–Sayany–Baikal. For a comprehensive analysis of the results obtained by different methods, they were integrated into a unified environment employing a system approach to geodata consideration. The present work is dedicated to describing this system.

A geographic information system (GIS) is designed to collect, store, analyze, and visualize spatial data and the associated semantic information. The main GIS developers are well-known companies, such as ESRI (Mitchell, 2005), MapInfo, Autodesk, Intergraph, Golden Software, etc. Many problems in Earth sciences, including seismic hazard assessment (e.g., (Kuznetsov and Gitis, 2004)), coupled with the constantly increasing volume of spatial data, impose new requirements on GIS, in particular, the possibility of integrating the intelligent component and decision-making system (Krasnoperov and Soloviev, 2015; Nikolov et al., 2015; Soloviev et al., 2016, 2018; Bondur, 2014). The GIS discussed in the present work is an important step toward fulfilling these requirements.

The purposes of the presented GIS and the corresponding spatial database (hereinafter, DB) are as follows:

(1) availability of an integrated environment for preparation, visualization, and analysis of results on recognizing possible areas where the largest, large, and significant earthquakes can occur;

(2) support of geodata system analysis;

(3) development of intelligent multifunctional GISs;

(4) the possibility of integrated seismic hazard assessment based on the results of independent recognition algorithms;

(5) mitigation of earthquake consequences in earthquake-prone regions.

Development of the GIS involved three stages:

(1) collection and conversion of point, linear, and raster initial data to the GIS format;

(2) creation of a unified DB for storing initial data and the results of integrated data processing;

(3) implementation of a multifunctional user interface for multicriterion queries to the DB in the GIS environment, in order to provide multifactor seismic hazard assessment for a region set by the user.

At the moment, the system is available as a stand-alone desktop application.

The software shell for DB operations was implemented in the ESRI ArcGIS for Desktop environment (Fu, 2016); all user query modules were written as Python scripts using the ArcPy library (Zandbergen, 2014). The data can be stored in widely used formats, predominantly shapefiles (ESRI shapefile, 1998) and georeferenced raster images. The cartographic base is automatically loaded from ArcGIS Online, a cloud platform for general access to basic maps published by ESRI.

The obtained result is a first approximation of an intelligent decision-making system for assessing seismic hazard and risks, intended for a wide range of users, including researchers, decision-makers, and the relevant services in earthquake-prone regions. This is the first system of its kind; no earlier analogs have been developed. In its final version, it is planned to resemble, to a certain extent, the Yandex Navigator app, the main purpose of which is to plan optimal routes for car drivers taking into account the current traffic situation.

Section 2 describes the created DB, which includes both the initial data and their processing results, which in turn are used for seismic hazard assessment and decision-making. Section 3 illustrates the data and provides examples of operations with them; the user interface for DB management in the GIS environment and examples of user queries are also discussed. The operation of the developed system is exemplified with the Caucasus–Crimea region. Final statements are provided in the concluding section.

2 INITIAL DATA

The initial data, which form the basis of the developed GIS, include the following set of digital special geodata layers:

(1) geographic base of the region;

(2) cities and other localities;

(3) calibrated regional earthquake catalog;

(4) morphostructural zoning (MSZ) results (Gorshkov, 2010);

(5) satellite images required for constructing MSZ schemes;

(6) results of EPA (earthquake-prone areas, i.e., areas where strong earthquakes can occur) recognition obtained using two-class-based learning (Gel’fand et al., 1972);

(7) auxiliary layers of geological-geophysical data used in calculation of characteristics of objects for implementing EPA recognition;

(8) results of recognition of areas where strong earthquakes can occur based on the Barrier algorithm (Gvishiani et al., 2017a) with learning based on one highly seismic class;

(9) results of recognition of highly seismic zones using the FCAZ system (Gvishiani et al., 2013a);

(10) values of the coefficients of the general law of earthquake similarity (Kosobokov, 2005) in (a) regular grid nodes and (b) intersections of morphostructural lineaments;

(11) results of modeling block structure dynamics and seismicity (Ismail-Zadeh et al., 2007), including (a) a block structure scheme and (b) catalog of synthetic earthquakes.

Below is a description of the mentioned layers (recall that the data are illustrated and examples of operations using them are provided in Section 3).

The geographic base of a region includes digital maps (both physiographical and road), as well as an administrative map showing subdivision of countries into regions and districts.

The calibrated earthquake catalog was compiled from all available sources, among which the most reliable magnitude and epicentral coordinates were chosen. The initial catalogs contain different magnitudes: body wave magnitude (Mb), surface wave magnitude (Ms), local magnitudes, etc. Due to the specifics of the used recognition algorithms, only one magnitude value is assigned to each earthquake, e.g., the maximum magnitude value from all catalogs listing this event. Also, initial catalogs may provide different epicentral coordinates; in this case, only the coordinates of the most reliable source are included. For the territory of the Caucasus and Crimea, the catalog contains 31772 events.
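The merging rule described above (one magnitude per event, coordinates from the most reliable source) can be illustrated with a minimal Python sketch. It is not the code used to build the DB: the `Record` fields, the `event_id` key that links the same event across catalogs, and the `reliability` ranking are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    event_id: str     # hypothetical key linking one event across catalogs
    source: str       # name of the initial catalog
    lat: float
    lon: float
    magnitude: float  # magnitude of any type (Mb, Ms, local, ...)


def merge_records(records, reliability):
    """Collapse multi-catalog records into one entry per event.

    reliability maps a source name to a rank (lower = more reliable);
    this ranking is an assumption of the sketch.
    Returns {event_id: (lat, lon, magnitude)}.
    """
    by_event = {}
    for rec in records:
        by_event.setdefault(rec.event_id, []).append(rec)

    merged = {}
    for event_id, group in by_event.items():
        # Coordinates: taken from the most reliable source listing the event.
        best = min(group, key=lambda r: reliability[r.source])
        # Magnitude: the maximum value over all catalogs listing the event.
        magnitude = max(r.magnitude for r in group)
        merged[event_id] = (best.lat, best.lon, magnitude)
    return merged


records = [
    Record("e1", "catalog_A", 42.0, 44.0, 5.1),
    Record("e1", "catalog_B", 42.1, 44.2, 5.4),
    Record("e2", "catalog_B", 44.5, 34.1, 4.2),
]
merged = merge_records(records, {"catalog_A": 0, "catalog_B": 1})
```

Taking the maximum magnitude is only the example rule mentioned in the text; another selection rule would change just the `magnitude` line.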

As a result of MSZ, which is based on the concept of the block structure of the crust, three types of block structure elements are distinguished in the region under study: hierarchically ordered blocks, morphostructural lineaments representing block boundaries, and morphostructural nodes, which are block junctions where lineaments intersect. In accordance with the hierarchy, blocks are assigned ranks from 1 (highest) to 3. The rank of a lineament is determined from the ranks of the blocks it separates (if the two blocks have different ranks, the higher one is assigned as the lineament rank). Lineaments are subdivided into longitudinal (i.e., predominantly running along the main geological structures) and transverse. Positions of blocks, lineaments, and nodes are determined by special-purpose analysis of the Earth’s surface relief using topographical maps and satellite images (see, e.g., (Bondur and Zverev, 2005, 2007; Bondur et al., 2012)), taking into account data from geological and tectonic maps.
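The lineament ranking rule can be written down in a few lines of Python. This is a sketch under one interpretive assumption: since rank 1 is the highest in the hierarchy, "the higher one" is taken to be the numerically smaller rank. The function name is hypothetical.

```python
def lineament_rank(block_rank_a, block_rank_b):
    """Rank of a lineament separating two blocks.

    Ranks run from 1 (highest) to 3, so the higher of two ranks
    is the numerically smaller value (assumption of this sketch).
    """
    for rank in (block_rank_a, block_rank_b):
        if rank not in (1, 2, 3):
            raise ValueError("block rank must be 1, 2, or 3")
    return min(block_rank_a, block_rank_b)
```

For example, a lineament between a rank 1 block and a rank 3 block receives rank 1 under this reading.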

The MSZ technique as applied to mountain regions is discussed in detail in (Alexeevskaya et al., 1977a, 1977b; Rantsman, 1979; Gorshkov, 2010). The MSZ scheme is the initial basis for recognizing locations where the largest, large, and significant earthquakes may occur by the EPA method that is based on the hypothesis that the epicenters of relatively strong earthquakes coincide with morphostructural nodes. The geomorphic arguments for this hypothesis were formulated in (Rantsman, 1979), and it was also supported by statistical analysis of mutual locations of earthquake epicenters and intersections of morphostructural lineaments (Gvishiani and Soloviev, 1981). The developed DB contains the complete set of lineaments and their intersections for the region under study.

As noted above, satellite images of the Earth’s surface are used in constructing MSZ schemes; these images contain informative signs of linear landforms. The developed DB includes images of the Caucasus–Crimea region in the visible spectrum, obtained from the open collection of ArcGIS World Imagery (ArcGIS World Imagery, 2018), which is a compilation of corrected images shot by multiple satellites in the period since 1990 (see the detailed list of satellite missions in (World Imagery Map Contributors, 2018)). The resolution of images for the considered region is about 50 cm, which is quite sufficient for constructing regional MSZ schemes. Figure 1 shows the constructed MSZ schemes superimposed on satellite images; both data sets are stored in the DB.

Fig. 1. MSZ schemes (Soloviev et al., 2013; Gorshkov, 2017) constructed for (a) the Crimea Mountains and (b) the Caucasus, superimposed on satellite images of the respective regions (images from ArcGIS World Imagery collection (ArcGIS World Imagery, 2018)).

The EPA method developed by Gel’fand et al. (1972) is designed to recognize areas where strong earthquakes can occur. It was applied earlier to many seismically active regions of the world. The reliability of the results obtained by this method is validated by data on strong earthquakes that occurred in the considered regions after the EPA recognition was performed (Soloviev et al., 2014). Importantly, the EPA method learns on two classes, high seismicity and low seismicity, and then divides the entire set of objects into these two classes (dichotomy). The EPA method implies that each intersection of lineaments within the studied region is described by the same set of quantitative characteristics determined from topographical, geological, geomorphic, geophysical, and other data. The described DB includes the layers of geological–geophysical data covering the territories of the Caucasus and Crimea; the characteristics of intersections, as well as the results of EPA classification of intersections as either highly seismic or low-seismic, are determined on the basis of these data (Soloviev et al., 2016; Gorshkov et al., 2017).

Thus, EPA recognition requires geological–geophysical data on the vicinity of each intersection of morphostructural lineaments. Note that all employed geophysical data are digital models obtained using remote sensing data: Bouguer gravity anomalies (EIGEN model) calculated based on data from the GOCE satellite (Forste et al., 2012; Shako et al., 2013); relief (GEBCO and ETOPO models; SRTM, SEASAT, GEOSAT, and ERS-1 satellites); and lithospheric magnetic field anomalies (WDMAM model; Orsted, CHAMP and Swarm satellites) (Lesur et al., 2016). Hence, along with the MSZ schemes, the EPA method relies on the results of remote sensing studies. In the developed GIS, software tools were implemented to determine, in the interactive mode, the values of the desired characteristics for user-selected intersections from the database (Geospatial Database of Earth Sciences, GC RAS; see (Beriozko et al., 2008, 2011)). This increases the objectivity and reproducibility of the results obtained by the EPA method. Figure 2 shows an example of calculating the characteristics from Bouguer gravity data within 25 km of lineament intersections.

Fig. 2. Example of calculation of gravity field characteristics in 25-km vicinities of set points (intersections of lineaments in Crimea; marked with stars): minimum and maximum values, and amplitude of Bouguer anomalies. Values of anomalies are obtained from EIGEN-6c2 model, constructed using data from GOCE satellite (Forste et al., 2012; Shako et al., 2013).
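The kind of calculation illustrated in Fig. 2 can be sketched in plain Python as follows. This is an illustration rather than the GIS module itself, which works through ArcPy. The sketch assumes that the gridded model is given as a list of (latitude, longitude, value) samples, that distances are great-circle distances on a sphere, and that the amplitude is the difference between the maximum and minimum values; all function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def vicinity_stats(center, samples, radius_km=25.0):
    """Minimum, maximum, and amplitude of a field around a point.

    center  -- (lat, lon) of a lineament intersection
    samples -- iterable of (lat, lon, value) grid samples
    Returns None when no sample falls inside the vicinity.
    """
    lat0, lon0 = center
    values = [value for lat, lon, value in samples
              if haversine_km(lat0, lon0, lat, lon) <= radius_km]
    if not values:
        return None
    lowest, highest = min(values), max(values)
    return {"min": lowest, "max": highest, "amplitude": highest - lowest}


# Illustrative samples (values are made up, not taken from EIGEN-6c2).
samples = [(45.0, 34.0, -10.0), (45.1, 34.0, 20.0), (46.0, 34.0, 100.0)]
stats = vicinity_stats((45.0, 34.0), samples)
```

The third sample lies about 111 km from the center and is therefore excluded from the 25-km vicinity.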

The fundamentally new learning algorithm Barrier was proposed in (Gvishiani et al., 2017a) for recognizing potential highly seismic areas within mountain regions. In contrast to the EPA method, the Barrier algorithm learns on only one “pure” highly seismic class and deals only with absolute values of the geological–geophysical characteristics. Instead of the dichotomy implemented by the EPA method, the Barrier algorithm solves the problem of constructing, within a finite set and based on all scalar attributes, the subset of alternatives that is closest to the single highly seismic learning class. The developed DB contains the results of potentially seismic zone recognition by the Barrier algorithm for the territories of the Caucasus and the Crimean Mountains.

The fundamental difference between the EPA method and Barrier algorithm should be emphasized. In the former case, learning is done on both the pure highly seismic class of lineament intersections, marked by the occurrence of strong earthquakes, and a mixed low-seismic class. This mixed class a priori contains highly seismic objects, otherwise the problem becomes meaningless. Thus, the second class a priori includes objects that distort the learning results. In terms of the Barrier algorithm, learning is done only on the pure highly seismic class containing no a priori errors, which, as expected, may increase recognition quality.

The Fuzzy Clustering and Zoning (FCAZ) algorithm is an essentially different approach developed in recent years. It is based on topological filtering and the distinguishing of earthquake epicenter clusters (Gvishiani et al., 2013a, 2016). The FCAZ continues the family of algorithms based on discrete mathematical analysis discussed, e.g., in (Gvishiani et al., 2003, 2004, 2010, 2013b; Agayan et al., 2016). Contrary to the EPA and Barrier algorithms, this algorithm employs recognition objects determined exclusively from earthquake catalog data. Additional geological–geophysical data are used in this algorithm at the stage of assessing the reliability of recognition results. The final result of the FCAZ algorithm is 2D zones within which epicenters of earthquakes with magnitudes exceeding a specified threshold can occur. In the case of Crimea, this threshold magnitude was set at 4.5 (Gvishiani et al., 2017b), and the corresponding value for the Caucasus was 5.0 (Gvishiani et al., 2016). The results of FCAZ recognition for the mentioned regions are also included in the DB. Operation of the algorithm is described with an example in Section 3.

The developed DB also includes estimates of coefficients A, B, and C from the Unified Scaling Law for Earthquakes (USLE) (Kosobokov, 2005; Nekrasova et al., 2011) in the nodes of a regular geographic grid (Nekrasova and Kosobokov, 2016). USLE is an extension of the classical Gutenberg–Richter law of earthquake similarity, which describes a quasilinear dependence of the logarithm of the number of earthquakes on magnitude above some set threshold through the values of coefficients A and B (the slope of the line). Such a dependence is usually constructed in a unified manner for some large region, which smooths out the nonuniform distribution of seismicity within the region.

For USLE, the magnitude–frequency relation is constructed for elementary areas within the region, so the nonuniform distribution of epicenters over a large region can be taken into consideration. This is attained by introducing a third coefficient, C, which depends on the linear size of a particular area and takes into account fractality in the distribution of epicenters (therefore, it characterizes a locally partitioned dimension of the geoblock hosting earthquake epicenters). Thus, the higher the values of coefficients A and C and the lower the value of coefficient B, the higher the seismic hazard. The developed DB also stores estimates of USLE coefficients determined for intersections of morphostructural lineaments.
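For illustration, the role of the three coefficients can be shown with a short Python sketch. The formula is not given in the present text; the sketch assumes the form commonly quoted in the cited USLE literature, log10 N(M, L) = A + B(5 − M) + C log10 L, where N is the expected annual number of earthquakes of magnitude M in an area of linear size L. The function names, and the requirement that L be expressed in the same units used when the coefficients were estimated, are assumptions of the sketch.

```python
import math


def usle_log10_rate(a, b, c, magnitude, linear_size):
    """log10 of the expected annual number of earthquakes (assumed USLE form).

    log10 N(M, L) = A + B * (5 - M) + C * log10(L)
    linear_size must be positive and in the units used to estimate A, B, C.
    """
    if linear_size <= 0:
        raise ValueError("linear_size must be positive")
    return a + b * (5.0 - magnitude) + c * math.log10(linear_size)


def usle_annual_rate(a, b, c, magnitude, linear_size):
    """Expected annual number of earthquakes under the same assumption."""
    return 10.0 ** usle_log10_rate(a, b, c, magnitude, linear_size)
```

In this form, A is the logarithm of the annual rate of magnitude 5 events in an area of unit size, B is the magnitude slope, and C controls how the rate scales with the size of the area. For magnitudes above 5, a larger A or C raises the rate and a larger B lowers it, in line with the hazard statement above.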

Lastly, the DB includes the results of modeling block structure dynamics and seismicity. The modeling is based on the following main principles (Ismail-Zadeh et al., 2007; Soloviev, 2011): (1) it is assumed that the structure is represented by absolutely rigid blocks separated by infinitely thin fault planes; (2) along fault planes and at bases of blocks, viscoelastic interaction of blocks between each other and the underlying medium takes place. At each moment of time, shifts and rotations of blocks are calculated proceeding from the condition that the entire block structure is in a quasistatic state of equilibrium. Earthquakes occur in accordance with the dry friction model at time instants when the stress-to-pressure ratio in any part of a fault exceeds the specified threshold value. The magnitude and coordinates of the epicenter are calculated for the model (synthetic) earthquakes. As a result, a catalog of synthetic earthquakes is formed, which may correspond to an arbitrary time interval, even one exceeding considerably the duration of the instrumental observation period in the studied region. The DB contains modeling results on the block structure of the Caucasus (Soloviev and Gorshkov, 2017) constructed on the basis of the MSZ scheme (Soloviev et al., 2013).

3 USER INTERFACE

The developed GIS allows a user to perform a broad range of interactive operations with the data mentioned above. First of all, a user can select the desired region among the following options:

(1) select country;

(2) select political division for set country (oblast, krai, …) (Fig. 3a);

Fig. 3. Setting of desired area (a) based on district, region, and country (Imereti region of Georgia) and (b) by drawing arbitrary contour on map.

(3) draw arbitrary region on map (Fig. 3b);

(4) set square area with desired side;

(5) set circular area with desired radius;

etc.
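The square and circular selections listed above reduce to simple membership tests. The following Python sketch is an illustration only: the system itself performs selection inside ArcGIS. The sketch uses a local flat-Earth approximation around the region center, which is adequate only for regions of modest size, and assumes a square aligned with meridians and parallels; all names are hypothetical.

```python
import math

KM_PER_DEG = 111.32  # approximate length of one degree of latitude, km


def local_offsets_km(point, center):
    """East and north offsets of point from center, in km (flat approximation)."""
    lat, lon = point
    lat0, lon0 = center
    north = (lat - lat0) * KM_PER_DEG
    east = (lon - lon0) * KM_PER_DEG * math.cos(math.radians(lat0))
    return east, north


def in_circle(point, center, radius_km):
    """True if point lies within radius_km of center."""
    east, north = local_offsets_km(point, center)
    return math.hypot(east, north) <= radius_km


def in_square(point, center, side_km):
    """True if point lies in a square of the given side centered on center."""
    east, north = local_offsets_km(point, center)
    half = side_km / 2.0
    return abs(east) <= half and abs(north) <= half
```

For instance, a point 0.1° north of the center (about 11 km away) falls inside a circle of 15-km radius but outside a square with a 20-km side.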

Below is the list of possible user queries to the database for the set area, implemented in the framework of the developed system:

(1) epicenters of earthquakes with magnitudes from the set range;

(2) lineaments matching the set area, with their ranks indicated;

(3) EPA recognition of strong earthquake locations from the set of morphostructural nodes;

(4) recognition of strong earthquake locations from the set of morphostructural nodes obtained by the Barrier algorithm;

(5) highly seismic 2D zones determined by the FCAZ algorithm;

(6) epicenters of synthetic earthquakes matching the set area and with magnitudes of the set range;

(7) maximum magnitude of earthquakes expected in the set area with probability p during N years.
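As an illustration of query (1) for an arbitrary user-drawn contour, the sketch below combines a magnitude filter with a standard ray-casting point-in-polygon test. It is a simplified stand-in for the ArcPy-based query modules, not a reproduction of them. The catalog layout (dictionaries with `lon`, `lat`, `mag` keys) and the function names are hypothetical, and the contour is treated as a planar polygon in longitude–latitude coordinates.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def query_epicenters(catalog, polygon, m_min, m_max):
    """Events inside the contour with magnitude in [m_min, m_max]."""
    return [event for event in catalog
            if m_min <= event["mag"] <= m_max
            and point_in_polygon(event["lon"], event["lat"], polygon)]


contour = [(43.0, 41.0), (45.0, 41.0), (45.0, 43.0), (43.0, 43.0)]
catalog = [
    {"lon": 44.0, "lat": 42.0, "mag": 5.0},  # inside, magnitude in range
    {"lon": 46.0, "lat": 42.0, "mag": 5.5},  # outside the contour
    {"lon": 44.0, "lat": 42.5, "mag": 3.0},  # inside, magnitude too low
]
selected = query_epicenters(catalog, contour, 4.0, 7.0)
```

Query (6) for synthetic earthquakes has the same structure, with the synthetic catalog substituted for the calibrated one.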

Figures 4–10 show examples of the query results; the captions describe the queries themselves in detail. Each figure also shows the interface for query construction (gray window with data entry fields and legend with indications of resulting data layers).

Fig. 4. Example of query on earthquake epicenters from calibrated catalog for Georgia, magnitudes 4–7.

Fig. 5. Example of displayed morphostructural lineaments in Russia, with indications of their ranks (Soloviev et al., 2013).

Fig. 6. Result of query for highly seismic intersections of morphostructural lineaments (circles of 25 km in diameter around them) in Armenia obtained by EPA method (Soloviev et al., 2016). Threshold magnitude of strong earthquakes M0 = 6.

Fig. 7. Result of query for highly seismic intersections of morphostructural lineaments (circles of 25 km in diameter around them) obtained by Barrier algorithm (Gvishiani et al., 2017a). Threshold magnitude of strong earthquakes M0 = 6. Black rectangle around Crimean Peninsula is region set by user.

Fig. 8. Example of query for FCAZ zones in Krasnodar krai with M ≥ 4.5 (Gvishiani et al., 2017b).

Fig. 9. Example of displayed block structure (black lines) and epicenters of synthetic earthquakes with magnitudes 3–4 (green circles; see legend) (Soloviev and Gorshkov, 2017) within oval user-set region. Obviously, the number of weak earthquakes that occur more frequently is considerably larger than the number of strong ones.

Fig. 10. Result of maximum magnitudes calculated using the USLE coefficients (Parvez et al., 2014) for earthquakes expected (a) in Republic of Dagestan over 15 years with 90% probability and (b) in arbitrarily set zone (black contour) over 10 years with 50% probability.

Figure 10 shows the maximum magnitudes of earthquakes expected to occur with a set probability within a set time interval, calculated on a uniform geographical grid. The calculation proceeds as follows: from the values of USLE coefficients A, B, and C, the magnitude–frequency dependence of earthquakes is constructed for the set region; then, based on this dependence, the expected maximum magnitude of an earthquake that can occur with the set probability within the set time interval is calculated. Depending on the query specifications, the result can be either displayed as a single number for the entire region or visualized as a map with values indicated at uniform grid nodes (as in Fig. 10). For cells in which the maximum magnitude could not be calculated but seismicity was recorded (most likely weak), a maximum value of 4 was set. Calculation problems may arise for different reasons, e.g., weak seismicity or lack of initial data for estimating the USLE coefficients.
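One possible way to turn USLE coefficients into such a magnitude is sketched below in Python. The text does not specify the exact procedure, so the sketch rests on explicitly labeled assumptions: (i) the USLE form log10 N(M, L) = A + B(5 − M) + C log10 L from the cited literature, with N read as the annual rate of events of magnitude M or larger; (ii) Poisson occurrence in time, so that the probability of at least one such event in T years is 1 − exp(−N·T); and (iii) the requested magnitude is the one for which this probability equals p. The function names are hypothetical; only the fallback value of 4 comes from the text.

```python
import math


def max_expected_magnitude(a, b, c, linear_size, probability, years):
    """Magnitude whose exceedance probability over `years` equals `probability`.

    Assumes the USLE form and Poisson occurrence described in the lead-in.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if years <= 0 or linear_size <= 0 or b <= 0:
        raise ValueError("years, linear_size, and b must be positive")
    # Annual rate at which P(at least one event in `years`) = probability.
    target_rate = -math.log(1.0 - probability) / years
    # Invert log10 N = A + B * (5 - M) + C * log10(L) for M.
    return 5.0 + (a + c * math.log10(linear_size)
                  - math.log10(target_rate)) / b


def cell_value(coefficients, has_seismicity, linear_size, probability, years,
               fallback=4.0):
    """Value shown in a grid cell.

    coefficients -- (A, B, C), or None when they could not be estimated.
    Cells without coefficients but with recorded seismicity get the
    fallback value of 4 described in the text; empty cells get None.
    """
    if coefficients is None:
        return fallback if has_seismicity else None
    a, b, c = coefficients
    return max_expected_magnitude(a, b, c, linear_size, probability, years)
```

Under these assumptions, a longer time interval or a lower requested probability yields a larger magnitude, since rarer and stronger events then become admissible.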

4 DISCUSSION

The unified GIS environment integrates the most complete information on both initial data and their processing results, which in turn makes it possible to perform multicriterion seismic hazard assessment in seismically active Russian areas and adjacent regions. At present, the developed GIS and underlying DB are available for the Caucasus–Crimea region. This environment includes the results of both conventional and alternative approaches to seismic hazard assessment. The GIS-integrated tools for geodata analysis and query construction allow a user to perform seismic hazard assessment based on different criteria in desired regions. Thus, users perform their own independent system analysis of the data from different pattern recognition methods and formulate the final result taking into consideration specific demands. The functionality of the developed GIS provides broad opportunities for analyzing and interpreting the obtained data. In particular, the calculation results can be geometrically fitted in order to provide a comprehensive seismic hazard assessment, which in turn allows comprehensive system research of seismically active regions to be conducted and multifactor risks to be assessed.

We emphasize that this system has been developed for the first time. It enables the most objective seismic hazard assessment, because it unifies different approaches to such assessment in the framework of a unified program environment. Other existing systems—e.g., the Global Seismic Hazard Assessment Program, GSHAP (Ulomov, 1999), maps of General Seismic Zoning of Russia (Ulomov, 2013; Ulomov and Bogdanov, 2013), maps representing detailed seismic zoning (DSZ), seismic microzoning (SMZ), and zones of possible earthquake sources (PES zones)—do not provide such a broad range of possible prediction methods and are mostly aimed at single-aspect consideration of the problem.

It is planned to expand the developed system to other seismically active Russian regions and adjacent countries. In particular, its expansion to the Altai–Sayany–Baikal seismically active region, which was also studied within the mentioned project no. 15-17-30020, is ongoing. At the moment, a desktop application of the system is available; a web-based version with a centralized database hosted on servers is planned, so that multiple users can access the system. The web-based version will use geoportal technology (see, e.g., (Krasnoperov et al., 2016; Soloviev et al., 2018)), which does not require the user to install any special client program. Other planned enhancements are a voice-operated interface for the system and the introduction of AI elements for automated responses to user queries.