1 Introduction

Natural hazards such as earthquakes are complex phenomena that can occur independently or in sequence and can significantly affect the environment, society, and infrastructure (Khatakho et al. 2021). Earthquakes are naturally occurring phenomena of short duration; however, their impacts on society, buildings, and infrastructure may persist for years. Earthquakes that occurred in the twentieth century alone are estimated to have killed more than 2 million people globally (Doocy et al. 2013). Therefore, proper planning and disaster mitigation steps are needed to protect society and infrastructure from such catastrophic events (Cruz-Milán et al. 2016). However, in developing southeast Asian countries, uncontrolled migration, urbanization, ill-maintained and obsolete infrastructure, and land use mismanagement have hindered disaster mitigation efforts against natural hazards such as earthquakes (Xu et al. 2010). Therefore, a comprehensive study to understand the impacts of such disasters on cities, populations, and infrastructure facilities is urgently required. Recent developments in remote sensing, geographic information system (GIS) techniques, and machine learning can help to identify potential risk zones over much larger areas.

The Himalayan region is one of the most seismo-tectonically active zones on the earth. The continuous collision of the Eurasian and Indian plates, which keeps pushing the Himalayas higher, increases the chance of large earthquakes with every passing day (Bilham et al. 2001). Recent devastating earthquakes in the Himalayan region, e.g. the Gorkha earthquake (M ~ 7.8) that occurred on 25 April 2015 in the central Himalayan region, caused about 9000 casualties and left nearly 22,000 people injured, besides destroying valuable infrastructure and causing considerable economic losses (Bilham 2015). It is estimated that a potential earthquake similar to that of 1505 may affect millions because the population has since increased many times over (Wyss et al. 2018), which is a matter of concern for the inhabitants of that region and the surrounding areas.

Rapid population growth and unplanned urbanization in the Himalayan region have enhanced the threat to life and property. It is estimated that between 1961 and 2011, the population in the Himalayan region increased by ~ 250%, with an annual growth rate of about 3.3%, which is three times higher than the global average. If the population growth continues at the same rate, the number of people residing may increase by 13-fold by 2061 (Apollo 2017), making this region and the population more vulnerable to disasters. The unplanned population growth also leads to unplanned urbanization and puts much stress on available resources. Besides population density, land use mismanagement, non-engineered infrastructures, and poorly constructed buildings are also a threat during an earthquake, specifically in metropolitan cities (Asadi et al. 2019). Identifying the vulnerable and risk zones, developing an adequate number of hospitals and well-designed infrastructure, educational and training institutes for the benefit of vulnerable communities, and early warning systems may help in minimizing the impacts of earthquakes. Hence, a comprehensive earthquake study using advanced computing techniques such as machine learning, remote sensing and GIS-based techniques is needed.

Over the years, various techniques, such as machine learning and multi-criteria decision models, have been used to estimate earthquake-associated hazards. The analytical network process (ANP), artificial neural network (ANN), and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) have been used to assess earthquake vulnerability and map the probability (Alizadeh et al. 2018; Jena et al. 2019; Jena and Pradhan 2020). Jena et al. (2020) applied the recurrent neural network (RNN) to estimate earthquake probability in Odisha, India, whereas integrated AHP with a probabilistic neural network (PNN) was used for vulnerability mapping (Jena et al. 2021a). Yariyan et al. (2020) evaluated the earthquake risk spatially for Sanandaj, Iran, using a fuzzy AHP-ANN model; Yariyan et al. (2021) attempted to map the seismic vulnerability for Sanandaj, Iran, using hybrid ANN models. Several machine learning techniques have also been successfully implemented for geotechnical applications, such as to map potential zones for groundwater, landslide and flood susceptibility mapping (de Oliveira et al. 2019; Liu et al. 2021; Chakrabortty et al. 2020; Kaur et al. 2022; Satarzadeh et al. 2022). For Sikkim Himalaya, earthquake hazard was estimated by fuzzy AHP (Pal et al. 2008), whereas the earthquake vulnerability for the Himalayan region has also been estimated using multi-criteria decision models such as AHP-VIKOR, AHP-Grey Relational Analysis (GRA) and fuzzy AHP-fuzzy TOPSIS methods (Malakar and Rai 2022a, b). Studies on earthquake probability and hazard estimation in the Himalayan region that are based on traditional seismological techniques provide an idea of risk and hazard in the region (Mahajan et al. 2010; Yadav et al. 2012; Roy et al. 2012; Chandra et al. 2018; Stevens et al. 2020). 
To study a larger region, remote sensing and GIS-based methods, along with machine learning techniques that can handle various spatial data simultaneously, can provide a better estimate of hazard, vulnerability and coping capacity.

In this paper, we present a comprehensive earthquake risk study for the Himalayan region, which will help to develop mitigation strategies in this active seismo-tectonic zone. However, assembling datasets for a large area is challenging, which is the main reason why most studies focus on smaller areas rather than large-scale scenarios. As discussed above, a number of studies have integrated a multi-criteria decision-making (MCDM) model, such as AHP, with ANN to map earthquake probability and hazard; AHP provides weights for the various datasets based on experts' opinions and previously available literature. However, experts may ignore the information contained in the data while calculating weights, which may lead to uncertainty in the result (Bhattacharya et al. 2010; Emrouznejad and Marra 2017; Rodcha et al. 2019). This problem can be resolved by integrated MCDM models, as discussed by Malakar and Rai (2022).

Here, we have attempted to integrate subjective and objective MCDM models, i.e. AHP and entropy, with ANN to estimate the earthquake risk and integrate the results on a GIS platform for large-scale scenarios by using a number of publicly available parameters (Table 1). These parameters directly or indirectly influence the earthquake risk assessment. This study aims to minimize the impact of a potential earthquake in future by identifying the most hazardous zones in the study area. The detailed data sources and methodology are discussed in the following sections.

Table 1 Vulnerability, coping capacity and probability indicators for earthquake risk assessment

2 Study area

The study area covers a significant portion of the Himalayan Mountains (Fig. 1), one of the most seismo-tectonically active zones. The region extends from the northeastern to the northwestern border of India and several southeast Asian countries, covering an area of more than a million km² and inhabited by more than 90 million people. The mountain chain has resulted from a continuous collision between the Indian and Eurasian plates (Dewey and Bird 1970; Le Fort 1975; Molnar 1984; Dewey et al. 1989). Global positioning system (GPS) measurements indicate a convergence rate of 4–5 cm/year (Banerjee and Bürgmann 2002), which has resulted in the accumulation of significant strain energy that is released from time to time in the form of moderate to large earthquakes (Besse and Courtillot 1988). The prominent tectonic features in the region are the Main Central Thrust (MCT), the Main Boundary Thrust (MBT), the Indus Tsangpo Suture (ITS), and the Himalayan Frontal Arch (HFA) extending from the northwest to the southeast (Sinha and Upadhyay 1995). The basement crystalline complex is overthrust by the Precambrian to Tertiary rocks from north to south by a series of three- and four-thrust sheets (Sinha and Upadhyay 1995). The main axial lineament of the Himalayas is characterized by vertically dipping crystalline rocks, with the southern and northern limbs diverging and being intruded by younger granites (Sinha 1992). The MBT separates the para-autochthonous and allochthonous units south of the Himalaya from the Siwalik Molasse along the Himalayan foothills, whereas the MCT separates the metamorphosed crystalline Vaikrita Complex and the carbonate para-autochthonous zone. The ophiolitic mélange results from the obduction of oceanic material and subduction of a continental plate (Sinha and Upadhyay 1995). As per the seismic hazard zonation map (BIS 2002), the region falls in zones IV and V. The peak ground acceleration (PGA) in the region is estimated to vary between 0.10 and 0.40 g, with a 10% probability of exceedance in 50 years (Bhatia et al. 1999).

Fig. 1 Location of the study area. Circles show earthquakes with magnitude M > 4.5

Most earthquakes in the region are shallow-focus events with depths of less than 30 km. A few deep-focus earthquakes have also been reported; the classic recent example is the October 2015 Hindu Kush earthquake, besides a few major historical earthquakes (Bilham 2019). It is almost impossible to predict earthquakes theoretically or experimentally, as the stress released along this subducted region is not homogeneous in space and time (Bilham 2015). Therefore, estimating earthquake-associated hazards and risks is essential for the mitigation of impacts due to any significant seismo-tectonic event.

3 Materials and methods

3.1 Data acquisition

For the earthquake risk assessment, spatial and non-spatial data collected from various sources were used in this study. These data include an earthquake catalogue, geological data such as faults and lithology, and data on social and structural characteristics, hospitals, and communication networks (Table 1). The earthquake catalogue for the years 1900 to 2020 was obtained from the National Centre of Seismology, India (https://seismo.gov.in/). Geological data such as the distribution of active faults were obtained from the Global Earthquake Model (https://www.globalquakemodel.org/), and the lithological data were obtained from Universität Hamburg (http://lithomap.cen.uni-hamburg.de/). The time-averaged shear wave velocity up to 30 m depth (Vs30) was obtained from the USGS (https://www.usgs.gov/), whereas the soft soil thickness was obtained from the Food and Agriculture Organization of the United Nations (https://www.fao.org/home/en). The peak ground acceleration (PGA) was estimated using the relationship given by Panjamani et al. (2016). The SRTM elevation data (SRTM 2013) were used to derive the slope in the area. The administrative boundary, buildings, hospitals, communication networks, and land use datasets were obtained from various freely available sources such as DIVA-GIS (https://www.diva-gis.org/) and OpenStreetMap (https://www.openstreetmap.org/). All datasets were last accessed on 25 September 2021. Secondary layers such as Euclidean distance, inverse distance weighting (IDW), and kernel density were derived using the respective algorithms. The datasets were standardized and reclassified into five classes (very low, low, moderate, high, and very high) using the quantile classification technique. The detailed methodology of the proposed model (Fig. 2) is discussed in the following subsections.
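The quantile reclassification step can be illustrated with a minimal sketch. It assumes a rank-based rule in which each of the five classes receives (nearly) the same number of cells; the function name `quantile_classes` and the sample values are hypothetical, and the GIS software used in the study may treat ties and class breaks differently.

```python
# Class labels follow the five classes named in the text.
LABELS = ["very low", "low", "moderate", "high", "very high"]

def quantile_classes(values, n_classes=5):
    """Assign each value a class index 0..n_classes-1 so that the classes
    hold (nearly) equal numbers of values. Ties are split by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices, ascending by value
    classes = [0] * n
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // n
    return classes

# Hypothetical layer values (illustrative only, not data from the study).
layer = [0.2, 3.1, 1.5, 9.8, 4.4, 0.9, 7.3, 2.2, 5.6, 6.0]
classes = quantile_classes(layer)
labels = [LABELS[c] for c in classes]
```

With ten values and five classes, each class receives exactly two values, which is the defining property of a quantile classification.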

Fig. 2 Flowchart illustrating the method adopted to compute the risk in the study area

3.2 AHP-entropy integration

Saaty (1980) developed the analytical hierarchy process (AHP), a multi-criteria decision-making (MCDM) model. This model uses a hierarchical structure to evaluate the priority of the criteria in a complex decision-making problem through pairwise comparison matrices. These matrices are primarily based on expert opinion or the published literature. The method begins with the construction of the pairwise comparison matrix from the criterion scores, which are assigned on a scale from 1 to 9 (Saaty 1980) based on the experts’ knowledge and the published literature. The matrix is then normalized, and the weight of each criterion is evaluated. To check the consistency of the result, we used the consistency index (CI) and the consistency ratio (CR), given by

$${\text{CI}} = \frac{{\lambda_{\max } - n}}{n - 1}\;{\text{and}}\;{\text{CR}} = \frac{{{\text{CI}}}}{{{\text{RI}}}}$$

where \(\lambda_{\max}\) is the principal eigenvalue of the pairwise comparison matrix, n is the number of criteria, and RI is the random index. The values of RI were tabulated by Saaty (1980) for matrix dimensions between 1 and 15. If CR < 0.1, the consistency level is acceptable for evaluating the priority \(w_{i}^{\prime}\).
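The AHP weights and the consistency check can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the principal eigenvector is approximated by power iteration, the function name `ahp_weights` and the 3 × 3 comparison matrix are hypothetical, and the RI values are those commonly quoted from Saaty (1980) for n up to 10.

```python
# Random index (RI) values commonly quoted from Saaty (1980), n = 1..10.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(A, iters=100):
    """Return (weights, lambda_max, CI, CR) for a pairwise comparison
    matrix A (list of lists, 2 <= n <= 10) using power iteration."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(Aw)
        w = [v / total for v in Aw]          # keep the weights summing to 1
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lam_max - n) / (n - 1)
    cr = ci / RI[n] if RI[n] > 0 else 0.0    # CR is taken as 0 when RI = 0
    return w, lam_max, ci, cr

# Hypothetical comparison of three criteria on Saaty's 1-9 scale.
A = [[1.0, 3.0, 5.0],
     [1.0 / 3.0, 1.0, 2.0],
     [1.0 / 5.0, 1.0 / 2.0, 1.0]]
weights, lam_max, ci, cr = ahp_weights(A)
acceptable = cr < 0.1
```

For this nearly consistent matrix the first criterion receives the largest weight and CR falls well below the 0.1 threshold, so the judgements would be accepted.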

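On the other hand, entropy measures the degree of disorder, or uncertainty, in a system using probability theory (Shannon 1948). In the entropy weighting method, a criterion whose values differ more across the alternatives has a lower entropy, carries more information, and therefore receives a higher weight than a criterion whose values are nearly uniform.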

Firstly, the matrix is normalized as \({\text{ p}}_{{{\text{ij}}}} = \frac{{{\text{x}}_{{{\text{ij}}}} }}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{m}}} {\text{x}}_{{{\text{ij}}}} }}\), where \({\text{p}}_{{{\text{ij}}}}\) (i = 1,…,m; j = 1,..,n) is the standardized value of the non-negative index, \({\text{x}}_{{{\text{ij}}}}\) is the performance measure of the jth attribute in the ith alternative. Secondly, the entropy value is computed as

$${\text{E}}_{{\text{j}}} = { } - \frac{1}{{{\text{ln}}\left( {\text{m}} \right)}}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{m}}} {\text{p}}_{{{\text{ij}}}} {\text{ln}}({\text{p}}_{{{\text{ij}}}} )$$

The entropy weight of the jth criterion, \({\text{w}}_{{\text{j}}}^{\prime\prime}\) (written \({\text{w}}_{{\text{i}}}^{\prime\prime}\) where it is combined with the AHP weight), is estimated as

$${\text{w}}_{{\text{j}}}^{\prime\prime} = \frac{{1 - {\text{E}}_{{\text{j}}} }}{{\sum\nolimits_{{{\text{k}} = 1}}^{{\text{n}}} \left( {1 - {\text{E}}_{{\text{k}}} } \right)}}$$

The \(\left( {1 - {\text{E}}_{{\text{j}}} } \right)\) value is called the degree of diversification factor (\({\text{D}}_{{\text{j}}}\)), which describes the degree of divergence of each criterion’s inherent information. The weight estimated from the entropy specifies the criterion’s importance in making the decision.
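The entropy weighting steps (normalization, entropy, diversification, weight) can be sketched as below. The function name `entropy_weights` and the decision matrix are hypothetical; the sketch assumes non-negative scores, at least two alternatives, at least one criterion that is not perfectly uniform, and the usual convention that 0·ln(0) is taken as 0.

```python
import math

def entropy_weights(X):
    """X[i][j] is the non-negative score of alternative i on criterion j.
    Returns (weights, entropies), one entry per criterion."""
    m, n = len(X), len(X[0])
    entropies = []
    for j in range(n):
        col_sum = sum(X[i][j] for i in range(m))
        e = 0.0
        for i in range(m):
            p = X[i][j] / col_sum           # normalized value p_ij
            if p > 0:                       # convention: 0 * ln(0) = 0
                e -= p * math.log(p)
        entropies.append(e / math.log(m))   # scaled to the range [0, 1]
    diversification = [1.0 - e for e in entropies]
    total = sum(diversification)
    return [d / total for d in diversification], entropies

# Hypothetical matrix: 3 alternatives (rows) x 3 criteria (columns).
X = [[5.0, 1.0, 10.0],
     [5.0, 2.0, 20.0],
     [5.0, 9.0, 30.0]]
w_entropy, E = entropy_weights(X)
```

The first criterion is identical for all alternatives, so its entropy is 1 and its weight is effectively zero, whereas the second criterion, whose values are the most dispersed, receives the largest weight.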

Ultimately, the overall weight of the criteria is estimated. The entropy weight is entirely based on the data; in contrast, the AHP weights are obtained from expert opinion and available literature. Sometimes, the entropy weights are different from reality (Wang et al. 2009; Weijs et al. 2010; Cui et al. 2018), whereas the experts usually ignore the data information while calculating the weights using AHP, which may lead to uncertainty in the result (Bhattacharya et al. 2010; Emrouznejad and Marra 2017; Rodcha et al. 2019). Therefore, weights determined using AHP (\({\text{w}}_{{\text{i }}}^{^{\prime}} ,{ }\) subjective) and the entropy method (\({\text{w}}_{{\text{i}}}^{{^{\prime\prime}}}\), objective) can be combined (Chuansheng et al. 2012) to define a new weight given by

$$\omega_{i} = \alpha w_{i }^{^{\prime}} + \left( {1 - \alpha } \right)w_{i}^{^{\prime\prime}}$$

The value of \({\upalpha }\) can vary between 0 and 1. In this study, we set the value of \(\mathrm{\alpha }\) as 0.6 (Chuansheng et al. 2012). The vulnerability and coping capacity have been estimated using the AHP-Entropy integration method through GIS (Tables 2 and 3). The earthquake probability mapping has also been done by integrating AHP-Entropy with ANN.
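As a worked illustration of the combined weight, consider three criteria with hypothetical AHP weights (0.5, 0.3, 0.2) and hypothetical entropy weights (0.2, 0.3, 0.5); these numbers are illustrative only and are not taken from Tables 2 or 3. With \(\alpha = 0.6\):

$$\begin{aligned} \omega_{1} &= 0.6 \times 0.5 + 0.4 \times 0.2 = 0.38 \\ \omega_{2} &= 0.6 \times 0.3 + 0.4 \times 0.3 = 0.30 \\ \omega_{3} &= 0.6 \times 0.2 + 0.4 \times 0.5 = 0.32 \end{aligned}$$

The combined weights still sum to 1 because both sets of input weights do. The first criterion, which ranks last on the entropy weights alone, ranks first after combination because the expert judgement carries 60% of the total weight.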

Table 2 Priority of the parameters used to map vulnerability
Table 3 Priority of all the parameters used to map coping capacity

3.3 Artificial neural network

ANN contains layers of nodes, or neurons, that transform input data into output (Nedic et al. 2014; Alizadeh et al. 2018) and has numerous advantages over statistical methods (Zhang et al. 1998), as it can handle uncertainty, noise, and incomplete datasets (Midilli et al. 2007). Therefore, high accuracy can be achieved while mapping the earthquake probability through ANN (Yariyan et al. 2020). For this, an appropriate set of training parameters and an adequate network architecture need to be defined (Sözen 2009) by trial and error (Karapidakis 2007); even so, ANN outperforms other methodologies in terms of accuracy (Lynch et al. 2001).

The multilayer perceptron (MLP) network is a simple, versatile, and flexible form of neural network (Alizadeh et al. 2018; Jena et al. 2019; Yariyan et al. 2020). The network consists of an input layer, one or more hidden layers depending on the complexity of the problem, and an output layer (Roy et al. 1993). Each layer of the ANN is made up of a number of neurons that process information independently and are linked to neurons in other layers via weights. The hidden layer neurons initiate networking with the input neurons’ weights and process the data by linking to each other (Abraham 2005). We trained the MLP using the backpropagation algorithm (Fig. 3), which reduces the error of the MLP (Salarian et al. 2014).

Fig. 3 Architecture of the multilayer perceptron (MLP) used for estimating earthquake probability

The neural network learning algorithm utilizes the training set to create suitable network weights, which link the input and output layers. Then, the trained network’s performance is verified on the test dataset. Hence, we have to efficiently prepare and select the training site parameter, which impacts the accuracy of the obtained result (Nedic et al. 2014). We have acquired the complete earthquake catalogue from the National Centre of Seismology, Government of India, from 1900 to 2020. Even a large amount of data, however, may be insufficient for modelling the neural network (Yariyan et al. 2020). Therefore, our focus should be to accumulate large datasets to develop a model with the help of the ANN technique, as it is a data-intensive technique.

3.4 Integration of AHP-Entropy with MLP: Earthquake Probability

A total of ten spatial datasets were used to evaluate the earthquake probability (Nedic et al. 2014) for the study region (Fig. 3). Implementing the MLP network model requires training and test datasets to examine the model’s performance, which helps to select a specific training network (Aghazadeh et al. 2018). Here, we used the AHP-Entropy integrated MCDM to generate an appropriate training database for the MLP network, using the 70% of parameters with the highest weights. Subsequently, 6000 points, characterized as earthquake and non-earthquake points, were selected from the training dataset. These points were used in the training stage of the feedforward MLP and in measuring the trained network’s accuracy. Finally, the output was standardized and classified into five classes.

A feedforward MLP with a two-layered structure was trained; the backpropagation algorithm was applied to minimize the error, and the root mean square error (RMSE) was estimated. The feedforward ANN describes the interconnection between the neurons in different layers of the model (Jena et al. 2019). We developed a network using the MLP classifier, trained it, and measured the accuracy of the trained model. We then predicted the pixel values to obtain the earthquake probability map for the Himalayan region. The network topology, data, and training parameters are presented in Table 4. The resulting map was transferred to the GIS platform to map the earthquake hazard, as discussed in the following sub-section.
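The training procedure described above can be illustrated with a small, self-contained feedforward MLP trained by backpropagation. This is only a sketch under stated assumptions: the class name `TinyMLP`, the single hidden layer of three neurons, the sigmoid activations, the learning rate, the number of epochs, the 70/30 split, and the synthetic two-feature points are all hypothetical and do not reproduce the parameters in Table 4.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One hidden layer plus one output neuron, trained per sample."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in every row is the bias term.
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.W1]
        y = sigmoid(sum(w * v for w, v in zip(self.W2, h + [1.0])))
        return h, y

    def train(self, X, T, lr=0.5, epochs=200):
        for _ in range(epochs):
            for x, t in zip(X, T):
                h, y = self.forward(x)
                d_out = (y - t) * y * (1.0 - y)            # output-layer error term
                d_hid = [d_out * self.W2[k] * h[k] * (1.0 - h[k])
                         for k in range(len(h))]           # error propagated backwards
                for k, v in enumerate(h + [1.0]):
                    self.W2[k] -= lr * d_out * v
                for k, row in enumerate(self.W1):
                    for j, v in enumerate(x + [1.0]):
                        row[j] -= lr * d_hid[k] * v

    def rmse(self, X, T):
        sq = sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, T))
        return math.sqrt(sq / len(X))

# Synthetic points: label 1 ("earthquake") if the two features sum to more than 1.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
T = [1.0 if x[0] + x[1] > 1.0 else 0.0 for x in X]
X_train, T_train = X[:140], T[:140]     # 70% for training
X_test, T_test = X[140:], T[140:]       # 30% held out for testing

net = TinyMLP(n_in=2, n_hidden=3)
rmse_before = net.rmse(X_test, T_test)
net.train(X_train, T_train)
rmse_after = net.rmse(X_test, T_test)
```

Comparing the RMSE on the held-out points before and after training gives a simple check that backpropagation has reduced the prediction error.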

Table 4 Network, data and training parameters used for implementing the MLP model

3.5 Hazard and Risk

Hazard, in general, is the probability of the event occurring during any given time point in the temporal and spatial scale (Jena et al. 2019). In this study, we developed the hazard map using spatial information on earthquake probability and intensity variation for the Himalayan region. The intensity map was generated by calculating the intensity value from the earthquake magnitude, which was then interpolated to understand the intensity variation in the study area (Bartier and Keller 1996). Based on the intersection theory, the hazard zones were then classified using the quantile classification (Jena et al. 2019). The hazard map generated in this study is created by integrating two MCDM models with ANN through GIS and applied for the first time in the Himalayan region.
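The interpolation of point intensity values onto the study area can be illustrated with a basic inverse distance weighting (IDW) sketch. The function name `idw`, the power parameter of 2, and the sample points are assumptions made for illustration; the conversion from magnitude to intensity is not shown because the relationship used is not given here.

```python
import math

def idw(points, query, power=2.0):
    """Inverse distance weighted estimate at `query` = (x, y).
    `points` is a list of (x, y, value) samples."""
    num = 0.0
    den = 0.0
    for x, y, value in points:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return value                # query coincides with a sample point
        w = 1.0 / d ** power            # nearer samples receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical intensity samples at two locations.
samples = [(0.0, 0.0, 4.0), (2.0, 0.0, 6.0)]
midpoint = idw(samples, (1.0, 0.0))
near_first = idw(samples, (0.5, 0.0))
```

The estimate at the midpoint is the plain average of the two samples, and it moves towards the value of the nearer sample as the query point approaches it.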

Finally, the earthquake risk was estimated using the spatial information of the earthquake hazard, vulnerability, and coping capacity for the region (Westen 2013; Jena et al. 2021b). Mathematically, the earthquake risk is defined as (WHO 2009)

$${\text{Risk}} = \frac{{{\text{Hazard}} \times {\text{Vulnerability}}}}{{\text{Coping Capacity}}}$$
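Applied cell by cell to co-registered raster layers, the risk expression can be sketched as follows. The function name `risk_grid`, the small `eps` guard against division by zero, and the 2 × 2 grids of class scores (1 to 5) are hypothetical.

```python
def risk_grid(hazard, vulnerability, coping, eps=1e-9):
    """Cell-wise risk = hazard * vulnerability / coping capacity.
    All three grids must have the same shape (lists of rows)."""
    return [[h * v / max(c, eps) for h, v, c in zip(h_row, v_row, c_row)]
            for h_row, v_row, c_row in zip(hazard, vulnerability, coping)]

# Hypothetical class scores on a 2 x 2 grid (1 = very low, 5 = very high).
hazard = [[5, 1], [3, 4]]
vulnerability = [[4, 2], [3, 5]]
coping = [[1, 5], [3, 2]]
risk = risk_grid(hazard, vulnerability, coping)
```

The upper-left cell combines very high hazard, high vulnerability, and very low coping capacity and therefore receives the largest risk score, while the upper-right cell has the smallest.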

4 Results and discussion

In this study, we have proposed a novel methodology that integrates the AHP and entropy MCDM models with ANN to evaluate earthquake risk. The reason for integrating AHP and entropy has been discussed in previous sections. We used twenty-nine parameters that directly or indirectly influence earthquake risk in the region. For our analysis, we used the mathematical formulation proposed by WHO (2009), in which risk is a function of hazard, vulnerability, and coping capacity.

4.1 Vulnerability

The parameters used in the estimation of vulnerability are presented in Table 1. The weights of the parameters were estimated using the AHP-Entropy integration method discussed earlier. The comparison matrix and the estimated weights are presented in Table 2. The priority of the parameter population density was found to be the highest, followed by building density. The lowest weight was found for the secondary parameter gas station. Our analysis indicates that the southern part of the study area is more vulnerable than the northern and eastern parts (Fig. 4). It is also observed that several cities in the study area fall under very high vulnerability zones.

Fig. 4 Vulnerability and coping capacity map of the study region. High and low vulnerability and coping capacity zones are indicated in red and green

The high vulnerability in metropolitan cities may be due to high population density, mismanaged land use, lack of resources, and the uneven distribution of educational institutions and transportation terminals (Fig. 4). The population growth rate between 1961 and 2011 was 5.53% in the Sikkim Himalayan region, 5.43% in Arunachal, and 5.04% in Kashmir, all of which are much higher than the global average (Apollo 2017). The lowest population growth rate was reported in the Kumaun Himalaya (2.65%), which is still more than twice the world average. The average population density in the Himalayas (excluding Nepal and Arunachal Himalayas) was over 22 people per square kilometre in 1911, which increased to over 96 people per square kilometre by 2011. Bhutan and Arunachal Himalaya are the least dense areas, with 16.31 and 19.23 people per square kilometre, respectively. The Darjeeling Himalaya has the highest population density of 923.57 people per square kilometre (Apollo 2017). The increasing population, rapid unplanned urbanization, and mismanagement of land use lead to a scarcity of resources. Furthermore, the metropolitan cities have relatively high building densities compared with the rural parts of the Himalayas. This could be the primary reason for the high vulnerability observed in Fig. 4. Areas with low vulnerability may have good socio-economic conditions and low population and building density. The Tibetan region in the study area also falls under the low vulnerability zone, presumably due to low population density resulting from harsh climatic conditions and rugged terrain, which make it extremely challenging to earn a livelihood.

The numerical values indicate that 19.86% of the study area lies under very low vulnerability, 20.61% low, 19.59% moderate, and 20.41% high, and the remainder lies under very high vulnerability. In contrast, about 4% of the population is under very low earthquake vulnerability, 5.93% low, and 11.05% moderate, while approximately 79.07% of the population resides under high to very high threat of earthquakes (Table 5). Therefore, about four-fifths of the total Himalayan inhabitants are vulnerable to earthquake hazards, and agencies should pay more attention to highly vulnerable regions and communities.

Table 5 Hazard, vulnerability and risk in terms of areas, population and buildings in the Himalayan region

4.2 Earthquake hazard and risk

As discussed earlier, the earthquake hazard was estimated by integrating the spatial information of earthquake probability and variation in the earthquake intensity. The earthquake probability was estimated by combining AHP-Entropy with ANN. Historical seismicity, geological features, Vs30, PGA, elevation, and slope were included in estimating the earthquake probability. Figure 5 shows high hazard areas in red and relatively lower hazard areas in green. The hazard map indicates that several major cities in the west fall under high hazard regions, similar to the central Himalayan region (Fig. 5). The Indian Himalayan region, including cities such as Dehradun, Ambala, and Itanagar, falls in a relatively low hazard zone, whereas Dibrugarh town in the east is in a high hazard zone. The degree of hazard varies from very low to high in the Bhutan Himalayas.

Fig. 5 Earthquake probability and hazard map of the study region. High and low probability zones and hazard regions are indicated in red and green, respectively

The high fault and epicentre density and other prominent complex tectonic features, including several strike-slip faults, contribute to the earthquake probability of the eastern Himalayas (Jena and Pradhan 2020). For instance, Darjeeling is characterized by metamorphic rocks with narrow fault distances, high Vs30, elevation and slope, and frequent near- and far-source events. Furthermore, the regions of the western Himalayas are predominantly hazardous with respect to earthquakes. Srinagar, which lies in the Kashmir basin, has several tectonic features and high epicentre and fault density, with moderate elevation and slope. The tectonic features include the Main Boundary Thrust (MBT), Main Central Thrust (MCT), Main Mantle Thrust (MMT), Panjal Thrust (PT), Kishtwar Fault (KF), Jhelum Fault (JF), Reasi Thrust (RT), Bagh-Balakot Fault (B-BF), Balapur Fault (BF), Hazara Thrust System (HTS), Hazara-Kashmir Syntaxis (HKS), and Drangbal-Laridora Fault (DL) (Sana 2019). Numerous events have shaken Srinagar; for instance, the 1885 (M = 6.2) and 2005 (M = 7.6) earthquakes, with estimated intensities of VI-VII (Bilham et al. 2010), make the region highly hazardous. The Islamabad and Rawalpindi area in the western part is also tectonically active, with low elevation and slope, and is dominated by unconsolidated sediments. The Riwat Thrust runs near the southwest edge of the Islamabad-Rawalpindi area (Jadoon and Frisch 1997) and is another prominent fault associated with some past events. Most recently, the 2005 Kashmir earthquake caused considerable damage to Islamabad. The local earthquake in the northern Potwar near Islamabad (M = 5.8) that occurred in February 1977 produced an intensity of VII (Adhami et al. 1980). In the central Himalayas, the cities are highly populated and appear to be highly hazardous, with high fault and epicentre density. The 2015 Gorkha earthquake was a recent devastating shallow-depth earthquake with a magnitude of 7.8 and an epicentre 77 km northwest of Kathmandu. This event ruptured a section of the Main Himalayan Thrust, a low-angle continental subduction interface between the Indian and Eurasian plates to the south and north, respectively (Rupakhety 2018), making the region highly hazardous.

Out of the total area of the study region, ~ 22.43% lies under a very low hazard zone, 24.86% low, 20.76% moderate, 20.16% high, and 11.78% very high (Table 5). On the other hand, 19.77% of the total population resides in the very low hazard zone, 19.63% low, 34.76% moderate, and 18.80% high, and the rest live in the very high hazard zone (Table 5). These values indicate that more than 60% of the population resides in moderate to very high earthquake hazard zones, which is a matter of concern for the various agencies involved in hazard mitigation.

The risk map (Fig. 6) is estimated using the spatial information of earthquake hazard (derived from the earthquake probability and intensity variation), earthquake vulnerability, and coping capacity. The risk map is further classified into five classes; the high-risk areas are shown in red and the low-risk areas in green. Our analysis shows that several cities in the study area fall into very high-risk zones. In the Indian Himalayan region, Darjeeling and Srinagar appear to be at high earthquake risk, and Dibrugarh town is at moderate risk. The central Himalayan region is at very high earthquake risk relative to other parts of the study area.

Fig. 6 Integrated earthquake risk map for the Himalayan region. High-risk and low-risk zones are shown in red and green, respectively

The results indicate that 8.44% of the Himalayan region falls under very low earthquake risk, 20.89% low, 39.21% moderate, 22.90% high, and 8.55% very high earthquake risk zones. The earthquake risk as a function of the population shows similar patterns, i.e. 10.88% of the total population residing in the Himalayan region is under very low earthquake risk, 14.79% low, 47.79% moderate, 20.24% high, and 6.61% of the population lives in very high earthquake risk zones (Table 5).

4.3 Coping capacity

The coping capacity is an important factor during an earthquake, as it is generally expected that an educated community can cope with hazards effectively. According to the United Nations Office for Disaster Risk Reduction, coping capacity is a combination of available skills and resources within an organization or community that can reduce or manage the effects or risk level of a disaster. The coping capacity requires continuous training, awareness, and proper management of the available resources during the pre-disaster, disaster, and post-disaster phases. In this study, we have considered the availability of hospitals, communication networks, educated people, police stations, and service centres to estimate the coping capacity of the study area. The weights of the factors used to study the coping capacity were calculated using the AHP-Entropy approach. The comparison matrix and the estimated weights are presented in Table 3. Our analysis indicates that the northern part of the study area has a relatively low coping capacity compared to the southern part (Fig. 4). The central and western Himalayan region comprises some prominent cities that lie under high coping capacity. However, this is not the case throughout the western part, and a significant portion falls under moderate to low coping capacity areas.

The development of hospitals and communication networks and an increase in literacy rate are the primary reasons for the cities having high coping capacity (Fig. 4). For instance, the average literacy rate of Dibrugarh is 89.5%, which is higher than the national average (Census 2011), indicating that the education system in the northeastern part of India is improving. The development of communication networks and hospitals and the sanctioning of an adequate budget for the education sector are always priorities of the different government agencies. In the central Himalayan region, significant development has been recorded regarding education, communication networks, and the availability of hospitals and disaster management centres after the devastating effects of the 2015 Gorkha earthquake. The average literacy rate in the central Himalayas in 2001 was ~ 72.28%, and this number keeps increasing, which is a good sign for disaster risk reduction (Apollo 2017). In the western section of the Himalayas, the literacy rate is the lowest, with a value of 62.62% (Apollo 2017), which gives the area very low to moderate coping capacity. The low coping capacity zone is presumably due to the rugged terrain obstructing the construction of communication networks and the low literacy rate. Areas with low coping capacity should be the focus of earthquake mitigation strategies.

It is also worth noting that less than 0.5% of the total population in the study area lives under very low coping capacity, 4.83% under low, 14.01% under moderate, 41.82% under high, and 38.88% under very high coping capacity. The result suggests that government agencies are focusing on developing hospitals with good communication networks and on educating people to deal with hazardous situations, which is a good sign of development and of preparedness against disasters. Even so, about 27% of the total population is at high risk of seismic hazards. Nevertheless, with proper planning and changes in mitigation strategies, these areas could be brought into the low-risk zone.
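The percentage breakdown above amounts to summing population over the cells of each coping capacity class. The sketch below shows that tally on a handful of hypothetical cells (the class labels and counts are illustrative assumptions, not data from this study) and then checks the arithmetic of the reported shares.

```python
def population_share_by_class(classes, population):
    """Percentage of total population falling in each class.
    `classes` and `population` are parallel lists, one entry per cell."""
    totals = {}
    for label, people in zip(classes, population):
        totals[label] = totals.get(label, 0.0) + people
    grand_total = sum(totals.values())
    return {label: 100.0 * count / grand_total
            for label, count in totals.items()}

# Hypothetical grid cells: class label and population per cell.
cell_class = ["high", "very high", "moderate", "high", "low"]
cell_population = [400.0, 300.0, 150.0, 100.0, 50.0]
shares = population_share_by_class(cell_class, cell_population)

# Shares reported in the text (percent of total population).
reported = {"low": 4.83, "moderate": 14.01, "high": 41.82, "very high": 38.88}
high_or_better = reported["high"] + reported["very high"]
very_low_remainder = 100.0 - sum(reported.values())

print(shares)
print(round(high_or_better, 2))      # share with high or very high capacity
print(round(very_low_remainder, 2))  # remainder left for the very low class
```

The two checks on the reported figures give about 80.7% for the high and very high classes combined and a remainder of about 0.46% for the very low class, which agrees with the "less than 0.5%" stated above.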

4.4 Sensitivity analysis and validation of the model

To understand the impact of the parameters used in this study, we performed a sensitivity analysis of the vulnerability and the coping capacity (Table 6). In the sensitivity analysis, we varied the parameter alpha (α) to understand the impact of the weights assigned to the parameters. For vulnerability, the ranking of the parameters does not vary for α ≥ 0.6, which is consistent with the findings of Chuansheng et al. (2012). Similarly, for the coping capacity, the ranking remains the same for values of α between 0.4 and 0.6.

Table 6 Sensitivity analysis for influence levels of the α in the AHP-Entropy method for vulnerability (top) and coping capacity (bottom)
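The α sweep can be sketched as follows. The combination rule is an assumption, since it is not restated in this section: the combined weight is taken as a linear blend, w = α·w_AHP + (1 − α)·w_Entropy, renormalised to sum to 1. The weight vectors and function names are hypothetical and do not reproduce Table 6.

```python
def combine(ahp, entropy, alpha):
    """Assumed linear blend of AHP and entropy weights, renormalised."""
    mixed = [alpha * a + (1.0 - alpha) * e for a, e in zip(ahp, entropy)]
    total = sum(mixed)
    return [m / total for m in mixed]

def ranking(weights):
    """Criterion indices ordered from the largest weight to the smallest."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

def sweep(ahp, entropy, alphas):
    """Ranking of the criteria for every alpha in `alphas`."""
    return {alpha: ranking(combine(ahp, entropy, alpha)) for alpha in alphas}

# Hypothetical weights for three criteria.
ahp = [0.50, 0.30, 0.20]
entropy = [0.20, 0.45, 0.35]

alphas = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0
rankings = sweep(ahp, entropy, alphas)
for alpha in alphas:
    print(f"alpha = {alpha:.1f} -> ranking {rankings[alpha]}")
```

With these illustrative numbers the ranking changes at low α and stays fixed once α reaches 0.6, which is the kind of stability range reported in Table 6.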

To validate the proposed model, we computed the receiver operating characteristic (ROC) curve, which evaluates how well the model discriminates between positive and negative cases. Figure 7 shows the ROC curve for the earthquake probability, i.e. the relationship between the true positive rate and the false positive rate. The area under the curve (AUC) is a measure of the accuracy of the probability assessment. An AUC of 0.5 indicates no discrimination, whereas an AUC of 0.7–0.8 is classified as acceptable, 0.8–0.9 as excellent, and > 0.9 as outstanding (Hosmer and Lemeshow 2000; Mandrekar 2010). The AHP-Entropy-MLP integration method yields an AUC of 0.83, which places the model accuracy in the excellent range (Fig. 7). We also plotted the ROC curves for the earthquake probability obtained with the AHP-MLP and Entropy-MLP integrations to identify the better approach for evaluating earthquake risk in a region. The AUC is 0.78 for AHP-MLP and 0.81 for Entropy-MLP, which makes AHP-Entropy-MLP the better approach for mapping earthquake risk (Fig. 7).

Fig. 7 Receiver operating characteristic (ROC) curve for the earthquake probability map
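For readers who want to reproduce this kind of validation, the sketch below computes the AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as one half), which equals the area under the ROC curve, and then maps the value to the bands cited above. The labels and scores are hypothetical, the function names are illustrative, and the "below acceptable" label for values under 0.7 is our own wording rather than a category from the cited sources.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.
    `labels` holds 1 for positive and 0 for negative samples."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def auc_band(auc):
    """Bands as cited in the text (Hosmer and Lemeshow 2000)."""
    if auc > 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"  # our label; not a category from the sources

# Hypothetical validation samples: 1 = earthquake location, 0 = non-event.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))

# AUC values reported in the text for the three integrations.
for name, auc in [("AHP-MLP", 0.78), ("Entropy-MLP", 0.81),
                  ("AHP-Entropy-MLP", 0.83)]:
    print(name, auc, auc_band(auc))
```

The pairwise form is quadratic in the number of samples, so it is suitable only for small validation sets; for larger ones a rank-based or library implementation would be preferable.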

4.5 Strengths and limitations of the model

The strengths, limitations, and challenges of the proposed model are primarily associated with the selection of parameters, data quality, and implementation of the model. This study uses a robust technique that integrates subjective and objective MCDM models, i.e. AHP and entropy, respectively, with ANN through GIS, which could provide accurate earthquake risk results. The proposed model may provide the knowledge needed to select the essential criteria under each component (hazard, vulnerability, and coping capacity), which finally leads to risk assessment. The AHP-Entropy integration is applied in evaluating vulnerability and coping capacity; it is effective for prioritizing criteria and helps to deal with the uncertainties that AHP and entropy each introduce. This study demonstrates a comprehensive risk assessment that can be applied at a large scale to obtain accurate risk information. However, this is the first model of its kind and requires improvement by incorporating mitigation measures.

The limitations and challenges of this study are predominantly associated with acquiring the datasets for large-scale scenarios and processing them through machine learning approaches. Acquiring datasets for larger study areas distributed across several countries is quite challenging, as data may not be available for some countries. Therefore, data from secondary sources are used, which may contain uncertainty and errors. Similarly, the unavailability of parameters such as liquefaction factors, earthquake precursors, soil characteristics, seismic structure, fault characteristics, and building categories also affects the study. Obtaining good-quality data and estimating the priority of the criteria are also challenging because of subjectivity. Furthermore, the diurnal (day and night) variations in the parameters used for estimating vulnerability are not considered. In addition, the ANN is data-dependent, and a large amount of training data is required for an earthquake probability distribution study; choosing proper parameters is critical to avoid biased results. Considerable time is required to design and implement the ANN model. Despite these limitations and challenges, the applied model is still effective for assessing earthquake risk and could help mitigate and reduce disaster risk. This model can be implemented in other larger areas with minimal modification. To minimize casualties, various government agencies could use the results to plan and implement prevention and mitigation measures in the study area ahead of a potential earthquake.

In future, a high-resolution DEM generated from Light Detection and Ranging (LiDAR) data and a 3-D city model could be used to achieve better results. Future studies may also consider the role of biodiversity in earthquake risk assessment for a particular area, besides using other machine learning approaches. Finally, the integration of AHP-Entropy-ANN through GIS is a potential application framework that can be explored to understand the impacts of other disasters on society and infrastructure.

5 Conclusions

We attempted to integrate subjective and objective MCDM models, i.e. AHP and Entropy, respectively, with ANN. Twenty-nine non-spatial and spatial datasets gathered from publicly available sources were used to estimate earthquake risk for the study area.

Our analysis indicates that the southern part of the study area is relatively more vulnerable than the northern and eastern parts. The result also reveals that approximately 79% of the population resides in high to very high vulnerability zones. The hazard result indicates that several major cities in the western and central parts of the study area fall within high-hazard regions. More than 60% of the population resides in moderate to very high earthquake hazard zones. The risk is estimated using the spatial information of earthquake hazard, derived from the earthquake probability and intensity variation, together with earthquake vulnerability and coping capacity. Our analysis indicates that about 8.55% of the total area of the Himalayan region falls under very high earthquake risk, 22.90% under high, 39.21% under moderate, 20.89% under low, and the rest under very low earthquake risk. About 6.61% of the inhabitants live under very high earthquake risk, 20.24% under high, 47.49% under moderate, 14.79% under low, and 10.88% under very low earthquake risk. In addition, about 7.44% of the total number of buildings are at very high risk of earthquake, 23.18% at high, 41.94% at moderate, 20.87% at low, and 6.56% at very low risk. Nevertheless, during an event, coping capacity may be an important factor. The cities of the central and western Himalayan region appear to have high coping capacity. About 80% of the population has a high chance of coping with earthquake hazards. However, about 27% of the total population is at high risk of seismic hazards. The results may help planning and mitigation agencies to identify the earthquake risk zones and plan the measures to be taken in case of a potential earthquake affecting the region.