Introduction

Landslides, among the most frequently occurring natural geo-disasters, are the consequence of several determining and triggering factors (Popescu 1994). This nonlinear, destructive mass movement and slope failure phenomenon (Wang et al. 2020; Sidle and Ochiai 2006) causes great losses of human lives, natural resources, and socio-economic assets (Abbaszadeh Shahri et al. 2019; Chen et al. 2019). Data from the Centre for Research on the Epidemiology of Disasters show that landslides account for at least 17% of all fatalities from natural hazards worldwide (Lacasse and Nadim 2009). Moreover, it has been highlighted that the financial impact of this complicated geo-hazard exceeds the fatalities (Guzzetti et al. 1999; Listo and Carvalho Vieira 2012). Iran, owing to its high seismicity as well as its specific geologic, morphologic, climatic, and tectonic settings, is also recognized as one of the landslide-prone areas of the world. This is supported by the 162 deaths, 176 destroyed houses, and 170 damaged roads resulting from approximately 2600 monitored landslides up to the year 2000 (Akbarimehr et al. 2013). The reported average annual losses (e.g., Table 1) imply that producing more accurate landslide susceptibility maps (LSMs) is an essential objective for engineers, planners, and governmental decision makers and needs to be undertaken for risk assessment and hazard mitigation as well as for selecting appropriate and safer areas for development schemes (Abbaszadeh Shahri et al. 2019; Cotecchia et al. 2016). However, such assessments are a serious challenge for slow-moving landslides because of their complex mechanisms (Greif and Vlcko 2012).

Table 1 Some of the reported annual losses from landslides

Because of the various interlinked causative and triggering factors, the LSM can be evaluated using different techniques (Fig. 1). However, there is no general agreement on a unified method (Abbaszadeh Shahri et al. 2015). Furthermore, it is not well clarified which combination of causative factors produces the best susceptibility map, while applying all available factors may even reduce the accuracy of the results. The resolution and precision of an LSM are influenced by the adopted modeling approach and its corresponding assumptions, as well as by the quality of the data and the landslide inventory map (Hussin et al. 2016). It has been confirmed that quantitative methods (Fig. 1) are typically preferable because of their more accurate results (Abbaszadeh Shahri et al. 2019). However, choosing an optimal method from the wide range of available techniques is difficult, and in practice-oriented problems the results need to be carefully evaluated.

Fig. 1 A brief overview of applied methods in producing LSM

In the literature, the production of LSMs with different accuracies has been widely reported using logistic regression (Conoscenti et al. 2015; Raja et al. 2017; Polykretis and Chalkias 2018; Luo et al. 2019), probabilistic methods (Wang and Rathje 2015; Hong et al. 2018a), weight of evidence (Liu and Duan 2018; Ding et al. 2017; Polykretis and Chalkias 2018), bivariate statistical models (Hussin et al. 2016; Mandal and Mandal 2018), evidential belief functions (Ding et al. 2017), information theory (Tsangaratos et al. 2017), random forests (Dou et al. 2019; Zhang et al. 2017; Sun et al. 2020), multivariate models (Felicísimo et al. 2013; Conoscenti et al. 2015), certainty factor (Hong et al. 2017; Chen et al. 2016), analytical hierarchy process (AHP) (Chen et al. 2016; El Jazouli et al. 2019; Pawluszek and Borkowski 2017), frequency ratio (Ding et al. 2017; Nicu 2018), statistical index (Liu and Duan 2018; Razavizadeh et al. 2017), index of entropy (Hong et al. 2017; Bui et al. 2018), deterministic analysis (Jelínek and Wagner 2007), fuzzy logic (Roy and Saha 2019), neuro-fuzzy systems (Chen et al. 2017b; Chen et al. 2019), artificial neural networks (ANNs) (Abbaszadeh Shahri et al. 2019; Aditian et al. 2018; Polykretis and Chalkias 2018), support vector machines (SVMs) (Bui et al. 2016; Lin et al. 2017; Conoscenti et al. 2015), and decision trees (Chen et al. 2017a; Hong et al. 2018b; Zhang et al. 2017; Dou et al. 2019). However, large models are time-consuming to train, while small networks may become trapped in local minima. Hybridizing is one method that can significantly increase the power, capacity, and predictability of an intelligence model (Abbaszadeh Shahri et al. 2020a; Asheghi et al. 2019), but such systems have rarely been designed or reported for the purpose of LSM (Bui et al. 2017).

In the last few decades, large numbers of catastrophic landslides triggered by rainfall or seismic shaking, with considerable human and socio-economic losses, have been monitored in northern Iran. Such evidence reveals the necessity of producing a capable LSM to delineate and distinguish landslide-prone areas. Among 79 recorded large landslides prior to 1900, 16 events were in the north of Iran (Berberian 1994). Furthermore, the 120 to 140 landslides triggered by the Manjil-Roudbar earthquake (1990) (Ishihara et al. 1992; Shoaei and Sassa 1993) show the importance of this phenomenon in Guilan province.

In this paper, the challenge of finding a feasible solution for large-scale LSM was overcome using a novel hybrid block neural network structure (HBNN) incorporated with a genetic algorithm (GA). This strategy, based on a mixture of experts and task splitting, can effectively speed up the training process and avoid the corresponding drawbacks (e.g., overfitting, slow training, convergence to local minima), and is thus capable of supporting larger networks. Accordingly, a practice-oriented LSM with 10 m × 10 m pixel resolution was created for the southern part of Guilan province (north of Iran). Bias effects on the results were avoided by a two-stage training of the mixture of experts. The superior accuracy of the introduced HBNN was then confirmed in comparison with two new ANN-based models using multilayer perceptrons (MLPs) and a generalized feed forward neural network (GFFN), subjected to different analytical indices and error monitoring criteria. The importance of the applied causative factors for the generated LSM was also determined using sensitivity and weight analyses. It was therefore concluded that the predictive ability of the developed models in distinguishing prone areas can effectively be used by decision makers to minimize or control damage to people, property, and infrastructure.

Study area

The study area, with an extent of 2640 km², is situated in the southern part of Guilan province in the north of Iran (Fig. 2a), between 342,000 and 390,000 Easting and 4,055,000 and 4,110,000 Northing (Fig. 2b). Guilan province is characterized by a humid subtropical climate with the highest amounts of rainfall in the country (1400–1900 mm). Mountainous terrain is the predominant morphology in this province, which is mostly covered by dense forests, pastures, and croplands. In recent years, changes of land cover/use, and in particular deforestation of this area due to urbanization and industrial development plans, have been highlighted as an environmental concern (Poorzady and Bakhtiari 2009). Owing to the considerable annual precipitation and the wide range of stream sizes (local to regional), the alluvial plains of Guilan are well suited to agriculture and vegetation. However, the rivers may also cause irreparable damage. This territory lies tectonically in a highly active seismic belt between the Alborz Mountains and the Caspian Sea and has not only suffered from earthquake and flood disasters but has also been impacted by many large, damaging, and costly landslides (Fig. 2a). Information on previous landslides is accessible through the National Geoscience Database of Iran (NGDIR) and the Geological Survey of Iran (GSI), as well as various archived documents. The generated digital elevation model (DEM) shows that the altitude of the study area varies between 15 and 2810 m. The morphology of the area comprises lagoon formation structures due to rapid sea-level rise, as well as plains, hills, and relatively deep valleys.

Fig. 2 (a) Recorded landslides in Iran (reproduced from NGDIR) and (b) location of the study area in Guilan province

Construction of thematic database

Applied causative factors

In this study, 14 potential causative factors comprising different variable types (class, ordinal, continuous, and categorical) were considered. The employed factors, in the form of thematic layers, were classified into topographic and geomorphologic features (slope angle, aspect, curvature, elevation, topography, road), geological factors (lithology, soil type, erosion), hydrological and hydrogeological parameters (water body, river, watercourse), land data (land use, land cover, normalized difference vegetation index (NDVI)), tectonic characteristics (fault), and climate conditions (precipitation).

Slope angle is one of the most commonly used parameters in producing susceptibility maps, although landslides are scarce or absent on nearly vertical slopes (Gomez and Kavzoglu 2005). Aspect, the compass direction a slope faces, is included to investigate the relationship between slope orientation and landslide occurrence (Abbaszadeh Shahri et al. 2019). Topographic curvature represents the rate of change of slope or aspect in a particular direction. It can reflect the erosion rate, giving a basic idea of the geomorphology (Yesilnacar and Topal 2005), and also controls the effect of both surface run-off and gravitational stresses on shallow slides (Costanzo et al. 2012). Altitude, being connected with several geologic and geomorphological processes and influencing biophysical parameters and anthropogenic activities, can contribute to slope failure (Gritzner et al. 2001; Gomez and Kavzoglu 2005). Transportation networks (roads and railways) are associated with different factors (e.g., dynamic loads, excavation, hydrology, and stress changes) that can contribute to landslide occurrence. Lithology, soil type, and erosion are other important factors that should be grouped effectively (Dai et al. 2001; Abbaszadeh Shahri et al. 2015; Ghaderi et al. 2019; Zhang et al. 2020). Soil type and the corresponding thickness are widely used as conditioning factors in slope stability and landslide analyses (e.g., Wang et al. 2020; Abbaszadeh Shahri 2016; Abbaszadeh Shahri et al. 2019). Landslide susceptibility is also affected by land use, land cover, and NDVI (Gomez and Kavzoglu 2005). It has also been shown that faults have a significant effect on landslide occurrence (Sen et al. 2015).

All employed causative factors were gathered from validated and relevant governmental sources. The DEM with 10 m × 10 m resolution was created from digital 1:25,000 topographic maps with 10-m contour intervals prepared by the National Cartographic Center of Iran and the Forest, Range and Watershed Management Organization. Using the DEM, the topographic and geomorphologic thematic layers, including slope, aspect, curvature, and elevation, were produced. The created DEM was then checked against the enhanced processing procedure proposed by Abbaszadeh Shahri et al. (2019) and against ground-truthing to achieve a more accurate NDVI classification. Thematic layers representing the lithology and faults of the study area were generated on the basis of information provided by the GSI and the International Institute of Earthquake Engineering and Seismology (IIEES). The rainfall and erosion layers were processed using data acquired from the Meteorological Organization of Iran (irimo.ir) and the Ministry of Jihad-e Agriculture, respectively. Hydrological and hydrogeological factors, including rivers, watercourses, and water bodies, were digitized from the corresponding maps and included in the database. Land cover and NDVI were extracted from Landsat 8 Operational Land Imager (OLI) imagery using an object-based classification method following the procedure defined by Saadat et al. (2011), and then verified against both archived field survey documents and governmental maps. A complete transportation network was derived and digitized from Ministry of Roads and Urban Development maps. Figures 3 and 4 show the processed thematic layers of the study area with 10 m × 10 m resolution grid cells and the corresponding percentage of pixels in each defined class.

Fig. 3 The processed thematic layers of the applied conditioning factors for producing the landslide susceptibility map: (A) aspect, (B) curvature, (C) erosion, (D) elevation, (E) distance to faults, (F) distance to rivers, (G) slope, (H) distance to roads, (I) land use, (J) lithology, (K) NDVI, (L) precipitation, (M) soil type, and (N) land type

Fig. 4 Distribution of classified datasets in the thematic layers based on the total number of pixels in the study area

Index-overlay analysis is one of the most widely used approaches for multicriteria modeling problems. Using this procedure, the systematic differences in the ranges of the input layers are combined in a single analysis. The process is carried out through assigned values, where each grid cell in each thematic layer must be reclassified onto a relative preference scale. The reclassified values in the index-overlay procedure indicate the categories of the parameters according to the relative significance of each layer for triggering landslides. These values are then normalized with respect to the highest attribute of the corresponding causative factor to form the input data for the artificial intelligence models in text or ASCII format. Following the sub-division approach (Abbaszadeh Shahri et al. 2019), the datasets were then randomly separated into three classes for training (55%), testing (25%), and validation (20%). The area comprises 26.4 million pixels to be run through the models.
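As an illustration of this preprocessing step, the sketch below normalizes a few hypothetical reclassified layers by their highest attribute values and draws the 55/25/20% random split; the layer names, value ranges, and pixel count are placeholders, not the actual study data.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_pixels = 10_000  # the real study area comprises ~26.4 million pixels

# Hypothetical reclassified thematic layers: one preference-scale value
# per pixel for each causative factor (names and ranges are illustrative).
layers = {name: rng.integers(1, 10, n_pixels).astype(float)
          for name in ["slope", "aspect", "lithology", "precipitation"]}

# Normalize each layer by its highest attribute value, as described above.
inputs = np.column_stack([v / v.max() for v in layers.values()])

# Random 55/25/20% split into training, testing, and validation subsets.
idx = rng.permutation(n_pixels)
n_tr, n_te = int(0.55 * n_pixels), int(0.25 * n_pixels)
train = inputs[idx[:n_tr]]
test = inputs[idx[n_tr:n_tr + n_te]]
valid = inputs[idx[n_tr + n_te:]]
```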

Landslide inventory map

Accurate mapping of past events is essential to describe the relationship between the landslide distribution and the conditioning factors. A high-quality landslide inventory map for the study area was compiled by combining relevant resources, i.e., images (satellite/aerial photographs), documented reports, monitored/recorded landslides, field investigations, and some regional geotechnical data from completed projects. To identify probable historical landslides, different criteria such as images, digital maps (topographic, cartographic, DEM), morphological characteristics related to breaks (e.g., in forest canopy and bare soils), and feature changes (e.g., head and side scarps, flow tracks, soil and debris deposits below a scar) were also analyzed. As a result, a total of 79 landslides were identified and mapped (Fig. 5). Subsequently, the recognized events were randomly divided into training, testing, and validation sets (55%, 25%, 20%) and thereafter classified and sorted by mode of occurrence. Depending on the type of geologic material, the landslides in this area encompass fall, topple, slide, spread, and flow modes. According to the information assembled from scar locations and reported documents, the events were predominantly shallow, planar, or rotational failures.

Fig. 5 DEM of the landslide inventory map indicating the randomized training, testing, and validation data

Applied method to produce and delineate LSM

Generalized feed forward neural network

An ANN is a powerful computational layout inspired by the structure of the human brain which can be trained to emulate nonlinear behavioral models. The output of a neuron (y) combines a set of weights (W) and biases (b) as:

$$ y=f\left(W\cdot {X}^T+b\right)=f\left({w}_1{x}_1+{w}_2{x}_2+\dots +{w}_n{x}_n+b\right) $$
(1)

where X = {x1, x2, …, xn} denotes the input vector and f is the activation function applied to the aggregated signal at the output.

The result of the kth neuron in the output layer (Zk), using the response of the jth unit in the hidden layer (yj) and the corresponding primary input (xi), is then expressed as:

$$ {y}_j=f\left({X}_j\right)=f\left({w}_{oj}+\sum \limits_{i=1}^I{w}_{ij}{x}_i\right) $$
(2)
$$ {Z}_k=f\left({Y}_k\right)=f\left({w}_{ok}+\sum \limits_{j=1}^J{w}_{jk}{y}_j\right)\overset{\mathrm{for\ layer}\ l\ \mathrm{in\ the}\ t\mathrm{th\ iteration}}{\to }\ {y}_j^{(l)}(t)=\sum \limits_{i=0}^p{y}_i^{\left(l-1\right)}(t)\,{w}_{ji}^{(l)}(t) $$
(3)

where woj and wok are the bias weights setting the threshold values, and Xj and Yk represent the intermediate results before applying the activation function f in the hidden and output layers.
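To make Eqs. (1)-(3) concrete, the following minimal sketch evaluates a single hidden-layer forward pass; the tanh activation, layer sizes, and random weights are illustrative assumptions rather than the trained values used in this study.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, f=np.tanh):
    """Forward pass of Eqs. (1)-(3): y_j = f(w_oj + sum_i w_ij x_i),
    then Z_k = f(w_ok + sum_j w_jk y_j)."""
    y = f(b1 + W1 @ x)      # hidden-layer responses y_j
    return f(b2 + W2 @ y)   # output-layer results Z_k

rng = np.random.default_rng(0)
x = rng.random(14)   # e.g., 14 normalized causative factors for one pixel
W1, b1 = rng.standard_normal((23, 14)), rng.standard_normal(23)
W2, b2 = rng.standard_normal((2, 23)), rng.standard_normal(2)
print(forward(x, W1, b1, W2, b2))
```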

In a generalized feed forward neural network (GFFN) (Fig. 6), connections can jump over one or more layers. This ability stems from the embedded generalized shunting neuron (GSN), which not only utilizes adaptive nonlinear filters but also provides greater flexibility and computational power than multilayer perceptrons (MLPs) (Arulampalam and Bouzerdoum 2002; Ghaderi et al. 2019; Abbaszadeh Shahri et al. 2020b). In such a topology, all inputs are summed and passed through an activation function, as in a perceptron neuron, to produce the output as:

$$ {y}_j=\frac{b_j+f\left(\sum \limits_i{w}_{ji}{I}_i+{w}_{jo}\right)}{a_j+g\left(\sum \limits_i{c}_{ji}{I}_i+{c}_{jo}\right)}=\frac{b_j+f\left({w}_jI+{w}_{jo}\right)}{a_j+g\left({c}_jI+{c}_{jo}\right)} $$
(4)
Fig. 6 Topology of the GFFN classifier and performance of the GSN in producing output

where Ii represents the ith input to the jth neuron; aj is the passive decay rate of the neuron (a positive constant); wji and cji express the connection weights from the ith input to the jth neuron; aj and bj are constant biases; and f and g are the respective activation functions.
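A minimal sketch of the generalized shunting neuron of Eq. (4) is given below; the choice of tanh for both f and g and the constant values are assumptions made for illustration only.

```python
import numpy as np

def gsn(I, w, w0, c, c0, a, b, f=np.tanh, g=np.tanh):
    """Generalized shunting neuron, Eq. (4): an excitatory term
    b_j + f(w·I + w_jo) divided by a shunting term a_j + g(c·I + c_jo)."""
    return (b + f(w @ I + w0)) / (a + g(c @ I + c0))

I = np.array([0.3, 0.7, 0.1])            # inputs to the neuron
w = np.full(3, 0.5)
c = np.full(3, 0.2)
print(gsn(I, w, 0.1, c, 0.1, a=1.5, b=0.2))  # a > 1 keeps the denominator positive with tanh
```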

Proposed HBNN model

The ability to learn complex data can be influenced by different internal ANN characteristics (e.g., number of neurons, layer arrangement, training algorithm, learning rate, activation function). Large ANN models are suitable for more complicated processes but take a long time to train, while small networks may become trapped in a local error minimum or fail to learn from the training data. Assigning split tasks to different sub-network modules is therefore considered an effective approach for finding feasible solutions to large-scale problems (Jacobs and Jordan 1993; Sharma et al. 2003). By integrating this idea with a divide-and-conquer strategy, a new hybrid block neural network structure (HBNN) incorporated with a genetic algorithm (GA) was proposed for the complex LSM task. According to the literature, GFFNs perform more efficiently than MLPs (Ghaderi et al. 2019; Asheghi et al. 2019; Arulampalam and Bouzerdoum 2002; Abbaszadeh Shahri et al. 2020a, b), implying that using GFFNs instead of MLPs can increase the capability of the HBNN. As presented in Fig. 7, the proposed topology consists of an input block, layers of hidden blocks, and an additional decision block, with GFFNs in all sub-networks. The thematic layers were categorized into the input blocks accordingly. Each block, as a basic processing element, corresponds to a network with input/output nodes. The inputs are randomly connected to only one of the hidden layers, which passes its output to the decision block. The number of inputs and outputs of each block is determined by the system, but the optimum internal network characteristics can be selected independently of the overall architecture using the GA (Fig. 8). To specify the topology of the HBNN, the number of blocks in the input layer (nib) is determined from the dimension of the input vectors (l) and the number of inputs per block (ni) as:

$$ {n}_{ib}=\left[\frac{l}{n_i}\right] $$
(7)
Fig. 7 The architecture of the proposed HBNN

Fig. 8 Simplified flow diagram of the developed HBNN and the training procedure of the expert and block networks (n and J denote the number of epochs and the number of neurons in the hidden layer)

Following the modular structure (Dailey and Cottrell 1999), each block in the input layer can have either k or [\( {\log}_2k \)] outputs subject to:

$$ \left\{\begin{array}{l}{n}_i>k\\ {}{n}_{ih}>{\log}_2k\end{array}\right. $$
(8)

where k and nih denote the number of output classes (i.e., one output neuron for each class) and the number of inputs in each hidden block, respectively. The outputs of at least two hidden blocks are used as inputs for the decision block, which marks the end of the hidden blocks of the network (Fig. 7). If nih in a given hidden block is lower than or equal to twice the number of inputs of one hidden block, then the dataset is considered a direct input of the decision block (Fig. 7). Accordingly, the inputs of the decision network can be obtained using m × k or m × [\( {\log}_2k \)], where the number of network outputs (no) is determined by:

$$ {n}_o=\left\{\begin{array}{ll}{\log}_2k & k\ \mathrm{a\ power\ of\ 2}\\ {}\left[{\log}_2k\right]+1 & k\ \mathrm{not\ a\ power\ of\ 2}\end{array}\right. $$
(9)

The optimum number of hidden units can be specified using different methods, such as the trial-and-error procedure (Abbaszadeh Shahri 2016), the constructive technique (Kwok and Yeung 1997), pruning algorithms (Reed 1993), integrating the trial-and-error procedure with the constructive technique (Ghaderi et al. 2019), or simultaneously alternating the internal network characteristics (Abbaszadeh Shahri et al. 2020b). It can also initially be set to ni or some combination of ni and no, such as \( \left[\sqrt{n_i\,{n}_o}\right] \), \( \frac{n_i+{n}_o}{2}\ \left\{m<2{n}_i\right\} \), or \( \frac{2\left({n}_i+{n}_o\right)}{3} \), and then systematically varied to find the best performance.
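Under the stated definitions, the block counts of Eqs. (7)-(9) and one of the suggested starting points for the hidden units can be computed as below; the bracket in Eq. (7) is read here as a ceiling, and all numbers are illustrative.

```python
import math

def block_topology(l, n_i, k):
    """Eqs. (7)-(9): l = input dimension, n_i = inputs per block,
    k = number of output classes."""
    n_ib = math.ceil(l / n_i)          # Eq. (7), reading [.] as a ceiling
    lg = math.log2(k)
    n_o = int(lg) if lg.is_integer() else int(lg) + 1   # Eq. (9)
    hidden_init = round(math.sqrt(n_i * n_o))           # one suggested start
    return n_ib, n_o, hidden_init

# 14 causative factors grouped 7 per block with 2 output classes:
print(block_topology(14, 7, 2))   # -> (2, 1, 3)
```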

With this strategy, fully connected layers and their possible drawbacks (e.g., overfitting, slow training, convergence to local minima) are avoided, enhancing the ability of the model to support larger networks. For the training process in this paper, a vector-based method was applied not only to minimize the possible drawbacks but also to specify the required parameters of the blocks and the associated processing elements (Fig. 8). Using the procedure proposed by Asheghi et al. (2019), each block was trained over the entire vector of neurons to explore the optimum individual internal characteristics. Following the mixture of expert networks (Jacobs and Jordan 1993; Hodge et al. 1999), the architecture of each block was organized to fuse the appropriate decisions. Correspondingly, the outputs of each block are mediated by an integrating unit that is not permitted to feed information back. The integrating unit decides which combination of blocks is selected for training to form the final output. The outcome of the decision block is based on the experts embedded in the hidden blocks rather than directly on the input data; this provides significant advantages over a single network, facilitates building larger-scale network models, speeds up training, and reduces the number of required training exemplars.

The training process is terminated using a two-stage criterion based on the root mean square error (RMSE) and the number of epochs (t). If the RMSE criterion, which has priority, is not achieved, then t (set to 10,000) is considered. The weights of each neuron are then adapted according to the applied training algorithm using their previous values and a correction term. The learning rate applied to the training algorithm determines the task time: small learning rates lead to long processing times, whereas at high values the adaptation diverges and the weights become unusable. The GA, as a heuristic search and optimization method (Mitchell 1996), was applied to the output of the training process to optimize the weights and biases of the GFFN and to minimize the fitness function \( \frac{1}{N}\sum \limits_{i=1}^N{\left({t}_i-{o}_i\right)}^2 \) for N training datasets over successive generations (ti and oi denote the target and training outputs). The results of the GA can be represented as a string of weights and biases passed to the objective function as encoding parameters of the solution. Accordingly, the length of the weight-bias vector (L) for a single input variable with n neurons in the hidden layer is 3n + 1, and the length of the input vector to the GA fitness function is consequently set to L. Referring to Fig. 9, such optimization requires multiple training runs to find the settings with the lowest error. This implies that the training process, as for MLPs, covers the output of the hidden layer, error investigation, weight updates, and the final predicted outputs; however, several modifications to the formulations were carried out. The most significant advance is the mixed final output of the two-stage training process for both the experts and the decision block. Moreover, in this structure the variance and prior probability were used not only for the weight update procedures but also to provide more precise results. The training is carried out in two stages: first the embedded experts and then the decision block. This implies that the training algorithm selected by the expert networks is further mediated by the weight adjustments in the decision block, as sketched below.
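The following sketch shows, under simplifying assumptions, how a GA can minimize the stated MSE fitness over a flat weight-bias vector of length L = 3n + 1 for a single-input network with n hidden neurons; the selection, crossover, and mutation settings are illustrative choices, not the configuration used in this study.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(vec, X, t, n):
    """MSE fitness (1/N) * sum (t_i - o_i)^2 for a single-input network
    whose weight-bias vector has length L = 3n + 1, as stated above."""
    w1, b1 = vec[:n], vec[n:2 * n]        # input -> hidden weights, biases
    w2, b2 = vec[2 * n:3 * n], vec[-1]    # hidden -> output weights, bias
    o = np.tanh(np.tanh(X[:, None] * w1 + b1) @ w2 + b2)
    return np.mean((t - o) ** 2)

def ga_minimize(X, t, n, pop=40, gens=200, mut=0.1):
    L = 3 * n + 1
    P = rng.standard_normal((pop, L))                    # initial population
    for _ in range(gens):
        scores = np.array([fitness(p, X, t, n) for p in P])
        elite = P[np.argsort(scores)[:pop // 2]]         # selection
        pairs = rng.integers(0, len(elite), (pop - len(elite), 2))
        mask = rng.random((pop - len(elite), L)) < 0.5   # uniform crossover
        kids = np.where(mask, elite[pairs[:, 0]], elite[pairs[:, 1]])
        kids += mut * rng.standard_normal(kids.shape)    # mutation
        P = np.vstack([elite, kids])
    return min(P, key=lambda p: fitness(p, X, t, n))

X = np.linspace(-1, 1, 50)
t = 0.5 * np.sin(3 * X)                  # toy single-input regression target
best = ga_minimize(X, t, n=5)
print("best MSE:", fitness(best, X, t, 5))
```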
The landslide susceptibility output of the ith expert network (LSi) is then computed as a weighted sum over the hidden layers:

$$ {LS}_i=\sum \limits_m\left({g}_m\sum \limits_j{w}_{mj}{f}_{mj}\right) $$
(10)
Fig. 9 Error improvement of the model presented in Table 2 over 2000 epochs (a) and evolution trends of each block structure using GA (b)

where fmj denotes the activation function applied in the mth block with j units in the layer. Accordingly, the landslide susceptibility in the decision block (LSd), subject to N expert blocks, is calculated as the weighted sum of the outputs of all experts:

$$ {LS}_d=\sum \limits_{i=1}^N{g}_i{LS}_i=\sum \limits_{i=1}^N\left[\frac{\exp \left({O}_{di}\right)}{\sum \limits_{j=1}^N\exp \left({O}_{dj}\right)}\right]{LS}_i $$
(11)

where Odi expresses the output of the decision block subject to the activation function of expert i, and gi represents the weight assigned by the decision block to the output of the ith expert. The gi values are nonnegative and sum to 1.
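A compact sketch of the gating in Eq. (11) is shown below; the expert outputs and decision activations are made-up numbers used only to demonstrate the softmax weighting.

```python
import numpy as np

def decision_output(LS, O_d):
    """Eq. (11): weight each expert's susceptibility LS_i by the softmax
    gate g_i = exp(O_di) / sum_j exp(O_dj); the g_i are nonnegative and
    sum to 1."""
    g = np.exp(O_d - O_d.max())   # subtract max for numerical stability
    g /= g.sum()
    return float(g @ LS)

LS = np.array([0.82, 0.64, 0.71])   # illustrative expert outputs LS_i
O_d = np.array([1.2, 0.3, 0.8])     # decision-block activations O_di
print(decision_output(LS, O_d))
```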

The weights of the decision and expert blocks (wd, wi) are updated using:

$$ \Delta {w}_d={\eta}_d\left(\overset{\mathrm{prior\ probability}}{\overbrace{\frac{\frac{g_i}{\sigma_i}\exp \left[-\frac{{\left({d}_o-{LS}_i\right)}^2}{2{\sigma}_i^2}\right]}{\sum \limits_{j=1}^k\frac{g_j}{\sigma_j}\exp \left[-\frac{{\left({d}_o-{LS}_j\right)}^2}{2{\sigma}_j^2}\right]}}}-{g}_i\right){X}^T;\qquad \Delta {w}_i={\eta}_i\overset{\mathrm{prior\ probability}}{\overbrace{\frac{\frac{g_i}{\sigma_i}\exp \left[-\frac{{\left({d}_o-{LS}_i\right)}^2}{2{\sigma}_i^2}\right]}{\sum \limits_{j=1}^k\frac{g_j}{\sigma_j}\exp \left[-\frac{{\left({d}_o-{LS}_j\right)}^2}{2{\sigma}_j^2}\right]}}}\left({LS}_d-{o}_{di}\right){X}^T $$
(12)

where ηd and ηi denote the learning rates of the decision and expert blocks, respectively, do reflects the desired response of the ith expert, and \( {\sigma}_i^2 \) is the variance of expert i, treated as a Gaussian random variable with zero mean.

Consequently, the predicted landslide susceptibility of the entire network (LS), subject to the recombination of the mixed outputs, is calculated by:

$$ LS=\sum \limits_i{g}_i{LS}_i $$
(13)

The information in each terrain pixel of the stacked thematic layers reflects unique physical environmental conditions and the expected degree of landslide susceptibility, where higher cell values represent greater susceptibility (Abbaszadeh Shahri et al. 2019). The landslide susceptibility of each cell (LSpix), using n neurons in the hidden layers, is then calculated as:

$$ {LS}_{pix}=f\left(\sum \limits_{r=1}^n{w}_{ir}\,f\left(\sum \limits_{j=1}^m{v}_{rj}{u}_j+{b}_r\right)+{c}_y\right) $$
(14)

where wir and vrj are the adjusted weights, uj and y represent the m × 1 input vector and the output vector of the layers, and br and cy are the neuron biases in the hidden and output layers.
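As a practical note on Eq. (14), the susceptibility can be evaluated pixel by pixel over the stacked layers; the chunked loop below is a sketch of how the ~26.4 million pixels might be processed within bounded memory, with all weights and the chunk size as placeholder assumptions.

```python
import numpy as np

def susceptibility_map(stack, W1, b1, W2, b2, chunk=100_000, f=np.tanh):
    """Apply Eq. (14) to a (n_pixels, n_factors) stack of normalized
    thematic layers, processing the raster in chunks."""
    out = np.empty(stack.shape[0])
    for s in range(0, stack.shape[0], chunk):
        block = stack[s:s + chunk]
        hidden = f(block @ W1.T + b1)   # inner sum: v_rj * u_j + b_r
        out[s:s + chunk] = f(hidden @ W2.T + b2).ravel()
    return out

rng = np.random.default_rng(2)
stack = rng.random((500_000, 14))       # toy stand-in for the pixel stack
W1, b1 = rng.standard_normal((23, 14)), rng.standard_normal(23)
W2, b2 = rng.standard_normal((1, 23)), rng.standard_normal(1)
ls = susceptibility_map(stack, W1, b1, W2, b2)
```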

Analysis of system results

Hybridizing with the GA, the two-stage training, and implementing different internal characteristics help not only to avoid possible overfitting and becoming stuck in local minima but also to prevent bias effects in the final results. For example, during training it was observed that replacing gradient descent with the momentum optimizer with a step size of 0.001 minimizes the chance of becoming trapped in local minima. Moreover, the error improvement of each examined model was monitored over 3 runs to ensure it was neither overfitted nor trapped in a local minimum. Error improvement refers to the change in network predictive performance during the last and/or each iteration; it can therefore detect the situation in which the network is no longer improving and further training is unavailing. The proposed HBNN aims to speed up training and reduce the number of required training exemplars, which is why 55%, 25%, and 20% of the randomized datasets were assigned to the training, testing, and validation processes. Such randomizations have previously been used to enhance the learnability of different models (e.g., Ghaderi et al. 2019; Asheghi et al. 2019; Abbaszadeh Shahri 2016; Abbaszadeh Shahri et al. 2019, 2020b). The sum-of-squares and cross-entropy functions were both considered and tested for the output errors. Following the described training procedure, the expert models, even those with similar structures but different internal characteristics, were ranked through the calculated RMSE. A sample of 10 attempts to find the optimum HBNN topology and the monitored error improvements over 2000 epochs are shown in Table 2 and Fig. 9A, respectively. A value of 0.7 was used for the learning rate. The structure evolution of the blocks based on the chromosome population size for the crossover and mutation probabilities is given in Fig. 9B. The LSM produced using the optimum HBNN was then created and classified using the natural breaks criterion (Fig. 10). Areas with no landslides or with zero slope were classified as very low to low susceptibility.

Table 2 Sample of trained structures to find the optimum topology for each block
Fig. 10 Produced LSM using the developed HBNN

To gain insight into the improved predictability of the optimum HBNN, MLP and GFFN models subjected to the same randomized datasets and the same training procedure were developed. Here, a sample testing procedure using the variation of the network RMSE with the number of neurons under different training algorithms is presented (Fig. 11). The numbers of neurons at the minimum observed RMSE were then arranged in different layers to find the optimum topology. Examining numerous models (> 1100 structures) showed that 14-23-2 and 14-10-12-2 can be accepted as the optimal MLP and GFFN topologies (Fig. 11, Table 3). The LSMs produced using these optimum topologies are shown in Fig. 12.

Fig. 11 The variation of network error with the number of neurons subjected to different training algorithms, and the corresponding optimum structures of the GFFN (A) and MLP (B)

Table 3 The characteristics of selected optimum structures to produce LSM
Fig. 12 LSM of the study area and percentage of classified susceptibility levels using the GFFN (A) and MLP (B)

Discussion and validation

Applied intelligence models for producing LSMs address challenging problems mostly by relying on structural dependencies. The presented HBNN is a generic task-splitting approach for regression/classification that captures large models to produce structured output data based on deep neural networks. This model differs from traditional networks in that the object activities provided by the index-overlay procedure incorporate the learning of input dependencies, the output dependencies, and the supervised task in the same framework. The results of the introduced HBNN-, MLP-, and GFFN-based models were verified using the validation datasets, different areas under curves (success and prediction rate, precision-recall), and sensitivity and weight analyses.

Applying validation datasets

To examine the accuracy of the produced LSMs, the validation datasets were fed to the optimum structures. In the case of landslides, the true/false positive rates (TPR/FPR) define how many correct/incorrect positive results occur among all available positive/negative pixels during the validation process. It was observed that the HBNN classified 23 of 26 landslide pixels and 28 of 30 non-landslide pixels correctly. For the GFFN and MLP, these values were 19/26 and 25/30, and 16/26 and 22/30, respectively. This indicates the acceptable performance of the HBNN.
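The reported counts translate directly into TPR/FPR values; a small check, treating each landslide pixel as one positive, is:

```python
def rates(tp, pos, tn, neg):
    """TPR = correct positives / all positives;
    FPR = incorrect positives / all negatives."""
    return tp / pos, (neg - tn) / neg

# Counts reported above: (correct landslide, correct non-landslide) pixels
for name, (tp, tn) in {"HBNN": (23, 28), "GFFN": (19, 25), "MLP": (16, 22)}.items():
    tpr, fpr = rates(tp, 26, tn, 30)
    print(f"{name}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```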

Area under the curves

In intelligence classifiers, performance measurement is an essential task that can be evaluated using the area under the curve of the receiver operating characteristic (AUCROC). The ROC is a probability curve that illustrates the diagnostic ability of the classifier under threshold variations, whereas the AUC represents the degree of separability and can be interpreted as the capability of the model to distinguish between classes. The AUCROC therefore reflects the model strength and the corresponding accuracy: the higher the AUCROC, the higher the accuracy and model strength.

Accordingly, the capability of the susceptibility models was compared using success and prediction rates as well as precision-recall curves. The success/prediction rate curves were plotted using the cumulative percentage of landslide occurrence against the landslide susceptibility index rank (cumulative area percentage) of the training/validation data (Fig. 13A). Considering common rules of thumb (90–100: excellent; 80–90: good; 70–80: fair; 60–70: poor; 50–60: fail), AUC values of 90.15% and 86.52% were observed for the success and prediction rates of the HBNN, followed by the GFFN (83.84%/82.27%) and MLP (81.5%/79.24%). The success of the prediction can also be assessed by the precision-recall curve (Powers 2011), using the number of truly returned results (precision) and the corresponding tradeoff between different thresholds (recall). High precision together with high recall expresses that most of the results are labeled correctly, whereas high recall with low precision means that most predictions are labeled incorrectly relative to the training data, and vice versa. As presented in Fig. 13B, the AUC for the HBNN (89.36%) showed greater accuracy than the GFFN (83.24%) and MLP (80.5%). Referring to Hand (2009), the interpreted AUCROC values imply that the HBNN provides an appropriate tool for selecting possibly optimal models.
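For readers reproducing such curves, the sketch below computes ROC and precision-recall AUCs from labeled pixels and a continuous susceptibility index using scikit-learn; the labels and scores here are synthetic stand-ins for the actual validation data.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_curve

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, 500)                            # 1 = landslide pixel
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, 500), 0, 1)

fpr, tpr, _ = roc_curve(y_true, y_score)
print("AUC-ROC:", auc(fpr, tpr))

precision, recall, _ = precision_recall_curve(y_true, y_score)
print("AUC-PR:", auc(recall, precision))
```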

Fig. 13 Evaluating the accuracy metrics of the developed models using AUC plots of success/prediction rate (A) and precision-recall (B)

Sensitivity and weight analyses

The robustness of the developed models was also analyzed using sensitivity and weight analyses. Such analyses can be used to identify output uncertainty, determine possible relationships between input and output parameters, calibrate the model by removing less important inputs, and subsequently reduce the computational effort (Asheghi et al. 2020). In this study, the results of the cosine amplitude method (CAM) for sensitivity and of the weight analysis following the procedure proposed by Zhou (1999) were applied, as reflected in Fig. 14. To ensure the efficiency and correctness of the estimated weights, the ten training attempts were also checked for possible dependence of the final weights on the initial randomized selection. The similar values observed between the initial random weights and the final weights indicate that the initial values did not have large effects on the final weights; thus, both the CAM and the weight analyses show similar trends.
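The CAM strength used here has a standard closed form; the sketch below computes it between each causative factor and the model output, with synthetic data standing in for the study's normalized layers.

```python
import numpy as np

def cam_strength(X, y):
    """Cosine amplitude method: r_i = sum_k(x_ik * y_k) /
    sqrt(sum_k x_ik^2 * sum_k y_k^2); values near 1 indicate a strong
    relation between factor i and the output."""
    num = X.T @ y
    den = np.sqrt((X ** 2).sum(axis=0) * (y ** 2).sum())
    return num / den

rng = np.random.default_rng(3)
X = rng.random((1000, 14))   # 1000 pixels x 14 normalized factors
y = rng.random(1000)         # model susceptibility outputs
print(np.round(cam_strength(X, y), 3))
```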

Fig. 14 Importance of the applied causative factors using CAM sensitivity and weight analyses over 10 attempts

According to Fig. 14, aspect and curvature were identified as the least effective factors, whereas soil type and slope showed the highest effect on the LSM. However, the small differences between precipitation and lithology on the one hand and soil type and slope on the other demonstrate the strong influence of these factors on the recorded landslides. The results showed that most of the landslides fall in the region with 553–856 mm of precipitation, which is covered by high-density forest and alfisols with high erosion intensity. These results confirm the expected significant role of lithology-related parameters and precipitation, as previously discussed. Steep slopes are abundant in the study area, and the strong tendency of landslides to occur under such circumstances indicates why slope exhibits one of the highest weighted values. According to the LSM produced using the HBNN (Fig. 10), most of the area has slopes between 12.15° and 18.45°, whereas most of the landslides occurred on slopes between 18.45° and 27°. However, a very few events, driven by complex triggering factors, also occurred on slopes of less than 5.8° or more than 27°. Given the high seismicity of this region and the triggered slope movements, a high importance of slope in the weight and sensitivity analyses is expected. Furthermore, the interlinking of slope and elevation can also increase the weight of slope. The contributions of the distances to roads and rivers (water areas and watercourses) can be interpreted through their presence across the entire study area, while the rivers, because of their higher densities, showed more influence on the LSM. For the tectonic features, more importance than rivers and roads is anticipated because of the significant influence of faults. This interpretation can also be made for the observed variation in land cover. According to the investigations, the minor contributions of elevation, aspect, and curvature in this area are reasonable. The results of the sensitivity analysis through the statistical models can be used to achieve a physical interpretation of the landslide possibility and of the recognized important causative factors. As an example, the modulation effects of landslide-slope and landslide-precipitation can be intensified through factor multiplication. An integrated three-factor multivariate landslide hazard-warning model can thereby be organized and examined against the triggering factors. The results indicate that eroded or vegetated steep terrain influenced by rainfall can better identify hazard-warning locations. Combining the factor modules for hazard warning can also mirror true environmental conditions, yielding more representative model results.

Concluding remarks

Integrating statistical techniques and different soft computing approaches with GIS is recognized as a powerful and flexible way to overcome the difficulties in modeling the complex nonlinear system of landslides and their interlinked triggering and causative factors.

In this study, a novel hybrid block neural network (HBNN) structure incorporated with a GA was proposed for producing a high-resolution and more reliable LSM. The approach was developed using a mixture of expert modular structures with an extra decision block and a divide-and-conquer strategy. Reducing the number of variable parameters needed to find a good solution and presenting a tractable network are the main reasons for dividing the problem into physically meaningful sub-tasks. In the HBNN, this was implemented to overcome the challenges of speeding up the training process and of the network size given the large number of informative pixels. To increase the applicability and generalization of the model, the internal characteristics were organized by GFFNs and trained using a two-stage process for both the expert networks and the blocks. The results of the training procedures, the monitored error improvements, and the evolution trends of each block showed that a model with minimum RMSE (0.223) and the structure 14-(6-6)-(7-5)-8-2, read as causative factors-(number of neurons in input blocks)-(number of neurons in hidden blocks)-number of neurons in decision block-output, under the MO training algorithm and Tanh activation function can be selected as the optimum candidate. This model was then applied to the southern part of Guilan province (north of Iran) to produce a high-resolution LSM. The improvement of the HBNN was then compared with two neural network models using MLP and GFFN. A detailed discussion of the performance metrics, including the AUCs of the success and prediction rates as well as precision-recall, accompanied by sensitivity and weight analyses, is presented. The accuracy and validity of the models were also controlled using the validation landslide locations. The calculated weights and the sensitivity analyses exhibited similar trends and pointed to the highest importance of soil type, slope, and lithology. According to the results of the sensitivity and weight analyses, the recognized key factors for LSM can be selected in high-elevation regions with a lack of data; hence, a scenario for future risk mitigation planning can be developed efficiently. The results showed 6.84% and 9.91% enhancement of the HBNN with respect to the GFFN and MLP, respectively. Moreover, the HBNN, with 90.15% and 86.52% AUC in the success and prediction rates, demonstrated improvements of 9.59% and 7% over the MLP and of 8.41% and 4.6% over the GFFN, respectively. The interpretation of these metrics, as well as the observed minimum RMSE, implies a significant priority of the HBNN over the MLPs and GFFN, whereby landslides can be predicted with higher probability.

As the produced LSM covers a much larger area than any previous assessment in the region, it can serve as a beneficial and cost-effective screening tool for risk mitigation and for identifying prone areas. Furthermore, the capability of intelligent hybrid models to map large areas can be considered by urban planners when developing new comprehensive plans for cities. The produced map identified new susceptible areas that were previously unknown, potentially because of their more rural locations away from densely populated areas.