
Introduction

The model validation stage is one of the most important issues in natural hazard modelling studies (Begueria 2006). Accordingly, the validation of landslide susceptibility models and of the resulting maps can be regarded as one of the most critical topics in landslide susceptibility research. Although model validation plays a very important role in modelling, many studies in the literature do not give it the necessary attention and evaluate only a few basic validation statistics to assess model performance. Moreover, different researchers check the performance of landslide susceptibility models with different validation statistics, which leads to differences in how performance is evaluated and in how models are compared on the basis of these performances. The main purpose of this study is to present a procedure for assessing the performance of landslide susceptibility mapping. The study proposes a flow chart that evaluates the current validation indices of the models in three stages: i) the model data production stage, ii) the model construction stage and iii) the production of model consequences stage. For this purpose, the landslide susceptibility analyses performed by Dagdelenler (2013) in the eastern part of the Gallipoli Peninsula (Canakkale, Turkey) were evaluated.

Model Validation

As mentioned, the performance of the models was evaluated in the three stages shown in the flow chart in Fig. 1. These stages are described in detail below.

Fig. 1 Procedure for the assessment of the performance of landslide susceptibility mapping

During Model Data Production

In the landslide susceptibility analyses performed by Dagdelenler (2013), a total of 10 variables (7 continuous and 3 categorical) were used as independent variables, and the presence (1) and non-presence (0) data of the mapped landslides were used as the dependent variable. 20 % of the presence (1) data was set aside as the testing data set and the remaining 80 % as the training data set for the models (Nefeslioglu et al. 2011, 2012; Oh and Pradhan 2011; San 2014). This random selection was repeated three times, yielding the model data sets Rnd1, Rnd2 and Rnd3. For each random set, the training set was generated from the 80 % of the presence (1) data by also separating 80 % of the non-presence (0) data, equal in number to the 80 % of the presence (1) data. This stage thus prepares the data for the subsequent performance evaluations.
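The random partitioning described above can be sketched as follows. This is a minimal illustration, not the procedure of Dagdelenler (2013): the helper name `make_random_sets`, the cell identifiers and the seed are hypothetical, and the balanced design (training presence cells paired with an equal number of randomly drawn non-presence cells) is an assumption based on the description of the training sets.

```python
import random


def make_random_sets(presence, non_presence, n_sets=3, train_frac=0.8, seed=0):
    """Hypothetical helper: build n_sets random training/testing partitions.

    presence, non_presence: lists of cell identifiers (landslide = 1, no landslide = 0).
    Assumption: each training set pairs 80 % of the presence cells with an equal
    number of randomly drawn non-presence cells; the other 20 % of the presence
    cells are kept for testing.
    """
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        shuffled = presence[:]
        rng.shuffle(shuffled)
        n_train = round(train_frac * len(shuffled))
        train_pos, test_pos = shuffled[:n_train], shuffled[n_train:]
        # Draw as many non-presence cells as there are training presence cells
        train_neg = rng.sample(non_presence, n_train)
        train = [(cell, 1) for cell in train_pos] + [(cell, 0) for cell in train_neg]
        sets.append({"train": train, "test_presence": test_pos})
    return sets


# Three random sets analogous to Rnd1, Rnd2 and Rnd3 (toy cell identifiers)
rnd1, rnd2, rnd3 = make_random_sets(list(range(100)), list(range(1000, 2000)))
print(len(rnd1["train"]), len(rnd1["test_presence"]))  # 160 20
```

Fixing the seed keeps the three partitions reproducible, which matters when the validation indices of the random sets are later compared with each other.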

During Model Construction

Dagdelenler (2013) performed the landslide susceptibility analyses using logistic regression. From the logistic regression analyses of the training data sets, the correct classification percentages (%), the error matrices, and the validation statistics and Kappa index (k) values derived from the error matrices were determined. The correct classification percentages obtained for the models were acceptable and quite similar, ranging between 78 and 79 %.
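A correct classification percentage is obtained by converting the predicted probabilities into classes and counting the matches with the observations. The sketch below assumes a cutoff of 0.5, which is not stated in the text; the function name is hypothetical.

```python
def correct_classification_pct(observed, probabilities, cutoff=0.5):
    """Percentage of cells whose predicted class matches the observed class.

    A cell is predicted as landslide (1) when its probability is >= cutoff.
    The 0.5 cutoff is an assumption for illustration.
    """
    if not observed or len(observed) != len(probabilities):
        raise ValueError("observed and probabilities must be non-empty and equally long")
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * hits / len(observed)


print(correct_classification_pct([1, 1, 0, 0, 1], [0.9, 0.4, 0.2, 0.6, 0.7]))  # 60.0
```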

An error matrix shows the numbers of correctly and incorrectly classified observations for the positive and negative cases. In the error matrix in Table 1, the four combinations of observed and predicted presence (1) and non-presence (0) are denoted by the letters a, b, c and d. The validation statistics (Table 2) and the Kappa index (k) values of the models were calculated from their error matrices according to the formulations given by Begueria (2006). The Kappa index (k) is derived from the error matrix using Eqs. (1), (2) and (3). According to the Kappa index classification proposed by Landis and Koch (1977), the Kappa index value of each model indicates moderate agreement.
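The four cells of the error matrix can be counted directly from paired observed and predicted classes. In this sketch the layout a = observed 1 / predicted 1, b = observed 0 / predicted 1, c = observed 1 / predicted 0 and d = observed 0 / predicted 0 is an assumption; a and d must be the agreement cells for Eq. (1) to hold, and swapping b and c leaves Eq. (2) unchanged. The function name is hypothetical.

```python
def error_matrix(observed, predicted):
    """Count the cells (a, b, c, d) of a 2x2 error matrix.

    Assumed layout, keyed by (observed, predicted):
    a = (1, 1), b = (0, 1), c = (1, 0), d = (0, 0).
    Labels must be 0 or 1; anything else raises KeyError.
    """
    cell = {(1, 1): "a", (0, 1): "b", (1, 0): "c", (0, 0): "d"}
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for o, p in zip(observed, predicted):
        counts[cell[(o, p)]] += 1
    return counts["a"], counts["b"], counts["c"], counts["d"]


print(error_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
```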

Table 1 The presentation of true positive
Table 2 The formulation of the validation statistics derived from the error matrix (Begueria 2006)
$$ P=\frac{a+d}{N} $$
(1)
$$ P_e=\frac{\left(a+b\right)\left(a+c\right)+\left(c+d\right)\left(d+b\right)}{N^2} $$
(2)
$$ k=\frac{P-P_e}{1-P_e} $$
(3)

where

P = the proportion of observations in agreement; P_e = the proportion of agreement expected by chance; k = Kappa index; N = the total number of observations (N = a + b + c + d).
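Eqs. (1) to (3) translate directly into code. The sketch below also maps k onto the agreement classes of Landis and Koch (1977) (below 0 poor, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect); treating these two-decimal bounds as continuous upper limits is an assumption, and both function names are hypothetical.

```python
def kappa_index(a, b, c, d):
    """Kappa index from the error-matrix cells, following Eqs. (1) to (3)."""
    n = a + b + c + d
    p = (a + d) / n                                      # Eq. (1): observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (d + b)) / n**2  # Eq. (2): chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p - p_e) / (1 - p_e)                         # Eq. (3)


def landis_koch(k):
    """Agreement class of Landis and Koch (1977), bounds used as upper limits."""
    if k < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if k <= upper:
            return label
    return "almost perfect"


k = kappa_index(35, 15, 10, 40)  # P = 0.75, P_e = 0.5
print(k, landis_koch(k))         # 0.5 moderate
```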

The RMSE performance index and the correct classification percentages (%) were calculated using the testing data sets, whereas the validation indices derived from the error matrix and the (threshold-dependent) Kappa index values were determined using the training data sets. The correct classification percentages for the first, second and third random sets of the landslide body sampling model were 79.7 %, 80.5 % and 69.8 %, respectively. For the landslide susceptibility models that used different buffer distances (d = 25 m, d = 50 m, d = 75 m and d = 100 m) in seed cell sampling (Dagdelenler 2013), the correct classification percentages vary between 76.6 and 88.5 %. The seed cells obtained by the seed cell sampling strategy (Suzen and Doyuran 2004) are assumed to represent the pre-failure conditions of the landslides, particularly for the topographical parameters. The RMSE values calculated for Model 1 (landslide body sampling) are 0.398, 0.395 and 0.453 for the three random sets, respectively, and the RMSE values for Model 2 (seed cell sampling) vary between 0.334 and 0.422 across the buffer distances (d = 25 m, d = 50 m, d = 75 m, d = 100 m).
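RMSE compares the observed classes of the testing cells with the predicted probabilities, so lower values indicate predictions closer to the observations. A minimal sketch with a hypothetical function name and toy values:

```python
import math


def rmse(observed, probabilities):
    """Root mean square error between observed classes (0/1) and predicted probabilities."""
    if not observed or len(observed) != len(probabilities):
        raise ValueError("observed and probabilities must be non-empty and equally long")
    squared = sum((o - p) ** 2 for o, p in zip(observed, probabilities))
    return math.sqrt(squared / len(observed))


print(rmse([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.4]))  # approximately 0.316
```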

During the Production of Model Consequences

In the third stage of the performance evaluation procedure, the resulting landslide susceptibility maps were analysed using ROC curves and the area under the ROC curve (AUC). ROC curves and the AUC are threshold-independent indices that are determined during the production of model consequences, and the AUC is used as a single, threshold-independent validation statistic (Begueria 2006). An AUC value close to 1 indicates good model performance (Fawcett 2006). The ROC curves of the models were drawn and the AUC values were determined (Table 3). The calculated AUC values were close to 1 and very close to each other (Table 4), showing that the performances of the models are quite acceptable.
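Because the AUC does not depend on a cutoff, it can be computed without drawing the curve: it equals the probability that a randomly chosen landslide cell receives a higher susceptibility value than a randomly chosen non-landslide cell, with ties counted as one half. The pairwise sketch below uses this rank-based formulation, which is a swapped-in computation rather than the software used in the study; the function name is hypothetical, and its quadratic cost suits only small samples.

```python
def auc(observed, scores):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    if not pos or not neg:
        raise ValueError("need both landslide (1) and non-landslide (0) cells")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([1, 1, 0, 0], [0.9, 0.7, 0.8, 0.1]))  # 0.75
```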

Table 3 Validation statistics derived from the error matrices and Kappa index (k) values for the displacement + accumulation area (Model 1) and seed cell (Model 2) sampling models
Table 4 Area under ROC curve (AUC) values for the models

Results and Conclusions

The validation indices were evaluated in three stages: during model data production, during model construction and during the production of model consequences. In addition, a generalized flow chart for the performance evaluation of landslide susceptibility models is proposed. The flow chart makes explicit which validation indices are calculated from which data set and at which stage of the modelling. In the large body of recent landslide susceptibility literature, the validation of the constructed models is often vague, and this uncertainty extends from model data production to the production of model consequences. The common approach to validation is to evaluate ROC curves for the whole study area (Ayalew and Yamagishi 2005; Mathew et al. 2007; Pradhan 2010). However, the performance evaluation of the model construction stage is commonly ignored, particularly in studies applying bivariate statistics, artificial intelligence and data mining techniques (Saito et al. 2009; Yilmaz 2009; Oh and Pradhan 2011; Akgun et al. 2012; Bui and Pradhan 2012; Conforti et al. 2014). Applying this stage requires a pre-processing step in which the data for the subsequent evaluations are produced. In the proposed flow chart, the performance evaluation of the model construction stage is therefore suggested as a separate, routine step of model validation in landslide susceptibility analyses.

A landslide susceptibility model is commonly expected to provide both a high prediction capacity for the constructed model and a high generalization capacity for its application to the whole study area (Can et al. 2005). Suppose that the probabilities were calculated to be 1 for the whole study area: the spatial performance of the model would then be 100 %, since every landslide would fall in a cell of maximum susceptibility. However, the resulting susceptibility pattern would be meaningless and the model would have no generalization capacity, because the probabilities would also be 1 for all areas without landslides, so the calculated AUC value would be low (0.5, no better than random). In other words, the prediction capacity of the model is maximum while its generalization capacity is minimum. Hence, the validation indices for model construction and for the production of model consequences should be evaluated separately. It can therefore be concluded that the proposed methodology allows both the prediction and the generalization capacities of any landslide susceptibility evaluation to be checked appropriately.
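The degenerate case discussed above can be reproduced numerically. In this sketch a "capture rate" (the share of landslide cells classified as susceptible at an assumed cutoff of 0.5) stands in for the spatial performance of the model; this measure, the cutoff, the toy map of 20 landslide and 80 non-landslide cells, and both function names are hypothetical illustrations.

```python
def auc(observed, scores):
    """Rank-based AUC: share of (landslide, non-landslide) pairs ranked correctly, ties = 1/2."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def capture_rate(observed, scores, cutoff=0.5):
    """Hypothetical measure: fraction of landslide cells with susceptibility >= cutoff."""
    landslide_scores = [s for o, s in zip(observed, scores) if o == 1]
    return sum(1 for s in landslide_scores if s >= cutoff) / len(landslide_scores)


observed = [1] * 20 + [0] * 80
constant = [1.0] * 100                 # probability 1 everywhere
informative = [0.9] * 20 + [0.1] * 80  # high only where landslides occur

print(capture_rate(observed, constant), auc(observed, constant))        # 1.0 0.5
print(capture_rate(observed, informative), auc(observed, informative))  # 1.0 1.0
```

Both maps capture every landslide, but only the informative one separates landslide from non-landslide cells, which is why the two groups of indices need to be read separately.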

For the sample analyses presented in this study, the validation indices of the models are quite close to each other, which indicates that both the predictive and the generalization capacities of the models can be regarded as acceptable.