Introduction

Landslides are one of the principal geological disasters in urban areas, causing substantial property damage and human casualties. Urban areas are particularly vulnerable to landslides due to ground instability caused by human activity. Moreover, the high population density of urban areas means more people are exposed to geohazards than in rural regions. In recent years, landslide frequency has steadily increased due to intensifying rainfall, likely driven by ongoing climate change (Guzzetti et al. 2007; Kim et al. 2011, 2015). It is vital, therefore, to identify landslide-susceptible areas in order to mitigate and prepare for landslide events. In this regard, several studies have already made progress in landslide prediction based on geostatistical approaches.

The majority of research to date has used geographic information systems (GIS) to assess landslide susceptibility, as this is an effective approach to the quantitative evaluation of large regions. For example, landslide susceptibility has been estimated using GIS for several regions of South Korea where the risk of landslide hazards is particularly severe (Choi et al. 2012; Lee et al. 2006, 2013; Lee and Lee 2005; Lee and Min 2001; Park et al. 2013; Suh et al. 2011). Most of these investigations used GIS to perform spatial statistical analyses of susceptible areas. Other studies have compared statistical methods for evaluating landslide susceptibility, including frequency ratio (FR) analysis, the analytic hierarchy process (AHP), artificial neural networks (ANN), and logistic regression (LR). In addition, the principal triggering factors of past landslides have been identified and verified by comparison with landslide inventory databases (Akgun and Türk 2010; Oh and Pradhan 2011; Park et al. 2013; Poudyal et al. 2010; Pradhan 2011; Reis et al. 2012; Schleier et al. 2014; Shahabi et al. 2015; Suh et al. 2011; Yalcin et al. 2011; Yilmaz 2009, 2010; Yilmaz et al. 2012).

Most previous studies have assessed susceptible areas by analyzing their geographical and geological properties, focusing in particular on the spatial pattern of landslides without considering the properties of adjacent areas (Goltz 1996; Liucci et al. 2015; Malamud et al. 2004). Yet the physical characteristics of the surrounding area can influence landslide hazards both directly and indirectly (Jones and Preston 2012; Massey et al. 2013; Petley et al. 2005; Van Westen et al. 2003). Here, we present a statistical approach for evaluating landslide susceptibility that considers triggering factors both at the landslide site and in the surrounding area, and that produces landslide susceptibility index (LSI) maps for varying radii of influence. To find the optimum radius of influence, we used the area under the relative operating characteristic curve (AUC) to compare the LSI maps with validation sites selected from the landslide inventory.

The objectives of this paper are to (1) demonstrate our model for evaluating landslide susceptibility, which combines GIS technology and statistics; (2) estimate the LSI while incorporating the effects of adjacent areas and varying the radius of influence; and (3) identify the optimum radius of influence and validate the approach by comparing our results with those of previous studies.

Study area

The study area is the Seoul metropolitan area (605.2 km²), which, with 10.6 million inhabitants, is the most urbanized area in South Korea. Seoul lies between 37°25′N and 37°42′N and between 126°45′E and 127°10′E, and is surrounded by mountainous terrain. Combined with regular typhoons and intense summer rainfall, this high-relief environment results in a high incidence of landslides. For example, 69 landslides were recorded following intense rainfall (maximum hourly rainfall of 110.5 mm) on 27 July 2011, more than half of them close to Mt. Umyeon in the mountainous central-southern part of Seoul.

The geology of Mt. Umyeon is dominated by gneiss and granite overlain by a soil layer, and the bedrock itself is relatively stable. However, as infrastructure built by human activity has expanded onto steep mountain slopes, slope failures, including rock falls and debris flows, occur during the rainy season. The 2011 landslides at Mt. Umyeon were triggered by heavy rainfall that reduced surface stability and caused soil failure. Other landslide sites during the same period were in a similar situation, and most of the damage resulted from soil failure. We assumed that precipitation was approximately uniform over the study area and therefore chose triggering factors known to be closely involved in soil failure, including debris flows: topography, soil, forest properties, and land cover. Accordingly, the landslide susceptibility assessment in this study focuses on landslides triggered under the same precipitation conditions, particularly in urban areas on mountainous terrain.

A landslide inventory database is maintained by the Seoul metropolitan government and includes location and scale data for each landslide event. Location data include a text address and approximate longitude/latitude. The spatial extent of each event is recorded in the scale field, although not precisely. To specify the spatial coverage, the exact location of each landslide site is assigned by manual geocoding, and its coverage is represented by a circle whose diameter is calculated from the recorded scale of the event. The mapped inventory is then divided into training and validation data: seven landslide sites, approximately 10 % of all events, are reserved for validation (Fig. 1). The validation sites are randomly sampled from a uniform distribution so that each cluster of landslides is properly represented. The remaining 62 landslide sites are used as training data for the statistical analysis.
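The inventory preparation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the conversion of the "scale" field into a diameter (treated here as a failure area in m²), the grid origin at (0, 0), the cluster labels, and the function names `diameter_from_area`, `circle_cells`, and `split_by_cluster` are all hypothetical.

```python
import math
import random

CELL_SIZE = 30.0  # grid resolution (m) used for the factor maps


def diameter_from_area(area_m2):
    """Hypothetical conversion: treat the inventory 'scale' value as a failure
    area in m^2 and return the diameter of a circle with the same area."""
    return 2.0 * math.sqrt(area_m2 / math.pi)


def circle_cells(x, y, diameter, cell=CELL_SIZE):
    """(row, col) indices of grid cells whose centres fall inside a circle of
    the given diameter around a geocoded site (x, y in projected metres).
    Assumes the grid origin is at (0, 0) and the row index grows with y."""
    r = diameter / 2.0
    cells = set()
    for row in range(int((y - r) // cell), int((y + r) // cell) + 1):
        for col in range(int((x - r) // cell), int((x + r) // cell) + 1):
            cx, cy = (col + 0.5) * cell, (row + 0.5) * cell
            if (cx - x) ** 2 + (cy - y) ** 2 <= r * r:
                cells.add((row, col))
    # A very small event still occupies the cell that contains the site itself
    cells.add((int(y // cell), int(x // cell)))
    return cells


def split_by_cluster(site_ids, cluster_of, frac=0.1, seed=0):
    """Stratified random split: draw roughly `frac` of the sites (at least one)
    from each spatial cluster; cluster labels are assumed to be given."""
    rng = random.Random(seed)
    groups = {}
    for s in site_ids:
        groups.setdefault(cluster_of[s], []).append(s)
    validation = set()
    for members in groups.values():
        k = max(1, round(len(members) * frac))
        validation.update(rng.sample(members, k))
    training = [s for s in site_ids if s not in validation]
    return training, sorted(validation)
```

Drawing at least one site per cluster is one way to make every cluster appear in the validation set; the actual number of sites drawn per cluster in the study is not specified.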

Fig. 1

Locations of landslide sites (for training and validation) in the study area

Building the triggering factor database

In this study, 12 geospatial attributes are considered as triggering factors on the basis of their geological and geographical properties (Table 1): slope angle, slope aspect, curvature, land cover, forest age, forest density, forest diameter, forest type, soil drainage, soil material, soil texture, and soil thickness. Following Lee and Lee (2005), we selected these triggering factors for the following reasons: (1) spatial data on the triggering factors are obtainable from government organizations; (2) the study areas are sufficiently close to one another that the same triggering factors are relevant to both; and (3) because we employ the same statistical procedure, our LSI assessment can be evaluated effectively against theirs. Spatial data, including topographic maps, geological maps, and maps of forest and soil properties, have been analyzed to rank landslide-prone areas.

Table 1 Landslide inventory and geospatial attributes database of the study area

The database entries were selected as the major factors reflecting the characteristics of landslide events in South Korea. Slope angle, slope aspect, and curvature are topographic attributes that are commonly proposed as landslide triggering factors (Schleier et al. 2014; Shahabi et al. 2015; Van Westen et al. 2003). Distance from faults is also an important factor, but it was excluded because no spatial database was available. Because soil failures, including debris flows and debris slides, cause most of the damage in Seoul, land cover and forest and soil properties, which are related to soil failure, were chosen as triggering factors. In particular, previous studies in South Korea (Choi et al. 2012; Lee and Lee 2005; Lee et al. 2006; Yune et al. 2013) found that forest and soil properties can either promote or inhibit landslide occurrence.

To avoid redundancy in the dataset, classes with similar properties need to be checked before FR analysis. Among the database entries, forest age and forest diameter appear to be closely correlated, but they are not necessarily interdependent, because their relationship varies with forest type. Likewise, soil texture and soil material are both lithological attributes, but depending on their combination they can differ in surface drainage or moisture absorption. We therefore conducted FR analysis to determine whether each of these factors is independently related to landslide occurrence, in order to design a more refined analysis model.

All topographic data are derived from a digital elevation model (DEM; ASTER GDEM V2) with a resolution of ~30 m. This global DEM is freely available from the Japanese Ministry of Economy, Trade and Industry (METI) and the US National Aeronautics and Space Administration (NASA). Maps of forest/soil properties and land cover are available from the Korean Water Management Information System (WAMIS; http://www.wamis.go.kr/) in GRID format and are converted to raster files with a spatial resolution of 30 m.

The values of triggering factors are either discrete or continuous. Most factors, such as land cover and forest/soil properties, are classified into discrete (categorical) classes. We tried to express all classes in quantitative terms, even those constructed from discrete values. Soil drainage was the only factor that could not be expressed numerically, because it was surveyed by visual inspection by investigators with more than 20 years of experience (Hong et al. 2009). Apart from this case, the quantitative factors that were surveyed in predefined constant-interval categories (i.e., forest age, forest density, forest diameter, and soil thickness) are used directly as the classes of the factors in the FR analysis.

For triggering factors with continuous values that are conventionally defined by discrete classes, each factor map is reclassified into several classes using a constant interval method: flat and eight directions for aspect, and flat, concave, and convex for curvature. For slope angle, however, the constant interval method produces a strongly skewed distribution of cells among classes. Common ways to equalize such an irregular distribution include natural breaks (Jenks optimization), quantile intervals, and standard deviation intervals. After comparing the results of these methods, we used the standard deviation method for reclassification and manually adjusted the classes so that each contained a similar number of cells. When data are irregularly distributed, natural breaks or equal interval methods cannot allocate cells to classes evenly. Conversely, the equal-area interval method resolves the imbalance but, by focusing excessively on equal class sizes, produces ambiguous boundaries that fail to capture the distinct characteristics of each class. If a class lacks sufficient samples, or a triggering factor has too few classes, the classification can be supplemented with other methods, such as natural breaks, equal interval, or equal-area intervals.
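The reclassification rules above can be sketched as follows. This is a minimal illustration, not the authors' GIS workflow. Several details are assumptions: the flat-slope threshold, the curvature sign convention (positive = convex, which differs between GIS packages), the curvature tolerance, the use of ±1 standard deviation breaks, and all function names are hypothetical. `class_counts` supports the manual check that classes hold similar numbers of cells.

```python
import bisect
import statistics

ASPECT_CLASSES = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def aspect_class(aspect_deg, slope_deg, flat_slope=1.0):
    """Flat plus eight 45-degree sectors centred on N, NE, ..., NW.
    Assumption: a negative aspect (a common GIS code for flat cells) or a
    slope below `flat_slope` degrees is labelled 'Flat'."""
    if aspect_deg < 0 or slope_deg < flat_slope:
        return "Flat"
    return ASPECT_CLASSES[int(((aspect_deg + 22.5) % 360) // 45)]


def curvature_class(curv, tol=0.05):
    """Assumed sign convention: positive = convex, negative = concave;
    values within +/- tol are treated as flat."""
    if abs(curv) <= tol:
        return "Flat"
    return "Convex" if curv > 0 else "Concave"


def std_dev_breaks(values, step=1.0, n_side=1):
    """Class breaks at mean + k * step * sd for k = -n_side .. n_side."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [mu + k * step * sd for k in range(-n_side, n_side + 1)]


def classify(values, breaks):
    """Class index (0 .. len(breaks)) of each value."""
    return [bisect.bisect_right(breaks, v) for v in values]


def class_counts(labels, n_classes):
    """Number of cells per class, used to check that classes are balanced."""
    counts = [0] * n_classes
    for lab in labels:
        counts[lab] += 1
    return counts
```

If the counts returned by `class_counts` are badly unbalanced, the breaks would be adjusted by hand or supplemented with another method, as described above.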

Concept of the radius of influence

Our approach differs from previous assessments of landslide susceptibility in that we incorporate a radius of influence into our GIS-based statistical analysis. Because geohazards such as landslides can be caused by triggering factors in adjacent areas (Massey et al. 2013; Petley et al. 2005), it is more effective and accurate for landslide susceptibility training domains to extend beyond the immediate site of the event (Fig. 2). This concept is particularly advantageous for landslides occurring at sites where triggering factors are deemed "less likely" or "rare," such as points (A) and (B), respectively, in Fig. 2. Without a radius of influence, the probabilities of such triggering factors can be exaggerated because the training domain contains too few cells, resulting in erroneous assessments of landslide vulnerability and reduced statistical accuracy. Conversely, an overestimated radius of influence may enclose factors that do not contribute to landslide occurrence. Choosing an appropriate radius of influence reduces both kinds of exaggeration (i.e., those arising when the radius is not applied and when it is overestimated).

Fig. 2

Conceptual diagram showing the effects of applying the radius of influence to a statistical analysis of landslide susceptibility

In summary, it is crucial to find the optimum radius of influence in order to produce the most accurate results. The optimum can be determined in the validation stage, after the statistical analysis and production of the LSI maps. In the present study, we varied the radius of influence in 30 m increments, corresponding to the finest spatial resolution of the available property maps. To determine the optimum radius of influence for the triggering factors, we conducted statistical analyses for 11 radii from 0 to 300 m (Fig. 3).
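Expanding the training domain for each of the 11 radii can be sketched as a simple raster buffer. This is a minimal illustration under assumptions, not the authors' spatial process module. It assumes that distance is measured centre to centre between 30 m cells and that the study area is a rectangular grid of `n_rows` × `n_cols` cells. The name `buffer_cells` is hypothetical.

```python
CELL_SIZE = 30  # m, finest resolution of the property maps
RADII = list(range(0, 301, CELL_SIZE))  # 0, 30, ..., 300 m -> 11 radii


def buffer_cells(site_cells, radius_m, n_rows, n_cols, cell=CELL_SIZE):
    """Training domain for one radius of influence: every grid cell whose
    centre lies within `radius_m` (Euclidean, centre to centre) of any
    landslide cell, clipped to the study-area grid."""
    k = int(radius_m // cell)       # half-width of the search window in cells
    r2 = (radius_m / cell) ** 2     # squared radius in cell units
    domain = set()
    for row, col in site_cells:
        for dr in range(-k, k + 1):
            for dc in range(-k, k + 1):
                if dr * dr + dc * dc <= r2:
                    rr, cc = row + dr, col + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        domain.add((rr, cc))
    return domain


# Usage: training-domain size for each radius around a single landslide cell
sizes = {r: len(buffer_cells({(20, 20)}, r, 41, 41)) for r in RADII}
```

With a 0 m radius the domain is just the landslide cells themselves. Each larger radius adds a ring of neighbouring cells, which is what enlarges the sample used in the FR calculation.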

Fig. 3

Spatial database and process model for assessing landslide susceptibility in this study

Frequency ratio analysis for each area of influence

To quantify landslide susceptibility, we used FR analysis. This statistical approach is commonly applied to susceptibility assessments when both a landslide inventory database and triggering factor maps are available for the entire study area (Choi et al. 2012; Park et al. 2013; Reis et al. 2012; Schleier et al. 2014; Shahabi et al. 2015; Suh et al. 2011; Ulrich et al. 2008; Yalcin et al. 2011; Yilmaz 2009). Specifically, the FR method compares the relative frequency of each class of triggering factor at the landslide sites with its relative frequency over the whole study area. The FR index thus indicates the degree to which each class correlates with landslide occurrence. If a class occupies a much smaller proportion of the landslide training domain than of the whole study area (FR < 1), the class is associated with ground stability. In the reverse case, the FR index exceeds 1, indicating vulnerability to landslide failure. An FR index close to 1 indicates that the class has no effect on the potential for landslide events.

The FR index is determined using Eq. 1, in which each variable is obtained by overlaying the spatial database maps. The calculation requires the number of grid cells in the landslide training domain and in the entire study area, as well as the number of cells of each triggering factor class within each of these two domains. The FR index of each class is then obtained by inserting these counts into the equation.

$${\text{Frequency}}\;{\text{Ratio}} \left( {\text{FR}} \right) = \frac{{{\text{Percent}}\;{\text{of}}\;{\text{Landslide}}\;{\text{Training}}\;{\text{Domain}}\;{\text{for}}\;{\text{each}}\;{\text{class}}}}{{{\text{Percent }}\;{\text{of }}\;{\text{Whole}}\;{\text{Domain }}\;{\text{for}}\;{\text{each}}\;{\text{class}}}}$$
(1)
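Eq. 1 reduces to counting cells per class in two domains. The sketch below is a minimal illustration and assumes that each factor map is stored as a dict from (row, col) to class label. The function name `frequency_ratio` is hypothetical.

```python
from collections import Counter


def frequency_ratio(class_map, training_cells):
    """Eq. 1 for one triggering factor and one training domain.

    class_map: {(row, col): class label} for every cell of the study area.
    training_cells: set of (row, col) cells in the landslide training domain.
    Returns {class: FR}; FR > 1 means the class is over-represented in the
    training domain, FR < 1 under-represented, FR == 0 absent.
    """
    whole = Counter(class_map.values())  # cells per class, whole study area
    train = Counter(class_map[c] for c in training_cells if c in class_map)
    n_whole = sum(whole.values())
    n_train = sum(train.values())  # must be > 0 (at least one landslide cell)
    return {
        cls: (train[cls] / n_train) / (whole[cls] / n_whole)
        for cls in whole
    }
```

Calling this once per factor and per training domain (one domain per radius of influence) yields a table of FR values per class and radius.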

Accordingly, FR index layer maps of the 12 factors are generated for each of the 11 training domains, one per radius of influence. The variables for each area of influence are obtained by cross-tabulating the training domains with the maps of geological and geographical properties. Although this process is time consuming when done manually, our spatial process module automates it over multiple iterations of the radius of influence, yielding FR index values for each class of triggering factor (Table 2).

Table 2 FR index values for each class of triggering factor according to the radius of influence

Without considering the radius of influence, the most landslide-prone classes include mountainous terrain in the land cover category, <10 years old in forest age, <6 cm in forest diameter (i.e., the median trunk diameter at breast height), Pinus koraiensis and artificial hardwood forest in forest type, and well drained in soil drainage. Together, these characteristics suggest that landslides occur primarily in mountainous regions with a cover of young, relatively small trees that are unlikely to support ground stability.

As the radius of influence increases, several FR indices gradually decrease or increase, ultimately approaching a value of 1. Any previous exaggeration of these classes is therefore reduced when the area of influence is taken into account. For example, for classes with an initial FR index value of 0, the value increases dramatically once the radius of influence expands. This is because larger areas of influence include more sample cells around the training sites. This approach can therefore reduce exaggerated FR values and improve the accuracy of LSI maps.

Figure 4 shows representative trends in FR index values for each forest type. The FR index values for Pinus koraiensis and artificial hardwood forest exceed 15 for a 0 m radius. However, the value for Pinus koraiensis drops sharply from 20.76 to 2.037 as the radius increases, indicating that this forest type is not a principal triggering factor for landslides. In contrast, the FR index of artificial hardwood forest does not decrease but remains close to 15 for every radius of influence, suggesting that this class contributes to landslides largely independently of the radius of influence. Conversely, some classes tend not to induce landslides. For example, non-forested land and Populus deltoides both have FR values of zero for a 0 m radius. As the radius increases, the FR index of non-forested land rises gradually from 0 to 0.297, although no increase is apparent for radii below 60 m. Meanwhile, the FR index of Populus deltoides does not change with increasing radius, indicating that this class is independent of the radius of influence.

Fig. 4

Frequency ratio index values of different forest types according to the radius of influence

Compilation of LSI maps

To calculate LSI values, the FR index values (Table 2) of the classes found in each grid cell are summed over all factors according to Eq. 2. LSI maps for each radius of influence are then generated by rescaling the summed values of all grid cells to the range 1–100 using an equal-area stretch. As shown in Fig. 5, the mountainous southern regions, where major landslides have occurred, are classified as highly susceptible. The southeastern and northern parts of the study area are also classified as highly vulnerable, although landslides have not yet occurred there. As the radius of influence increases, the LSI map becomes blurred, indicating that highly variable values are smoothed. As Table 2 shows, this smoothing occurs because the LSI of each grid cell is derived from FR index values that have been compensated (moderated toward 1) by the larger training domain.

$${\text{LSI}} = \sum {\text{FR}}\;{\text{of}}\;{\text{each}}\;{\text{factor}}\;{\text{for}}\;{\text{each}}\;{\text{radius}}\;{\text{of}}\;{\text{influence}}.$$
(2)
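Eq. 2 and the subsequent 1–100 rescaling can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code. "Equal area" is read here as a rank-based stretch in which each output level covers roughly the same number of cells, and classes missing from an FR table are assumed to contribute 0. Both function names are hypothetical.

```python
def lsi_raw(fr_tables, factor_maps, cells):
    """Eq. 2: for each cell, sum the FR values of the classes it falls in.

    fr_tables:   {factor: {class: FR}} for one radius of influence
    factor_maps: {factor: {(row, col): class}}
    Classes missing from an FR table contribute 0 (assumption).
    """
    return {
        c: sum(fr_tables[f].get(factor_maps[f][c], 0.0) for f in fr_tables)
        for c in cells
    }


def equal_area_stretch(values, n_levels=100):
    """Rank cells by raw LSI and spread the ranks evenly over 1..n_levels,
    so each level covers roughly the same number of cells (assumed reading
    of the 'equal-area' stretch). Tied values share the level of the first
    tied cell."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    out, prev_val, prev_level = {}, None, None
    for i, c in enumerate(ordered):
        if values[c] == prev_val:
            out[c] = prev_level
            continue
        level = 1 if n == 1 else 1 + (i * (n_levels - 1)) // (n - 1)
        out[c] = level
        prev_val, prev_level = values[c], level
    return out
```

A rank-based stretch makes the LSI values of maps for different radii directly comparable, because each map uses the full 1–100 range regardless of how large its raw FR sums are.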
Fig. 5

Generated LSI maps for each radius of influence

Validation of the estimated optimal radius of influence

Because the landslide susceptibility maps for each radius of influence are intended as predictive tools, it is important to determine which LSI map is most accurate by comparison with validation data. The most common approach for this comparison in geohazard studies is the relative operating characteristic (ROC) curve and the area under the ROC curve (AUC) (e.g., Choi et al. 2012; Lee et al. 2013; Mason and Graham 2002; Oh and Pradhan 2011; Suh et al. 2013; Yilmaz 2009). To draw the ROC curve, the LSI map for each radius of influence is overlaid on the landslide validation area. A cumulative frequency curve is then constructed from the proportion of validation grid cells at each LSI value, as shown in Table 3. LSI values are ranked in descending order so that the AUC correlates positively with the accuracy of the LSI map. A higher AUC corresponds to higher average LSI values at the validation sites, indicating a more accurate LSI map.
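The ranking-and-accumulation step can be sketched as follows. This is a minimal illustration, not necessarily identical to the procedure used for Table 3. It assumes the LSI map is a dict from (row, col) to LSI value and the validation landslides are a set of cells. It builds the cumulative (prediction-rate) curve commonly used in FR-based landslide studies and integrates it with the trapezoidal rule. The name `prediction_rate_auc` is hypothetical.

```python
def prediction_rate_auc(lsi, validation_cells):
    """Cumulative-frequency (prediction-rate) curve and its AUC.

    lsi: {(row, col): LSI value} for every cell of the study area.
    validation_cells: set of cells belonging to the validation landslides.
    Cells are ranked by LSI in descending order; x is the cumulative fraction
    of the study area, y the cumulative fraction of validation cells.
    """
    n_total = len(lsi)
    n_valid = sum(1 for c in validation_cells if c in lsi)
    # Group cells by LSI value so that tied cells enter the curve together
    by_value = {}
    for c, v in lsi.items():
        area, hits = by_value.get(v, (0, 0))
        by_value[v] = (area + 1, hits + (1 if c in validation_cells else 0))
    xs, ys = [0.0], [0.0]
    cum_area = cum_hits = 0
    for v in sorted(by_value, reverse=True):  # highest LSI first
        area, hits = by_value[v]
        cum_area += area
        cum_hits += hits
        xs.append(cum_area / n_total)
        ys.append(cum_hits / n_valid)
    # Trapezoidal rule over the piecewise-linear curve
    auc = sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2 for i in range(1, len(xs)))
    return xs, ys, auc
```

If validation cells fall in the highest-ranked LSI cells, the curve rises steeply and the AUC approaches 1. A map with no predictive skill gives a roughly diagonal curve and an AUC near 0.5.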

Table 3 Cumulative frequency of LSI and accuracy of prediction maps according to the radius of influence

As the radius of influence increases, both the AUC values and the LSI values (e.g., minimum LSI, LSI of the top 80 %) also increase (Fig. 6). However, beyond a radius of 210 m the rate of increase declines considerably and AUC values fluctuate. We speculate that this pattern indicates that triggering factors in adjacent areas have a measurable impact on landslide occurrence up to a radius of about 210 m. Beyond 270 m, AUC values decrease slightly, which we interpret as an overestimation of the area of influence that introduces irrelevant triggering factors into the analysis.

Fig. 6

AUCs of prediction maps according to the radius of influence (0, 60, 120, 180, 240, and 300 m)

Through this verification process, we determined that the optimum radius of influence is 240 m, where the AUC value peaks. The AUC increases from 86.49 % with the conventional approach (0 m radius) to 92.44 % with the optimized radius of influence. Moreover, the lowest LSI is 35 % for the 0 m radius but only 20 % for the 240 m radius. According to Yilmaz (2009), even the lowest AUC value, obtained with a 0 m radius of influence (0.8649), indicates a valid model, but LSI maps incorporating the optimized radius of influence will be more effective for hazard prediction.

Discussion

To identify the source of the accuracy improvement, we compared the FR values obtained with and without the optimized radius of influence (Table 4). Of the 85 classes, 54 had FR values that remained comparatively constant regardless of the radius of influence. Among these, six classes retained high FR values, for instance >75.0 % in forest density and artificial hardwood forest in forest type, whereas eight classes remained at 0, such as 40–50 years >50 % in forest age and wet land in land cover. The classes that retain their FR values, which are distributed predominantly around the landslide sites, fall into two categories: spatial conditions that determine whether a landslide can occur at all, and triggering factors that promote or inhibit the landslide mechanism. The classes related to spatial conditions are >75.0 % in forest density and wet land in land cover. In contrast, artificial hardwood forest in forest type and 40–50 years >50 % in forest age act as triggering factors that directly affect ground stability and the effects of precipitation. The FR values of these classes fluctuate little as the radius of influence varies, which indicates that other classes are unlikely to occur in the adjacent areas and suggests that these classes have little influence on their surroundings.

Table 4 Changes in FR index (0 m and 240 m) for each class of triggering factor

For 16 of the 85 classes, the FR value crossed 1 (from below 1 to above 1, or vice versa). Six of these 16 classes changed by less than 200 % and are labeled "remained", whereas the other 10 showed sharp variations. In total, 31 classes had FR values that increased or decreased by more than 200 %. This implies that the further the FR value moves from 1, the more strongly the triggering factors in adjacent areas are associated with landslide occurrence. Seven of the 85 classes, whose FR values had been zero and which are labeled "advent", were newly included in the training domain when the radius of influence was applied. Applying the radius of influence is thus beneficial when spatial data are excessively sparse or dense, especially for classes whose FR values move closer to 1. The better the radius of influence is optimized, the more effectively errors can be mitigated and overestimation relieved. This correction effect is most likely when the spatial resolution is high; otherwise, widening the radius of influence is of little use because most cells in the domain belong to the same classes. The radius of influence is therefore most effective when all layers of the spatial database have a similar spatial resolution.

Conclusions

In this study, we applied a radius of influence to landslide susceptibility assessment in order to statistically analyze the importance of various triggering factors. To compile precise LSI maps, we built a series of spatial analysis models to find the optimum radius of influence for predicting landslide hazards more accurately. We adopted the widely used FR method as the statistical method for each radius of influence and selected 12 triggering factors from geographical and geological properties. The radius of influence was increased in 30 m increments from 0 to 300 m, producing 11 LSI map layers. In the validation stage, we determined the optimum radius of influence for the study area to be 240 m by comparing the AUC values for each radius of influence.

The most striking effect of applying an appropriate radius of influence is a large reduction in data exaggeration and anomalous errors. As the training domain grows, triggering factors in adjacent areas are included in the statistical analysis. We found that several classes of triggering factors are over- or underestimated, or even omitted, when the area of influence is not considered. Conversely, if the radius of influence increases beyond the optimum, the accuracy of the LSI map declines because of redundant referencing.

The most critical change in the process when adopting the radius of influence occurred in the FR analysis. In total, 36 % of all classes changed in FR value by more than 200 %, and seven classes whose values had been zero gained positive values. The larger the increase or decrease in FR value, the greater the influence of the classes in adjacent areas on the landslide sites. The remaining 64 % of classes were independent of the radius of influence, especially those whose spatial data were distributed relatively uniformly; these classes are less likely to trigger landslides from the adjacent areas.

Classes with extreme FR values, such as wet land in land cover and >75.0 % in forest density, are closely related to the geological properties of the sites where landslides occur, whereas 40–50 years >50 % in forest age and artificial hardwood forest in forest type are triggering conditions for landslides, especially soil failures. Seven classes whose FR values changed rapidly would have been regarded as extreme triggering factors without the radius of influence, although they are not. Because landslides spread and damage adjacent areas, the radius of influence should be considered so that such factors are included at the statistical analysis stage. As a result, the LSI map that incorporates the optimum radius of influence is 5.95 percentage points more accurate than the map without it. We anticipate that future studies will identify more suitable values for the radius of influence.

The AUC peak in our experiment is clearly apparent, and we stress the importance of finding the point at which AUC values cease to rise. If the radius of influence is set too high or too low, the accuracy of the LSI maps will be reduced. Because the optimum radius of influence differs from case to case, the number of intervals and the range of radii tested must be chosen to reflect each specific situation. If the spatial resolution of the database is too coarse or excessively fine, the area of influence might not be detected. With this in mind, it would be prudent to apply the radius of influence not only to FR analysis but also to other statistical methods, such as the analytic hierarchy process, bivariate statistics, logistic regression, and artificial neural networks.