Introduction

Landslides have become increasingly frequent in Malaysia in recent years. Most have occurred on cut slopes or embankments alongside roads and highways in mountainous areas. Some have occurred near high-rise apartments and in residential areas, posing a serious threat to many people. A few major, catastrophic landslides have also occurred within the last 10 years, resulting in significant harm to people and damage to property. In the area chosen for this study, Penang in Malaysia, much damage was caused on each of these occasions. The landslides were triggered by periods of heavy rainfall and, because little effort had been made to assess or predict the events, damage was extensive. Through scientific analysis of landslides, landslide-susceptible areas can be assessed and predicted, and landslide damage can thus be reduced through preventive measures. To this end, landslide susceptibility analysis techniques were applied and verified in the study area, and landslide-related factors were also assessed. The geographic information system (GIS) software packages ArcView 3.2 and ARC/INFO 8.1 (NT version) were used as the basic analysis tools for spatial management and data manipulation.

The Penang area has suffered extensive landslide damage following heavy rains, and was selected as a suitable candidate for evaluating the frequency and distribution of landslides (Fig. 1). Penang is one of the 13 states of the Federation of Malaysia and lies on the northwest coast of the Malaysian peninsula. It is bounded to the north and east by the state of Kedah, to the south by the state of Perak, and to the west by the Straits of Malacca and by Sumatra (Indonesia). Penang consists of the island of Penang and a coastal strip on the mainland, known as Province Wellesley. The island covers an area of 285 km², and is separated from the mainland by a natural channel. Rainfall is fairly evenly distributed throughout the year, with more rain falling from September to November. Penang has a population of approximately one million people. The bedrock geology of the study area consists mainly of granite.

Fig. 1 Study area map and landslide location map with hillshaded map

Many studies have been carried out on landslide hazard evaluation using GIS; for example, Guzzetti et al. (1999) summarized many landslide hazard evaluation studies, and many of these studies have applied probabilistic methods (Rowbotham and Dudycha 1998; Jibson et al. 2000; Luzi et al. 2000; Parise and Jibson 2000; Rautelal and Lakheraza 2000; Baeza and Corominas 2001; Lee and Min 2001; Temesgen et al. 2001; Clerici et al. 2002; Donati and Turrini 2002; Lee et al. 2002a, b; Refice and Capolongo 2002; Zhou et al. 2002; Lee and Choi 2003). One of the statistical methods available, the logistic regression method, has also been applied to landslide hazard mapping (Atkinson and Massari 1998; Dai et al. 2001; Dai and Lee 2002; Ohlmacher and Davis 2003), as have the geotechnical method and the safety factor method (Gokceoglu et al. 2000; Romeo 2000; Refice and Capolongo 2002; Carro et al. 2003; Shou and Wang 2003; Zhou et al. 2003). As a new approach to landslide hazard evaluation using GIS, data mining methods such as fuzzy logic and artificial neural networks have been applied (Ercanoglu and Gokceoglu 2002; Pistocchi et al. 2002; Lee et al. 2003a, b, 2004).

To analyze landslide susceptibility and assess the effect of each factor, landslide-related data were collected and compiled into a spatial database; landslide-related factors were extracted and overlaid using frequency ratios; and landslide susceptibility maps were produced and verified.

Data gathering using GIS and remote sensing images

Accurate detection of landslide locations is very important for probabilistic landslide susceptibility analysis. Remote sensing data, such as aerial photographs and satellite images, are used to obtain significant and cost-effective information on landslides. In this study, 1:10,000-scale to 1:50,000-scale aerial photographs were used to detect the landslide locations. These photographs were taken during the period 1981–2000, and the landslide locations were detected by photo interpretation and verified by fieldwork. Recent landslides were recognized in the aerial photographs from breaks in the forest canopy, bare soil, or geomorphic characteristics typical of landslide scars, for example, head and side scarps, flow tracks, and soil and debris deposits below a scar. To assemble a database for assessing the surface area and number of landslides in the study area, a total of 541 landslides were mapped in an area of 293 km².

Identification and mapping of a suitable set of instability factors related to slope failures requires a priori knowledge of the main causes of landslides (Guzzetti et al. 1999). These instability factors include surface and bedrock lithology and structure, dip and strike of bedding, seismicity, slope and morphology, stream evolution, groundwater conditions, climate, vegetation cover, landuse, and human activity. The availability of thematic data varies widely, depending on the type, scale, and method of data acquisition. To apply the probabilistic method, a spatial database that considers landslide-related factors was designed and constructed. The underlying data are available in Malaysia as either paper or digital maps. The spatial database constructed is shown in Table 1.

Table 1 Data layer of study area

Eight factors were considered in calculating the probability. They were extracted from the constructed spatial database and transformed into a vector-type spatial database using the GIS. A digital elevation model (DEM) was created first from the topographic database: contours and survey base points were extracted from the 1:50,000-scale topographic maps, and the DEM was created with a resolution of 10 m. Using this DEM, the slope angle, slope aspect, and slope curvature were calculated. In addition, the distance from drainage was calculated using the topographic database, with the drainage buffer calculated at 100-m intervals. Using the geology database, the lithology was extracted and the distance from lineament was calculated, with the lineament buffer also calculated at 100-m intervals. Landuse data were classified from a Landsat Thematic Mapper (TM) image using an unsupervised classification method and field survey. The 11 classes identified, such as urban, water, forest, agricultural area, and barren area, were extracted for landuse mapping. Finally, the normalized difference vegetation index (NDVI) was obtained from SPOT HRV satellite images. The NDVI value was calculated using the formula NDVI = (IR − R)/(IR + R), where IR is the value in the infrared portion of the electromagnetic spectrum and R is the value in the red portion of the electromagnetic spectrum. The NDVI value denotes areas of vegetation in an image.
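The NDVI formula above can be illustrated with a minimal Python sketch. The function name `ndvi`, the handling of pixels where IR + R is zero, and the band values are assumptions made for illustration; they are not taken from the study's processing chain.

```python
def ndvi(ir, red):
    """Return NDVI = (IR - R) / (IR + R) for one pixel.

    Returns None where IR + R is zero, because the index is undefined there
    (this convention is an assumption, not part of the study).
    """
    total = ir + red
    if total == 0:
        return None
    return (ir - red) / total


# Hypothetical (IR, R) band values for three pixels; not data from the study.
pixels = [(200, 50), (80, 80), (30, 90)]
values = [ndvi(ir, red) for ir, red in pixels]
```

Pixels with a stronger infrared than red response give positive values, which is why higher NDVI is read as denser vegetation.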

The frequency ratio method and the relationship between landslides and factors

In general, to predict landslides, it is necessary to assume that landslide occurrence is determined by landslide-related factors, and that future landslides will occur under the same conditions as past landslides. On this basis, the relationship between areas where a landslide has occurred and landslide-related factors can be distinguished from the relationship between areas without past landslides and landslide-related factors. To represent this distinction quantitatively, the frequency ratio was used. The frequency ratio is the ratio of the area where landslides occurred to the total study area; it is also the ratio of the probability of landslide occurrence to that of non-occurrence for a given attribute. If the landslide-occurrence event is denoted by “B” and a given factor’s attribute is denoted by “D”, then the frequency ratio of D is the ratio of the conditional probabilities of B. Therefore, the further this ratio is above unity, the stronger the relationship between landslide occurrence and the given factor’s attribute; the further it is below unity, the weaker the relationship. To calculate the frequency ratio, a table was constructed for each landslide-related factor. Then, the area ratios for landslide occurrence and non-occurrence were calculated for each range or type of factor, and the ratio of the area of each range or type of factor to the total area was calculated. Finally, the frequency ratio for each range or type of factor was calculated by dividing the landslide-occurrence ratio by the area ratio.
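The final step of this calculation, dividing the landslide-occurrence ratio by the area ratio for each class of a factor, can be sketched in Python as follows. The function name `frequency_ratios`, the class labels, and the cell counts are hypothetical and serve only to show the arithmetic.

```python
def frequency_ratios(class_cells, landslide_cells):
    """Frequency ratio per class: landslide-occurrence ratio / area ratio.

    class_cells:     number of grid cells in each class of one factor
    landslide_cells: number of landslide cells in each class
    """
    total_cells = sum(class_cells.values())
    total_slides = sum(landslide_cells.values())
    ratios = {}
    for name, cells in class_cells.items():
        if cells == 0:
            continue  # a class with no area has no defined ratio
        area_ratio = cells / total_cells
        occurrence_ratio = landslide_cells.get(name, 0) / total_slides
        ratios[name] = occurrence_ratio / area_ratio
    return ratios


# Hypothetical counts for three slope classes; not values from Table 2.
cells = {"gentle": 5000, "moderate": 3000, "steep": 2000}
slides = {"gentle": 10, "moderate": 40, "steep": 50}
fr = frequency_ratios(cells, slides)
```

In this made-up example the gentle class holds 50% of the area but only 10% of the landslides, giving a ratio of 0.2 (below unity), whereas the steep class holds 20% of the area and 50% of the landslides, giving 2.5.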

The chosen factors (slope, aspect, curvature, distance from drainage, lithology, distance from lineament, landuse, and vegetation index) were evaluated using the frequency ratio method to determine the level of correlation between the locations of the landslides in the study area and these factors. Probabilistic approaches are based on the observed relationships between each factor and the distribution of landslides.

Table 2 shows the relationship between landslide occurrence and each factor. The topographic factors used were slope, aspect, curvature, and distance from drainage. For slope, below 5° the frequency ratio was 0.26 (<1), which indicates a very low probability of landslide occurrence. For slopes above 6°, the ratio was >1, which indicates a high probability of landslide occurrence. This means that the landslide probability increases with slope angle. As the slope angle increases, the shear stress in the soil or other unconsolidated material generally increases. Gentle slopes are expected to have a low frequency of landslides because of the generally lower shear stresses associated with low gradients. Steep natural slopes resulting from outcropping bedrock, however, may not be susceptible to shallow landslides. For aspect, landslides were most abundant on south-facing and northeast-facing slopes. Apart from flat areas, the frequency of landslides was lowest on east-facing, west-facing, and northwest-facing slopes. The curvature values represent the morphology of the topography: a positive curvature indicates that the surface is upwardly convex at that grid cell, a negative curvature indicates that the surface is upwardly concave at that grid cell, and a value of zero indicates that the surface is flat. For curvature, the more positive or negative the value, the higher the probability of landslide occurrence, and flat areas had a low frequency ratio of 0.60. The reason for this is that, following heavy rainfall, a convex or concave slope contains more water and retains this water for a longer period.

Table 2 Frequency ratio of factors to landslide occurrences

Analysis was carried out to assess the influence of drainage lines on landslide occurrence. For this purpose, the proximity to a drainage line was identified by buffering. In the case of the relationship between landslide occurrence and distance from drainage, as the distance from a drainage line increases, the landslide frequency generally decreases. At a distance of <400 m, the ratio was >1, indicating a high probability of landslide occurrence, and at distances >600 m, the ratio was 0, indicating zero probability. This can be attributed to the fact that terrain modification caused by gully erosion and undercutting may influence the initiation of landslides.

In the case of the relationship between landslide occurrence and lithology, the frequency ratio was higher in granite areas, at 1.25, and lower in alluvium areas, at 0.53. In the case of the relationship between landslide occurrence and distance from a lineament, the closer an area was to a lineament, the greater the probability of landslide occurrence. For distances to a lineament of <800 m, the ratio was >1, indicating a high probability of landslide occurrence, and for distances of >800 m, the ratio was <1, indicating a low probability of landslide occurrence. This means that the landslide probability decreases with increasing distance from a lineament. As the distance from a lineament decreases, the degree of fracturing of the rock increases, as does the degree of weathering.

Landslide-related factors such as landuse and the NDVI were extracted from Landsat TM and SPOT HRV satellite images. In the case of the relationship between landslide occurrence and landuse, frequency ratio values were higher in scrub, rubber plantation, and mixed areas, and lower in rice growing, swampy, coconut plantation, barren, and oil palm plantation areas. The reason for this is that landslides occurred mainly in inclined and mountainous areas. In the case of the relationship between landslide occurrence and NDVI, for NDVI values below 0.20 the frequency ratio was <1, which indicates a low landslide-occurrence probability, and for NDVI values above 0.20 the frequency ratio was >1, indicating a high landslide-occurrence probability. This result means that the landslide probability increases with the vegetation index value.

Landslide susceptibility analysis

The eight calculated and extracted factors were converted to a 10 m × 10 m grid (ARC/INFO GRID type). In the study area, the total number of grid cells was 2,928,378, and the number of landslide-occurrence cells was 541. Using GIS software, the grids were overlaid for the study area. Then, using univariate probability analysis and the frequency ratio method, the spatial relationship between the landslide locations and each landslide-related factor was analyzed. The rating of each factor’s type or range was assigned from this relationship, that is, from the frequency ratio of landslide-occurrence cells to all cells in that type or range, as shown in Table 2. These ratings were used in the overlay analysis: the ratings of all factors were summed to form the landslide susceptibility index (LSI) and the susceptibility map, using Eq. 1:

$$ \text{LSI} = \sum \text{Fr} \tag{1} $$

where Fr is the rating of each factor’s type or range.
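As a minimal sketch of Eq. 1 for a single grid cell, the Python snippet below looks up the rating of the class the cell falls in for each factor and sums the ratings. Only two factors are shown for brevity. The ratings 0.26 (slope below 5°), 1.25 (granite), and 0.53 (alluvium) are quoted from the text; the rating 1.40 for the "steep" class, the class labels, and the function name are hypothetical.

```python
# Ratings (frequency ratios) per factor class. The 1.40 value is hypothetical.
ratings = {
    "slope": {"below_5_deg": 0.26, "steep": 1.40},
    "lithology": {"granite": 1.25, "alluvium": 0.53},
}


def landslide_susceptibility_index(cell_classes, ratings):
    """Eq. 1: sum, over all factors, the rating of the class the cell falls in."""
    return sum(ratings[factor][cls] for factor, cls in cell_classes.items())


# Two hypothetical grid cells described by their class in each factor.
cell_a = {"slope": "below_5_deg", "lithology": "alluvium"}
cell_b = {"slope": "steep", "lithology": "granite"}

lsi_a = landslide_susceptibility_index(cell_a, ratings)
lsi_b = landslide_susceptibility_index(cell_b, ratings)
```

Under these assumptions the gentle alluvium cell scores 0.79 and the steep granite cell scores 2.65, so the second cell ranks as more susceptible.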

When all factors were used in Eq. 1, the LSI had a minimum value of 2.26 and a maximum value of 15.90, with an average value of 8.14 and a standard deviation of 2.40. The other cases are shown in Table 3. The distribution of the LSI for the case using all factors is shown in Fig. 2 as a landslide susceptibility map. The LSI values were classified using equal areas, and grouped into five classes.
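One way to read "classified using equal areas" is a rank-based split in which each of the five classes holds the same number of grid cells. The Python sketch below follows that reading; the exact scheme used in the study is not stated, so the rule, the function name, and the sample values (apart from 2.26, 8.14, and 15.90, which echo the statistics quoted above) are assumptions.

```python
def equal_area_classes(values, n_classes=5):
    """Assign each value a class 1..n_classes with (nearly) equal cell counts.

    Class 1 holds the lowest values and class n_classes the highest.
    Ties are broken by input order, which is an arbitrary choice.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values) + 1
    return classes


# Ten hypothetical LSI values for ten grid cells.
lsi = [2.26, 15.90, 8.14, 5.0, 9.5, 7.2, 11.3, 6.1, 10.4, 12.8]
classes = equal_area_classes(lsi)
```

With ten cells and five classes, each class receives exactly two cells, so every class covers the same share of the mapped area.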

Table 3 Statistics of LSI values for all cases

Fig. 2 Landslide susceptibility map based on frequency ratio, using all factors

Verification and effect analysis

Effect analysis shows how a solution changes when the input factors are changed. If a selected factor causes a relatively large change in the outcome, the outcome is said to be sensitive to that factor. Effect analysis quantifies the uncertainty of each factor. The factors that have the greatest impact on the calculated landslide susceptibility map can therefore be identified using effect analysis.

In this work, the effect analysis was conducted by excluding each factor in turn during the summation stage of Eq. 1 and evaluating the effect of each exclusion. All eight factors were used, so LSI values were calculated for nine cases, including the case with all factors. Each resulting susceptibility map was verified against the existing landslide locations. For the verification, the method was tested to determine whether its predictions matched the results expected from knowledge of the factors; i.e., the model was subjected to various selections of factors, and the outputs were compared with the expected changes in the outputs. Rate curves were created for this purpose. To obtain the rate curves, the calculated landslide susceptibility index values of all grid cells in the study area were sorted in descending order. The ordered cell values were then divided into 100 classes at cumulative 1% intervals. The rate curves show how well the method and factors predict landslides. To compare the results, the area below the curve was calculated for each case. A total area of 1 denotes perfect prediction accuracy. The area below a curve can be used to assess the prediction accuracy qualitatively. The rate verification results appear as lines in Fig. 3 and as values in Table 4. For example, in the case where slope was excluded, the 10% of the study area with the highest landslide susceptibility index values contained 23% of all the landslides, and the 30% of the study area with the highest index values contained 55% of the landslides.
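The rate-curve procedure described above can be sketched in Python as follows: cells are sorted by LSI in descending order, the cumulative share of landslide cells is read off at equal steps of area, and the area below the curve is computed. The trapezoidal rule, the function names, and the toy data are assumptions; the study does not state how the area was integrated.

```python
def rate_curve(lsi, is_landslide, n_classes=100):
    """Cumulative share of landslide cells captured in descending LSI order.

    Returns n_classes + 1 points; point k is the share of landslides found in
    the top k/n_classes of the study area (point 0 is 0.0).
    """
    order = sorted(range(len(lsi)), key=lambda i: lsi[i], reverse=True)
    prefix = [0]
    for i in order:
        prefix.append(prefix[-1] + is_landslide[i])
    n, total = len(order), prefix[-1]
    return [prefix[k * n // n_classes] / total for k in range(n_classes + 1)]


def area_below_curve(curve):
    """Trapezoidal area under the rate curve, with both axes scaled to 0..1."""
    step = 1.0 / (len(curve) - 1)
    return sum((a + b) / 2 * step for a, b in zip(curve, curve[1:]))


# Toy example: ten cells, three of them landslide cells (1 = landslide).
lsi = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
slides = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = rate_curve(lsi, slides, n_classes=10)
area = area_below_curve(curve)
```

In this toy case the top 40% of cells already contain all three landslide cells, so the curve rises steeply and the area is high; a model whose ranking carried no information would be expected to give an area near 0.5.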

Fig. 3 Cumulative frequency diagram showing landslide susceptibility index rank (x-axis) occurring in cumulative percent of landslide occurrence (y-axis)

Table 4 Area below the curve

The verification of the landslide susceptibility maps by effect analysis (Table 4) shows that, in decreasing order of influence, land cover, aspect, slope, distance from lineament, distance from drainage, NDVI, and curvature (areas below the curve of 0.653, 0.679, 0.681, 0.683, 0.683, 0.686, and 0.690, respectively, when each factor is excluded) have a positive effect on the landslide susceptibility map produced using all the factors (area below the curve = 0.718). This is because the lower the area below the curve when a factor is excluded, the greater the contribution of that factor to the landslide susceptibility map. In contrast, excluding lithology raised the area below the curve slightly, to 0.721, so lithology had a very small negative influence on the map produced using all factors. The reason is not that lithology is unimportant, but that the lithology data for the study area are too regional and simple.
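The comparison can be made explicit by subtracting the all-factor area from the area obtained when each factor is excluded. The Python sketch below uses only the areas quoted in the text; the variable names and the sign convention (negative change means the factor improves the map) are choices made for illustration.

```python
AREA_ALL_FACTORS = 0.718

# Area below the curve when the named factor is excluded (values from the text).
area_without = {
    "land cover": 0.653,
    "aspect": 0.679,
    "slope": 0.681,
    "distance from lineament": 0.683,
    "distance from drainage": 0.683,
    "NDVI": 0.686,
    "curvature": 0.690,
    "lithology": 0.721,
}

# A negative change means accuracy drops without the factor, so the factor helps.
change = {
    name: round(area - AREA_ALL_FACTORS, 3) for name, area in area_without.items()
}
positive_effect = [name for name, delta in change.items() if delta < 0]
negative_effect = [name for name, delta in change.items() if delta > 0]
```

Excluding land cover lowers the area by 0.065, the largest drop, while excluding lithology raises it by 0.003, which is why lithology is the only factor listed as having a (very small) negative effect.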

Conclusions and discussion

Landslides are among the most hazardous of natural disasters. Governments and research institutions worldwide have attempted for years to assess landslide hazards and risks, and to show their spatial distribution. Landslide susceptibility maps have been constructed using the relationship between landslides and causal factors. In this study, a probabilistic approach to estimating areas susceptible to landslides using GIS and remote sensing is presented. Moreover, using effect analysis, the influence of factors on the landslide susceptibility map can be assessed qualitatively, and the selection of factors with a positive effect can improve the prediction accuracy of the landslide susceptibility map. This means that the selection of factors is important to landslide susceptibility mapping. The ratio value from the effect analysis can be used to weight the relative importance of these factors, and can improve the prediction accuracy of the landslide susceptibility map.

In this study, only susceptibility analysis was performed, because the small area studied did not allow the spatial distribution of rainfall to be determined. However, if data on factors that trigger landslides, such as rainfall, earthquakes, or slope cutting, exist, then possibility analysis can also be carried out. If the factors relevant to the vulnerability of buildings and other property are available, then risk analysis can also be carried out. These results can be used as basic data to assist slope management and land-use planning. The methods used in this study are also valid for generalized planning and assessment purposes, although they may be less useful at the site-specific scale, where local geological and geographic heterogeneities may prevail. For the method to be more generally applied, more landslide data are needed.