1 Introduction

The Yukon Alaska Highway Corridor (YAHC; Fig. 1a) is exposed to several geohazards, including permafrost degradation, flash floods, debris flows, and other types of landslides. Providing geological and geohazard information to decision-makers and stakeholders is a primary role for the Geological Survey of Canada (GSC) at Natural Resources Canada. Over the last decade, the GSC partnered with the Yukon Geological Survey, Simon Fraser University, and consultants to provide baseline geoscience information for this well-travelled transportation corridor.

Fig. 1

a Physiographic setting of southern Yukon (modified from Mathews 1986; Huscroft et al. 2004; Blais-Stevens and Behnia 2016), b northern portion of YAHC where regional landslide susceptibility mapping was carried out (Blais-Stevens and Behnia 2016) and c present study area which is a smaller portion of the YAHC where random forest classification was applied for landslide susceptibility assessment

Following a landslide inventory compilation displaying distribution maps along the YAHC (Blais-Stevens et al. 2010a) and regional qualitative landslide susceptibility modelling (Blais-Stevens et al. 2010b, 2012a; Couture et al. 2010), we tested the quantitative Flow-R method and a qualitative heuristic method for a smaller area near south Kluane Lake (Blais-Stevens and Behnia 2016) that had witnessed several debris flow events in the recent past (Evans and Clague 1989; Koch et al. 2014). In the present study, with higher-resolution topographic information, we aim to test the data-driven quantitative method called random forest. The study area is located north of Kluane Lake (Fig. 1c) within the Donjek River area, where three types of landslides are abundant: debris flows (DF), rock slides (RS), and active layer detachment slides (ALD). The latter are triggered in permafrost terrain. In comparison with our previous regional studies, which covered more than 25,000 km² (Fig. 1a), this area covers a much smaller portion of 1800 km² (Fig. 1c).

2 Physiographic setting and geology

The study area lies within the St. Elias Mountains and Yukon plateau (Fig. 1a; Mathews 1986; Huscroft et al. 2004). It is also crossed by an active fault called the Denali fault (Fig. 1a; Seitz et al. 2009). A detailed description of the physiographic regions shown in Fig. 1a is offered in Mathews (1986) and Huscroft et al. (2004).

Within the Donjek River area, the bedrock consists of a variety of sedimentary, volcanic, plutonic and metamorphic rocks from the Kluane Ranges within the St. Elias Mountains (< 150 Ma old; Gordey and Makepeace 2003; Huscroft et al. 2004). Furthermore, the rocks from the Yukon plateau are mainly composed of felsic intrusions (Gordey and Makepeace 2003; Huscroft et al. 2004). The surficial sediments are Pleistocene to Holocene in age. These are composed of till blankets and veneer, and ice marginal coarse-to-fine sediments. Steep slopes contain an abundance of colluvium and shallow slopes, coarse-to-fine-grained alluvium (Huscroft et al. 2004). The lower-lying areas are marked with several ponds resulting from thermokarstic erosion.

3 Climate and permafrost

Winters are long and cold and summers warm and short with mean temperatures ranging from − 27 °C in January to 27 °C in July in the YAHC. The area is characterized as subarctic to continental with a low mean precipitation of 340 mm per year (Huscroft et al. 2004).

The YAHC (Fig. 1a) lies within the transition zone from sporadic discontinuous permafrost to extensive discontinuous permafrost (Heginbottom et al. 1995). Moreover, geotechnical investigations by Foothills Pipelines (1979) indicate that permafrost is nearly continuous north of Kluane Lake, including the Donjek River study area (Fig. 1c). Furthermore, modelling studies by Bonnaventure et al. (2012) predict continuous permafrost in this area with greater probability of occurrence in the mountains. Thicker (> 20 m thick) and colder (approximately − 3 °C; Smith et al. 2015; Calmels et al. 2015) as well as ice-rich permafrost is more prevalent in the study area and further north-west to the Alaska border (Fig. 1a; Rampton et al. 1983; Huscroft et al. 2004).

4 Landslide inventory

For the entire YAHC (Blais-Stevens et al. 2010a, 2012a), we compiled the landslide inventory using Canadian National Air-Photos from the 1970s to mid-1990s and WorldView-2 high-resolution satellite imagery from 2010. Of the 1600 landslides compiled for the entire YAHC (Fig. 1a; Blais-Stevens et al. 2014), a total of 368 landslide deposits (83 debris flow deposits, 104 rock slides, and 181 active layer detachment slides) occur in this smaller study area (Fig. 1c). The debris flow deposits are located at the mouth of steep rivers or streams (Fig. 2a). Most deposits have been building into large fans through recurring events since deglaciation at the foot of the Kluane Ranges (Koch et al. 2014). Moreover, smaller historical events have been documented to have blocked the highway during intense storm events (Evans and Clague 1989; Lipovsky 2005; Blais-Stevens and Behnia 2016). Figure 2a displays an oblique Google Earth image of a debris flow deposit located just north of Donjek River (location 1 in Fig. 1c).

Fig. 2

a Oblique Google Earth image of a debris flow deposit at the base of a mountain stream (yellow stippled line). It is located south of Donjek River labelled as location 1 in Fig. 1c. b Active layer detachment slides located north of Donjek River (Fig. 1c, location 2). Photograph taken looking west. Note these occur on steep, flat slopes at or near the tree line. c An oblique Google Earth image looking west which displays a rock slide (location 3 in Fig. 1c)

Active layer detachment slides are more than twice as abundant as debris flow deposits in this study area (181 versus 83), but much smaller in volume. They are often triggered on steep, flat, south-facing slopes, where the permafrost’s active layer has had time to thaw during the summer months, producing high pore water pressure. Many were also triggered with their head scarp close to the tree line (Fig. 2b). Unlike the debris flow deposits and the rock slides, ALD have a shorter lifespan (see location 2 in Fig. 1c). We compared the same area with a high-resolution satellite image (WorldView-2) from 2010 and an air-photograph taken in 1995. Figure 3a, b compares the two images, showing ALD that were completely revegetated, partially revegetated, or absent in 1995 but present by 2010.

Fig. 3

Active layer detachment slides shown on an air-photograph (1995; a) and WorldView-2 satellite image (2010; b). Note the two more recent events on top-right (A) are present in the satellite image, but absent in the air-photograph. The three ALD in the middle (B) display a fresh surface in 1995, but were partially revegetated by 2010. The two ALD in bottom left (C) have been completely revegetated by 2010

The rock slides can be quite large. Most have been triggered in outcrops of Carboniferous rocks consisting of tuff breccia, argillite, agglomerate, and basaltic and andesitic flows (Yukon Geological Survey 2014). Figure 2c represents an oblique Google Earth image of a rock slide triggered in Carboniferous bedrock (location 3 in Fig. 1c).

5 Input data

The criteria considered relevant to triggering landslides in the study area include slope angle, slope aspect, plan curvature, profile curvature, wetness index, proximity to drainage system, surficial geology, vegetation distribution, bedrock geology, proximity to major faults, and permafrost distribution. The first six variables were extracted from a high-resolution DEM (5 × 5 m generated by Natural Resources Canada’s Canadian Centre for Geospatial Information, 2014, using Canvec products and Topo to Raster tool in ArcGIS). All the data were used as grids with 5-m cell size.

The slope angle is considered one of the most important factors in initiating these types of landslides. We also considered slope aspect as a potential factor because south-facing slopes are more exposed to solar radiation. This, in turn, could contribute to either increased snow melt or permafrost thaw and subsequently to the drainage system. Plan curvature was also included in the model as it reflects areas where debris can accumulate. Thus, we assume that the more concave the plan curvature, the higher the probability of debris flow. We also included the profile curvature to investigate its potential role in influencing the three landslide types. The topographic wetness index (TWI), also called compound topographic index (CTI), is commonly used to quantify topographic control on hydrological processes (Sørensen et al. 2006). It reflects the soil moisture and is defined as ln(a/tan b), where a is the upstream contributing area and b is the local slope angle in radians (Sørensen et al. 2006). TWI was calculated in ArcGIS using the topography tool for ArcGIS (Dilts 2015). Furthermore, the drainage system was considered important as debris flows are triggered in steep streams. This data layer was generated by creating four buffer zones, each 50 m wide, around the drainage system, and was used only in debris flow modelling. Sediment type was extracted from the 1:100,000 surficial geology map (Yukon Geological Survey 2014). One important factor in triggering a rock slide is the type of bedrock. In addition to the lithologic properties of the bedrock, the presence of discontinuities, such as joints, fractures, bedding planes, lithologic boundaries, and schistosity, can affect the deformability and strength of bedrock (Hencher 1987; Coe and Harp 2007). However, for this study, detailed structural characteristics were not available, only major fault distribution.
Thus, the bedrock units and the major faults were extracted from the 1:250,000 bedrock geology map (Gordey and Makepeace 2003). To represent structural control, five buffer zones, each 50 m wide, were generated around the major fault structures based on the assumption that the zones proximal to major faults have experienced deformation. In addition, the permafrost data were extracted from the permafrost distribution probability map (Bonnaventure et al. 2012). The normalized difference vegetation index (NDVI), determined from LANDSAT-8 imagery, was used to characterize the vegetation distribution. Here, we assumed that the vegetation distribution, or lack thereof, is relatively similar in areas where landslides recur, for example in a debris flow setting. For this reason, we included the NDVI parameter for debris flows. For the other two types of landslides, we also included this factor to investigate its potential contribution. Four ortho-rectified LANDSAT-8 scenes (acquired on 2013-08-11 and 2013-08-13) were downloaded from the USGS Global Visualization Viewer (http://glovis.usgs.gov). Radiometric calibration was first applied to convert the digital number (DN) values to reflectance. NDVI has a dynamic range from −1 to +1 (Chuvieco and Huete 2009) and is defined as:

$${\text{NDVI}} = ({\text{NIR }} - {\text{RED}})/({\text{NIR}} + {\text{RED}})$$
(1)

where NIR and RED are the reflectances in the near-infrared and red bands, respectively. NDVI can provide information on the amount and type of vegetation. Barren rocks and snow have very low NDVI values (0.1 or less), and sparse vegetation such as shrubs and grasslands shows moderate NDVI values (0.2–0.5). High NDVI values (0.6–0.9) indicate a high density of green vegetation.
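As a minimal sketch of Eq. 1, assuming the red and near-infrared bands have already been converted to reflectance, NDVI can be computed per pixel as shown below. The function names, the handling of a zero denominator, and the label strings are illustrative assumptions and are not part of the processing chain used in this study; values falling between the ranges quoted above are left unlabelled.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index (Eq. 1) for one pixel."""
    denominator = nir + red
    if denominator == 0:
        # Assumption: treat a pixel with zero reflectance in both bands as no data.
        return None
    return (nir - red) / denominator


def describe_ndvi(value):
    """Map an NDVI value to the qualitative ranges quoted in the text."""
    if value <= 0.1:
        return "barren rock or snow"
    if 0.2 <= value <= 0.5:
        return "sparse vegetation"
    if 0.6 <= value <= 0.9:
        return "dense green vegetation"
    return "unlabelled"  # gaps between the quoted ranges


# Hypothetical reflectance values, for illustration only.
value = ndvi(0.45, 0.05)          # (0.45 - 0.05) / (0.45 + 0.05), about 0.8
print(describe_ndvi(value))       # dense green vegetation
```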

The bedrock geology (13 classes) and surficial geology units (18 classes) were not reclassified based on their types or importance; instead, the classes were used as they were mapped in the bedrock geology (Gordey and Makepeace 2003) or surficial geology (Yukon Geological Survey 2014) maps. The capability of random forest to work with different types of data enabled us to use the continuous data (i.e. slope angle, slope aspect, plan and profile curvatures, wetness index, and NDVI) in their original format rather than converted to discrete data. Categorizing continuous data may cause problems such as loss of information, power, and efficiency of data.
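Two of the derived layers described earlier in this section, the wetness index ln(a/tan b) and the 50-m proximity zones, can be sketched as follows. This is an illustration only: the layers in this study were produced in ArcGIS, and the minimum-slope guard, the zone numbering, and the treatment of zone boundaries are assumptions introduced here.

```python
import math


def wetness_index(contributing_area, slope_rad, min_slope_rad=1e-3):
    """Topographic wetness index ln(a / tan b) for a single cell.

    min_slope_rad is an assumed guard that avoids division by zero on
    flat cells; it is not taken from the paper.
    """
    slope = max(slope_rad, min_slope_rad)
    return math.log(contributing_area / math.tan(slope))


def buffer_zone(distance_m, width_m=50.0, n_zones=4):
    """Return the 1-based buffer zone of a cell, or 0 beyond the last zone.

    Assumption: zone k covers distances in [(k - 1) * width, k * width).
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    zone = int(distance_m // width_m) + 1
    return zone if zone <= n_zones else 0


# Four 50-m zones around streams, five around major faults (as in the text).
print(buffer_zone(120.0))              # 3
print(buffer_zone(220.0, n_zones=5))   # 5
print(buffer_zone(250.0))              # 0 (outside the four stream zones)
```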

6 Random forest classification (RFC)

Random forest, originally developed by Breiman et al. (1984, 2001), is an ensemble of individual decision trees (Jakimow et al. 2015). A decision tree is composed of many nodes and edges, organized in a hierarchical structure and is used for decision-making (Criminisi et al. 2011). Each node is associated with a test, and the leaves contain the answers or decisions (Criminisi et al. 2011). Similar to other supervised classifiers, the tree structures are learned using the training data (e.g. Harris et al. 2015). Random forest, instead of individual decision trees, uses an ensemble of randomly trained decision trees. This is based on the assumption that the individual classifiers yield lower accuracy, but, when combined together, provide superior accuracy and generalization (e.g. Breiman 2001; Polikar 2006; Criminisi et al. 2011). Decision forests are mostly used for classification purposes, but they can be used in many machine learning problems such as regression, density estimation, manifold learning, and semisupervised learning (Criminisi et al. 2011).

The operator defines the number of trees in RFC. At each node in a tree, only a subset of the input variables is randomly selected; the subset size is usually the square root or the binary logarithm (log base 2) of the total number of variables (Jakimow et al. 2015; Harris et al. 2015; McKay and Harris 2015). The parent (or root) node is split into purer child nodes (right and left) by searching for the split that maximizes the purity of the resulting nodes. An impurity measure, such as the Gini index or Shannon entropy (Louppe et al. 2013), is often calculated to determine the purity of child nodes compared to their parent nodes. Random selection of a fraction of the total variables decreases the correlation between the trees and also reduces the computational load of the algorithm (Gislason et al. 2006). The trees in the forest are grown (not pruned) to their maximum size (Breiman 2001), which further reduces the computational load and enables the random forest algorithm to handle high dimensions of data and use a large number of trees in the ensemble (Gislason et al. 2006).
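The node-splitting step can be illustrated with a minimal sketch, assuming the Gini index as the impurity measure and a single continuous variable. The function names are hypothetical, the exhaustive threshold search is a simplification of what a production implementation does, and the rounding of the subset size is an assumption.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one variable that minimizes the weighted
    Gini impurity of the two child nodes.

    Returns (threshold, weighted_impurity), or (None, parent impurity)
    if no split reduces the impurity.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


def subset_size(n_variables, rule="log2"):
    """Number of variables tried at each node (rounding is an assumption)."""
    if rule == "sqrt":
        return max(1, round(math.sqrt(n_variables)))
    return max(1, round(math.log2(n_variables)))


# Hypothetical slope angles (degrees) and labels, for illustration only.
print(best_split([10, 20, 30, 40], ["stable", "stable", "slide", "slide"]))
# (25.0, 0.0): both child nodes are pure
```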

Each predictor in the forest is trained through a bagging process (Breiman 1996, 2001) based on bootstrapping (Harris et al. 2015; McKay and Harris 2015). For each tree, about 63% of the original training data (referred to as “in-bag”) are randomly selected and used in the classification (Jakimow et al. 2015; Harris et al. 2015). The remaining training data, referred to as “out-of-bag”, are used to validate the accuracy of the classification (out-of-bag error). The random sampling is repeated for all individual trees in the forest. Thus, each tree uses a new set of in-bag data, resulting in an independent prediction. Multiple decision trees provide an ensemble of predictions, which vote for the most popular (majority) class (Harris et al. 2015). For every tree in the ensemble, out-of-bag data are used for prediction and the results over all trees are used to compute the error rate or out-of-bag error. As the number of trees increases, the out-of-bag error decreases and converges to a threshold (Jakimow et al. 2015).
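The figure of about 63% follows from drawing n samples with replacement: the expected fraction of distinct samples drawn is 1 − (1 − 1/n)^n, which approaches 1 − 1/e ≈ 0.632 for large n. The short sketch below illustrates this; the function names and the seed are arbitrary choices made for the example.

```python
import random


def expected_in_bag_fraction(n):
    """Expected share of distinct samples in a bootstrap draw of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag) sets."""
    in_bag = {rng.randrange(n) for _ in range(n)}
    out_of_bag = set(range(n)) - in_bag
    return in_bag, out_of_bag


rng = random.Random(0)  # arbitrary seed, for repeatability only
in_bag, out_of_bag = bootstrap_split(1000, rng)
print(len(in_bag) / 1000)               # close to 0.63
print(expected_in_bag_fraction(1000))   # about 0.632
```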

A probability of membership to each class (ranging between 0 and 1) is also generated (Jakimow et al. 2015; Harris et al. 2015). The probability value of an object for class A is 1 if it is classified as class A in all of the iterations and 0 if it is not class A in any of the iterations. RFC also calculates the importance (predictive power) of each variable used in the classification by producing a Mean Decrease Impurity importance (also called Mean Decrease Gini or Gini importance if the Gini index is used as the impurity function) or by measuring a Mean Decrease Accuracy (also called Permutation Importance) where the values of a variable are randomly permuted in the out-of-bag samples (Louppe et al. 2013). One of the advantages of RFC is that it can be applied on binary, categorical, or continuous data (Harris et al. 2015). The random forest method can generate good results with noisy data (Pal 2003; Jakimow et al. 2015) and does not overfit as more trees are added (Breiman 2001; Gislason et al. 2006).
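Read this way, the membership probability of a pixel is the share of trees that vote for a class. The sketch below assumes exactly that interpretation; it is not taken from the imageRF source code, and the function names and votes are hypothetical.

```python
from collections import Counter


def class_probabilities(votes):
    """Share of trees voting for each class, given one label per tree."""
    total = len(votes)
    return {label: count / total for label, count in Counter(votes).items()}


def majority_class(votes):
    """Most popular class among the tree votes (ties: first encountered)."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical votes of five trees for a single pixel.
votes = ["DF", "non-DF", "DF", "DF", "non-DF"]
print(class_probabilities(votes))  # {'DF': 0.6, 'non-DF': 0.4}
print(majority_class(votes))       # DF
```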

The random forest method is relatively new in landslide susceptibility mapping. Catani et al. (2013) used this method for landslide susceptibility modelling focusing on the sensitivity and scaling issues. Micheletti et al. (2014) applied random forest for landslide susceptibility mapping in Switzerland. Youssef et al. (2016) used this method for landslide susceptibility mapping at Wadi Tayyah Basin in Saudi Arabia. In this study, we used imageRF (Waske et al. 2012; Jakimow et al. 2015), which is an add-on in the free EnMAP-Box and may be used in the IDL/ENVI software, to apply random forest classification and generate maps showing the probability (susceptibility) of debris flows, rock slides, and active layer detachment slides in the Donjek River area. ImageRF, developed by Jakimow et al. (2015), implements the machine learning approach of random forest™ (Breiman and Cutler 2011) and is mainly used for supervised classification and regression analysis of image data.

6.1 Application to Donjek River area

RFC uses a supervised learning algorithm and needs the classes to be defined before the classification. Each landslide type was modelled separately, resulting in three susceptibility maps. Training areas for the three landslide types were selected from the inventory maps. In order to reduce the spatial autocorrelation of the landslide samples and treat small and large landslides equally (Petschko et al. 2014), each landslide was represented by a point instead of its whole area. The rock slides and active layer detachment slides include the initiation zone in the mapped landslide, so for these two types, a random point was selected within the main scarp area of each landslide. For the debris flow deposits (DF), given that the inventory only includes the deposits and not the initiation zones, the source areas were defined by tracing a polygon, extending 500 m uphill from the apex of the deposit, that included the potential source areas (see Blais-Stevens and Behnia 2016 for details). There are potentially other source areas further uphill, but our assumption is based on what we consider a realistic minimum limit for most of the steep streams. To define the potential source area (potential initiation zone), we extracted the stream network from the flow accumulation map and created the catchment basins upstream from the apex of each debris flow deposit. A 100-m buffer zone was also created around the streams. The areas covered by both catchment basins and streams within 500 m upstream from the debris flow deposits were selected as potential source areas (Blais-Stevens and Behnia 2016). A point was then randomly selected within each resulting source-area polygon.

A total of 83 debris flow deposits, 181 active layer detachment slides, and 104 rock slides were available as training areas. Although in the random forest classification each decision tree randomly selects about 2/3 of the training data (in-bag data) for learning and keeps the remaining 1/3 (out-of-bag data) to calculate the out-of-bag error, we divided the training data into two parts and used only 2/3 of the data for classification. This was carried out by randomly selecting points in ArcGIS. The remaining 1/3 was not introduced into RFC and was only used after classification as an independent checking data set to calculate the classification and prediction accuracy of the susceptibility maps. Therefore, only 56 DF, 68 RS, and 119 ALD samples were used as training data and the remaining 27 DF, 36 RS, and 62 ALD samples were kept as independent checking data sets. In addition to the landslide sites, RFC requires another training set to represent the landslide-free zones, i.e. non-DF, non-ALD, and non-RS samples. Selecting the landslide-free zones is somewhat challenging as stable zones cannot be defined with complete certainty. Assuming that the landslide inventory is complete in the study area, we created a 200-m buffer zone around the landslides and used it as a mask to select non-landslide samples. A total of 362 points were selected randomly in the study area outside of the landslide mask. To parametrize the random forest classifier, we selected three sets of training data for each type of landslide. We set the ratio of landslide samples to non-landslide samples to 1:1, 1:1.5, and 1:2, respectively (cf. Van Den Eeckhaut et al. 2009, who used stable zones five times the number of landslides in the Flemish Ardennes, Belgium), and repeated the classification for each data set to examine whether different sample ratios affect the RFC results. Only the results of two sample ratios, i.e. 1:1 and 1:2, are presented here (Table 1).
Using sample ratios other than 1:1 may create imbalanced data sets, which results in lower accuracy for the minority class (He and Garcia 2009). However, since landslides occur less frequently than non-landslides, we tested different ratios, i.e. 1:1.5 and 1:2, to find out whether there were differences in the classification. The RFC parameters used in this study include n (the number of variables tried at each node), which was set to the binary logarithm of the total number of variables, and m (the number of trees), which was set to 1000–1500. We used the Gini index as the impurity function to calculate the impurity at each node and set the minimum number of samples in a node to 1 and the minimum impurity to 0 as the stopping criteria to allow full growth of the decision tree. The classification accuracy generally increases with the number of trees (Rodriguez-Galiano et al. 2014; Jakimow et al. 2015; Harris et al. 2015) until the out-of-bag error converges; beyond that number of trees, the change in the classification accuracy is minimal.
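The sampling design and parameter choices described above can be summarized in a short sketch. The hold-out split in this study was made in ArcGIS, so the helper functions, the seed, and the dictionary keys below are hypothetical and do not correspond to actual imageRF option names; whether each ratio was applied to the full inventory or to the training subset is not assumed here, so the helper simply takes a count as its argument.

```python
import math
import random


def holdout_split(samples, n_train, seed=0):
    """Randomly keep n_train samples for training and the rest for checking."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def non_landslide_count(n_landslides, ratio):
    """Non-landslide samples for a landslide:non-landslide ratio of 1:ratio
    (rounding to the nearest integer is an assumption)."""
    return round(n_landslides * ratio)


# Inventory sizes and training counts reported in the text.
inventory = {"DF": 83, "RS": 104, "ALD": 181}
n_train = {"DF": 56, "RS": 68, "ALD": 119}

for kind, total in inventory.items():
    train, check = holdout_split(range(total), n_train[kind])
    print(kind, len(train), len(check))

# Settings described in the text, collected in an illustrative dictionary.
n_variables = 8  # e.g. the eight variables listed for the DF model
settings = {
    "variables_per_node": int(math.log2(n_variables)),  # 3 for eight variables
    "n_trees": 1000,             # the text reports 1000-1500
    "impurity": "gini",
    "min_samples_per_node": 1,   # stopping criteria that allow full growth
    "min_impurity": 0.0,
}
```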

Table 1 Training and test data sets for three types of landslides used in random forest classification

The variable importance was calculated as follows (Jakimow et al. 2015): for each tree, the original out-of-bag data were used to calculate the accuracy. Then, the values of a given variable were randomly permuted in the out-of-bag samples and the accuracy was recalculated. The accuracy for the permuted out-of-bag data was subtracted from the accuracy for the original out-of-bag data (Mean Decrease Accuracy; Louppe et al. 2013). Averaging these differences over all trees provided the raw importance (predictive power) of the variable of interest.
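The procedure can be sketched for a single tree as follows. This is a simplified illustration with hypothetical function names: the "tree" is a hand-written stand-in rather than a trained classifier, and the sample values are invented for the example.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, oob_rows, oob_labels, column, rng):
    """Accuracy drop for one tree after shuffling one variable in its
    out-of-bag samples (original accuracy minus permuted accuracy)."""
    baseline = accuracy(predict, oob_rows, oob_labels)
    shuffled_values = [row[column] for row in oob_rows]
    rng.shuffle(shuffled_values)
    permuted_rows = [
        row[:column] + [value] + row[column + 1:]
        for row, value in zip(oob_rows, shuffled_values)
    ]
    return baseline - accuracy(predict, permuted_rows, oob_labels)


def raw_importance(per_tree_drops):
    """Average of the per-tree accuracy drops."""
    return sum(per_tree_drops) / len(per_tree_drops)


# Toy stand-in for one trained tree and its out-of-bag samples
# (hypothetical values: column 0 = slope angle, column 1 = unused variable).
def toy_tree(row):
    return "slide" if row[0] > 30 else "stable"


oob_rows = [[35, 0.2], [10, 0.9], [40, 0.5], [5, 0.1]]
oob_labels = ["slide", "stable", "slide", "stable"]
rng = random.Random(42)  # arbitrary seed
print(permutation_importance(toy_tree, oob_rows, oob_labels, 1, rng))
# 0.0, because the toy tree ignores column 1
```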

In addition to the classification map, the probability of membership (confidence) to each class, i.e. each landslide type, was also generated by RFC, which shows the probability value for each pixel on the map. The classification map was used to calculate the accuracy of RF classification and area per cent classified as landslide, but the probability map was used as the susceptibility map for each landslide type.

6.1.1 Debris flow (DF)

The variables selected for DF random forest modelling include slope angle, aspect angle, proximity to drainage system, plan curvature, wetness index, surficial material, NDVI, and profile curvature.

The RF classification was repeated for the two sets of DF and non-DF samples explained in Sect. 6.1 and shown in Table 1. In both experiments, the out-of-bag error stabilized at 900–1100 trees. Figure 4a shows the learning curve (out-of-bag F1 accuracy) when the sample ratio is 1:1. F1 accuracy is the arithmetic mean of class-wise F1 measures (Jakimow et al. 2015). The F1 measure is the harmonic mean of user accuracy (UA) and producer accuracy (PA) and is given for class i by:

$$F1_{i} = 2\,{\text{UA}}_{i}\,{\text{PA}}_{i} /({\text{UA}}_{i} + {\text{PA}}_{i})$$

The F1 accuracies for DF and non-DF classes in the classification map are 75 and 76.67%, respectively. Increasing the number of non-DF samples (i.e. ratios 1:1.5 and 1:2) increases the accuracy for the non-DF class, but decreases the accuracy for the DF class (Table 2, Fig. 4b). The overall accuracy for the 1:1 and 1:2 sample ratio models is 75.84 and 77.12%, respectively. The first model classifies 29.92% of the whole area as debris flow compared to 21.14% for the second model (Table 2). Figure 5a, b displays the debris flow susceptibility maps created using the sampling ratios of 1:1 and 1:2, respectively.
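The two accuracy measures can be checked with a few lines of code. The sketch below uses hypothetical function names and the class-wise F1 values quoted above for the 1:1 model; the convention of returning 0 when both accuracies are 0 is an assumption.

```python
def f1_measure(user_accuracy, producer_accuracy):
    """Harmonic mean of user and producer accuracy for one class."""
    total = user_accuracy + producer_accuracy
    if total == 0:
        return 0.0  # assumption: define F1 as 0 when both accuracies are 0
    return 2.0 * user_accuracy * producer_accuracy / total


def f1_accuracy(class_f1_values):
    """Arithmetic mean of the class-wise F1 measures."""
    return sum(class_f1_values) / len(class_f1_values)


# Class-wise F1 values reported for the 1:1 DF model (in per cent).
print(f1_accuracy([75.0, 76.67]))  # about 75.84, the overall accuracy quoted
```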

Fig. 4

Out-of-bag accuracies for a sampling ratio (DF:non-DF) of 1:1, where the learning curve stabilized at 1100 iterations, and b sampling ratio of 1:2, where the learning curve stabilized at 900 iterations

Table 2 Out-of-bag (oob) accuracy for RF classification and class area percentage of DF classification maps
Fig. 5

Debris flow susceptibility maps created using RFC. a Based on sample ratio of 1:1 and b based on sample ratio of 1:2. Model “a” displays a higher percentage of high susceptibility zones as observed in the inset maps for comparison

6.1.2 Active layer detachment slides

An active layer detachment is a shallow landslide in which the thawed or thawing portion of the permafrost (i.e. the active layer) detaches from the underlying frozen material (van Everdingen 1998). The parameters considered relevant to the initiation of active layer detachment slides include slope angle, slope aspect, wetness index, permafrost distribution, sediment type, and vegetation distribution. To investigate the potential role of plan and profile curvatures, these two parameters were also included in the model. A total of 181 active layer detachment slides were identified; 119 were selected to be used in random forest classification, and 62 were kept as an independent checking data set. As discussed in Sect. 6.1, the main scarp area of each landslide was used to select a random point as the landslide sample. We repeated the experiment with the two sets of ALD/non-ALD sample ratios, i.e. 1:1 and 1:2 (Table 1), as with the DF classification. The out-of-bag error stabilized at 1000–1300 iterations. The overall accuracy in the classification maps is 79.54 and 81.95% for the 1:1 and 1:2 sample ratios, respectively. Table 3 summarizes the out-of-bag accuracy of the two models. Figure 6a displays the learning curve for the model based on the sample ratio of 1:1, and the probability/susceptibility map for this model is displayed in Fig. 7a.

Table 3 Out-of-bag accuracy for RFC classification and class area percentage of ALD and RS classification maps
Fig. 6

Learning curve of random forest classification: a out-of-bag accuracy for active layer detachment slides stabilizes at about 1300 iterations when the sample ratio of ALD/non-ALD is 1:1, and b out-of-bag accuracy for rock slides for the model based on 1:1 sample ratio stabilizes at about 800 iterations

Fig. 7

Susceptibility maps for the models based on the sample ratio of 1:1 for ALD (a) and RS (b). Inset maps display a zoomed in area for comparison

6.1.3 Rock slides (RS)

The variables considered to be important in triggering a rock slide include bedrock type, slope angle, and proximity to faults. We also included the wetness index, slope aspect, plan and profile curvatures, and NDVI to investigate the potential importance of these variables in the distribution of rock slides. A total of 104 rock slides were identified. Of these, about 65%, i.e. 68 rock slides, were used for the classification and 36 were kept for evaluation of the classification. As with the ALD samples, the RS samples were randomly selected from the scarp areas as points. The non-RS samples were also selected randomly in two sets as shown in Table 1. The out-of-bag error stabilized at about 800–1100 trees for both experiments. Figure 6b displays the learning curve for the model based on the sample ratio of 1:1, and the corresponding probability/susceptibility map is shown in Fig. 7b. Table 3 summarizes the out-of-bag accuracy and area percentage of each classification map classified as rock slide. The same trend in accuracy change is seen for RS models. The overall accuracy is very close, 78.67 and 78.3% for 1:1 and 1:2 sample ratios, respectively.

7 Results

Selecting different training data sets with varying ratios of landslides and non-landslides not only changed the out-of-bag accuracies, but also resulted in some differences in the susceptibility maps. Decreasing the number of non-landslide samples from the ratio of 1:2 to 1:1 resulted in a higher percentage of the area being classified as landslide in the classification maps. In the debris flow models, the area classified as DF increased from 21.14 to 29.92% when the sample ratio changed from 1:2 to 1:1 (Table 2). The same trend is seen in the ALD and RS classification maps, where the percentage of area classified as ALD or RS increased from 13.74 to 20.06% and from 7.88 to 17.11%, respectively, when the sample ratio changed from 1:2 to 1:1 (Table 3).

The variable importance for each RFC was also calculated as explained in Sect. 6.1. RFC calculates both raw and normalized values of variable importance. The normalized values are calculated by dividing the raw variable importance by the respective standard deviation (Jakimow et al. 2015). A high value indicates that the variable has a high importance or predictive power. Calculating variable importance for each model indicates that it is also sensitive to the sample ratio. Figure 8a, b displays the variable importance for DF models with sampling ratios of 1:1 and 1:2, respectively. There are differences in the rank of the variables in the two models, but slope angle, plan curvature, wetness index, drainage system, and NDVI show higher predictive power than the surface sediment type, profile curvature, and slope aspect in both models. A relatively higher predictive power was expected for the surficial sediment variable as it usually plays a role in initiation of a debris flow. The low variable importance for this factor may be due to lack of detailed information in the 1:100,000 surficial geology map. Slope aspect has the lowest value in all experiments of DF, indicating that it is likely not an important factor in triggering a debris flow in this study area (Fig. 8a, b). The high susceptibility zones occur mostly on colluvium, till, and alluvial deposits. The debris flow susceptibility maps display the southern flatter portion of the highway mostly as very low to low susceptibility areas (Fig. 5). There are several high susceptibility zones in the northern portion of the study area, occurring mostly on till deposits, which could be problematic if debris flow deposits were to reach the highway.

Fig. 8

Ranked importance of the variables for DF classification models based on the a 1:1 sample ratio and b 1:2 sample ratio

Predictive powers of variables for ALD models show the least variability when using different sample ratios (Fig. 9a, b). The slope angle and slope aspect show the highest importance followed by wetness index, surface sediment type, and permafrost. Slope aspect shows a high predictive power, as expected. This was based on the assumption that south-facing slopes are more exposed to solar radiation, consequently resulting in increased permafrost thaw and snow melt. The profile and plan curvatures have the least importance, also as expected, because most ALD are triggered on flat steep surfaces. A higher value was expected for the permafrost variable since permafrost is essential for triggering active layer detachment slides. However, there may be other factors that contribute to the triggering of ALD, such as the thickness of the White River volcanic ash, which is commonly observed at the top of the deposits (P. Lipovsky, oral communication 2010). Other potential contributing factors, such as forest fires, accumulation of heavy snow at the tree line, and permafrost conditions, should be studied in more detail to shed light on ALD distribution.

Fig. 9
Predictive power of variables used in RF classification for ALD models. a Based on 1:1 sample ratio and b based on 1:2 sample ratio

NDVI also does not appear to be a significant factor in triggering an active layer detachment slide. This might be due to the different states of revegetation (Fig. 3a, b), which are reflected in different NDVI values. The high susceptibility zones in the ALD map occur mostly on colluvium deposits. Although the ALD inventory used for training RFC was limited to south-east and north-west portions of the study area, near Donjek River, the ALD susceptibility map shows high susceptibility zones concentrated in the north-east and south-east of the area as well (Fig. 7a). The relatively short lifespan of ALDs may also lead to their frequency being misrepresented (Fig. 3a, b).

Predictive power of the variables used in RS models is shown in Fig. 10a, b for the two sample ratios of 1:1 and 1:2, respectively. Bedrock type, wetness index, and slope angle have the highest predictive power in both models and are close in normalized values. Profile and plan curvatures, slope aspect, and NDVI show lower predictive power, which varies with the sample ratio. Although structural characteristics of bedrock play an important role in triggering a rock slide, the fault distribution variable has the lowest, and even a negative, value in this study. This is not surprising because, as explained in Sect. 5, detailed structural characteristics were not available. Only the distribution of major faults was available, which cannot account for the local structural characteristics influencing the physical strength of bedrock. The high susceptibility zones in the RS map are mostly concentrated on the western side of the Alaska Highway within the tuff breccia, argillite, agglomerate, and basaltic and andesitic flows. The eastern side mostly shows moderate to low susceptibility, except for the areas close to Donjek River in the north-east, where high susceptibility coincides with quartz–chlorite–sericite–schists, quartzite, and limestones. A visual inspection of all susceptibility maps (and accuracy assessment discussed later in Sect. 7) indicates that there is a good agreement between the landslide inventory and the susceptibility maps.

Fig. 10
Predictive power of variables used in RF classification for RS models. Based on a 1:1 sample ratio and b 1:2 sample ratio

8 Evaluation

About 1/3 of the whole training data set, i.e. 27 debris flow deposits, 62 active layer detachment slides, and 36 rock slides, was not used in the random forest experiments (neither for learning nor for the out-of-bag error) but was used as an independent checking data set to evaluate the resulting models. For each RFC model of the three landslide types, including the different sample ratios, accuracy was assessed by constructing a confusion matrix between the classification map and the checking data set, from which the overall accuracy and average F1 accuracy were calculated. Table 4 summarizes the average F1 accuracy for all the RFC models. The average out-of-bag F1 accuracy is also included for comparison. A lower accuracy than the out-of-bag accuracy is usually expected because these data were not used in the random forest model. However, the accuracies for all models are very close to the out-of-bag ones, and for the ALD models, the accuracy is even higher than the corresponding out-of-bag accuracy. The changes in the sample ratio do not affect the three landslide types equally. The DF and RS models show the highest accuracy when the sample ratio is 1:1. Conversely, for the ALD model, the highest accuracy is achieved when the sample ratio of 1:2 is used (Table 4).

Table 4 Accuracy of RFC for DF, ALD, and RS models
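The accuracy measures used above can be sketched in Python as follows. This is a minimal illustration, assuming binary labels (1 for landslide, 0 for non-landslide) and assuming that the average F1 accuracy is the unweighted mean of the F1 scores of the two classes. The function names and the label lists are hypothetical and do not reproduce the values in Table 4.

```python
def confusion_counts(observed, predicted):
    """Return (tp, fp, fn, tn) with the landslide class (1) as positive."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(hits, false_alarms, misses):
    """F1 = 2*hits / (2*hits + false_alarms + misses); 0.0 if undefined."""
    denominator = 2 * hits + false_alarms + misses
    return 2 * hits / denominator if denominator else 0.0


def accuracy_summary(observed, predicted):
    """Overall accuracy, per-class F1, and their unweighted average."""
    tp, fp, fn, tn = confusion_counts(observed, predicted)
    f1_landslide = f1_score(tp, fp, fn)
    # For the non-landslide class the roles of the error types swap.
    f1_non_landslide = f1_score(tn, fn, fp)
    return {
        "overall": (tp + tn) / len(observed),
        "f1_landslide": f1_landslide,
        "f1_non_landslide": f1_non_landslide,
        "average_f1": (f1_landslide + f1_non_landslide) / 2,
    }


# Illustrative checking labels only (not data from this study).
observed = [1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 1]
print(accuracy_summary(observed, predicted))
```

The same function could be applied to the out-of-bag predictions and to the independent checking data set, so that the two accuracies are compared on the same footing.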

In addition, the efficiency of classification/prediction was examined for all susceptibility maps (Chung and Fabbri 2003; Blais-Stevens and Behnia 2016). Note that the above-mentioned accuracy measures were calculated from the classification maps for each landslide type, whereas the efficiency of classification was assessed from the probability/susceptibility maps shown in Figs. 5a, b and 7a, b. The cumulative area from high to low landslide susceptibility for each map was plotted against the cumulative number of landslides for each type to generate the success rate and prediction rate curves (Figs. 11, 12). The success rate curves are based on the training data set, i.e. 65% of the data used in establishing the model, and show how well the resulting susceptibility maps have classified the existing landslides. The prediction rate curves are based on the independent checking data set, i.e. 1/3 of the data, and show how well the models predict the landslide occurrences. For a randomly generated prediction curve (the straight line), the area under the curve (AUC) is 0.50. For a classification or prediction to be of significance, the curve should be far above the straight line (Chung and Fabbri 2003). The efficiency of classification and prediction of the DF susceptibility maps based on the two sample ratios differs only slightly. Figure 11a displays the results for the model based on the sample ratio of 1:1. The AUC for the success rate and prediction rate curves for this model is 0.920 and 0.802, respectively. The model based on the sample ratio of 1:2 (Fig. 11b) provides a lower success rate (AUC = 0.908) but a slightly higher prediction rate (AUC = 0.813).
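The construction of the success rate and prediction rate curves described above can be sketched in Python as follows. This is a minimal illustration, assuming a map made of equal-area cells, at most one landslide point per cell, and a trapezoidal estimate of the AUC. The function names and the four-cell example are hypothetical and do not reproduce the curves in Figs. 11 and 12.

```python
def rate_curve(cell_susceptibility, landslide_cells):
    """Return (area_fraction, landslide_fraction) points of a rate curve.

    cell_susceptibility: susceptibility value of each equal-area cell.
    landslide_cells: set of indices of cells that contain a landslide
    (training landslides for a success rate curve, checking landslides
    for a prediction rate curve).
    """
    # Visit cells from high to low susceptibility; ties keep index order.
    order = sorted(
        range(len(cell_susceptibility)),
        key=lambda i: cell_susceptibility[i],
        reverse=True,
    )
    area, slides = [0.0], [0.0]
    found = 0
    for rank, index in enumerate(order, start=1):
        if index in landslide_cells:
            found += 1
        area.append(rank / len(order))
        slides.append(found / len(landslide_cells))
    return area, slides


def area_under_curve(xs, ys):
    """Trapezoidal area under a curve given as matching x and y lists."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


# Illustrative four-cell map only (not data from this study).
susceptibility = [0.9, 0.1, 0.8, 0.2]
area, slides = rate_curve(susceptibility, {0, 2})
print(area_under_curve(area, slides))
```

The only difference between the two curve types in this sketch is which set of landslide cells is passed in, which is why the prediction rate curve is the more demanding test of a model.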

Fig. 11
Success rate and prediction rate curves for debris flow susceptibility maps with sample ratio of a 1:1 and b 1:2. There are only slight differences in the AUC between the two models

Fig. 12
Success rate and prediction rate curves for a ALD susceptibility map when the ALD/non-ALD sample ratio is 1:1 and b for rock slide susceptibility map when the RS/non-RS sample ratio is 1:1

The efficiency of prediction for ALD models (based on the two sample ratios) is higher than that of the other two types of landslides and is very close to the corresponding efficiency of classification. This is likely due to the higher number of existing ALD samples compared to DF and RS. Figure 12a displays the success rate and prediction rate curves for the ALD susceptibility map for the sample ratio of 1:1. The AUC for success and prediction rate for the model based on 1:2 sample ratio is 0.948 and 0.921, respectively. The efficiency of classification and prediction of the RS susceptibility map for the model based on the sample ratio of 1:1 is shown in Fig. 12b. The two sample ratios provide nearly the same success rate and prediction rate curves for the RS susceptibility maps.

9 Conclusion

Applying RF classification for landslide susceptibility mapping requires introducing non-landslide training samples as well as landslide samples. Selecting non-landslide samples may be challenging, not only because the absence of a landslide at a given location is never completely certain, but also because the number of non-landslide samples relative to landslide samples, and the way the samples are selected, i.e. how the landslide-free zones are defined, may affect the classification results. The landslides were mapped as points instead of whole polygons to reduce the spatial autocorrelation of the landslide samples and to treat small and large landslides equally. The ALD and RS samples were randomly selected from the scarp areas of the landslides. The debris flow samples were also randomly selected from the potential source areas, as discussed in Sect. 6.1. Random forest classification results in two maps, i.e. a classification map and a probability map. The former was used for accuracy assessment, and the latter was used as the landslide susceptibility map. The change in landslide/non-landslide ratio, i.e. 1:1 and 1:2, resulted in differences in the per cent of area classified as landslide, the out-of-bag and F1 prediction accuracy of the resulting classification maps, and the variable importance. However, the AUC values for the susceptibility maps showed only slight differences between sample ratios, likely because only the landslide samples were used to assess the success and prediction rates of the susceptibility maps. As a whole, the difference between the susceptibility maps resulting from the two sample ratios was not significant, possibly because the ratio of non-landslides to landslides was not very high (2:1).
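The effect of the landslide/non-landslide ratio on the composition of a training set can be illustrated with a minimal Python sketch. It assumes that the landslide points and a pool of candidate points from landslide-free zones are already available as coordinate tuples. The function name, the coordinates, and the random seed are hypothetical and do not describe the sampling procedure actually used in this study.

```python
import random


def draw_samples(landslide_points, candidate_points, ratio, seed=0):
    """Build a labelled sample set with a 1:ratio landslide/non-landslide mix.

    landslide_points: points mapped on landslides (label 1).
    candidate_points: hypothetical pool of points from landslide-free zones.
    ratio: number of non-landslide samples per landslide sample (1 or 2 here).
    """
    needed = ratio * len(landslide_points)
    if needed > len(candidate_points):
        raise ValueError("not enough candidate non-landslide points")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    non_landslide = rng.sample(candidate_points, needed)
    labelled = [(point, 1) for point in landslide_points]
    labelled += [(point, 0) for point in non_landslide]
    return labelled


# Illustrative coordinates only (not data from this study).
landslides = [(x, 0) for x in range(10)]
candidates = [(x, 1) for x in range(100)]

for ratio in (1, 2):
    samples = draw_samples(landslides, candidates, ratio)
    n_non = sum(1 for _, label in samples if label == 0)
    print(f"1:{ratio} ratio -> {len(landslides)} landslide, {n_non} non-landslide")
```

In such a scheme the landslide samples are identical for both ratios and only the non-landslide share changes, which is consistent with measures based solely on landslide samples being less affected by the ratio.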

Despite the slight differences, all resulting susceptibility maps for the three types of landslides provide high F1 accuracy and high efficiency in classification and prediction, indicating that random forest has high capability in landslide susceptibility mapping, especially in areas with a relatively large number of mapped landslides. All accuracy measures are higher for ALD than for the other two types. The prediction rate curves for ALD provide the highest accuracy and are very close to their success rate curves. This may be due to the higher number of ALD deposits sampled relative to RS and DF deposits in the area. Detailed information on variables critical for each type of landslide, such as surface sediment type (e.g. White River volcanic ash distribution), permafrost distribution, and bedrock structural data (e.g. distribution of minor faults and lineaments), would likely improve the quality of the susceptibility maps.