1 Introduction

Landslide susceptibility analysis is important to researchers concerned with the contribution of hillslope processes to basin-wide sediment transfers over a variety of scales (e.g., Hovius et al. 2000; Martin 2000; Korup 2009) and those concerned with landslide hazard and mitigation (e.g., Guzzetti et al. 1999; Huabin et al. 2005). Landslides involve the downslope movement of rock and/or overlying weathered material (e.g., weathered regolith, soil, etc.), triggered by a variety of mechanisms including earthquake activity or high rainfall intensities (Cruden 1991). Landslide analyses that assess safety factors, based on a comparison of driving and resisting stresses, are a powerful approach for investigating landslide susceptibility as their basis lies in mechanical theory. However, such approaches are not generally feasible when assessing landslide susceptibility over large regional landscapes (Baeza and Corominas 2001), as the necessary information for input/controlling variables is seldom practical to obtain beyond the scale of individual landslides and hillslopes. Therefore, the most common approach to investigating landslide susceptibility over larger scales involves the collection of landslide inventories that contain large numbers of events and for which regional data on controlling variables are available. Such data sets can then be evaluated using various geomorphic and statistical analyses to make inferences about landsliding susceptibility across a region (e.g., Hovius et al. 1997; Martin et al. 2002; Guzzetti et al. 2002; Malamud et al. 2004).

Since landslides most often occur in steep, rugged terrain, the acquisition of landslide databases was difficult prior to the advent of remote sensing technologies. Since that time, the collection and analysis of landslide inventories based on remote sensing and/or GIS methodologies has been a focus of research by government agencies and university researchers. Landslide identification can be undertaken by field survey, aerial photographic interpretation and/or automated remote sensing approaches (e.g., Hovius et al. 1997; Martin et al. 2002; Barlow et al. 2003; Brardinoni and Church 2004). The quality and particular strength of a landslide inventory are affected by the operator’s expertise and experience, and by factors such as the scale of aerial photography. Landslide inventories are often analyzed in conjunction with environmental variables to determine factors associated with landslide initiation and landslide susceptibility. While some analyses of landslide initiation and susceptibility have been undertaken without the use of rigorous statistics (in particular, this was the case for databases collected and analyzed prior to major advances in GIS technologies), many studies now make use of various multivariate statistical techniques, including logistic regression (e.g., Dai and Lee 2002; Ohlmacher and Davis 2003) and discriminant function analysis (e.g., Baeza and Corominas 2001; Jamaludin et al. 2006). Discriminant function analysis, the multivariate technique employed herein, has been applied in a number of different studies, with variable success in its application (e.g., Rice and Pillsbury 1982; Carrara 1983; Baeza and Corominas 2001; Ardizzone et al. 2002; Santacana et al. 2003; Jamaludin et al. 2006). Discriminant analysis explores the ability of combinations of variables to distinguish between the groups defined by the grouping variable (i.e., failed vs. unfailed slopes). In particular, stepwise discriminant analysis involves rigorous statistical assessment of variables as they are either added to or removed from the analysis. The primary objective is identification of the optimal linear combination of variables that best predicts the grouping of the dependent variable.
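The idea of a linear combination of variables that separates failed from unfailed slopes can be illustrated with a minimal sketch of a two-group Fisher linear discriminant. This is not the stepwise procedure used in the study, and it omits any statistical test for adding or removing variables. The two predictors (slope gradient and plan curvature), the sample values and all function names are hypothetical and chosen only for illustration.

```python
# Two-group Fisher linear discriminant with two predictors (illustrative only).
# All data values below are invented; they are NOT from the Gimbarzevsky inventory.

def mean_vector(rows):
    """Column-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance matrix for two groups."""
    p = len(group_a[0])
    s = [[0.0] * p for _ in range(p)]
    for group in (group_a, group_b):
        m = mean_vector(group)
        for row in group:
            d = [row[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in s]

def fisher_discriminant(failed, unfailed):
    """Return weights w = S^-1 (mean_failed - mean_unfailed) and a midpoint cutoff."""
    (a, b), (c, d) = pooled_covariance(failed, unfailed)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    mf, mu = mean_vector(failed), mean_vector(unfailed)
    diff = [mf[0] - mu[0], mf[1] - mu[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff halfway between the two projected group means
    cutoff = 0.5 * sum(w[j] * (mf[j] + mu[j]) for j in range(2))
    return w, cutoff

def classify(x, w, cutoff):
    """Assign a location to a group from its discriminant score."""
    score = w[0] * x[0] + w[1] * x[1]
    return "failed" if score > cutoff else "unfailed"

# Hypothetical (slope in degrees, plan curvature in arbitrary units) samples
FAILED = [(35, 0.4), (40, 0.1), (32, 0.3), (38, 0.5)]
UNFAILED = [(12, 0.2), (18, 0.6), (10, 0.4), (15, 0.1)]

w, cutoff = fisher_discriminant(FAILED, UNFAILED)
print(classify((36, 0.3), w, cutoff))
```

A stepwise analysis would repeat this kind of fit while testing whether each candidate variable significantly improves the separation between groups.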

The present study considers landslide susceptibility in Haida Gwaii (formerly known as the Queen Charlotte Islands), British Columbia, Canada. Although this region has been spared the devastation (ranging from loss of life, to effects on transportation routes, utilities and other infrastructure) that often accompanies landsliding in other steep coastal mountainous regions (Hungr 2004), landslide susceptibility is of central concern to maintaining ecological integrity and to understanding the nature of sediment transfers in both natural and logged hillslopes in this region. Our analysis is based on the exceptional landsliding inventory collected by Gimbarzevsky (1988) for the Canadian Forestry Service. The original landsliding inventory covers an area of ~10,000 km2, includes approximately 8,300 landslides, and is based on 1:50,000 aerial photographs. Gimbarzevsky did not distinguish mass movements by failure type, collectively referring to all events as “landslides”. The identified landslides include debris slides, debris flows, debris avalanches, rockslides and avalanche tracks. Compared to other published landslide inventories in coastal British Columbia, which typically utilize aerial photography at scales ranging from 1:11,000 to 1:20,000 (e.g., Rood 1984; Jakob 2000), the Gimbarzevsky database is not expected to provide a comparable record of smaller, more frequently-occurring landslides. However, given the large areal coverage of the Gimbarzevsky database relative to other studies, which most often have areal coverages of order 10^2 km2, this database provides an improved statistical representation of medium and large, less frequent landslide events. Unfortunately, the original database was not published in the refereed literature. For this reason, it did not receive the exposure that it warrants, despite representing a valuable contribution to the field of hillslope geomorphology and being a key regional inventory of landsliding that provides a unique record of medium to large landslides. GIS tools were not available to Gimbarzevsky (1988) and, therefore, we digitized his original hand-mapped landslide inventory, connecting each landslide vector to a 25-m DEM of Haida Gwaii. To undertake our discriminant function analysis, a number of landscape and environmental variables associated with landsliding initiation were converted into GIS coverages.

2 Study area

Haida Gwaii, a scimitar-shaped archipelago, is located approximately 80 km west of the central coast of British Columbia in western Canada (Fig. 1). Haida Gwaii covers an area of approximately 10,000 km2, with Graham Island (6,671 km2) and Moresby Island (2,405 km2) accounting for more than 90% of the total area. Three distinct physiographic regions characterize the islands: (1) Queen Charlotte Ranges in the southwestern region, consisting of steep, mountainous terrain; (2) Skidegate Plateau in the central region of Graham Island, consisting of mountainous and hilly topography; and (3) Queen Charlotte Lowlands in the northeastern portion of Graham Island, which are relatively flat. Low elevations dominate the gently rolling hills in the northeastern lowlands, in contrast to the rapid elevation gain on the western side, where the Queen Charlotte Ranges extend up to roughly 1,160 masl in the southern portions of the islands.

Fig. 1 The three major physiographic subdivisions for Haida Gwaii, British Columbia. Discriminant analysis was only performed on data from Graham Island. The area outlined by the thick line represents the area mapped in later figures and was chosen to optimize the scale for cartographic viewing. Source: Gimbarzevsky (1988)

Bedrock in Haida Gwaii is a mixture of sedimentary, volcanic and intrusive igneous rocks, and is commonly overlain by glacial deposits of Pleistocene age (Sutherland Brown 1968). During the most recent Wisconsin glaciation, Haida Gwaii developed an independent ice cap that deglaciated earlier than mainland British Columbia, beginning at 16,000 BP and finishing by about 13,500 BP (Barrie and Conway 2002). Glaciation resulted in abundant deposition of till and other glacial sediments (i.e., glacial lake sediments, alluvial fans, valley fill). The boundary between relatively impermeable till and the surficial weathered till/soils is critical to instability on hillslopes in the region (Martin et al. 2002). Impermeable till layers prevent rain or snowmelt from infiltrating to depth and may thus increase saturation levels of the overlying materials. Despite the significance of glacially-emplaced sediments in the region, in some instances soil cover may be derived from local bedrock. Of particular interest in this region are Folisols (upland organic soils; see Soil Classification Working Group 1998 for further details) as these soils are often found in the initiation zones of landslides (Campbell et al. 2010; Nagle 2000).

The diverse topography affects the distribution of precipitation, resulting in large disparities between proximate regions. Generally, mean annual precipitation ranges from about 1,200 to 1,500 mm a−1 for more easterly locations and can reach up to 4,000 mm a−1 along the windward western coast. Temperate coniferous forests dominate, with tree species including western hemlock, Sitka spruce, western red cedar, yellow cedar and red alder. Species are distributed throughout the landscape in four biogeoclimatic zones: the western hemlock zone, coastal cedar-pine-hemlock zone, mountain hemlock zone and alpine tundra zone (Banner and Pojar 1982).

According to the 2006 census, approximately 5,000 individuals reside in Haida Gwaii, concentrated in several prominent centers. The economy of the region is largely resource driven, with forestry accounting for the largest share of the economy at 19%. With such a large portion of the economy driven by resources of the islands, there is a clear need for improved understanding of the spatial extent and magnitude of landslide susceptibility in the region.

3 Gimbarzevsky landsliding inventory

The database collected by Gimbarzevsky (1988) is remarkable in terms of its extensive areal coverage and the large number of landslides identified. The original landsliding inventory includes 8,328 landslides covering an area of about 9,946 km2. Landslides were identified through visual analysis of 1:50,000 and 1:60,000 aerial photography flown in 1979 and 1980. To fill a limited gap in the coverage, aerial photographs taken in 1976 at a scale of 1:63,000 were also used. The landslides identified in this earlier study include debris slides, debris flows, debris avalanches, rockslides and avalanche tracks. Movements were not distinguished by failure type, and were collectively referred to as “landslides”. The events identified on the aerial photographs were plotted as vectors onto 1:50,000 NTS maps (Fig. 2). The full areal extent of each landslide was not recorded; rather, the linear extent of each landslide was recorded in vector format. Such vectors could include the initiation point, flow path and run out, although it is not always clear which zones were included in each particular case. Gimbarzevsky estimated the minimum length of slope failure that could be identified as ~100 m, based on a minimum mappable unit of roughly 2 mm. This threshold is greater than for other regional landslide inventories that utilize larger-scale aerial photographs. This mapping resolution will have resulted in under-sampling of landslide scars having a magnitude below this critical threshold. In addition, as is the case for other studies, underestimation of smaller landsliding events also likely exists due to dense forest cover and shadows created by the steep mountainous topography (Gimbarzevsky 1988). However, because medium to large events occur less frequently in time and space, the large areal extent of this landslide inventory will help to improve their statistical representation relative to many regional databases that rely on larger-scale aerial photographs and cover a much reduced area.
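The relation between the minimum mappable unit and the length threshold is a simple scale conversion, sketched below. The function name is hypothetical, and applying the same 2 mm unit to the other photograph scales is an assumption made here for comparison only; it is not a threshold reported by the other studies.

```python
def ground_length_m(map_length_mm, scale_denominator):
    """Convert a length measured on a photograph or map (mm) to ground length (m)."""
    return map_length_mm * scale_denominator / 1000.0

# 2 mm at 1:50,000 corresponds to the ~100 m threshold quoted by Gimbarzevsky
threshold_50k = ground_length_m(2, 50_000)

# Assumption: the same 2 mm unit applied to a larger-scale photograph (1:11,000)
threshold_11k = ground_length_m(2, 11_000)

print(threshold_50k, threshold_11k)
```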

Fig. 2 Example of an NTS map sheet with Gimbarzevsky’s original inventory used in this study

When examining the original data set, it was found that several of the original NTS maps of landslide vectors were missing for areas of notable extent on Moresby Island. Hence, our final digital coverage for both islands contains 6,600 landslide vectors, rather than the original 8,328 landslides identified in Gimbarzevsky (1988). The original map sheets were scanned, georeferenced and then reprojected to the British Columbia government projection standard of Albers Conic Equal Area projection using PCI Geomatica. The georeferenced files were then mosaiced and landslide scars were subsequently digitized and saved as a shapefile.

The landslide vectors were segregated to identify ‘singular’ and ‘multiple’ initiations. The former refers to an initiation that consists of a single vector that does not split or converge with other initiations, and the latter refers to landslide initiations that either: (1) separate into multiple paths from one initiation location; or (2) begin as multiple initiation points that converge to form a single flow path. A total of 312 multiple path landslides were identified for Graham Island and 416 multiple path landslides were identified for Moresby Island. Based on the visual evidence of diverging flow paths, these features are most often debris flows.

4 Landsliding frequencies and scaling effects

The landsliding data set of Gimbarzevsky best represents medium to large landslides, because of the limitations associated with the relatively small scale of the aerial photographs used for landslide identification. The spatial coverage of studies utilizing larger-scale aerial photographs is usually more restricted, thus limiting adequate statistical representation of larger, more infrequent landslides. Therefore, the Gimbarzevsky database represents a unique contribution to landslide inventories collected for the western coast of North America.

Several approaches were followed by Gimbarzevsky (1988) to assess the magnitude threshold for landslides that were identified and how this impacted his final database. To test the amount of missing information in his landslide inventory (based on 1:50,000 scale photographs), he further analyzed six watersheds using 1:10,000 scale panchromatic and color-infrared photography (flown in 1981 with 300 mm focal length cameras) and 1:12,000 color-negative prints (obtained in 1982 using a 152 mm camera). Most magnitude-frequency curves for landslide databases show that small landslides constitute by far the greatest numbers of landslides in the distribution. Therefore, it is not surprising that Gimbarzevsky identified much higher numbers of total failures in this second analysis, largely due to the significant number of smaller landslides. Gimbarzevsky also found that color photography was preferred for identification of small failures, old revegetated debris slides, and debris flows. On average, the number of slope failures identified on these medium scale photographs was 27% higher than for 1:50,000 scale aerial photographs. The failures most often missed on the smaller-scale photographs were numerous slides less than 100 m long (<2 mm on the photograph; 2 mm represents the lower limit of landslide length for identification on the 1:50,000 aerial photographs), small debris flows, soil slumps along logging roads, and portions of torrented streams obscured by vegetation.

Other existing regional landslide inventories for coastal British Columbia have generally used larger-scale aerial photographs for identification than the Gimbarzevsky study, and generally cover smaller areas (e.g., Jakob 2000; Guthrie 2002; Martin et al. 2002; Brardinoni and Church 2004; Guthrie and Evans 2004). Although such databases produce a good sampling of small and medium landslides, they may not provide a solid statistical representation of either the very smallest or the larger landslides. For example, Brardinoni and Church (2004) emphasize that databases relying on aerial photographs at these scales do not consistently identify the full spectrum of the very smallest, most frequent landslides (they considered debris slides and debris flows). They undertook field investigations to quantify the numbers of smaller landslides missed by medium-scale aerial photograph analysis (i.e., those having volumes <~2,000 m3). In particular, field identification was much superior to aerial photographic analysis for events of magnitude less than about 400–500 m3. That being said, the exclusion of these very small events, which are only consistently observable in the field, was not found to unduly affect total erosion values given their small areal extent. In addition, Brardinoni and Church (2004) noted a break in their power-law scaling for events greater than about 4,000 m3; they believed that this may have resulted from inadequate representation of larger, rare events in a study of limited areal extent.

Other landsliding inventories for Haida Gwaii include studies by Schwab (1983, 1988, 1998) and Rood (1984, 1990). A landsliding inventory undertaken for Haida Gwaii by Rood (1984, 1990) involved compilation and analysis of 1,337 landslides (including both debris slides and debris flows) (see also Martin et al. 2002 for analysis of this database). The landslide inventory of Rood utilized larger-scale photographs (1:11,000–1:13,000 vs. 1:50,000 for Gimbarzevsky 1988), and covered a much-reduced area (~350 km2) relative to Gimbarzevsky (~10,000 km2). While the Rood landsliding inventory contains more detail about individual events due to the larger-scale aerial photographs used in the analysis, the large number of identified failures and areal coverage of the Gimbarzevsky database make it a powerful platform for the examination of medium to large landslides in the region. The contrasting scale of the aerial photographs used in these two landsliding databases for the same region introduces interesting questions about scaling effects of landsliding identification. When making comparisons, it should be noted that in addition to the differences in the scale of aerial photographs and areal coverage, Rood only identified debris slides and debris flows, in contrast to the greater number of event types identified by Gimbarzevsky (see earlier discussion).

Landsliding frequencies based on the original results of Gimbarzevsky (1988) are compared here with those of Rood (1984, 1990). Gimbarzevsky (1988) obtained a landsliding frequency of 0.84 events km−2 if the entire area of Haida Gwaii is considered. When the landslide events are attributed to only the “active” portion of the landscape (based on UTM grid cells with identifiable landslide activity, or about 32% of total landscape area), the value is 2.64 events km−2. It should be noted that within this “active” portion of the landscape (3,153 1-km2 grid cells), about 9–10% of these cells had been disturbed by logging. For the combined unlogged and logged terrain in his study, Rood (1984, 1990) obtained a landsliding frequency of 3.82 events km−2. When landslide activity is attributed to only the “steepland” portion of his basins (slopes > 20°), the value becomes 7.23 events km−2. The logged portion of the landscape is about 13% for both the entire landscape area and the steepland portions only; this is roughly comparable to the amount of logging reported in Gimbarzevsky (1988). It should also be noted that Gimbarzevsky’s “active” area is not equivalent in definition to Rood’s steepland area, but it should provide at least somewhat comparable values. Rood estimated that landslide scars remained visible on his aerial photographs for about 40 years, whereas Gimbarzevsky (1988) did not address this issue directly in his report.
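As a consistency check, the Gimbarzevsky frequencies can be reproduced from the inventory totals quoted earlier (8,328 landslides over about 9,946 km2) and the number of active 1-km2 grid cells. The sketch below assumes that these are the totals underlying the reported values; the variable and function names are illustrative.

```python
def events_per_km2(n_events, area_km2):
    """Landslide frequency expressed as events per square kilometre."""
    return n_events / area_km2

N_LANDSLIDES = 8328      # landslides in the original inventory
TOTAL_AREA_KM2 = 9946    # area covered by the original inventory
ACTIVE_CELLS = 3153      # 1-km2 grid cells with identifiable landslide activity

overall = events_per_km2(N_LANDSLIDES, TOTAL_AREA_KM2)
active = events_per_km2(N_LANDSLIDES, ACTIVE_CELLS)
active_fraction = ACTIVE_CELLS / TOTAL_AREA_KM2

# Prints: 0.84 2.64 32%
print(f"{overall:.2f} {active:.2f} {active_fraction:.0%}")
```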

For comparative purposes, the frequency distributions of landslide lengths for the Gimbarzevsky and Rood databases are shown in Fig. 3 and Table 1. The length values for Rood include the transport and runout zones; the length values of Gimbarzevsky (1988) may also include some portions (or all) of the deposition zone (although no specific information on this is available). These results demonstrate that Rood’s database is particularly effective at capturing the smaller landslides (0–100 m in length), whereas the Gimbarzevsky database more adequately captures the less frequent landslides of greater length.

Fig. 3 Landslide lengths for the Rood (1984, 1990) and Gimbarzevsky (1988) landslide inventories

Table 1 Frequency distributions of landslide lengths for the Gimbarzevsky (1988) and Rood (1984, 1990) databases

Based on the above, Rood consistently obtained a higher landslide frequency than Gimbarzevsky, despite the study being somewhat more restrictive in the types of failures that were identified. The primary reason for these differences may be attributed to Rood’s ability to identify the small, more frequent events using his scale of aerial photographs. The smallest landslides identified by Rood had volumes of about 200 m3 (~10 m in length), and his largest identified landslides were about 16,000 m3 (~750 m in length). Rood’s inability to identify a full sampling of larger, infrequent failures is not expected to affect his summary landsliding frequency rates significantly, as only numbers of events are considered in these frequencies, not the magnitude of the events. If total area affected by landslides is considered, then the poor spatial sampling of the largest landslides would be more problematic. The fact that Gimbarzevsky identified a greater number of failure types does not seem to counterbalance the lack of smaller events in his database.

5 Mechanical theory and variables for discriminant function analysis

5.1 Introduction

Landscape and environmental variables incorporated into discriminant function analysis should be selected on the basis of a clearly stated rationale; careful consideration of the variables in the development stages of a study allows for a clearer interpretation of results. A well-developed body of mechanical theory exists for most of the landslide types identified by Gimbarzevsky. Landslide initiation in all cases results from the interplay of driving stresses and resisting stresses acting on a particular hillslope location, the net result of which is defined by the safety factor:

$$ {\text{Safety}}\,{\text{Factor}} = \frac{{{\text{Shear}}\,{\text{Strength}}}}{{{\text{Shear}}\,{\text{Stress}}}}. $$

Unfortunately, it is often the case that the input variables required for mechanically-based landslide equations are not available over the large spatial extents and resolution associated with regional landsliding susceptibility. Hence, the actual variables incorporated in GIS-based landsliding analyses are most often generalized variables that encapsulate some aspect of mechanical theory, and that can be derived from existing digital elevation models (DEMs) or from other existing data sources (e.g., geological databases, regional climatological data).

The following variable coverages were selected as possibilities for incorporation in our discriminant function analysis: slope, profile curvature, plan curvature, elevation, precipitation, specific catchment area, aspect, geology and distance to fault lines. We now introduce basic mechanical theory for several landslide types, and discuss the variables from this list that might be used to represent aspects of the underlying mechanical theory in our regional analysis.

5.2 Debris slides

According to the Coulomb-Terzaghi criterion, slope instability for debris slides occurs when shear stress (τ) exceeds shear strength (S.S.) of the soil/regolith layer (Selby 1993). Shear stress (N m−2) is given by the equation:

$$ \tau = \rho_{\text{s}} g\,z\sin \theta \cos \theta $$

where $\rho_{\text{s}}$ is density of soil/regolith (kg m−3), $g$ is gravitational acceleration (m s−2), $z$ is soil depth (m), and $\theta$ is slope angle (°). The shear strength (N m−2) of the soil is defined as:

$$ {\text{S}} . {\text{S}} .= c_{\text{S}} + c_{\text{R}} + \sigma^{\prime } \tan \phi $$

where $c_{\text{S}}$ is soil cohesion (N m−2), $c_{\text{R}}$ is tree root cohesion (N m−2), $\sigma^{\prime}$ is effective normal stress (N m−2) and $\phi$ is the angle of internal friction (°), so that $\tan \phi$ is dimensionless. The effective normal stress ($\sigma^{\prime}$) in the above equation is:

$$ \sigma^{\prime } = (\rho_{\text{s}} - \rho_{\text{W}} M)g\,z\cos^{2} \theta $$

where $\rho_{\text{W}}$ is density of water (kg m−3), and $M$ is the ratio of the height of the piezometric surface above the base of the soil ($h$) to the total vertical soil thickness ($z$).
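The three expressions above combine into the safety factor as sketched below. This is a minimal illustration rather than a calibrated model: the function names are hypothetical, and the parameter values in the example (soil density, cohesion, friction angle) are assumed for demonstration and are not measurements from Haida Gwaii.

```python
import math

def shear_stress(rho_s, z, theta_deg, g=9.81):
    """tau = rho_s * g * z * sin(theta) * cos(theta), in N m^-2."""
    t = math.radians(theta_deg)
    return rho_s * g * z * math.sin(t) * math.cos(t)

def effective_normal_stress(rho_s, z, theta_deg, m, rho_w=1000.0, g=9.81):
    """sigma' = (rho_s - rho_w * M) * g * z * cos^2(theta), in N m^-2."""
    t = math.radians(theta_deg)
    return (rho_s - rho_w * m) * g * z * math.cos(t) ** 2

def shear_strength(c_s, c_r, sigma_eff, phi_deg):
    """S.S. = c_S + c_R + sigma' * tan(phi), in N m^-2."""
    return c_s + c_r + sigma_eff * math.tan(math.radians(phi_deg))

def safety_factor(rho_s, z, theta_deg, m, c_s, c_r, phi_deg,
                  rho_w=1000.0, g=9.81):
    """Ratio of shear strength to shear stress; values below 1 indicate failure."""
    sigma_eff = effective_normal_stress(rho_s, z, theta_deg, m, rho_w, g)
    strength = shear_strength(c_s, c_r, sigma_eff, phi_deg)
    return strength / shear_stress(rho_s, z, theta_deg, g)

# Assumed illustrative values: 1 m of soil on a 35 degree slope
params = dict(rho_s=1800.0, z=1.0, theta_deg=35.0,
              c_s=1000.0, c_r=2000.0, phi_deg=35.0)
dry = safety_factor(m=0.0, **params)        # piezometric surface at the soil base
saturated = safety_factor(m=1.0, **params)  # piezometric surface at the ground surface
print(f"dry: {dry:.2f}, saturated: {saturated:.2f}")
```

Raising M lowers the effective normal stress and therefore the safety factor, which is the mechanism by which rainfall-driven pore water pressure enters the analysis.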

Many of the variables required in these equations (i.e., angle of internal friction, height of piezometric surface for specific triggering events) are not generally available at the scales typical of landsliding inventories. However, it is possible to obtain coverages for several variables that are related to various aspects of the mechanical theory, and that are available at the regional scale.

Shear stress and normal stress are related to slope geometry. In particular, slope gradient, readily obtainable from DEMs, is a major factor in determining which of these stresses will be dominant over the other. Distance to fault lines may be an important variable for landsliding analysis, as seismic shaking provides sudden increases in shear stress through violent ground motion. While there are no direct variables to estimate depth to the regolith layer and bulk density of the soil as found in both the shear stress and normal stress equations, some other variables may partly contribute to the values for these variables. For example, profile/plan curvature, elevation, gradient, aspect, contributing area, geology and precipitation may affect the depth of the weathered profile and the degree of weathering; however, it is not clear that such relations would be strong enough to show up as significant contributors in statistical analysis.

A major trigger of landslides is the pore water pressure term found in the equation for effective normal stress. The number and intensity of rainfall events triggering landslides may have some relation to mean annual precipitation. The distribution of pore water pressures across the landscape during rainfall events may be partly captured by a number of variables (with some being more important than others): specific catchment area, and slope geometry variables (slope gradient, profile/plan curvature, aspect).

Unfortunately, two key geotechnical properties in the shear strength equation, soil cohesion and angle of internal friction, are not likely to be available over regional scales. In our study area, overlying soil properties are not often expected to have any significant relation to bedrock geology. Much of the bedrock is covered by various glacial and postglacial deposits (i.e., glacial till, glacial lake sediment, alluvial fans, valley fill). A large portion of the soil/regolith in Haida Gwaii consists of weathered glacial deposits, such as till, and not bedrock weathered in situ (see Martin 2000; Martin et al. 2002). For these reasons, bedrock geology may not be a significant factor affecting most debris slides (or debris flows). That being said, in some instances soil cover may be derived from local bedrock, and in such cases bedrock may have an influence. In particular, Folisols, derived from local bedrock, may be found in the initiation zone of landslides (Campbell et al. 2010; Nagle 2000).

Factors that control the degree of weathering of either bedrock or glacially-emplaced material (e.g., precipitation, elevation, aspect, slope geometry) may have some relation with these soil properties.

Many of the predictor variables mentioned above affect more than just one aspect of the underpinning mechanical theory, and many of the variables are themselves inter-related.

5.3 Rockslides

Rockslides initiate along planes of weakness; the properties of the latter are a major component of mechanical theory for rock failures (Selby 1993). The equation for shear stress is similar to that given for debris slides:

$$ \tau = \rho_{\text{R}} g\,z\sin \theta \cos \theta $$

where $\rho_{\text{R}}$ is density of rock (kg m−3), $z$ is the depth to the plane of weakness (m) and the other variables and units remain the same as defined for debris slides. The normal stress is now given by:

$$ \sigma = \rho_{\text{R}} gz\cos^{2} \theta $$

where the variables have all been defined previously. In addition to incorporating the normal stress, the shear strength is dependent on the frictional properties along the plane of weakness:

$$ {\text{S}} . {\text{S}} .= \sigma \tan \phi_{\text{pw}} $$

where $\tan \phi_{\text{pw}}$ represents total friction along the plane of weakness, which combines the basic frictional properties of the plane with any large asperities that contribute additional resistance. If water is present along the plane of weakness, it may influence the shear strength along that plane.
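As a worked illustration, the rockslide expressions can be substituted into the safety factor. Assuming a dry plane of weakness (no water term, as in the equations as written), density and depth cancel:

$$ {\text{Safety}}\,{\text{Factor}} = \frac{{\rho_{\text{R}} g\,z\cos^{2} \theta \tan \phi_{\text{pw}} }}{{\rho_{\text{R}} g\,z\sin \theta \cos \theta }} = \frac{{\tan \phi_{\text{pw}} }}{\tan \theta } $$

Under this simplification, stability depends only on the friction along the plane of weakness relative to the slope angle, which is one reason slope gradient and geological mapping of planes of weakness are natural candidate variables.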

Slope geometry variables are important in determining the value of both shear stress and normal stress. Profile curvature may provide an indication of any support at the slope base. Geological mapping may provide important information relating to several variables: rock density; spacing of planes of weakness; and frictional properties along planes of weakness. Water-related variables may play a role in the friction along the failure plane.

5.4 Debris flows

Takahashi (1981) outlines some of the conditions that must be met for the initiation of debris flows. The tangent of the slope angle (tan θ) must exceed both of the following expressions:

$$ \frac{{c*(\rho_{\text{g}} - \rho_{\text{f}} )}}{{c*(\rho_{\text{g}} - \rho_{\text{f}} ) + \rho_{\text{f}} (1 + h_{0} d^{ - 1} )}}\tan \phi $$

and

$$ \frac{{c*(\rho_{\text{g}} - \rho_{\text{f}} )}}{{c*(\rho_{\text{g}} - \rho_{\text{f}} ) + \rho_{\text{f}} (1 + \kappa^{ - 1} )}}\tan \phi $$

where $c*$ is grain concentration by volume in the static bed, $\rho_{\text{g}}$ is density of sediment grains (kg m−3), $\rho_{\text{f}}$ is density of fluid (kg m−3), $h_{0}$ is depth of water flow (m), $d$ is diameter of grains (m), $\phi$ is internal friction angle and $\kappa$ is a numerical coefficient (determined from experiments to be about 0.7).
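The two threshold expressions can be evaluated as in the sketch below. The function names are hypothetical and the input values (grain concentration, densities, friction angle, flow depth relative to grain size) are assumed for illustration; they are not taken from Takahashi (1981) or from the study area.

```python
import math

def takahashi_thresholds(c_star, rho_g, rho_f, phi_deg, h0_over_d, kappa=0.7):
    """Return the two threshold values of tan(theta) given in the text.

    c_star    : grain concentration by volume in the static bed
    rho_g     : density of sediment grains (kg m^-3)
    rho_f     : density of fluid (kg m^-3)
    phi_deg   : internal friction angle (degrees)
    h0_over_d : depth of water flow divided by grain diameter (h0 / d)
    kappa     : numerical coefficient (about 0.7 from experiments)
    """
    buoyant = c_star * (rho_g - rho_f)
    tan_phi = math.tan(math.radians(phi_deg))
    depth_term = buoyant / (buoyant + rho_f * (1.0 + h0_over_d)) * tan_phi
    kappa_term = buoyant / (buoyant + rho_f * (1.0 + 1.0 / kappa)) * tan_phi
    return depth_term, kappa_term

def exceeds_both(theta_deg, thresholds):
    """True if tan(theta) exceeds both threshold expressions."""
    tan_theta = math.tan(math.radians(theta_deg))
    return all(tan_theta > t for t in thresholds)

# Assumed illustrative inputs
thresholds = takahashi_thresholds(c_star=0.6, rho_g=2650.0, rho_f=1000.0,
                                  phi_deg=35.0, h0_over_d=1.0)
print([round(math.degrees(math.atan(t)), 1) for t in thresholds])
```

Both thresholds are smaller than tan φ, and the first decreases as the water depth increases relative to the grain size.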

Based on the above mechanical theory, several variables related to slope geometry may be important for debris flow initiation (e.g., slope gradient, profile curvature, plan curvature), as well as several water-related variables (see section on debris slides for complete list and details). Sediment that collects in gullies (which has the potential to be involved in debris flow activity) may or may not show a relation to underlying geology, depending on whether the bedrock has been scoured or not.

5.5 Debris avalanches

Debris avalanches involve the chaotic movement of rocks, soil and/or debris mixed with water and/or ice on steep slopes, and the flow is largely unconfined. Earthquakes may be an initial trigger of such events, and as material moves downslope, water and/or ice may be incorporated into the mass. Saturation of the material by water independent of earthquake activity may also be an important triggering mechanism. Mechanical theory of debris avalanches is not as well developed as for other more common landslide types, given their less frequent occurrence and the often unique circumstances in which they occur. However, many of the same mechanical variables noted for debris slide and debris flow initiation are expected to be important. The variable coverages that might be significant are slope geometry and water-related characteristics, such as those discussed in the above sections.

5.6 Final selection of variables

Based on the above discussion, the following variable coverages are selected for incorporation in our discriminant function analysis: slope, profile curvature, plan curvature, elevation, precipitation, specific catchment area, aspect, bedrock geology and distance to nearest fault line (Table 2). The abstraction process for these variables, during which information is generalized into map coverages, may introduce errors, which can be further perpetuated through GIS manipulation and analysis (Walsh et al. 1987; Guzzetti et al. 1999). The use of stepwise discriminant function analysis allows for identification of variable combinations that are the best predictors of our grouping variable (failed vs. unfailed locations).

Table 2 Controlling variables of landslide initiation and format

6 Variable coverages

The digital elevation model (DEM) used in the analysis has 25 m postings and was derived from 1:20,000 Terrain Resource Information Maps (TRIM) available from the British Columbia government (http://archive.ilmb.gov.bc.ca/crgb/pba/trim/). Several of the topographic coverages for this study were derived from the DEM (Fig. 4). These derivatives include slope, aspect, profile curvature, plan curvature, and specific catchment area. Slope and aspect were generated in ArcView using the standard commands; a D-8 model embedded within ArcView calculates slope gradient. The remaining derivatives, including profile curvature, plan curvature and specific catchment area, were obtained using the hydrological extension TauDEM (Tarboton 2000). Profile and plan curvature are expressed by derivatives of the dependence of the elevation z = f(x, y) on the coordinates x and y (cf. Evans 1980). To determine the catchment area for a location, the first step is to fill all pits in the DEM. Once pits in the DEM are filled, TauDEM is used to calculate the number of grid cells draining through each downslope grid cell. Categorical data coverages were reclassed to interval values for discriminant analysis. Aspect was reclassed based on observed landslide frequencies of occurrence in a preliminary analysis.
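
The DEM derivatives described above can be illustrated with a minimal Python sketch. It is not the ArcView or Tarboton (2000) implementation: slope is taken from simple finite differences, and the contributing cell count follows a basic D-8 rule (each cell drains to its steepest downslope neighbour) on a DEM that is assumed to be pit-filled already. The function names and the NumPy array layout are assumptions made for illustration.

```python
import numpy as np

def slope_degrees(dem, cell_size=25.0):
    """Slope gradient in degrees from finite differences of the DEM."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def d8_contributing_cells(dem):
    """Number of cells draining through each cell (the cell itself included)."""
    rows, cols = dem.shape
    acc = np.ones((rows, cols), dtype=int)
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    # Visit cells from highest to lowest so that upslope totals are complete
    # before they are passed downslope.
    for flat in np.argsort(dem, axis=None)[::-1]:
        r, c = divmod(int(flat), cols)
        best_drop, target = 0.0, None
        for dr, dc in neighbours:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                # Drop per unit distance (diagonals are sqrt(2) cells away).
                drop = (dem[r, c] - dem[rr, cc]) / np.hypot(dr, dc)
                if drop > best_drop:
                    best_drop, target = drop, (rr, cc)
        if target is not None:
            acc[target] += acc[r, c]
    return acc
```

Multiplying the cell count by the cell size (25 m here) gives one common approximation of specific catchment area, i.e., contributing area per unit contour width.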

Fig. 4

a Elevation values (meters above sea level) for Graham Island. b Hillshade relief. Data source: British Columbia Government TRIM

In addition to the topographic variables, precipitation, bedrock geology and distance to nearest fault line were also included in our analysis. The bedrock geology data are based on geological maps having a scale of 1:125,000 and a UTM projection (Gimbarzevsky 1988 after Sutherland Brown 1968) (Fig. 5). The geological maps were scanned and imported into PCI Geomatica, where they were georeferenced, mosaicked and reprojected to British Columbia’s Albers Equal Area projection. These data consist of bedrock geology polygons and fault lines (398 digitized fault line vectors). The geological coverage was reclassed based on relative strengths (Sutherland Brown 1968; Gimbarzevsky 1988), and follows a basic relative hardness classification. The precipitation data used in our analysis are based on Hogan and Schwab (1990). This earlier study used two separate data sources in its analysis of precipitation characteristics for Haida Gwaii. The first data source consisted of 5-year records from 8 weather stations regulated by the Atmospheric Environment Service (AES). The second data source was from the British Columbia Ministry of Environment Resource Analysis Branch (RAB) and consisted of monthly total precipitation values for 27 stations covering a record of 4 years. Unfortunately, the original source data could not be located and are therefore not available (Hogan, personal communication 2004). The four precipitation zones delineated in Hogan and Schwab (1990) were scanned, and this image was rectified (Fig. 6).

Fig. 5

Bedrock geology for Graham Island. Source: Gimbarzevsky (1988) after Sutherland Brown (1968)

Fig. 6

Precipitation zones for Graham Island. Source: Hogan and Schwab (1990)

Once all of the variable coverages were gathered, the values for each variable were obtained for every landslide initiation point and for the random sample of locations representing unfailed slopes (same number of data points as the landsliding group; these data are required for the discriminant function analysis). This was accomplished in one of two ways, depending on whether the coverage was in vector or grid format. For the vector coverages, such as bedrock geology and precipitation, the attribute information was attached using a spatial join feature in ArcView. This function prevents data redundancy; instead of copying the information to the actual initiation point coverage, it links the two files but keeps the information of the two coverages independent from one another. For grid coverages such as elevation and slope, a “Get Grid Attribute” Avenue script was used to copy the grid variable information at each landslide initiation point. The pooled within-group correlation matrix for the 9 variables is shown in Table 3. The strongest correlations exist among the variables slope, elevation, precipitation, geology and nearest fault distance.
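
The grid lookup can be sketched in Python as a simple index calculation. This is an illustrative analogue of the script described above, not the script itself; the function name, the north-up grid layout and the origin parameters are assumptions.

```python
import numpy as np

def sample_grid(grid, x_min, y_max, cell_size, xs, ys):
    """Return the grid value under each point (xs, ys).

    Assumes a north-up grid whose upper-left corner is (x_min, y_max)
    and whose rows run from north to south. No bounds checking is done,
    so points are assumed to fall inside the grid.
    """
    cols = np.floor((np.asarray(xs, dtype=float) - x_min) / cell_size).astype(int)
    rows = np.floor((y_max - np.asarray(ys, dtype=float)) / cell_size).astype(int)
    return grid[rows, cols]
```

Applying the same function to the landslide initiation points and to the random unfailed points yields one column of predictor values per coverage.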

Table 3 Pooled within-group correlation matrix

Tabachnick and Fidell (2001, p. 462) explain that discriminant analysis “…is robust to failures of normality if violation is caused by skewness rather than outliers.” Our data for variable coverages were examined for normality, skewness and outliers. Even though there is skewness associated with some of our variables (such as precipitation, catchment area, nearest fault distance), outliers are minimized in our data as many of our variables were extracted from information derived from DEMs for the Queen Charlotte Islands that has been subject to careful prior analysis. Furthermore, several of our variables were reclassed, during which any outliers (should they have existed) were recognized and removed.
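
A screening of this kind can be sketched in Python as follows. The sketch is illustrative only and does not reproduce the procedure used for our data; the function names and the cut-off of |z| > 3.29 (roughly p < 0.001, two-tailed, under normality) are assumptions.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def flag_outliers(x, z_cut=3.29):
    """Boolean mask of values whose standardized score exceeds z_cut."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return np.abs(z) > z_cut
```

A strongly skewed variable with no flagged values would fall under the case that Tabachnick and Fidell describe as tolerable for discriminant analysis.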

7 Forward stepwise discriminant function analysis

7.1 Methods

Discriminant function analysis (DFA) is the statistical technique used in our study for identifying the factors associated with landslide initiation (Carrara et al. 1991; Baeza and Corominas 2001). In our analysis, two broad dependent groups are defined: (1) locations of landsliding occurrence (Group 1) and (2) a random sample of locations that represents unfailed slopes, having the same number of data points as the landsliding group (Group 0). DFA optimizes a linear combination of predictor variables (i.e., in our study, those variables likely to be associated with landslide initiation) that are best suited for discriminating amongst the categorical dependent variables. The discriminant score is calculated by the equation:

$$ Z \, = \, a \, + \, C_{1} X_{1} + \, C_{2} X_{2} + \, C_{3} X_{3} + \cdots + \, C_{n} X_{n} $$

where Z is the discriminant score, the number used to predict group membership of a case; a is a coefficient that maximizes variability between failed and unfailed groups (while also minimizing variability within groups); \( X_{i} \) refers to predictor variables (in our study, variables conducive to landslide initiation), and \( C_{i} \) is the discriminant weight or coefficient, a measure of the extent to which a variable \( X_{i} \) discriminates amongst the groups. The Z-scores for failed and unfailed slopes should be as different as possible, highlighting the separation between groups (Carrara et al. 1991).

In DFA, a key criterion is the optimization of the between-groups to within-groups sum of squares (Huberty 1994), defined as:

$$ {\text{TSS}} = \sum {\left( {Z_{i} - \bar{Z}} \right)^{2} } $$
$$ {\text{WSS}} = \sum {\left( {Z_{ij} - \bar{Z}_{j} } \right)^{2} } $$
$$ {\text{BSS}} = \sum {\left( {\bar{Z}_{j} - \bar{Z}} \right)^{2} } $$

where TSS is total sum of squares, WSS is within-group sum of squares, BSS is between-group sum of squares, i is an individual case, j is a group, \( Z_{i} \) and \( Z_{ij} \) are individual discriminant scores, \( \bar{Z} \) is the grand mean of the discriminant scores, \( \bar{Z}_{j} \) is the mean discriminant score for Group j. The goal is to estimate parameters that minimize the within-group SS. Various statistical measures are utilized in the analysis that are based on these equations.
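
These quantities can be computed directly from a set of discriminant scores, as in the following Python sketch. The function name is hypothetical, and the sketch assumes that the between-group sum is taken over cases (equivalently, weighted by group size), which is the convention under which TSS = WSS + BSS holds.

```python
import numpy as np

def sums_of_squares(z, group):
    """Total, within-group and between-group sums of squares of scores z."""
    z = np.asarray(z, dtype=float)
    group = np.asarray(group)
    grand_mean = z.mean()
    tss = np.sum((z - grand_mean) ** 2)
    wss = 0.0
    bss = 0.0
    for g in np.unique(group):
        z_g = z[group == g]
        wss += np.sum((z_g - z_g.mean()) ** 2)
        # Each case in the group contributes its group-mean deviation.
        bss += len(z_g) * (z_g.mean() - grand_mean) ** 2
    return tss, wss, bss
```

For scores of 0 and 2 in Group 0 and 4 and 6 in Group 1, this gives TSS = 20, WSS = 4 and BSS = 16.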

The forward stepwise selection process requires that before a variable is retained for further analysis, it must pass some minimum conditions as defined by a set of selection criteria. These minimum conditions include a tolerance test, which assures computational accuracy, and an F-statistic (see Table 4 for additional information on these criteria). The F value for a variable indicates its statistical significance in the discrimination between groups, and provides a measure of the unique contribution that a variable makes to the prediction of group membership. In forward stepwise discriminant analysis, the variables are entered in a stepwise fashion based on the Wilks’ lambda score. The Wilks’ lambda score, Λ, represents a measure of the equality of group means, analyzed between groups:

$$ \Lambda = \frac{\text{WSS}}{\text{TSS}}. $$

Table 4 Results of the forward stepwise discriminant analysis

This statistical test identifies if differences exist between group means, i.e. our random locations and our failed locations (Groups 0 and 1 as defined earlier) for the dependent variables. Wilks’ lambda is a measure ranging in value between 0 and 1, with a value of 0 representing complete group mean separation and a value of 1 indicating group means are equal. Wilks’ lambda scores are used for two purposes in this analysis: the first is to assess the amount of group separability between each variable; the second is the assessment of the eigenvalue, a value that describes the reliability of the discriminant function.

Once the stepwise procedure begins, the predictor variable that best separates groups, based on the lowest significant Wilks’ lambda score, is entered into the model and the parameters for the resulting discriminant function are tested for group separation. Of the remaining variables, the predictor that has the next lowest Λ (i.e., the next best separator) is selected and entered into the model. It is then assessed if the addition of that variable was meaningful; furthermore, the predictor variable(s) previously entered must be checked to ensure that they remained significant. This procedure is repeated until all of the predictor variables are entered into the model or until none of the variables outside the model have meaningful Λ values; i.e., when none of our unselected variables meets the entry criterion. It should be noted that as variables are entered, the multivariate dimensionality increases and, thus, variables that on a univariate plane show good separation of group means may not show the same separation of group means in a multivariate framework (Huberty 1994).
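
The forward selection loop can be sketched in Python as follows. This is a simplified illustration, not the statistical package used in our study: at each step it adds the candidate variable giving the lowest multivariate Wilks’ lambda (computed as det(W)/det(T) from the within-group and total cross-product matrices) and stops when the partial F-to-enter falls below a threshold. The threshold of 3.84 and the function names are assumptions, and the tolerance test and the re-check of previously entered variables are omitted.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda, det(W) / det(T), for the columns of X."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    centred = X - X.mean(axis=0)
    total = centred.T @ centred
    within = np.zeros_like(total)
    for g in np.unique(y):
        d = X[y == g] - X[y == g].mean(axis=0)
        within += d.T @ d
    return np.linalg.det(within) / np.linalg.det(total)

def forward_stepwise(X, y, f_to_enter=3.84):
    """Forward selection on Wilks' lambda (entry step only, no removal)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, k = X.shape
    n_groups = len(np.unique(y))
    selected, lam_current = [], 1.0
    while len(selected) < k:
        p = len(selected)
        candidates = [j for j in range(k) if j not in selected]
        lams = {j: wilks_lambda(X[:, selected + [j]], y) for j in candidates}
        best = min(lams, key=lams.get)
        # Partial F for the change in lambda when `best` is added.
        ratio = lams[best] / lam_current
        f_stat = (n - n_groups - p) / (n_groups - 1) * (1.0 - ratio) / ratio
        if f_stat < f_to_enter:
            break
        selected.append(best)
        lam_current = lams[best]
    return selected, lam_current
```

The returned list gives the column indices in their order of entry, which corresponds to the kind of ordering reported for a forward stepwise run.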

7.2 Results of discriminant analysis

Due to the significant areal extent of missing data for Moresby Island, we restricted the discriminant analysis to Graham Island, involving a total of 3,466 landslides and covering an area of 6,671 km². The order of variables entered into our forward stepwise discriminant analysis is shown in Table 4. When none of our unselected variables meets the entry criterion, the forward selection process stops (note that plan curvature and catchment area never entered into the analysis). Slope, elevation and precipitation are the three most important discriminatory variables in our analysis. Aspect, profile curvature, nearest fault distance and geology contribute the least to the discriminatory model. A number of parameters can be examined to evaluate the strength of the discriminant model.

The discriminatory power of the model can be assessed by calculation of the eigenvalue:

$$ \lambda = \frac{\text{BSS}}{\text{WSS}} $$

where BSS is the between-group sum of squares and WSS is the within-group sum of squares. The goal is to estimate parameters that minimize the WSS. An eigenvalue of 0 indicates that the model has no discriminatory power; the larger the value of λ, the greater the discriminatory power of the model. Summary results reveal a discriminant function using 7 of the 9 available variables, with an encouraging eigenvalue of 0.789, indicating a near 80% discriminatory power (Table 5a).

Table 5 a Eigenvalue for the discriminant function, b Wilks’ lambda score, c Classification results

The discriminatory power of our model is further supported by the canonical correlation, which is a measure of the association between groups formed by the predictor variable(s) and the discriminant function:

$$ \eta = \sqrt {\frac{\lambda }{1 + \lambda }} = \sqrt {\frac{\text{BSS}}{\text{TSS}}}. $$

When η is zero, there is no relation between the groups and the discriminant function; when the canonical correlation is large, there is a high correlation between the groups and the discriminant function. Our results indicate a value of 0.66 for the canonical correlation (Table 5a). This indicates that approximately 80% of landslides were discriminantly separated from the random sample using 7 variables, and that these 7 predictor variables display a correlation of 0.66 between the grouped dependents (landslide initiations) and the discriminant function.
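
As a consistency check on the reported values (a worked calculation, not an additional result from Table 5), substituting the eigenvalue λ = 0.789 into the expression for the canonical correlation gives:

$$ \eta = \sqrt {\frac{0.789}{1 + 0.789}} = \sqrt {0.441} \approx 0.66, \qquad \Lambda = \frac{\text{WSS}}{\text{TSS}} = \frac{1}{1 + \lambda } = \frac{1}{1.789} \approx 0.56 $$

The implied Wilks’ lambda assumes a single discriminant function (two groups) and TSS = WSS + BSS; the value actually reported may differ slightly through rounding. The squared canonical correlation, η² ≈ 0.44, is the share of the total sum of squares of the discriminant scores that lies between the two groups.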

The significance of the values in Table 5a is corroborated by the Wilks’ lambda score (Table 5b), which indicates the reliability of the eigenvalue. The value of Λ in this case can be converted to a Chi-square statistic distributed for df = (k − 1), where k is equal to the number of parameters estimated. The lambda score in our model results in a significant Chi-square (\( \chi^{2} \)) test, indicating that differences in mean discriminant scores of the two groups are greater than could be attributed to sampling error alone. Classification results indicate 81.7% of the original grouped cases were correctly classified (Table 5c). The value for correctly classified landslide locations (Group 1) is 87.1%, while the correctly classified non-landslide locations (i.e., our random locations in Group 0) is 76.4%. Finally, the standardized canonical discriminant function coefficients for the forward stepwise analysis for the 7 remaining variables are shown in Table 6. Based on the three principal variables found to explain the most variance, the discriminant function equation is given as:

$$ Z = - 2.988 + 0.05X_{1} + 0.002X_{2} + 0.407X_{3} $$

where \( X_{1} \) is slope, \( X_{2} \) is elevation and \( X_{3} \) is precipitation.
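
The three-variable function can be evaluated for a single location as in the Python sketch below. The coefficients are those given in the equation above; the function name and the input values are hypothetical, and the inputs must be expressed in the same units and reclassed categories that were used to fit the model, which are not restated here.

```python
def discriminant_score(slope, elevation, precipitation):
    """Discriminant score Z from the three-variable function in the text."""
    return -2.988 + 0.05 * slope + 0.002 * elevation + 0.407 * precipitation

# Hypothetical location, for illustration only.
z = discriminant_score(slope=30.0, elevation=400.0, precipitation=3.0)
print(round(z, 3))  # 0.533
```

Because all three coefficients are positive, higher values of slope, elevation or precipitation raise Z and move a location towards the failed group.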

8 Discussion and conclusions

The spatial coverage of the landslide inventory collected by Gimbarzevsky is remarkable, providing perhaps the best record of medium to large landslides for coastal British Columbia. The comparison of frequency distributions for landslide lengths found in the Rood and Gimbarzevsky databases highlights how contrasting scales of aerial photography can lead to very different results. The spatial distribution of landslide events in the Gimbarzevsky landslide inventory results from the real-time interaction of a variety of factors which, when combined with some triggering mechanism (e.g., high rainfall event, earthquake activity), determine the ratio between the shear stress and shear strength acting at any particular location. Researchers and land managers are also often concerned with landslide susceptibility over larger, regional scales (10²–10⁴ km²). At these large scales, it is not feasible to obtain the input information required for mechanical analysis. For this reason, variables that may be related to pertinent aspects of the underlying mechanical theory, but that are possible to obtain over larger spatial scales, are collected and then analyzed in conjunction with a landsliding inventory using various statistical techniques. Consideration of the mechanical theory as demonstrated herein shows that the various predictor variables are often related to more than just one aspect of mechanical theory; for example, slope geometry variables may affect shear stress, normal stress and pore water pressure. Moreover, certain predictor variables may be related; for example, elevation and precipitation often show a relation to one another. With regard to these various inter-relationships, a particular strength of the forward stepwise analysis adopted in this study is that as a new predictor variable enters the analysis, the relative contribution of other variables previously entered into the analysis can change (in response to these multiple roles).

The forward stepwise approach adopted for the discriminant analysis of the Gimbarzevsky landsliding inventory showed that seven of the nine possible predictor variables were significant in separating the grouping variable. The relatively high eigenvalue of 0.79 indicates a strong ability of the final model to discriminate between failed and unfailed locations (with a canonical correlation of 0.66 and a significant Wilks’ lambda score). The predicted group membership shows that 76.4% of the unfailed locations were correctly classified, while 87.1% of the failed locations were correctly classified; these are encouraging results.

The variables with a relatively larger value of the standardized discriminant coefficient, in either the positive or negative direction, are those having the most important role in discriminating between the grouping variables (failed vs. unfailed locations). A positive coefficient means that higher values of the variable have a tendency to be related to hillslope failure, whereas a negative coefficient means that a higher value of the variable shows a tendency towards non-failure; values closer to zero have a lesser role. The most important variables for discriminating between failed and unfailed locations are slope gradient (coefficient of +0.580), elevation (+0.344) and precipitation (+0.358). The remaining variables are less important in discriminating amongst these two groupings (Table 6).

Table 6 Standardized canonical discriminant function coefficients

Slope gradient is a notable predictor variable because it is a key variable in mechanical theory, appearing in the calculation of both shear stress and normal stress. Higher slope gradients lead to relatively greater shear stresses and lower values of resisting normal stresses. Higher elevation is probably associated with many failed locations due to its relation to several key mechanically-related variables (elevation in itself does not appear in the underlying mechanical theory; see earlier discussion). In particular, higher elevations often have steep slopes and are also subject to greater precipitation. Precipitation is a major factor in landslide initiation, as it controls the pore water pressure that is a key component of mechanical theory. The other remaining variables show much less contribution to the discrimination of the grouping variable. Despite the lack of direct soil information included in the discriminant function analysis, a good discrimination for the grouping variable was possible. Soil-related variables are critical in the calculation of shear strength (soil cohesion, angle of internal friction). It may be the case that the specific soil information contained in such variables correlates to some of the other regional predictor variables used in our analysis (elevation, slope gradient etc.).

Several other studies adopted a discriminant analysis approach to studying landsliding occurrence and found different combinations of predictor variables to be the most important. Baeza and Corominas (2001) found that slope gradient, watershed area and land-use were the most important predictor variables, while Santacana et al. (2003) identified slope gradient (easily derived from a DEM) and thickness of the superficial deposits (obtained from field work, so not possible for larger, regional studies) as being the most important variables.

Our analysis provides insights regarding the predictor variables that land managers and other researchers may consider including in future landsliding analyses. That being said, our landsliding inventory does focus on medium to large landslides, and on coastal British Columbia. Therefore, the most important predictor variables may differ for other studies, and our results should be used mainly for guidance purposes. Of particular note is that two of the three most significant variables identified in our study (slope gradient and elevation) are readily retrieved from standard DEM analyses; this is very encouraging from a feasibility perspective. Some of the other variables that are less straightforward to obtain played a less important role in discrimination of the grouping variables.