Introduction

As vegetation restoration plays an increasingly important role in rock slope stabilization and roadside landscape restoration, it has been adopted as an effective measure for controlling erosion and stabilizing rock slopes (Tinker et al. 1998). Rock slope stability is influenced by many factors, including geological structure, development status, soil properties, slope condition, and geographical position. Because these variables are highly correlated, it is difficult to identify the most important one and to determine how it affects the degree of slope stability. Yet roadside slope stability is a crucial issue in ecological restoration. Principal component analysis (PCA) is one of the most important techniques of multivariate statistical analysis. By reducing the number of variables and removing the collinearity of the original data, it transforms many highly correlated variables into a few uncorrelated ones (Thorpe 1988). The basic approach of PCA is to compute the eigenvectors of the covariance matrix and approximate the original data by a linear combination of the leading eigenvectors (Annoni 2007). PCA has proved effective in overcoming the instability caused by ill-conditioned matrices in structural analysis, retaining sufficient information from the original variables and yielding accurate model results (Sousa et al. 2007). PCA has been widely used in the social and natural sciences, including astronomy (Ronen et al. 1998), geography, physics, chemistry, and the life sciences (Fievez et al. 2003), and great progress has been made in data compression, image processing (Calder et al. 2001), visualization, exploratory data analysis, pattern recognition (Doty et al. 1994; Guo et al. 2004), and time series prediction (Berrar et al. 2003). In environmental research, PCA has been used to analyze the relationships between fertilizer and biomass (Morillo et al. 2008; Seabloom et al. 2003), between vegetation and environment (O’Lenic and Livezey 1988; Inger et al. 2008; White and Hood 2004), and between species diversity and site conditions (ter Braak 1983). Recently, a few studies have investigated the stability of roadside slopes using related approaches (ter Braak 1989; Wiser et al. 1996).
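The basic PCA approach just described (eigenvectors of the covariance matrix, with the data approximated by a linear combination of the leading eigenvectors) can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the SPSS procedure used in this study; the function name `pca_approximation` and the toy data are hypothetical.

```python
import numpy as np


def pca_approximation(X: np.ndarray, k: int):
    """Approximate X by its projection onto the k leading eigenvectors of the covariance matrix.

    Returns the rank-k reconstruction, the n x k component scores,
    and the proportion of variance carried by every component.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                               # centre each variable
    cov = np.cov(Xc, rowvar=False)              # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix -> real, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                          # leading k eigenvectors (p x k)
    scores = Xc @ W                             # coordinates on the leading eigenvectors
    X_hat = scores @ W.T + mean                 # linear combination of leading eigenvectors
    explained = eigvals / eigvals.sum()
    return X_hat, scores, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 1))
    # three strongly correlated variables driven by one latent factor plus small noise
    X = np.hstack([latent, 2 * latent, -latent]) + 0.05 * rng.normal(size=(200, 3))
    X_hat, scores, explained = pca_approximation(X, k=1)
    print(explained.round(3))                   # first component carries almost all variance
```

With k equal to the number of variables the reconstruction is exact; smaller k trades accuracy for fewer, uncorrelated variables.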

In this paper, two methods, PCA and hierarchical cluster analysis, are applied to analyze the main factors influencing rock slope stability at a large scale in the Sichuan Basin of southwest China. The resulting models of rock slope stability reveal the relationships among the environmental factors from both qualitative and quantitative perspectives.

Site description

The Sichuan Basin is one of the four major basins of China. Dominated by faults trending NE–SW, it is divided into two distinct parts, the marginal mountains and the basin floor, which together cover about 260,000 km² with a distinctive geomorphology of 7% plains, 52% hills, and 41% low mountains. All sites are distributed in this area between latitudes 29°19′40″–32°15′34″ N and longitudes 102°57′52″–105°33′30″ E (Fig. 1). Field data were collected along roadside slopes at altitudes from 280 to 1,020 m above sea level. Influenced by the westerly circulation and the southwest monsoon, the area has a subtropical climate. The annual average temperature is 17.5°C, with the lowest average monthly temperature ranging from 5°C to 8°C and the highest from 25°C to 29°C. Average annual rainfall ranges from 1,000 to 1,300 mm, and about 75% of it falls from June to October. The natural vegetation is subtropical evergreen broadleaved forest. The dominant rock types are red sandstone and shale, and the major soil type is purple loam.

Fig. 1

Spatial distribution diagram of the 147 roadside slope plots in Sichuan Basin, China

Objects and methods

A total of 147 samples from 80 sites on rock roadside slopes around the basin were established and surveyed in October 2007. Each site extended at least 15 m along the road and 10 m up the slope to meet the minimum sampling scale, and the sample plots were placed in the middle of the investigation area. Plot size depended on the community: 1 m × 1 m for grass, 2 m × 2 m for shrubs, and 5 m × 5 m for trees; 147 plots were established in total. Nine variables were recorded and calculated: slope, aspect, rock type, weathering degree, soil type, soil depth, altitude, latitude, and longitude. The methods and classification standards for the nine variables were as follows. Rock type was classified qualitatively into three types according to rock genesis. Rock weathering degree was quantified by the weathering porosity ratio, namely the ratio of the mass of water absorbed by the weathered rock to the mass of the dry weathered rock, and was then divided into four classes according to this ratio: complete, strong, weak, and slight weathering. Soil texture was measured by the hydrometer method, and soil type was named according to the soil texture classification based on the ratio of physical sand to physical clay. Soil depth (cm) was measured with an earth auger vertically downward from the soil surface to the underlying rock; where the soil was thin, a soil profile was dug and the depth was measured directly. Geographic position and terrain were measured with a portable global positioning system (GPS) receiver and a compass, respectively.
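The weathering-degree measure can be sketched as follows. The study does not report the ratio thresholds that separate the four classes, so the cut points below are hypothetical placeholders supplied by the caller; we also assume that a larger ratio indicates more advanced weathering and that the classes are coded 1–4 as ordinal values before standardization. The function names are illustrative only.

```python
from bisect import bisect_right

# Ordered from least to most weathered; the 1-4 codes are an ASSUMED ordinal coding.
WEATHERING_CLASSES = ("slight", "weak", "strong", "complete")


def weathering_porosity_ratio(absorbed_water_mass: float, dry_mass: float) -> float:
    """Mass of water absorbed by the weathered rock divided by the dry mass of the rock."""
    if dry_mass <= 0:
        raise ValueError("dry_mass must be positive")
    return absorbed_water_mass / dry_mass


def weathering_class(ratio: float, cut_points: tuple[float, float, float]) -> tuple[int, str]:
    """Return (ordinal code 1-4, class name) for a porosity ratio.

    cut_points are HYPOTHETICAL increasing thresholds; the study does not report them.
    Larger ratios are assumed to indicate stronger weathering.
    """
    if len(cut_points) != 3 or list(cut_points) != sorted(cut_points):
        raise ValueError("expected three increasing cut points")
    index = bisect_right(cut_points, ratio)  # number of thresholds <= ratio (0-3)
    return index + 1, WEATHERING_CLASSES[index]


if __name__ == "__main__":
    r = weathering_porosity_ratio(absorbed_water_mass=5.0, dry_mass=100.0)
    print(r, weathering_class(r, cut_points=(0.02, 0.06, 0.10)))
```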

Data analysis

Principal component analysis

All calculations on the nine variables of the 147 roadside slope plots were performed with the Statistical Package for the Social Sciences (SPSS) version 16.0. The main steps were as follows: (1) Standardization of the original data. Because the nine variables differed in data type, format, and unit, it was unreasonable to analyze them directly. The original data were therefore standardized to a common scale with the z-score transformation, producing new variables with a mean of zero and a standard deviation of one, which were saved to the working data file for further analysis. (2) Calculation of the correlation coefficient matrix and significance testing: the Kaiser–Meyer–Olkin (KMO) test of sampling adequacy and Bartlett’s test of sphericity were used to assess the appropriateness of the correlation matrix. (3) Calculation of the eigenvalues, contribution rates, cumulative contribution rates, and eigenvectors. (4) Extraction of the principal components, which are uncorrelated linear combinations of the standardized variables that successively retain the maximum possible variance, construction of the extracted principal component models, and construction of the slope stability evaluation model.
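Steps (1)–(4) can be mirrored outside SPSS. The sketch below, written with NumPy, standardizes the data, forms the correlation matrix, sorts its eigenvalues and eigenvectors, and keeps the components with eigenvalues greater than 1. It is an illustrative re-implementation under our own assumptions (sample standard deviation for the z-scores; hypothetical function name), not the SPSS algorithm itself. Eigenvector signs are arbitrary, so individual coefficients may come out with flipped signs relative to SPSS output.

```python
import numpy as np


def pca_on_correlation(X: np.ndarray):
    """PCA of the correlation matrix with Kaiser-Guttman extraction (eigenvalue > 1).

    X: (n_samples, n_variables) array of raw values.
    Returns eigenvalues, contribution rates, cumulative rates, retained eigenvectors, scores.
    """
    # (1) z-score standardization: mean 0, sample standard deviation 1 (assumed ddof=1)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) correlation matrix of the variables (equal to the covariance matrix of Z)
    R = np.corrcoef(Z, rowvar=False)
    # (3) eigen-decomposition; eigh returns ascending order, so reverse it
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()   # the eigenvalues sum to the number of variables
    cumulative = np.cumsum(contribution)
    # (4) keep components whose eigenvalue exceeds 1 and compute their scores
    k = int(np.sum(eigvals > 1.0))
    retained = eigvecs[:, :k]                # columns are unit eigenvectors (signs arbitrary)
    scores = Z @ retained
    return eigvals, contribution, cumulative, retained, scores
```

Each retained score column has a sample variance equal to its eigenvalue, which is why the eigenvalue measures how much standardized variance a component carries.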

Hierarchical clustering

Hierarchical clustering refers to the formation of a recursive clustering of the data points: a partition into two clusters, each of which is itself hierarchically clustered. It can be used to evaluate the relationships among environmental stressors of chemical and physical origin acting on plant communities (Lipkovich et al. 2008). Here, the standardized variables were clustered hierarchically using the Pearson correlation as the similarity measure, so that the variables were grouped according to their mutual correlations.
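The clustering of variables described above can be sketched with SciPy. We assume a distance of 1 − r between two variables, so that strongly positively correlated variables are close and negatively correlated ones are far apart, together with complete linkage, the linkage named for Fig. 2. SPSS's internal handling of the Pearson similarity may differ in detail, and `cluster_variables` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(Z: np.ndarray, n_clusters: int):
    """Complete-linkage clustering of the columns (variables) of Z.

    Distance between two variables is 1 - Pearson r (ASSUMED transform of the similarity),
    so r = 1 gives 0 and r = -1 gives 2.
    Returns the linkage matrix and one flat-cluster label per variable.
    """
    R = np.corrcoef(Z, rowvar=False)
    D = 1.0 - R
    D = (D + D.T) / 2.0                 # enforce exact symmetry
    np.fill_diagonal(D, 0.0)            # a variable is at distance 0 from itself
    link = linkage(squareform(D, checks=False), method="complete")
    labels = fcluster(link, t=n_clusters, criterion="maxclust")
    return link, labels
```

Passing the resulting linkage matrix to `scipy.cluster.hierarchy.dendrogram` draws a tree comparable in form to Fig. 2.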

Results

Correlation analysis

Based on the standardized variables, the correlation matrix of the 147 samples was computed. Before conducting principal component analysis, Bartlett’s test of sphericity and the KMO test of sampling adequacy were performed to confirm that PCA was appropriate (Sousa et al. 2007). Bartlett’s test of sphericity was significant (p < 0.001), indicating that the correlation matrix differed significantly from an identity matrix and was therefore suitable for principal component analysis. The KMO measure takes values between 0 and 1; small values indicate that a factor analysis of the variables may not be appropriate, and values higher than 0.5 are considered satisfactory for principal component analysis (Norusis 1990). In this study the KMO value was 0.726, indicating that the sampling was adequate for principal component analysis. Both tests therefore supported the use of PCA. Table 1 shows that many of the standardized variables were relatively strongly correlated with one another. Both negative and positive correlations were observed, and the largest correlation coefficient, 0.912, was between soil depth and soil type.
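Both adequacy tests can be computed directly from a correlation matrix. The sketch below uses the standard Bartlett statistic χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO measure built from anti-image (partial) correlations. The function names are hypothetical, and this is not the SPSS implementation, so rounding may differ slightly.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(R: np.ndarray, n: int):
    """Bartlett's test of H0: the p x p correlation matrix R (from n samples) is an identity matrix."""
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("R must be positive definite")
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return statistic, df, chi2.sf(statistic, df)


def kmo(R: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for correlation matrix R."""
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)         # anti-image (partial) correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting off-diagonal elements
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + a2))
```

KMO approaches 1 when partial correlations are small relative to the ordinary correlations, that is, when the variables share common components.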

Table 1 Correlation coefficient of the standardized variables

Principal component analysis

The correlation matrix of the nine standardized variables was subjected to principal component analysis, a procedure that, although similar to factor analysis, does not suffer from the factor indeterminacy of factor analysis. The most common stopping rule in PCA retains the components whose eigenvalues exceed the average eigenvalue, which equals 1.0 for a correlation matrix (the Kaiser–Guttman criterion; Guttman 1954; Cliff 1988; Jackson 1993). According to this rule, the first four principal components, with eigenvalues >1.00, were extracted; together they accounted for 75.552% of the standardized variance of the data. The first principal component had an eigenvalue of 2.782 and explained 30.914% of the standardized variance; rock type, weathering degree, soil type, and soil depth were its major contributing variables. The second principal component had an eigenvalue of 1.851 and explained 20.563%, with latitude and longitude as the major contributing variables. The third principal component had an eigenvalue of 1.111 and explained 12.349%, with altitude as the major contributing variable. Aspect and slope were the important variables of the fourth principal component, which had an eigenvalue of 1.055 and explained 11.726%. Together, the four retained components covered all nine standardized variables, indicating that principal component analysis was effective in rock slope stability analysis. Based on the eigenvector coefficients of the standardized variables, the four principal components were named as follows: the first as the parent material factor, the second and third as geographical factors, and the fourth as the terrain factor.
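The link between eigenvalues and contribution rates follows from the trace of the correlation matrix R, whose diagonal elements all equal 1. With p standardized variables:

$$ \sum_{j=1}^{p}\lambda_j = \operatorname{tr}(\mathbf{R}) = p, \qquad \bar{\lambda} = \frac{1}{p}\sum_{j=1}^{p}\lambda_j = 1, \qquad c_k = \frac{\lambda_k}{p} $$

For p = 9, c₁ = 2.782/9 ≈ 0.3091, which agrees with the reported 30.914% up to rounding of the eigenvalue.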

By definition, each principal component is a linear combination of the nine standardized variables weighted by the elements of its eigenvector; the four resulting linear combinations are given in Table 2. The eigenvector of each principal component was calculated from the component coefficients (loadings) of the standardized variables. Accordingly, the z-score functions of the four extracted principal components were as follows:

$$ \begin{gathered} F_{1} = \, - 0.279\; \times \;Zx_{1} + \, 0.071\; \times \;Zx_{2} - \, 0.439\; \times \;Zx_{3} + \, 0.377\; \times \;Zx_{4} + \, 0.525\; \times \;Zx_{5} + \, 0.549\; \times \;Zx_{6} + \hfill \\ 0.041\; \times \;Zx_{7} - \, 0.068\; \times \;Zx_{8} - \, 0.014\; \times \;Zx_{9} \hfill \\ \end{gathered} $$
$$ \begin{gathered} F_{2} = \, - 0.08\; \times \;Zx_{1} - \, 0.102\; \times \;Zx_{2} + \, 0.065\; \times \;Zx_{3} + \, 0.049\; \times \;Zx_{4} + \, 0.049\; \times \;Zx_{5} + \, 0.019\; \times \;Zx_{6} + \hfill \\ 0.373\; \times \;Zx_{7} + \, 0.700\; \times \;Zx_{8} + \, 0.587\; \times \;Zx_{9} \hfill \\ \end{gathered} $$
$$ \begin{gathered} F_{3} = \, 0.217\; \times \;Zx_{1} + \, 0.464\; \times \;Zx_{2} - \, 0.036\; \times \;Zx_{3} + \, 0.147\; \times \;Zx_{4} - \, 0.100\; \times \;Zx_{5} - \, 0.039\; \times \;Zx_{6} + \hfill \\ 0.712\; \times \;Zx_{7} + \, 0.078\; \times \;Zx_{8} - \, 0.435\; \times \;Zx_{9} \hfill \\ \end{gathered} $$
$$ \begin{gathered} F_{4} = \, 0.518\; \times \;Zx_{1} + \, 0.602\; \times \;Zx_{2} + \, 0.253\; \times \;Zx_{3} - \, 0.100\; \times \;Zx_{4} + \, 0.282\; \times \;Zx_{5} + \, 0.222\; \times \;Zx_{6} - \hfill \\ 0.306\; \times \;Zx_{7} + \, 0.044\; \times \;Zx_{8} + \, 0.268\; \times \;Zx_{9} \hfill \\ \end{gathered} $$

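where Zx_i denotes the z-score (standardized value) of the i-th variable x_i (i = 1, …, 9), with the variables numbered in the order listed under Objects and methods: slope (x_1), aspect (x_2), rock type (x_3), weathering degree (x_4), soil type (x_5), soil depth (x_6), altitude (x_7), latitude (x_8), and longitude (x_9).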
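The four score functions can be evaluated directly with the coefficients reported above. In this sketch each row of `COEFFS` holds one component; a useful sanity check is that each row has approximately unit length, as expected of eigenvectors. The helper `loadings_to_eigenvectors` illustrates the standard relation eigenvector = loading / √λ; whether the study's SPSS output was converted this way is our assumption, and all names are hypothetical.

```python
import numpy as np

# Coefficients of F1-F4 from the score functions above (columns: Zx1 ... Zx9).
COEFFS = np.array([
    [-0.279, 0.071, -0.439, 0.377, 0.525, 0.549, 0.041, -0.068, -0.014],
    [-0.080, -0.102, 0.065, 0.049, 0.049, 0.019, 0.373, 0.700, 0.587],
    [0.217, 0.464, -0.036, 0.147, -0.100, -0.039, 0.712, 0.078, -0.435],
    [0.518, 0.602, 0.253, -0.100, 0.282, 0.222, -0.306, 0.044, 0.268],
])
# Eigenvalues of F1-F4 reported in the principal component analysis.
EIGENVALUES = np.array([2.782, 1.851, 1.111, 1.055])


def component_scores(z: np.ndarray) -> np.ndarray:
    """F1..F4 for standardized observations z of shape (9,) or (n, 9)."""
    return np.asarray(z, dtype=float) @ COEFFS.T


def loadings_to_eigenvectors(loadings: np.ndarray, eigenvalues: np.ndarray) -> np.ndarray:
    """Convert component loadings (rows = components) to unit eigenvector coefficients."""
    return loadings / np.sqrt(eigenvalues)[:, None]


if __name__ == "__main__":
    print(np.linalg.norm(COEFFS, axis=1).round(3))  # each close to 1
```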

Table 2 Eigenvectors of the four principal components for the standardized variables

Each of the first four principal components was weighted by its eigenvalue divided by the sum of the four retained eigenvalues (2.782 + 1.851 + 1.111 + 1.055 = 6.799), giving weights of 0.4092, 0.2722, 0.1634, and 0.1552. The slope stability evaluation model was expressed as

$$ F = 0.4092\; \times \;F_{1} \, + \, 0.2722\; \times \;F_{2} + \, 0.1634\; \times \;F_{3} + \, 0.1552\; \times \;F_{4} . $$
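The weights and the composite score can be reproduced from the reported eigenvalues. The sketch below recomputes the weights (eigenvalue over the sum of retained eigenvalues) and applies the evaluation model to a vector or matrix of component scores F1–F4; the function names are hypothetical.

```python
import numpy as np

EIGENVALUES = np.array([2.782, 1.851, 1.111, 1.055])   # F1-F4, as reported
PUBLISHED_WEIGHTS = np.array([0.4092, 0.2722, 0.1634, 0.1552])


def component_weights(eigenvalues: np.ndarray) -> np.ndarray:
    """Weight of each retained component: its eigenvalue over the sum of retained eigenvalues."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return eigenvalues / eigenvalues.sum()


def stability_score(component_scores: np.ndarray, weights: np.ndarray = PUBLISHED_WEIGHTS) -> np.ndarray:
    """Composite F = sum_k w_k * F_k for scores of shape (4,) or (n, 4)."""
    return np.asarray(component_scores, dtype=float) @ weights


if __name__ == "__main__":
    print(component_weights(EIGENVALUES).round(4))   # matches PUBLISHED_WEIGHTS after rounding
```

Dividing the contribution rates by the cumulative 75.552% is mathematically equivalent, but with the rounded published figures it gives 0.1635 rather than 0.1634 for the third weight, so the eigenvalues are the closer match to the reported values.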

Hierarchical clustering

Hierarchical clustering of the standardized variables, based on Pearson correlation, was performed in SPSS 16.0. Among the nine standardized variables, four positive and three negative Pearson correlations were observed in the clustering. The correlation between rock type and soil type was the largest positive correlation (0.680), whereas slope and aspect showed the largest negative correlation (−0.557). At a rescaled distance of 17.5–20.0, the nine standardized variables were classified into three categories: geographic location, parent material, and terrain factors. Category I included four variables: aspect, weathering degree, soil type, and soil depth. Category II included two variables: slope and rock type. Category III included altitude, latitude, and longitude. The complete-linkage dendrogram is shown in Fig. 2.
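Cutting a dendrogram at a rescaled distance can be sketched as follows. SPSS draws dendrograms on a rescaled distance axis running from 0 to 25; here we assume a simple linear mapping of the merge heights onto that range, which may not match SPSS's exact rescaling. The helper names and the four-variable distances in the example are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def rescale_heights(link: np.ndarray, top: float = 25.0) -> np.ndarray:
    """Map merge heights linearly onto [0, top] (an ASSUMED form of the rescaled distance)."""
    h = link[:, 2]
    span = h.max() - h.min()
    if span == 0:
        raise ValueError("all merges occur at the same height")
    return top * (h - h.min()) / span


def cut_at_rescaled(link: np.ndarray, rescaled_cut: float, top: float = 25.0) -> np.ndarray:
    """Flat cluster labels from cutting the tree at a given rescaled distance."""
    h = link[:, 2]
    raw_cut = h.min() + rescaled_cut / top * (h.max() - h.min())
    return fcluster(link, t=raw_cut, criterion="distance")


if __name__ == "__main__":
    # Hypothetical condensed distances for four variables: pairs (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)
    link = linkage(np.array([0.1, 1.0, 1.0, 1.0, 1.0, 0.2]), method="complete")
    print(rescale_heights(link))
    print(cut_at_rescaled(link, rescaled_cut=17.5))
```

Cuts anywhere in the 17.5–20.0 band give the same partition whenever no merge falls inside that band.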

Fig. 2

Dendrogram of the standardized variables from complete-linkage hierarchical clustering

Discussion and conclusions

Principal component analysis has proved to be an exceedingly popular technique for dimensionality reduction (Tipping and Bishop 1999). Applied to the stability analysis of rock roadside slopes, PCA serves a similar function: it identifies a limited number of components that represent the complex, interrelated factors of roadside slopes in a form suitable for stability evaluation. Two methods were used to analyze the correlations of the standardized data. Principal component analysis extracted four principal components. The first principal component, which comprised rock type, soil type, weathering degree, and soil depth, was the most important, with a contribution rate of 30.914%; it represents the parent material factor influencing roadside slope stability. In the second and third components, latitude, longitude, and altitude were the important variables, reflecting geographical position. The fourth principal component represented the terrain factors, slope and aspect. Although four principal components were formed, they belong to three factors: the parent material factor, the geographical factor, and the terrain factor. It was concluded that the parent material factor was the most important component influencing rock slope stability, and that geographical position was the second most important factor.

Some differences exist between the principal component analysis and the hierarchical clustering of the standardized variables. In PCA, four variables made the largest contributions to the first component: rock type, weathering degree, soil type, and soil depth. In the hierarchical clustering, however, only three of them fell into the same cluster: weathering degree, soil type, and soil depth. Conversely, altitude, latitude, and longitude belonged to the same cluster in the hierarchical clustering, whereas PCA split them between two principal components. The difference may be explained by how the two methods treat the correlations: the PCA components were interpreted from the absolute values of the coefficients, whereas the hierarchical clustering used the signed correlations, taking direction into account as well as magnitude. Both analyses nevertheless classified the environmental factors influencing roadside slope stability into three groups; hierarchical clustering provided a qualitative grouping of the factors, while PCA provided both qualitative and quantitative descriptions of rock slope stability.