Introduction

Debris flows, mostly triggered by intense rainfall (Costa 1984; Rickenmann 1999), are common and potentially hazardous geomorphic processes in mountainous areas (Jan and Shen 1997; Liu et al. 2002; He et al. 2003). In a torrent watershed, debris flows can produce very high peak discharges and consequently pose a severe hazard to local residents and their property. Three key factors are identified as the major contributors to debris flows: steep topography, loose terrane and soil, and high-intensity flow over a short period. In southwestern China, extensive mountainous terrain combined with a semitropical climate characterized by intense rainfall and flows makes debris flows very likely (Tan 1994). Their frequent occurrence poses a critical threat to regional residents and economic development, and significant attention has therefore been given to the research and management of debris flows in southwestern China.

Systematic observation and research on debris flows have been carried out since the 1970s. However, research on regional hazard assessment for debris flows is still at an early stage. Generally, the hazard of debris flow (HDF), defined as the magnitude and frequency of occurrence of debris flows, is used to assess the risk of debris flows at a regional scale (Liu and Lei 2003). Because the HDF reflects the potential threat of debris flows to a region (Deyle et al. 1998; Liu and Lei 2003), it can be used to support management decisions for preventing debris flows.

Previous studies have proposed several methods and models to assess the HDF. Rickenmann and Beurteilung (1995) presented a two-step method in which the probability of occurrence of a debris flow event in a torrent catchment is determined first, and the most important parameters for the hazard assessment are then estimated quantitatively.

Geographic information system (GIS) technology has been introduced into the assessment of debris flows in recent years. For example, Lin et al. (2000, 2002) applied GIS techniques to assess the HDF in the Chen-Yu-Lan River Watershed, Taiwan, China. Bisson et al. (2002) applied GIS to delineate the HDF in southern Italy. He et al. (2003) developed GIS-based hazard mapping and zonation of debris flows in Xiaojiang Basin, southwestern China.

Liu (2002) proposed a multi-factor composite assessment model (MFCAM) for regional HDF assessment. In the model, a hazard index (HI) is represented mathematically as a linear combination of all the contributing factors and is used to derive site-specific regional hazard degree (RHD) values. The weight of each factor is obtained from the subjective judgment of experts and stakeholders (Liu and Zhang 2004). In general, the weights in site-specific MFCAM models vary between study areas because of spatial variability in debris flow patterns and differences in expert opinion. The calculated HI value is then mapped to the RHD scale to assess the degree of hazard. Because the values of the key contributing factors are directly measurable, this approach offers a practical way to assess the HDF of a debris-flow-affected area from the available data (Liu 2002; Liu and Lei 2003; Liu and Zhang 2004). Compared with the previous methods, MFCAM is a more appropriate model for assessing regional HDF because it takes the inter-correlated factors into account and can reflect the potential threat of debris flows at a regional scale.

Although linear models are widely used for HDF assessment, they are subject to significant limitations because debris flows are essentially governed by nonlinear processes. In addition, the performance of all the aforementioned linear models relies heavily on site-specific weights, while no reliable method other than subjective judgment is available for deriving these weights accurately. Furthermore, the linear models obtain RHD values by mapping continuous HI values to a discrete space and use the resulting RHD values to assess the risk of debris flow. This mapping can lead to unreasonable results. Two HI values that fall just on either side of a rank boundary represent nearly the same condition in the continuous HI system, yet they are arbitrarily differentiated in the discrete RHD system, which is rather subjective and makes little practical sense. In such a mapping process, even a minor uncertainty in the weight values can lead to completely different assessment results from the same linear model, making decisions based on it unreliable and risky.
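The boundary effect of mapping a continuous index to discrete ranks can be illustrated with a small sketch. The factor values, the two weight sets, and the equal-width rank boundaries below are hypothetical and are not taken from Liu (2002); they only show how a small shift in expert weights can move a site across a rank boundary.

```python
import numpy as np

# Hypothetical normalized factor values for one site (not from the paper).
factors = np.array([0.70, 0.55, 0.60, 0.50])

# Hypothetical, equal-width rank boundaries on the continuous HI scale.
BOUNDARIES = [0.2, 0.4, 0.6, 0.8]
RANKS = ["I", "II", "III", "IV", "V"]


def hazard_index(weights, values):
    """Linear composite index: weighted sum with weights normalized to 1."""
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights / weights.sum(), values))


def to_rank(hi):
    """Map a continuous HI value to a discrete rank."""
    return RANKS[int(np.searchsorted(BOUNDARIES, hi, side="right"))]


# Two hypothetical weight sets that differ only slightly.
w_a = [0.40, 0.20, 0.20, 0.20]
w_b = [0.30, 0.20, 0.20, 0.30]

hi_a = hazard_index(w_a, factors)
hi_b = hazard_index(w_b, factors)
print(hi_a, to_rank(hi_a))
print(hi_b, to_rank(hi_b))
```

Under these assumed numbers the two index values differ by about 0.02, yet they fall into different ranks.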

In practice, the site-specific RHD values for a threatened area can be conveniently represented using a graded system, i.e., the hazard-degree rank (HDR). A five-level HDR system, consisting of ranks I, II, III, IV, and V, has previously been used to represent areas threatened by debris flows (Liu 2002). Level I represents very low HDF and level V represents very high HDF. HDR is a comprehensive indicator of the potential risk of debris flow hazard, and it is therefore generally impractical to measure the RHD value of a specific area directly. For decision makers and the public, HDR is easier to understand than HI and more convenient for establishing adaptive countermeasures.

A nonlinear model that directly relates the HDR to several key contributing variables can be used to avoid the limitations of the linear models. Curve fitting is the most frequently used technique for deriving quantitative relationships between dependent and independent variables (Kartal and Ozer 1998; Irish et al. 1998). This method assumes that a certain kind of functional relationship, such as linear, polynomial, or exponential, exists between the dependent and independent variables. However, no such knowledge is available for the relationship between HDR and its contributing variables, so curve fitting is not a proper way to represent this complicated relationship. An alternative to curve fitting is the neural network (NN) technique. The NN technique is one of the most advanced soft computing technologies and has been widely used in pattern identification, dynamic system prediction, control, and optimization (Fu 1994). Some studies have indicated that the NN technique is very powerful in functional mapping (Luger 2002; Zou et al. 2002). Theoretically, an NN with a proper structure is able to map any complicated functional relationship between dependent and independent variables without the need for a preconceived functional form.

This paper proposes a new hazard assessment approach that uses an NN model as the functional mapper to represent the nonlinear relationship between HDR and its key contributing variables. The NN model is developed from data collected in previously reported studies, and the weight of each node in the network is derived through the network training process. An uncertainty analysis is implemented to demonstrate the effect of different network training data on the network performance. The trained network is applied to assess the HDF, represented by HDR, in the Lake Qionghai Watershed, China, and the results are interpreted in the context of risk assessment. The HDF assessment results provide important information to support decision making for regional debris flow management.

The study area

Lake Qionghai, with an area of 27.88 km² at a water-surface elevation of 1,509.8 m and a drainage basin of 307.67 km², is situated in the northeastern portion of Xichang City, Sichuan Province, China (Fig. 1). Its storage capacity is 2.891 × 10⁸ m³. Lake Qionghai provides water and recreational resources for local development.

Fig. 1 Lake Qionghai Watershed in southwestern China and the administrative zonation: a location map of Lake Qionghai Watershed in China; b the rivers and administrative zonation of the Lake Qionghai Watershed and adjacent area

Lake Qionghai Watershed lies in an area with a high occurrence of debris flows (Tan 1994). From the 1950s to the 2000s, many debris flows occurred in the adjacent areas and caused severe damage. Typical events, from the Database for Natural Disasters in Yangtze River Basin (http://www.changjiang.whlib.ac.cn/cjsjk/zhk/default.asp), include the debris flows in Upriver Reshui River of Xichang City on June 28, 1957, East Dyke of Xide County on July 10, 1988, Laqing Dyke of Zhaojue County on September 4, 1989, Suburb of Dechang City on July 19, 1992, and Caiazu Gully in Puge County on June 20, 2003. It is therefore urgent to analyze the potential RHD of the study area. The frequent debris flows are a direct result of local natural and social conditions, such as topography, geology, soil, climate, stream networks, and human pressure.

Rugged, mountainous topography is the driving force for debris flows. Mountains cover more than 80% of Lake Qionghai Watershed, producing large differences in relief. The maximum elevation difference is 1,568 m, which makes the area prone to floods.

Lake Qionghai Watershed is located on the boundary of two fault zones, which results in frequent earthquakes, metamorphic mountains, and fragmented rocks. The latest earthquake occurred on June 17, 2003. Strong weathering is another source of fragmented rock: the local weathering depths of metamorphic sandstone and mudstone are 1–3 and 3–8 m, respectively. These are subsidiary driving factors for debris flows. Deposits from collapses and landslides, together with those of past and ongoing debris flows, are the main source of solid material for future debris flows. Erosion of loose soil provides further solid material for debris flows in the watershed (Fig. 2).

Fig. 2 Percentage of various soil erosion types in Lake Qionghai Watershed. Note: The numbers in brackets are the corresponding percentages of each soil erosion type

The climate of Lake Qionghai Watershed is typically semitropical and upland in character, with a clear distinction between the wet and dry seasons. The annual average precipitation is 1,004.3 mm, and 92.8% of the precipitation occurs in June, August, and September. Such concentrated precipitation can induce debris flows. Rock weathering is often promoted by temperature change and sunshine. In the study area, the annual average temperature is 17.1°C, and the extreme minimum and maximum temperatures are −5 and 39.7°C. The annual sunshine duration is 2,431.4 h, exceeding the average level in China.

A dense network of rivers and streams is a further driving force for debris flows (Fig. 1). In the study area, the stream density is 0.68 per km². There are four main rivers in the area: Guanba River, Ezhang River, Qing River, and Gangou River. The average slope is 27.14‰.

Human pressure on vegetation and land is another crucial factor for debris flows. Almost all the primary forests have been destroyed, and planted Pinus yunnanensis is the dominant tree species, accounting for 90% of the total forest area (Liu et al. 2005). Single-species forests are less effective for debris flow prevention than primary forests (Xie 2002).

Based on an analysis of the existing assessment methods, an NN model for assessing RHD in Lake Qionghai Watershed is presented, taking into account both the natural and the human-induced factors for debris flows.

Methods

Field investigations, monitoring, and literature review are the basic data sources for regional debris flow hazard assessment. Based on case studies and analysis, eight variables are selected as the contributing factors for HDR: density of debris flow gully (DDFG), degree of weathering of rocks (DWR), active fault density (AFD), area percentage of slope land greater than 25° of the total land (APL25), frequency of flooding hazards (FFH), average covariance of monthly precipitation by 10 years (ACMP10), average days with rainfall >25 mm by 10 years (25D10Y), and percentage of cultivated land with slope land greater than 25° of the total cultivated land (PCL25) (Liu 2002; Liu et al. 2002). They are taken as the input factors for the NN model in this paper.

As discussed above, weight selection, nonlinearity, and uncertainty are the main obstacles to further application of MFCAM. An NN model is therefore used as an alternative to MFCAM, based on data collected in previously reported studies (Liu and Tang 1995; Liu and Lei 2003; Liu and Zhang 2004).

Neural network theory has been successfully applied in many fields, such as prediction, classification (Lee and Lee 1996; Liu and Guo 2004), landslide studies (Lee et al. 2003), and geo-engineering (Neaupane and Achet 2003), because of its applicability, ability to learn, adaptability, generalizability, and ease of implementation (Seginer 1997; Ferhat 2001).

Learning and recalling are the two steps of an NN, and a network usually consists of an input layer, a hidden layer, and an output layer, each made up of nodes (Beale and Jackson 1990). Nodes are the basic elements of the layers, and data are transferred between layers through the nodes. The input and output layers are each usually a single layer, while there can be more than one hidden layer. The numbers of nodes in the input and output layers are determined by the input and output data. The number of hidden layers and hidden nodes is determined iteratively by varying the number of nodes until optimal performance is achieved. Learning (training), validation, and application are the main stages of an NN model. In the learning phase, known data sets are used as training signals in the input and output layers (Lee and Lee 1996). The data enter the input layer through its nodes and then pass to the hidden layer and the output layer via the weight structure of the NN. Once the learning stage is finished, the trained network can be used for actual applications (Philippe 1997).

The back-propagation (BP) algorithm, introduced by Rumelhart et al. (1986), is widely used for NNs. Error backpropagation is the key process in the BP model, and the error is defined as follows:

$$ E = \frac{1}{2}\sum \left(\delta_{j}^{k}\right)^{2} = \frac{1}{2}\sum \left(a_{j}^{k} - c_{j}^{k}\right)^{2}, $$
(1)

where $E$ is the error; $a_{j}^{k}$ is the actual known value; $c_{j}^{k}$ is the network output; and $\delta_{j}^{k}$ is the difference between the actual value and the output.
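As a minimal sketch, the error of Eq. 1 can be computed as half the sum of squared differences between known values and network outputs. The function name and the sample numbers are illustrative assumptions, not data from the study.

```python
import numpy as np


def sum_squared_error(actual, output):
    """E = 1/2 * sum((a - c)^2), summed over all output nodes and samples."""
    delta = np.asarray(actual, dtype=float) - np.asarray(output, dtype=float)
    return 0.5 * float(np.sum(delta ** 2))


# Illustrative values only: three known targets and three network outputs.
E = sum_squared_error([0.8, 0.6, 0.4], [0.7, 0.6, 0.5])
print(E)
```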

During model training, the network passes the input data to the nodes of the hidden layer and then to the output layer. The BP algorithm is a form of supervised learning because the network is corrected by the errors between its outputs and the known values. If these errors are smaller than the expected errors, called error goals, training is finished. Otherwise, the error information is backpropagated through the hidden layer toward the input layer to adjust the weights further until the error goals are met (Fu 1994; Luger 2002).
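The training loop described above can be sketched with plain gradient descent on a small network. This is a simplified illustration, not the configuration used in the study: the sigmoid activations, learning rate, network size, error goal, and toy data are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, a, n_hidden=3, lr=0.5, error_goal=1e-3, max_epochs=5000, seed=0):
    """Train a one-hidden-layer network by error backpropagation.

    Stops when E = 1/2 * sum((a - c)^2) drops below error_goal
    or when max_epochs is reached. Returns the weights and the error history.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(max_epochs):
        # Forward pass: input -> hidden -> output.
        h = sigmoid(X @ W1 + b1)
        c = sigmoid(h @ W2 + b2)
        delta = a - c
        E = 0.5 * float(np.sum(delta ** 2))
        history.append(E)
        if E < error_goal:
            break
        # Backward pass: propagate the error and adjust the weights.
        grad_out = -delta * c * (1.0 - c)
        grad_hid = (grad_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        W1 -= lr * (X.T @ grad_hid)
        b1 -= lr * grad_hid.sum(axis=0)
    return (W1, b1, W2, b2), history


# Toy data (hypothetical): two inputs, one target per sample.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
a = np.array([[0.1], [0.5], [0.5], [0.9]])
params, history = train_bp(X, a)
print(len(history), history[0], history[-1])
```

The loop makes the stopping rule explicit: training ends as soon as the error falls below the goal, otherwise the weights keep being adjusted from the backpropagated error.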

The major objective in NN design and training is to produce networks that generalize their behavior correctly to new, as yet unforeseen inputs (Robert 1997). Validation is a useful way to judge whether a trained network generalizes. Data that are never used in training the network can be taken as an unbiased measure of generalization capability. The sample data from the known case studies are therefore divided into two groups, one for model parameter estimation and the other for model performance assessment, also called performance validation. Robert (1997) suggests using 80% of the data for training and the remaining 20% for validation. The difference between the observed values and the network outputs for the 20% validation data can be used as a measure of the generalization capability of the trained network.

Assessment of the hazard degree of debris flow in Lake Qionghai Watershed

Neural network model foundation

To fully assess the RHD of debris flows in Lake Qionghai Watershed, the study area is partitioned into five sub-watersheds in terms of stream networks, geology, physiognomy, and administrative boundaries (Fig. 3, Table 1).

Fig. 3 The five sub-watersheds in Lake Qionghai Watershed

Table 1 The area and eight variables of five sub-watersheds in Lake Qionghai Watershed

Training and validation data are the foundation of the NN model. Thirty-five groups of data from the literature are selected as the training and validation data for the NN-based model to assess the RHD of debris flows in Lake Qionghai Watershed. According to Robert (1997), 80% of the groups are used as training data and the remainder as validation data to judge the generalization of the trained network (Table 2).

Table 2 The training data of NN model to assess the hazard of debris flow in Lake Qionghai Watershed

Neural network model structure and calculation

There are four steps in the NN model:

Data transformation and processing

Data processing is needed because the eight variables have different dimensions and are not suitable for direct input to the model. The common approach is to transform the data to values between 0 and 1. For $y_{i}$ ($i = 1, 2, \ldots, n$):

$$ Y_{i} = \frac{y_{i} - y_{\min}}{y_{\max} - y_{\min}}, $$
(2)

where $Y_{i}$ is the transformed value of $y_{i}$, and $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of $y_{i}$, respectively.
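A minimal sketch of the transformation in Eq. 2, applied column by column so that each variable is scaled by its own minimum and maximum. The sample matrix is hypothetical, and the guard for constant columns is an added assumption that the text does not discuss.

```python
import numpy as np


def min_max_scale(y):
    """Apply Eq. 2 to each column: Y = (y - y_min) / (y_max - y_min)."""
    y = np.asarray(y, dtype=float)
    y_min = y.min(axis=0)
    y_max = y.max(axis=0)
    # Assumption: a constant column is mapped to 0 instead of dividing by zero.
    span = np.where(y_max > y_min, y_max - y_min, 1.0)
    return (y - y_min) / span


# Hypothetical raw values: three samples, two variables with different units.
raw = np.array([[0.5, 30.0],
                [1.5, 10.0],
                [1.0, 20.0]])
scaled = min_max_scale(raw)
print(scaled)
```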

Analysis of training data

The training data come from the literature and case studies, and there may be some inconsistencies among them. Analyzing the training data helps to eliminate improper data and thereby enhances the precision of the NN model. Clustering and correlation analysis are commonly used for this purpose (Bouguettaya and Le 1998). According to the clustering results, measured by squared Euclidean distance using the software SPSS (Alistair et al. 2002), the distances among the groups are small except for No. 25, which indicates that the other 34 groups are in good consistency (Yuan 1999). No. 25 in Table 2 is therefore eliminated.
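The screening idea can be sketched with pairwise squared Euclidean distances. This is a simplified stand-in for the SPSS cluster analysis used in the study: the flagging rule (mean distance above twice the median) and the sample values are assumptions chosen only for illustration.

```python
import numpy as np


def squared_euclidean_distances(X):
    """Matrix of squared Euclidean distances between all rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=2)


def flag_outliers(X, factor=2.0):
    """Return indices of rows whose mean distance to the other rows
    exceeds factor times the median of those means (hypothetical rule)."""
    d = squared_euclidean_distances(X)
    mean_dist = d.sum(axis=1) / (d.shape[0] - 1)
    return np.where(mean_dist > factor * np.median(mean_dist))[0]


# Hypothetical normalized samples; the fourth row is deliberately inconsistent.
samples = np.array([[0.50, 0.52, 0.48],
                    [0.51, 0.50, 0.49],
                    [0.49, 0.51, 0.50],
                    [0.95, 0.05, 0.90],
                    [0.50, 0.49, 0.51]])
outliers = flag_outliers(samples)
print(outliers)
```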

Construction and training of NN network

Selection of the network structure and parameters is crucial to the NN. Based on experience and the characteristics of debris flow hazard, the NN used in this paper consists of three layers. The numbers of nodes in the input and output layers are 8 and 1, respectively. The number of nodes in the hidden layer is determined by repeated computation and comparison of the number of training epochs. With six hidden nodes, the network requires the fewest epochs and the least computing time, so the model structure is set at 8–6–1 (Fig. 4).

Fig. 4 The NN model structure used to assess the hazard degree of debris flow in Lake Qionghai Watershed
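The 8–6–1 structure can be sketched as a forward pass through two weight matrices. The activation functions (tanh in the hidden layer, linear output), the random initialization, and the names below are assumptions, since the text does not specify them.

```python
import numpy as np

N_INPUT, N_HIDDEN, N_OUTPUT = 8, 6, 1


def init_network(seed=0):
    """Create randomly initialized weights and zero biases for an 8-6-1 network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (N_INPUT, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.5, (N_HIDDEN, N_OUTPUT)),
        "b2": np.zeros(N_OUTPUT),
    }


def forward(net, x):
    """Forward pass: assumed tanh hidden layer and linear output node."""
    hidden = np.tanh(x @ net["W1"] + net["b1"])
    return hidden @ net["W2"] + net["b2"]


net = init_network()
x = np.full((1, N_INPUT), 0.5)   # one sample of eight normalized variables
y = forward(net, x)
n_params = sum(v.size for v in net.values())
print(y.shape, n_params)
```

With biases included, a network of this shape has 8×6 + 6 + 6×1 + 1 = 61 adjustable parameters.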

The model is developed in MATLAB (Wen et al. 2000) with the Levenberg–Marquardt algorithm (Marquardt 1963). The error goal is set at 0.00001. According to Robert (1997), 34 groups of sample data are not sufficient for an 8–6–1 model. The bootstrap method and interpolation are traditional ways to enlarge a sample, and the bootstrap method is applied in this paper. The data are randomly rearranged 200 times, giving a data collection of 34×200 groups. In each arrangement, the first 28 groups are taken as training data and the remaining 6 for validation.
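The resampling step can be sketched as 200 random rearrangements of the 34 groups, each split into the first 28 for training and the remaining 6 for validation. The sketch follows the description in the text (rearrangement without replacement); the seed and function name are assumptions.

```python
import numpy as np

N_GROUPS, N_RUNS, N_TRAIN = 34, 200, 28


def random_arrangements(n_groups=N_GROUPS, n_runs=N_RUNS, n_train=N_TRAIN, seed=0):
    """Return one (training indices, validation indices) pair per rearrangement."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_runs):
        order = rng.permutation(n_groups)
        splits.append((order[:n_train], order[n_train:]))
    return splits


splits = random_arrangements()
train_idx, val_idx = splits[0]
print(len(splits), len(train_idx), len(val_idx))
```

Each pair of index arrays would then be used to train and validate one network, giving 200 trained networks in total.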

Validation is an important step for model generalization. Of the 200 calculations, the 100 with the better consistency on the validation data are selected. Their average errors are less than 5.0%.
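One way to make this selection concrete is to rank the runs by validation error and keep the best 100. The text does not state the exact selection criterion, so the lowest-error rule is an assumption, and the error values below are random placeholders rather than results from the study.

```python
import numpy as np


def select_best_runs(validation_errors, n_keep=100):
    """Return the indices of the n_keep runs with the smallest validation error."""
    errors = np.asarray(validation_errors, dtype=float)
    return np.argsort(errors)[:n_keep]


# Placeholder validation errors for 200 runs (not data from the study).
rng = np.random.default_rng(1)
val_errors = rng.uniform(0.0, 0.2, size=200)
kept = select_best_runs(val_errors)
print(len(kept), val_errors[kept].mean())
```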

A validation case is examined before the NN model is applied to Lake Qionghai Watershed. It uses a group of new data that is entirely independent of Table 2 (Table 3). The case concerns the upper section of the Yongding River Watershed in Beijing, China, which has an area of 1,855.6 km² and a high hazard degree (Zhong et al. 2004).

According to the NN model results calculated with the 100 selected groups of random data, the HDRs are IV and V with probabilities of 93 and 7%, respectively. The HDR from the model is consistent with the field study of Zhong et al. (2004), which indicates that the model generalizes well enough to be applied to new data.

Table 3 The validation data about Yongding River Watershed in Beijing, China

Debris flow hazard assessment

The selected 34×100 groups of data are then used to assess the HDRs of the five sub-watersheds in Lake Qionghai Watershed. The transformed values of the eight variables for the five sub-watersheds are taken as model input, and 100 groups of results are obtained from the model calculations. The probability distribution of HDR is then calculated for each sub-watershed (Table 4). The assessed rank of each sub-watershed is the rank with the highest probability.
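The step from 100 model results to a rank distribution can be sketched as a simple frequency count. The example reuses the proportions reported for the Yongding River validation case (93% IV and 7% V) rather than sub-watershed results, and it assumes that each network output has already been converted to a rank label, a conversion the text does not describe.

```python
from collections import Counter

RANKS = ["I", "II", "III", "IV", "V"]


def rank_distribution(ranks):
    """Share of model runs that fall in each hazard-degree rank."""
    counts = Counter(ranks)
    n = len(ranks)
    return {r: counts.get(r, 0) / n for r in RANKS}


def assessed_rank(distribution):
    """The assessed HDR is the rank with the highest probability."""
    return max(RANKS, key=lambda r: distribution[r])


# 100 run results matching the validation-case proportions in the text.
runs = ["IV"] * 93 + ["V"] * 7
dist = rank_distribution(runs)
result = assessed_rank(dist)
print(dist, result)
```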

Table 4 HDRs of debris flow and countermeasures in Lake Qionghai Watershed

Results comparison between NN model and MFCAM

The model results show that the HDRs of the five sub-watersheds are IV, IV, III, III, and IV–V, which means that the hazard in the study area lies between normal and severe (Table 4). For comparison, a calculation is also made with MFCAM (Liu 2002); the results are shown in Table 4.

The results of the two models are basically consistent with each other, but the NN results are more practical for decision making and for planning countermeasures. Taking RV as an example, the MFCAM result is 0.72, which means a high hazard degree, while the NN model places it between IV and V. Field investigations show that, in spite of the densely distributed debris flow gullies, the values of AFD and the proportion of sloping land are smaller than in the other four sub-watersheds. The HDR of RV is therefore site specific at a smaller scale, with a rank between IV and V, which indicates that the NN model agrees better with the field investigation.

Discussion

The NN model is a new method for assessing RHD. In spite of its many advantages, it shares some shortcomings with other NN models. Its results for new data depend on the sample data used for training and validation, so errors and uncertainty in selecting the sample data can increase the errors and uncertainty of the model.

The HDF is a relative measure of the potential threat in the selected area. The exact numbers produced by the model matter less than the hazard degrees they correspond to and the associated countermeasures.

According to Table 4, the debris flow hazard of the five sub-watersheds in Lake Qionghai Watershed is normal, high, or severe, which is consistent with the field investigation (Tan 1994). Countermeasures focus on early monitoring and integrated control. Economic activities should also be controlled.

In RIII and RIV, some appropriate development, such as tourism and forestry, can be implemented. Fruit trees can be cultivated on sloping land, while agriculture is possible only on flat land. On sloping land, the main task is to maintain vegetation coverage so as to reduce soil loss and the supply of solid material for debris flows. The integrated control measures include restoring the known debris-flow gullies, increasing vegetation coverage, and strictly restricting the locations of economic development and infrastructure construction, such as roads and railways. Early monitoring of the two sub-watersheds focuses on possible intense rainfall.

In RI and RII, integrated control emphasizes biological and civil engineering measures for the debris-flow gullies, such as vegetation restoration and concrete dykes that protect riverbanks and roads from erosion by rainfall and floods. In the mountains east of Lake Qionghai, vegetation conservation and early monitoring of sudden debris flows are the main tasks. Economic development in this area is basically forbidden, especially chemical and mining factories.

RV is the sub-watershed facing potentially high and severe threats from debris flows. In this area, preventing losses and casualties is the primary mission of the local government, and no economic activities are permitted.

Conclusion

The NN model is a useful tool in scientific research, but it should be based on a large number of case studies and known data. The hazard assessed by the NN model for the five sub-watersheds is consistent with the field investigation, which means that the NN model can be used as a new method in further studies and can provide a scientific basis for management and decision making. The debris flow hazard of the five sub-watersheds in Lake Qionghai Watershed is normal, high, or severe. Countermeasures, management, and early monitoring are very important for regional sustainable development.