1 Introduction

The development of human society is, from many perspectives, a continuing struggle with natural hazards: successive dynasties and ruling regimes have always been preoccupied with attempts to control disasters. Natural hazards have also long been a prominent topic in academia (Montz and Tobin 2011), with early research focusing on understanding physical processes, spatial distributions and patterns, and, to some extent, the impacts of events and mitigation. However, studies in this period were mainly descriptive rather than analytical, and they placed responsibility for catastrophe firmly on nature and not on human endeavors, a stance that can be summarized as “hazards determinism” (Smith 1982; Mileti 1999). With more in-depth study, researchers have gradually recognized that the impacts of natural hazards are the joint product of stress and exposure on the one hand and fragility and vulnerability on the other (Blaikie et al. 1994a); in other words, if a place is not vulnerable to natural hazard events, there will be no natural disaster loss. The study of physical processes can therefore only explain who or which place may be exposed to natural hazards. It is not sufficient for understanding the degree to which people at a location are threatened by that exposure. Natural hazards may produce significantly different impacts on people and places, depending not only on the severity of the hazard but also on the physical attributes and socioeconomic characteristics of a locale. In response to the deficiencies of early studies, scholars proposed the concept of vulnerability to address natural hazard-related issues (White 1945).

Currently, the concept of vulnerability is used in many fields and for different spatial scales. Although there have been several attempts at defining and capturing what is meant by vulnerability, the term varies among disciplines and research areas (Liverman and O’Brien 1991; Watts and Bohle 1993; Dow and Downing 1995; Cutter 1996; Fraser et al. 2006; Janssen et al. 2006; Metzger et al. 2006). Thywissen (2006) and Manyena et al. (2008) carried out an extensive review of the terminology. The former includes a long list of definitions used for the term vulnerability and the latter includes definitions of vulnerability and resilience and their relationship. Due to different conceptual frameworks and definitions, as well as disciplinary views, approaches to address the causes of vulnerability also differ. Four approaches to understanding vulnerability and its causes can be distinguished, rooted in political economy, social ecology, vulnerability, and disaster risk assessment, as well as adaptation to climate change (Cardona et al. 2012):

  1. The pressure and release (PAR) model (Blaikie et al. 1994b; Wisner et al. 2004) is common to social science–related vulnerability research and emphasizes the social conditions and root causes of exposure more than the hazard as generating unsafe conditions. This approach links vulnerability to unsafe conditions in a continuum that connects local vulnerability to wider national and global shifts in the political economy of resources and political power.

  2. The social ecology perspective emphasizes the need to focus on coupled human-environmental systems (Hewitt et al. 1971; Turner et al. 2003a, b). This perspective stresses the ability of societies to transform nature and also the implications of changes in the environment for social and economic systems. It argues that the exposure and susceptibility of a system can only be adequately understood if these coupling processes and interactions are addressed.

  3. Holistic perspectives on vulnerability aim to go beyond technical modeling to embrace a wider and comprehensive explanation of vulnerability. These approaches differentiate exposure, susceptibility, and societal response capacities as causes or factors of vulnerability (see Cardona 2004; IDEA 2005; Birkmann 2006; Birkmann and Fernando 2008). A core element of these approaches is the feedback loop which underlines that vulnerability is dynamic and is the main driver and determinant of current or future risk.

  4. In the context of climate change adaptation, different vulnerability definitions and concepts have been developed and discussed. One of the most prominent definitions is the one reflected in the IPCC Fourth Assessment Report, which describes vulnerability as a function of exposure, sensitivity, and adaptive capacity. This approach differs from the understanding of vulnerability in the disaster risk management perspective, as the rate and magnitude of climate change are considered. The concept of vulnerability here includes external environmental factors of shock or stress. Therefore, in this view, the magnitude and frequency of potential hazard events are to be considered in the vulnerability to climate change.

However, despite the differences in conceptualization and understanding, vulnerability can be taken as the propensity or predisposition to be adversely affected (IPCC 2012). Within the natural disaster research community, vulnerability is used to measure the capacity to withstand natural hazards in different regions with distinctive economies (Timmerman 1981; Turner 2010), and it can be seen as the characteristics and circumstances of a place that make it susceptible to the damaging effects of a natural hazard. In this research, vulnerability refers to the propensity or predisposition of exposed elements such as human beings, their livelihoods, and assets to suffer adverse effects when impacted by hazard events. Specifically, vulnerability is the characteristic of a place that is liable to be wounded and has little capacity to cope (defined as the ability to absorb the damaging impacts of a hazard and continue functioning); resilience (the ability to recover rapidly from disaster) is not considered. For example, suppose two regions face natural hazard events of the same severity and have the same exposure of regional socioeconomic systems. The region that suffers more disaster loss has the higher level of vulnerability.

Although the concept of vulnerability has been applied in many research fields, approaches for analyzing and assessing vulnerability are still preliminary. Researchers have proposed some frameworks and metrics for vulnerability assessment (Blaikie et al. 1994a; Klein and Nicholls 1999; Cutter et al. 2003; Boruff et al. 2005; Cutter and Finch 2008). Most current researchers establish an index of vulnerability based on proxy datasets that indicate vulnerability (Cutter 2005; Eakin and Luers 2006; Rygel et al. 2006). There is no doubt that these methods provide useful tools for identifying and monitoring vulnerability over time and space, as well as a way to understand the processes underlying vulnerability. However, this type of research has raised additional questions. The first issue relates to the choice of proxies for vulnerability assessment. Because of the complexities involved in the concept of vulnerability, researchers use different proxies to indicate vulnerability. For example, Alberini et al. (2006) proposed six indices for vulnerability assessment: the fraction of people affected by natural disasters, infant mortality, life expectancy at birth, average calorie supply per person per day, percentage of people with access to improved sanitation, and percentage of people with access to an improved source of drinking water. Cutter et al. (2003) identified a set of 42 independent variables that influence vulnerability. The second issue relates to the relative contributions of the variables used to define vulnerability. Some researchers choose not to weight variables differentially to allow for dissimilar effects (Cutter et al. 2000), while others explore distinctive weighting schemes designed to reflect variations in importance, such as factor analysis, the analytic hierarchy process, and expert scoring (Brooks et al. 2005; Lazarus 2011). The mechanisms by which these factors influence vulnerability, and the associations among them, are complicated and still unclear. Thus, determining appropriate weights is a challenge for this type of vulnerability assessment method (Lazarus 2011). Detailed quantitative analysis has shown that the result of a vulnerability assessment is very sensitive to both the proxy selection and the weighting method (Schmidtlein et al. 2008). The third issue concerns the classification of vulnerability, namely, deciding which places are highly or weakly vulnerable to natural hazards based on the value of the vulnerability index. Deciding the threshold values that separate vulnerability levels is a difficult and disputed issue (Uitto 1998; Mitchem 2004; Holand et al. 2011). Traditional classification methods mainly include cluster analysis, equal interval, quantile division, standard deviation, and subjective division based on empirical experience; however, most of these methods are sensitive to variation in the computed index values (Rygel et al. 2006), and the same index value can be assigned to different vulnerability levels depending on the classification method used. These classification methods are therefore more or less arbitrary.
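The sensitivity of classification to the chosen method can be illustrated with a small sketch. The index values below are hypothetical and chosen only for illustration, and the two helper functions are assumptions of this example rather than procedures taken from the cited studies. The sketch assigns the same nine index values to three classes, once by equal intervals and once by quantiles.

```python
import numpy as np

# Hypothetical vulnerability index values for nine regions (illustrative only).
index = np.array([0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.60, 0.95])

def equal_interval_classes(values, k=3):
    """Assign classes 0..k-1 using k equal-width bins over the value range."""
    edges = np.linspace(values.min(), values.max(), k + 1)
    # Only the interior edges are needed to separate the k classes.
    return np.digitize(values, edges[1:-1], right=True)

def quantile_classes(values, k=3):
    """Assign classes 0..k-1 so that each class holds about the same count."""
    edges = np.quantile(values, np.linspace(0, 1, k + 1))
    return np.digitize(values, edges[1:-1], right=True)

ei = equal_interval_classes(index)
qt = quantile_classes(index)

for value, a, b in zip(index, ei, qt):
    print(f"index={value:.2f}  equal-interval class={a}  quantile class={b}")
```

In this toy dataset the region with index 0.25 falls in the lowest class under equal intervals but in the highest class under quantiles, which is the kind of method-dependent outcome described above.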

Taking a different approach, this article proposes a model based on data envelopment analysis (DEA) in an attempt to improve upon traditional methods, because the DEA approach does not require weight parameters to be determined and provides a new classification method. In recent years, DEA modeling has been used to evaluate the performance of various entities engaged in different activities in varied contexts, including benchmarking in health care (hospitals, doctors), education (schools, universities), banks, manufacturing, and management evaluation (Uitto 1998; Tongzon 2001; Anderson 2002). DEA research in the field of natural disasters remains limited, although some researchers have begun to use DEA models to carry out preliminary studies (Wei et al. 2004; Zou and Wei 2009). These studies have some shortcomings, including minimal discussion of the applicability of the DEA model and inadequate consideration of the regional physical context of natural hazards and vulnerability. This article first discusses the applicability of DEA modeling to natural hazard vulnerability assessment. It then constructs indices for vulnerability assessment based on a natural disaster system framework. Finally, an analytical model is developed. Based on annual governmental statistics from 2001 to 2008 (National Bureau of Statistics of China 2001–2008; Department of Finance and Administration, Ministry of Civil Affairs of China 2001–2008), the vulnerability to natural hazards in China’s mainland area is assessed as a case study.

2 Methods

2.1 Study area

China has frequently suffered the ravages of natural disasters. According to government statistics, during the 19 years from 1990 to 2008, on an annual average, natural disasters affected approximately 300 million people, destroyed more than three million buildings, and forced the evacuation of more than nine million people. The direct financial losses exceeded 200 billion Yuan (Information Office of the State Council of the People’s Republic of China 2009). Floods in the Yangtze, Songhua, and Nen river valleys in 1998, serious droughts in Sichuan province and Chongqing municipality in 2006, devastating floods in the Huai river valley in 2007, extreme cold weather and sleet in south China in early 2008, and the earthquake that shook Sichuan, Gansu, Shaanxi, and other regions on May 12, 2008 all caused tremendous losses. Study of China’s vulnerability to natural hazards can provide benefits if such research leads to the adoption of effective strategies for future disaster mitigation.

2.2 The applicability of the DEA model for vulnerability assessment

Data envelopment analysis is an efficiency evaluation model based on mathematical programming theory, which offers an alternative to classical statistics in extracting information from sample observations (Bowlin et al. 1984). In the DEA model, the sectors being assessed are decision-making units (DMUs), and each DMU is responsible for converting inputs into outputs. In contrast to parametric approaches such as regression analysis, which fit the data with a single regression plane, the DEA model optimizes on each individual observation with the objective of calculating a discrete piecewise frontier determined by the set of Pareto-efficient DMUs (Charnes et al. 1978). In other words, DEA is a methodology directed at frontiers (the envelope) rather than central tendencies (Fig. 1): it focuses on individual observations, whereas single-optimization statistical approaches focus on averages of parameters. Furthermore, the method is flexible enough to assess both single input–output systems and multi-input, multi-output systems. Unlike other assessment methods, DEA does not require a production function to be specified, and its results are not affected by the units in which the data are measured (Ali et al. 1995). Additionally, it can be used to assess the relative efficiencies of different units. Take a system with one input and two outputs as an example: Fig. 1 (I) contains a set of DMUs plotted by their two outputs per unit of input. Compared with the other DMUs, DMUs A, B, C, and D represent the optimum mixes of outputs that the system can achieve. In the DEA model, these four DMUs are the most efficient DMUs within the system and define an empirical efficient frontier; the relative efficiency of the other DMUs is calculated from their “position” relative to this frontier (Charnes 1995). Therefore, based on the concept of relative efficiency, researchers regard the DEA model as an excellent and easily used methodology for modeling operational processes for performance evaluation, namely a useful tool for evaluating the efficiency of an “input–output” system or a production process. A more detailed introduction to DEA can be found in the handbook “Data Envelopment Analysis: History, Models, and Interpretations” (Cooper et al. 2011).

Fig. 1 Comparison between the DEA model and the regression model

Data envelopment analysis provides the flexibility to permit “unconventional” variables such as the number of students graduated, the number of patients served, and even journal ranking (Liner 2002) to be used for efficiency evaluation. This paper applies the method to vulnerability assessment. In the context of natural hazards, disaster loss from natural hazard events is the product of interactions within the natural disaster system, which can be considered an “input–output” system. Specifically, disaster losses (the output factors) are the products of interactions within the regional natural disaster system, whose input factors include hazards, environmental background conditions, exposure units, and vulnerability (Shi 1996; Mileti 1999). Hazards are the physical processes of the earth system that threaten human society. Hazard events can be characterized by their magnitude or intensity, speed of onset, duration, and areal extent. Hazards as described here are defined in purely physical terms. The environmental background condition is the physical environment context that aggravates or reduces the effects of hazard events, such as slope, elevation, soil, and vegetation condition. Some specific environmental conditions (such as low-lying islands, coastal zones, mountain regions, dry lands, and floodplains) create hazard impacts or increase the level or damage potential of existing ones; for example, even a small storm can cause a dangerous debris flow in mountain areas with sparse vegetation and loose soil. Exposure units mainly include the elements associated with human activities, such as buildings, property, and people. The disaster loss, measured in human terms (lives lost, people affected, and economic losses), is the outcome of the hazards, environmental background conditions, and exposure units, mediated by the characteristics (the vulnerability defined above) of the human system that is exposed to and affected by the hazard. For example, a flood is dangerous as a hazard only when a locale is near a river or has a low elevation, which reflects the action of the environmental background conditions; when a certain critical level is exceeded, the hazard becomes a real threat. Hazard and environmental background conditions are only external conditions for disaster losses: the danger of a hazard is relative to human socioeconomic activities in a flood-prone place. These regional human activities are the internal cause of disaster losses. A place devoid of human activity is not in danger, so the role of human activity is to create the exposure units that experience damage in a flood. While human activities and structures are prerequisites for disaster losses, only vulnerable locales will suffer losses. Vulnerability is a hidden characteristic of regional human activity, which is reflected in the differential severity of the disaster loss sustained by different areas under the same natural hazards scenario, in other words, in the efficiency with which hazard events produce disaster losses.

Regarding the process of disasters, the essence of relative efficiency in the DEA model is similar to that of vulnerability (see Footnote 1). More generally, vulnerability to natural hazards is reflected in the efficiency with which natural disasters produce loss. A region with higher vulnerability will usually suffer more severe damage under the same level of natural hazard impact. Conversely, in regions with low vulnerability, the efficiency of generating damage is low, so the loss caused by natural disasters is usually relatively light. However, the DEA method is usually used to assess the relative efficiency of tangible production sectors, such as factories, enterprises, and government departments. To make the assessment tractable, this study treats the 31 basic assessment units (22 provinces, 5 autonomous regions, and 4 municipalities of China, excluding Hong Kong, Macau, and Taiwan province) as virtual sectors for the “production of losses.” The relative efficiency calculated by a DEA model is used to reflect the vulnerability to natural hazards (Fig. 2).

Fig. 2 Framework for the assessment of the vulnerability to natural hazards based on DEA models

Since the DEA method was developed, various DEA models have been proposed, such as C2R, BC2, C2GS, C2WH, and C2W (Charnes et al. 1989). In this article, the classical C2R model is adopted for analysis. The concept of DEA is developed around the basic idea that the efficiency of a DMU is determined by its ability to transform inputs into the desired outputs. This concept of efficiency was adopted from engineering, which defines the efficiency of a machine/process as Output/Input ≤ 1. DEA generalizes this single output/input technical efficiency measure to multiple outputs/inputs by constructing a relative efficiency measure based on a single “virtual” output and a single “virtual” input. The efficient frontier is then determined by selecting DMUs that are most efficient in producing the virtual output from the virtual input. Because DMUs on the efficient frontier have efficiency score = 1, inefficient DMUs are measured relative to the efficient DMUs.
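For intuition, the single-input, single-output case can be worked through directly, because the relative efficiency of each DMU then reduces to its output/input ratio divided by the best ratio in the sample. The following minimal sketch uses hypothetical numbers; the variable names are assumptions of this example and do not come from the study data.

```python
import numpy as np

# Hypothetical data: one input and one output per DMU (illustrative values only).
inputs = np.array([2.0, 4.0, 5.0, 8.0])
outputs = np.array([1.0, 4.0, 3.0, 4.0])

ratios = outputs / inputs           # raw output per unit of input
efficiency = ratios / ratios.max()  # rescaled so the best DMU scores exactly 1

for j, theta in enumerate(efficiency):
    print(f"DMU {j}: output/input = {ratios[j]:.2f}, relative efficiency = {theta:.2f}")
```

Here the second DMU defines the frontier (score 1), and every other DMU is scored relative to it, which mirrors the statement that inefficient DMUs are measured against the efficient ones.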

More formally, assume that there are n DMUs to be evaluated. Each DMU consumes varying amounts of m different inputs to produce s different outputs. Specifically, DMU_j consumes amounts X_j = {x_ij} of inputs (i = 1, 2, …, m) and produces amounts Y_j = {y_rj} of outputs (r = 1, 2, …, s). The s × n matrix of output measures is denoted by Y, and the m × n matrix of input measures is denoted by X. Also, assume that x_ij > 0 and y_rj > 0. Consider the problem of evaluating the relative efficiency for any one of the n DMUs, which will be identified as DMU_0. Relative efficiency for DMU_0 is calculated by forming the ratio of a weighted sum of outputs to a weighted sum of inputs, subject to the constraint that no DMU can have a relative efficiency score greater than unity. Symbolically:

$$ \mathop {\max }\limits_{u,v} \frac{{\sum\nolimits_{r} {u_{r} y_{r0} } }}{{\sum\nolimits_{i} {v_{i} x_{i0} } }} = \frac{{u^{T} Y_{0} }}{{v^{T} X_{0} }}, $$

where u = (u_1, …, u_s)^T, v = (v_1, …, v_m)^T

subject to

$$ \frac{{u^{T} Y_{j} }}{{v^{T} X_{j} }} = \frac{{\sum\nolimits_{r} {u_{r} y_{rj} } }}{{\sum\nolimits_{i} {v_{i} x_{ij} } }} \le 1, $$

for j = 1, 2, …, n; u_r, v_i ≥ 0 for r = 1, 2, …, s and i = 1, 2, …, m; where u_r and v_i are weights assigned to output r and input i, respectively.

For this fractional programming problem with a potentially infinite number of optimal solutions, Charnes et al. (1978) were able to specify an equivalent linear programming problem (LP). This requires the introduction of a scalar quantity (θ) to adjust the input and output weights:

$$ \theta = \frac{1}{{v^{T} X_{0} }},\quad u^{T} \leftarrow \theta u^{T} ,\quad w^{T} = \theta v^{T} $$

Appropriate substitutions produce the CCR LP problem:

$$ \mathop {\max }\limits_{u,w} \theta_{0} = \sum\limits_{r} {u_{r} y_{r0} } = u^{T} Y_{0} $$

subject to

$$ w^{T} X_{0} = \sum\limits_{i} {w_{i} x_{i0} } = 1,\quad \sum\limits_{r} {u_{r} y_{rj} } - \sum\limits_{i} {w_{i} x_{ij} } \le 0\;(j = 1,2, \ldots ,n),\quad u_{r} ,w_{i} \ge \varepsilon $$

where the value of θ_0 is the relative efficiency of DMU_0 and ε is a positive constant, called the non-Archimedean infinitesimal, which is introduced to facilitate solving the LP problem; ε = 10^-6 is a common choice. This LP is known as the C2R model, as it was developed by Charnes, Cooper, and Rhodes. For the quantitative assessment of vulnerability to natural hazards, the above formulation can be transformed into a compact form as follows:

Consider n decision-making units DMU_j (j = 1, 2, …, n), the regions whose vulnerability is to be assessed. In model (1), the objective is to assess the vulnerability to natural hazards of each province, with the 31 provinces of China taken as DMUs (n = 31). The parameters are as follows. θ (0 < θ ≤ 1) is the efficiency in DEA terms; it represents the relative severity of the impacts of natural disasters under a given regional natural hazards system. The closer the θ score of a DMU is to 1, the higher the input–output ratio of that province, that is, the higher its efficiency in producing disaster loss and the more vulnerable it is to natural hazards. A region with θ = 1 is an efficient DMU in DEA terms; in the context of natural hazards, this means that the region suffers the relatively most severe impacts, so its vulnerability is the highest in relative terms. X_j denotes the input indicators of the jth region; in this study, these are the selected indicators of hazards, environmental background conditions, and exposed units. Y_j denotes the output indicators of the jth region, namely the indicators of natural disaster losses. λ_j(j_0) is a weight variable: the weight of the jth region when the j_0th region is evaluated, with λ_j(j_0) ≥ 0. S^- is the slack variable and S^+ is the remnant (surplus) variable. In this program, θ, S^+, S^-, and λ_j(j_0) are the parameters to be estimated. In this research, the free DEAP Version 2.1 software is used for computation (Coelli 1996).

$$ \left\{ \begin{array}{l} \min [\theta - \varepsilon (\hat{e}^{T} S^{ - } + e^{T} S^{ + } )] \\ s.t.\;\sum\limits_{j = 1}^{n} {X_{j} \lambda_{j} (j_{0} )} + S^{ - } = \theta X_{0} \\ \sum\limits_{j = 1}^{n} {Y_{j} \lambda_{j} (j_{0} )} - S^{ + } = Y_{0} \\ \lambda_{j} (j_{0} ) \ge 0,X_{j} \ge 0,Y_{j} \ge 0 \\ S^{ - } \ge 0,S^{ + } \ge 0 \\ \hat{e}^{T} = (1,1, \ldots ,1) \in E_{m} ,e^{T} = (1,1, \ldots ,1) \in E_{s} \\ \end{array} \right. $$
(1)
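As a rough illustration of how a model of this kind can be solved numerically, the sketch below computes input-oriented CCR efficiency scores with `scipy.optimize.linprog`. It is not the DEAP software used in this study. It is a simplified version that assumes the ε-weighted slack term can be dropped and the equalities with slacks can be written as inequalities, so it returns only θ. The function name `ccr_efficiency` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of DMU j0 (slack term omitted).

    X: (m, n) array of inputs, Y: (s, n) array of outputs; columns are DMUs.
    Decision variables are [theta, lambda_1, ..., lambda_n].
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0                                         # minimise theta
    # Inputs:  sum_j lambda_j * x_ij - theta * x_i0 <= 0
    A_in = np.hstack([-X[:, [j0]], X])
    # Outputs: -sum_j lambda_j * y_rj <= -y_r0
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, j0]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun

# Hypothetical toy system: one input, two outputs, four DMUs (columns).
X = np.array([[1.0, 1.0, 1.0, 1.0]])
Y = np.array([[1.0, 3.0, 4.0, 2.0],
              [4.0, 3.0, 1.0, 2.0]])

scores = np.array([ccr_efficiency(X, Y, j) for j in range(X.shape[1])])
print(np.round(scores, 3))
```

In the interpretation used here, the inputs would be the hazard and exposure indicators and the outputs the loss indicators, so a score of 1 marks a DMU on the frontier of "loss production". In the toy data the first three DMUs lie on the frontier and the fourth is scored against them.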

To assess the vulnerability to natural hazards, a basic assumption in this study is that, according to the essence of “propensity or predisposition to be impacted” under the same natural hazards scenario, vulnerability is proportional to the loss from disasters, or in other words, the two are positively correlated. Therefore, when measuring the vulnerability to natural hazards, it is reasonable to infer that the vulnerability will decrease when the loss from disasters decreases and vice versa.

2.3 The classification of vulnerability to natural hazards

After the quantitative assessment of vulnerability, the next issue is how to classify vulnerability. Based on the efficient DMUs calculated by the DEA model, this paper presents a new method for classification. To proceed, it is necessary to introduce the concept of non-domination (Steuer 1989). In this paper, a non-dominated case is a case for which no other case in the dataset is clearly more vulnerable by virtue of scoring at least as high or higher on all components. The non-dominated set is the set of all non-dominated cases in the dataset. By determining the non-dominated set of cases in the complete dataset, removing them from the dataset, determining the non-dominated cases among the remaining cases, and then repeating this process, the investigator can rank all cases in the dataset by vulnerability. The non-dominated set of cases at each repetition of this process is called the Pareto-optimal front or Pareto ranking (Goldberg 1989), a term referring to the notion of Pareto optimality from welfare economics (Pareto 1896; Johansson 1991). In fact, DEA (Charnes et al. 1978) is an alternative approach from econometric analysis that relies on the same underlying concepts as Pareto ranking but is conceptually and practically more complicated; the non-dominated sets of cases are the same as the efficient DMUs (θ = 1) that construct the efficient frontier in the DEA model (Clark et al. 1998).

The process is illustrated in Fig. 3. As a simple hypothetical DEA case, consider again a system with only one input but two heterogeneous outputs, for which the classification method can be visualized relatively easily. For each DMU, we calculate the output per unit of input and then plot the relative outputs on a two-dimensional graph, as the first panel shows. The second panel shows the non-dominated set of DMUs (these points are the efficient DMUs and form the production frontier, in DEA terms), which, from the perspective of vulnerability assessment, can be considered the most vulnerable DMUs. With this first Pareto rank of efficient DMUs removed from the dataset, a new set of non-dominated DMUs is identified in panel 3. This process continues, with each rank being “peeled away” like the layers of an onion, until all the DMUs have been assigned a vulnerability rank, as shown by the lines in the final panel. Finally, vulnerability to natural hazards is classified according to these ranks. While these illustrations are in two dimensions for clarity, precisely the same logic and procedure can be applied to higher-dimensional data. Unlike traditional classification methods, which require classification criteria or thresholds to be specified, DEA makes classification easier and more realistic because it enables derivation of an efficiency envelope (efficient frontier) that contains the most vulnerable provinces of the group analyzed, against which all other provinces are compared.
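The peeling procedure can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the standard Pareto dominance test (at least as high on every component and strictly higher on at least one) on hypothetical output-per-input pairs, rather than re-running a DEA model at each layer, and the function name `pareto_ranks` is invented for this sketch.

```python
import numpy as np

def pareto_ranks(points):
    """Rank rows of `points` by repeatedly removing the non-dominated set.

    Each row holds the outputs per unit of input of one DMU; larger values
    mean more loss per unit of input. Rank 1 is the outermost (most
    vulnerable) layer, rank 2 the next layer, and so on.
    """
    points = np.asarray(points, dtype=float)
    ranks = np.zeros(len(points), dtype=int)
    remaining = list(range(len(points)))
    rank = 0
    while remaining:
        rank += 1
        front = []
        for i in remaining:
            # i is dominated if some other remaining point is at least as
            # high on every component and strictly higher on at least one.
            dominated = any(
                np.all(points[k] >= points[i]) and np.any(points[k] > points[i])
                for k in remaining if k != i
            )
            if not dominated:
                front.append(i)
        for i in front:
            ranks[i] = rank
        remaining = [i for i in remaining if i not in front]
    return ranks

# Hypothetical (output 1 / input, output 2 / input) pairs for six DMUs.
data = [(1, 4), (3, 3), (4, 1), (2, 2), (1, 1), (2, 3)]
ranks = pareto_ranks(data)
print(ranks.tolist())
```

The first three DMUs form the outermost layer, and the remaining DMUs are assigned to successively deeper layers, so no threshold on an index value is needed to separate the classes.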

Fig. 3 The illustration on the process of vulnerability classification

2.4 Index construction and data processing

As discussed above, disaster loss results from the combined action of the components of the natural disaster system. Specifically, the regional natural hazards system can be considered as a set of negative “production activities.” The “input” factors include the dangerousness of the natural hazards, namely the combination of the magnitude and frequency of regional natural hazards and the conditions of the natural environment (such as elevation, slope, geological conditions, and vegetation), as well as the exposure of the socioeconomic system, such as the area planted with crops, the total population size, the level of economic activity, the existing infrastructure, and other similar variables. The “output” elements are the disaster losses, such as the size of the affected area, the number of people impacted, the scale of economic losses, and the extent of casualties. The efficiency of the “production of disaster losses” can be taken as a reflection of vulnerability: the higher the efficiency, the higher the vulnerability of the region.

This research builds the input and output factors for the DEA model of regional natural hazards from three aspects: (1) the dangerousness of regional hazards (DI), an input factor; (2) the exposure of the regional socioeconomic system (EI), an input factor; and (3) the regional natural disaster losses (LI), an output factor. The efficiency calculated by the DEA model can be used as a surrogate by which to assess regional vulnerability to natural hazards. Considering the availability and representativeness of data, we made a selection of the following indicators for each index (Table 1). Because disaster losses fluctuate greatly over the years, single-year disaster loss data are unsuitable for vulnerability analysis. We use the average of disaster losses over multiple years (2001–2008) instead. Correspondingly, in order to keep consistency between the model input and output, we also take the average value of indicators of exposure of regional socioeconomic system (Table 1) for analysis.

Table 1 Indicator system and data for vulnerability assessment

Although the DEA model can be used to assess the relative efficiency of DMUs with multiple inputs and outputs, the number of inputs and outputs cannot be unlimited. An important rule of thumb is that the number of DMUs must be no less than the number of inputs and outputs and that the variables must have low correlation; otherwise the capacity of the DEA model to calculate efficiency is reduced (Anderson 2002). It is therefore better to reduce the number of inputs and outputs when the number of DMUs is limited, while retaining as much as possible of the original information represented by the input and output variables. Factor analysis is a useful tool for reducing the number of variables while minimizing the loss of information from the original variables. Because factor analysis condenses correlated variables, the extracted principal components have low correlation (Wood 2008). This research uses factor analysis to process the indicators and takes the extracted principal components as the input and output variables. Because the extracted principal components include negative values, which are not permitted in the DEA model, a data transformation is needed. We linearly stretch the data to a new range of 1–5 using minimum–maximum standardization. The transformation is performed as follows:

$$ V' = \frac{{V - \min_{A} }}{{\max_{A} - \min_{A} }}(new\_\max_{A} - new\_\min_{A} ) + new\_\min_{A} $$
(2)

where V′ is the new value after normalization, V is the original value, max_A and min_A are the maximum and minimum of the original dataset, respectively, and new_max_A and new_min_A are the maximum and minimum of the new data range.
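Equation (2) can be checked with a short sketch. The factor scores below are hypothetical and the function name `rescale` is an assumption of this example. The defaults of 1 and 5 correspond to the target range stated above.

```python
import numpy as np

def rescale(values, new_min=1.0, new_max=5.0):
    """Linearly stretch `values` to [new_min, new_max] as in Eq. (2)."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    return (values - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

# Hypothetical factor scores, including the negative values that DEA cannot accept.
factor_scores = np.array([-1.5, 0.0, 0.5, 2.5])
rescaled = rescale(factor_scores)
print(rescaled)
```

The smallest score maps to 1 and the largest to 5, so all transformed values are strictly positive while their relative spacing is preserved.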

3 Results and analysis

3.1 The result of factor analysis

We first perform a factor analysis of the DEA model input and output factors. Two principal factors are extracted from the indicators of exposure of the regional socioeconomic system, and two further principal factors are extracted from the indicators of regional natural disaster losses (Table 2). From the loadings of each principal component, we can see that the principal exposure factors of the regional socioeconomic system largely describe the density (factor 1) and the total amount (factor 2) characteristics of regional exposure. The principal factors of regional natural disaster losses reflect human and economic losses (factor 1) and agricultural losses (factor 2). We use these four new principal factors as the final variables of the DEA model to calculate the efficiency of each DMU.

Table 2 The extracted principal components based on factor analysis

3.2 The vulnerability assessment based on production efficiency of disaster loss

After data processing, we obtained the production efficiency of natural disaster losses of the 31 DMUs using the DEAP Version 2.1 software (Table 3), which can be used for the analysis of vulnerability. Inner Mongolia, Heilongjiang, Jiangxi, Hunan, Hainan, Chongqing, Sichuan, Guizhou, Yunnan, and Ningxia all have the highest efficiency (θ = 1), which means that the relative production efficiency of natural disaster losses is highest in these DMUs. From a vulnerability perspective, these regions are more easily affected by natural hazards and thus have higher vulnerability: for the same natural hazards (that is, the same model inputs), they suffer greater disaster losses than other regions. In fact, these are regions typically regarded as vulnerable to natural hazards. For example, Hainan province suffers from typhoons every year, and Yunnan province experienced a severe drought 3 years ago. Such places often suffer great economic losses from these events. The four regions with the lowest efficiencies are Shanghai (0.438), Beijing (0.463), Jiangsu (0.561), and Guangdong (0.572). These are the most economically and socially developed regions in mainland China and have a greater capacity to invest in disaster prevention. Even if natural disasters caused great damage in these places, the losses would be relatively modest compared with the huge size of the regional economy. As a consequence, these regions have low vulnerability to natural hazards. The vulnerability indicated by the DEA efficiency is consistent with the actual vulnerability situation in China, which suggests that the method produces reasonable analytical results.
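To make the notion of "production efficiency of disaster losses" concrete, the sketch below covers only the simplest special case: one input, one output, and constant returns to scale, where the DEA efficiency reduces to each DMU's output/input ratio divided by the best observed ratio. The study itself uses DEAP Version 2.1 with several inputs and outputs, which requires solving a linear program per DMU; the function name `ratio_efficiency`, the returns-to-scale assumption, and all values here are hypothetical.

```python
def ratio_efficiency(inputs, outputs):
    """Relative efficiency for one input and one output, constant returns to scale.

    Each DMU's loss-per-unit-input ratio is divided by the largest ratio
    observed, so the DMU that "produces" the most loss per unit input gets 1.
    """
    ratios = {name: outputs[name] / inputs[name] for name in inputs}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}


# Hypothetical rescaled values (1-5 range), not taken from Table 1 or Table 3
hazard_and_exposure = {"A": 2.0, "B": 4.0, "C": 4.0}  # single combined input
disaster_loss = {"A": 3.0, "B": 3.0, "C": 1.5}        # single output

theta = ratio_efficiency(hazard_and_exposure, disaster_loss)
print(theta)  # {'A': 1.0, 'B': 0.5, 'C': 0.25}
```

In this toy case, region A suffers the same loss as region B from half the hazard and exposure, so A lies on the frontier (θ = 1) and would be read as the most vulnerable of the three.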

Table 3 The vulnerability assessment for regional natural disasters in China

3.3 Social factors are the main reason for the vulnerability

Vulnerability to natural hazards is closely related to the physical and social environments of a region. To determine the key factor that causes vulnerability, a correlation analysis was carried out in this research. Three indicators were selected as independent variables and correlated with the production efficiency of natural disaster losses: (1) the proportion of primary industry in GDP; (2) per capita GDP; and (3) the dangerousness index of regional hazards (Table 4). The results show that, among the three indicators, the proportion of primary industry has the most pronounced positive correlation with vulnerability. The reason for this correlation is the high sensitivity of the agricultural sector to the impacts of natural hazards. Per capita GDP, as a representative of regional economic capacity, has a significant negative relationship with vulnerability, because the more developed an area is, the higher the level of preventative measures deployed to mitigate the impact of natural hazards. The dangerousness index of regional hazards does not have a significant correlation with vulnerability. It is even slightly negatively related to vulnerability, which is contrary to common sense. This is because some regions in high natural hazard danger areas may have low vulnerability owing to strong regional economic capacity, such as Guangdong, Jiangsu, and Zhejiang provinces, which are the most developed areas of China. These regions suffer the severest typhoons every year, but their disaster losses are much lower than those of neighboring less developed provinces where typhoons are much less severe. In fact, the first two variables can be taken as representatives of the socioeconomic situation, while the third variable acts largely as a proxy for the physical factors. We conclude that the social environment is the main factor affecting regional vulnerability to natural hazards. That is, disasters are not caused by natural hazards, but by the social systems that make people or regions vulnerable (Cutter 2010).
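The kind of correlation analysis described above can be sketched as follows. The text does not state which correlation coefficient was used, so the Pearson coefficient is assumed here; the function name `pearson_r` and all data values are hypothetical and do not reproduce Table 4.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five regions (not the study's data)
efficiency = [1.0, 0.9, 0.7, 0.5, 0.4]          # DEA efficiency (vulnerability proxy)
primary_share = [20.0, 18.0, 12.0, 6.0, 2.0]    # primary industry share of GDP (%)
gdp_per_capita = [10.0, 12.0, 20.0, 40.0, 60.0]  # arbitrary units

r_primary = pearson_r(efficiency, primary_share)
r_gdp = pearson_r(efficiency, gdp_per_capita)
print(round(r_primary, 3), round(r_gdp, 3))
```

With these illustrative numbers the coefficient is positive for the primary industry share and negative for per capita GDP, which mirrors the direction of the relationships reported in the text.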

Table 4 The correlation analysis of the vulnerability with the physical and societal environment indicators

3.4 The classification and mapping for vulnerability

Based on the Pareto ranking approach proposed above, four production frontiers can be extracted from the 31 DMUs; the efficient DMUs of each production frontier are listed in Table 5. According to the order in which the production frontiers are successively removed during Pareto ranking, this paper classifies the 31 DMUs into four levels, ranging from high to low vulnerability: severe vulnerability, high vulnerability, medium vulnerability, and slight vulnerability.
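The idea of peeling off successive frontiers can be sketched in a few lines of Python. Note that this sketch substitutes simple Pareto dominance (no more input, no less output) for the DEA efficiency test used in the study, so its layers need not coincide with DEA frontiers; the function names `dominates` and `peel_frontiers` and the five sample DMUs are hypothetical.

```python
def dominates(a, b):
    """True if DMU a yields at least as much loss as b from no more input,
    and is strictly better in at least one input or output."""
    (a_in, a_out), (b_in, b_out) = a, b
    no_worse = (all(x <= y for x, y in zip(a_in, b_in))
                and all(x >= y for x, y in zip(a_out, b_out)))
    strictly = (any(x < y for x, y in zip(a_in, b_in))
                or any(x > y for x, y in zip(a_out, b_out)))
    return no_worse and strictly


def peel_frontiers(dmus):
    """Repeatedly extract the non-dominated DMUs and remove them."""
    remaining = dict(dmus)
    layers = []
    while remaining:
        frontier = sorted(
            name for name, d in remaining.items()
            if not any(dominates(other, d)
                       for other_name, other in remaining.items()
                       if other_name != name)
        )
        layers.append(frontier)
        for name in frontier:
            del remaining[name]
    return layers


# Hypothetical DMUs: name -> ((inputs), (outputs))
dmus = {
    "A": ((1.0,), (4.0,)),
    "B": ((2.0,), (5.0,)),
    "C": ((2.0,), (3.0,)),
    "D": ((3.0,), (4.0,)),
    "E": ((4.0,), (2.0,)),
}
layers = peel_frontiers(dmus)
print(layers)  # [['A', 'B'], ['C', 'D'], ['E']]
```

The first layer corresponds to the most vulnerable class and each later layer to a lower vulnerability level, in the same spirit as the four levels defined above.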

Table 5 The result of the vulnerability classification

To illustrate the geographic patterns of vulnerability to natural hazards, we created a vulnerability map (Fig. 4), which shows the geographic distribution of vulnerability in mainland China. Regions with slight vulnerability are concentrated in the developed eastern coastal areas of China, while regions with severe vulnerability are mainly located in the central and western areas. Most of the regions in the central and western areas have medium or high vulnerability. Therefore, the overall vulnerability level of China decreases from west to east. This trend is closely related to the characteristics of the natural environment and the country’s level of development. This result may provide useful input to the government’s natural hazards mitigation work. In fact, the Chinese government’s current policies for disaster relief run directly counter to what this research suggests. For disaster relief purposes, provinces in mainland China are divided into three regional groups: eastern, central, and western. Provinces in western China, though underdeveloped, receive fewer central government relief funds than provinces in well-developed eastern China. Relief funds distributed to provinces in central China are at an intermediate level. The present allocation of government relief funds mainly considers the total loss, as shown in Table 6. However, although the difference in direct economic losses among the three major districts is not significant, the gap in relative disaster losses is obvious: the ratios of economic losses to total GDP in the central and western districts are much higher than in the eastern district. The present relief fund policy is therefore not suitable. Our research supports this point from the perspective of vulnerability and suggests an approach quite different from current practice. For future natural hazard prevention planning, the government should increase prevention and mitigation inputs in the central and western parts in order to improve regional resilience and reduce vulnerability. This would especially enhance the disaster prevention capacity of regions with severe vulnerability. It should be emphasized that this is only a holistic perspective on hazards mitigation policy. Because natural hazards tend to operate on a much more localized scale, the typically vulnerable regions within every district should also be a focal point of mitigation work. Additionally, considering that China has a large and critically important agricultural sector but weak and poorly resilient agricultural infrastructure, sustained efforts should be made to increase the ability of the agricultural sector to withstand natural hazards.

Fig. 4 The geographic pattern of the regional vulnerability to natural hazards of mainland China

Table 6 The average economic impact of natural hazards among the three major districts from 2001 to 2008

4 Conclusions

Presently, the most commonly used metrics for quantitative vulnerability assessment emphasize identifying suitable vulnerability indicators and constructing an overall vulnerability index from those indicators. However, these methods have some constraints. Because vulnerability is complex, with various tangible and intangible aspects and causes in different dimensions involving multiple variables (physical, social, cultural, economic, institutional, and environmental), different researchers hold different views of it. For instance, some treat vulnerability as a pre-existing condition and focus on potential exposure to hazards (Hewitt et al. 1971; Liverman and O’Brien 1991; Haque and Blair 1992; Alexander 1993), while others hold that vulnerability depends on the coping ability of those affected (Anderson and Woodrow 1991; Downing 1991; Watts and Bohle 1993; Clark et al. 1998; Wu et al. 2002), suggesting that not all individuals and groups exposed to a hazard are equally vulnerable; people living at the margins, such as those without access to social services or political power, are more vulnerable than those with better access to resources (Adger and Kelly 1999). As a result, different composite indices and weighting schemes have been proposed, each capturing the qualities its authors favor. In essence, these deficiencies can be ascribed to the fact that the mechanisms by which these factors influence vulnerability, and the associations among them, are too complicated and still not clear.

This study proposes a new method of evaluating vulnerability to natural hazards from a different perspective by using a DEA model. In contrast to previous quantitative vulnerability assessments, which emphasize understanding the causes or mechanisms of vulnerability, this research assesses vulnerability from the perspective of the “production of disaster loss” in the natural disaster system. The production efficiency of natural disaster losses is taken as a reflection of vulnerability, and a new method for classifying vulnerability is also proposed. DEA models provide a useful tool because they are non-parametric and accommodate more than one input and output measure. For natural hazards vulnerability assessment, the ability to handle multiple inputs and outputs without assuming a production function is particularly appealing because the natural hazards system is complicated and involves many factors. The more commonly used metrics aim to make a quantitative assessment based on an understanding of the reasons for and mechanisms of vulnerability, which can be seen as a direct method of studying vulnerability. The purpose of this paper is to assess vulnerability from a different perspective, namely by treating the mechanism or cause of vulnerability as a “black box” and finding something that reflects vulnerability rather than the indicators that cause it, which can be taken as an indirect method. The two methods can supplement each other and help people gain a deeper understanding of vulnerability.

Based on the DEA model, the vulnerability to natural hazards of China is analyzed; the results show that the geographic pattern of vulnerability displays a decreasing trend from western China through central China to eastern China. Compared with the physical context, the social context, and in particular economic development, has a much greater influence on regional vulnerability. We have found a negative correlation between the level of vulnerability and the level of regional economic development. The more economically developed regions in China have relatively lower vulnerability, since these places have more capacity and resources to prevent and resist natural hazards when they do occur. In fact, at a macroeconomic level, low levels of income and GDP are usually key drivers of vulnerability to natural hazards (Benson and Clay 2000; Mechler 2004). In addition, the structure of regional socioeconomic systems is closely related to vulnerability; for instance, the proportion of primary industry in GDP has a significant positive correlation with vulnerability. These results are relevant to disaster relief policy because they suggest that the government should pay more attention to underdeveloped and major agricultural areas of the country in disaster prevention and reduction. While it would be of great practical value to analyze how the vulnerability of different regions changes over time, the limitations of our data meant that this study could only make a static assessment of regional vulnerability. Studies identifying changing vulnerability through time will be pursued in our future research.