1 Introduction

The construction industry is one of China’s pillar industries. Evaluating its development reveals regional differences and helps the market allocate resources efficiently, thereby improving the overall competitiveness of China’s construction industry. Evaluation methods fall into two types (Wang Jia-yuan and Yuan Hong-ping 2007): subjective and objective. Subjective methods, such as the analytic hierarchy process and fuzzy comprehensive evaluation, determine the weight of each evaluation index from experts’ judgment, based on their own knowledge and experience. Objective methods, such as DEA (data envelopment analysis) and principal component analysis, determine the weights from the objective relationships among the indexes (Xue et al. 2008; Taewoo Youa and Hongmin Zib 2007; Tsolas 2011; Ruan Lian-fa and Zhang Yue-wei 2009; Deng Rong-hui and Xia Qing-dong 2006; Kang Xue-zeng and Meng Gang 2008). However, these methods have deficiencies. Because experts’ knowledge and experience are limited, expert-assigned weights can deviate from the actual situation, which distorts the evaluation results; and information overlap or high correlation between indexes can make the results inconsistent with the actual situation.

Some scholars have applied cluster and factor analysis to research on the sustainable development and growth levels of the construction industry and have achieved valuable results (Wang Lei et al. 2006; Wang Xue-qing et al. 2011; Kale and Arditi 2002; Wang Wen-xiong and Li Qi-ming 2008; Zhou Jian-hua and Yuan Hong-ping 2007). However, the indexes selected in these studies are incomplete. Therefore, after widely collecting and sorting the existing evaluation indexes, this paper proposes an evaluation index system that reflects construction industry development and is suitable for factor analysis and clustering. Using data from the China Statistical Yearbook 2011, the paper evaluates the construction industry development of 31 regions in China and categorizes them by regional similarity through cluster analysis.

2 Methodology

2.1 Factor Analysis

The purpose of factor analysis is to describe the covariance relationships among observed, correlated variables in terms of a few underlying but unobserved random variables called factors. In other words, variations in, say, three or four observed variables may mainly reflect variations in a smaller number of unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modeled as linear combinations of the latent factors plus “error” terms, and the factor model is motivated by the hypothesis that variables can be grouped by their correlations (DeCoster 1998; Factor Analysis 2013).
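The linear model described above can be written out as a short sketch. The notation is an assumption introduced here for illustration and does not come from the original text: $X_i$ is the $i$-th standardized observed variable, $F_j$ the $j$-th common factor, $a_{ij}$ the loading of variable $i$ on factor $j$, and $\varepsilon_i$ the error term specific to variable $i$, with $m < p$.

$$ X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \qquad i = 1, \ldots, p $$

In the setting of this paper, $p = 16$ would correspond to the second-level indexes and $m = 4$ to the extracted common factors.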

2.2 Hierarchical Clustering

Cluster analysis is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups. It is a main task of exploratory data mining and a common technique for statistical data analysis (Du Gang 2003).

Hierarchical clustering is based on the core idea of objects being more related to nearby objects than to objects farther away. As such, these algorithms connect “objects” to form “clusters” based on their distance. A cluster can be described largely by the maximum distance needed to connect parts of the cluster. At different distances, different clusters will be formed, which can be represented by a dendrogram.

3 Evaluation Index System

According to the principles of purposefulness, scientific soundness, integrity, and operability, and based on relevant research, the paper designs an evaluation index system (see Table 1) to reflect construction industry development. The index system has two levels: six first-level indexes, which represent six aspects of construction industry development, and 16 second-level basic indexes.

Table 1 Evaluation index system

4 Factor Analysis & Results

Based on the evaluation index system, the paper analyzes and evaluates the construction industry development of 31 provincial-level regions in China using SPSS 18.0, with data collected from the China Statistical Yearbook (2011).

4.1 KMO Test and Bartlett’s Test of Sphericity

The KMO Test measures whether the sampling is adequate for factor analysis, that is, whether the partial correlation coefficients between the variables are sufficiently small relative to the simple correlation coefficients. Bartlett’s Test of Sphericity tests whether the correlation coefficient matrix is an identity matrix. If it is an identity matrix, the factor model is not suitable (Table 2).

Table 2 KMO test results and Bartlett test results

Kaiser gave the following KMO standard for judging whether data are suitable for factor analysis: KMO > 0.9, quite suitable; 0.9 > KMO > 0.8, suitable; 0.8 > KMO > 0.7, generally suitable; 0.7 > KMO > 0.6, not quite suitable; KMO < 0.5, not suitable. The SPSS results show that the variables pass the KMO Test. Bartlett’s Test of Sphericity gives a statistic of 817.557 with significance = .000, which means that the variables also pass Bartlett’s Test of Sphericity. The variables selected in the paper are therefore suitable for factor analysis.
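As a rough illustration of what the two tests compute, the following sketch evaluates both statistics from a correlation matrix with NumPy. It is not the SPSS procedure used in the paper. The function names and the small 3 × 3 example matrix are assumptions made for illustration, and the sketch uses the standard textbook formulas: Bartlett’s chi-square statistic from the log-determinant of the correlation matrix, and the overall KMO as the ratio of squared correlations to squared correlations plus squared partial correlations.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom.

    R is a p x p correlation matrix, n the number of observations.
    """
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                 # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)  # mask for off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

# Hypothetical correlation matrix, not data from the paper.
R_example = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
chi2, dof = bartlett_sphericity(R_example, n=31)
kmo_value = kmo(R_example)
```

A chi-square statistic that is large relative to its degrees of freedom rejects the hypothesis that the correlation matrix is an identity matrix, and the KMO value can then be read against Kaiser’s scale above.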

4.2 Factor Analysis Process and Results

In the process of factor analysis, the paper extracts four common factors by the principal components method and then rotates the factor loading matrix with the Quartimax method. This yields the factors’ scree plot (see Fig. 1), the characteristic values (eigenvalues) and contribution rates (see Table 3), and the rotated component matrix (see Table 4).

Fig. 1 Scree plot

Table 3 Characteristic value and contribution rate of common factors
Table 4 Rotated component matrix

Table 3 shows that the cumulative contribution rate of the four extracted common factors is 88.745 %, which exceeds 85 %, so the extraction of common factors is effective. The original 16 indexes can be integrated into four common factors: F1, F2, F3 and F4. According to the principle of factor analysis, the four common factors are uncorrelated with each other, while each common factor is highly correlated with the original variables it contains.
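The characteristic values and contribution rates reported in Table 3 come from an eigendecomposition of the correlation matrix. The sketch below shows that step with NumPy under stated assumptions: the function names are hypothetical, the data are random numbers rather than the yearbook data, and the loadings are unrotated (the Quartimax rotation used in the paper is not implemented here).

```python
import numpy as np

def principal_component_loadings(X):
    """Eigenvalues, contribution rates and unrotated loadings.

    X is an (observations x variables) data matrix.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, contribution, loadings

def n_factors_for(contribution, threshold=0.85):
    """Smallest number of factors whose cumulative rate reaches threshold."""
    cumulative = np.cumsum(contribution)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))

# Random stand-in data: 31 observations of 5 variables (not the paper's data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(31, 5))
eigvals, contribution, loadings = principal_component_loadings(X_demo)
```

With all components kept, `loadings @ loadings.T` reproduces the correlation matrix; keeping only the first few columns gives the reduced factor representation, and `n_factors_for` applies the 85 % cumulative-contribution rule mentioned in the text.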

Table 4 shows the correlation coefficients between the common factors and the original variables they contain. The first common factor F1 loads heavily on Number of Enterprises (X1), Number of Employed Persons (X2), Total Assets (X3), Gross Output Value (X4), Value Added of the Construction Industry (X5), Total Tax (X9), The Proportion of Employment (X10), Total Number of Machinery and Equipment Owned (X12), Total Power of Machinery and Equipment Owned (X13) and Net Value of Machinery and Equipment Owned (X14). These ten indexes reflect the scale, economic and social benefits, equipment and assets of the regional construction industry, so F1 can be named the Total Factor.

The second common factor loads heavily on Overall Labor Productivity in Terms of Gross Output Value (X11), Value of Machines per Laborer (X15) and Power of Machines per Laborer (X16). These three indexes reflect labor productivity and technological level, so F2 can be named the Productivity and Technology Factor.

The third common factor loads heavily on Per capita GDP (X6) and Per capita Profit (X7), both of which reflect the per capita level, so F3 can be named the Per capita Factor.

The fourth common factor loads heavily on Rate of Return on Common Stockholders’ Equity (X8), which reflects the profitability of the construction industry in different regions, so F4 can be named the Profitability Factor.

As a result, it is suitable to use the Total Factor (F1), Productivity and Technology Factor (F2), Per capita Factor (F3) and Profitability Factor (F4) to represent the original variables and evaluate regional construction industry development.

SPSS 18.0 gives the scores and rankings of the 31 regions on each common factor. Taking the contribution rate of each common factor as its weight and computing a linear weighted sum gives the comprehensive scores and rankings (see Table 5). The comprehensive score is calculated as follows:

$$ F=0.5328\times {F}_1+0.1515\times {F}_2+0.1347\times {F}_3+0.0685\times {F}_4 \tag{1} $$

Table 5 Factor analysis results and clustering result of 31 regions
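Eq. (1) is a plain weighted sum, so it can be sketched in a few lines of Python. The weights are those of Eq. (1); the region names and factor scores below are hypothetical placeholders, not values from Table 5, and the function names are introduced only for illustration.

```python
# Weights are the contribution rates used in Eq. (1).
WEIGHTS = (0.5328, 0.1515, 0.1347, 0.0685)

def comprehensive_score(factor_scores):
    """Linear weighted sum F = sum(w_k * F_k) over the four common factors."""
    return sum(w * f for w, f in zip(WEIGHTS, factor_scores))

def rank_regions(scores_by_region):
    """Return (region, F) pairs sorted from highest to lowest score."""
    scored = {r: comprehensive_score(s) for r, s in scores_by_region.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical factor scores (F1, F2, F3, F4); not values from Table 5.
example = {
    "Region A": (1.2, -0.3, 0.1, -0.5),
    "Region B": (-0.4, 2.0, 0.8, 0.2),
    "Region C": (0.0, 0.0, 0.0, 0.0),
}
ranking = rank_regions(example)
```

Because the weight on F1 is more than three times any other weight, a region with a high Total Factor score tends to rank high overall even when its other factor scores are modest.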

5 Clustering & Results

Taking the Total Factor, Productivity and Technology Factor, Per capita Factor and Profitability Factor as the clustering variables, the paper conducts hierarchical cluster analysis with the between-groups linkage method and squared Euclidean distance as the measure, which generates the dendrogram (see Fig. 2).

Fig. 2 Cluster genealogy chart
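The clustering procedure named above can be sketched in pure Python as a minimal agglomerative routine, assuming that between-groups linkage means the average of all pairwise squared Euclidean distances between two clusters. The function names and the five sample points are assumptions for illustration; this is not the SPSS implementation and it does not draw a dendrogram.

```python
from itertools import combinations

def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(points, n_clusters):
    """Agglomerative clustering with between-groups (average) linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    squared distance until n_clusters remain. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Mean squared distance over all cross-cluster pairs.
            d = sum(sq_euclidean(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so j can be removed
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical 2-D points standing in for factor-score vectors.
sample = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
groups = average_linkage(sample, n_clusters=3)
```

Recording the merge distance at each step, rather than stopping at a fixed number of clusters, would give the information a dendrogram displays.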

6 Discussion

Based on the comprehensive scores and rankings in Table 5 and the clustering result, the 31 regions can be categorized into five clusters.

The first cluster includes Beijing and Shanghai. These two regions rank 3rd and 7th respectively in the comprehensive ranking, 10th and 9th on the Total Factor, 9th and 16th on the Productivity and Technology Factor, and 1st and 2nd on the Per capita Factor, but only 30th and 27th on the Profitability Factor, an obvious gap from the other factors. This cluster can therefore be categorized as: upper-middle scale, medium productivity and technology, high per capita and low profitability.

The second cluster includes Jiangsu and Zhejiang. These two regions rank 1st and 2nd respectively in the comprehensive ranking, and also 1st and 2nd on the Total Factor, but they rank low on the other three common factors. It can be concluded that the Total Factor has a huge impact on the comprehensive ranking. Their Total Factor scores are 3.38454 and 2.53706 respectively, much higher than the score of 1.143 for Shandong, which ranks 3rd. These two regions can be categorized as: large scale, medium-lower productivity and technology, medium-lower per capita and profitability.

The third cluster consists of Tianjin alone, a special region that forms a category by itself. Its Total Factor score ranks 15th, but its Productivity and Technology Factor score is 3.00921 and ranks 1st, much higher than the score of 1.79 for Qinghai, which ranks 2nd; its Per capita Factor score ranks 5th, but its Profitability Factor score ranks 21st. Its comprehensive ranking is 6th, which shows that the Productivity and Technology Factor raises the comprehensive score considerably and that the construction industry of Tianjin is developing towards high productivity and technology. It can be categorized as: middle scale, high productivity and technology, high per capita and medium-lower profitability.

The fourth cluster includes Inner Mongolia and Tibet, which rank 16th and 22nd in the comprehensive ranking. Their Per capita Factor scores rank 3rd and 4th, and their Profitability Factor scores rank 1st and 2nd, but their Total Factor and Productivity and Technology Factor scores rank low. This indicates that the construction industry of Inner Mongolia and Tibet is small in scale and low in productivity and technology, but because of their small populations both the per capita level and profitability are high. This cluster can be categorized as: small scale, low productivity and technology, high per capita and profitability.

The fifth cluster includes the remaining 24 regions. The comprehensive scores of these 24 regions range from 0.58307 (Shandong) to −0.87562 (Hainan), and they represent the basic development situation of China’s construction industry. The score of each common factor in these regions is not high, which shows that the overall development of China’s construction industry is not good and is still at a primary stage, whether viewed in terms of scale, productivity and technology, per capita level or profitability. This cluster can be classified as: middle scale, medium productivity and technology, medium per capita and profitability.

7 Conclusion

The study constructs an index system and then applies factor analysis and cluster analysis, using SPSS 18.0, to conduct an empirical study on the construction industry development of 31 regions in China. All regions are categorized into five clusters by four extracted factors: the total factor, the productivity and technology factor, the per capita factor and the profitability factor. The results show that significant differences exist in the development level of the construction industry among different regions. The purpose of the paper is to help policy-makers and industry practitioners find their own positions and improve competitiveness.