Introduction

With population growth and rapid economic development, surface water has received large pollution loads from a variety of sources, such as urban wastewater and industrial and agricultural activities, and also serves to assimilate and transport pollutant effluents (Bowen and Depledge 2006; Milovanovic 2007). Water pollution threatens human health and undermines the sustainability of aquatic ecosystems and socio-economic development in some regions. Nutrients from farmland fertilizer and urban wastewater discharged into rivers are the main pollutants of surface water in a catchment and tend to induce serious ecological problems such as eutrophication (Wang et al. 2007). It is therefore essential and urgent to control water pollution, improve water quality, and implement regular monitoring programs that help to reveal the spatial and temporal variations of surface water quality.

Taihu Basin is one of the most industrialized regions in China, with high population density, urbanization, and economic development. The area covers only 0.4% of the territory of China while contributing about 11% of Gross National Product (GNP) and more than 14% of China’s gross domestic product (Qin et al. 2007). Since the 1980s, rapid development of the local economy, population growth, and urbanization have resulted in pollutants being produced and discharged into rivers and lakes. In recent years, serious water pollution problems have attracted much attention because algal blooms have occurred more frequently, extended their coverage, and persisted throughout the summer. This seriously affects the lake as a supply of drinking water (Qin et al. 2007). In late May 2007, Lake Taihu was overtaken by a major algal bloom, leaving approximately two million people without drinking water for at least 1 week.

In order to examine the pollution sources and improve surface water quality, numerous studies have focused on eco-environmental issues in Lake Taihu and analyzed the loss and load of pollutants from different pollution sources (Zhang et al. 2003; Gao et al. 2004; Guo et al. 2004; Qin et al. 2007). However, research on water quality at the whole-catchment scale remains very limited. Many studies have focused on the pollution and eutrophication of Lake Taihu itself (Chen et al. 2003), whereas others have focused on the status of pollution in the principal rivers around the lake (Wang et al. 2007; Xie et al. 2007; Xu et al. 2009). These studies provided good insights into the spatial and temporal characteristics of surface water quality in the Taihu Basin. However, intensive agricultural activities and rapid urbanization have heavily altered the natural flows of the basin, and its well-developed drainage network receives large amounts of contaminants from industry, domestic wastewater, agriculture, aquaculture, and livestock. This makes it difficult both to identify pollutant sources and to assess the characteristics of surface water quality (Xu et al. 2009).

To better understand temporal and spatial patterns of water pollution and the variability of aquatic ecosystems, monitoring data are usually interpreted by applying multivariate statistical techniques (Yidana et al. 2008), such as cluster analysis (CA), principal component analysis (PCA), discriminant analysis (DA), and factor analysis (FA). These techniques are considered useful tools both for simplifying complicated data sets of water quality variables and for extracting meaningful interpretations (Vega et al. 1998; Wunderlin et al. 2001; Bu et al. 2009). In this study, physicochemical parameters of water quality surveyed monthly at 22 sites in the Taihu Basin from 2001 to 2002 were analyzed. Three techniques, PCA, CA, and fuzzy synthetic evaluation (FSE), were adopted to assess the characteristics of water quality and to classify water samples from the surface water bodies. The objectives of this paper are to: (1) identify the spatial and temporal characteristics of the physicochemical variables of water quality, (2) assess surface water quality using FSE, and (3) determine important factors and sources influencing the water quality.

Materials and methods

Study area

Taihu Basin is located in the lower reaches of the Yangtze River (Fig. 1) and includes parts of Jiangsu Province, Zhejiang Province, Anhui Province, and Shanghai City, with an area of approximately 36,895 km². It is dominated by subtropical summer monsoons, with an average annual rainfall of about 1,177 mm. The topography is varied, with mountainous and hilly areas in the west and low alluvial plains in the northern and eastern parts, which occupy 80% of the whole basin (Fig. 1). The drainage networks are well developed and have been heavily modified for flood control and agricultural irrigation (Fig. 1). There are more than 200 rivers in the watershed, and 172 rivers or channels are connected to Lake Taihu (Xu and Qin 2005). The total length of the rivers in the Taihu Basin is ca. 12,000 km, i.e., about 3.24 km km⁻² (Qin et al. 2007). The largest inflow rivers, which bring most of the pollutants into the lake, are Chendonggang, Xitiaoxi, and Yincungang, located in the west and southwest. The main outflow rivers are Taipu, Xinyunhe, and Xijiang in the southeast.

Fig. 1 Location of the study area and monitoring sites

Sampling sites and analysis

Because of the large area of the Taihu Basin, it is difficult to collect recent measurement data for the whole region. Water quality data for 67 sites, sampled at monthly intervals from 2000 to 2002, were obtained from the Water Resources Protection Bureau of the Taihu Basin (Fig. 1). Based on data completeness and the homogeneity of sites within the same reaches, we selected only 22 sites, covering 2001 and 2002, for water quality analysis. Four sites were located at the mouths of rivers connected with Lake Taihu, and 18 were located on the main streams and tributaries. The data sets consisted of 24 water quality variables: temperature, electrical conductivity, pH, total suspended solids (TSS), Secchi disc depth (transparency), total nitrogen (TN), ammonia nitrogen (NH4+–N), chlorophyll-a, total phosphorus (TP), biochemical oxygen demand (BOD), chemical oxygen demand by the permanganate method (CODMn), chemical oxygen demand by the dichromate method (COD), dissolved oxygen (DO), petroleum, volatile phenol, mercury, copper, lead, zinc, cadmium, chromium, iron, hardness, and nitrate nitrogen. Owing to adverse weather conditions, several samples were missing in some periods, such as April 2001. In addition, the monitoring of trace metal contaminants was discontinuous. Based on sampling continuity at the sites, we selected 8 physicochemical variables for our study, covering physical properties, organic constituents, and nutrients.

Data analysis methods

Multivariate statistical approaches are widely used to identify differences or similarities among environmental areas in terms of surface water quality parameters (Maria and Graca 2006). PCA and CA were employed to sort the water quality variables and sampling sites, respectively. PCA was performed using Statistica 7.0, and CA was applied using the SPSS 16.0 statistical software package. FSE was adopted to identify the pollution level of surface water using Matlab R2008a.

Principal component analysis

PCA reduces the dimensionality of a data set by explaining the correlation among a large number of environmental variables in terms of a small number of uncorrelated underlying factors, or principal components, without substantial loss of information (Vega et al. 1998). The methodology is not reviewed extensively here because it is widely applied in environmental science and many publications provide a thorough description of its formulation and properties (e.g. Emery and Thomson 1997; Jolliffe 2002; Wall et al. 2003).

To objectively isolate the most important modes of variance of the water quality parameters, we applied PCA to the normalized data set of the eight selected indicators: pH, DO, COD, CODMn, BOD, NH4+–N, TP, and TN. The factor loadings are the correlation coefficients between the original variables and the principal components (PCs), and can be considered a measure of the relative importance of each variable in the extracted PCs. Any factor with an eigenvalue greater than unity (eigenvalue >1) was considered significant according to the criterion of Cattell and Jaspers (1967).
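The steps just described (normalization, extraction of the leading component, the eigenvalue-greater-than-unity rule, and loadings as correlations) can be illustrated with a minimal sketch. It is not the Statistica routine used in the study: it assumes z-score normalization, extracts only the first component by power iteration, and uses hypothetical function names and made-up sample values.

```python
import math

def standardize(data):
    """Z-score each column; rows are sites, columns are variables."""
    n = len(data)
    columns = []
    for col in zip(*data):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        columns.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*columns)]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(corr, iterations=500):
    """Leading eigenvalue and unit eigenvector of corr by power iteration."""
    p = len(corr)
    # Start from the first unit vector; this assumes variable 1 has a
    # nonzero loading on the first component.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, v

# Hypothetical site means for three variables (not data from the study).
data = [[3.0, 1.0, 8.0], [5.0, 2.1, 6.5], [9.0, 3.9, 4.0],
        [12.0, 5.2, 2.5], [20.0, 9.0, 1.0]]
corr = correlation_matrix(standardize(data))
eigenvalue, vector = first_component(corr)
loadings = [math.sqrt(eigenvalue) * x for x in vector]  # correlations with PC1
explained = eigenvalue / len(corr)   # share of the total variance
significant = eigenvalue > 1.0       # eigenvalue-greater-than-unity rule
```

Because the data are standardized, the total variance equals the number of variables, so dividing an eigenvalue by that number gives the share of variance explained by the corresponding component.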

Cluster analysis

Cluster analysis is a multivariate technique that groups objects into categories or clusters based on their similarity, revealing their intrinsic characteristics (Vega et al. 1998). Hierarchical clustering is the most common approach; Ward’s (1963) method is frequently used as the linkage criterion. Unlike PCA, which normally uses only two or three PCs to display the key variance of the parameters, CA uses all the variance or information contained in the original data set to demonstrate similarity and proximity. The results of CA are usually presented as a dendrogram, which provides a visual summary of the clustering results and shows the internal homogeneity and external heterogeneity of the objects (Singh et al. 2004; Shrestha and Kazama 2007). Many studies have shown that CA reliably classifies surface water quality and can guide future sampling strategies (Singh et al. 2004; Xu et al. 2009). In the present study, CA was performed on the normalized data set by means of Ward’s method, using Euclidean distances as a measure of similarity.
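The agglomerative procedure described above can be sketched as follows. This is a naive illustration of Ward's minimum-variance criterion with squared Euclidean distances between cluster centroids, not the SPSS routine used in the study; the function name `ward_clusters` and the sample points are hypothetical, and the input is assumed to be already normalized.

```python
def ward_clusters(points, k):
    """Merge points into k clusters using Ward's minimum-variance criterion."""
    # Each cluster is (member indices, centroid coordinates).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return [sorted(members) for members, _ in clusters]

# Hypothetical normalized values for five sites and two variables.
sites = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
groups = ward_clusters(sites, 3)
```

Recording the merge cost at each step would give the heights needed to draw a dendrogram; cutting the tree at a chosen height corresponds to choosing `k` here.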

Fuzzy synthetic evaluation

Fuzzy set theory is used for decision-making or pattern recognition when the context of the problem is unclear and boundaries are undefined or imprecise. The FSE method evaluates each individual variable’s value according to predefined quality criteria in the fuzzy environment by designing a suitable membership function and using fuzzy operators (Cude 2001). In this approach, water classes are defined as fuzzy sets in terms of degrees of membership with flexible boundaries rather than as binary/crisp sets (Dahiya et al. 2007). Recent studies indicate that FSE has become a useful tool that is extensively applied throughout the world in decision-making and evaluation processes in imprecise environments. Icaga (2007) proposed an index model to evaluate surface water quality using a fuzzy logic method in terms of physical and inorganic chemical parameters. Dahiya et al. (2007) used FSE to assess the physicochemical variables of groundwater for drinking purposes. Since the FSE method has been discussed in depth in Yen and Langari (1999), Ross (2004), Dahiya et al. (2007) and Lu et al. (2009), only a brief description of the procedure is presented below.

1. Select assessment variables and establish assessment criteria:

The first important step is to select the representative and rational water quality assessment variables. For each site, an assessment indicator matrix U can be expressed as:

$$ U = \left\{ u_{1}, u_{2}, \ldots, u_{n} \right\} $$
(1)

where n is the number of the selected variables. The water quality is classified into five levels in terms of National Surface Water Environmental Quality Standards (Chinese Environmental Protection Agency 2002b, GB3838-2002). Afterwards, the assessment criteria matrix V can be expressed as follows:

$$ V = \left\{ v_{1}, v_{2}, \ldots, v_{m} \right\} $$
(2)

where m is the number of assessment criteria categories, which equals 5 in this study.

2. Establish membership functions:

The membership functions represent the degree to which a water quality variable belongs to each fuzzy set. For an element \( u_{i} \) of U, the value \( r_{ij} \) is called the membership degree of \( u_{i} \) in the fuzzy set \( v_{j} \). The value 0 means that \( u_{i} \) is not a member of the fuzzy set; the value 1 means that \( u_{i} \) is fully a member of the fuzzy set. Values between 0 and 1 characterize fuzzy members, which belong to the fuzzy set only partially. Various kinds of membership functions can be used to quantify the membership degree, such as the Gaussian distribution function, the sigmoid curve, and quadratic and cubic polynomial curves. In this study, the triangular membership function is applied to evaluate the water quality variables as follows. The parameters a, b, c, d, and e in the following equations are shown in Table 1, obtained from the National Surface Water Environmental Quality Standards (Chinese Environmental Protection Agency 2002b, GB3838-2002).

Table 1 The limits of the membership function based on surface water environmental quality standards
$$ r_{i1}\left( u_{i} \right) = \begin{cases} 1, & u_{i} \le a \\ \left( u_{i} - b \right)/\left( a - b \right), & a < u_{i} < b \\ 0, & u_{i} \ge b \end{cases} $$
(3)
$$ r_{i2}\left( u_{i} \right) = \begin{cases} 0, & u_{i} \le a \text{ or } u_{i} \ge c \\ \left( u_{i} - a \right)/\left( b - a \right), & a < u_{i} \le b \\ \left( u_{i} - c \right)/\left( b - c \right), & b < u_{i} < c \end{cases} $$
(4)
$$ r_{i3}\left( u_{i} \right) = \begin{cases} 0, & u_{i} \le b \text{ or } u_{i} \ge d \\ \left( u_{i} - b \right)/\left( c - b \right), & b < u_{i} \le c \\ \left( u_{i} - d \right)/\left( c - d \right), & c < u_{i} < d \end{cases} $$
(5)
$$ r_{i4}\left( u_{i} \right) = \begin{cases} 0, & u_{i} \le c \text{ or } u_{i} \ge e \\ \left( u_{i} - c \right)/\left( d - c \right), & c < u_{i} \le d \\ \left( u_{i} - e \right)/\left( d - e \right), & d < u_{i} < e \end{cases} $$
(6)
$$ r_{i5}\left( u_{i} \right) = \begin{cases} 0, & u_{i} \le d \\ \left( u_{i} - d \right)/\left( e - d \right), & d < u_{i} < e \\ 1, & u_{i} \ge e \end{cases} $$
(7)
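As an illustration of Eqs. 3 to 7, the sketch below computes the five membership degrees for a single measured value. It assumes a variable for which larger values mean worse quality and class limits a < b < c < d < e; a variable such as DO, for which larger values are better, would need the comparison reversed. The function name and the example limits are hypothetical and are not taken from Table 1.

```python
def memberships(u, limits):
    """Return [r_i1, ..., r_i5] for a measured value u (Eqs. 3-7).

    limits = (a, b, c, d, e), assumed strictly increasing.
    """
    points = list(limits)
    r = [0.0] * 5
    if u <= points[0]:        # at or below the class I limit
        r[0] = 1.0
    elif u >= points[4]:      # at or above the class V limit
        r[4] = 1.0
    else:
        # u lies between two neighbouring limits and is shared
        # linearly between the two adjacent classes.
        for j in range(4):
            lo, hi = points[j], points[j + 1]
            if u <= hi:
                t = (u - lo) / (hi - lo)
                r[j], r[j + 1] = 1.0 - t, t
                break
    return r

# Illustrative limits only (mg/L); not the values of Table 1.
example_limits = (2.0, 4.0, 6.0, 10.0, 15.0)
row = memberships(5.0, example_limits)   # -> [0.0, 0.5, 0.5, 0.0, 0.0]
```

With triangular functions of this form, a value always has nonzero membership in at most two adjacent classes, and the five degrees sum to one.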
3. Calculate the membership function matrix:

After substituting the monitoring data for each assessment variable, the fuzzy matrix R can be expressed as:

$$ R = \left( {r_{ij} } \right)_{n \times m} = \left( {\begin{array}{*{20}c} {r_{11} } & {r_{12} } & \ldots & {r_{1m} } \\ {r_{21} } & {r_{22} } & \ldots & {r_{2m} } \\ {\begin{array}{*{20}c} \vdots \\ \end{array} } & \vdots & \vdots & \vdots \\ {r_{n1} } & {r_{n2} } & \ldots & {r_{nm} } \\ \end{array} } \right) $$
(8)

where \( r_{ij} \left( {i = 1,2, \ldots ,n; \quad j = 1,2, \ldots ,m} \right) \) is the membership degree of the ith assessment variable at the jth level.

4. Determine the weights matrix:

Water quality cannot be determined according to a single water quality indicator. Each water quality variable has its own contribution to water quality (Shen et al. 2005; Lu et al. 2009). Therefore, it is necessary to determine the weights of each variable in FSE. In this study, the entropy method is used to determine the weights of assessment variables (Zou et al. 2006; Lu et al. 2009). The weight of entropy of the ith assessment variable is defined as:

$$ w_{i} = {\frac{{1 - H_{i} }}{{n - \sum\nolimits_{i = 1}^{n} {H_{i} } }}} $$
(9)

where \( 0 \le w_{i} \le 1 \) and \( \sum\nolimits_{i = 1}^{n} w_{i} = 1 \). \( H_{i} \) is the entropy of the ith indicator, calculated as:

$$ H_{i} = - k\sum\limits_{j = 1}^{p} {f_{ij} } \ln f_{ij} ,\,\,i = 1,2, \ldots ,n $$
(10)

where \( f_{ij} = r_{ij} / \sum\nolimits_{j = 1}^{p} r_{ij} \), \( k = 1/\ln p \), p is the number of monitoring sites (22 in this article), and \( f_{ij} \) is the normalized value of the ith variable at the jth monitoring site. If \( f_{ij} = 0 \), then the term \( f_{ij} \ln f_{ij} \) is taken to be zero.
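Equations 9 and 10 can be sketched as below. The input layout (one row per variable, one column per monitoring site, non-negative values) and the function name are assumptions made for illustration; the sample numbers are invented.

```python
import math

def entropy_weights(values):
    """Entropy weights (Eqs. 9-10).

    values[i][j] is the non-negative value of variable i at site j.
    Assumes every row has a positive sum and that at least one
    variable is not perfectly uniform across the sites.
    """
    n = len(values)            # number of variables
    p = len(values[0])         # number of monitoring sites
    k = 1.0 / math.log(p)
    entropies = []
    for row in values:
        total = sum(row)
        h = 0.0
        for x in row:
            f = x / total
            if f > 0.0:        # f * ln(f) is taken as 0 when f = 0
                h -= k * f * math.log(f)
        entropies.append(h)
    denominator = n - sum(entropies)
    return [(1.0 - h) / denominator for h in entropies]

# Invented values for three variables at three sites.
weights = entropy_weights([[1.0, 2.0, 3.0],
                           [5.0, 5.0, 6.0],
                           [0.5, 4.0, 0.1]])
```

A variable whose values are nearly the same at every site has an entropy close to 1 and therefore a small weight, whereas a variable that differs strongly between sites receives a larger weight.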

5. Calculate evaluation results:

The surface water quality assessment results can be obtained from:

$$ D = W \cdot R = \left( {d_{1} ,d_{2} , \ldots ,d_{m} } \right) $$
(11)

in which the fuzzy matrix \( R = \left( r_{ij} \right)_{n \times m} \), the weight matrix \( W = \left( w_{i} \right)_{n} \), and each element of the results matrix D is calculated as:

$$ d_{j} = \sum\limits_{i = 1}^{n} {w_{i} r_{ij} } ,\quad j = 1,2, \ldots ,m $$
(12)

The final evaluation result is:

$$ {\text{Class = index}}\left( {\max \left( {d_{j} } \right)} \right)\quad j = 1,2, \ldots ,m. $$
(13)

The index of the maximum value of \( d_{j} \) (i.e. j) denotes the water quality class of the water samples.
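A minimal sketch of Eqs. 11 to 13 is given below. It takes a weight vector and a membership matrix as plain Python lists; the function name and the numbers are hypothetical, and ties between classes are resolved in favour of the lower class index, which is a choice made here rather than one stated in the text.

```python
def evaluate(weights, membership_matrix):
    """Return (D, class) with D = W . R and class = 1-based index of max(d_j)."""
    m = len(membership_matrix[0])
    d = [sum(w * row[j] for w, row in zip(weights, membership_matrix))
         for j in range(m)]
    # max() returns the first maximal index, so ties go to the lower class.
    best = max(range(m), key=lambda j: d[j])
    return d, best + 1

# Hypothetical weights and membership degrees for three variables.
W = [0.5, 0.3, 0.2]
R = [[0.0, 0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.4, 0.6]]
D, water_class = evaluate(W, R)   # water_class == 3 for these inputs
```

Here the weighted memberships are approximately (0, 0.25, 0.55, 0.08, 0.12), so the sample would be assigned to class III.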

Results and discussion

Status of surface water quality in the Taihu Basin

Because of the large quantity of observed data, only the mean values and standard deviations of the measured water quality variables at the 22 sites are shown in Table 2. The mean pH values vary from 7.04 to 7.87, which fall within the range of the National Surface Water Environmental Quality Standards of China (Chinese Environmental Protection Agency 2002b, GB3838-2002).

Table 2 Mean and standard deviations (SD) of physicochemical variables of surface water quality in the Taihu Basin

The lowest DO value was found at Changzhengqiao (1.08 mg/L), and relatively low values, ranging from 3.17 to 4.06 mg/L, were found at Xinfengzhen, Beiguodaqiao, Zhendongdukou, and Baidugang. This suggests that the discharge of industrial and domestic wastewater induced serious organic pollution in these regions, since low DO is mainly caused by the decomposition of organic compounds (Boyle and Fraleigh 2003). The highest DO concentration was found at Hushanqiao (8.28 mg/L).

There is no distinct difference in CODMn, COD, and BOD among most of the sites. The CODMn, COD, and BOD concentrations ranged from 3.58 to 12.00 mg/L, 12.77 to 57.04 mg/L, and 1.73 to 12.50 mg/L, respectively, without large fluctuations at most sites except Changzhengqiao. The highest contents of CODMn, COD, and BOD appeared in the water samples collected from Changzhengqiao and Baidugang.

In contrast, higher concentrations of TN, NH4+–N, and TP were detected at most sites upstream of Lake Taihu, such as Baidugang, Jiuxian, Panjiaba, and Changzhengqiao. This indicates that nutrients from both agricultural activities and domestic wastewater extensively deteriorate the surface water quality. Although diffuse pollution contributed more than 60% of the nutrient (N, P) loading (Wang et al. 2004), high concentrations of nitrogen and phosphorus (e.g. NH4+–N) were found at many sites strongly affected by urbanization, which was probably the main reason for the hypertrophic state of the northern part of Lake Taihu (Chen et al. 2003). Previous studies also indicated that pollutants produced by industry came mainly from the east or southeast of the lake, such as Suzhou, Wuxi, and Jiaxing. Pollutants from agriculture (cropping, rice growing, etc.) accounted for 37% of COD, 49.5% of TN, and 48% of TP produced in the basin, and these non-point pollutant sources were located mostly in the west (Huang et al. 2004; Qin et al. 2007).

Among all the monitoring sites, Changzhengqiao, located in Wuxi city, was found to have the worst water quality, with the highest values of most indicators (e.g. TP, TN, COD, BOD), which suggests that controlling point-source pollution in the Taihu Basin remains urgent. On the other hand, the water quality at Dagangqiao and Hushanqiao, located near the hilly region, was relatively better than at the other sites.

Principal component analysis on variables of surface water quality

PCA was applied to identify characteristics of water quality variables at all studied sites, based on the normalized average values obtained from monitoring data surveyed in a monthly interval during the study period.

A planar plot of the eight variables on the first two principal components is presented in Fig. 2. The first factor (PC1), the only one with an eigenvalue greater than unity, accounts for 73.72% of the total variance. The eigenvalue of the second factor (PC2) is almost equal to unity (0.99), and PC2 accounts for 12.46% of the total variance. The first factor captures all nutrient-related variables, with strong negative loadings on COD, CODMn, BOD, TP, NH4+–N, and TN and a positive loading on DO. The second factor has a strong positive loading on pH and a moderate positive loading on DO. PCA thus reduced the eight original variables to two key uncorrelated factors, each significantly correlated with specific variables and representing a different dimension of water quality. The first factor represents the nutrient dimension, which is related to all nutrient variables. The high correlation among the nutrient variables suggests high consistency in their variations, in agreement with previous research conducted on Lake Taihu (Wang et al. 2007). Furthermore, this factor accounts for the majority of the total variance, from which it can be concluded that nutrient overloading is the major environmental problem for aquatic systems in the Taihu Basin (Luo and Pang 2005). The second factor is strongly associated only with pH, which may be influenced by other environmental variables (Boyle and Fraleigh 2003; Wang et al. 2007).

Fig. 2 Principal component analysis (PCA) ordination of the 22 sites by 8 environmental variables in the Taihu Basin

Cluster analysis on variables of water quality

A dendrogram of the samples obtained by CA using Ward’s method is shown in Fig. 3. The 22 sampling sites were divided into three groups. Group 1 consisted of sites 1, 3, 6, 7, 11–13, and 17–22. Group 2 consisted of sites 2, 4, 5, 8–10, 14, and 15, and group 3 included only site 16 (i.e. Changzhengqiao), which was identified as the most heavily polluted site. The CA results reveal that the monitoring sites within each group share similar characteristics of the water quality variables.

Fig. 3 Dendrogram showing sampling site clusters in the Taihu Basin, China

As shown in Fig. 3, the first main group is formed from two subgroups that are linked at a rescaled distance of 7. The first subgroup includes sites 1, 11, 13, 17, 18, and 20, with higher concentrations of COD (>24 mg/L), NH4+–N, TP, and TN, which shows that the surface water there has become highly eutrophic and polluted. In the PCA ordination these samples scored negative on factor 1 and close to 0 on factor 2 (see Fig. 4). Sites 1, 11, 13, and 20 show much higher concentrations of NH4+–N (>2.2 mg/L), TP (>0.22 mg/L), and TN (>3.3 mg/L). This may have been caused by the large amounts of contaminants produced in urban towns and discharged into the surface water. Xie et al. (2007) investigated the twelve most important rivers in Changshu city, Taihu region, and concluded that high NH4+–N pollution mainly originated from point sources, such as domestic sewage, rural human and animal excreta, and industrial wastewater. The second subgroup consists of sites 3, 6, 7, 12, 19, 21, and 22, with COD ranging from 19 to 23 mg/L and relatively lower NH4+–N and TP. At sites 3, 21, and 22 in particular, the NH4+–N and TN contents are much lower, indicating that these sites are not intensively influenced by point source pollution.

Fig. 4 Scatter plot of the first two factor scores for the 22 sampling sites

In contrast, the sites in group 2, which are surrounded by fields and farmland, were moderately polluted. The COD contents vary from 12.77 to 17.38 mg/L, and the NH4+–N values are much lower than those in group 1. In the PCA ordination, these samples mostly scored positive on factor 1 (Fig. 4). Most sites in this group indicate that agriculture and households were the dominant contributors of nutrients flowing into the lake. This can be explained by the fact that a large portion of discharges from villages and livestock farms (especially those outside the core cities) are not appropriately treated, and effective technical measures to control agricultural sources of pollution are not readily available in China at present (Wang et al. 2006). Intensive agricultural activities resulted in higher concentrations of TN from fertilizers, which was the major cause of water eutrophication.

Group 3 corresponds to the most heavily polluted site, site 16 (i.e. Changzhengqiao). Fig. 4 also clearly shows that site 16 had the worst quality, with the most negative score on factor 1. Changzhengqiao, located in the north of Wuxi, received pollutants from both point and non-point sources. Large amounts of urban, industrial, and domestic wastewater were discharged directly into the river, which brought site 16 to a severely polluted status.

Fuzzy synthetic evaluation of surface water quality in the Taihu Basin

Fuzzy synthetic evaluation with the entropy method for weight determination was used to assess the surface water quality in the Taihu Basin during the study period. Figure 5 illustrates the intra-annual variability of surface water quality in the Taihu Basin in 2001 and 2002.

Fig. 5 Intra-annual variability of surface water quality in 2001 (a) and 2002 (b) in the Taihu Basin based on fuzzy synthetic evaluation. The red and blue dots represent the monthly water quality in the catchment. The position of each dot around the centre of the diagram expresses time (clockwise from January to December). The water quality class is expressed by the distance from the centre of the diagram, with the maximum at the outer circle. The positions of the monitoring stations are indicated by small crosses (+)

As shown in Fig. 5a, most of the samples in the west of Lake Taihu belong to class III water quality in both the wet and dry seasons. Some sites, such as Wangjiaqiao, Jiuxian, Lianjiangqiao, and Panjiaba, displayed class V water quality during the dry season, because most of them are located in the agricultural region and receive nutrients from farmland. These sites showed obvious seasonal variation due to agricultural activities. As mentioned above, Changzhengqiao was the most heavily polluted site because of industrial pollution and domestic wastewater.

Only a few sites had relatively good water quality, belonging to class II throughout the year, i.e. Hushanqiao, Tanjingcun, Jiangbianzha, and Dagangqiao. Two of these sites, Tanjingcun and Hushanqiao, are located downstream of Lake Taihu and were not heavily polluted by wastewater or industry. Pollution at Jiangbianzha, which is connected with the Yangtze River, may be diluted by water exchange with the river.

The FSE indicates that water quality at most of the sampling sites varies from class III to V. Figure 5b shows the assessment results for 2002, which are very similar to those for 2001. The water quality at Zhushagangkou and Luxudaqiao, located in the southeast of Lake Taihu, is better than at the northern sites. This is attributed to the water retention time of Lake Taihu. The average retention time of Lake Taihu is about 5 months, but it is shorter in the south, since most runoff water is discharged via the Taipu River in the southeast. Water quality, therefore, is better in the south than in the north (Qin et al. 2007).

Conclusions

Water quality analysis of the samples from the Taihu Basin indicates that pH values are within the range of natural surface water. The concentrations of CODMn, BOD, COD, TP, NH4+–N, and TN at most monitoring sites in the Taihu Basin indicate serious pollution. Spatial distributions of COD, NH4+–N, TP, and TN concentrations show that higher values appeared in water samples near large cities.

1. The results from PCA suggested that nutrient pollution, organic pollution, and agricultural runoff were potential pollution sources. The extracted significant factor (PC1), explaining 73.72% of the total variance, is correlated with DO, BOD, COD, CODMn, NH4+–N, TP, and TN. The second factor, accounting for 12.46% of the total variance, is associated with pH.

2. The results of CA are in agreement with those from PCA. The CA divided the 22 sampling sites into three groups: site 16 (Changzhengqiao) was the most polluted; sites 1, 3, 6, 7, 11–13, and 17–22 were moderately polluted; and the water quality of the remaining sites was comparatively better.

3. The FSE showed no significant difference in water quality between 2001 and 2002 across the sites. Seasonal changes could be detected at some sites in the west of Lake Taihu within the agricultural region. Changzhengqiao was found to be the most heavily polluted site because large amounts of industrial pollutants and domestic wastewater were discharged directly into the rivers. The evaluation results indicate that water quality at this site belonged to class V during the whole period. The four sites Hushanqiao, Tanjingcun, Jiangbianzha, and Dagangqiao showed relatively good water quality, ranging from class II to class IV.

In this study, both multivariate statistical analysis (PCA and CA) and the FSE method were applied to assess the water quality in the Taihu Basin. The assessment results indicate that the variables responsible for water quality variations are mainly related to nutrients (non-point sources: agricultural activities) in the west of Lake Taihu and organic pollution (point sources: urbanization, industry, and domestic wastewater) in the east of the Taihu Basin. This study demonstrates the utility of multivariate statistical techniques and the FSE method for the analysis and interpretation of complex data sets in environmental science, the identification of pollution sources, and the understanding of spatial and temporal variations in water quality. These methods may provide a useful tool for water quality management and may, to some extent, guide wastewater treatment and management. However, an updated long-term continuous data set would be necessary to improve the results.