Introduction

Hazardous element pollution has drawn increasing attention because of the high toxicity, wide variety of sources, nonbiodegradability, and accumulative behavior of these elements (Kukrer et al. 2014). Hazardous elements are among the most persistent pollutants in the ecosystem because they resist decomposition under natural conditions (Kasa et al. 2014). It is well known that coastal and estuarine regions are important sinks for many persistent pollutants and that these pollutants accumulate in organisms and bottom sediments (Gosar and Žibret 2011; Chen et al. 2012). With the rapid urbanization and industrialization of recent years, hazardous element pollution recorded in sediments has indicated significant impacts on the environment in estuarine areas (Xiao et al. 2013). Sediments are complex in nature, with numerous potential sources of hazardous elements, which makes it a challenge for researchers to assess the extent and degree of contamination (Song et al. 2011).

The source apportionment of hazardous elements in sediments is the key approach to identifying the effects of anthropogenic activities on biota (Gonzalez-Macias et al. 2014). Distinguishing between different sources of hazardous elements in sediments helps to control and reduce the associated risk, to detect hidden information regarding hazardous element pollution, and to characterize the overall condition of the aquatic environment (Duan et al. 2013; Ding and Hu 2013). In addition, calculating the contribution of each source and understanding the accuracy of the apportionment results are prerequisites for formulating effective pollution control strategies (Chen et al. 2013). Therefore, it is imperative to understand the concentrations and sources of hazardous elements in sediments. Much effort has already gone into distinguishing between different sources of hazardous elements in sediments (Sajn and Gosar 2014).

To study pollution source apportionment, a variety of classical methods have been developed in the past decades. Among these methods, multivariate statistical models, such as principal components analysis (PCA), have been successfully used to identify anthropogenic pollution and source profiles in many river, estuary, and lake systems (Pekey et al. 2004; Chen et al. 2013). PCA is the traditional statistical tool used to reduce the number of variables required to explain the variability in a data set (Pekey and Dogan 2013). However, PCA also has serious drawbacks: negative values can appear in almost all factors, and the rotational indeterminacy of the solution means that multiple valid solutions can be found (Gonzalez-Macias et al. 2014; Dwivedi and Vankar 2014; Wang et al. 2014). In short, as a traditional multivariate statistical approach, PCA can provide qualitative information about the sources of a contaminant, but it is not adequate for supplying quantitative information regarding the contributions of each source type (Lin et al. 2011).

To quantitatively analyze different pollution sources, many approaches have been developed based on multivariate factor analysis (Ribeiro et al. 2010; Cerqueira et al. 2011; Cerqueira et al. 2012; Oliveira et al. 2012a, b; Quispe et al. 2012; Silva et al. 2012; Arenas-Lago et al. 2013; Hower et al. 2013; Oliveira et al. 2013; Ribeiro et al. 2013; Cutruneo et al. 2014; Saikia et al. 2014). Positive matrix factorization (PMF) is one of the recently developed approaches, and it has been proven to be a powerful and flexible alternative to traditional receptor modeling (Rahman et al. 2011). Compared to PCA, the PMF approach is a useful means of identifying sources and quantitatively apportioning concentrations to their sources, even when the sources are not well defined (Pekey and Dogan 2013; Tian et al. 2013). Unlike PCA, PMF treats factor analysis as a weighted least-squares problem (Comero et al. 2014). PMF solves the factor analysis problem by integrating nonnegativity constraints into the optimization process and by using the error estimate of each data value as a point-by-point weight (Comero et al. 2012; Lin et al. 2011).

In this study, based on data from 30 samples of surface sediment in the Yangtze River estuary (YRE), a combined method based on PCA and PMF was applied to identify and quantitatively apportion the hazardous element pollution sources. The key objectives of this work were to (i) determine the number of independent sources, (ii) estimate the contribution of each source, and (iii) compare the PCA and PMF methods to highlight their positive and negative aspects. It is hoped that the comprehensive information obtained in this study will assist hazardous element pollution control in this region and that these methods will find wider application in other regions worldwide.

Materials and methods

Study area, sample collection, and analysis

The Yangtze River estuary (YRE) is one of the world’s largest estuaries and is located in one of the most densely populated and fastest-developing areas in China (Zhao et al. 2012a, b). The study area is divided into three parts based on the line connecting the south and north coasts: the inner-estuary region, the mouth region, and the adjacent sea. The inner-estuary region is divided into four branches by three islands: Chongming Island, Changxing Island, and Hengsha Island (Fig. 1). One of the largest cities in China, Shanghai, has a significant level of industrial activity that has a direct impact on the Yangtze delta (Chen et al. 2012). The hazardous element pollution status of the YRE has attracted considerable attention (Dong et al. 2014; Zhao et al. 2014a, b). Although hazardous element pollution has been investigated in the YRE, little systematic research on source identification and apportionment has been performed there in recent years.

Fig. 1 Study area and location of sample sites

Surface sediment samples were collected in the YRE in November 2010. At each station, three surface sediments were collected and then mixed to form a composite sample for that station. The surface sediments were sampled to a depth of 2–5 cm. A total of 30 composite samples were obtained (Fig. 1). After sampling, the sediments were kept frozen at −20 °C prior to processing and analysis. In the laboratory, all of the samples were freeze-dried with a lyophilizer, mechanically homogenized, and sieved through a 1-mm mesh to remove small stones. The fine-grained fraction (<63 μm) was used in this study to minimize the grain-size effect in the chemical analysis (Chen et al. 2013). The dry samples were digested with HNO3/HF/HClO4, and the digestate was then diluted with deionized water to a set volume. The concentrations of the elements, including aluminum (Al), arsenic (As), cadmium (Cd), chromium (Cr), copper (Cu), iron (Fe), mercury (Hg), manganese (Mn), nickel (Ni), lead (Pb), antimony (Sb), and zinc (Zn), were measured using inductively coupled plasma-mass spectrometry (ICP-MS, Thermo) and atomic fluorescence spectrometry, which have been widely applied in many other studies (Wang et al. 2011a, b; Zhao et al. 2012a, b, 2013). The results met the accuracy demanded by the China State Bureau of Technical Supervision.

Statistical analysis

PCA is the most common multivariate statistical method used to explore the associations and origins of trace elements (Huang et al. 2009). The data obtained by the sampling was treated statistically using a statistical software package (STATISTICA).
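As an illustration of what the PCA step computes, the following is a minimal sketch in Python with NumPy, not the STATISTICA procedure used in this study. It assumes PCA on the correlation matrix of standardized concentrations; the function name `pca_correlation` and the synthetic 30 × 12 table are hypothetical, and the rotation step that produces a rotated component matrix is omitted.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = samples, columns = elements)."""
    # Standardize each element to zero mean and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the standardized data.
    R = (Z.T @ Z) / (X.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Fraction of total variance explained by each component.
    explained = eigvals / eigvals.sum()
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return explained, loadings

# Synthetic stand-in for a 30-sample x 12-element table (not the study data).
rng = np.random.default_rng(0)
X_demo = rng.lognormal(mean=1.0, sigma=0.5, size=(30, 12))
explained, loadings = pca_correlation(X_demo)
print(np.round(100 * explained[:3], 1))  # percent variance of the first three PCs
```

Elements with large absolute loadings on the same component vary together across stations, which is the basis for grouping them under a common source.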

PMF is used most frequently to obtain the source profiles (Paatero and Tapper 1993). In this study, the EPA PMF 3.0 program was applied to the data set. PMF is based on solving the factor analysis problem by the least squares approach using a data point weighting method, which decomposes a matrix of data of dimension n rows and m columns into two matrices, G and F (Rahman et al. 2011). The model can be written as:

$$ X=GF+E $$
(1)

where X is the matrix of measured values, F is the matrix of factor loadings, G is the matrix of factor scores, and E is the matrix of residuals, that is, the unexplained part of X.

Briefly, the elements of a data matrix X of n samples by m chemical species can be written as:

$$ {X}_{ij}={\displaystyle \sum_{k=1}^p{g}_{ik}{f}_{kj}}+{e}_{ij},\quad i=1\dots n;\ j=1\dots m;\ k=1\dots p $$
(2)

where $X_{ij}$ are the elements of the input data matrix; $g_{ik}$ and $f_{kj}$ are the elements of the factor score and factor loading matrices, respectively; $e_{ij}$ are the residuals; and p is the number of resolved factors (Khan et al. 2012; Comero et al. 2012).

The task of PMF is to minimize the objective function (Q), which weights each residual by its uncertainty:

$$ Q={\displaystyle \sum_{i}{\displaystyle \sum_{j}{\left({e}_{ij}/{s}_{ij}\right)}^2}} $$
(3)

where $s_{ij}$ is the calculated error estimate of the measured value $X_{ij}$.
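To make the role of Q concrete, the sketch below fits a nonnegative G and F to a synthetic matrix by reducing the sum of squared, uncertainty-scaled residuals. It is an assumption-laden illustration: it uses multiplicative updates for weighted nonnegative matrix factorization, which is a swapped-in technique and not the solver inside EPA PMF 3.0. The names `q_value` and `weighted_nmf` and all data are hypothetical.

```python
import numpy as np

def q_value(X, G, F, S):
    """Q = sum over all i, j of (e_ij / s_ij)^2, with E = X - GF."""
    E = X - G @ F
    return float(np.sum((E / S) ** 2))

def weighted_nmf(X, S, p, n_iter=500, seed=0, eps=1e-12):
    """Nonnegative G (n x p) and F (p x m) that reduce Q by multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = 1.0 / S ** 2                      # point-by-point weights from the error estimates
    G = rng.uniform(0.1, 1.0, size=(n, p))
    F = rng.uniform(0.1, 1.0, size=(p, m))
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so G and F stay nonnegative.
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)
    return G, F

# Synthetic 30 x 12 data built from three nonnegative factors (not the study data).
rng = np.random.default_rng(1)
G_true = rng.uniform(0.0, 1.0, size=(30, 3))
F_true = rng.uniform(0.0, 5.0, size=(3, 12))
X_demo = np.clip(G_true @ F_true * (1.0 + 0.05 * rng.standard_normal((30, 12))), 1e-6, None)
S_demo = 0.1 * X_demo + 0.01              # assumed uncertainty model, for illustration only

G0, F0 = weighted_nmf(X_demo, S_demo, p=3, n_iter=0)     # random starting point
G_fit, F_fit = weighted_nmf(X_demo, S_demo, p=3, n_iter=500)
q_start = q_value(X_demo, G0, F0, S_demo)
q_end = q_value(X_demo, G_fit, F_fit, S_demo)
print(q_start, q_end)
```

Because every residual is divided by its own error estimate, values measured with large uncertainty pull less on the fit than precisely measured ones.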

The treatment of missing values and of values below the detection limit is a key step in the PMF analysis. In the dataset, there were no missing concentration values. Values below the detection limit were kept as the available analytical values rather than being replaced by half of the detection limit (Comero et al. 2011).

Before implementing the PMF method, the number of source factors must be chosen; this choice is based on the values of the statistical indicators, the diagnostic plots, and the relevance of the resolved factors to the known sources located in the area (Huston et al. 2012). Following the user manual, the Q value was used to determine the appropriate number of factors. The target Q value has been suggested in the literature as:

$$ Q= nm-np-mp $$
(4)

where n is the number of samples, m is the number of species, and p is the number of sources included in the analysis (Chen et al. 2013; Huston et al. 2012).
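As a worked example of Eq. (4), the short sketch below evaluates the target Q for the dimensions of this data set (n = 30 samples, m = 12 elements) at two candidate factor counts. The function name `target_q` is hypothetical, and the sketch shows only the arithmetic, not the full factor-selection procedure.

```python
def target_q(n, m, p):
    """Target Q from Eq. (4): nm - np - mp."""
    return n * m - n * p - m * p

n_samples, n_species = 30, 12
for p in (3, 8):
    print(p, target_q(n_samples, n_species, p))
# p = 3: 360 - 90 - 36 = 234
# p = 8: 360 - 240 - 96 = 24
```

The target shrinks quickly as p grows because each added factor introduces n + m new free parameters.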

Quality assurance and quality control

Quality assurance and quality control were assessed using duplicates, method blanks, and standard reference materials. The accuracy of the determination method was systematically and routinely examined with standard reference materials (GSF). Three replicates were conducted to determine the total contents of the metals. The metal contents of the standard reference materials were found to be within 86–102 % of the certified values.

Results and discussion

Descriptive statistics

The statistical summary of the concentrations of hazardous elements in the surface sediment indicated that only a few elements exhibited wide variations in concentration (Table 1). The highest variation was observed for Pb, with a maximum mean concentration of 105.10 μg g−1 and a minimum mean concentration of 48.00 μg g−1. The lowest variation was for Hg, with a maximum mean concentration of 0.12 μg g−1 and a minimum mean concentration of 0.02 μg g−1.

Table 1 Summary statistics for the measured elements

For As, Hg, Cu, Mn, Zn, and Cd, the lower concentrations were primarily in the inner-estuary region, and the higher concentrations were primarily in the mouth region. However, for Cr, Fe, and Ni, the high concentrations were primarily in the adjacent sea, and the low concentrations were primarily in the inner-estuary region. The different distribution patterns might reflect the different sources.

Source qualitative identification

Three principal components were responsible for the data structure, and together they explained over 92 % of the total variance (Table 2). In the rotated component matrix, the first, second, and third principal components (PCs) accounted for 53.4, 19.0, and 20.1 % of the total sample variability, respectively. A comparison of the unrotated and rotated component matrices indicated that Hg, which was highly correlated with PC1 before rotation, shifted toward PC3 after rotation. Hg therefore exhibited a combined relationship with both PC1 and PC3.

Table 2 Component matrices for total heavy metal contents (three components extracted)

In this study, Al, Fe, Mn, Cr, and Ni were found to have high loadings on PC1. At the same time, moderate loadings of As, Cu, and Zn were also observed in this factor. Some previous studies indicated that Cu, as an important part of Cu-containing agrochemicals, was mainly derived from agricultural and sewage runoff (Mico et al. 2006). Ni, as a nonferrous element, is a main raw material in the industrial zone (Balachandran et al. 2006). Cr, as a ferrous element, is also an important material for industrial production (Miler and Gosar 2012). In addition, some studies also indicated that hazardous elements, such as Zn, Al, Fe, and Mn, appeared to be associated with mining wastewater (Silva et al. 2013). Note that all of these elements had a close relationship with sewage sludge and wastewater. As a result, PC1, including Al, Fe, Mn, Cr, Ni, As, Cu, and Zn, can be defined as a sewage component.

Atmospheric deposition is an important reason for the occurrence of the anthropogenic contamination of hazardous elements in the sediment environment. The hazardous elements in the atmosphere might come from industrial production, manufacturing processes, automobile exhaust, and waste incineration (Collins and Anthony 2008; Miler and Gosar 2012; Tian et al. 2012). Previous studies found that in the economically developed areas, atmospheric inputs of Pb and Sb were fairly significant (Li et al. 2009). Many experts believed that Pb, which was mainly derived from automobile exhaust in the economically developed areas, and Sb, which was produced from element smelting and refining, were unfortunately widespread throughout the exhaust gas in the atmosphere environment (Song et al. 2011; Cheng and Hu 2010; Simonetti et al. 2004). The study area was easily affected by the atmospheric deposition because it belonged to an estuary, for which the atmospheric conditions were complex and changeable. Therefore, PC2, which exhibited high loadings of Pb and Sb, could be considered as an atmospheric deposition component.

The highest loadings in PC3 were Cd and Hg. Cd and Hg, highly toxic hazardous elements, are widely used in pesticides. Some research studies proved that pesticides are potential sources of heavy elements in soil (Yang et al. 2013; Oliveira et al. 2013; Zhao et al. 2014a, b). Cd and Hg in the field were transported to surface water and then accumulated in the sediment by means of farm drainage surface runoff. It is well known that the Yangtze River basin has played a very important role in the grain production in China due to its superior natural conditions and combinations of natural resources, such as water, soil, light, and energy. The study area was significantly affected by the agriculture activity in the Yangtze River basin (Yi and Zhang 2012; Zhao et al. 2012a, b). Hence, PC3 could be defined as an agricultural nonpoint component.

Source apportionment estimations

Before running the PMF model, the number of source factors should be calculated. Taking into account the Q value, the number of source factors of PMF was set to eight (Fig. 2).

Fig. 2 Source profiles of the heavy metals from PMF model analysis

The first factor (F1) had high factor loading for Fe. As mentioned in our earlier discussion, Fe occurred primarily in mining wastewater. Hence, it seemed reasonable to conclude that F1 appeared to be associated with mining wastewater.

The second factor (F2) was characterized by enrichment with Al, Cr, and Cu. From our earlier discussion, Cr, as a ferrous element, was also an important material for industrial production (Miler and Gosar 2012). Cu, as an important part of Cu-containing agrochemicals, was derived from agricultural runoff (Mico et al. 2006). Note that Al also had high loadings on F2. In the PCA, the accumulation of Al was identified as being influenced by mining wastewater. In the PMF result, Al was also regarded as controlled by the sewage factor, and some research has indicated that Al in industrial wastewater precipitates when the pH changes (Wang et al. 2011a, b). Hence, F2 appears to represent a mixture of agricultural and industrial sewage.

The third factor (F3) had elevated values of Cd and Hg, and the fourth factor (F4) was enriched with Pb and Sb. These results exhibited good agreement with the PCA result. Therefore, F3, including Cd and Hg, could be defined as an agricultural nonpoint component, and F4, including Pb and Sb, could be considered as an atmospheric deposition source.

The fifth factor (F5) had elevated values of Ni. Ni, as a nonferrous element, is a main raw material in the industrial zone (Balachandran et al. 2006). It is well known that Ni is very common in industrial production, such as the nickel plating industry, machinery manufacturing, and the element processing industry (Ding and Hu 2013). Large amounts of Ni-containing wastewater and industrial residue have been produced in the nickel plating industry, causing serious pollution of water bodies (Rahman et al. 2014). Based on the PMF analysis, Ni was proposed to originate from the nickel plating industry.

The sixth factor (F6) was characterized by enrichment with Mn. Mn enrichment in the surface sediments of aquatic environments around the world is common and is often attributed to the redox cycling of Mn in sediments. Mn in sediments dissolves under reducing conditions, migrates upward, and accumulates in the oxidized surface sediments (Yao and Xue 2014). Some research studies indicated that Mn was a main chemical component of marine sediments (Lim et al. 2006). Thus, F6 might primarily represent marine activity.

The seventh factor (F7) had elevated values of Zn. Zn is generally regarded as being derived from anthropogenic sources in recent research (Liu and Li 2011). Zn is an essential trace element for plant growth and is mainly involved in the synthesis of auxin and the activities of some enzyme systems (Cruz et al. 2014). In agricultural production, zinc sulfate is widely applied as a base fertilizer. China is one of the largest fertilizer producers and users worldwide, and the Yangtze River basin is one of its most developed agricultural areas. Because of the wide application of fertilizers, the agricultural nonpoint source has been the key pollution source for the water environment (Ayoubi et al. 2014; Jayawardana et al. 2014). Hence, it seemed reasonable to conjecture that Zn is from an agricultural nonpoint source and that F7 represents an agricultural fertilizer source.

The eighth factor (F8) had a high factor loading for As. In PCA, As was considered as being derived from both parent rocks and sewage. However, in the PMF analysis, As was separated from rocks and sewage. Research has shown that As mainly comes from industrial wastewater discharged by the electroplating, battery, alloy manufacturing, pigment, mining, and refining processes (Oliveira et al. 2013; Li et al. 2014b). Thus, F8 might mainly represent industrial wastewater. Similar results were previously reported by other authors (Silva et al. 2009, 2011a, b, c; Ribeiro et al. 2010; Kronbauer et al. 2013; Rodríguez-Vázquez et al. 2013; Dias et al. 2014; Garcia et al. 2014; Martinello et al. 2014; Pérez et al. 2014; Sanchís et al. 2015).

The source apportionment results of the PCA and PMF methods are listed in Table 3. Considering the two techniques, PCA provides a mixed source profile of three sources, while in PMF, eight sources are identified: agricultural/industrial sewage mixture (18.6 %), mining wastewater (15.9 %), agricultural fertilizer (14.5 %), atmospheric deposition (12.8 %), agricultural nonpoint (10.6 %), industrial wastewater (9.8 %), marine activity (9.0 %), and nickel plating industry (8.8 %).
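The eight PMF contributions quoted above can be checked for closure with a few lines of Python. This is only a bookkeeping sketch: the percentages are copied from the text, while the dictionary name `pmf_shares` and the rounding to one decimal are choices made here for illustration.

```python
# Overall PMF source contributions (%) as reported in the text.
pmf_shares = {
    "agricultural/industrial sewage mixture": 18.6,
    "mining wastewater": 15.9,
    "agricultural fertilizer": 14.5,
    "atmospheric deposition": 12.8,
    "agricultural nonpoint": 10.6,
    "industrial wastewater": 9.8,
    "marine activity": 9.0,
    "nickel plating industry": 8.8,
}

# Round to one decimal to absorb floating-point noise in the sum.
total = round(sum(pmf_shares.values()), 1)

# Rank sources from largest to smallest contribution.
ranked = sorted(pmf_shares.items(), key=lambda kv: kv[1], reverse=True)
for name, pct in ranked:
    print(f"{name}: {pct:.1f} %")
print("total:", total)
```

A total of 100.0 % indicates that the eight factors account for the full apportioned load with no unassigned remainder.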

Table 3 Results from two different receptor models

F3 and F4 were found to correspond to PC3 and PC2, which represented the agricultural nonpoint source and the atmospheric deposition source, respectively. In addition, in PMF, the remaining six factors extended and refined PC1 toward the goal of precise source identification. Overall, the results from the PCA method are consistent with the results from the PMF analysis; to some degree, the former is the foundation of the latter, and the latter provides more details and expands upon the former. Although PCA and PMF each have their own advantages and drawbacks, the combination of the two methods can provide more valuable information. The combined method could be used to identify and interpret different sources and to quantify the contributions of the sources.

Source contribution analysis

The contributions of the eight sources to the hazardous elements in the sediment from the PMF analysis are presented in Fig. 3. Most of the hazardous elements were found to be affected by a number of sources. The notable exception was Sb, which was controlled by only two sources: “atmospheric deposition” (51.5 %) and “mining wastewater” (48.5 %). Hazardous elements such as As, Cd, Cu, Pb, and Hg were controlled by six sources. For As, “industrial wastewater” contributed 29.9 % of the total measured concentration, along with “agricultural fertilizer” (26.0 %) and the “agricultural/industrial sewage mixture” (13.1 %). For Cu, however, the total contribution of the “agricultural/industrial sewage mixture” and “agricultural fertilizer” was 65.9 %, which was far higher than the contribution of “mining wastewater”. For Cd, “agricultural nonpoint” contributed 50.9 %, which was far higher than the contributions of the other related factors. However, for Mn, the total contribution of “marine activity”, “agricultural/industrial sewage mixture”, and “agricultural fertilizer” was only 60.7 % (26.0 + 17.9 + 16.7 %). The major sources contributing to Pb were “atmospheric deposition” (45.5 %), followed by “mining wastewater” (33.7 %). Al and Mn were controlled by seven sources. The main difference between Al and Mn was that Al was affected by the “agricultural nonpoint” source, whereas Mn was affected by “mining wastewater”. The rest of the elements, such as Cr, Fe, Ni, and Zn, were influenced by all eight sources, although the proportions contributed by each source differed significantly.
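Per-element percentages of this kind can be derived from the PMF matrices. The sketch below shows one plausible calculation, assuming that the share of factor k in element j is the mean factor score of k multiplied by its loading for j, normalized over all factors. Whether EPA PMF 3.0 or the authors computed Fig. 3 in exactly this way is not stated in the text, and the function name `source_shares` and the toy matrices are hypothetical.

```python
import numpy as np

def source_shares(G, F):
    """Percent of each element's modeled concentration attributed to each factor.

    G: factor scores (n samples x p factors); F: factor loadings (p factors x m elements).
    """
    # Mean modeled contribution of factor k to element j: mean_i(g_ik) * f_kj.
    mean_contrib = G.mean(axis=0)[:, None] * F          # shape (p, m)
    # Normalize each element (column) so its factor shares sum to 100 %.
    return 100.0 * mean_contrib / mean_contrib.sum(axis=0, keepdims=True)

# Toy example: 2 samples, 2 factors, 3 elements (not the study data).
G_toy = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
F_toy = np.array([[2.0, 0.0, 1.0],
                  [2.0, 3.0, 0.0]])
shares = source_shares(G_toy, F_toy)
print(shares)
```

A zero loading in F means that the factor contributes nothing to that element, which is how an element can end up controlled by fewer than all eight sources.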

Fig. 3 The contributions of all the identified sources to the heavy metals in the sediment

The PMF results showed that the element composition differed among the factors and that every source was associated with different proportions of the elements in the study area (Fig. 4). The hazardous element contents in sediments appeared to be more connected to anthropogenic activity than to natural sources. The hazardous elements in the surface sediments were largely influenced by wastewater discharge from nearby industrial plants, atmospheric deposition from automobile exhaust and industrial waste gas emissions, fossil fuel combustion in the nearby cities, the mining industry, and so on (Cruz et al. 2014; Li et al. 2014a, b).

Fig. 4 Explained contributions of the elements in each source with PMF analysis (%)

Compared to PCA, the main advantage of PMF is that it can quantitatively analyze different pollution sources (Qadir et al. 2014). The contributions of the eight sources to each hazardous element were different (Fig. 3), and the contributions of the elements in each source were also different (Fig. 4). In addition, the profile factors produced by the PMF method are better at describing the source structure than those derived by the PCA approach (Pekey and Dogan 2013). However, the results obtained from the PMF method might introduce uncertainty into the conclusions. Thus, pollution sources identified by the PMF model must be confirmed by PCA to improve their reliability (Huang et al. 2009; Lin et al. 2011; Saba and Su 2013; Ielpo et al. 2014). Furthermore, more research should be carried out to verify the attribution of the hazardous elements in the sediment to these eight sources, using both the PMF analysis and other datasets.

Conclusions

To identify and target pollution sources and their relative contributions to the total pollution load, PMF combined with PCA was proposed and applied to apportion the sources of hazardous elements in surface sediments in the Yangtze River estuary.

The distribution patterns of As, Hg, Cu, Mn, Zn, and Cd were found to be similar, all with lower concentrations in the inner-estuary region and with higher concentrations in the mouth region. However, for Cr, Fe, and Ni, the higher concentration zone was in the adjacent sea, and the lower concentration zone was in the inner-estuary region.

The PCA performed on 12 elements identified three principal components that controlled their variability in the sediment samples. PC1, including Al, Fe, Mn, Cr, Ni, As, Cu, and Zn, seemed to be controlled by a sewage component. PC2, which included Pb and Sb, could be considered as an atmospheric deposition component. PC3, containing Cd and Hg, could be considered as an agricultural nonpoint component.

To further clarify the possible sources, the PMF method was also applied. Eight sources were identified: agricultural/industrial sewage mixture (18.6 %), mining wastewater (15.9 %), agricultural fertilizer (14.5 %), atmospheric deposition (12.8 %), agricultural nonpoint (10.6 %), industrial wastewater (9.8 %), marine activity (9.0 %), and the nickel plating industry (8.8 %). Moreover, the contributions of the eight sources to the hazardous elements in the sediment were also calculated; the hazardous element contents in the sediments were found to be more connected to anthropogenic activity than to natural sources.

As a result, PCA offered a general classification of the sources, and PMF resolved more factors with higher explained variances, that is, PMF provided more details and expanded the source classification. PMF is superior to PCA in source identification and the apportionment of hazardous elements, providing both the internal analysis and the quantitative analysis. The simultaneous use of these methods can provide valuable information.