Introduction

River water is an important part of water resources and is indispensable for supplying water for residents’ production and daily life, sustaining healthy social and economic development, and maintaining a harmonious ecological balance (Herojeet et al., 2017; Ma et al., 2021). However, with the diversification and rapid development of industry and the continuous acceleration of urbanization, water quality degradation has become a major worldwide issue, causing environmental problems such as eutrophication and heavy metal contamination, which seriously restrict the sustainable development of society (Nong et al., 2020; Piroozfar et al., 2021). Moreover, water quality is affected not only by natural conditions such as climate, geography, and land use type but also by human factors such as industrial pollution, agricultural activities, and domestic sewage (Zhang et al., 2020, 2022). Therefore, understanding the temporal and spatial variations in water quality and accurately identifying the sources of pollution in river water are preconditions for improving the quality of the water environment.

Because water quality monitoring data are complex, it is difficult to explain the spatiotemporal variation characteristics of water quality and to identify potential pollution sources (Li et al., 2020). Multivariate statistical techniques, mainly cluster analysis (CA), factor analysis (FA), and principal component analysis (PCA), have unique advantages in interpreting complicated datasets and are widely applied to identify water pollution sources (de Oliveira et al., 2020; Han et al., 2019; Liu et al., 2019; Mir & Gani, 2019; Varol, 2020). Building on the qualitative results of PCA, the absolute principal component score-multiple linear regression (APCS-MLR) receptor model has been widely used to quantify the contribution rates of latent pollution sources (Cheng et al., 2020; Cho et al., 2022; Yu et al., 2022; Zhang et al., 2022). Combining PCA/FA with the APCS-MLR model improves the ability to reveal potential sources of river water pollution and provides a more reliable scientific basis for water pollution control (Cho et al., 2022).

The Laixi River is located in southwestern China and is an important tributary of the Tuojiang River, which is a first-class tributary of the Yangtze River (Liu et al., 2022). The Yangtze River is an important ecological security barrier and is central to China’s economic activities. With societal development, the river is inevitably affected by rapid urbanization, industrial and agricultural development, and high population density, and the increasing environmental pressure has resulted in severe river water pollution (Cheng et al., 2020; Liu et al., 2022). In recent years, China’s government has issued several water pollution control policies (Fu et al., 2020). Following these policies, in-depth studies have analyzed pollution sources in mainstream rivers, but relatively few have addressed small watersheds (Chen et al., 2019; Cheng et al., 2020; Haji Gholizadeh et al., 2016; Huang et al., 2010; Kellner et al., 2018; Muangthong & Shrestha, 2015). Tributaries, the primary source of pollution in the mainstream, face problems such as multiple mixed pollution sources, pollutant concentrations that seriously exceed the standards, and difficult governance (Yang et al., 2022; Zhang et al., 2020). Therefore, understanding the water pollution status of tributaries and combining it with source identification techniques for pollution control is of great significance for mainstream pollution management.

Given the above considerations, this study aimed to combine multivariate statistical methods to better understand potential water pollution sources and to quantitatively apportion them in the small watershed of the Laixi River. Specifically, this study (1) explored the spatiotemporal variation characteristics of water quality in the Laixi River Basin; (2) identified the major parameters responsible for water quality deterioration; and (3) revealed the latent spatiotemporal pollution sources and quantified their contributions. These results provide a scientific basis for formulating effective water quality management and mainstream pollution control policies.

Material and methodology

Study area

The Laixi River, which originates in the Dazu District, Chongqing City, is a major tributary of the lower reaches of the Tuojiang River. The total basin area is 3257 km², and the mainstream length is 238 km. The study area (105°14′57″–105°41′51″ E, 28°59′56″–29°20′3″ N) is located in Luzhou City, with a river length of 77 km and an area of 814 km² (http://www.gscloud.cn/). It lies in the lower reaches of the Laixi River, with an average annual flow of 36 m³/s. The Jiuqu and Maxi Rivers are the two largest tributaries of the Laixi River in the Luzhou section (Fig. 1). The Jiuqu River has an area of 155 km², a length of 31 km, and an annual average flow of 10 m³/s. The Maxi River has an area of 292 km², a length of 41 km, and an annual average flow of 2.5 m³/s. The climate is classified as subtropical humid. Annual precipitation averages 1153.7 mm and is extremely uneven, falling primarily between May and September, which accounts for approximately 76% of the total. The average annual temperature is 18 °C, with a minimum of −1.6 °C and a maximum of 41.3 °C. The study area is dominated by hilly terrain and slopes toward the southwest, with elevations of 218–757.5 m. Land use is dominated by cultivated land (58.3% of the total); construction land accounts for a smaller share and is distributed mainly along both banks of the river.

Fig. 1

Map of study area and surface water quality sampling sites in the Laixi River basin, China

The Laixi River supplies water for industry, agriculture, and residents along its course. However, the river has become increasingly polluted in recent years, which not only hinders the local development of Luzhou but also seriously affects the water environment of the Tuojiang River basin. Water pollution in the Laixi River Basin has a long history, and water quality has not improved significantly. In 2019, the discharges of chemical oxygen demand (CODCr), ammonia nitrogen (NH4+–N), and total phosphorus (TP) in the Laixi River reached 5309.1, 675.56, and 330.4 tons, respectively (LEEB, 2021). From 2018 to 2020, CODCr, the permanganate index (CODMn), and TP were the parameters that most seriously exceeded the standard, with 38.5%, 56.3%, and 19.7% of samples, respectively, falling below Class III of the Water Environment Quality Standard (MEPC, 2002). In addition, fluoride (F), NH4+–N, dissolved oxygen (DO), and other parameters also exceeded the standard to varying degrees. Furthermore, the Laixi River receives various sewage discharges and surface runoff from both banks, and the health of its aquatic ecosystem has aroused widespread concern.
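To make exceedance percentages like those above concrete, the following minimal Python sketch computes the share of samples whose concentration exceeds a Class III limit. The limits (CODCr 20 mg/L, CODMn 6 mg/L, TP 0.2 mg/L for rivers) are our reading of the Class III values in GB 3838-2002 and should be checked against the standard; the function name `exceedance_rate` and the concentrations are hypothetical.

```python
# Class III upper limits in mg/L, ASSUMED from GB 3838-2002 (verify before use).
CLASS_III_LIMITS = {"CODCr": 20.0, "CODMn": 6.0, "TP": 0.2}


def exceedance_rate(values, limit):
    """Return the percentage of samples whose concentration exceeds `limit`."""
    if len(values) == 0:
        raise ValueError("at least one sample is required")
    n_exceed = sum(1 for v in values if v > limit)
    return 100.0 * n_exceed / len(values)


# Hypothetical CODCr concentrations (mg/L), for illustration only.
codcr = [18.0, 22.5, 19.9, 31.2, 20.0, 24.1, 15.3, 17.8]
rate = exceedance_rate(codcr, CLASS_III_LIMITS["CODCr"])
print(f"CODCr exceedance of Class III: {rate:.1f}%")  # 3 of 8 samples -> 37.5%
```

A value exactly equal to the limit (20.0 here) is treated as compliant, which matches how upper limits in the standard are usually read.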

Sampling collection and analysis

Monthly monitoring data from six sampling sites (Fig. 1) for January 2018 to December 2020 were obtained from the Bureau of Ecology and Environment of Luzhou. TZSDQ is at the inlet of the study area; sites EXJ, GDDQ, and HSDQ are distributed downstream; and sites NDQ and DWT are monitoring points on the tributaries. Although the monthly water quality evaluation procedure includes 26 monitoring indicators, we selected 12 important parameters because the concentrations of some indicators were too low or below the detection limit. The selected parameters were 5-day biochemical oxygen demand (BOD5), permanganate index (CODMn), chemical oxygen demand (CODCr), water temperature (WT), dissolved oxygen (DO), ammonia nitrogen (NH4+–N), pH, electrical conductivity (EC), fluoride (F), anionic surfactant (AS), total nitrogen (TN), and total phosphorus (TP). Table 1 presents the descriptive statistics (mean and standard deviation) of the monitoring data for each site and month. Water quality parameters were analyzed in accordance with the technical specifications for surface water and wastewater monitoring (MEEC, 2002).

Table 1 Mean and standard deviation (SD) of physicochemical variables of water quality in the Laixi River Basin

To prepare the data for cluster analysis and principal component analysis, we (a) detected and processed outliers, (b) imputed missing values with the mean of the corresponding data group, and (c) checked the normality of the dataset using kurtosis, skewness, and the Kolmogorov–Smirnov (K-S) test (Katsaounis, 2004). In addition, we standardized the data to eliminate the influence of differing units and magnitudes on the analysis. All data were processed using Excel 2010 and SPSS 26.0.
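Steps (b) and (c) and the standardization can be sketched in Python with NumPy and SciPy as follows. This is an illustration under assumptions, not the SPSS workflow used in the study: missing values are filled with the column mean (standing in for the mean of the "corresponding data group"), the K-S test is applied to z-scores against a standard normal without the Lilliefors correction that, to our understanding, SPSS applies in its Explore procedure, and the function name `preprocess` and the input matrix are hypothetical.

```python
import numpy as np
from scipy import stats


def preprocess(X):
    """Mean-impute missing values, run normality checks, and return z-scores.

    X: 2-D array (samples x indicators) that may contain NaN.
    Returns the imputed data, the standardized data, and per-indicator checks.
    """
    X = np.asarray(X, dtype=float).copy()

    # (b) Fill each missing value with the mean of its column (ASSUMED grouping).
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]

    # (c) Skewness, excess kurtosis, and a K-S test of the z-scores against N(0, 1).
    checks = []
    for j in range(X.shape[1]):
        x = X[:, j]
        z = (x - x.mean()) / x.std(ddof=1)
        checks.append({
            "skewness": stats.skew(x),
            "kurtosis": stats.kurtosis(x),  # Fisher definition: normal -> 0
            "ks_pvalue": stats.kstest(z, "norm").pvalue,
        })

    # Standardize each indicator to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return X, Z, checks


# Hypothetical data: 4 samples x 2 indicators, one missing value.
data = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 14.0], [4.0, 12.0]])
X_filled, Z, checks = preprocess(data)
print(X_filled[1, 1])  # 12.0, the mean of 10, 14 and 12
```

Because the mean and standard deviation are estimated from the same data, the plain K-S p-value is conservative; a Lilliefors-corrected test would be the closer match to common statistical software output.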

Multivariate statistical methods

Cluster analysis

Cluster analysis (CA) is commonly used to categorize complicated datasets, organizing items into clusters so that items within a cluster are similar and items in different clusters differ (Herojeet et al., 2017). Hierarchical clustering is the most widely used approach and is typically combined with PCA to analyze surface water quality (Rezaei et al., 2019). In this study, the squared Euclidean distance was used as the similarity measure, and Ward’s method was applied to the standardized datasets; at each step, Ward’s method merges the pair of clusters that yields the smallest increase in the within-cluster sum of squares. Clustering results are usually represented by dendrograms, which intuitively illustrate the similarity between clusters and facilitate the spatiotemporal analysis of water quality (Pinto et al., 2019).
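A minimal sketch of Ward's hierarchical clustering with SciPy is shown below; the site labels and standardized values are hypothetical. SciPy's `linkage(..., method="ward")` works on Euclidean distances and reports merge heights on that scale, whereas the study used squared Euclidean distance in SPSS; we expect the merge order to agree, but the reported heights will differ in scale.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical standardized data: rows are sites, columns are indicators.
sites = ["A", "B", "C", "D"]
Z = np.array([
    [1.2, 0.9, 1.1],
    [1.0, 1.1, 0.8],
    [-0.9, -1.0, -1.2],
    [-1.1, -0.8, -0.9],
])

# Ward's method: each merge joins the pair of clusters that least increases
# the total within-cluster sum of squares.
L = linkage(Z, method="ward", metric="euclidean")

# Cut the tree into two groups and report each site's cluster label.
labels = fcluster(L, t=2, criterion="maxclust")
print(dict(zip(sites, labels)))
```

The linkage matrix `L` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a figure comparable to a dendrogram of sites or months.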

Principal component analysis

PCA is a dimension-reduction technique that transforms the original variables into a few uncorrelated new variables, called principal components, that represent most of the information in the original data (Bonansea et al., 2015). Before PCA, it is necessary to determine whether the dataset meets the prerequisite conditions: the correlation between variables can be tested with Bartlett’s sphericity test and the Kaiser–Meyer–Olkin (KMO) test to determine the applicability of PCA (Kaiser, 1974; Zhang et al., 2010). In general, a KMO value greater than 0.5 and a Bartlett’s test significance less than 0.05 indicate that PCA is applicable (Li et al., 2020). In this study, varimax (maximum variance) rotation was applied; it maximizes the variance of the squared loadings within each component so that each variable loads strongly on as few components as possible, making the components easier to interpret (Liu et al., 2019). Components with eigenvalues greater than one were retained as principal components. Generally, the higher the factor loading, the greater the influence of the water quality parameter on the corresponding principal component. Absolute factor loadings of 0.3–0.5, 0.5–0.75, and greater than 0.75 were defined as weak, moderate, and strong, respectively (Liu et al., 2003). PCA was applied to detect the main influencing variables of each group identified by the cluster analysis, and the surface water pollution sources in the Laixi River Basin were analyzed qualitatively. PCA used the measured values of the selected parameters for a total of 36 months over three years.
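The extraction and rotation steps described above can be sketched as follows: eigen-decompose the correlation matrix of the standardized data, retain components with eigenvalues greater than 1 (Kaiser criterion), compute loadings as eigenvectors scaled by the square root of their eigenvalues, and apply varimax rotation with Kaiser row normalization. The `varimax` routine is a common SVD-based formulation rather than SPSS's exact algorithm, so rotated loadings may differ slightly in value, order, or sign; the data are synthetic and the helper names are hypothetical.

```python
import numpy as np


def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a p x k loading matrix with an SVD-based varimax algorithm."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lam = Phi @ R
        target = Lam**3 - (gamma / p) * Lam @ np.diag(np.sum(Lam**2, axis=0))
        u, s, vh = np.linalg.svd(Phi.T @ target)
        R = u @ vh  # orthogonal rotation matrix
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return Phi @ R


def pca_kaiser_varimax(Z):
    """PCA on standardized data Z, keeping eigenvalues > 1, then varimax."""
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)  # ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0  # Kaiser criterion
    loadings = eigvec[:, keep] * np.sqrt(eigval[keep])
    # Kaiser normalization: rotate row-normalized loadings, then rescale.
    h = np.sqrt(np.sum(loadings**2, axis=1))
    rotated = varimax(loadings / h[:, None]) * h[:, None]
    return eigval, loadings, rotated


def loading_strength(value):
    """Classify an absolute loading using the thresholds in the text."""
    a = abs(value)
    if a > 0.75:
        return "strong"
    if a >= 0.5:
        return "moderate"
    if a >= 0.3:
        return "weak"
    return "negligible"


# Synthetic data: 36 months x 6 indicators driven by two latent factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(36, 2))
W = np.array([[0.9, 0.8, 0.7, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 0.9, 0.8, 0.7]])
X = factors @ W + 0.4 * rng.normal(size=(36, 6))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigval, loadings, rotated = pca_kaiser_varimax(Z)
print("eigenvalues:", np.round(eigval, 2))
print("rotated loadings:\n", np.round(rotated, 2))
```

An orthogonal rotation leaves each variable's communality (the row sum of squared loadings) unchanged; it only redistributes the explained variance among the retained components.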

APCS-MLR model

Regression analysis is a widely used quantitative method for analyzing correlations between variables and establishing regression equations to simulate and predict the corresponding variables (Cheng et al., 2020; Ding et al., 2016; Uddin et al., 2022). This study established a multiple linear regression model between the absolute principal component scores (APCS, independent variables) and the concentration of each water quality parameter (dependent variable) to quantify the contribution of each pollution source to each monitoring index. The concentration of pollutant j (Cj) can be expressed as the sum of the contributions of all identified pollution sources:

$${C}_{j}={b}_{j}+\sum_{h=1}^{n}{r}_{hj}\times {\mathrm{APCS}}_{hj}$$

where \({b}_{j}\) is the multiple regression constant for pollutant j, \({r}_{hj}\) is the multiple regression coefficient of source h for pollutant j, \(\mathrm{APCS}_{hj}\) is the absolute principal component score of identified pollution source h for each sample, n is the number of identified pollution sources, and \({r}_{hj}\times \mathrm{APCS}_{hj}\) is the contribution of source h to \({C}_{j}\).

In the receptor model analysis, we introduced an artificial sample in which the concentrations of all variables are zero. Using the score coefficient matrix obtained from PCA as weights, the standardized concentrations of the real samples and of this zero-concentration sample were converted into component scores, and the APCS of each sample was obtained by subtracting the score of the zero-concentration sample from the score of that sample (Wang et al., 2022). For more details on the APCS-MLR model, please refer to Thurston and Spengler (1984).
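The APCS calculation can be sketched in Python as below. The function names are hypothetical, and obtaining the score coefficients by the regression method, B = R⁻¹Λ (correlation matrix R, rotated loadings Λ), is an assumption about how the score coefficient matrix was produced; the concentrations and loadings are random placeholders.

```python
import numpy as np


def absolute_pc_scores(X, B):
    """Compute APCS for each sample (row of X).

    X: samples x variables matrix of measured concentrations.
    B: variables x components score-coefficient matrix from the PCA step.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    Z = (X - mean) / sd      # standardized concentrations of real samples
    z0 = (0.0 - mean) / sd   # standardized artificial zero-concentration sample
    scores = Z @ B           # component scores of the real samples
    score0 = z0 @ B          # component scores of the zero sample
    return scores - score0   # APCS: shift scores so "zero pollution" maps to 0


def regression_score_coefficients(Z, loadings):
    """Score coefficients by the regression method, B = R^-1 * loadings (ASSUMED)."""
    R = np.corrcoef(Z, rowvar=False)
    return np.linalg.solve(R, loadings)


# Hypothetical concentrations (36 samples x 4 variables) and loadings.
rng = np.random.default_rng(1)
X = rng.uniform(0.5, 5.0, size=(36, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
loadings = rng.normal(scale=0.5, size=(4, 2))
B = regression_score_coefficients(Z, loadings)
apcs = absolute_pc_scores(X, B)
print(apcs.shape)  # (36, 2)
```

Algebraically the subtraction removes the mean term, so APCS equals the raw concentrations divided by their standard deviations and multiplied by B; this is why an all-zero sample receives an APCS of exactly zero.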

Linear regression often yields negative regression coefficients or APCS values, which can produce negative contribution rates; if left uncorrected, these make the source contributions easy to misinterpret and reduce the precision of source apportionment (Liu et al., 2020). Therefore, this study took the absolute values of negative contribution rates to improve the interpretation of the regression results (Haji Gholizadeh et al., 2016; Zhang et al., 2020).
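The regression and contribution step can be sketched as follows for a single water quality variable. Computing each source's share from its mean contribution |r_hj × mean(APCS_h)|, and treating the intercept |b_j| as the unidentified source (UIS), is one convention used in APCS-MLR studies and is an assumption here; the text states only that negative contribution rates were replaced by their absolute values. The function name and data are hypothetical.

```python
import numpy as np


def apcs_mlr_contributions(apcs, y):
    """Regress one variable on APCS and split its mean into percentage shares.

    Returns (intercept b_j, coefficients r_hj, source shares %, UIS share %).
    Shares use |r_hj * mean(APCS_h)| and |b_j|, so negative terms are taken in
    absolute value; using the intercept as the unidentified source is ASSUMED.
    """
    apcs = np.asarray(apcs, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), apcs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    b, r = coef[0], coef[1:]
    source_terms = np.abs(r * apcs.mean(axis=0))
    total = np.abs(b) + source_terms.sum()
    return b, r, 100.0 * source_terms / total, 100.0 * np.abs(b) / total


# Hypothetical APCS for three sources and one water quality variable.
rng = np.random.default_rng(2)
apcs = rng.uniform(0.0, 3.0, size=(36, 3))
y = 0.5 + apcs @ np.array([1.2, -0.4, 0.8]) + 0.05 * rng.normal(size=36)

b, r, shares, uis = apcs_mlr_contributions(apcs, y)
print("coefficients:", np.round(r, 2))
print("source shares (%):", np.round(shares, 1), "UIS (%):", round(uis, 1))
```

Because an ordinary least-squares fit with an intercept passes through the means, the mean concentration equals b_j plus the sum of r_hj × mean(APCS_h); the absolute values only matter when some of those terms are negative, as with the second source here.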

Results and discussion

Spatiotemporal variations of water quality parameters

Because the types of river pollution may change over 3 years, we applied independent samples t-tests to determine whether water quality differed significantly among 2018, 2019, and 2020. The t-tests (Table 2) showed no significant differences between years for most water quality indicators (Sig. > 0.05), indicating that water quality conditions were similar across the 3 years. The descriptive statistics showed that only NH4+–N contained anomalously high values, with a coefficient of variation of 123.5%, and that 12.5% of the AS samples were below the detection limit; these factors may account for the t-test results of these two indicators (Table 2). Therefore, the 3-year averages can be used to assess the overall water quality status.
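The year-to-year comparison can be sketched with SciPy's independent samples t-test applied to each pair of years, together with the coefficient of variation used to describe variability. The monthly values and helper names are hypothetical, and Student's equal-variance test is an assumption; SPSS reports both the equal-variance and Welch versions.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def pairwise_year_ttests(by_year, alpha=0.05):
    """Independent samples t-test for every pair of years.

    by_year maps a year to its monthly values. Returns {(year_a, year_b):
    (t statistic, p-value, significant?)}. Student's t (equal variances) is
    used here; pass equal_var=False to ttest_ind for Welch's test instead.
    """
    results = {}
    for a, b in combinations(sorted(by_year), 2):
        t, p = stats.ttest_ind(by_year[a], by_year[b])
        results[(a, b)] = (t, p, p < alpha)
    return results


def coefficient_of_variation(x):
    """Sample standard deviation as a percentage of the mean."""
    x = np.asarray(x, dtype=float)
    return 100.0 * x.std(ddof=1) / x.mean()


# Hypothetical monthly TN concentrations (mg/L) for three years.
rng = np.random.default_rng(4)
tn_by_year = {
    2018: rng.normal(3.2, 0.6, 12),
    2019: rng.normal(3.1, 0.6, 12),
    2020: rng.normal(3.0, 0.6, 12),
}
res = pairwise_year_ttests(tn_by_year)
for pair, (t, p, significant) in res.items():
    print(pair, f"t = {t:.2f}", f"p = {p:.3f}", "significant" if significant else "n.s.")
print(coefficient_of_variation([1.0, 2.0, 3.0]))  # 50.0
```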

Table 2 Results of independent samples t-test

Table 1 summarizes the descriptive statistics of the monthly average and site average concentrations of the 12 water quality indicators. In general, CODCr and TN concentrations were relatively high in both time and space compared with the national standards (MEPC, 2002), and these two indicators were the most serious pollution indicators in every month (Table 1). The monthly average CODCr concentration ranged from 17.67 to 31.18 mg/L; in most months it was close to the Class IV limit of the Water Environment Quality Standard (30 mg/L), and the highest mean (31.18 mg/L) occurred in May (MEPC, 2002). TN pollution was even more severe: the concentration in every month exceeded the Class V limit (2.0 mg/L), with the highest average value of 4.93 mg/L in March. In addition, CODMn was slightly higher than the Class III limit (6 mg/L) from March to July. All other indicators met the Class III standard in every month.
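Assigning a concentration to a water quality class is a threshold lookup, sketched below. The Class I to V upper limits are our reading of GB 3838-2002 and are assumptions to verify against the standard; the Class IV limit for CODCr (30 mg/L), the Class V limit for TN (2.0 mg/L), and the Class III limit for CODMn (6 mg/L) agree with the values quoted above. The function name is hypothetical.

```python
# Upper limits (mg/L) for Classes I-V, ASSUMED from GB 3838-2002; verify before use.
CLASS_LIMITS = {
    "CODCr": [15, 15, 20, 30, 40],
    "CODMn": [2, 4, 6, 10, 15],
    "TN": [0.2, 0.5, 1.0, 1.5, 2.0],
}
CLASS_NAMES = ["I", "II", "III", "IV", "V"]


def water_quality_class(indicator, concentration):
    """Return the best class whose upper limit the concentration does not exceed."""
    for name, limit in zip(CLASS_NAMES, CLASS_LIMITS[indicator]):
        if concentration <= limit:
            return name
    return "worse than V"


print(water_quality_class("CODCr", 31.18))  # V: above the Class IV limit of 30
print(water_quality_class("TN", 4.93))      # worse than V: above 2.0
print(water_quality_class("CODMn", 5.9))    # III
```

Under these assumed limits, the May mean CODCr (31.18 mg/L) grades as Class V and the March mean TN (4.93 mg/L) as worse than Class V.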

Across sites, TN, CODCr, CODMn, BOD5, TP, NH4+–N, and F differed significantly. The spatial variation of these parameters may be due to differences in the level of urbanization, the intensity of anthropogenic activities, and industrial distribution (Zhang et al., 2022). The pollutant indicators CODCr, CODMn, NH4+–N, TN, TP, and F all had their highest concentrations at site NDQ, which was the most seriously polluted site. The remaining sites showed small differences in pollutant concentrations and ranged from mildly to severely polluted. The average TN concentration at the six sites was 2.39–4.06 mg/L, exceeding the Class V limit (2.0 mg/L). The concentrations of the other indicators fell within the Class III to Class IV ranges, indicating relatively mild pollution.

Temporal cluster analysis

Temporal variation in water quality was examined further by cluster analysis of the monthly averages of the 3 years of observations. The 12 monthly averages for 2018–2020 were classified into two clusters with significant distance differences at (Dlink/Dmax) × 100 < 10 (Fig. 2(a)). The two groups correspond to two periods with different pollution levels in the study area. Group 1 contained the seven months from June to December and represents the low-pollution season (LPS), which accounts for 65–75% of annual rainfall. Group 2 included the remaining months, January to May, and corresponds to the high-pollution season (HPS), during which rainfall was about 25–35% of the annual total. The flow in group 1 was significantly higher than that in group 2.
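The rescaled-distance criterion (Dlink/Dmax) × 100 can be applied to a SciPy linkage tree as sketched below, where Dlink is a merge distance and Dmax is the largest merge distance in the tree. The monthly data are hypothetical, and SciPy's Ward heights are on a Euclidean rather than squared-Euclidean scale, so the percentage at which a given tree splits may differ from SPSS output.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def clusters_at_rescaled_distance(Z, threshold_pct, method="ward"):
    """Cut a dendrogram where (Dlink / Dmax) * 100 reaches threshold_pct.

    Dlink is the linkage (merge) distance and Dmax the largest one in the tree;
    observations merged at or below the cut share a cluster label.
    """
    L = linkage(Z, method=method)
    d_max = L[:, 2].max()
    return fcluster(L, t=threshold_pct / 100.0 * d_max, criterion="distance")


# Hypothetical standardized monthly means: 7 months near 0, 5 months near 2.
rng = np.random.default_rng(3)
months = np.vstack([
    rng.normal(0.0, 0.2, size=(7, 4)),
    rng.normal(2.0, 0.2, size=(5, 4)),
])
labels = clusters_at_rescaled_distance(months, threshold_pct=25)
print(labels)
```

Raising the threshold toward 100 merges everything into one cluster, while lowering it splits the tree into more, tighter groups.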

Fig. 2

Cluster analysis dendrogram showing the grouping of sampling months (a) and sampling sites (b) of the study area

Table 1 shows that CODCr and TN were the most polluted indicators in the study area. TP is a representative pollution indicator that can reflect various pollution sources, and DO is usually regarded as an important indicator of a river’s self-purification capacity. Therefore, CODCr, TN, TP, and DO were selected for a more in-depth analysis of temporal trends in river water quality (Fig. 3). The four indicators showed consistent variation trends, and their concentration ranges and mean values were higher in the HPS than in the LPS. The more serious pollution in the HPS may be because it coincides with the dry season and the planting season. Chemical fertilizers and pesticides are applied extensively during spring sowing and are carried into the river by surface runoff, intensifying pollution; surface runoff is also the main route by which phosphorus and organic pollutants migrate from soil to water (Varol, 2020; Verheyen et al., 2015; Zhou & Gao, 2011). DO showed its minimum values in the HPS, mainly because DO is consumed during the degradation of organic pollutants, so DO decreases as organic pollutant concentrations increase (Liu et al., 2020). In addition, the variation in DO concentration follows the natural law that warmer water holds less DO (Wang et al., 2013). As shown in Fig. 3, there are some outliers in different seasons, which may be caused by significant differences in the spatial distribution of pollutant concentrations among the sampling sites.

Fig. 3

Temporal and spatial variations of CODCr, TN, TP, and DO in different groups

Spatial cluster analysis

Spatial CA was used to assess differences in water quality among sites across the region. The six monitoring sites were clustered into two statistically significant groups (groups 1 and 2) at (Dlink/Dmax) × 100 < 25 (Fig. 2(b)). The grouping reflects the water quality at each site, which is largely influenced by land use structure, and the sites in each group shared similar characteristics and natural background sources. CODCr, TN, TP, and DO were again selected to investigate the spatial differences in water quality between the two clusters (Fig. 3).

Group 1 included four sites (NDQ, DWT, EXJ, and GDDQ) and corresponds to the highly polluted region (HPR). These sites are located in the middle of the study area and are mainly surrounded by built-up and cultivated land (Fig. 1), indicating that water quality in this area is mainly affected by domestic sewage, industrial wastewater, and agricultural nonpoint sources. The remaining sites (HSDQ and TZSDQ) belong to group 2, which corresponds to the lightly polluted region (LPR). These two sites are situated at the beginning and end of the study area, whose surroundings are dominated by cultivated and forest land. HSDQ and TZSDQ were lightly polluted, reflecting the minor impact of agricultural planting and the dilution effect of river water. The boxplots (Fig. 3) confirm this result: the mean values and concentration ranges of CODCr, TN, and TP were larger in the HPR, whereas the mean DO was lower and there were more outliers, indicating that group 1 was more seriously polluted. The outliers in the boxplots (Fig. 3) represent extreme values in some months, which may be influenced by periodic rainfall or anthropogenic activities.
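Boxplots conventionally flag as outliers the values lying more than 1.5 interquartile ranges (IQR) below the first quartile or above the third (Tukey's rule). The rule used for Fig. 3 is not stated, so the sketch below is an assumption; quartile definitions also differ between software (NumPy's default linear interpolation is used here), and the TN values and function name are hypothetical.

```python
import numpy as np


def tukey_outliers(x, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # linear interpolation between ranks
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return x[(x < low) | (x > high)]


# Hypothetical monthly TN concentrations (mg/L) at one site.
tn = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.2, 9.8]
print(tukey_outliers(tn))  # [9.8]
```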

Source identification

Source identification in a temporal pattern with PCA

PCA was performed on the temporal datasets to identify potential sources of contamination during the LPS and HPS. The KMO values for the LPS and HPS were 0.675 and 0.732, respectively, and the significance of Bartlett’s sphericity test was 0.00 in both cases, indicating that the variables were strongly correlated and that the correlation matrix differed significantly from the identity matrix; the data therefore met the conditions for PCA (Haji Gholizadeh et al., 2016). Based on the Kaiser criterion, four and three principal components with eigenvalues greater than 1 were extracted for the LPS and HPS, respectively, explaining a total of 63.87 and 74.65% of the variance in the original monitoring data. Table 3 shows the PCA results for the two seasons, including the loading of each water quality parameter on each PC, the eigenvalue and variance of each extracted PC, and the cumulative explained variance.
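The two adequacy statistics reported above can be computed directly from the correlation matrix R of n samples and p variables, as sketched below: Bartlett's statistic is χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO compares the squared off-diagonal correlations with the squared partial correlations derived from R⁻¹. The synthetic data and function names are hypothetical; the values will not reproduce those in the study.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(Z):
    """Bartlett's test that the correlation matrix of Z is an identity matrix."""
    n, p = Z.shape
    R = np.corrcoef(Z, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2.0
    return chi2, df, stats.chi2.sf(chi2, df)


def kmo(Z):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(Z, rowvar=False)
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale  # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Synthetic correlated data: 36 samples x 6 indicators from two latent factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(36, 2))
W = np.array([[0.9, 0.8, 0.7, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 0.9, 0.8, 0.7]])
Z = factors @ W + 0.4 * rng.normal(size=(36, 6))

chi2, df, p_value = bartlett_sphericity(Z)
print(f"KMO = {kmo(Z):.3f}; Bartlett chi2 = {chi2:.1f}, df = {df:.0f}, p = {p_value:.2g}")
```

A small Bartlett p-value rejects the hypothesis that the correlation matrix is an identity matrix, which is the condition PCA requires.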

Table 3 Loadings of 12 variables on varimax rotated factors of different seasons in the Laixi River Basin

For the LPS, PC1 (31.35% of the total variance) had strong positive loadings on TN and NH4+–N (0.78 and 0.77) and a moderate positive loading on AS (0.61). PC1 is associated with nitrogen nutrients, which may originate from point-source pollution from sewage treatment plants and factories or from nonpoint-source pollution caused by agricultural cultivation and livestock breeding (Han et al., 2019; Matiatos, 2016; Zheng et al., 2015). In addition, AS is used extensively in domestic and industrial applications (Sasi et al., 2021). Thus, PC1 represents point sources of domestic and industrial wastewater and breeding pollution. PC2 explained 13.13% of the total variance and had moderate positive loadings on CODMn, F, CODCr, EC, and BOD5 (0.71, 0.67, 0.64, 0.64, and 0.51). PC2 reflects organic pollution resulting from anthropogenic activities such as the discharge of domestic sewage and industrial wastewater. This factor also has a moderate loading on F, which is usually associated with cement plants, mineral smelters, and certain chemical plants (Fu et al., 2020). However, F concentrations in all monitored months were below 1 mg/L, indicating either no contamination or an extremely low level of contamination, so F probably originated from local soil and entered the rivers with rainfall runoff (Ma et al., 2020; Meng et al., 2018). Therefore, PC2 can be regarded as a mixed source influenced by domestic sewage, industrial wastewater, and soil weathering. PC3 and PC4 explained 11.27 and 8.12% of the total variance, respectively, with a strong positive loading on WT (0.90), positive loadings on pH and TP (0.76 and 0.63), and a moderate negative loading on DO (−0.69). WT, pH, and DO are generally considered to be affected mainly by temperature changes and natural meteorological conditions, whereas TP is likely attributable to agricultural activities (Giao et al., 2021; Zhou et al., 2007).

For the HPS, PC1, accounting for 49.89% of the total variance, had strong positive loadings on AS, TP, and NH4+–N (0.97, 0.92, and 0.81) and a moderate positive loading on EC (0.66). TP and NH4+–N concentrations were higher during the HPS than during the LPS, likely because of increased agricultural planting and surface runoff in spring (Zhang et al., 2020). Therefore, PC1 can be interpreted as agricultural cultivation. PC2 (14.34% of the total variance) had strong positive loadings on CODCr, CODMn, and F (0.87, 0.86, and 0.86) and moderate positive loadings on DO, BOD5, and TN (0.75, 0.70, and 0.56). Consistent with the LPS analysis, PC2 mainly represents domestic sewage, industrial wastewater, and animal waste. PC3 (10.43% of the total variance) had a strong positive loading on pH (0.90) and a moderate positive loading on WT (0.69), representing the influence of natural factors.

Ranked by explained variance, the PCA results identified the pollution sources of the LPS as domestic sewage, industrial wastewater, and breeding pollution > soil weathering > agricultural activities > natural influence. Those of the HPS can be ranked as agricultural cultivation > domestic sewage, industrial wastewater, and animal waste > natural variations.

Source identification in a spatial pattern with PCA

PCA was also performed on the two spatial groups of monitoring sites. The KMO values for the LPR and HPR were 0.743 and 0.773, respectively, and the significance of Bartlett’s sphericity test was 0.00. Four and three principal components (PCs) with eigenvalues > 1 were extracted for the two spatial groups, explaining 87.10 and 74.03% of the total variance, respectively. Table 4 shows the PCA results, including the loadings, eigenvalue, and variance of each PC in the two regions, as well as the cumulative explained variance.

Table 4 Loadings of 12 variables on varimax rotated factors of different regions in the Laixi River Basin

For the LPR, PC1 explained 47.37% of the total variance, with a strong negative loading on WT (−0.87) and strong positive loadings on EC and TN (0.80 and 0.76). TN in the water body may come from a variety of sources, such as agricultural planting, livestock and poultry breeding, domestic sewage, and industrial effluents (Wang et al., 2013; Zheng et al., 2015). According to the land use map of the study area, the LPR sites are dominated by agricultural land with some built-up land. Owing to the low density of industrial enterprises in the LPR, TN appears to be associated primarily with manure and chemical fertilizer application and domestic sewage. Thus, PC1 can be considered to reflect agricultural pollution and rural domestic sewage. PC2, explaining 18.63% of the total variance, had strong positive loadings on BOD5 and CODCr (0.84 and 0.76) and moderate positive loadings on CODMn and F (0.71 and 0.57) (Table 4). This factor may reflect the accumulation of organic pollutants from rural household waste, industrial wastewater, and free-range livestock and poultry (Liu et al., 2020; Najar & Khan, 2012). PC3 (12.27% of the total variance) had strong positive loadings on pH and DO (0.83 and 0.76) and a moderate positive loading on AS (0.71); it represents natural influence and domestic sewage (Haji Gholizadeh et al., 2016). PC4 accounted for 8.83% of the total variance and had moderate positive loadings on NH4+–N and TP (0.75 and 0.73). Based on the preceding analysis, PC4 may represent pollution from N and P fertilizers used in agricultural planting that enter the rivers through surface runoff.

For the HPR, PC1 explained 43.88% of the total variance, with strong positive loadings on NH4+–N, TP, CODMn, and CODCr (0.90, 0.85, 0.84, and 0.77) and a moderate positive loading on BOD5 (0.63). According to land use information and the local statistical yearbooks (LSB, 2018–2020), this area is widely affected by population concentration and urbanization, industrial development, agricultural production, and livestock farming. Hence, PC1 is largely related to municipal sewage mixed with industrial wastewater, agricultural nonpoint sources, and livestock and poultry wastewater (Lap et al., 2021). PC2, accounting for 18.05% of the total variance, had a strong positive loading on EC (0.82) and moderate positive loadings on TN, F, and AS (0.72, 0.64, and 0.63). As shown in Table 1, the F concentration at site NDQ was higher than 1.0 mg/L, and there are many factories around this site. Thus, considering the previous PCA results, PC2 represents domestic sewage and industrial wastewater. PC3, accounting for approximately 13.11% of the total variance, had moderate positive loadings on pH and DO (0.73 and 0.69) and a moderate negative loading on WT (−0.59). Therefore, this factor can be considered a natural source (Ma et al., 2020).

The PCA results showed that the number and relative importance of the pollution sources affecting water quality differed between the LPR and HPR. Pollution sources in the LPR can be ranked as follows: rural domestic sewage > agricultural pollution > industrial effluents and free-range livestock and poultry pollution > natural influence. Those in the HPR can be ranked as municipal sewage and industrial effluents > agricultural nonpoint sources and livestock and poultry wastewater > natural sources.

Source apportionment in temporal pattern with APCS-MLR model

Based on the qualitative identification of pollution sources, the APCS-MLR model was established to calculate the contribution rate of each pollution source to the water quality indicators in the LPS and HPS. Figure 4 shows the source apportionment results for the two temporal patterns. In previous studies, a correlation coefficient greater than 0.5 between observed and estimated values has been taken to indicate a good model fit (Haji Gholizadeh et al., 2016; Simeonov et al., 2003). In our work, the mean R² was 0.62 for the LPS and 0.65 for the HPS, and the R² of most parameters exceeded 0.6, reflecting the accuracy and applicability of the APCS-MLR model.
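The goodness of fit can be checked per parameter by computing R² between observed and model-predicted concentrations, as in the sketch below (function name and data hypothetical). For an ordinary least-squares fit with an intercept, R² equals the squared Pearson correlation between observed and fitted values, so a correlation criterion r > 0.5 corresponds to R² > 0.25; the two thresholds are related but not interchangeable.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical example: fit an OLS line with an intercept and check its R^2.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 36)
obs = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, size=36)
design = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(design, obs, rcond=None)
fitted = design @ coef

r2 = r_squared(obs, fitted)
r = np.corrcoef(obs, fitted)[0, 1]
print(round(r2, 3), round(r**2, 3))  # equal for OLS with an intercept
```

Averaging the per-parameter R² values (for example with `np.mean`) gives summary figures of the kind reported for each season or region.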

Fig. 4

Contributions on the selected water quality variables and average contributions of different pollution sources in LPS (a) and HPS (b) using APCS-MLR model (UIS: unidentified source)

For the LPS, domestic and industrial wastewater and breeding pollution (PC1) was the largest contamination source, with an average contribution of 33.80%, and contributed mainly to the nutrient indices TN (83.80%) and NH4+–N (70.28%) and to AS (68.30%). Industrial wastewater, domestic sewage, and soil weathering (PC2) accounted for 29.02% of total pollution, contributing mainly to the organic indicators CODCr (60.15%), CODMn (47.54%), and BOD5 (64.08%), as well as to F (61.41%) and EC (49.94%). The contributions of agricultural activities and natural influences (PC3 and PC4, average contribution of 20.95%) to the 12 monitoring parameters ranged from 0.14% (pH) to 64.53% (TP). WT (63.62%), DO (58.95%), TP (64.53%), and pH (70.81%) were mainly attributed to PC3 and PC4. In this season, unidentified sources contributed from 0.06% (pH) to 15.78% (DO), affecting every monitoring indicator to varying degrees. This may be because the contaminants come from diverse and complex sources, making them difficult to quantify with the APCS-MLR model (Zhang et al., 2017).

For HPS, most parameters were mainly affected by agricultural nonpoint source pollution (PC1, average 41.23%), reflected in its high contributions to the nutrient indices NH4+-N (66.68%) and TP (65.78%) and to AS (83.02%) and EC (51.81%). Domestic sewage, industrial wastewater, and animal waste (PC2) accounted for 33.19% of total pollution, represented by the organic parameters CODCr (48.75%), CODMn (51.93%), and BOD5 (53.77%), the nutrient index TN (57.91%), F (46.36%), and DO (34.39%). Natural variations (PC3) contributed 21.43%, mainly to pH (76.47%) and WT (59.93%). Unidentified sources also contributed to river water pollution in HPS, ranging from 0.23% (pH) to 11.92% (BOD5).
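Statements such as "ranging from 0.23% (pH) to 11.92% (BOD5)" and the per-source averages are summaries of a table of per-indicator percentages. The hypothetical helper below assumes that table is a nested dict `{indicator: {source: percent}}` and reports, for each source, its unweighted mean over the indicators together with the indicators where its share is smallest and largest. Any values passed to it would come from the APCS-MLR output; none are reproduced here.

```python
def summarize_sources(table):
    """Summarize per-indicator source contributions (hypothetical helper).

    table: {indicator: {source: percent}}, every indicator listing the same sources.
    Returns {source: (mean_percent, (min_indicator, min_percent),
                      (max_indicator, max_percent))}.
    """
    sources = list(next(iter(table.values())))
    summary = {}
    for src in sources:
        values = {ind: row[src] for ind, row in table.items()}
        lo = min(values, key=values.get)  # indicator with the smallest share
        hi = max(values, key=values.get)  # indicator with the largest share
        mean = sum(values.values()) / len(values)  # unweighted mean over indicators
        summary[src] = (mean, (lo, values[lo]), (hi, values[hi]))
    return summary
```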

Source apportionment in spatial pattern with APCS-MLR model

The APCS-MLR model was also applied to calculate the contribution of each pollution source to the water quality indicators for LPR and HPR. As in the temporal patterns, the R² values for most of the 12 selected parameters in LPR and HPR were greater than 0.5, with mean values of 0.70 and 0.65, respectively. This shows that the values predicted by the model agree well with the observed values and that the apportionment results are reliable.
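The goodness-of-fit statements above reduce to summarizing per-parameter R² values. For an OLS fit with an intercept, R² equals the squared correlation between observed and predicted values, so requiring R² > 0.5 is stricter than requiring a correlation coefficient > 0.5. The sketch below assumes the R² values are supplied as a dict keyed by parameter name; `fit_summary` is a hypothetical helper.

```python
def fit_summary(r2_by_param, threshold=0.5):
    """Mean R^2, fraction of parameters above the threshold, and the others.

    r2_by_param: {parameter_name: r2} (hypothetical input layout).
    """
    values = list(r2_by_param.values())
    mean_r2 = sum(values) / len(values)
    # Parameters whose fit does not exceed the threshold
    weak = sorted(p for p, r2 in r2_by_param.items() if r2 <= threshold)
    frac_above = 1 - len(weak) / len(values)
    return mean_r2, frac_above, weak
```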

Figure 5 shows the source apportionment results for the two spatial patterns. In LPR (Fig. 5a), agricultural pollution and rural domestic sewage (PC1) contributed 31.01% of total pollution, mainly represented by WT (88.20%), EC (62.31%), and TN (52.07%). Organic pollution from rural household waste, industrial effluents, and free-range livestock and poultry (PC2) accounted for 26.82% of total pollution, represented by BOD5 (68.07%), CODCr (65.95%), CODMn (56.77%), and F (56.04%). The contributions of physicochemical influences and domestic sewage (PC3, average 19.86%) to the individual water quality indicators ranged from 1.97% (NH4+-N) to 69.66% (pH). PC4 (average 14.82%) represented agricultural sources, contributing 79.83% of NH4+-N and 57.23% of TP. The contributions of unidentified sources to the individual indicators ranged from 0.27% (pH) to 6.52% (NH4+-N). Overall, compared with the temporal apportionment, the contribution of unidentified sources was relatively low in LPR (mean contribution of 2.22%), indicating that the potential pollution sources in LPR were largely identified.

Fig. 5

Contributions of the pollution sources to the selected water quality variables and average contributions of the different pollution sources in LPR (a) and HPR (b) using the APCS-MLR model (UIS: unidentified source)

In HPR (Fig. 5b), most water quality parameters were strongly affected by municipal sewage with industrial effluents, agricultural sources, and livestock and poultry wastewater (PC1, 37.96%), reflected in the nutrient indices (NH4+-N, 79.78%; TP, 68.98%) and organic pollutants (CODMn, 62.81%; CODCr, 61.87%; BOD5, 56.68%). Domestic sewage and industrial wastewater (PC2) explained 33.55% of total pollution, represented by EC (69.94%), TN (69.04%), F (62.21%), and AS (53.22%). Natural sources (PC3) contributed 25.23%, mainly to pH (71.73%), DO (67.87%), and WT (67.45%). Unidentified sources contributed from 0.15% (pH) to 12.14% (BOD5) of the individual indicators. The average contribution of unidentified sources in HPR (3.26%) was similar to that in LPR (2.22%), indicating that the potential pollution sources in HPR were also largely identified.
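A quick consistency check on the reported averages: if each indicator's percentages (identified sources plus UIS) are normalized to 100%, the unweighted averages across indicators must also sum to 100%. The hypothetical helper below checks this within an assumed rounding tolerance of 0.05 percentage points; with the HPR averages reported above (37.96%, 33.55%, 25.23%, and 3.26% for UIS) the total is 100.00%.

```python
def check_closure(avg_contrib, tol=0.05):
    """Sum the average source contributions (including UIS) and test closure.

    avg_contrib: {source: average_percent}; tol is an assumed rounding tolerance.
    Returns (total, passes).
    """
    total = sum(avg_contrib.values())
    return total, abs(total - 100.0) <= tol


total, ok = check_closure({"PC1": 37.96, "PC2": 33.55, "PC3": 25.23, "UIS": 3.26})
```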

Conclusions

In this study, the spatial and temporal distribution patterns of the monitoring parameters in the Laixi River were analyzed using multivariate statistical techniques, and the contributions of potential pollution sources to the selected monitoring indicators were quantified for the different spatial and temporal categories. The CA results divided the 12 months into two clusters, corresponding to the LPS and HPS. The spatial clustering results divided the six monitoring sites into two groups with different pollution statuses: the LPR and the HPR. The number of pollution sources under each spatial and temporal condition was determined from the PCA results, and the relative contribution of each source was then quantified using the APCS-MLR model.
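A widely used rule for fixing the number of sources from PCA results is the Kaiser criterion, which retains the components of the correlation matrix whose eigenvalues exceed 1; whether exactly this rule was applied here is an assumption. The sketch below uses numpy, the name `n_sources_kaiser` is hypothetical, and the clustering step is not shown.

```python
import numpy as np


def n_sources_kaiser(X):
    """Number of components with eigenvalue > 1 (Kaiser criterion, assumed).

    X: (n_samples, n_vars) array of water quality measurements.
    Returns (k, eigenvalues in descending order, % variance explained).
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh is ascending; reverse it
    explained = 100.0 * eigvals / eigvals.sum()  # trace of corr = n_vars
    return int((eigvals > 1.0).sum()), eigvals, explained
```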

For LPS, domestic and industrial wastewater and breeding pollution (PC1, 33.80%) was the main source of river water pollution, whereas for HPS, agricultural nonpoint source pollution (PC1, 41.23%) was the main source. In LPS, PC1 was followed by industrial effluents, domestic sewage, and soil weathering (PC2, 29.02%) and by agricultural activities and natural influences (PC3 and PC4, 20.95%); in HPS, PC1 was followed by domestic sewage, industrial wastewater, and animal waste (PC2, 33.19%) and by natural variations (PC3, 21.43%). The four latent sources of contamination identified in LPR were, in order, rural domestic sewage > agricultural pollution > industrial effluents and free-range livestock and poultry pollution > natural influences, with average contributions of 31.01%, 26.82%, 25.13%, and 14.82%, respectively. In HPR, the three latent pollution sources identified were municipal sewage and industrial effluents > agricultural nonpoint sources and livestock and poultry wastewater > natural sources, with average contributions of 37.96%, 33.55%, and 25.23%, respectively.

The results of this paper show that multivariate statistical analysis methods are effective exploratory tools for analyzing and interpreting complex water quality datasets and for identifying and apportioning pollution sources. In addition, this evaluation can help managers and decision-makers gain an in-depth understanding of the main pollution sources in the study area and can provide a reference for formulating more reasonable and reliable pollution control strategies in tributary watersheds.