Introduction

Surface water is one of the most important water resources for the sustainable development of nature and human society (Bozorg-Haddad et al. 2021; Ebenstein 2010; Ryberg and Chanat 2022). Its quality is therefore crucial for maintaining the health of local residents and ensuring socio-economic development (Ma et al. 2021; Madan and Ankit 2020). Water pollution and the associated degradation of the ecological environment are, however, becoming increasingly severe, particularly in China and India, owing to rapid urbanization and agricultural modernization (Ali et al. 2016; Nong et al. 2020; Qin et al. 2020; Li and Wu 2019a; Subba Rao et al. 2022; Sarafraz et al. 2020; Varol 2019, 2020; Xu et al. 2022; Nsabimana and Li 2022; Yang et al. 2022). Deterioration of surface water quality is usually regarded as a local or regional problem, but it has become a focus of global attention in the context of water resource shortages and the deterioration of the water environment (Ali et al. 2022; Liu and Diamond 2005; Vörösmarty et al. 2010; Li and Wu 2019b; Downing et al. 2021; Subba Rao et al. 2019; Saha et al. 2020; UNESCO 2021; Velpandian et al. 2018). A scientific understanding of how and why river water quality changes in space and time is critical for protecting the water environment.

The Malian River is the main river in the Qingyang area of Gansu Province, China, and the third-largest tributary of the Yellow River. It is essential for maintaining the high-quality development of the region and even of the Yellow River Basin (Li et al. 2022). Nevertheless, water shortage (Ringler et al. 2010), soil erosion (Du et al. 2021), and deterioration of the water environment (Ma et al. 2012; Zhao et al. 2020) have plagued the region. The state and local governments have implemented a series of water environmental treatment, ecological protection, and restoration projects in the Malian River. Although these projects have made progress, the water quality of some monitored sections still does not meet the national surface water quality requirements. Therefore, the spatiotemporal variation of the water quality must be urgently analyzed to identify the relevant pollution sources.

Previous water quality studies on the Malian River have focused on its hydrochemical characteristics and the associated water quality evaluation (Wang et al. 2018a; Su et al. 2009). Recent assessments of the seasonal water quality variation and the hydrochemistry also provide information on the overall changes in the Malian River (Wang et al. 2018b). Various anthropogenic and environmental factors, which themselves vary in space and time, may influence the water quality of the river along its long flow path. However, to our knowledge, the pollution sources of the Malian River have not yet been identified, although this information is very important for water resource management. The water quality changes of the Malian River at different spatiotemporal scales are often statistically related to the climate, vegetation, hydrological processes, and socioeconomic conditions such as economic development level, industrial structure, and population status (Ma et al. 2020; Qian et al. 2015; Tao et al. 2021; Wang et al. 2023; Yang et al. 2022). However, because of insufficient experimental data, comprehensive and systematic research on the pollution sources and their contributions to water pollution is still lacking. Current water environmental analysis methods in the river basin fail to fully reflect the water quality status, so that much of the information in the original data goes unused. Therefore, comprehensive and reliable analysis methods should be applied to the data before correlation and contribution analyses are carried out.

To fully understand the water quality characteristics and pollution sources of the Malian River, this study used multivariate statistical methods to analyze the monthly water quality monitoring data from six monitoring sections during 2017–2021. These methods have been widely used for data analysis in various situations (Li et al. 2019, 2021; Meng et al. 2018; Nong et al. 2020; Subba Rao et al. 2018; Ren et al. 2021; Wu et al. 2014, 2020), indicating their suitability for water quality research. The objectives of this study were (1) to evaluate the spatiotemporal variation characteristics of the potential water quality parameters in the Malian River, (2) to identify the pollution sources by principal component analysis, and (3) to quantify the contribution of the identified sources to the river water quality using the APCS-MLR model. This research can provide a scientific reference for the targeted protection and management of the ecological environment in the Malian River Basin.

Materials and Methods

Study Area

The study area is the Malian River Basin in Gansu Province, Northwest China. It is located between latitudes 35°14′ and 37°23′ N and longitudes 106°40′ and 108°35′ E. The main stream of the Malian River is located in Qingyang City, Gansu Province (Fig. 1). It flows from north to south through Qingyang City and finally joins the Jing River. The main stream of the Malian River is 374.8 km long, with a watershed area of 19,086 km² and an average multi-year flow rate of 14.2 m³/s (Wang et al. 2018a, 2018b). The runoff of the Malian River is unevenly distributed throughout the hydrological year, with 60% of the annual runoff occurring between July and September. The river has an annual average sand content of 294 kg/m³ and transports 1.3 × 10⁸ tons of sand on average each year. This accounts for more than half of the average annual sand transported by the Jing River. The northern part of the study area belongs to the mid-temperate semi-arid zone, while the southern part belongs to the warm temperate semi-humid zone. The mean annual temperature of the Malian River Basin ranges from 8 to 12 °C. The average annual rainfall ranges from 480 to 660 mm and decreases from southeast to northwest. The annual water surface evaporation ranges from 1380 to 1750 mm (Du et al. 2021). Qingyang City has a long history of oil exploitation, and the main towns and arable land are mostly located along the Malian River. The river has been severely polluted by point and non-point sources from industrial and mining enterprises, domestic waste, and agricultural activities.

Fig. 1 Geographical locations of the study area and monitoring sections

Data Collection

Water samples were collected monthly from January 2017 to December 2021 by the national and local governments at six water quality monitoring sections that they had set up. As shown in Fig. 1, the six monitoring sections are distributed along the main river channel: Chaijiatai (CJT), Wuliqiao (WLQ), Dianziping (DZP), Tielichuan (TLC), Xinbucun (XBC), and Ningxianqiaotou (NXQT). Detailed information on the monitoring sections is listed in Table 1. Twelve water quality parameters (pH, EC, DO, COD, BOD5, F, TP, TN, NH3–N, E. coli, LAS, and Cr6+) were used in this study, among which pH, electrical conductivity (EC), and dissolved oxygen (DO) were measured on site using a portable water quality device (OTT HydrolabDS5X multiparameter water quality monitor). The sampling procedures followed the national guidelines recommended by the Environmental Quality Standards for Surface Water (General Administration of Quality Supervision, Inspection and Quarantine of the People’s Republic of China and State Environmental Protection Administration of the People’s Republic of China 2002) and the protocols outlined in the Guidance on Sampling Techniques (Ministry of Environmental Protection of the People’s Republic of China 2009). Water samples were collected in pre-washed polyethylene bottles and stored in refrigerators. After sampling, all bottles were immediately transferred to the laboratory and analyzed within 24 h.
The remaining water quality indicators were analyzed following the Standard Methods for the Examination of Water and Wastewater (State Environmental Protection Administration of the People’s Republic of China 2002), using the following methods: chemical oxygen demand (COD), potassium dichromate method; five-day biochemical oxygen demand (BOD5), Winkler’s method (titrimetric); fluoride (F), ion chromatography; total nitrogen (TN) and total phosphorus (TP), potassium persulfate oxidation; ammonia nitrogen (NH3–N), Nesslerization method; fecal coliform (E. coli), enzyme substrate method; linear alkylbenzene sulfonates (LAS), methylene blue spectrophotometry; hexavalent chromium (Cr6+), diphenylcarbohydrazide spectrophotometric method. Additionally, blank samples and duplicates were used for quality control.

Table 1 Locations of water quality monitoring sections of the Malian River

Source Apportionment Using the APCS-MLR Receptor Modeling Technique

The absolute principal component score-multiple linear regression (APCS-MLR) receptor model was used for water pollution source apportionment (Cheng et al. 2020; Li et al. 2021; Meng et al. 2018; Zhang et al. 2022). The model assumes that every possible pollution source contributes linearly to the final pollutant concentrations at the receptor site.

First, the data were analyzed using PCA, which transforms a group of possibly correlated variables into a smaller set of uncorrelated variables called principal components (PCs). To further simplify the PCA data structure, factor analysis (FA) was performed on the basis of the PCA by applying an orthogonal rotation (maximum variance, i.e., varimax, method) to the factor loading matrix to construct new variables. The rotated factor loading matrix and the eigenvalues were then used to calculate the principal factor eigenvectors. The eigenvectors and the APCS obtained from the standardized physical and chemical data were then used in the APCS-MLR.

In the first step of the APCS-MLR model, the principal components of the water quality indicators were extracted. This forms the foundation for identifying and quantifying pollution sources. The calculation formulae are as follows:

$$(A_{s} )_{k} = \sum\limits_{{j} = 1} {w_{j}} \cdot z_{k}$$
(1)
$$z_{k} = \frac{{c_k - \overline{c} }}{\sigma },$$
(2)

where (As)k is the score of the principal component, wj is the weight of the j-th principal component, and j is the principal component serial number in Eq. 1; zk is the standardized value of the parameter concentration for the k-th monitoring value. In Eq. 2, ck is the parameter concentration for the k-th monitoring value, \(\overline{c}\) is the mean of the parameter concentration, and σ is its standard deviation.
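
As an illustration of Eq. 2, the standardization step can be sketched in Python as follows. This is a minimal sketch and not the SPSS procedure used in this study; the function name `standardize`, the use of the population standard deviation, and the COD readings are assumptions made for the example.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores (Eq. 2): (c_k - mean) / standard deviation."""
    c_bar = mean(values)
    # Assumption: population SD; the sample SD (statistics.stdev) is an
    # equally plausible choice and only rescales the scores slightly.
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(c - c_bar) / sigma for c in values]

# Hypothetical COD readings (mg/L), for illustration only
cod = [18.0, 22.0, 20.0, 24.0, 16.0]
z = standardize(cod)
```

By construction, the standardized series has a mean of zero and unit standard deviation, which puts parameters measured in different units on a common scale before PCA.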

(As)k is a standardized value and therefore cannot be used directly to calculate the original contribution of the PCs. The standardized factor score must be transformed into a non-standardized APCS using Eqs. (3)–(5):

$$APCS_{ji} = (A_{{s}} )_{{ji}}- (A_{0} )_{j}$$
(3)
$$(A_{0} )_{j} = \sum\limits_{i = 1} {S_{ij} } \cdot (Z_{0} )_{i}$$
(4)
$$(Z_{0} )_{i} { = }\frac{{0 - \overline{{c_{i} }} }}{{\sigma_{i} }},$$
(5)

where (A0)j is the factor score when all concentrations are set to 0, i is the index of the water quality parameter, Sij is the factor score coefficient, and (Z0)i is the standardized value when the observed concentration is set to 0.
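
Equations (3)–(5) can be sketched in Python as below. The function `absolute_scores` and all numerical inputs are hypothetical; the sketch assumes that factor scores are obtained by multiplying the standardized data by a score coefficient matrix, which is how Eq. 4 treats the artificial zero-concentration sample.

```python
def absolute_scores(z_rows, coeffs, means, sds):
    """Convert standardized samples into APCS.

    z_rows : list of standardized samples, one value per parameter
    coeffs : coeffs[i][j] is the score coefficient S_ij of parameter i on factor j
    means, sds : per-parameter mean and standard deviation of the raw data
    """
    n_params = len(means)
    n_factors = len(coeffs[0])
    # Eq. 5: standardized value of an artificial sample with zero concentration
    z0 = [(0.0 - m) / s for m, s in zip(means, sds)]
    # Eq. 4: factor score of that zero-concentration sample
    a0 = [sum(coeffs[i][j] * z0[i] for i in range(n_params))
          for j in range(n_factors)]
    apcs = []
    for z in z_rows:
        scores = [sum(coeffs[i][j] * z[i] for i in range(n_params))
                  for j in range(n_factors)]
        # Eq. 3: subtract the zero-sample score from each factor score
        apcs.append([a - b for a, b in zip(scores, a0)])
    return apcs

# Hypothetical two-parameter, one-factor example
means = [20.0, 1.0]
sds = [4.0, 0.5]
coeffs = [[0.6], [0.5]]
# The third row is the standardized all-zero sample, so its APCS must be 0
z_rows = [[0.0, 0.0], [1.0, -1.0], [-5.0, -2.0]]
apcs = absolute_scores(z_rows, coeffs, means, sds)
```

The check on the third row reflects the purpose of the transformation: a sample with zero concentrations receives a zero absolute score, so the scores can be regressed against concentrations on a true-zero scale.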

In the MLR, the measured concentration is the dependent variable and the APCS are the independent variables used to estimate it. The MLR can be expressed as follows:

$$C_{i} = \sum\limits_{m} {a_{mi} } \cdot APCS_{mi} + b_{i},$$
(6)

where bi is the constant term, which represents the contribution of unidentified sources, Ci is the measured value, and ami is the regression coefficient of the m-th component for the i-th parameter.
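
The regression in Eq. 6 can be sketched for a single parameter as an ordinary least-squares fit. The example below is a minimal sketch under stated assumptions: the function `fit_mlr` is hypothetical, it solves the normal equations by Gaussian elimination purely for illustration (a statistics package would normally be used), and the data are synthetic.

```python
def fit_mlr(apcs_rows, conc):
    """Least-squares fit of C = sum_m a_m * APCS_m + b (Eq. 6).

    Returns (list of coefficients a_m, intercept b).
    """
    X = [list(row) + [1.0] for row in apcs_rows]   # append intercept column
    p = len(X[0])
    # Augmented normal equations: (X^T X | X^T y)
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, conc))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return beta[:-1], beta[-1]

# Synthetic data generated from C = 2.0*APCS_1 + 0.5*APCS_2 + 1.0
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.5]]
conc = [2.0 * a + 0.5 * b + 1.0 for a, b in rows]
coefs, intercept = fit_mlr(rows, conc)
```

Because the synthetic concentrations are noise-free, the fit recovers the generating coefficients and the intercept, which plays the role of bi in Eq. 6.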

The formula for calculating the contribution rate of the m-th pollution source to the i-th pollution factor (PCmi) is as follows (Gholizadeh et al. 2016):

$$PC_{mi} = \frac{{\left| {a_{mi} \cdot \overline{{APCS_{mi} }} } \right|}}{{\left| {b_{i} } \right| + \sum\limits_{m} {\left| {a_{mi} \cdot \overline{{APCS_{mi} }} } \right|} }}$$
(7)

The contribution of the unidentified sources is:

$$PC_{mi} = \frac{{\left| {b_{i} } \right|}}{{\left| {b_{i} } \right| + \sum\limits_{m} {\left| {a_{mi} \cdot \overline{{APCS_{mi} }} } \right|} }}$$
(8)
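
Equations (7) and (8) can be illustrated with a short Python sketch. The function `source_contributions` and the regression output fed to it are hypothetical and serve only to show how the shares are normalized.

```python
def source_contributions(coefs, mean_apcs, intercept):
    """Shares of identified sources (Eq. 7) and of unidentified sources (Eq. 8).

    coefs     : regression coefficients a_mi for one parameter
    mean_apcs : mean APCS of each source
    intercept : constant term b_i of the regression
    """
    parts = [abs(a * s) for a, s in zip(coefs, mean_apcs)]
    total = abs(intercept) + sum(parts)
    identified = [p / total for p in parts]
    unidentified = abs(intercept) / total
    return identified, unidentified

# Hypothetical regression output for one water quality parameter
shares, rest = source_contributions([2.0, -0.5], [3.0, 4.0], 2.0)
```

Taking absolute values, as in Eqs. (7) and (8), keeps every share non-negative even when a regression coefficient is negative, and the identified and unidentified shares sum to one.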

Data Processing

In this study, time and space were the main controlling factors, and each was examined by one-way ANOVA. ANOVA was used to test whether different levels of time and space had a significant effect on the parameters (Varol 2019). The possible pollution sources in the river water and the pollutant characteristics were identified via PCA/FA (Kabir et al. 2020a; Tusher et al. 2020). PCA/FA analyzes the relationships between multiple variables and reduces the dimensionality of the dataset so that fewer proxies explain as much of the original variables as possible (Kabir et al. 2020b). This study also used cluster analysis (CA) to group the monitoring sections according to the similarity of their water quality. Here, Ward’s method was used with the squared Euclidean distance as the similarity measure, and the clusters were displayed graphically using dendrograms (Varol 2019). This study used the following data processing methods: (1) missing data were estimated as the average value of the corresponding datasets, (2) the goodness of fit was tested using the Kolmogorov–Smirnov (K-S) statistic (Gholizadeh et al. 2016), (3) the suitability of the data for PCA/FA was verified by the Kaiser–Meyer–Olkin (KMO) and Bartlett’s sphericity tests (Wang et al. 2017), and (4) the percentage error (PE) and the root mean square error (RMSE) were used to verify the degree of fitting of the established APCS-MLR model (Castrillo and García 2020). The statistical software package SPSS 25 and Microsoft Office Excel 2019 were used for all the data processing.
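
Two of the processing steps listed above, mean imputation of missing data and the RMSE check, can be sketched in Python as follows. The helper names `impute_mean` and `rmse`, the use of `None` to mark a missing value, and the TP readings are assumptions made for the example; PE is omitted because its formula is not given in this section.

```python
from math import sqrt

def impute_mean(series):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in series]

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical TP readings (mg/L) with one missing month
tp = impute_mean([0.10, None, 0.08, 0.06])
# Hypothetical observed vs. model-predicted concentrations
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Mean imputation leaves the series mean unchanged but shrinks its variance, which is worth keeping in mind when many values are missing.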

Results and Discussion

Interannual and Spatial Water Quality Characteristics

From 2017 to 2021, the annual minimum pH was 7.56 and the annual maximum was 9.95 (Table 2). The annual average ranged between 8.40 and 8.54, which indicates that the water of the Malian River was alkaline during the monitoring period. Irin et al. (2017) found that river water is slightly alkaline in arid and rainless areas, which is consistent with the present results. The average pH values for the six monitoring sections were all higher than 8.3 (Fig. 2), and there was no noticeable spatial change in the pH from the upstream to the downstream sections.

Table 2 Variations of the water quality parameters in the Malian River from 2017 to 2021 (Avg, average; S.D, standard deviation)

Fig. 2 Spatial variations of the means and SD values for the potential water quality parameters at each water quality monitoring section in the Malian River

EC, an indicator of the cation concentration, can significantly affect water quality. EC is influenced both by the natural weathering of sedimentary rocks and by artificial sources such as industrial and sewage pollution (Martinez-Tavera et al. 2017). The EC fluctuated greatly within each year (Table 2). However, there was no clear interannual change, and the EC remained relatively stable across years (Fig. 3). The northernmost section, CJT, had the highest average EC. Spatially, the EC declined from the upstream to the downstream sections (Fig. 2). Increasing water input downstream and the associated dilution reduced the EC of the river water in the downstream sections.

Fig. 3 Temporal variations of the means and SD values for the 12 water quality parameters of the Malian River from 2017 to 2021

DO, an indicator of the water quality grade, is essential for the survival of aquatic organisms (Rajendran et al. 2018). The annual average DO concentrations exceeded the grade I standard (7.5 mg/L) from 2017 to 2021 (Fig. 3). The lowest DO concentration (4.69 mg/L) was recorded in 2019, while the DO concentrations at all the monitoring sections in the other years were > 5.0 mg/L (Fig. 2). This indicates that the DO level in the river water is generally within the expected control limit (Table 2). There was no spatial difference in the average DO concentration among sections. DO was therefore not a cause of the water quality deterioration in the Malian River. The high DO concentrations and their spatial uniformity can be explained by the fact that flowing water in long open channels is reoxygenated more adequately than other natural water bodies, such as lakes (Kangabam and Govindaraju 2017; Yang et al. 2019). The correlation between DO and other parameters needs further study, since DO reflects the common physical and biological processes in water.

The COD levels indicated the presence of organic pollution and nutrients in the river (Nong et al. 2019). Overall, the annual average COD decreased (Fig. 3). The highest COD concentration (29.48 mg/L) was detected in 2017. In the two northernmost monitoring sections, the mean COD concentrations exceeded the expected control limit (20 mg/L), while the COD concentrations in the other sections were below this threshold (Fig. 2). BOD5 can also be used to describe the level of pollution by organic matter (Lee et al. 2016). The average BOD5 content ranged from 2.63 mg/L (2021) to 3.59 mg/L (2019) (Table 2). The annual mean BOD5 values at the NXQT, WLQ, and TLC sections were greater than 3 mg/L (Fig. 2). The high BOD5 concentration could be caused by the discharge of a considerable amount of organic-rich domestic sewage and industrial wastewater (Lin et al. 2021).

The annual concentration of the soluble ion F changed only slightly, from 1.077 to 0.946 mg/L (Table 2). The average F concentration differed spatially among the monitoring sections: the northernmost section had the highest F concentration, while the remaining five sections showed weak or no variation (Fig. 2). These results might be due to the input of atmospheric pollutants with precipitation (Huang et al. 2017). Apart from the impact of the water source itself, external inputs that may be closely related to human activities also affected the F concentration (Ali et al. 2018; Kimambo et al. 2019; Li et al. 2017).

The annual average TP concentration decreased from 0.106 to 0.070 mg/L between 2017 and 2021 (Fig. 3). The maximum TP concentrations in 2017 and 2018 exceeded the grade III standard (0.2 mg/L). The interannual variation of TP was in line with that of the NH3–N concentration, which further confirms that the efforts of the management department have improved the surface water status. Spatially, the TP concentration showed an overall upward trend, with fluctuations at the northern and southern sections (Fig. 2). The TP concentration at all six sections was greater than 0.06 mg/L. This means that pollution sources in the area (such as human and livestock waste and agricultural planting) have an impact on the surface water status (Mao et al. 2019).

Apart from the CJT section (annual average TN concentration of 51.09 mg/L), where the TN concentration was significantly higher than at the other sections (whose annual average TN concentrations fluctuated around 10 mg/L), there were no conspicuous spatial changes (Fig. 2). The annual mean TN concentration generally decreased year by year from 2018 to 2021, but was still above 10 mg/L (Fig. 3). In China, 30% of animal waste is returned to farmland as fertilizer. Previous studies found that livestock and poultry manure is one of the main causes of nitrogen pollution (Li et al. 2016, 2020). However, inadequate governance and poor supervision have led to nitrogen loss to surface water and serious environmental pollution (Bai et al. 2017).

NH3–N characterizes the level of nutrient pollution in water (Mao et al. 2019). The mean NH3–N concentration of each section fluctuated around the grade III standard limit (1.0 mg/L), with no spatial differences (Fig. 2). The annual mean NH3–N concentration decreased significantly to 0.422 and 0.521 mg/L in 2020 and 2021, compared with the concentrations from 2017 to 2019 (Table 2).

The maximum E. coli concentration, as high as 92,000 n/L, was detected in spring 2018. The annual mean concentrations of E. coli decreased from 2017 to 2021 (Fig. 3). The average concentrations of E. coli were high, especially at the TLC section, where the average concentration was 16,877 n/L (Fig. 2). These results illustrate the impact of pollution sources within the basin (e.g., human and livestock waste) (Tong et al. 2016).

LAS is an active component of ordinary synthetic detergents. The LAS concentration declined from 0.103 mg/L in 2017 to 0.049 mg/L in 2021 (Table 2). Spatially, it remained relatively stable, and the maximum LAS concentration (0.58 mg/L) was measured in 2017 (Fig. 2). LAS promotes the formation of foam that adheres to the water surface. This foam reduces the dissolved oxygen in the water, so LAS may adversely affect the water quality and the survival of aquatic organisms (Ding et al. 2020).

For Cr6+, there was no significant change in the annual average concentration, which ranged between 0.0306 and 0.0453 mg/L (Fig. 3). However, the mean concentration showed significant spatial differences: the CJT and WLQ sections in the north had much higher Cr6+ concentrations than the other sections and exceeded the grade IV standard prescribed in the Chinese standards for surface water (0.05 mg/L) (Fig. 2). The environmental background value of Cr6+ in the strata of the Huan County area is high, and groundwater and river water are frequently exchanged (Cao 2003). The geology of the area leads to slow groundwater flow and strong dissolution and filtration, which results in relatively high trace element concentrations in the groundwater (Li et al. 2017). The high Cr6+ concentration in the surface water may therefore be due to its flow over the local geology and the interaction between surface water and groundwater.

Seasonal and Sectional Water Quality Grouping

This study clustered the monthly monitoring data using cluster analysis (CA), calculating the squared Euclidean distance between the samples and then applying the Ward algorithm to generate the dendrogram. According to the CA results, the year can be divided into three periods (Fig. 4a): period 1 (January–May), period 2 (June–September), and period 3 (October–December). The temporal grouping of the river was largely consistent with the four seasons (March–May for spring, June–August for summer, September–November for autumn, and December–February for winter). Although September falls in autumn, the Malian River Basin is still in the rainy season (Wang et al. 2018b), with large river runoff and water quality characteristics similar to those in summer. These characteristics are typical of period 2 and reveal the delayed response of water quality after the rainy season. March and April fall in the dry season, with little river runoff, and the water quality characteristics are similar to those in winter, when the water is polluted. The temporal clustering results for the Malian River show obvious seasonal changes and indicate that the water quality depends on hydrological conditions, such as river runoff (Xu et al. 2019). The runoff in the rainy season dilutes the river pollutants. This is also the time when river levels and flow change, which significantly affects atmospheric reoxygenation and algal growth (Kong et al. 2021).
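
The clustering principle described above can be sketched with a naive pure-Python implementation of Ward's criterion, in which the pair of clusters whose merger gives the smallest increase in the within-cluster sum of squares is merged at each step. This is an illustrative sketch and not the SPSS routine used in this study; the functions `ward_cost` and `ward_cluster` and the two-variable monthly profiles are hypothetical.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca = [sum(col) / len(a) for col in zip(*a)]      # centroid of a
    cb = [sum(col) / len(b) for col in zip(*b)]      # centroid of b
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))   # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_cluster(points, labels, k):
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [([p], [l]) for p, l in zip(points, labels)]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]][0],
                                                   clusters[ij[1]][0]))
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return [sorted(c[1]) for c in clusters]

# Hypothetical standardized monthly profiles (two parameters per month)
labels = ["Jan", "Feb", "Jul", "Aug", "Nov"]
points = [[1.0, 1.1], [1.1, 1.0], [-1.0, -0.9], [-0.9, -1.0], [0.0, 0.1]]
groups = ward_cluster(points, labels, 3)
```

In this toy example the dry-season and rainy-season months form separate groups and the remaining month stays on its own, which mirrors the kind of period grouping read from a dendrogram.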

Fig. 4 Dendrogram showing temporal (a) and spatial (b) similarities of monitoring periods or sections

The three periods of the year showed obvious differences in the NH3–N and TN concentrations of the Malian River water. Both indexes were significantly lower in periods 2 and 3 than in period 1. This means that nitrogen-containing sewage discharged from industrial and domestic sources significantly affects the Malian River. The NH3–N and TN concentrations were lowest in period 2, indicating that the runoff in the rainy season diluted the nitrogen nutrients discharged from the point sources (Yuan et al. 2020). Pollutants that have accumulated in the river channels and sewage pipelines also enter the river water during this period (Ding et al. 2020). These pollutants lead to a deterioration of water quality, which explains why COD and BOD5 exceed the standard in the rainy season.

As in the temporal CA, the Ward algorithm was used to generate a dendrogram for the spatial CA. The water samples of the Malian River can be divided into four groups (Fig. 4b). Group A contains the CJT section, group B the XBC section, group C the DZP, TLC, and WLQ sections, and group D the NXQT section. The sections within each group had similar hydrological conditions and were likely affected by similar sources, and the classification varied with the significance level. Group A (CJT) is the initial section of the Malian River, where COD, BOD5, TN, and Cr6+ exceed the water quality standards. This reflects the degraded natural conditions of the Malian River in this section. Group B is relatively far from anthropogenic influences and has a relatively good water quality of grade III. Group C consists of three sections, each located in a densely populated area (Huan County, Qingcheng County, Heshui County). This group shows high COD, NH3–N, F, and LAS concentrations and is affected by the water pollution sources in the study area, including industrial wastewater, agricultural runoff, and urban sewage. Group D (NXQT) is the section where the Malian River flows into the Jing River. The wastewater from the domestic and industrial sewage treatment plants of Xifeng County and Ning County, together with rural domestic sewage and agricultural planting along the banks, results in grade IV–V water quality in this section.

Spatial clustering shows that the water quality characteristics of the main stream of the Malian River vary spatially. The pollution sources corresponding to each cluster are related to land use, and the changes of the clusters in different seasons are caused by hydrological changes in the basin (Xu et al. 2019). This is also consistent with the hydrological conditions of the Malian River and the distribution of pollution sources along its banks. For example, industrial, domestic, and urban sewage discharges are the main sources of pollution in the WLQ and TLC sections, which explains why the two sections are classified into the same cluster (Fig. 4b). The NXQT section, the last section of the river, belongs to the same cluster, which may result from the dilution of pollutants that concentrate in the tributary and their degradation downstream.

Identification of Main Pollution Factors

The PCA/FA method was used to identify the natural and anthropogenic sources of the indicators with the largest contributions, and the results helped to further clarify their distribution characteristics. To check whether the parameters were sufficiently correlated, the KMO and Bartlett’s sphericity tests were conducted. The results (KMO = 0.68, p < 0.001) indicate a significant relationship between the parameters and confirm the suitability of PCA/FA (Gholizadeh et al. 2016). Based on the Kaiser criterion (eigenvalue > 1), five PCs were extracted, which together explain 71% of the total variance (Table 3). The relatively low total variance may be due to the sampling gaps caused by river ice in winter. Other relevant studies have also reported similarly low cumulative total variance (Liu et al. 2019; Huang et al. 2020).
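
The Kaiser criterion mentioned above can be sketched in Python as follows. The function `kaiser_select` and the eigenvalues are hypothetical; the sketch assumes PCA on standardized data, for which the eigenvalues sum to the number of variables, so each variance share equals the eigenvalue divided by that sum.

```python
def kaiser_select(eigenvalues):
    """Keep components with eigenvalue > 1 and report their variance shares.

    Returns (list of (component number, eigenvalue, share), cumulative share).
    """
    total = sum(eigenvalues)
    kept = [(k + 1, ev, ev / total)
            for k, ev in enumerate(eigenvalues) if ev > 1.0]
    cumulative = sum(share for _, _, share in kept)
    return kept, cumulative

# Hypothetical eigenvalues for a six-variable problem (they sum to 6)
kept, cum = kaiser_select([2.4, 1.5, 1.1, 0.5, 0.3, 0.2])
```

Here three components pass the threshold and together explain about 83% of the variance; components with eigenvalues below 1 explain less than a single standardized variable and are dropped.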

Table 3 Variance explained by the main components for the Malian River

The eigenvalue of the first principal component (PC1) is 2.442, and PC1 explains 22.2% of the variance. Figure 5 shows that the main load variables include COD, BOD5, EC, and F. The source of this pollution may be industrial sewage, sanitary sewage, and aquaculture wastewater (Matiatos 2016; Varol 2020; Zhang et al. 2020). Responding to the increasing pressure for environmental protection since 2017, the local government has made great efforts to renovate the Malian River Basin. The renovation includes the construction and improvement of the urban sewage pipe network, the relocation of the industrial discharge outlets, and the relocation and closure of livestock and poultry breeding operations within 500 m of both banks of the river. Other efforts include the restoration of the embankment vegetation, the treatment of polluted water bodies in urban areas, the collection, storage, and transportation of crop straw, the dredging of some rivers with serious sediment deposition, and the removal of toilets near the rivers in rural areas. However, upgrading and renovating the sewage treatment plant and constructing the sewage pipelines have taken a long time and are not yet fully completed. Rainwater is still mixed with sewage, and large volumes of urban domestic sewage and urban runoff overflow into the Malian River during flood events (Ding et al. 2020). Research shows that urban sewage and urban runoff are important sources of organic pollutants such as COD (Lin et al. 2021). The average concentrations of COD in roof and road runoff at Chinese urban sites are, for example, 125 and 284 mg/L, respectively (Zhang et al. 2020). These values exceed the grade V standard limit for surface water by 2.1 and 6.1 times, respectively. Urban runoff is the second largest non-point source of pollution after agricultural sources. Therefore, PC1 can be defined as an urban sewage-urban runoff source.
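
As a check on the runoff figures cited in the paragraph above, the exceedance multiples can be reproduced under the assumption that the grade V COD limit for surface water is 40 mg/L (the value given in GB 3838-2002; it is not stated in this section):

$$\frac{125}{40} - 1 \approx 2.1, \qquad \frac{284}{40} - 1 = 6.1$$

In other words, the roof and road runoff concentrations are about 3.1 and 7.1 times the assumed limit, which corresponds to exceeding it by 2.1 and 6.1 times.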

Fig. 5 Component loadings for 12 parameters after varimax rotation

PC2 explains 14.64% of the total variance. TN and TP have strong positive loadings (Fig. 5), so this principal component is associated with high nitrogen and phosphorus nutrient concentrations. High nutrient concentrations are a typical feature of chemical fertilizer, and the sources of these nutrients are non-point source pollution from orchards and farmlands. The proportion of each land use type affects the concentrations of nitrogen and phosphorus in rivers (Yang et al. 2019). About 29% of the cultivated land in the Malian River Basin was treated with pesticides. Since fertilizers and residual pesticides can fairly easily enter the gullies through leaching, infiltration, and soil erosion, PC2 is characterized as an agricultural non-point source.

PC3 accounts for 13.79% of the variance. The main load variables of PC3 are E. coli, NH3–N, and DO, with moderate positive loadings (0.593, 0.574, and 0.545, respectively). Escherichia coli, which grows in human and animal intestines, represents the degree of fecal pollution in a river. It can come either directly from the discharge of feces or from the release of pollutants that have accumulated in the river sediment. The temperature and nutrients in the sediment and the light in the deep water provide suitable conditions for the survival of E. coli (Taoufik et al. 2017). By 2019, the Malian River Basin had been cleared of the livestock and poultry farms and slaughterhouses located within 500 m of the main river. However, pollutants from past livestock and poultry breeding have built up in the river sediment and contaminated the river. The scale of the centralized and decentralized livestock and poultry breeding industry in the basin has gradually expanded in recent years, with a total of 365 breeding enterprises and professional breeding cooperatives according to our survey. Many farms and slaughterhouses discharge wastewater without treatment. This wastewater leaches into the river during rainfall events or is discharged directly into rural rivers through simple pipelines, which pollutes the water source. Statistics on China’s animal husbandry indicate that the output of livestock manure in China has increased in the past decade (Bao et al. 2019). As much as one-third of animal waste is used as fertilizer for farmland (Bai et al. 2017). Therefore, PC3 can be characterized as a livestock and poultry breeding source.

PC4 explains 10.54% of the variance, and its main load variable is LAS. LAS generally comes from three sources: (1) the emulsifiers, spreading agents, and detergents in pesticides that are released during agricultural production; (2) the detergents, cosmetics, and other articles used by residents; and (3) the industrial wastewater discharged by enterprises that produce and apply surfactants (Katam et al. 2020). The water quality of the Malian River has greatly improved over recent years because industrial sewage outlets have moved outside, and each county in the basin has built a sewage treatment plant. TN and TP are also not primary load variables of PC4. Domestic sewage in villages and towns accounts for more than one-fifth of the total domestic sewage discharge in China. Qingyang City, located on the Loess Plateau, is predominantly agricultural, with many rural residents and high volumes of rural domestic sewage discharge. More than half of the total rural domestic sewage discharge is washing wastewater, which contains a large amount of LAS. PC4 therefore excludes industrial, urban living, and plantation sources and is considered mainly a rural sewage source.

PC5 accounts for 9.89% of the variance, and its main load variable is Cr6+. The high concentration of Cr6+ in this area is caused by the high environmental background value. First, the geological mineral components in the study area form a geological environment rich in Cr6+, which is weathered and dissolves into groundwater and surface water (Cao 2003). Second, the groundwater hydrological regime is slow, so Cr6+ diffuses slowly in local areas, which increases its concentration (Zhang et al. 2020). Therefore, PC5 can be defined as a geogenic source.

Contribution of Pollution Sources

The multiple linear regression model was constructed and fitted to the measured results based on the sources identified for the Malian River Basin. A correlation coefficient greater than 0.5 indicates that the model has good predictive ability (Gholizadeh et al. 2016; Shen et al. 2021; Zhang et al. 2022). The R² between the predicted and observed concentrations of the water quality indices ranges from 0.671 to 0.762 (Fig. 6), which indicates that the APCS-MLR model performs well in this study.
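The fitting step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure used in this study: it uses unrotated principal components (no varimax rotation), synthetic two-source data, and a hypothetical function name `apcs_mlr`. It shows how absolute principal component scores (APCS) are obtained by subtracting the score of an artificial zero-concentration sample, and how R² between predicted and observed concentrations is computed per variable.

```python
import numpy as np

def apcs_mlr(conc, n_sources):
    """Sketch of APCS-MLR. conc: (n_samples, n_vars) measured concentrations."""
    mean = conc.mean(axis=0)
    std = conc.std(axis=0, ddof=1)
    z = (conc - mean) / std                      # standardized concentrations

    # PCA via eigen-decomposition of the correlation matrix (unrotated).
    eigval, eigvec = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigval)[::-1][:n_sources]
    weights = eigvec[:, order] / np.sqrt(eigval[order])  # score coefficients

    scores = z @ weights
    z0 = (0.0 - mean) / std                      # artificial zero-concentration sample
    apcs = scores - z0 @ weights                 # absolute principal component scores

    # Multiple linear regression of each variable on the APCS (with intercept).
    design = np.column_stack([np.ones(len(conc)), apcs])
    coef, *_ = np.linalg.lstsq(design, conc, rcond=None)
    pred = design @ coef
    ss_res = ((conc - pred) ** 2).sum(axis=0)
    ss_tot = ((conc - mean) ** 2).sum(axis=0)
    r2 = 1.0 - ss_res / ss_tot                   # R² per water quality variable
    return apcs, coef, r2

# Synthetic data (assumption): two sources, four variables, small noise.
rng = np.random.default_rng(0)
strengths = rng.gamma(2.0, 1.0, size=(60, 2))
profiles = np.array([[5.0, 3.0, 0.2, 0.1],
                     [0.3, 0.5, 2.0, 1.0]])
conc = strengths @ profiles + rng.normal(0.0, 0.1, size=(60, 4))

apcs, coef, r2 = apcs_mlr(conc, n_sources=2)
print(np.round(r2, 3))
```

The first row of `coef` holds the intercepts, which are usually interpreted as the contribution of unidentified sources; the remaining rows hold one regression coefficient per source and variable.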

Fig. 6 Scatter plots of predicted and observed normalized concentrations of species using the APCS-MLR model

Figure 7 shows the contributions of the different pollution sources to the water quality parameters. The urban sewage-urban runoff source (PC1), with an average contribution of 23.67%, was the main source of COD (57.40%) and also contributed 36.47% of BOD5. The contribution of PC1 to EC, pH, NH3–N, DO, and LAS ranges from 20 to 25%, while its contribution to the other water quality indicators is relatively low. The agricultural non-point source (PC2) accounted for 18.65% on average, which is explained by its high contributions to TP (52.61%) and TN (41.84%). The livestock and poultry breeding source (PC3) contributed 16.03% on average. Across the 12 water quality indices, its contributions ranged from 0.50% (Cr6+) to 43.93% (E. coli), and E. coli (43.93%), NH3–N (43.54%), and TN (23%) were largely affected by this source. The rural domestic source (PC4) accounted for 19.81% of all the pollution sources and was the main source of LAS (48.11%). The contribution of PC4 to COD, E. coli, and NH3–N is nearly 30%, probably because rural domestic pollution also affects the nutrient elements and organic matter in surface water. The geogenic source (PC5) accounted for 11.12% of all pollution and is associated with the unique geological and geomorphic characteristics of the study area. Its impact on surface water quality is reflected mainly in Cr6+, to which PC5 contributes 68.09%. In addition, PC5 contributes more than 15% to F and pH. The original geological environment of the study area may also affect the water quality in terms of hardness, nutrients, and other indicators.
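Percentage contributions of this kind are typically derived from the regression coefficients of the APCS-MLR model. The sketch below assumes the absolute-value formulation often used to avoid negative shares, in which the contribution of source k to a variable is |b_k · mean(APCS_k)| divided by the sum of all such terms plus |b_0|, with the intercept b_0 treated as the unidentified source. The function name `source_contributions` and all numbers are hypothetical and are not taken from this study.

```python
import numpy as np

def source_contributions(intercept, slopes, mean_apcs):
    """Percent contribution of each source, and of the unknown part, to one variable."""
    parts = np.abs(np.asarray(slopes, dtype=float) * np.asarray(mean_apcs, dtype=float))
    unknown = abs(intercept)                 # intercept = unidentified sources
    total = parts.sum() + unknown
    return 100.0 * parts / total, 100.0 * unknown / total

# Hypothetical regression output for a single water quality variable.
shares, unknown_share = source_contributions(
    intercept=0.5,
    slopes=[2.0, 1.0, 0.5],
    mean_apcs=[1.5, 2.0, 1.0],
)
print(np.round(shares, 2), round(unknown_share, 2))
```

Averaging such shares over all variables gives one mean contribution per source, comparable in form to the averages reported in Fig. 7b.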

Fig. 7 Contributions of different pollution sources on water quality variables (a) and the average contributions in the study area (b) using APCS-MLR model

Compared with previous studies, the unknown contribution rate in this study is low, at 5.39% (Cheng et al. 2020; Fu et al. 2020; Liu et al. 2020). This suggests that the water quality parameters selected for source identification are reasonable and that the apportionment results are reliable. Based on the contributions of the five common factors to the surface water quality, urban sewage-urban runoff is still the main pollution source of the Malian River. Rural living sources and agricultural non-point sources are secondary pollution sources. Livestock and poultry breeding also accounts for a significant share of river water pollution, and the impact of geogenic sources on the water quality cannot be ignored either.

Conclusions

This work analyzed the temporal and spatial changes of 12 water quality parameters of the Malian River from January 2017 to December 2021. The pollution sources and their main contributions to surface water pollution were analyzed using the APCS-MLR receptor model. A summary of the results follows:

(1) The water quality of the Malian River shows clear spatiotemporal variation. The EC, E. coli, and Cr6+ concentrations in the two northern sections (CJT and WLQ) are higher than in the other sections, so these two sections should be monitored more frequently. The water environment improved from 2017 to 2021, but is still dominated by excessive nutrient elements (TP and TN) and a high organic index (COD and BOD5). The pollution characteristics of the river are largely consistent across the four seasons. Water quality issues in the dry season should specifically be addressed.

(2) Five potential pollution sources were determined based on the PCA/FA results: an urban sewage-urban runoff source, a rural sewage source, an agricultural non-point source, a livestock and poultry breeding source, and a geogenic source.

(3) The contribution ratios of the five pollution sources to surface water quality calculated by the APCS-MLR model are 24.08%, 20.10%, 18.65%, 16.23%, and 11.12%, respectively. The five sources contribute differently to COD, TP, TN, and NH3–N. The main sources of COD and NH3–N pollution in the river are urban and rural domestic sewage, while farmland chemical fertilizers are the main source of TN and TP pollution. High levels of E. coli were also detected in this river. The E. coli pollution could be caused by feces, and further study on this aspect is needed. The local geological background causes Cr6+ concentrations that exceed the water quality standard. Terminal treatment should be considered to reduce Cr6+ pollution.