Introduction

Adequate, high-quality water resources are a prerequisite for economic development, ecological health, and sustainable development. As the social economy develops rapidly and people pay more attention to water pollution, the conflict between the supply and demand of water resources is becoming increasingly prominent. Many areas face water shortages (Chen et al. 2020), with nearly half of the world's population living in areas with potential water shortages. This figure could rise to between 4.8 billion and 5.7 billion by 2050 (Tang et al. 2020; WWAP and UN-Water 2018). Rivers provide a main supply of freshwater resources for industry, agriculture, residents, and municipal administration, which has led to significant interest in river health (Wang et al. 2020).

The water quality of rivers is mainly affected by two factors: natural factors such as topography, hydrology, and meteorology (Wu et al. 2018), and human factors such as pollution discharge and water conservancy facilities construction (Adu 2021). These factors highlight the need to collect data about water quality changes, establish an appropriate evaluation system, take effective management measures, prevent water pollution, and control river water quality. The evaluation of water quality for different types of water bodies has been studied in many countries (Qu et al. 2020; Srinivas et al. 2020). In addition, it is important to understand the spatiotemporal variation of water quality indicators, their influencing factors, and effective water pollution control methods based on in-situ sampling technology and hydrochemical monitoring methods.

The water quality index (WQI) method has been widely used to assess the quality of surface water and groundwater (Seifi et al. 2020; Wu et al. 2021). The method can transform a large number of complex water quality data sets into a single indicator to reflect the overall water quality status (Pesce and Wunderlin 2000; Qu et al. 2020). Horton built the first WQI model in the 1960s based on 10 water quality indicators (Horton 1965). The WQI model has a general structure that can be used to evaluate spatiotemporal variations in water quality. With continuous improvement and development, this method has become a popular water quality assessment tool and has been widely advanced in many countries (Gikas et al. 2020; Silva et al. 2021). For example, the Dinius Index has been used to evaluate the water quality in the USA. The Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI) is a modified version of the British Columbia WQI, which was originally used to assess water quality in Canada (Gikas et al. 2020). The Almeida Index and Hanh Index are other examples of indicators developed in recent years (Uddin et al. 2021).

Different water quality indicators have different limitations with respect to water quality assessment. Examples of these limitations are as follows. (1) Water pollution sources differ among study areas, such as point source pollution and non-point source pollution (Adu 2021; Srinivas et al. 2020). (2) Different levels of social development and land use structures lead to different types of pollution, such as eutrophication, industrial and organic agricultural wastewater pollution, and heavy metal pollution in mining water bodies (Çelekli and Şahin 2021). (3) Operational errors can occur during sampling and experimental analysis (Nong et al. 2020). Therefore, key water quality indicators can be identified and selected based on large water quality monitoring data sets. This has led to the minimum WQI (WQImin) model, which minimizes the variables needed for conclusive results. The WQImin model simplifies the WQI model, and the indicators selected for the WQImin model are easy to measure, reduce the analysis cost, and explain the overall variation and characteristics of water quality (Pesce and Wunderlin 2000). This model is particularly suitable for developing countries (Koçer and Sevgili 2014; Nong et al. 2020; Pesce and Wunderlin 2000). Studies have shown that there is a high correlation between WQImin and WQI results (Akkoyunlu and Akiner 2012; Sanchez et al. 2007). Choosing an appropriate WQImin model can effectively explain the WQI model and increase the efficiency of water quality evaluations. Furthermore, the prediction accuracy and fitting degree of the WQImin model differ depending on the key indicators and weights selected (Kannel et al. 2007; Wu et al. 2018). As such, we need to determine the crucial water quality indicators and whether to apply indicator weights based on the actual situation.

The Yihe River belongs to the Huaihe River basin and has abundant water resources. It flows through Shandong and Jiangsu provinces, which have large populations and economic output and together account for about 13.2% of China’s total population. The Yihe River is an important catchment river in the eastern route of the South-to-North Water Diversion Project (SNWDP-ER), and the entities managing the river have undertaken the task of ensuring water quality safety. Good water quality helps to alleviate the conflict between the supply and demand of water resources in the vicinity of the basin, improve the surrounding ecological environment, and promote economic development. Previous studies have mostly focused on the water quality of a certain section of the Yihe River, such as the Shandong section and Xuzhou section (Li et al. 2017; Wu et al. 2020), and on specific water quality indicators such as phytoplankton and nitrogen and phosphorus nutrients (Gao et al. 2008; Li et al. 2020b). Few studies have comprehensively evaluated the overall water quality of the river. In addition, some scholars have studied changes in the water quality (Yang and Shen 2021), but the evaluation process was relatively general, making it difficult to accurately explain the spatiotemporal distribution characteristics of pollutants in the river. Moreover, for developing countries, it is desirable to reduce the detection cost while still assessing water quality accurately, and to determine the key indicators affecting water quality according to the socio-economic development of the watershed itself, so as to provide a basis for integrated water resources management.

In this study, we selected 8 sampling sites located from the upstream to the downstream of the Yihe River. We analyzed the water quality of the Yihe River based on 10 water quality indicators. The objectives of this study were to (1) analyze the spatiotemporal variation of the water quality indicators, (2) evaluate the water quality using the water quality index (WQI) method, and (3) select the key water quality indicators and construct a low-cost and efficient water quality management model. The results are not only applicable to the Yihe River basin in China, but also have important reference value for water resources management in other river basins in the world.

Materials and methods

Study area

This study focused on the Yihe River in China (Fig. 1). Sampling sites were selected based on the division of reaches in the Yihe River. Sampling sites S1 and S2 were located upstream; sampling sites S3 to S5 were located midstream; and sampling sites S6 to S8 were located downstream. The Yihe River is 220 km long with a catchment area of 10,315 km²; the average annual runoff is 3.4 billion m³. The topography of the catchment is high in the north and low in the south, with a large drop between them. Upstream, the river flows through a mountainous area of more than 500 m; downstream, it flows through the Yishu Plain and the Subei Plain. The Yihe River area is in a temperate monsoon climate zone, with hot and rainy summers and cold and dry winters. The mean annual rainfall is 850 mm. The SNWDP-ER runs through Jiangsu Province, Shandong Province, and Hebei Province, and is a state-level inter-provincial regional project that delivers water for production and domestic use to North China. The Yihe River is an important catchment river for the SNWDP-ER, providing a variety of benefits and ecosystem services, such as flood and drought regulation, fishery and aquaculture, drinking water sources, and biodiversity conservation.

Fig. 1 Location of the sampling sites in the Yihe River, China

Sample collection and laboratory analyses

The study period for this research was from 2009 to 2019 and the spatiotemporal variations of the water quality indicators were analyzed from March 2018 to February 2019. Samples were collected once a month to reduce the influence of precipitation on the data collection. The 8 sampling sites were distributed in the upper, middle, and lower reaches of the Yihe River. In addition, 4 water samples were collected from each sampling site.

Surface water samples were collected at a water depth of about 20 cm and were placed in 5-L plastic buckets that had been rinsed with surface water. A multi-indicator probe was used to measure pH, WT, and DO in situ in the Yihe River. To assess the other indicators, 2-L water samples were collected at each location with a 5-L plexiglass sampler and were stored in 0.5-L polyethylene bottles. The samples were kept in a refrigerator filled with ice. All samples were transported from the site to the laboratory for analysis. To prevent identification errors, the sample containers and labels were waterproofed. In addition, blank samples were established to assess the accuracy of analysis. More details of sample chemical analysis are provided in Table S1.

A total of 10 water quality indicators were assessed, including pH, water temperature (WT, ℃), dissolved oxygen (DO, mg/L), 5-day biochemical oxygen demand (BOD5, mg/L), chemical oxygen demand (COD, mg/L), total phosphorus (TP, mg/L), ammonia nitrogen (NH3-N, mg/L), nitrate (NO3-N, mg/L), fluoride (F, mg/L), and sulfate (SO42−, mg/L). The evaluation standard of water quality indicators in this study was based on the Environmental Quality Standards for Surface Water of China (GB 3838–2002) (China 2002).

Calculation of WQI

This study adopted the WQI calculation method proposed by Pesce and Wunderlin (2000). Each water quality indicator was weighted, with the final result reflecting the overall water quality (Pesce and Wunderlin 2000). The ten water quality indicators introduced above (WT, pH, DO, BOD5, TP, NO3-N, NH3-N, COD, F, and SO42−) were used to calculate the WQI, and their measured values were used for normalization. The WQI was calculated as follows:

$$WQI=\frac{\sum \limits_{i=1}^{n}{C}_{i}{P}_{i}}{\sum \limits_{i=1}^{n}{P}_{i}}$$
(1)

where n is the total number of indicators included in the study, Ci is the normalized value of indicator i, and Pi is the weight of indicator i. The Pi values used in the study ranged from 1 to 4 (Table S2) and have been verified in the literature (Koçer and Sevgili 2014; Pesce and Wunderlin 2000). The WQI value was designed to range from 0 to 100, with higher values indicating better overall water quality. The water quality was classified into five levels: 0–25, 26–50, 51–70, 71–90, and 91–100, corresponding to very poor, poor, moderate, good, and excellent, respectively.
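As an illustration of Eq. (1) and the five-level classification, the following minimal Python sketch computes a weighted WQI from normalized values and weights. The function names, the example values of Ci and Pi, and the handling of fractional values between the listed bands (for example, 25.5 is assigned to "poor") are assumptions made for illustration; they are not taken from Table S2 or from the original analysis.

```python
def weighted_wqi(normalized, weights):
    """Eq. (1): sum(C_i * P_i) / sum(P_i) over all indicators."""
    if set(normalized) != set(weights):
        raise ValueError("normalized values and weights must cover the same indicators")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(normalized[name] * weights[name] for name in normalized) / total_weight


def classify_wqi(value):
    """Map a WQI value (0-100) to the five levels listed in the text.

    Assumption: fractional values that fall between two listed bands
    (e.g. 25.5) are assigned to the upper band.
    """
    if not 0 <= value <= 100:
        raise ValueError("WQI is expected to lie between 0 and 100")
    if value <= 25:
        return "very poor"
    if value <= 50:
        return "poor"
    if value <= 70:
        return "moderate"
    if value <= 90:
        return "good"
    return "excellent"


# Hypothetical normalized values (C_i) and weights (P_i); NOT taken from Table S2.
example_c = {"DO": 90.0, "NH3-N": 80.0, "BOD5": 70.0}
example_p = {"DO": 4, "NH3-N": 3, "BOD5": 3}

example_wqi = weighted_wqi(example_c, example_p)  # (360 + 240 + 210) / 10 = 81.0
example_class = classify_wqi(example_wqi)         # "good"
```

With these hypothetical inputs the index is 81.0, which falls in the "good" band (71–90).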

A simple and economical water quality evaluation model was developed for the Yihe River, in both weighted and non-weighted forms. Key water quality indicators were selected using multiple linear stepwise regression analysis, supporting the establishment of the WQImin model (Koçer and Sevgili 2014; Pesce and Wunderlin 2000). The weighted WQImin-w model was calculated using Eq. (1), and the non-weighted WQImin-nw model was calculated using Eq. (2) as follows:

$${WQI}_{min-nw}=\frac{\sum \limits_{i=1}^{n}{C}_{i}}{n}$$
(2)
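The relationship between the two forms can be sketched in a few lines of Python. The example below is an assumption-laden illustration (hypothetical function names, normalized values, and weights): it evaluates Eq. (2) as a plain mean and shows that Eq. (1) reduces to Eq. (2) when all weights are equal.

```python
def wqi_min_nw(normalized_values):
    """Eq. (2): non-weighted WQImin, the arithmetic mean of the C_i values."""
    values = list(normalized_values)
    if not values:
        raise ValueError("at least one indicator is required")
    return sum(values) / len(values)


def wqi_min_w(normalized_values, weights):
    """Eq. (1) applied to the reduced indicator set (weighted WQImin)."""
    values = list(normalized_values)
    w = list(weights)
    if not values or len(values) != len(w):
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(c * p for c, p in zip(values, w)) / sum(w)


# Hypothetical normalized values for four key indicators (illustration only).
c_values = [90.0, 80.0, 70.0, 60.0]

nw_result = wqi_min_nw(c_values)                      # (90 + 80 + 70 + 60) / 4 = 75.0
w_result = wqi_min_w(c_values, [3, 3, 2, 2])          # (270 + 240 + 140 + 120) / 10 = 77.0
equal_weight_result = wqi_min_w(c_values, [1, 1, 1, 1])  # same as the non-weighted form
```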

Two steps were completed to establish the WQImin model for the Yihe River: training and testing. First, 512 data points collected from 2009 to 2017 were used as “training data” to select the key indicators for the WQImin model. These data were derived from laboratory samples and hydrological station monitoring data. Then, we tested the WQImin models using the 96 samples collected between March 2018 and February 2019.

Data analysis method

The Mann–Kendall (M–K) test is commonly used to detect changes in hydrological and climatic time series. In this study, the M–K test was used to analyze the temporal variations of the water quality indicators (Fig. S1). A Pearson correlation analysis was used to assess the correlation between the water quality indicators. A one-way analysis of variance (ANOVA) was used to verify the spatial variation of water quality indicators. Before the stepwise multiple linear regression, a logarithmic transformation (i.e., lg(x + 1)) was applied to all data to satisfy the requirements for homogeneity of variance and a normal distribution. We used the coefficient of determination (R2) to evaluate the fitting degree of the established WQImin models; the mean square error (MSE) and the percentage error (PE) were used to evaluate the forecasting precision of the WQImin models. Matlab (version R2017b) was used for the correlation analysis and the M–K test. SPSS statistical software (version 18.0) was used to complete the other data analysis.
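For readers unfamiliar with the M–K test, a minimal Python sketch of the standard two-sided test with the normal approximation and tie correction is given below. It is not the Matlab routine used in this study; the function name, the 0.05 significance level, and the example series are assumptions for illustration.

```python
import math
from collections import Counter


def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (normal approximation, tie-corrected).

    Returns (S, Z, p_value, trend), where trend is "increasing",
    "decreasing", or "no trend" at the chosen significance level.
    """
    x = list(series)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")

    # S statistic: sum of signs of all pairwise later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, corrected for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    if p_value < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return s, z, p_value, trend


# Hypothetical 12-value series (illustration only, not study data).
rising = list(range(1, 13))
s_up, z_up, p_up, trend_up = mann_kendall(rising)
s_down, z_down, p_down, trend_down = mann_kendall(rising[::-1])
s_flat, z_flat, p_flat, trend_flat = mann_kendall([5.0] * 12)
```

The normal approximation is generally considered adequate for series of about ten or more observations; shorter series would call for exact tables.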

Results

Spatiotemporal variations of water quality indicators

The pH values in the upstream, midstream, and downstream reaches of the river were highest in the summer, with values of 8.18, 8.00, and 8.12, respectively (Fig. 2a). The pH of all samples exceeded 7, indicating that the water body was alkaline (Table 1). WT was highest in the summer and lowest in the winter (Fig. 2b), which was consistent with changes in the local temperature. The mean values for WT at the sites showed no significant spatial differences from north to south (P > 0.05). The lowest DO concentration was observed in the downstream area in the summer (7.4 mg/L), and DO concentrations reached the Class I standard for surface water during the other study periods (Fig. 2c). The mean DO concentrations showed no significant spatial differences from north to south (P > 0.05); however, they did gradually decrease from upstream to downstream in the summer. The M–K test showed no significant trends at any site (Fig. S1).

Fig. 2 Spatiotemporal variations of the water quality indicators in the Yihe River. (a) pH; (b) WT; (c) DO; (d) BOD5; (e) COD; (f) TP; (g) NH3-N; (h) NO3-N; (i) F; (j) SO42−

Table 1 Spatial distribution of water quality indicators in the Yihe River (S.D.: Standard deviation)

The concentration of BOD5 was higher in the downstream region (Fig. 2d). The maximum value, 3.2 mg/L, occurred in the spring. The annual average BOD5 concentration at most of the sampling sites (5/8) was less than 3.0 mg/L, meeting the Class I surface water standard. The COD concentration varied little from season to season; however, there was significant spatial variation (Fig. 2e). The average COD concentration was lower upstream (13.43 mg/L) and higher downstream (18.42 mg/L). The monthly average concentrations of BOD5 and COD were significantly different (P < 0.05). The M–K test showed that only one site (S6) had a significant downward trend for BOD5. The concentrations of BOD5 and COD had similar spatiotemporal variations, which reflect the degree of pollution from industrial, agricultural, and domestic organic pollutants.

The monthly mean TP concentrations showed a narrow variation range across sampling sites, from 0.01 to 0.19 mg/L. The mean TP concentrations in all seasons exceeded 0.02 mg/L (Fig. 2f), indicating a high risk that TP would degrade the water quality to the Class III standard during the study period. The average TP concentrations varied spatially across the sampling sites (P < 0.05), with 2 sites (S2 and S7) showing significant variability. The average monthly concentration of NH3-N showed narrow variations (Fig. 2g), ranging from 0.06 to 0.70 mg/L. The maximum NH3-N concentrations occurred in the summer, with values of 0.26 mg/L, 0.27 mg/L, and 0.26 mg/L for the upstream, midstream, and downstream areas, respectively. No sampling sites showed significant trends. The NO3-N concentration in the upstream area was relatively high (Fig. 2h), ranging from 0.56 to 4.68 mg/L. The M–K test results indicated that one site (S5) showed a decreasing trend for NO3-N.

The two soluble ions, F and SO42−, were both detected at their highest concentrations downstream (Fig. 2i and j), with values of 0.99 mg/L and 176.00 mg/L, which indicated the presence of precipitation-driven pollution and hydrodynamic effects, respectively (Pant et al. 2021a). The monthly mean F concentrations all achieved the Class I standard for surface water. The M–K test results found that F showed significant decreasing trends in S1 and S2, and SO42− showed a significant decreasing trend in S4.

Relationships among water quality indicators

The Pearson correlation matrix was used to determine the relationships among the 10 water quality indicators in the Yihe River (Fig. 3). The correlation coefficient varied in different areas. In the upstream area, there were high positive correlations between COD and NH3-N, and high negative correlations between DO, WT, COD, BOD5, and F. The correlation coefficients varied from − 0.817 to 0.713 in the midstream area and varied from − 0.791 to 0.742 in the downstream area. Water quality indicators having high correlation coefficients in the water body may have similar hydrochemical characteristics (Helena et al. 2000).
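The construction of such a correlation matrix can be sketched as follows. This is a minimal pure-Python illustration rather than the Matlab routine used in this study; the function names and the WT and DO values are hypothetical and serve only to show a strongly negative coefficient of the kind reported for DO and WT.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(indicators):
    """Nested dict of pairwise Pearson coefficients for named indicator series."""
    names = list(indicators)
    return {a: {b: pearson_r(indicators[a], indicators[b]) for b in names} for a in names}


# Hypothetical seasonal values (illustration only, not study data).
example_data = {
    "WT": [5.0, 12.0, 20.0, 27.0],   # water temperature, deg C
    "DO": [12.0, 10.5, 9.0, 7.5],    # dissolved oxygen, mg/L
}
example_matrix = correlation_matrix(example_data)
r_wt_do = example_matrix["WT"]["DO"]  # close to -1 for these values
```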

Fig. 3 Correlation analysis of water quality indicators in the upstream (a), midstream (b), downstream (c), and all sampling sites (d) of the Yihe River

Water quality assessment

The spatiotemporal variations in WQI in the Yihe River are shown in Fig. 4. The water quality of the Yihe River was classified at a “good” level (71–90) during the study period, with all sampling sites showing average WQI values > 75.

Fig. 4 Temporal (a) and spatial (b) variations of the water quality index (WQI) in the Yihe River

With respect to temporal patterns, there was little variation in the WQI in the 12-month sampling period. The lowest and highest monthly average WQI values were in August 2018 and January 2019, with values of 76.00 and 84.56, respectively (Fig. 4a). There were small seasonal differences, with the following WQI values for each season: spring (80.15), summer (80.08), autumn (81.67), and winter (81.17). The WQI values decreased from spring to summer, increased from summer to autumn, and then fell again in winter. The lowest and highest WQI values occurred in the summer and the autumn, respectively. These results indicated the ability of the water bodies in the Yihe River to self-purify.

In general, the water quality at sampling sites from upstream to downstream showed distinct spatial variations. The mean WQI values ranged from 78.54 to 83.67 (Fig. 4b). The highest WQI values occurred in the upstream area (82.43), followed by the midstream area (80.44) and the downstream area (79.32).

WQImin models construction

According to the stepwise multiple linear regression analysis, NH3-N contributed the most to the WQI for the training data, with R2 = 0.776 (P < 0.001). The other water quality indicators were then added to the model, and the model fitting degree was compared after adding each indicator. When BOD5, DO, and SO42− were added in turn, the R2 values increased to 0.830, 0.854, and 0.874, respectively (Table S3). WT and COD also improved model performance slightly. Therefore, NH3-N, BOD5, DO, and SO42− were selected as the basic key water quality indicators for evaluating the water quality of the Yihe River. The performance of WQImin models that included WT and COD was also compared.
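The idea of adding indicators one at a time and tracking R2 can be sketched in Python as below. This is a simplified, forward-only selection that uses an R2 gain threshold as the stopping rule; it is an assumption for illustration and differs from the SPSS stepwise procedure, which typically uses significance tests for entry and removal. All names, the threshold, and the synthetic data are hypothetical, and the synthetic values are treated as already transformed.

```python
import math


def log_transform(values):
    """lg(x + 1) transform, as applied before the regression in the text."""
    return [math.log10(v + 1.0) for v in values]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(predictors, y, min_gain=0.01):
    """Greedily add the predictor giving the largest R^2 until the gain is below min_gain."""
    selected, history, best_r2 = [], [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {name: r_squared([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        best_r2 = scores[best]
        history.append((best, best_r2))
    return history


# Synthetic, non-collinear data (illustration only): the response is 3*x1 + 2*x2.
demo_predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "x3": [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0],
}
demo_response = [5.0, 4.0, 11.0, 10.0, 17.0, 16.0, 23.0, 22.0]
demo_history = forward_selection(demo_predictors, demo_response)
```

In this synthetic case, x1 enters first, x2 enters second and brings R2 to 1, and x3 is rejected because it adds no further explanatory power. The normal-equation solver is adequate only for small, well-conditioned problems; collinear predictors would need a more robust least-squares routine.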

The performance of the different WQImin models was evaluated by comparing their degree of fitting and prediction accuracy (Table 2). The results showed that the R2 values of the models improved as regression indicators were added, regardless of whether the indicators were weighted. Among the WQImin models based on the training data set, the WQImin-w2 model had the largest R2 value and the lowest MSE and PE values, indicating that it was the most suitable model in this study. In addition, the WQI values based on the training data set differed from the WQImin values (Fig. 5).

Table 2 Linear models explaining WQI (lg(WQI + 1)) based on the training data set (n = 512). The model results from a stepwise selection procedure using all 10 parameters were examined after normalization

Fig. 5 The values of WQI and WQImin based on the training data set. Different letters (a, b, c, d, e, and f) indicated significant difference (P < 0.01) between mean values

The WQImin predictions based on the testing data showed that the WQImin-w2 model performed best among the WQImin models. The WQImin-w2 model had the lowest PE and MSE values, 2.07% and 0.51, respectively (R2 = 0.972, P < 0.05) (Fig. 6). The R2 (0.897), MSE (1.70), and PE (3.79%) values showed that the WQImin-w3 model, which also had five selected parameters, performed relatively poorly compared to the WQImin-w2 model. The WQImin-w4 model had a slightly smaller R2 value than the WQImin-w2 model, with R2 = 0.956 (P < 0.05). The MSE and PE values of the WQImin-w4 model were also greater than those of the WQImin-w2 model, indicating that the WQImin-w4 model had a poorer prediction ability. In addition, the performance of the models based on the training data set showed that the weighted WQImin model sometimes performed better than the non-weighted WQImin model when using the same parameters. However, all the weighted models performed better when applied to the testing data set.
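The three evaluation measures can be sketched in Python as follows. The exact formulas are not given in the text, so this sketch makes two assumptions: PE is taken to be the mean absolute percentage error relative to the WQI, and R2 is taken to be the squared Pearson correlation between WQI and WQImin (equivalent to the R2 of a simple linear fit). The function names and the example values are hypothetical.

```python
def _check(observed, predicted):
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(observed, predicted):
    """Mean square error between WQI (observed) and WQImin (predicted)."""
    _check(observed, predicted)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def percentage_error(observed, predicted):
    """Assumed definition: mean absolute percentage error, in percent."""
    _check(observed, predicted)
    return 100.0 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / len(observed)


def r2_linear(observed, predicted):
    """Assumed definition: squared Pearson correlation (R^2 of a simple linear fit)."""
    _check(observed, predicted)
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical WQI and WQImin values (illustration only, not study data).
wqi_values = [80.0, 82.0, 78.0, 84.0]
wqi_min_values = [81.0, 81.0, 79.0, 83.0]

example_mse = mse(wqi_values, wqi_min_values)              # every error is 1, so MSE = 1.0
example_pe = percentage_error(wqi_values, wqi_min_values)  # roughly 1.2 percent
example_r2 = r2_linear(wqi_values, wqi_min_values)         # 144 / 160 = 0.9
```

A model with a higher R2 but larger MSE and PE tracks the pattern of the WQI well while being offset from it, which is why the three measures are compared together.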

Fig. 6 The prediction accuracy and fitting degree of WQImin models based on the testing data set. (a) WQI-WQImin-nw1; (b) WQI-WQImin-w1; (c) WQI-WQImin-nw2; (d) WQI-WQImin-w2; (e) WQI-WQImin-nw3; (f) WQI-WQImin-w3; (g) WQI-WQImin-nw4; (h) WQI-WQImin-w4

Discussion

Factors affecting water quality

The water quality in the Yihe River showed clear spatiotemporal variation based on the individual water quality indicators. WT and the concentrations of DO, NH3-N, and F changed with the seasons. The concentrations of BOD5, COD, TP, and SO42− varied significantly in different parts of the river, increasing from upstream to downstream. The Yihe River flows through Shandong Province and Jiangsu Province, both of which are very populous. In 2019, the urban population densities were 1665 people per square kilometer in Shandong Province and 2221 people per square kilometer in Jiangsu Province (China 2020). The Yihe River flows through towns and villages and has tributaries, including the Tanglanghe River, Menghe River, and Dongwenhe River. Pollution sources include domestic sewage, extensive mining activities, and industrial and agricultural effluents (Guo et al. 2020; Wang et al. 2017). These were possible drivers of the deterioration of water quality. The Yihe River originates from Tianzhuang Reservoir, Yiyuan County, Shandong Province. The concentrations of water quality indicators in Tianzhuang Reservoir from March 2017 to February 2019 were similar to our results (Table 3).

Table 3 Comparison of water quality in this study with other projects and rivers

Our results were similar to the water quality reported for the Langmaoshan Reservoir and other reservoirs in Shandong Province (Hou et al. 2016). This indicates that in recent years, under the management of the Chinese government, the water quality in Shandong Province has been relatively stable, and water pollution has been effectively controlled. DO concentrations in the spring and the winter were significantly higher than in other seasons. A correlation analysis showed that DO concentrations had a significant negative relationship with WT, which was consistent with previous research (Qu et al. 2020; Varol 2020).

The Yihe River is a catchment in the SNWDP-ER. In other areas of this project, the inorganic nitrogen concentrations were higher in the Yangtze River basin compared to in the impounded lakes, and long-term water transfers may lead to an increase of inorganic nitrogen in those lakes (Nong et al. 2020; Qu et al. 2020). Inorganic nitrogen in water is mainly present in the form of nitrate, partly from the oxidation of NO2 and NH4+, and partly from the direct input of human activities (Nong et al. 2020). This may have caused changes in the inorganic nitrogen concentrations in the Yihe River.

During the study period, the TP concentration was relatively high, which was of concern, as excessive nitrogen and phosphorus may lead to the eutrophication of water bodies and trigger algal blooms (Guo et al. 2020). Agriculture is the main land use type in the Yihe River basin, and fertilizers are widely used. Nitrogen and phosphorus fertilizers play an important role in promoting agricultural production, and it is difficult to accurately predict fertilization requirements. As such, frequent excessive application may lead to environmental pollution (Zhang et al. 2021), and nitrogen and phosphorus pollution from non-point sources may affect the water quality of the Yihe River. In addition, water quality indicators were closely related to the urbanization, industrial and agricultural production, land use types, and environmental protection infrastructure of the basin (Zhang and Lou 2011).

The water quality in the Yihe River was classified as “good” during the study period. The slight changes in the WQI in different months may be related to WT and precipitation (Jha et al. 2020). Further, wind and wave disturbance, sediment resuspension, and changes in river hydraulic conditions may have led to the release of a large number of endogenous nutrients into the river (Cao et al. 2017). The different spatial distribution of water quality indicator concentrations could be related to physiographic metrics, such as physiography, landscape composition, and landscape configuration (Clément et al. 2017; Wu and Lu 2021). The upstream areas of the Yihe River flow through mountainous areas, whereas the downstream areas are mostly plains. Topography may affect water quality by limiting the spatial distribution and composition of land use, altering pollutant volumes and the distribution of pollution sources (Pratt and Chang 2012). In addition, physiographic characteristics may affect river water quality by influencing the path of pollutants from non-point sources to rivers, changing the surface runoff velocity and erosion amount (Li et al. 2020a; Yu et al. 2016; Zhou et al. 2017).

In 2012, the annual discharge of waste water in the Yihe River Basin was 1.9 × 10⁸ m³, the COD discharge was 1.6 × 10⁴ t/a, and the NH3-N discharge was 1.04 × 10³ t/a. The annual discharge of waste water in Linyi City was 1.7 × 10⁸ t, the COD discharge was 1.52 × 10⁴ t/a, and the NH3-N discharge was 989 t/a. In 2018, the annual amount of waste water discharged into the Yihe River by the main sewage outlet in Linyi City was 1.9 × 10⁸ m³, the annual amount of COD was 5.2 × 10³ t, the annual amount of NH3-N was 384 t, the annual amount of TP was 48 t, and the annual amount of TN was 0.3 t. In 2020, there were a total of 58 sewage outlets in the Yihe River Basin, including 7 in Zibo City and 51 in Linyi City. The amount of sewage received by the downstream reaches of the Yihe River was large, resulting in higher pollutant content and lower water quality. In 2018, the COD discharge into the river in the Linyi section of the Yihe River decreased by 66% and the NH3-N discharge decreased by 61% compared with 2012. The WQI of the Yihe River has improved significantly since 2006 (Fig. S2), indicating that the environmental protection measures implemented by the Chinese government were effective.
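The reported reductions can be checked with a short calculation. The sketch below assumes that the Linyi City discharge figures quoted above (COD 1.52 × 10^4 t/a and NH3-N 989 t/a) are the 2012 baseline for the Linyi section; under that assumption, the 2018 values reproduce the stated 66% and 61% decreases. The function name is hypothetical.

```python
def percent_decrease(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (before - after) / before


# Assumed 2012 baseline (Linyi City figures) and 2018 values quoted in the text, in t/a.
cod_decrease = percent_decrease(1.52e4, 5.2e3)  # about 65.8, i.e. 66% when rounded
nh3n_decrease = percent_decrease(989.0, 384.0)  # about 61.2, i.e. 61% when rounded
```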

Crucial water quality indicators selection

The stepwise multiple linear regression analysis results were used to propose a WQImin model for this study, which consisted of 5 crucial water quality indicators, i.e., NH3-N, BOD5, DO, SO42−, and WT, and showed excellent performance for evaluating water quality. A simple artificial neural network (ANN) was introduced to demonstrate the effectiveness of the proposed model (Haghiabi et al. 2018). This ANN had two hidden layers with 10 neurons and 3 neurons, respectively. The nonlinear ReLU function was used as the activation function of the input layer and first hidden layer. Four groups of key water quality indicators were used as inputs; the results are shown in Table S4 and Figure S3. The accuracy of the model predictions was highest when the input variables were NH3-N, BOD5, DO, SO42−, and WT. The WQImin model simplified the WQI model, and the indicators selected for the WQImin model were easy to measure, reduced the analysis cost, and explained the overall variation and characteristics of water quality (Pesce and Wunderlin 2000).
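To make the network structure concrete, the following Python sketch builds a network with 5 inputs, hidden layers of 10 and 3 neurons, and a single output, and runs one forward pass. It is not the model used in this study: the weights are random and untrained, the initialization range and input values are hypothetical, and the placement of ReLU (after each hidden layer, with a linear output) is one interpretation of the description above.

```python
import random


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_inputs, rng):
    """Layer sizes: inputs -> 10 -> 3 -> 1, following the architecture in the text."""
    sizes = [n_inputs, 10, 3, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, features):
    """Forward pass with ReLU after each hidden layer and a linear output (assumed)."""
    hidden = list(features)
    last = len(network) - 1
    for index, (weights, biases) in enumerate(network):
        hidden = dense(hidden, weights, biases)
        if index < last:
            hidden = relu(hidden)
    return hidden[0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
network = build_network(5, rng)

# Hypothetical scaled inputs in the order NH3-N, BOD5, DO, SO4, WT (illustration only).
sample = [0.2, 0.4, 0.9, 0.3, 0.5]
untrained_output = forward(network, sample)  # not meaningful until the network is trained
```

Training such a network (for example, by gradient descent on the squared error against the WQI) is omitted here; the sketch only shows how the four input groups would differ, namely in the length of the input vector passed to `build_network`.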

As discussed above, NH3-N was the first indicator selected by the regression analysis and contributed the most in explaining the WQI variations (R2 > 0.77, P < 0.001), which was consistent with the spatiotemporal variation. Previous studies have reported the importance of NH3-N in representing water quality (Wu et al. 2018). NO3-N and NH3-N are forms of inorganic nitrogen; however, NO3-N was not used as a key indicator in the WQImin model for this study. Multiple linear regression analysis showed that NO3-N contributed less to the WQI model compared to other indicators, and did not effectively represent the WQI model (R2 > 0.002, P > 0.001). In addition, TP is also a nutrient salt indicator and has been used in previous studies as a key water quality indicator (Koçer and Sevgili 2014; Nong et al. 2020). In this study, while TP was selected as a regression model indicator before NH3-N, the WQImin model performed better with NH3-N compared to TP, based on the data training results. This may be related to the geographical location of the Yihe River and the water quality management measures with local characteristics, causing the pollution sources and nutrient levels to differ from other study areas (Kannel et al. 2007). Consequently, NH3-N was used to represent the nutrient levels in the Yihe River. This reflects the uniqueness of the water quality of the Yihe River.

BOD5 and DO were the second and third indicators added to the regression model, showing their importance to the WQI. This may reflect the impact of urban and rural sewage, intensive livestock and poultry breeding, industrial wastewater, and inputs of other organic pollutants on the water quality of the Yihe River. Organic pollutants are decomposed by the biochemical action of aerobic bacteria, and oxygen is consumed in that decomposition process. As a result, excessive pollutant discharges into the water may have resulted in low DO in the water. At the same time, organic matter may have caused problems related to decomposition by anaerobic bacteria in the water, which can produce methane, hydrogen sulfide, mercaptan, ammonia, and other foul-smelling gases. DO is a typical indicator reflecting the state of water quality and usually has two sources. One source occurs when DO is not saturated in the water, and oxygen from the atmosphere dissolves into the water. A second source occurs when oxygen is released by plants in the water through photosynthesis. Many complex biochemical processes and the growth of aquatic organisms can be influenced by DO. As such, researchers have used DO as a crucial indicator for WQImin models for different water bodies (Kannel et al. 2007; Wu et al. 2021).

SO42− and WT were the fourth and fifth indicators added to the WQImin model in this study. SO42− had a small weight, but was an important indicator of changes in the water quality and showed the influence of inorganic salts. F is also an inorganic salt indicator. The calculation results of the training data showed that when F entered the WQImin model, the PE and MSE of the model both increased. Therefore, F did not perform as well as SO42− in the WQImin model. SO42− comes from a wide range of sources, including domestic sewage, industrial wastewater, and natural minerals. It is an important index for evaluating the chemical characteristics of natural water and is used to analyze salinity. Many metal ions in the environment can combine with the sulfate group to form stable sulfates. Excessive sulfate can destroy the soil structure, reduce soil fertility, and harm the water environment (Niu et al. 2021). Sulfate is also present in drinking water; its main source is strata minerals, mostly in the form of CaSO4 and MgSO4 (Khan et al. 2020). WT, which reflects the physical and chemical properties of water, had the lowest weight in this study. When WT was added to the WQImin model, the R2 of the model significantly increased, and the MSE and PE decreased. In addition, like DO, WT can affect the growth and reproduction of bacteria in water and the natural purification of water (Yang et al. 2018).
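The order in which indicators entered the model can be illustrated with a greedy forward selection. The following Python sketch is only an approximation of that idea under stated assumptions: it adds, at each step, the indicator that most increases the R2 of an ordinary least-squares fit to the full WQI, and stops when the gain falls below a hypothetical threshold (`min_gain`). The stepwise regression used in this study may rely on significance tests and on the MSE and PE of the training data instead; the function names and the stopping rule here are illustrative and are not taken from the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_r2(columns, y):
    """R2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(indicators, full_wqi, min_gain=0.01):
    """Greedy forward selection of indicators by R2 gain (assumed stopping rule).

    indicators: dict mapping an indicator name to its list of values per sample.
    full_wqi:   list of WQI values computed from all indicators.
    """
    chosen, best_r2 = [], 0.0
    remaining = list(indicators)
    while remaining:
        scored = [
            (fit_r2([indicators[n] for n in chosen + [name]], full_wqi), name)
            for name in remaining
        ]
        r2, name = max(scored)
        if r2 - best_r2 < min_gain:
            break  # no remaining indicator improves the fit enough
        chosen.append(name)
        remaining.remove(name)
        best_r2 = r2
    return chosen, best_r2
```

Called with a dictionary of indicator series and the full WQI series, `forward_select` returns the selected indicator names in order of entry together with the final R2. Collinear indicator columns would make the normal equations singular, so a practical implementation would need to guard against that case.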

The Yihe River is the main river in the Huaihe River Basin. The WQImin model established for this study used 5 key water quality indicators and provides a useful basis for future water quality evaluation and for integrated water resources management across the basin. The model was similar to those developed for some other rivers and projects. NH3-N was selected as the first water quality indicator by multiple linear regression analyses in studies examining trout farms in Turkey (Koçer and Sevgili 2014), the Lake Taihu Basin in China (Wu et al. 2018), and the Lake Chaohu Basin in China (Wu et al. 2021). NH3-N contributed the most to the WQI model in these areas. Scholars have chosen DO and WT as key water quality indicators in the SNWDP-ER in China (Nong et al. 2020) and the Bagmati River Basin in Nepal (Kannel et al. 2007). Therefore, when the water environment is stable, the WQImin model established in this study may have important reference value for efficient water quality assessments in other regions.

The effects of the weights on the WQImin models

In early studies, weights were generally used to calculate the WQI; however, no weights were included in the WQImin model (Pesce and Wunderlin 2000; Simoes et al. 2008). In later studies, scholars improved the WQI-based water quality evaluation method and considered the influence of weights on the WQImin model to increase the accuracy of the results (Nong et al. 2020). In this study, the water quality indicators were weighted and normalized to make the water quality evaluation results more consistent with actual conditions. The same key water quality indicators were used to construct weighted and unweighted WQImin models, and the results of the two calculations were compared. The R2, MSE, and PE values of the models were compared to identify the WQImin model with the best performance.
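The weighted and unweighted comparison described above can be sketched in Python as follows. This is a minimal illustration, not the calculation used in this study. It assumes the commonly used weighted-average form of the index (the sum of normalized score times weight, divided by the sum of weights), takes R2 as the squared Pearson correlation between the WQImin and the full WQI, and takes PE as the mean absolute percentage error. All function names are hypothetical.

```python
def wqi(scores, weights=None):
    """Weighted average of normalized indicator scores.

    With weights=None every indicator counts equally (unweighted model).
    """
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(c * p for c, p in zip(scores, weights)) / sum(weights)


def r_squared(reference, estimate):
    """Squared Pearson correlation between reference and estimated values."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    sxy = sum((r - mean_ref) * (e - mean_est) for r, e in zip(reference, estimate))
    sxx = sum((r - mean_ref) ** 2 for r in reference)
    syy = sum((e - mean_est) ** 2 for e in estimate)
    return sxy ** 2 / (sxx * syy)


def mse(reference, estimate):
    """Mean squared error of the estimate against the reference."""
    return sum((e - r) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def percentage_error(reference, estimate):
    """Mean absolute error as a percentage of the reference (assumed PE definition)."""
    n = len(reference)
    return 100.0 * sum(abs(e - r) / r for r, e in zip(reference, estimate)) / n


def compare(samples, full_wqi, weights):
    """Return (R2, MSE, PE) for the weighted and unweighted WQImin.

    samples:  one list of normalized key-indicator scores per water sample.
    full_wqi: WQI of each sample computed from all indicators.
    weights:  one weight per key indicator.
    """
    estimates = {
        "weighted": [wqi(s, weights) for s in samples],
        "unweighted": [wqi(s) for s in samples],
    }
    return {
        label: (r_squared(full_wqi, est), mse(full_wqi, est), percentage_error(full_wqi, est))
        for label, est in estimates.items()
    }
```

Under these assumptions, the model with the higher R2 and the lower MSE and PE would be preferred, which mirrors the selection criterion stated above.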

The results showed that the weighted WQImin model explained the change in WQI better than the non-weighted WQImin model and predicted the water quality more accurately. Therefore, we recommend using the weighted WQImin model to evaluate the water quality of the Yihe River. In addition, the weights emphasize the relative importance of the indicators and acknowledge their different degrees of contribution. The weights are influenced by differences between research regions and by the personal experience of researchers, and may therefore vary. Thus, we recommend that in practical research, researchers review the relevant literature and adjust the weights according to the actual research site and measured data to build a more realistic WQImin model.

Conclusions

The water quality of the Yihe River remained at a “good” level during the study period, with significant spatiotemporal variations. The WQI was lowest in August, and the water quality was better at the upstream sites than at the downstream sites.

The established WQImin model included 5 key water quality indicators: NH3-N, BOD5, DO, SO42−, and WT. The WQImin model simplified the WQI model, and the indicators selected for the WQImin model were easy to measure. This reduced the analysis cost, while still explaining the overall variation and water quality characteristics. The weighted WQImin model explained the change in WQI better than the non-weighted WQImin model and predicted the water quality more accurately. We recommend using the weighted WQImin model in future studies to further evaluate the water quality of the Yihe River.