Introduction

The quality of surface water is critical to human survival and, because surface water is fragile, it is a worldwide concern. Riverine systems have been essential to the growth of human civilizations throughout history because they provide an essential source of fresh water for household, agricultural, and industrial uses. However, uncontrolled urbanization and industrialization, while driving the economic advancement of modern societies, have seriously endangered life on the planet by polluting water. The water quality of rivers is influenced by both natural and human causes of pollution (Njuguna et al., 2020). Natural processes that affect water quality include variations in precipitation, surface runoff, erosion, and weathering (Akhtar et al., 2021), while human activities such as irrigation, sewage and municipal waste disposal, and effluent discharge also have a significant influence (Hanjra et al., 2012).

Access to clean water is a fundamental human right and is essential to well-being. Unfortunately, global pressures, including industrial activity, population growth, urbanization, and climate change, make it difficult for many parts of the world to obtain clean water. The adverse effect of anthropogenic activity on water quality has a large impact on the environment and on human health. Even in the absence of pollution, surface water quality varies with the geographical setting of a region and with the seasons (Mutlu, 2021; Mutlu & Kurnaz, 2017). It is therefore necessary to assess surface water quality to ensure its safe use for drinking, industrial, and agricultural purposes, and efforts are needed to prevent water contamination and preserve the quality of surface water for the benefit of both people and the environment.

For as long as humans have lived in the Kashmir valley, its natural resources have provided the large population residing there with sustenance and livelihood. However, despite the many benefits that the area's rivers and lakes provide, they are frequently used as dumping sites for wastewater, which has led to a significant degradation in water quality (Qayoom et al., 2022). Additionally, factors such as changes in land use, uncontrolled application of pesticides and fertilizers, deforestation, heavy tourism pressure, and unplanned urbanization have had a significant impact on the biological and physicochemical quality of the valley's water resources (Mir & Gani, 2019). For instance, Showqi et al. (2014) observed that anthropogenic activities, changes in hydrometeorological conditions, and changes in land use/land cover could be responsible for the deterioration of the river Jhelum. Similarly, Ganaie et al. (2021) attribute the decline of Wular Lake to changes in land use/land cover and the resulting alterations in hydrological patterns, such as increased erosion, reduced runoff, and sedimentation. Notably, the hydrogeochemistry of Wular Lake and the water quality of the river Jhelum are both significantly influenced by the water quality of several adjoining tributaries, viz., Sindh, Sukhnag, Aripal, Rambiara, and Lidder (Khanday et al., 2021).

To assess the quality of surface freshwater, researchers have used a variety of methods, such as the Trophic Status Index (TSI) and the Water Quality Index (WQI), to condense a wide range of complex water quality indicators into comprehensive information (Lee et al., 2022). In addition, factor analysis (FA) and analysis of variance (ANOVA) are two statistical approaches frequently used to extract useful information from water quality data (Aydin et al., 2021; Najar & Khan, 2012). These techniques have proven useful for characterizing and assessing water quality and for analyzing spatial and temporal changes caused by natural and human-induced factors across different river sections (Machiwal et al., 2018). Multivariate statistical techniques can reveal potential environmental influences on water quality that would not be evident from the raw data alone (Schreiber et al., 2022). Factor analysis is a valuable tool for identifying the underlying factors that explain relationships among observed variables, including water quality parameters (Tung & Yaseen, 2020). The approach involves calculating the correlation matrix of the parameters and then deriving eigenvalues and factor loadings from it. Factor loadings represent the degree of correlation between variables and factors, whereas each eigenvalue is paired with an eigenvector that highlights groups of closely related variables, and the eigenvalue itself measures how much of the total variance that factor explains (Bowman & Goodbody, 2020). The first few components usually explain most of the variability in the data. ANOVA, on the other hand, tests whether the mean values of two or more groups differ significantly; two-way ANOVA does so for groups defined by two distinct factors (Keysers et al., 2020).
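The correlation-matrix steps described above can be sketched in Python with NumPy. This is a minimal illustration on synthetic data (not the study's measurements), and the helper name `pca_loadings` is hypothetical; it is not the SPSS routine used in this study. The last lines check numerically that a loading equals the correlation between a variable and a component score.

```python
import numpy as np

def pca_loadings(X):
    """Eigen-decompose the correlation matrix of X (rows = samples, columns = variables).

    Returns eigenvalues (descending), eigenvectors, and loadings, where
    loading[i, k] = eigenvector[i, k] * sqrt(eigenvalue[k]).
    """
    R = np.corrcoef(X, rowvar=False)          # correlation matrix of the parameters
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)     # scale each column by sqrt(eigenvalue)
    return eigvals, eigvecs, loadings

# Synthetic example: 100 samples of 5 correlated variables (hypothetical data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(100, 5))

eigvals, eigvecs, loadings = pca_loadings(X)

# A loading equals the correlation between a variable and the component score.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ eigvecs
corr_first = [np.corrcoef(X[:, i], scores[:, 0])[0, 1] for i in range(X.shape[1])]
print(np.allclose(loadings[:, 0], corr_first))  # True
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why an eigenvalue above 1 marks a component that explains more variance than a single original variable.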

The objective of this study was to collect preliminary data on water pollution by examining water samples taken at different locations along the river Jhelum and its tributaries. The research had three specific objectives: (1) to evaluate the water quality of the various streams of the Kashmir region and their impact on the river Jhelum, (2) to detect spatial and temporal changes in water quality and potential sources of pollution using descriptive statistics (DS) and principal component analysis (PCA), and (3) to use analysis of variance (ANOVA) to evaluate the variation of water quality parameters across sampling sites and seasons. The findings could give decision-makers substantial insight for managing water quality, controlling pollution sources, and safeguarding the water resources of the Kashmir Valley.

Materials and methods

Study area and sampling stations

The river Jhelum flows along a 140-km stretch through the Kashmir Division, and most villages and towns are located along its banks. The river originates from the Verinag spring. The Lidder nallah, the largest of its tributaries and a major source of the Jhelum's headwaters, joins the river on its right side at a distance of two kilometers. The Sindh nallah, the second-largest tributary, enters the river on its right bank at Shadipora. The river finally drains into Wular Lake at Banyari, where it is joined by the Arin and Madhumati streams.

To meet the aims of the investigation, twenty-six sampling stations on the river Jhelum and its adjoining tributaries were selected in 2022 on the basis of anticipated pollution load and riverbed geology, as shown in Fig. 1. Samples were collected in spring (March–April), summer (June–July), autumn (September–October), and winter (December–January).

Fig. 1

Study area showing drainage network and sampling stations

Testing procedure

Surface water samples were collected from the 26 locations in each of the four seasons (spring, summer, autumn, and winter) of 2022. The samples were collected in 3-L airtight containers, a volume considered adequate for the analyses. They were analyzed at the water laboratory of the State Pollution Control Board (SPCB), Srinagar, and at the environmental laboratory of Tehkeek International in accordance with APHA (2017) guidelines. Transparency, pH, and conductivity were measured in situ at the sampling site, while the other parameters were determined in the laboratory. Table 1 lists the analytical techniques used to determine each water quality parameter.

Table 1 Water quality parameters and their analytical procedures

Statistical analysis

Graphs are widely used to give a visual overview of data that highlights its important information and conveys it quickly and effectively (Mubarak et al., 2021; Whitlock et al., 2019); they also help determine whether more complex modelling is required (Hohman et al., 2019). In this study, line diagrams and box plots were used to summarize the dataset. For all water quality parameters, a two-way analysis of variance (ANOVA) was performed at the 0.05 (5%) significance level to identify significant differences between locations and between seasons. ANOVA determines whether differences in mean values between two or more groups are likely due to chance or are statistically significant (Oleson et al., 2019). Standard statistical analyses were carried out in SPSS (v. 26).

Principal component analysis (PCA), a multivariate technique, was used to reduce the dimensionality of the dataset and identify patterns in complex data. PCA helps define the number of factors needed to explain the observed variation in the data (Alavi et al., 2020). Computing the principal components (PCs) involves calculating the eigenvalues and eigenvectors of a covariance or other cross-product matrix that represents the dispersion of the original variables (Mikis et al., 2022). By reducing dimensionality, PCA identifies the most significant underlying variables and patterns in a large dataset (Nobre & Neves, 2019) and explains the relationships within the data in terms of underlying factors that are not immediately apparent. In this study, PCA was used to examine the relationships among the water quality parameters measured at the various sites and across seasons.
The resulting PCs can be viewed as new variables that represent the most significant patterns of variance in the data (Maity et al., 2022). Examining the loadings of the variables on each PC gives insight into the underlying factors behind the observed patterns of water quality. PCA is particularly helpful for large datasets with many variables because it simplifies the data while retaining most of the important information (Goldrick et al., 2020).
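As a minimal sketch of this dimensionality reduction, assuming a correlation-based PCA on z-scored data, the Python code below projects a synthetic 14-variable dataset onto its first four components and reports the share of variance retained. The function name `reduce_dimensions` and the data are hypothetical; SVD is used here only as a convenient, numerically stable way to obtain the same eigenvalues and eigenvectors.

```python
import numpy as np

def reduce_dimensions(X, k):
    """Project z-scored data onto its first k principal components.

    Returns the (samples x k) score matrix and the share of total variance retained.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each parameter
    # SVD of the standardized data; rows of Vt are the principal axes (descending order).
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (Z.shape[0] - 1)                   # eigenvalues of the correlation matrix
    scores = Z @ Vt[:k].T                               # new, mutually uncorrelated variables
    retained = eigvals[:k].sum() / eigvals.sum()
    return scores, retained

# Synthetic data: 104 samples of 14 parameters driven by 3 hidden factors (not study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(104, 3)) @ rng.normal(size=(3, 14)) + 0.3 * rng.normal(size=(104, 14))
scores, retained = reduce_dimensions(X, k=4)
print(scores.shape, f"{retained:.1%} of variance retained")
```

The four score columns replace the fourteen original parameters while keeping most of their variance, which is the sense in which PCA "simplifies the data without losing crucial information".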

Results and discussions

The water quality parameters show notable trends along the Jhelum River from Vishav to Wular Lake (S1–S26). The levels of pH, DO, hardness, calcium, chloride, alkalinity, transparency, ammoniacal nitrogen, and nitrate nitrogen decrease along this stretch, whereas phosphorus, carbon dioxide, TDS, TSS, and conductivity increase. The decline in DO, together with the rise in phosphorus, CO2, TDS, TSS, and conductivity, indicates that water quality deteriorates along the study stretch. Figure 2a–n shows the spatial and temporal variation of each parameter over the study period.

Fig. 2

a–n Spatial–temporal variation of water quality parameters during all four seasons at different sites

The water quality of the river Jhelum varies considerably along its course because of instream pollution and abrupt changes at the input sampling stations, viz., S2, S5, S8, S11, S14, S17, and S20, which reflect the water quality of the adjoining tributaries. The water quality of the river Jhelum changes further as it enters Wular Lake owing to direct inputs such as the Arin and Madhumati streams, whose quality parameters are shown at S24 and S25. The water quality indicators of Wular Lake, shown at S25 and S26, differ markedly from the values in the upstream stretch of the river Jhelum.

The line diagrams show that the tributaries of the river Jhelum change its water quality markedly along its entire course up to Wular Lake. TDS and TSS levels peak in Wular Lake because of the higher concentrations of dissolved organic matter and the sediment yield delivered by its input streams. Ammoniacal nitrogen and nitrate nitrogen levels are elevated at all sampling stations on the river Jhelum, possibly because residential and industrial areas near its banks discharge larger amounts of animal and human waste into the river. The higher conductivity of the adjoining tributaries indicates greater levels of dissolved ions and mineralization, probably because more household and municipal effluents are discharged into these streams, especially where they pass through urbanized areas. The pH is also higher in the adjoining tributaries, which could be due to interactions among chemicals, minerals, and other pollutants in the flowing water, as well as to differences in the composition of the soil or bedrock. Nevertheless, all of the adjoining tributaries have far higher dissolved oxygen levels than the river Jhelum, probably owing to lower water temperatures and lower dissolved salt content. The tributaries also have low calcium and hardness levels, which may be due to the absence of calcium-bearing rocks and minerals such as gypsum, dolomite, and limestone in their riverbeds. The chloride concentration does not vary significantly along the river Jhelum but is generally higher at the Wular Lake sampling stations.
Chloride is commonly present in the environment as NaCl, CaCl2, KCl, and MgCl2, and because chloride ions are mobile, they are readily released into river systems near their sources (Huang et al., 2021). Phosphorus levels are higher in Lidder (S2), Aripal (S11), Dhoodganga (S11), and Sindh (S20) than at other sampling sites, possibly because of severe bank erosion along these rivers and inadequate sewage disposal infrastructure in their watersheds. The elevated CO2 concentrations in Wular Lake indicate substantial decomposition of dead organic matter, which may result from pollutant inputs or natural processes. The line diagrams also show notable seasonal fluctuations at each of the 26 sampling points: DO, TSS, TDS, and conductivity are higher in spring, while hardness, calcium, alkalinity, and carbon dioxide are higher in winter. However, only a few parameters at some sites showed either a high concentration (positive peak) or a low concentration (negative peak) in all four seasons. The box plots in Fig. 3a–n show the extreme values, median, dispersion, skewness, and outliers of the dataset.

Fig. 3

a–n Box plots showing seasonal variation of various water quality parameters

ANOVA analysis

A two-way analysis of variance (ANOVA) was used to evaluate the variability of the water quality parameters between stations and between seasons at a significance level of 5% (α = 0.05). Table 2 displays the results. For all parameters, the calculated F value was substantially greater than the critical F value and the P value was effectively zero (< 0.05), so the null hypothesis was rejected, indicating significant variation across sampling stations and seasons, with one exception: for phosphorus, the null hypothesis could not be rejected across sampling stations, showing that its values do not vary significantly among the twenty-six sampling sites.
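Table 2 reports F, P, and F-critical values for a two-factor design. Assuming one value per station–season cell (26 stations × 4 seasons; the text does not say whether replicates were analyzed), this corresponds to a two-way ANOVA without replication, sketched below with NumPy and SciPy on synthetic data. The function name is hypothetical, and because there are no replicates the residual term also absorbs any station × season interaction.

```python
import numpy as np
from scipy import stats

def two_way_anova_no_replication(data, alpha=0.05):
    """Two-way ANOVA without replication.

    data: 2-D array with one value per cell, rows = stations, columns = seasons.
    Returns F, p-value and F-critical for the station and season factors.
    """
    data = np.asarray(data, dtype=float)
    r, c = data.shape
    grand = data.mean()
    ss_rows = c * np.sum((data.mean(axis=1) - grand) ** 2)   # between stations
    ss_cols = r * np.sum((data.mean(axis=0) - grand) ** 2)   # between seasons
    ss_total = np.sum((data - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols                   # residual (incl. interaction)
    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    results = {}
    for name, ss, df in (("stations", ss_rows, df_rows), ("seasons", ss_cols, df_cols)):
        f_value = (ss / df) / ms_error
        results[name] = {
            "F": f_value,
            "p": stats.f.sf(f_value, df, df_error),          # P(F > f_value) under H0
            "F_crit": stats.f.ppf(1 - alpha, df, df_error),   # reject H0 if F > F_crit
        }
    return results

# Synthetic 26 x 4 table for one parameter (hypothetical values, not the study's data).
rng = np.random.default_rng(2)
station_effect = np.linspace(0.0, 5.0, 26)[:, None]      # hypothetical spatial gradient
season_effect = np.array([1.0, 0.0, -0.5, 2.0])[None, :]  # hypothetical seasonal shifts
table = 7.5 + station_effect + season_effect + rng.normal(scale=0.3, size=(26, 4))

res = two_way_anova_no_replication(table)
for factor, out in res.items():
    verdict = "reject H0" if out["F"] > out["F_crit"] else "fail to reject H0"
    print(f"{factor}: F = {out['F']:.2f}, F crit = {out['F_crit']:.2f}, p = {out['p']:.3g} -> {verdict}")
```

Running the function once per parameter gives one row of Table 2 each; a parameter such as phosphorus would show F below F-critical for the station factor.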

Table 2 Results of two-way ANOVA analysis

Factor analysis

Factor analysis is a statistical technique for analyzing data in which many variables are interrelated. Its purpose is to identify a small number of underlying factors that explain the interdependence among the variables (Tokatli et al., 2021). These factors, or abstract indicators, capture much of the information carried by the original variables (Uncumusaoglu & Mutlu, 2022). By analyzing and evaluating data rationally and systematically, the approach can provide a scientific basis for decision-making. In essence, factor analysis condenses the information in a large number of variables into a few factors while minimizing the loss of information (Sellbom & Tellegen, 2019). In this study, principal component analysis (PCA) was used to analyze 14 water quality parameters at the twenty-six monitoring sites. The Kaiser–Meyer–Olkin (KMO) and Bartlett's sphericity tests were performed beforehand to check the suitability of the data for PCA. The KMO value was 0.699, above the recommended minimum of 0.5 (Elsaman et al., 2022), indicating that the data were appropriate for PCA. The significance level of Bartlett's test was 0.00, below 0.05, so the null hypothesis that the correlation matrix is an identity matrix was rejected; that is, the variables are significantly intercorrelated, which confirms that the data are appropriate for PCA.
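Both suitability checks can be computed directly from the correlation matrix R of n samples and p variables. The sketch below, on synthetic data with hypothetical function names, uses the standard formulas: Bartlett's statistic χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO as the sum of squared correlations divided by that sum plus the sum of squared partial correlations. It illustrates the calculation and is not a reproduction of the SPSS output.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 = the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)                 # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)                # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)            # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)

# Synthetic data: six variables sharing one common factor (hypothetical, not study data).
rng = np.random.default_rng(3)
factor = rng.normal(size=(150, 1))
X = 0.8 * factor + 0.6 * rng.normal(size=(150, 6))
chi2, p_value = bartlett_sphericity(X)
print(round(kmo(X), 3), p_value < 0.05)
```

A small Bartlett p-value means the correlations are too strong to be chance, which is exactly the condition under which extracting common components makes sense.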

The data were analyzed with IBM SPSS Statistics (v. 26.0; IBM Corp., Armonk, NY, USA). The original monitoring data were standardized before the correlation coefficient matrix was generated. Table 3 presents the descriptive statistics of the experimental results, and Table 4 shows the correlation matrix.

Table 3 Descriptive statistics
Table 4 Correlation matrix

Table 5 lists the communalities. The initial communalities are 1 for all parameters, and the communalities after factor extraction are shown in the third column. The communalities of TDS, TSS, hardness, calcium, DO, alkalinity, and transparency are all high (> 0.80), indicating that the extracted components retain most of the information in these variables. The communalities of pH, conductivity, CO2, phosphorus, chloride, NH3-N, and NO3-N, on the other hand, are lower (< 0.80), indicating that the extracted components explain a smaller share of their variance.
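A communality is the sum of a variable's squared loadings on the retained components. The sketch below, assuming a correlation-based PCA on synthetic 14-variable data (not the study's data; the helper name is hypothetical), shows why the initial communalities in Table 5 are all 1 (every component kept) and why extraction communalities fall below 1 once only four components are kept.

```python
import numpy as np

def communalities(loadings, n_components):
    """Communality of each variable: sum of its squared loadings on the retained components."""
    return np.sum(loadings[:, :n_components] ** 2, axis=1)

# Loadings from a correlation-based PCA of synthetic data (14 variables, hypothetical).
rng = np.random.default_rng(4)
X = rng.normal(size=(104, 3)) @ rng.normal(size=(3, 14)) + 0.5 * rng.normal(size=(104, 14))
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

initial = communalities(loadings, 14)    # all components kept: every value is 1
extracted = communalities(loadings, 4)   # four components kept: extraction communalities
print(np.round(extracted, 2))
print("variables with communality > 0.80:", np.flatnonzero(extracted > 0.80))
```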

Table 5 Communalities

Table 6 shows that four components have eigenvalues greater than one and together account for 75.182% of the total variance. Only these four components were extracted and rotated. Rotation redistributes the variance explained by the retained components among them, bringing their individual contributions closer together without changing the total variance explained. These four factors therefore represent the fundamental structure of the original data.

Table 6 Total variance explained

The scree plot in Fig. 4 also helps in understanding the underlying data structure. It displays the eigenvalues of all factors, which helps identify the appropriate number of principal components according to the selection principles of PCA (Isiyaka et al., 2019). In this study, the slope of the scree plot flattened markedly after the fourth component. Therefore, the first four principal components, which had eigenvalues greater than one and accounted for most of the variance in the dataset, were retained. The differences in eigenvalues are large between factors 1 and 2, 3 and 4, and 4 and 5, whereas the differences between factors 5 and 6 and beyond are minimal. This implies that the first four factors carry the most reliable general information and can be considered the principal factors representing all 14 variables.
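The two retention rules used here, the Kaiser criterion (eigenvalue > 1) and the flattening of the scree plot, can be expressed in a few lines. The eigenvalues below are hypothetical values for 14 standardized variables chosen for illustration only; they are not the values in Table 6, and the helper name is likewise hypothetical.

```python
import numpy as np

def select_components(eigvals):
    """Kaiser criterion plus the quantities usually read off a scree plot."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    pct = 100 * eigvals / eigvals.sum()      # % of total variance per component
    cum_pct = np.cumsum(pct)                 # cumulative % of variance
    n_kaiser = int(np.sum(eigvals > 1.0))    # components with eigenvalue > 1
    gaps = -np.diff(eigvals)                 # drop between successive eigenvalues
    return n_kaiser, pct, cum_pct, gaps

# Hypothetical eigenvalues for 14 standardized variables (they sum to 14); not Table 6.
eigvals = [4.1, 3.0, 1.9, 1.2, 0.8, 0.7, 0.6, 0.45, 0.4, 0.3, 0.25, 0.15, 0.1, 0.05]
n_kaiser, pct, cum_pct, gaps = select_components(eigvals)
print(n_kaiser, round(cum_pct[n_kaiser - 1], 1))   # 4 72.9
print(np.round(gaps, 2))                             # large early drops, then a flat tail
```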

Fig. 4

Scree plot

Table 7 displays the factor loading matrix, which gives the loading of each variable on the principal components before and after rotation. After rotation, the loadings are clearly polarized (closer to either zero or ±1), which makes the components easier to interpret.
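The text does not name the rotation method; assuming an orthogonal varimax rotation (a common choice for this kind of analysis), the sketch below shows how such a rotation can be computed with NumPy. It uses plain varimax without Kaiser normalization, so it will not reproduce SPSS output exactly, and the data and function name are hypothetical. Because the rotation matrix is orthogonal, each variable's communality is unchanged; only the distribution of its loading across components changes.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x components) loading matrix.

    Plain varimax without Kaiser normalization (an assumption; see text).
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion for the current rotation.
        col_ss = np.sum(rotated**2, axis=0)
        target = rotated**3 - (gamma / p) * rotated * col_ss
        # Closest orthogonal matrix to loadings.T @ target (via SVD).
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_objective = np.sum(s)
        if objective != 0 and new_objective < objective * (1 + tol):
            break  # criterion no longer improving
        objective = new_objective
    return loadings @ rotation, rotation

# Unrotated loadings of the four largest components of synthetic data (hypothetical).
rng = np.random.default_rng(5)
X = rng.normal(size=(104, 4)) @ rng.normal(size=(4, 14)) + 0.5 * rng.normal(size=(104, 14))
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:4]
unrotated = eigvecs[:, order] * np.sqrt(eigvals[order])
rotated, rotation = varimax(unrotated)

# Rotation is orthogonal, so each variable's communality is unchanged.
print(np.allclose((rotated**2).sum(axis=1), (unrotated**2).sum(axis=1)))  # True
```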

Table 7 Component matrix and rotated component matrix

The results suggest that the first and second rotated principal components (PC1 and PC2) are strongly linked with soil erosion and natural pollution, respectively. PC1 accounts for 28.863% of the total variance and has substantial positive loadings on TDS (0.877) and TSS (0.852). PC2 accounts for 22.715% of the total variance and has significant positive loadings on EC (0.849) and hardness (0.843). These findings imply that the most severe pollution problem in the stream is heavy sediment production resulting from factors such as bank erosion, agricultural runoff, and watershed degradation. Seasonal fluctuations, which directly affect water quality, could also contribute to these pollutants and signs of soil erosion in the water. For instance, heavy rain can erode topsoil that has been contaminated through animal and human contact, and this polluted soil can then alter the physicochemical and microbiological characteristics of surface water resources.

In addition to natural factors, human activities also contribute to water pollution. Land use activities such as agriculture, industry, and improper waste disposal release pollutants that are picked up by runoff and transported into water bodies, further compromising water quality. Overall, these results suggest that both natural processes and human activities contribute to water pollution and erosion, and that proper land use management and waste disposal practices are needed to protect water resources from contamination.

The third rotated principal component (PC3) accounts for 15.516% of the total variance and is associated with both anthropogenic and geogenic sources of pollution. It has strong positive loadings on NO3-N (0.888), NH3-N (0.848), and Cl (0.871) and a strong negative loading on pH (−0.899). The positive loadings of NO3-N and NH3-N may be attributed to anthropogenic influences such as industrial and domestic waste. Similarly, high chloride concentrations in natural waters could result from human activities such as domestic waste disposal, agriculture, and industry. The negative loading of pH, on the other hand, may be associated with the oxidation of organic matter derived from anthropogenic activities. Effective strategies for managing human activities are therefore crucial to protect water resources from further contamination.

The fourth rotated principal component (PC4) accounts for 8.089% of the total variance and has a positive loading on pH (0.853) and a strong negative loading on DO (−0.915). The relationship between river water pH and other water quality indices is complex, but toxic pollution from industrial manufacturing may be a contributing factor. The negative loading of DO suggests pollution sources of anthropogenic origin, such as wastewater discharges, agricultural runoff, and industrial activities, all of which can lower DO in water bodies. Low DO levels also harm aquatic life, reducing biodiversity and further degrading water quality. Effective management of anthropogenic pollution sources is therefore necessary to prevent further degradation of water quality and to protect aquatic ecosystems.

Factor score

The factor scores of the sampling stations were calculated in SPSS (v. 26). The four columns of the component matrix in Table 7 were copied into the SPSS Data Editor and labeled α, β, µ, and Ɩ. Then, using Transform → Compute Variable, the expressions A = α/SQRT(Xi), B = β/SQRT(Xi), C = µ/SQRT(Xi), and D = Ɩ/SQRT(Xi) were entered, where Xi is the eigenvalue of the corresponding principal component (Table 6) and A, B, C, and D are the resulting eigenvectors. Multiplying the standardized data by these eigenvectors gives the four extracted principal component expressions.

$$F1=\sum_{i=1}^{14}A_{i}Z_{i}$$
(1)
$$F2=\sum_{i=1}^{14}B_{i}Z_{i}$$
(2)
$$F3=\sum_{i=1}^{14}C_{i}Z_{i}$$
(3)
$$F4=\sum_{i=1}^{14}D_{i}Z_{i}$$
(4)

where F1, F2, F3, and F4 are the principal component scores of a sampling station; A, B, C, and D are the corresponding eigenvectors, whose i-th elements (A_i, B_i, C_i, and D_i) correspond to the i-th of the 14 water quality parameters; and Z_i is the standardized (z-score) value of the i-th parameter at that station.
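Formulas (1)–(4) amount to a matrix product: the standardized data (one row per station, one column per parameter) multiplied by the coefficient vectors obtained by dividing each loading column by the square root of its eigenvalue. The sketch below illustrates this on synthetic data with a hypothetical helper name. The identity "loading / √eigenvalue = eigenvector" holds for unrotated loadings; if rotated loadings are used, the coefficients are no longer exact eigenvectors.

```python
import numpy as np

def component_scores(X, loadings, eigvals):
    """Principal component scores F_k = sum_i coef_ik * Z_i for every row of X.

    coef = loading / sqrt(eigenvalue), i.e. the eigenvector when the loadings are unrotated.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores of the 14 parameters
    coef = loadings / np.sqrt(eigvals)                  # columns play the role of A, B, C, D
    return Z @ coef                                     # shape: (stations, components)

# Synthetic stand-in: 26 rows ("stations") x 14 parameters, not the study's data.
rng = np.random.default_rng(6)
X = rng.normal(size=(26, 3)) @ rng.normal(size=(3, 14)) + 0.5 * rng.normal(size=(26, 14))
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:4]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

F = component_scores(X, loadings, eigvals[order])     # columns: F1, F2, F3, F4
print(F.shape)  # (26, 4)
```

With unrotated loadings, the variance of each score column equals its eigenvalue, which is a useful sanity check on a spreadsheet or SPSS implementation.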

The comprehensive evaluation function F is calculated with formula (5), which weights each of the four principal component scores by its share (e1, e2, e3, and e4) of the variance explained by the four components.

$$F = F1\times \frac{e1}{e1+e2+e3+e4}+F2\times \frac{e2}{e1+e2+e3+e4}+F3\times \frac{e3}{e1+e2+e3+e4}+F4\times \frac{e4}{e1+e2+e3+e4} = 0.38390F1+0.30210F2+0.20641F3+0.10757F4$$
(5)
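As a quick check on formula (5), assuming e1–e4 are the variance percentages reported for the four rotated components (28.863, 22.715, 15.516, and 8.089%), the weights can be recomputed as below; they agree with the coefficients in formula (5) to four decimal places, and the small differences beyond that may reflect rounding of the published percentages. The helper `composite_score` and the three rows of component scores are hypothetical, and a higher F is read as more polluted, following the ordering reported for Table 8.

```python
import numpy as np

# Variance explained (%) by the four rotated components, as reported in the text.
explained = np.array([28.863, 22.715, 15.516, 8.089])
weights = explained / explained.sum()        # e_k / (e1 + e2 + e3 + e4)
print(np.round(weights, 4))                  # [0.3839 0.3021 0.2064 0.1076]

def composite_score(F, weights):
    """Comprehensive score F for each station (one row of component scores per station)."""
    return F @ weights

# Hypothetical component scores (F1..F4) for three stations, not values from Table 8.
F = np.array([
    [1.2, 0.5, -0.3, 0.1],
    [-0.8, -0.2, 0.4, 0.0],
    [0.3, 1.1, 0.9, -0.5],
])
scores = composite_score(F, weights)
ranking = np.argsort(-scores) + 1            # station numbers, most to least polluted
print(ranking)                               # [3 1 2]
```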

Table 8 shows the results obtained with the above formulas. Of the 26 sampling stations, S5 has the poorest water quality and S1 is the least polluted. Based on the evaluation function F, the pollution status of the 26 sections is in the following order: site 5 > site 14 > site 25 > site 20 > site 17 > site 19 > site 26 > site 22 > site 11 > site 8 > site 15 > site 23 > site 24 > site 21 > site 2 > site 18 > site 7 > site 10 > site 16 > site 4 > site 6 > site 12 > site 13 > site 9 > site 3 > site 1.

Table 8 Water quality principal component scores and ranking of the 26 sampling stations

Conclusions

The goal of this study was to analyze the overall quality of surface water in Kashmir using standard testing procedures and multivariate statistical tools. The study also aimed to assess the impact of adjoining tributaries on the water quality of the river Jhelum and Wular Lake. Water quality was monitored at 26 sampling locations across all four seasons of 2022 to investigate spatial and temporal variations. Line diagrams and box plots revealed that some parameters exceeded permissible limits at certain points during particular sampling seasons, rendering the water unsafe for drinking, farming, fishing, or other domestic purposes (Cotruvo, 2017). The increasing side-stream pollution indicates growing human influence in the core watersheds of Kashmir.

A two-way ANOVA was performed on the sampled parameters to test for seasonal and sectional variations, and the results revealed significant spatiotemporal variability. PCA was used to identify the most important indicator parameters affecting water quality and the potential sources of pollution; it simplifies a high-dimensional variable system while retaining as much of the original information as possible. Four significant principal components, extracted from the 14 water quality parameters, accounted for 75.18% of the variance in the original dataset. PC1 (28.86%) and PC2 (22.72%) represented chemical and conventional pollutants, indicating the influence of bank erosion and other deposited sediments on water quality. PC3 (15.52%) showed a positive correlation with DO and total alkalinity, representing pollution from bed decomposition and industrial and residential wastes. PC4 (8.09%) had a positive loading on pH, representing toxic pollution from municipal areas surrounding surface water resources.

This study used several techniques to assess and interpret pollution sources and fluctuations in water quality in the river Jhelum and its tributaries, with the aim of supporting decision-making and efficient water resource management. The outcomes could inform improved management practices for surface water resources in the Kashmir valley, benefiting its ecology and environment.