Introduction

Water is a lifeline for the development of civilization. Untreated discharge from point and non-point sources contaminates river water. Natural and anthropogenic processes such as weathering, agricultural runoff, minerals, and effluents from municipalities and industries are responsible for this pollution (Grzywna & Bronowicka-Mielniczuk, 2020; Hajigholizadeh & Melesse, 2017; Zhang et al., 2020; Zhou et al., 2007). A river is therefore one of the most vulnerable components of the environment: careless human activities can easily degrade it, which in turn threatens human lives. The annual reports published by the West Bengal Pollution Control Board (WBPCB) show clear seasonal and sample-to-sample variation in water quality. Characterizing this variation and the potential sources of pollution gives a detailed account of a river, which helps to maintain water quality and to manage the polluted stretches properly (Bhat et al., 2018; Grzywna & Bronowicka-Mielniczuk, 2020; Li et al., 2009; Platikanov et al., 2019; Salim et al., 2019; Singh et al., 2004; Varol, 2020; Varol & Şen, 2009; Zhong et al., 2018).

Multivariate statistical analysis is widely used for spatial and temporal analysis of water quality (Simeonov et al., 2003; Singh et al., 2004; Varol & Şen, 2009; Vega et al., 1998; Xiaolong et al., 2010). The water quality index (WQI) is a mathematical method that determines water quality by combining multiple variables and transforming them into a single value (Akkoyunlu & Akiner, 2012; Lkr & Neizo, 2020; Sharma & Kansal, 2011; Zeinalzadeh & Rezaei, 2017). Multivariate statistical techniques and the WQI complement each other: together they identify pollutants, specific patterns of change in pollution levels, and overall water quality.

Rapid deterioration of river water quality has been a significant environmental concern in recent years. The Damodar River in West Bengal, like many other Indian rivers, passes through industrially and agriculturally developed areas. The Damodar is the main water source of this area and one of the most important tributaries of the lower Ganga (Hooghly). This river stretch is one of the polluted river stretches (category I) in West Bengal, India (CPCB, 2017). Earlier studies were site-specific and analyzed the effects of anthropogenic sources and pollutants on habitat. This study examines the change in water quality on both temporal and spatial scales. Analyzing a large quantity of data with multiple variables presents particular challenges, and multivariate techniques such as factor analysis, cluster analysis, and discriminant analysis are required to represent such data understandably (Bengraïne & Marhaba, 2003; Chatterjee et al., 2010; Helmreich, 2015; Kotti et al., 2005; Reghunath et al., 2002). A comprehensive study of the Damodar River has therefore been carried out using multivariate statistics to identify the variables responsible for water pollution, the seasonal and spatial variation of water quality, and the pollution sources, and to estimate water quality with the WQI.

Materials and methods

Study area

The Damodar River basin is a sub-basin of the Ganga basin and extends across Jharkhand and West Bengal (WB). The Damodar is one of the major rivers of the south Gangetic plain in West Bengal (Fig. 1). It flows a distance of 260.48 km through the Purba Bardhaman and Paschim Bardhaman districts of West Bengal. Paschim Bardhaman is a predominantly industrial and urbanized district. The study area is located in the west of West Bengal, between 22°27′ and 23°49′ North and between 86°48′ and 87°55′ East. The average slope of the basin is 2.34°. The river basin is underlain by sandstone and shales (Gondwana formation), laterite (Tertiary period), and alluvial geological formations (Mondal et al., 2018).

Fig. 1 Location of study area and sampling sites

Agricultural, residential, industrial, and mining areas are the dominant land-use categories in the Damodar River basin. The Durgapur and Asansol municipal corporations, municipalities such as Kulti, Burdwan, Jamuria, and Raniganj, and many small census towns are situated along this river. The Asansol-Kulti township extends along the upper reach of the Damodar River (DCO, 2011) for a length of 36 km. Four sampling stations are located along this stretch: S1 (Barakar), S2 (Dishergarh), S3 (Asansol), and S4 (IISCO). Durgapur Municipal Corporation encompasses a 16.5-km stretch of the river, with the S9 (Durgapur) and S10 (Mujhermana) sampling stations. Four sampling stations, S5 (Narainpur), S6 (Raniganj), S7 (Andal US), and S8 (Andal DS), lie in the middle reach. Station S11 (Burdwan) is located in the lower reach of the Damodar.

Site S1 receives outlets from the Kulti industrial and residential areas. Site S2 receives water from a tributary near the state of Jharkhand. Sites S3 and S4 are located near congested urban areas and large-scale industrial clusters. Site S5 receives a drainage outlet from the Asansol residential area. Sites S6, S7, and S8 are located in residential areas and receive sewage. Sites S9 and S10 are located in a highly congested area of industrial and residential clusters. Site S10 receives treated and untreated effluents from industries and municipalities in the Durgapur region, which drain into the Tamla Nala (drain) and finally join the Damodar River (Mukhopadhyay & Mukherjee, 2013). Site S11 is located in the lower reach of the Damodar River in the Burdwan municipality region. Industrial clusters, urbanized residential areas, and the outlets from these centers are shown in Fig. 2. Most of the industries fall in the red category list (the most polluting). These districts are also referred to as the rice bowl of West Bengal and are therefore among the most agriculturally productive regions. Maps were generated in the ArcGIS environment.

Fig. 2 Outlets from industries and residential areas into Damodar River

Monitoring sites

This study is based on data collected by the WBPCB (West Bengal Pollution Control Board) under the Pollution Control Board of India. The WBPCB selected a total of 11 sampling sites (Fig. 1) on the basis of the potential sources of pollution (Guidelines for water quality monitoring). All sites are concentrated around industrial sites or municipal areas. The Central Pollution Control Board describes the methods and sampling procedure (Guide Manual: Water and Wastewater analysis). The data were recorded monthly from 2014 to 2019 for all 11 stations. Of the 27 analyzed parameters, 24 are used to determine the changes in water quality. The other three parameters (boron, phenolphthalein alkalinity, and total Kjeldahl nitrogen) are below the detection level or NIL most of the time. The measured parameters are Ammonia-N (Ammonia), biological oxygen demand (BOD), Calcium, Chloride, chemical oxygen demand (COD), conductivity (Cond), dissolved oxygen (DO), fecal coliform (F.Coliform), Fluoride, Magnesium, Nitrate–N (Nitrate), pH, Phosphate, Potassium, Sodium, Sulfate, temperature °C (Temp.), total alkalinity (Alkalinity), total coliform (T.Coliform), total dissolved solids (TDS), total fixed solid (TFS), total hardness (Hardness), total suspended solids (TSS), and turbidity.

Twenty-four (24) water quality parameters from eleven sampling locations are analyzed and categorized into three seasons to determine the temporal variation of the pollution load from 2014 to 2019. Factor analysis was used to identify the most influential of the 24 parameters in each of the three seasons. Spatial variation of the pollution load is analyzed through cluster analysis. Only a few of the 24 parameters are crucial for seasonal and spatial variation, and discriminant analysis was conducted to distinguish those variables. The WQI was used to assess overall water quality across seasons and sites. The detailed research design is shown in Fig. 3.

Fig. 3 Research design

Rainfall pattern

Rainfall data for the Damodar River basin are extracted from the interpolated raster map (Pai et al., 2014) obtained from the Indian Meteorological Department (IMD). Monthly rainfall data were extracted and summarized seasonally from 2014 to 2019 (Table S1), which reports the mean rainfall and SD in mm. Rainfall is one of the controlling factors of the pollution load because it carries elements into the river through surface runoff. For the analysis, a year is divided into three seasons: pre-monsoon (March to May), monsoon (June to September), and post-monsoon (October to February). Discharge declines from the monsoon to the pre-monsoon, and the lowest discharge prevails in the months between October and February (Bhattacharyya, 2011).
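The seasonal grouping described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names `season_of` and `seasonal_summary` are hypothetical, and the rainfall values in the usage note are invented placeholders, not values from Table S1.

```python
from statistics import mean, stdev

def season_of(month: int) -> str:
    """Map a calendar month (1-12) to the three seasons used in this study."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if 3 <= month <= 5:
        return "pre-monsoon"      # March to May
    if 6 <= month <= 9:
        return "monsoon"          # June to September
    return "post-monsoon"         # October to February

def seasonal_summary(records):
    """Return {season: (mean, SD)} for (month, value) records.

    SD is the sample standard deviation; it is reported as 0.0 when a
    season has a single record.
    """
    groups = {}
    for month, value in records:
        groups.setdefault(season_of(month), []).append(value)
    return {
        season: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for season, values in groups.items()
    }
```

For example, `seasonal_summary([(3, 10.0), (4, 20.0), (7, 100.0)])` groups the first two records as pre-monsoon and the third as monsoon.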

Statistical techniques

Factor analysis

Factor analysis is one of the most common and useful methods for multidimensional data and is used in many water quality analyses (González et al., 2014; Kükrer & Mutlu, 2019; Mutlu, 2019; Ouyang et al., 2006; Singh et al., 2004). It transforms the original variables into a few latent variables without compromising the original character of the data. Each factor is a latent variable that carries as much of the variance as possible and has its own distinct character. As many factors are generated as there are input variables, and only those with an eigenvalue greater than 1 are retained. The analysis also produces a uniqueness value for each variable, which is the part of that variable's variance that the factors cannot explain. A “varimax” axis rotation makes the output factor loadings easier to read. The factor loadings can be classified as strong (> 0.75), moderate (0.50–0.75), and low (< 0.50) (Liu et al., 2003).
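The retention rule and the loading classes above can be illustrated with a short sketch. It assumes the eigenvalues and rotated loadings have already been obtained from a factor analysis routine; the function names are hypothetical, and uniqueness is computed with the usual definition of one minus the communality (the sum of squared loadings of a variable).

```python
def kaiser_select(eigenvalues):
    """Keep components with eigenvalue > 1; return (count, % variance explained)."""
    total = sum(eigenvalues)
    kept = [ev for ev in eigenvalues if ev > 1.0]
    return len(kept), 100.0 * sum(kept) / total

def classify_loading(loading):
    """Classify a loading by absolute value: strong, moderate, or low."""
    size = abs(loading)
    if size > 0.75:
        return "strong"
    if size >= 0.50:
        return "moderate"
    return "low"

def uniqueness(loadings_of_variable):
    """Share of a variable's variance not explained by the retained factors."""
    communality = sum(l * l for l in loadings_of_variable)
    return 1.0 - communality
```

With the made-up eigenvalues `[2.5, 1.2, 0.8, 0.5]`, `kaiser_select` retains two components that together explain 74% of the variance.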

Cluster analysis

Cluster analysis is a multivariate technique that performs the grouping of sampling stations depending on the similarity of the pollution load. The clusters show high homogeneity within-cluster and high heterogeneity between clusters (Hair et al., 2010). It has widely been used in many studies (Alberto et al., 2001; Chang, 2005; Hajigholizadeh & Melesse, 2017; Simeonov et al., 2003; Singh et al., 2004; Vega et al., 1998).

We used agglomerative hierarchical clustering, in which each observation starts as its own cluster and clusters are merged successively until a single large cluster containing all observations is formed (Maechler et al., 2005). Data are standardized before clustering. The “Euclidean” distance is used to measure the distance among stations, and the stations are clustered using Ward’s minimum variance method. The Hopkins statistic (Lawson & Jurs, 1990) is used to determine whether the data are suitable for cluster analysis.
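A minimal pure-Python sketch of this procedure is given below, assuming each station is represented by a row of mean parameter values. It is not the R routine used in the study: it standardizes each column and then repeatedly merges the pair of clusters with the smallest increase in within-cluster sum of squares, which is the Ward criterion. The function names and the toy data in the usage note are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column; constant columns are left centred at zero."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def merge_cost(a, b):
    """Ward criterion: increase in within-cluster sum of squares if a and b merge."""
    ca = [mean(c) for c in zip(*a)]
    cb = [mean(c) for c in zip(*b)]
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))   # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * dist2

def ward_clusters(rows, k):
    """Agglomerate standardized rows until k clusters remain; return index lists."""
    if k < 1:
        raise ValueError("k must be at least 1")
    z = standardize(rows)
    clusters = [[i] for i in range(len(z))]             # every station starts alone
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: merge_cost(
            [z[m] for m in clusters[p[0]]], [z[m] for m in clusters[p[1]]]))
        clusters[i] += clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

For instance, `ward_clusters([[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.1, 52.0], [9.0, 90.0]], 3)` groups the first two rows, the next two rows, and leaves the last row as a separate cluster.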

Discriminant analysis

Factor analysis extracts low-dimensional factors that represent most of the variance of the multivariate dataset. In discriminant analysis (DA), by contrast, the dataset is divided into the best possible groups. DA is also called a supervised pattern recognition model, because it uses multiple explanatory variables to predict a categorical response variable. DA assumes that the classes are linearly separable by hyperplanes defined on the explanatory variables. The number of hyperplanes depends on the number of groups; this study uses three seasons, so two hyperplanes are generated to classify the data. Each hyperplane passes through the midpoint between the group means and is calculated from the individual sample covariance matrix. The discriminant function has the form presented in Eq. (1) (Alberto et al., 2001).

$$f\left({G}_{i}\right)={k}_{i}+\sum_{j=1}^{n}{w}_{ij}\,{p}_{ij}$$
(1)

where i indexes the groups G (three in the temporal analysis), ki is the constant inherent to each group, wij is the weight coefficient assigned by DA to the selected parameter pij, j indexes the parameters, and n is the number of analytical parameters.
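As a hedged illustration of how Eq. (1) is evaluated, the sketch below scores one sample against each group and assigns it to the group with the highest score, which is the usual way such classification functions are applied. The function names and all coefficients in the usage note are hypothetical; they are not the values reported in Table 5.

```python
def discriminant_score(constant, weights, values):
    """Evaluate k_i + sum_j w_ij * p_j for one group."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return constant + sum(w * p for w, p in zip(weights, values))

def classify(groups, values):
    """Return (best_group, scores) for one sample.

    groups maps a group name to (constant, weights).
    """
    scores = {
        name: discriminant_score(constant, weights, values)
        for name, (constant, weights) in groups.items()
    }
    return max(scores, key=scores.get), scores
```

With the invented coefficients `{"A": (-1.0, [0.5, 0.2]), "B": (0.5, [0.1, 0.1])}` and a sample `[4.0, 5.0]`, group A scores 2.0 and group B scores 1.4, so the sample is assigned to A.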

DA is performed in standard and stepwise modes to select the variables that contribute significantly to maximizing the distance between the group means. DA has been performed for both the temporal and the spatial analysis, with seasons and clusters as the response variables and the observed water quality parameters as explanatory variables. Model performance is reported through the confusion matrix. SPSS software is used for the discriminant analysis, and R software is used for the other statistical analyses and for graphical representation.

Water quality index

The Canadian Council of Ministers of the Environment (CCME) developed a water quality index (WQI) that determines water quality from a set of variables. An essential feature of the CCME WQI is its flexibility: the water quality parameters can be chosen according to requirements and availability. The index is based on three elements (CCME, 2017): scope (F1), frequency (F2), and amplitude (F3). The water quality index is expressed as

CCME WQI = 100 − \(\left(\frac{\sqrt{{{F}_{1}}^{2}+{{F}_{2}}^{2}+ {{F}_{3}}^{2} }}{1.732}\right)\); here, the 1.732 value normalizes the result at a range of 0–100.

Here, F1 is the percentage of parameters that fail to meet the water quality standard relative to the total number of parameters, F2 is the percentage of failed tests relative to the total number of tests, and F3 is an asymptotic function that normalizes the sum of excursions to yield a range between 0 and 100. Before calculating F3, the excursions and the normalized sum of excursions (nse) need to be calculated.

$$\mathrm{Scope}\; \left(\mathrm{F}1\right)=\left(\frac{\mathrm{Number\; of\; failed\; parameters}}{\mathrm{Total\; number\; of\; parameters}}\right)\times 100$$
$$\mathrm{Frequency}\; \left(\mathrm{F}2\right)=\left(\frac{\mathrm{Number\; of\; failed\; test}}{\mathrm{Total\; number\; of\; tests}}\right)\times 100$$
$$\mathrm{Amplitude}\; \left(\mathrm{F}3\right)=\left(\frac{nse}{0.01nse+0.01}\right)$$

An excursion is calculated by dividing the failed test value by the objective and subtracting 1 when the concentration is greater than the permissible limit (a), and by dividing the objective by the failed test value and subtracting 1 when the concentration is less than the minimum permissible limit (b).

$$\mathrm{F}3\; \left(a\right)\;{ \mathrm{Excursion}}_{\mathrm{i}}=\left(\frac{\mathrm{FailedTestValu}{e}_{i}}{\mathrm{Objectiv}{e}_{j}}\right)-1$$
$$\mathrm{F}3\; \left(\mathrm{b}\;\right) {\mathrm{Excursion}}_{\mathrm{i}}=\left(\frac{\mathrm{Objectiv}{e}_{j}}{\mathrm{FailedTestValu}{e}_{i}}\right)-1$$
$$\mathrm{F}3\; \left(1\right)\; \mathrm{nse}=\left(\frac{\sum_{i=1}^{n}\;\mathrm{excursion}_{i}}{\#\;{ \text{of}\; \text{tests}}}\right)$$

A minimum of four parameters and four sampling frequencies are required for this WQI. Because the variables and the permissible limits can be chosen flexibly, the WQI is calculated here for CPCB-assigned water category A (drinking water source without conventional treatment but after disinfection). The permissible limits of the Indian standard IS 2296:1992 have been used for this analysis. BOD, chloride, DO, fluoride, nitrate, pH, sulfate, total coliform, TDS, and hardness are used to determine the water quality. The WQI is classified as poor (0–44), marginal (45–64), fair (65–79), good (80–94), and excellent (95–100).
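The calculation described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, each objective is assumed to be a `(minimum, maximum)` pair with `None` for an absent bound, test values are assumed to be positive, and the limits and measurements in the usage note are placeholders, not the IS 2296:1992 values or the WBPCB data. The sketch does not enforce the minimum of four parameters and four samplings.

```python
import math

def excursion(value, low, high):
    """Excursion of one test; 0.0 when the test meets its objective."""
    if high is not None and value > high:
        return value / high - 1.0          # case (a): above the permissible limit
    if low is not None and value < low:
        return low / value - 1.0           # case (b): below the required minimum
    return 0.0

def ccme_wqi(samples, objectives):
    """samples: {parameter: [values]}, objectives: {parameter: (low, high)}."""
    failed_params, failed_tests, total_tests, excursion_sum = set(), 0, 0, 0.0
    for param, values in samples.items():
        low, high = objectives[param]
        for value in values:
            total_tests += 1
            e = excursion(value, low, high)
            if e > 0.0:
                failed_params.add(param)
                failed_tests += 1
                excursion_sum += e
    f1 = 100.0 * len(failed_params) / len(samples)      # scope
    f2 = 100.0 * failed_tests / total_tests             # frequency
    nse = excursion_sum / total_tests                   # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                      # amplitude
    wqi = 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732
    return {"F1": f1, "F2": f2, "F3": f3, "WQI": wqi}

def wqi_category(wqi):
    """Classes as listed in the text, applied to a continuous index value."""
    if wqi >= 95:
        return "excellent"
    if wqi >= 80:
        return "good"
    if wqi >= 65:
        return "fair"
    if wqi >= 45:
        return "marginal"
    return "poor"
```

As a usage example with invented numbers, take four parameters with four tests each, where one BOD test (4.0 against a maximum of 2.0) and one DO test (3.0 against a minimum of 6.0) fail. Then F1 = 50, F2 = 12.5, nse = 0.125, F3 is about 11.1, and the index is about 69.6, which falls in the fair class.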

Results and discussion

Correlation analysis

Pearson correlation analysis has been performed (Table 1) to identify significant correlations among the 24 parameters. COD has a significant positive correlation with ammonia, calcium, conductivity, phosphate, potassium, sodium, sulfate, TDS, TFS, and TSS, which suggests that these constituents have similar sources. DO and pH are negatively correlated with COD. COD is most strongly associated with ammonia and TFS. COD originates from effluent discharged from residential, industrial, and agricultural areas (Bellos & Sawidis, 2005), so an increase in nutrients leads to a decrease in the level of DO. DO has a highly significant positive correlation with pH and is negatively correlated with temperature, turbidity, TFS, and TSS. An increase in temperature accelerates biological processes that consume oxygen and thus decreases the DO level in the water (Brandt et al., 2017). TDS and TFS are highly correlated with constituents such as ammonia, chloride, sodium, potassium, sulfate, and alkalinity. The alkalinity of water reflects the amount of calcium, magnesium, sodium, and potassium, which further controls the level of TDS (which contains Ca2+, Mg2+, K+, Na+, SO42−, Cl−, etc. (Çadraku, 2021)) and TFS in the water (Brandt et al., 2017). It is strongly affected by land washing (Bengraïne & Marhaba, 2003) in the wet season (monsoon), by drainage from urban areas (Alberto et al., 2001), and by irrigation discharges (Liu et al., 2019). Ions such as sodium, potassium, and magnesium have a strong positive correlation with hardness. Water mineralization is controlled by these ions (Varol, 2020). Both natural and anthropogenic sources are responsible for the variation of these ions.

Table 1 Pearson correlation matrix of water quality parameters

Spatial variation

Ammonia, BOD, and COD concentrations are higher at site S5 (Narainpur) than at the four upstream stations (Table 2). Site S5 is located near a tributary that receives sewage from a vast residential area. Domestic effluents increase the ammonia concentration in water (Brandt et al., 2017; Kotti et al., 2005), and oxidation of ammonia contributes to the increase in COD levels (Gradilla-Hernández et al., 2020). The concentration gradually decreases from site S6 onward due to the self-purification process (Varol, 2020). Site S10 receives the largest volume of sewage from large clusters of industries and congested residential areas, so the concentration of pollutants is relatively high at this site. Calcium, magnesium, sodium, potassium, TDS, and alkalinity increase from site S1 to site S6, then decrease, and increase again at site S10. Hardness is controlled by the calcium and magnesium concentrations in water and thus follows the same pattern as calcium and magnesium. Nitrate, phosphate, potassium, alkalinity, chloride, conductivity, TDS, TFS, and hardness are significantly high from sites S6 to S8 due to the concentration of agricultural fields and residential areas in this zone. TSS, turbidity, and coliform (both fecal and total) are significantly higher at sites S1 and S2 than at site S3 due to the vast agricultural fields and residential areas.

Table 2 Mean value of water quality parameters for seasons (a) and sampling sites (b)

Cluster analysis has been used to detect the variation of pollution content along the river from source to mouth. The Hopkins statistic (Lawson & Jurs, 1990) H value is 0.236; thus, the null hypothesis is rejected, and the dataset is suitable for cluster analysis. The result is presented as a dendrogram (Fig. 4). The best number of clusters is chosen on the basis of 30 indices (Charrad et al., 2014), which indicate three clusters. Sites S1, S2, S3, and S4 in the upper reach form the first cluster; sites S5 to S9 and S11 constitute the second cluster; and site S10 forms a separate cluster. Site S10 (Mujher Mana) receives the effluents from the Tamla drain. Sixty percent of Durgapur town’s habitat slopes toward the Tamla drain (Mukhopadhyay & Mukherjee, 2013).

Fig. 4 Cluster dendrogram showing the similarity in pollution concentration among the sampling sites

Seasonal variation

COD, fluoride, potassium, sulfate, TSS, and turbidity are significantly higher in the monsoon season than in the other seasons. F.coliform and T.coliform counts almost double in the monsoon season. These parameters increase as a result of surface runoff from non-point sources. On the other hand, the conductivity level decreases in the monsoon season, and the overall mineral composition of the river water improves. Hardness is significantly lower in the monsoon than in the other seasons, but the water remains in the same hardness class (slightly hard) in all three seasons (Brandt et al., 2017). Chloride, magnesium, phosphate, alkalinity, and TDS do not show any significant seasonal variation.

Seasonal factor analysis (FA) has been performed to identify the most critical seasonal parameters (Mohanty & Nayak, 2017; Ouyang et al., 2006; Pejman et al., 2009). Bartlett’s test and the Kaiser-Meyer-Olkin (KMO) statistic were used to test the suitability of the data for FA. Bartlett’s test is significant (p < 0.001) and the KMO value exceeds 0.73 for all the seasons. Components with an eigenvalue greater than 1 are retained. Seven components are selected for the pre-monsoon and post-monsoon seasons, explaining 75.99% and 72.02% of the variance, respectively, and six are selected for the monsoon season, explaining 63.55%. These components explain more than 60% of the variance, which is sufficient for an environmental dataset (Hair et al., 2010). A parameter with a factor loading of more than 0.75 is considered significant for seasonal variation. Parameters with a factor loading of less than 0.75 show very high uniqueness values, meaning that the factor analysis fails to explain them.

In the pre-monsoon season, component 1 (Table 3(a)) has high loadings on chloride, conductivity, sodium, TDS, and TFS, which suggests pollution related to ionic and salt concentration. Natural and anthropogenic sources are responsible for these constituents. Ammonia, phosphate, and TSS show high loadings on component 2, which is connected to runoff from agricultural fields and sewage effluents (Aliyu et al., 2020; Brandt et al., 2017; Pejman et al., 2009). Magnesium and hardness are strongly correlated with component 3, which indicates the mineral composition of the water. Component 4 reveals the bacteriological characteristics of the water; domestic sources, agricultural fields, and animal farms are responsible for the coliform bacteria in water. Alkalinity is strongly correlated with component 5, which represents the salt concentration of water. Component 6 is correlated with pH. BOD has a very high loading on component 7, which denotes organic pollution caused by effluents from residential areas and industries.

Table 3 Factor loadings for premonsoon (a), monsoon (b), and postmonsoon (c) season

Component 1 of the monsoon season (Table 3(b)) is dominated by conductivity, sodium, TDS, and TFS, which are controlled by erosion and high surface runoff in the monsoon season. Component 2 is characterized by the mineral composition of water (Singh et al., 2004; Vega et al., 1998). The presence of dolomite and anhydrite in the study area (Bengraïne & Marhaba, 2003; Salifu et al., 2012) is responsible for the high contribution of minerals to component 2. Calcium has natural sources in rocks and controls the hardness of the water. Along with the natural sources, anthropogenic activities also increase the level of hardness in water. The pathological character of the water is represented by component 3. Component 6 is highly correlated with alkalinity. The alkalinity of water comprises the sum of all salts (Brandt et al., 2017).

In the post-monsoon season (Table 3(c)), component 1 is dominated by conductivity, sodium, and TDS, which are directly related to the salt characteristics of water. Ammonia, nitrate, phosphate, and TSS dominate component 2. Ammonia comes from sewage from industrial and agricultural sites (Brandt et al., 2017) and from the decomposition of plant and animal matter. Increased nitrate in water is attributed to sewage pollution (Kotti et al., 2005), agricultural runoff (Bu et al., 2010; Kotti et al., 2005), and oxidation of ammonia (Brandt et al., 2017), so an increase in ammonia may increase the amount of nitrate in water. Phosphate comes from multiple sources, including industries, cropland where phosphate-based inorganic fertilizers are used, and phosphate-based household detergents (González et al., 2014). Phosphate-based fertilizers are the most common and are used heavily in all seasons except the monsoon, when the least fertilizer and pesticide is applied. In India, paddy cultivation uses the most fertilizer (about 31.8%) among all agricultural products, of which irrigated cultivation uses 22.2% (FAO, 2005). Phosphate is therefore not an essential parameter in the monsoon season. Components 3 and 4 are dominated by the mineral composition of the watershed and are controlled by natural sources and organic compounds of wastewater (Potasznik & Szymczyk, 2015). Component 5 indicates the pathological pollution in the Damodar River. Components 6 and 7 are highly correlated with temperature and BOD, respectively.

Pathological pollution dominates in all seasons. A very high (> 0.9) loading of BOD is found in the pre-monsoon season and is related to organic pollution. Municipal waste discharge (Saksena et al., 2008; Vega et al., 1998) is the potential source of organic pollution. BOD is not a vital parameter in the monsoon season, which suggests that the increase in the volume of water reduces the oxidation process of organic pollutants.

Discriminant analysis

Discriminant analysis (DA) is used to evaluate the temporal variation of water quality by dividing the dataset into three seasons: pre-monsoon, monsoon, and post-monsoon. Both standard and stepwise methods are used. The standard method is used to discriminate the seasons. The stepwise method is used to extract the variables responsible for the temporal discrimination, based on the Wilks’ lambda criterion (at a significance of p < 0.05). Overall significance tests of Wilks’ lambda are presented in Table 4. The p values indicate a significant temporal classification in both standard and stepwise modes. The first function of the standard DA explains 80.1% of the variance and the second function explains 19.9%. The stepwise method suggests ammonia, DO, potassium, temperature, total coliform, TFS, and turbidity as the parameters responsible for seasonal variation of Damodar River water quality (Table 5(a)). The first DA function in the stepwise method explains 83% of the variability and the second function explains 17%; that is, its first function captures a larger share of the between-season separation than in the standard DA. The accuracy of the model is presented in the confusion matrix. Standard DA and stepwise DA predict temporal classes with 74.4% and 71.5% accuracy, respectively (Table 6(a)).
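The accuracy figures quoted from the confusion matrix are the share of correctly classified samples, that is, the diagonal of the matrix divided by its total. A minimal sketch is given below; the function names are hypothetical and the counts in the usage note are invented, not the values of Table 6.

```python
def overall_accuracy(matrix):
    """Percentage of samples on the diagonal of a square confusion matrix.

    Rows are observed classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total

def per_class_accuracy(matrix, labels):
    """Percentage of each observed class that was assigned to the right class."""
    return {
        label: 100.0 * matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }
```

For an invented matrix `[[40, 5, 5], [6, 70, 4], [8, 2, 60]]`, 170 of the 200 samples lie on the diagonal, so the overall accuracy is 85%.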

Table 4 Overall significance test of Wilks’ lambda for temporal and spatial discriminant analysis
Table 5 DA coefficients for temporal (a) and spatial (b) variation in water quality
Table 6 Confusion matrix for temporal (a) and spatial (b) variation of water quality

The parameters extracted by stepwise DA are plotted in box and whisker plots (Fig. 5). Ammonia is lowest in the monsoon season and highest in the post-monsoon season. Ammonia is associated with agricultural runoff and sewage effluents. The lowest rainfall occurs in the post-monsoon season (11.46 ± 5.09 mm). In the monsoon season, when the average rainfall is 117.44 mm with a standard deviation of 18.23 mm, the dilution effect reduces the amount of ammonia in the water (Varol, 2020). The highest DO level is found in the post-monsoon season and the lowest in the monsoon season. The DO level is associated with water temperature, which is more consistent in the monsoon season. An increase in temperature enhances biological activity, which consumes more DO, whereas a decrease in temperature improves the DO condition in the post-monsoon season (Hajigholizadeh & Melesse, 2017). Municipal and industrial sewage discharges and agricultural runoff are the common sources of potassium in river water (Skowron et al., 2018). Thus, in the monsoon season, potassium increases due to the non-point source (agricultural area), whereas it remains the same in the pre- and post-monsoon periods due to the constant supply of pollutants from point sources. TFS denotes the fixed amount of non-volatile solids in water and does not increase as sharply as turbidity in the monsoon season, when the large quantity of solids carried by surface runoff increases turbidity. The coliform bacteria population increases with surface runoff, so it is highest in the monsoon season and lowest in the post-monsoon season. The bacterial population increases with runoff from non-point sources along with point sources.

Fig. 5 Box plot for water quality parameters on temporal DA

DA is also performed on the spatial variation in the cluster dataset. The significant p values of the standard and stepwise DA indicate a good classification of the cluster datasets (Table 4). In the standard DA, the first function explains 87.8% of the variance and the second function explains 12.2%, which represents an efficient classification of clusters. In the stepwise DA, the first and second functions explain 89.1% and 10.9%, respectively. The confusion matrix shows accuracies of 77.8% and 76.9% for standard DA and stepwise DA, respectively (Table 6(b)). Stepwise DA selects ammonia, BOD, calcium, chloride, conductivity, DO, sodium, sulfate, temperature, alkalinity, TDS, hardness, TSS, and turbidity as responsible for cluster variation (Table 5(b)). The variation of these parameters is represented through box and whisker plots (Fig. 6). Ammonia, BOD, calcium, chloride, conductivity, sodium, sulfate, alkalinity, TDS, hardness, and TSS show the same pattern, with the pollution level increasing from cluster 1 to cluster 3. The DO level is almost the same in cluster 1 and cluster 2 but differs in cluster 3, where the lowest DO level is found because the cluster is highly affected by organic pollution. Temperature also increases from cluster 1 to cluster 3. An increase in the pollution load increases the temperature of the water, as it is controlled by conductivity and other pollutants (Brandt et al., 2017). TSS and turbidity are much higher in cluster 3 than in cluster 1 and cluster 2. In the upper course, the water quality is much better than in the lower course of the Damodar River.

Fig. 6 Box plot for water quality parameters on spatial DA

Water quality index

The WQI is based on ten water quality variables. Water quality is determined both seasonally and spatially (Fig. 7). Poor water quality is found at all sites and in all seasons. The first four sites have almost the same WQI value. Site S10 is the most polluted station on the Damodar River. The lowest water quality is found in the monsoon season. All stations fall into the poor water category, which indicates that the water quality is almost always threatened.

Fig. 7 WQI for sampling sites and seasons

Conclusions

Multivariate statistical techniques show high spatial variation of pollution levels across the sites. Point and non-point sources are primarily responsible for seasonal variation in pollution. Ionic concentration does not vary significantly by season, but it varies spatially. The river is heavily affected by pathological pollution, and the coliform population almost doubles in the monsoon season. FA successfully extracted the seasonally important parameters. Seasonal variation is found mainly in the parameters related to anthropogenic sources. Stepwise DA efficiently extracts the parameters that are sensitive to seasonal and spatial change. The WQI for each station indicates that the river water is not suitable for use. The water quality is worst in the areas dominated by congested urbanization and clusters of large-scale industries; thus, site S10 is the most polluted of all the sites. The water quality deteriorates further in the monsoon season.

The present study offers several suggestions for maintaining the water quality of the Damodar River: (1) Water quality can be improved by controlling direct discharges into the river. (2) Fertilizer use should be controlled, because non-point sources (such as agricultural fields) strongly influence the water quality of this river. (3) Along with regular monitoring, direct action is required to restore the water quality.