Introduction

Water is one of the most important natural resources for human beings and other living creatures on Earth. However, the availability of this precious resource has declined drastically and is now significantly lower than current demand (Jiang et al. 2019). At the same time, the human population is increasing rapidly, placing additional pressure on the planet and its natural resources, including water (Withanachchi et al. 2018). In addition, water resources have been contaminated by the excess accumulation of pollutants in natural surface water bodies, which plays a major role in polluting rivers (Withanachchi et al. 2018). Anthropogenic activities are primarily responsible for the degradation and pollution of natural surface water bodies and surface sediment (Withanachchi et al. 2018). Industrial activities, municipal wastewater discharge, unsustainable agricultural practices, and traffic are the main activities responsible for the pollution of river surface water and sediments (Withanachchi et al. 2018; Lomsadze et al. 2016; Zhang et al. 2012; Allafta and Opp 2020). Water pollution is considered one of the most prominent threats to water quality (Withanachchi et al. 2018; Mohan et al. 1996; Islam et al. 2015). Good water quality (WQ), in contrast, helps human beings enjoy a healthier environment and life (Kumar et al. 2017a; Sharma and Ravichandran 2021). The water quality of Indian rivers, especially the Amba River, has deteriorated over the last several decades as a result of the continual discharge of partially treated or untreated industrial effluent, urban runoff, and sewage from point and non-point sources (Sharma and Ravichandran 2021; Duran and Suicmez 2007; Mishra and Kumar 2020; Mishra et al. 2020; Raja et al. 2008; Sivakumar et al. 2000; Smitha et al. 2007).
Moreover, climatic influences (surface runoff, land erosion, rock weathering) and human interventions (Kumar et al. 2017b) affect the quality of river systems both directly and indirectly, notably in metropolitan areas and in agricultural areas near rural communities (Sharma and Ravichandran 2021; Ayeni 2010; Kankal et al. 2012; Raj and Azeez 2009). This contaminated water contains toxic organic and inorganic pollutants, including heavy metals, which can have severe effects on river water, such as eutrophication and ecological degradation (Sharma and Ravichandran 2021; Alkarkhi et al. 2009; Kumar et al. 2010; Muduli et al. 2006; Sinha et al. 2006; Yerel and Ankara 2012).

A petrochemical complex was developed at Nagothane on the western bank of the Amba River estuary and began commercial production in March 1990. The wastewater generated in the process facilities, together with domestic sewage, is treated and discharged into the estuarine zone of the Amba River, some 25 km downstream from the manufacturing site. The wastewater disposal site was selected on the basis of detailed oceanographic investigations conducted in 1986. These studies determined that the treated wastewater, when discharged at the proposed site, would have no substantial impact on the then-existing quality of the marine ecosystem (Dineshkumar and Sarma 1989; Dineshkumar et al. 1991; Zingde 1994). In 1989, the investigations were expanded to assess the estuary’s biological production potential, both to develop a reliable database against which future comparisons could be made and to ensure that the proposed wastewater release would not have a negative impact on the ecosystem, including fisheries (Gajbhiye et al. 1995). Several factors contribute to water pollution in the Amba River basin, and the Amba has been identified as one of the most polluted rivers in Maharashtra, India. The river originates in the Sahyadri hills of the Western Ghats near the small village of “Padghawali”, flows westward, and eventually meets the Dharamtar creek of the Arabian Sea (MPCB 2019). Six industries operate in the Amba River catchment area and depend on water from the river for process, domestic, and irrigation purposes. The discharge of industrial effluent and wastewater is a significant problem that may pollute the surface water and sediment of the Amba River (MPCB 2019).

The Amba River also receives a large amount of untreated sewage from several villages situated on its banks. Pali, Nagothane, Wadkhal, Dolvi, and Gadab are the main villages in the river’s catchment area; smaller settlements such as Parali, Padghawali, Amnori, Welshet, Bense, Chole, Sambri, Kurdup, and Kharghatis also lie in the catchment. No sewage treatment plant is available (MPCB 2019), which increases the severity of the pollution. In addition, the selected study area is located near the industrial area.

Despite the implementation of several pollution control methods, natural water bodies remain extensively contaminated as a result of the discharge of insufficiently treated wastewater from industry and sewage from nearby settlements. In light of this situation, the primary goal of this study is to evaluate the influence of industrial wastewater and sewage disposal on the surface water quality of the Amba River basin.

Traditionally, all the measured chemical parameters are used to compute the water quality index (WQI), although in most cases only a few parameters drive the quality of the water. Various researchers have therefore used statistical methods to support the choice of parameters, most commonly principal component analysis (PCA) and cluster analysis (CA). The final choice of parameters results from combining the statistical analysis with domain knowledge. Wei et al. (2016) used PCA to determine the parameters for assessing the geographical and temporal patterns of water quality in the Dongjiang River, China. Zeinalzadeh and Rezaei (2017) used PCA to determine geographical and temporal changes in surface water quality. Krükrer and Mutlu (2019) applied PCA and CA to assess surface water quality using a water quality index in Saraydüzü Dam Lake, Turkey. Matta et al. (2020) used PCA and CA to reduce the number of parameters needed to build a WQI for the Ganga River system in Uttarakhand, India.

Materials and methods

Study area

The Dolvi area near Pen Taluka in the Raigad District of Maharashtra is positioned between 18° 42′, 86° 76′ N latitude and 73° 02′ 47.3496 E longitude at an altitude of 3.5 m above sea level; the annual average rainfall is 2124.9 mm, and the annual average humidity is 77%. The village is located in a hollow on the right bank of the Amba River, is surrounded by wooded hills, and includes an industrial area. Given the nature and location of the project, the existing sub-creek and the Amba River are the critical water bodies under consideration. A total of 13 sampling sites were selected for the collection of water samples: Bori (SW1), Port Start (SW2), Port Middle (SW3), Port End (SW4), Bridge (SW5), Ghaswad (SW6), Company End-Kasumata (SW7), Company Middle (SW8), Company Start-Helipad (SW9), Dehen (SW10), Kachali (SW11), Shreegaon Dam (SW12), and Jalashi (SW13). Their locations are indicated in Fig. 1.

Fig. 1 Location map

Physicochemical properties of the water samples

The samples were collected in 1-L pre-cleaned plastic cans using a Niskin sampler at 0–10 cm depth in three seasons, namely December 2018 (winter), May 2019 (summer), and December 2019 (winter). After collection, the samples were brought to the laboratory with the utmost care (APHA 2005) and used for further analysis (Trivedi and Goel 1984). The physicochemical properties were analyzed in two phases, considering the analytical ease and preprocessing requirements of the parameters. First, sensitive parameters such as pH and electrical conductivity (EC) were measured on site with handheld instruments. The remaining parameters were analyzed in the laboratory using the standard prescribed methods (APHA 2005): sodium, total alkalinity, total hardness, calcium, magnesium, chlorides, sulfate, nitrate, phosphate, chemical oxygen demand, and biochemical oxygen demand.

Statistical analysis

The current data set contains vital information about the behavior of the Amba River water. PCA followed by CA helps to identify the important parameters among all the measured parameters, so the statistical analysis plays a supporting role in choosing the parameters for calculating the WQI. PCA provides a unique solution to reconstruct the original data by identifying patterns in the data and expressing the data in such a way as to highlight their similarities and differences (Singh et al. 2004 and 2005; Kowalkowski et al. 2006; Mukkannawar 2016). PCA is a multivariate technique used for data reduction. It explains the total variation in the data with a few equations derived from the original variables. These equations are called principal components (PCs), and each PC is a linear combination of all the selected variables. Let X1, X2, …, Xn be the original variables; in the current study, these are all the chemical parameters measured in the experiment. The PCs are then expressed as follows.

$$\mathrm{PC}1={a}_{11} {X}_{1}+{a}_{12}{X}_{2}+{a}_{13}{X}_{3}+\dots +{a}_{1n}{X}_{n}$$
(1)
$$\mathrm{PC}2={a}_{21} {X}_{1}+{a}_{22}{X}_{2}+{a}_{23}{X}_{3}+\dots +{a}_{2n}{X}_{n}$$
(2)

and the last PC is expressed as

$$\mathrm{PC}n={a}_{n1} {X}_{1}+{a}_{n2}{X}_{2}+{a}_{n3}{X}_{3}+\dots +{a}_{nn}{X}_{n}$$
(3)

where aij (i = 1, 2, …, n; j = 1, 2, …, n) are the coefficients of the PCs, also called loadings. A loading indicates the influence of a variable on a PC and therefore its importance in that PC. For example, if a11 = 0.66 or −0.66 and a12 = 0.21 or −0.21, then the variable X1 has more influence on PC1 than X2; the other loadings are interpreted similarly.
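As an illustration of how loadings such as a11, a12, …, a1n in Eq. (1) can be obtained, the following is a minimal sketch rather than the procedure used in this study (which relied on MINITAB 17). It assumes that the PCA is run on the correlation matrix of standardized variables and uses power iteration to extract only PC1. The three variables and all their values are hypothetical and were invented for the example.

```python
import math

def standardize(columns):
    """Z-score each variable so that measurement units do not affect the PCA."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of already standardized variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_pc(corr, iterations=500):
    """Power iteration: unit-length loadings of PC1 and the matching eigenvalue.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    p = len(corr)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return vec, eigenvalue

# Hypothetical values for three variables at six sites (not data from this study).
x1 = [7.1, 7.4, 7.9, 8.2, 8.6, 8.9]
x2 = [110, 150, 190, 240, 300, 330]
x3 = [1.2, 0.4, 1.9, 0.8, 1.5, 0.6]

corr = correlation_matrix(standardize([x1, x2, x3]))
loadings, eigenvalue = first_pc(corr)
explained = eigenvalue / len(corr)  # share of total variance carried by PC1
print([round(a, 2) for a in loadings], round(explained, 2))
```

In this toy data set X1 and X2 rise together while X3 does not, so the first two loadings come out large in absolute value and the third close to zero, which is the pattern read from a loading as described above.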

Cluster analysis (CA) is a multivariate technique whose primary purpose is to identify and classify groups of samples that show similar characteristics, i.e., look-alike groups (Kamble and Vijay 2011). Because neither the group membership of the objects nor the number of groups is known before the computation starts, cluster analysis is an unsupervised technique. It aids in the interpretation of complicated data matrices and the identification of potential influencing factors in water systems, making it a useful tool for dependable water resource management and the rapid solution of pollution issues (Einax et al. 1998; Wunderlin et al. 2001). Data standardization, the first step, increases the influence of variables with small variance and reduces the influence of variables with large variance (Gupta and Christopher 2009). Hierarchical clustering was performed to cluster both the chemical parameters and the sampling locations (sites). Because the parameters are measured in different units, they would ordinarily have to be made unitless or converted to a common unit of measurement. This problem was avoided by using Ward linkage with the correlation coefficient distance, which places highly correlated parameters in the same cluster. The correlation coefficient is scale-invariant, so the method works well even though the parameters are measured on different scales. Dendrograms were generated to visualize the clusters; the dendrogram is a highly interpretable graphical format and is therefore widely used (Hastie et al. 2001). All statistical analyses were performed using MINITAB 17 statistical software (Montgomery 2017).
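The scale invariance of the correlation coefficient distance can be checked with a small sketch. It assumes that the distance between two parameters is defined as 1 − r, where r is the Pearson correlation coefficient; this is one common definition, and the text above does not state the exact formula used. The parameter values are hypothetical, and the sketch only finds the closest pair of parameters (the first merge of an agglomerative clustering), not the full Ward dendrogram.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """Assumed definition: 0 for perfectly correlated series, 2 for opposite ones."""
    return 1.0 - pearson(x, y)

# Hypothetical values at five sites (not data from this study).
cod = [113, 150, 220, 310, 375]      # mg/L
bod = [3.0, 5.5, 9.0, 13.0, 15.0]    # mg/L
ph = [8.9, 7.4, 8.1, 7.2, 8.6]       # unitless
params = {"COD": cod, "BOD": bod, "pH": ph}

# Changing the unit of COD from mg/L to g/L leaves the distance unchanged.
cod_g = [v / 1000.0 for v in cod]
d_mg = correlation_distance(cod, bod)
d_g = correlation_distance(cod_g, bod)

# The pair with the smallest distance would be merged first.
closest_pair = min(
    combinations(params, 2),
    key=lambda pair: correlation_distance(params[pair[0]], params[pair[1]]),
)
print(closest_pair, round(d_mg, 4), round(d_g, 4))
```

With a Euclidean distance on raw values, the same unit change would alter the distances, which is why standardization or a scale-invariant distance is needed when parameters have different units.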

WQI

The WQI is an index that reflects the combined impact of the several water quality parameters considered in its computation (Chaurasia et al. 2018; Tyagi et al. 2013; Table 2).

Let,

$${RW}_{i}=\frac{{W}_{i}}{\sum_{i=1}^{n}{W}_{i}}$$
(4)

Each parameter is assigned a weight (Wi) from 1 to 5, where 1 is the smallest and 5 is the largest. The relative weight (RWi) is computed by dividing the weight of the respective parameter by the total weight of all n parameters, as per Eq. (4). The quality rating (Qi) of each parameter is calculated by dividing its measured concentration (Ci) by the limit value given in the WHO drinking water criterion (Si) and multiplying by 100 (Eq. 5). The WQI is the sum of the quality ratings weighted by the relative weights (Eq. 6).

$${Q}_{i}=\frac{{C}_{i}}{{S}_{i}}\times 100$$
(5)
$$WQI={\sum }_{i=1}^{n}{RW}_{i}{Q}_{i}$$
(6)
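Equations (4) to (6) can be sketched in a few lines of code. This is a minimal illustration, not the authors’ implementation: the parameter names, weights, limit values, and concentrations below are placeholders invented for the example and are not the values of Table 1 or the weights of any cited source.

```python
def relative_weights(weights):
    """Eq. (4): RW_i = W_i / sum of all W_i."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def quality_rating(concentration, standard):
    """Eq. (5): Q_i = C_i / S_i * 100."""
    return concentration / standard * 100.0

def water_quality_index(concentrations, standards, weights):
    """Eq. (6): WQI = sum of RW_i * Q_i over the selected parameters."""
    rw = relative_weights(weights)
    return sum(
        rw[name] * quality_rating(concentrations[name], standards[name])
        for name in weights
    )

# Placeholder inputs (hypothetical, for illustration only).
weights = {"param_a": 1, "param_b": 3}               # W_i on the 1 to 5 scale
standards = {"param_a": 10.0, "param_b": 50.0}       # S_i, assumed limit values
concentrations = {"param_a": 5.0, "param_b": 100.0}  # C_i, assumed measurements

wqi = water_quality_index(concentrations, standards, weights)
wqi_at_limit = water_quality_index(standards, standards, weights)
print(wqi, wqi_at_limit)
```

A sample in which every parameter sits exactly at its limit value gives a WQI of 100, so values above 100 indicate that the weighted concentrations exceed the limits on average.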

The parameters for WQI are chosen using PCA and CA.

Variables selected for index: pH, NO3, SO4, TDS, chemical oxygen demand (COD), and biochemical oxygen demand (BOD).

The following WHO standards were used in the present study (Table 1).

Table 1 Surface water quality standard by WHO

Results and discussion

pH

The pH values in winter 2018 (W18) ranged from 7.1 ± 0.2 to 8.9 ± 0.3; the minimum was observed at SW1 and the maximum at SW6 and SW8. In summer 2019 (S19), the pH varied between 7.5 ± 0.05 and 8.6 ± 0.2, with the minimum at SW5 and the maximum at SW9. In winter 2019 (W19), the pH ranged from 7.4 ± 0.05 to 9.3 ± 0.07, with the minimum at SW1 and the maximum at SW8. Across the three seasons, SW1, SW6, and SW5 had a fairly neutral pH, whereas SW8 and SW9 were alkaline (Fig. 2A). A possible explanation is that seawater intrusion during high tides leads to a rise in pH. Moreover, the discharge of industrial waste and sewage directly into river water may raise pH levels (Jadhavar et al. 2013; ManiMegalai and Muthalakshmi 2006). A slight increase in pH indicates a progressive transition from weakly alkaline to moderately alkaline water. The minor rise in pH can be ascribed to the lack of CO2 caused by the rain-fed recharge of the aquifer being cut off (Jadhavar et al. 2013; Pondhe et al. 1997; Deshmukh and Pawar 2000). Changes in temperature, biological activity, industrial waste disposal, and photosynthetic activity affect the pH of water (Begum and Harikrishna 2008). These conditions may produce a significant variation in pH, affecting the water’s potability (Jadhavar et al. 2013; ManiMegalai and Muthalakshmi 2006). Moreover, chemical processes in aquatic habitats, such as acid–base reactions, solubility reactions, oxidation–reduction reactions, and complexation, are mostly governed by pH (Saalidong et al. 2022).

Fig. 2 Analysis of physicochemical parameters of surface water quality of Amba River (sample code SW is indicated as S)

Electrical conductivity

Electrical conductivity (EC) is used to determine the concentration of dissolved ionic or soluble salts, fertilizers, and chemical-containing solutions. The EC values of W18 ranged from 0.76 ± 0.01 to 42.75 ± 2.69 mS/cm; the minimum was observed at SW1 and the maximum at SW2. The EC values of S19 ranged from 0.39 ± 0.01 to 54.14 ± 0.27 mS/cm; the minimum was recorded at SW12 and the maximum at SW10. The EC values of W19 ranged from 7.37 ± 0.02 to 8.31 ± 0.92 mS/cm; the minimum was recorded at SW2 and SW11, and the maximum at SW8 (Fig. 2B). A sudden rise in the conductivity of a stream indicates a nearby source of dissolved ions. The number of ions in water is proportional to the amount of dissolved solids (Bhatt et al. 1999). The breakdown and mineralization of organic materials result in higher conductivity and cation concentrations (Begum and Harikrishna 2008). Seasonal variations showed a higher value of EC in the pre-monsoon season and a lower value in the monsoon season due to dilution with rainwater (Patel and Parekh 2013). Similar results were reported by Srivastava and Srivastava (2011) and Patel and Parekh (2013). The higher values of EC are attributed to the release of wastewater, sewage, and agricultural runoff into the water body (Sharma and Ravichandran 2021). EC has been reported to correlate strongly with the TDS present in surface water and moderately with NO3 (Saalidong et al. 2022).

TSS

The total suspended solids (TSS) of W18 ranged from 15 ± 2 to 266 ± 5 mg/L; the minimum was observed at SW8 and the maximum at SW13. In S19, TSS varied between 12.55 ± 2.45 and 172.25 ± 69.75 mg/L; the minimum was observed at SW12 and the maximum at SW2. The TSS of W19 ranged from 103 ± 1 to 4505 ± 833 mg/L, with the minimum at SW12 and the maximum at SW2 (Fig. 2C). In W19, TSS levels crossed the permissible limit, which may be due to increased discharge of wastewater and sewage into river streams (Kinuthia et al. 2020; Aniyikaiye et al. 2019). Rainfall is another factor: significant rainfall affects water flow, which in turn affects turbidity (Wetzel 2001). Rainfall can increase stream volume and streamflow, causing settled sediments to be re-suspended and riverbanks to erode. Runoff from rain can also raise the quantity of total suspended solids, because water picks up particles as it passes over a surface and deposits them in a water body (Gray et al. 2000). Topsoil can be washed away by runoff, contributing to riverbank erosion (Langland and Crolin 2003). If the flow rate increases enough, bottom sediments can be re-suspended, boosting TSS concentrations further (Fondriest Environmental Inc. 2014).

TDS

Total dissolved solids (TDS) measure the combined content of all inorganic and organic compounds present in a liquid in molecular, ionized, or micro-granular (colloidal sol) suspended form (Hussain 2019). The TDS values of W18 ranged from 105 ± 8 to 31,116.5 ± 1657.5 mg/L; the minimum was recorded at SW12 and the maximum at SW2 (Fig. 2D). The TDS values of S19 ranged from 207 ± 2 to 28,745 ± 105 mg/L, with the minimum at SW12 and the maximum at SW10. The TDS values of W19 varied between 21 ± 1 and 852.5 ± 787.5 mg/L; the minimum was observed at SW12 and the maximum at SW13. TDS readings are diluted by runoff water at high tide, and most rivers show an inverse relationship between discharge rate and TDS (Charkhabi and Sakizadeh 2006). The S19 TDS values crossed the permissible limit, which may be due to excess deposition of industrial wastewater, sewage, and sediment (Kinuthia et al. 2020; Aniyikaiye et al. 2019). In addition, low water levels during the summer season increase TDS values, whereas high water levels during the rainy season decrease them (Moniruzzaman et al. 2009).

COD

Chemical oxygen demand (COD) measures the oxidation of reduced compounds in water and is often used as an indirect measure of the amount of organic compounds in water (Kumar et al. 2011). It expresses the total quantity of oxygen required to oxidize all organic material into carbon dioxide and water. COD values are always greater than BOD values. The COD values varied across all three seasons (Fig. 2E). The COD values of W18 ranged from 113 ± 2 to 375.5 ± 0.55 mg/L, with the minimum at SW12 and the maximum at SW3. The COD values of S19 ranged from 110 ± 2 to 313.5 ± 1.5 mg/L; the minimum was recorded at SW12 and the maximum at SW2 and SW3. The COD values of W19 varied between 111 ± 1 and 346 ± 1 mg/L; the minimum was recorded at SW9 and the maximum at SW4. Overall, the COD values obtained in the study area exceeded the limits prescribed by WHO (1993, 2004) and BIS (2012). Excess discharge of industrial effluent and untreated sewage is the primary cause of the higher COD levels of the river water in the study area (MPCB 2019). The introduction of organic matter through plant extracts, solid waste disposal, sewage discharge, and agricultural runoff is responsible for the fluctuation in COD (Sharma and Ravichandran 2021).

BOD

Biochemical oxygen demand (BOD) is a measure of the dissolved oxygen that aerobic organisms require. The biodegradation of organic molecules raises the biological oxygen requirement by increasing oxygen stress in the water (Patel and Parekh 2013; Begum and Harikrishna 2008). The BOD values of W18 ranged from 2.9 ± 4.5 to 15.05 ± 0.5 mg/L; the minimum was observed at SW11 and the maximum at SW4. The BOD values of S19 ranged from 7.3 ± 0.4 to 14.2 ± 40 mg/L, with the minimum at SW5 and the maximum at SW3. The BOD values of W19 varied between 3 ± 1 and 20.5 ± 2.5 mg/L, with the minimum at SW9 and the maximum at SW8 (Fig. 2F). BOD values exceeded the permissible limit given by WHO in all three seasons. BOD testing measures only the biodegradable fraction of a water sample’s total potential dissolved oxygen (DO) consumption. Because bacteria consume the available oxygen in the water, high BOD levels imply a decrease in DO, making fish and other aquatic species unable to thrive in the river (Pathak and Limaye 2011).

Alkalinity

Alkalinity is the capacity of water to neutralize acids; in natural water, it arises predominantly from the salts of weak acids. Organic alkalinity is mainly made up of hydrogen sulfide, carbonates, and bicarbonates. Carbon dioxide reacts with calcium or magnesium carbonate in the soil to produce significant volumes of bicarbonates (Patel and Parekh 2013). The alkalinity values of W18 ranged from 320 ± 10 to 1110 ± 10 mg/L; the minimum was observed at SW10 and the maximum at SW3. The alkalinity values of S19 varied from 131 ± 1 to 2240 ± 480 mg/L; the minimum was observed at SW9 and the maximum at SW3. The alkalinity values of W19 ranged from 51 ± 1 to 114 ± 2 mg/L; the minimum was observed at SW12 and the maximum at SW1 (Fig. 2G). The values of all three seasons exceeded the permissible limit given by WHO. Alkalinity increased due to the presence of organic acids such as humic acid, which produce salts. Although alkalinity has a limited public health impact, extremely alkaline water is unpalatable and causes gastrointestinal pain (Patel and Parekh 2013).

Total hardness

Total hardness (TH) describes the influence of dissolved minerals, mainly Ca and Mg, in water. The criteria for usable water vary with its use, i.e., household, industrial, or drinking, owing to the addition of calcium bicarbonates, sulfates, chlorides, and nitrates (Thrtth et al. 1949). The hardness values of W18 ranged from 733 ± 2 to 7707.5 ± 1907.5 mg/L; the minimum was observed at SW8 and the maximum at SW2. The hardness of S19 ranged from 67 ± 1 to 4086.5 ± 48.5 mg/L; the minimum was observed at SW12 and the maximum at SW13. The hardness values of W19 varied between 0 and 850 ± 350 mg/L; the minimum was observed at SW6 and the maximum at SW13 (Fig. 2H). Hardness values exceeded the permissible limit given by WHO in all seasons. Riverside activities and the discharge of domestic and industrial waste are possible causes of the increase in TH levels (Sharma and Ravichandran 2021).

Cl

Chlorides are present in all types of water. In natural freshwater, however, their concentration is often lower than that of sulfates and bicarbonates. As a result, chloride concentration is used as a pollution indicator. Because sewage water and industrial effluent are prevalent in Island, their discharge causes excessive chloride levels in freshwater (Hasalam 1991). The chloride (Cl) values of W18 ranged from 57 ± 1 to 13,913.5 ± 515.5 mg/L; the minimum was observed at SW12 and the maximum at SW3. The Cl levels of S19 varied between 10 and 20,459.5 ± 245 mg/L; the minimum was observed at SW12 and the maximum at SW2. The Cl of W19 ranged from 24.8 ± 0.1 to 3536.1 ± 1267.3 mg/L; the minimum was found at SW9 and the maximum at SW2 (Fig. 2I). In all three seasons, the concentration of Cl exceeded the permissible limit given by WHO. A possible reason for the increase in Cl concentration is the indiscriminate disposal of industrial waste, sewage, and agricultural runoff (Kaushal 2009). Elevated chloride concentrations in aquatic and terrestrial ecosystems can have various ecological impacts: they can cause stream acidification, mobilize harmful metals from soils through ion exchange, affect the mortality and reproduction of aquatic plants and animals, change the community composition of plants in riparian areas and wetlands, and enable the invasion of saltwater species into freshwater habitats (Kaushal 2009).

PO4

The phosphate (PO4) values of W18 ranged from 4 ± 0.1 to 7.3 ± 0.2 mg/L; the minimum was observed at SW4 and SW11, and the maximum at SW1. The PO4 values of S19 varied between 0.095 ± 0.005 and 2.37 ± 0.02 mg/L; the minimum was observed at SW4 and the maximum at SW6. The PO4 values of W19 varied between 1 and 402.05 ± 40.75 mg/L; the minimum was observed at SW12 and the maximum at SW2 (Fig. 2J). Phosphorus is widely used as a fertilizer in agriculture and is a key component of detergents, primarily those for residential use. As a result, runoff and sewage discharges are significant sources of phosphorus in surface waterways (IEPA 2001). Extensive farming in the study area, which serves as a source of phosphates, might explain the greater phosphate content during the monsoon season. Similarly, animal waste, agricultural waste, and detergents in domestic wastewater contribute to the phosphate increase (Anda et al. 2001).

SO4

The dissolution of sulfate minerals, the oxidation of pyrite and other reduced forms of sulfur, the oxidation of organic sulfides in natural soil processes, and human inputs such as fertilizers are all sources of dissolved sulfate (SO4) (Grasby et al. 1997). The SO4 values of W18 ranged from 5.67 ± 0.02 to 55 ± 23.11 mg/L; the minimum was observed at SW1 and the maximum at SW3. The SO4 values of S19 varied between 21.65 ± 3.95 and 39.35 ± 0.45 mg/L; the minimum was observed at SW7 and the maximum at SW2. The SO4 values of W19 ranged from 4.6 ± 0.2 to 79.15 ± 6.95 mg/L; the minimum was observed at SW12 and the maximum at SW2 (Fig. 2K). An increase in concentration is due to the biological oxidation of reduced sulfur species to sulfate. The discharge of industrial wastes and domestic sewage into water bodies tends to raise the concentration of sulfate (Trivedi and Goel 1984). Summer and fall have higher EC, Cl, and SO4 levels, whereas spring and winter have lower levels (ZareGarizi et al. 2011).

NO3

The nitrate (NO3) values of W18 ranged from 0.2 ± 0.1 to 2.1 ± 0.1 mg/L; the minimum was observed at SW10 and the maximum at SW1. The NO3 values of S19 varied between 0.2 ± 0.1 and 1.9 ± 0.1 mg/L; the minimum was observed at SW12 and the maximum at SW4. The NO3 values of W19 ranged from 0.02 to 4.6 ± 0.2 mg/L; the minimum was observed at SW1 and the maximum at SW9 (Fig. 2L). Nitrate in surface water is a significant component in determining water quality (Jhones and Burt 1993); it originates mainly from waste discharges and synthetic nitrogen fertilizers, while plant nitrogen fixation and bacterial oxidation contribute a smaller proportion (IEPA 2001; Butekar et al. 2018). Higher nitrate levels were ascribed to nitrate-rich runoff from agricultural areas and a considerable volume of toxic sewage during the monsoon season (Mithani et al. 2012; Butekar et al. 2018).

PCA and CA

The PCA and CA methods have been applied to choose the most influential chemical parameters (Table 2).

Table 2 Water quality index (WQI) classes (

Part A of Figs. 3, 4, and 5 shows the loading plot for the first two PCs. Because the first two PCs extract most of the information from the data, the loadings are plotted for PC1 and PC2 only; the loading plot is simply a scatter plot of the PC1 and PC2 coefficients. The loading plot gives an idea of the influential variables (chemical parameters). Variables that lie relatively close to each other (separated by a small angle) are correlated and provide similar information, so any one of them can be chosen rather than all of them. Variables that differ markedly in size and direction, however, are more impactful and should be included in the analysis. Figure 3A shows that COD, BOD, TDS, and SO4 have a strong positive influence and pH a negative influence on PC1, whereas NO3 has a strong positive influence and pH a strong negative influence on PC2. COD and SO4 have a negative influence on PC2, and the rest of the parameters do not have a strong influence. PO4 influences PC2 but has little influence on PC1. The loading plot and the cluster analysis were used together to get a rough idea of the important parameters, and the inference from these methods combined with expert opinion decided the final choice of parameters. For each season, six parameters (variables) were chosen to form the WQI. Of these six, four to five were chosen purely on the basis of PCA and CA and one or two on the basis of domain knowledge; for example, SO4 was chosen instead of PO4 for the winter season of 2018 because PO4 has less weight in the literature than SO4 (Kannel et al. 2007) and a smaller loading on PC1. Loadings on PC1 take precedence over loadings on other PCs because PC1 has a higher information content than the others. The parameters for the other seasons were chosen, and the loading plots interpreted, in the same way. Part B of these figures shows the dendrograms of the cluster analysis carried out for the chemical parameters. The dendrograms are self-explanatory and give the classification of the parameters according to the methods used. The classification is subject to minor change if the clustering method is changed, so domain knowledge is also needed to finalize the classification. Using the information from parts A and B of these figures, the most influential parameters were chosen to form the WQI. The variables selected for the three seasons differ because the influence of the variables changes with the season.

Fig. 3 A PCA loading plot. B Cluster analysis for sites (dendrogram: Ward linkage, Euclidean distance). C Correlation coefficient distance of season 1, W18 (December 2018)

Fig. 4 A PCA loading plot. B Cluster analysis for sites (dendrogram: Ward linkage, Euclidean distance). C Correlation coefficient distance of season 2, S19 (May 2019)

Fig. 5 A PCA loading plot. B Cluster analysis for sites (dendrogram: Ward linkage, Euclidean distance). C Correlation coefficient distance of season 3, W19 (December 2019)

Different parameters were selected for different seasons on the basis of the PCA and CA. For season 1 (W18), pH, TDS, BOD, SO4, NO3, and COD are influential and were chosen to construct the WQI (Fig. 3A, B). For season 2 (S19), pH, COD, Cl, BOD, SO4, and NO3 were selected (Fig. 4A, B). Similarly, for season 3 (W19), pH, TSS, TDS, alkalinity, BOD, and NO3 were chosen (Fig. 5A, B). The sampling locations were the same throughout the study period, but the dominating parameters changed from season to season; hence, seasonality has a significant role in water pollution. The weights of these parameters used to form the WQI were taken from Kannel et al. (2007). Figures 3C, 4C, and 5C are the dendrograms for clustering the sites based on the data. The clustering of the sites also depends strongly on the season. Sites in the same cluster behave similarly in terms of pollution level.

These locations are comparatively less polluted because of less habitation near the upper-side beaches and well-connected city sewerage systems at the lower-side beaches of Mumbai. Cluster II includes Dadar, corresponding to the MP site, which receives untreated sewage and wastewater discharges from non-point and point sources. Marginal variation in water quality was observed at Dadar during the post-monsoon, winter, and pre-monsoon seasons. The nallah at Cleveland Bunder, carrying about 54 MLD of untreated wastewater, joins the seashore between Worli and Dadar (Dhage 2006). Cluster III, Mahim, corresponds to the HP site, which receives heavy pollution through the Mithi river, which carries garbage, raw sewage, and industrial and hazardous waste with a huge organic load (MPCB 2019; Bhagat et al. 2006; Kamini et al. 2006).

WQI

The WQI values for the various sampling sites and the three seasons show significant seasonal variation in water quality (Fig. 6). For season 2, the water quality is below poor and hence not suitable for direct use at any location. Because it is summer, no dilution occurs, and the pollutants therefore constitute a significant part of the water. A similar study reported the same result (Sudarshan et al. 2019).

Fig. 6 WQI for three seasons against various sites

Conclusion

River health is vital for livelihoods and for maintaining the associated ecosystem. The surface water of the Amba River was significantly contaminated, as evidenced by the water quality index study and assessment. Parameters such as TDS, COD, BOD, total alkalinity (TA), TH, Cl, and PO4 were above the permissible limit at all sites and in all seasons. Hence, the river’s water quality fell under the poor category and was unsuitable for drinking purposes. The causes are the discharge of poorly treated effluent, sewage, and agricultural runoff. The use of this water may have a disastrous impact on human health, the river ecosystem, and the agricultural land of the surrounding area. As the health of the river is threatened, there is a need to restore its natural water quality and flow and to clean up the polluted water. The WQI shows that water pollution occurs in all seasons and that the water is not suitable for drinking at all. We therefore conclude that sewage treatment facilities should be provided to the nearby villages and that industrial effluent should be adequately treated before it is discharged into the natural water body. Moreover, statistical methods are useful for making optimal use of the data.