1 Introduction

Numerous epidemiological studies of waterborne illness in countries such as India indicate that the common aetiological agents are bacteria, viruses and parasitic protozoa (Hamner et al. 2007; Patel et al. 2002), and the presence of faecal indicator bacteria (FIB; coliforms and enterococci) is an important parameter for determining water pollution levels. However, the ubiquitous nature of FIB, especially coliforms, most of which are environmental contaminants, has raised doubts about their reliability as indicators of pathogens (CPCB 2002). Poor relationships have also been reported between FIB and waterborne human viruses or protozoa (CPCB 2002; Schriewer et al. 2015), which can give a misleading impression of water quality. Hence, the Central Pollution Control Board (CPCB) has recently used biochemical oxygen demand (BOD), the amount of dissolved oxygen consumed by microorganisms as they break down organic matter, as a reliable parameter for estimating river quality (CPCB 2011; CPCB 2015). However, such assessments are based on individual parameters and do not provide a complete picture of the situation (Ali et al. 2014; Rosemond et al. 2009).

Furthermore, many epidemiological studies have failed to find a correlation between human health outcomes and FIB levels, particularly when the pollution does not come from a known point source such as a wastewater treatment plant (WWTP) (Dwight et al. 2004; Colford et al. 2007). FIB counts also provide no information about the pollution source. Nevertheless, water regulatory agencies have yet to come to terms with the problems inherent in relying on a single parameter as currently determined. Hence, several water quality indices (WQI) have been developed globally to monitor freshwater quality for direct human consumption and other uses (Sun et al. 2016). Because a wide variety of chemical, physical and biological water quality parameters exists, several researchers have proposed other parameters or markers that can be incorporated into the WQI so that both the level and the source of pollution can be identified (Zeinalzadeh and Rezaei 2017). New index organisms for some pathogens, such as C. perfringens, bacteriophages and viruses such as adenoviruses, look promising. As these index organisms remain relatively untested worldwide, and especially in India, extensive trials are needed before they are generally accepted in microbial risk assessment. In addition, most microbial source tracking (MST) studies have focused on faecal contamination from human sources and on tracing it back to its origin. As understanding of zoonotic potential expands, it will be worthwhile to include water contamination from animal sources and its backtracking as well.

In India, many studies have used multiple parameters to construct water quality indices for assessing water quality (Bora and Goswami 2016; Ponsadailakshmi et al. 2018), and in Punjab most of the extensive studies have addressed heavy metal contamination (Setia et al. 2020). However, none of these studies has conclusively established which parameters should be included in a water quality index to inform water quality policy. Moreover, FIB determination alone does not reveal the pollution source. Hence, effective methodologies are needed for evaluating groundwater and surface water resources for sustainable development and the protection of human health. In the current study, water pollution levels were therefore monitored in midstream and at both riverbanks of the river Sutlej, as well as in its most important tributaries. To track the origin of faecal pollution, host-specific genetic faecal markers for different host groups were determined simultaneously at all sampling sites along the river and the tributaries.

Lastly, the simultaneous detection of human-specific markers (such as Bacteroides thetaiotaomicron, Methanobrevibacter smithii, human polyomaviruses [HPyVs] and human adenoviruses [HAdVs]) together with E. coli and enterococci supports the likelihood of recent faecal contamination, since human-specific markers are unable to regrow in natural surface waters (Liang et al. 2015; Das Kangabam et al. 2017). Multiple linear regression results further confirm that including HAdVs alongside traditional indicators better predicts the occurrence of pathogens. Further study is needed to determine whether such models apply to other geographical locations and environmental conditions (Liang et al. 2015; Farkas et al. 2018).

Thus, the current study monitored faecal coliforms, enterococci, human, bovine and porcine adenoviruses (HAdVs, BAdVs and PAdVs) and other physicochemical properties in surface water bodies. The objectives of this study were (1) to quantify FIB and adenoviral markers in surface water, (2) to determine the physicochemical properties of the surface water and (3) to test for correlations among physicochemical parameters, FIB and adenoviral markers using statistical models.

2 Material and Methods

2.1 Surface Water Samples from Canals and River Sutlej

The river Sutlej flows through the Ludhiana district of Punjab, and the Sidhwan and Neelon-Ropar canals are tributaries that join it. Samples were collected from these canals and the river Sutlej (Fig. 1) from February 2017 to January 2018. A total of 52 surface water samples (17 each from the Sidhwan and Neelon-Ropar canals and 18 from the river Sutlej) were collected in sterile polypropylene bottles, labelled and immediately placed in an insulated box containing ice packs. To neutralize any chlorine present, 0.2 ml of 3% sodium thiosulfate (Na2S2O3) was added to each 200-ml sample intended for bacteriological analysis. The insulated boxes were transported to the School of Public Health and Zoonoses laboratory, where samples were stored at 4 °C. The pH and turbidity were measured immediately after sampling, as they may change during transport and storage.
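As a quick arithmetic check on the dechlorination step, the short Python sketch below estimates the final Na2S2O3 concentration in a 200-ml sample. It assumes that "3%" means 3% w/v (3 g per 100 ml), which the text does not state; the function name is illustrative only.

```python
# Assumption (not stated in the text): "3%" means 3% w/v, i.e. 0.03 g per ml.
STOCK_G_PER_ML = 3 / 100


def final_thiosulfate_mg_per_l(stock_ml: float, sample_ml: float) -> float:
    """Final Na2S2O3 concentration (mg/L) after adding stock to a sample."""
    mass_mg = stock_ml * STOCK_G_PER_ML * 1000       # grams -> milligrams
    total_volume_l = (stock_ml + sample_ml) / 1000   # ml -> litres
    return mass_mg / total_volume_l


conc = final_thiosulfate_mg_per_l(stock_ml=0.2, sample_ml=200)
print(f"Final Na2S2O3: {conc:.1f} mg/L")  # prints: Final Na2S2O3: 30.0 mg/L
```

Under this assumption, 6 mg of thiosulfate ends up in 200.2 ml, i.e. roughly 30 mg/L.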

Fig. 1

Maps of Punjab, India, taken from Google Earth, showing the canals and the river Sutlej. Sampling was done every 3 to 5 km along the river and canals

2.2 Physicochemical Parameters

The pH, temperature, electrical conductivity (EC), total dissolved solids (TDS), turbidity, dissolved oxygen (DO), BOD and chemical oxygen demand (COD) were measured in this study (Supplementary material).

2.3 Enumeration of Culturable Indicator Organisms

E. coli and Enterococcus spp. in surface water samples were enumerated using the U.S. Environmental Protection Agency (EPA) standard membrane filtration methods (EPA 1997; U.S. EPA 2002). The minimum analytical quality control (QC) requirements of Method 1603 were followed (Supplementary material).

2.4 Virus Concentration Methods and Nucleic Acid Extraction

The following two virus filtration methods were optimized and used in this study:

  1. Protocol I: direct nucleic acid extraction from negatively charged membranes (Zehra 2018), applied to highly turbid samples (Supplementary material).

  2. Protocol II: an adsorption/elution-based protocol with negatively charged membranes (Zehra 2018), applied to river and canal water samples with low turbidity (Supplementary material).

2.5 Molecular Assays

Touchdown PCR, semi-nested PCR and qPCR were first optimized and validated using faecal samples from known sources, and PCR inhibition was assessed with the Sketa-22 assay before these methods were applied to environmental samples (Zehra 2018). Details of these assays are given in the Supplementary material.

3 Statistical Analysis

Statistical analyses were performed using IBM SPSS Statistics V24.0 (Statistical Package for the Social Sciences), and the normality of the data was tested before any analysis. A one-way analysis of variance (ANOVA) was performed to test for significant differences in physicochemical and bacteriological parameters between water samples collected from different tehsils of Ludhiana. When a significant difference was found (p ≤ .05), post hoc analysis was performed using Tukey's test.
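A minimal sketch of the one-way ANOVA F statistic follows, written in plain Python so that the partition into between-group and within-group sums of squares is visible. The group values are made up for illustration, not taken from Table 1, and the helper name `one_way_anova_f` is hypothetical. SPSS (or, for example, `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` in SciPy) would additionally supply the p value and the Tukey post hoc comparisons, which this sketch does not compute.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up log10 counts for three hypothetical tehsils (not study data)
tehsil_a = [1, 2, 3]
tehsil_b = [4, 5, 6]
tehsil_c = [7, 8, 9]
f_stat, df_b, df_w = one_way_anova_f(tehsil_a, tehsil_b, tehsil_c)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(2, 6) = 27.0
```

A large F means the group means differ by much more than the scatter within groups; only when the corresponding p value is ≤ .05 would the Tukey comparisons follow.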

To assess virus concentrations, the mean number of genome copies per sample was log10-transformed and plotted with 95% confidence intervals (CI). Because it was unclear whether a value of zero copies per sample reflected the absence of AdV DNA, PCR inhibition or sample DNA below the limit of detection, zeros were excluded from the log10 transformation and then reinstated as zeros. Relationships between water quality parameters were determined by calculating Pearson correlation coefficients, reported as r. Differences were considered significant when p ≤ .05, and two-sided tests were used for all analyses. Marker observations were converted to binary (presence/absence) data, and a binary logistic regression model (SPSS V24.0) was used to assess relationships between physicochemical parameters, FIB concentrations and the presence or absence of the viral markers. The strength of association was reported as Nagelkerke's R², with values closer to 1.0 indicating stronger associations. Relationships were considered significant when the p value for the model chi-square was ≤ .05 and the CI for the odds ratio did not include 1.0. Only correlations with p < .05 are reported; non-significant correlations are denoted NS. Fisher's exact test was used to evaluate significant differences in the frequency of observation of binary marker data. An alpha of 0.05 was used as the cut-off for significance.
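The zero-handling rule, the Pearson correlation and Fisher's exact test described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the SPSS procedures used in the study; the function names and input values are hypothetical. Under this convention a reinstated zero has the same value as log10(1) = 0.

```python
import math
from statistics import mean


def log10_keep_zeros(copies):
    """log10-transform positive values; zeros are kept as 0."""
    return [math.log10(c) if c > 0 else 0.0 for c in copies]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in map(table_prob, range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-7))


logs = log10_keep_zeros([0, 10, 100])        # [0.0, 1.0, 2.0]
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # 1.0
p = fisher_exact_two_sided(3, 1, 1, 3)       # 34/70, about 0.486
```

The small relative tolerance in the Fisher test treats floating-point near-ties as equal when deciding which tables count as "at least as extreme".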

In the multiple regression, coliform and enterococci counts were the dependent variables and the physicochemical parameters were the independent variables. The same analysis was repeated with adenoviruses as the dependent variable (McCulloch 2015). Likewise, a multinomial logistic regression was performed with canal as the dependent variable, all other quantitative water quality parameters as covariates and the qualitative water quality parameters as factors.
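A bare-bones version of the multiple regression step is sketched below: ordinary least squares solved through the normal equations, with an intercept column added automatically. The predictor values are hypothetical and constructed so that the exact coefficients are known; a statistics package would also report standard errors, R² and p values, which this sketch omits.

```python
def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical predictors (e.g. BOD, turbidity) and a response built as
# y = 1 + 2*x1 + 3*x2, so the fit should recover [1, 2, 3].
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coefs = ols_fit(X, y)
```

Solving the normal equations directly is fine for a handful of predictors; numerically sturdier approaches (QR decomposition) are preferable when predictors are highly collinear.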

Cluster analysis comprises a series of multivariate methods used to find groups within data (Du et al. 2017). Predefined algorithms calculate the similarity at which observations are clustered, and the results are used to build a dendrogram (Kumar et al. 2018). Cluster analysis of the surface water variables was performed using Pearson's correlation coefficient as the similarity measure and the centroid linkage method. Likewise, principal component analysis (PCA) of 8–10 water quality parameters was performed to determine the principal components that explain the maximum variance in the data. In factor analysis, new factors underlying the variation in physicochemical parameters and microbiological concentrations were extracted by varimax and direct oblimin rotation of the PCA solution.
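The agglomerative idea behind the dendrogram can be illustrated with the sketch below, which turns a correlation matrix into distances d = 1 − r and repeatedly merges the closest pair of clusters. Note the swapped-in technique: it uses average (UPGMA) linkage rather than the centroid method applied in SPSS, because the average-linkage update is simple and well defined for correlation-based distances. The correlation values and the function name are hypothetical.

```python
def average_linkage(names, corr):
    """Agglomerative clustering of variables with distance d = 1 - r and
    average (UPGMA) linkage. Returns the merge history as (members, distance)."""
    n = len(names)
    clusters = {i: ([names[i]], 1) for i in range(n)}  # id -> (members, size)
    dist = {(i, j): 1 - corr[i][j] for i in range(n) for j in range(i + 1, n)}
    merges, next_id = [], n
    while len(clusters) > 1:
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        members_i, size_i = clusters.pop(i)
        members_j, size_j = clusters.pop(j)
        del dist[(i, j)]
        # Average-linkage update: size-weighted mean of the old distances
        for k in clusters:
            d_ki = dist.pop((min(i, k), max(i, k)))
            d_kj = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (size_i * d_ki + size_j * d_kj) / (size_i + size_j)
        clusters[next_id] = (members_i + members_j, size_i + size_j)
        merges.append((members_i + members_j, round(d, 3)))
        next_id += 1
    return merges


# Hypothetical correlation matrix (not study data)
names = ["BOD", "COD", "coliform", "enterococci"]
corr = [[1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0]]
for members, height in average_linkage(names, corr):
    print(members, height)
```

With these made-up correlations, BOD and COD merge first (distance 0.1), then coliform and enterococci (0.2), and the two groups join last (0.9); the merge heights are what a dendrogram draws.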

4 Results and Discussion

Because the load and prevalence of genetic markers vary among populations in different geographical areas, MST markers developed in one area require a priori characterization of assay performance in each watershed of interest before they are applied (Yahya et al. 2017). Accordingly, the chosen MST markers were first tested in faecal samples from known sources. The virus concentration methods were optimized and validated, and PCR inhibition was assessed before these methods were applied to environmental samples (Zehra 2018). All PCR markers showed high sensitivity and specificity, although none reached 100% for both parameters. Although some MST markers were detected in hosts other than the expected ones, their load in the target group was always significantly higher than in non-target hosts, showing their suitability for discriminating between sources of pollution.

Viral load and nine other water quality parameters (BOD, COD, DO, TDS, turbidity, pH, temperature, coliform count and enterococci count) were studied at sampling sites along a 63-km stretch between Samrala and Jagroan tehsils of Ludhiana district, covering the river Sutlej and its tributaries, the Sidhwan and Neelon-Ropar canals.

4.1 Detection of Adenoviruses in Surface Water Samples

Human enteric viruses are common surface water contaminants throughout the world, and the present data also showed the occurrence of HAdVs and bovine/porcine AdVs in surface water samples. Of the 52 water samples analysed, 32% were positive for HAdV (qPCR), 16% for BAdV (nested PCR) and 11% for PAdV (qPCR). HAdV concentrations in the water samples ranged from 10¹ to 10² copy no./L, similar to those reported for other surface waters (Hundesa et al. 2006; Wyn-Jones et al. 2011). However, other studies have reported viral loads up to 2 logs higher: Albinana-Gimenez et al. (2009) (10¹–10⁴ gc/L), Choi and Jiang (2005) (10²–10⁴ gc/L) and Haramoto et al. (2010) (10³–10⁵ gc/L). These studies pointed out that HAdV loads in water depend strongly on geographic location, temporal variation, environmental degradation and the detection methods used. The concentration of PAdVs in the present study was around 10 copy no./L. This study is one of only a few, such as Bortagaray et al. (2019), to report PAdV in surface water.

The frequency of adenovirus detection in each tehsil is summarized in Fig. 2. Detection frequencies did not differ significantly among tehsils for any adenovirus type except HAdV, which was detected more frequently in surface water samples from tehsil Ludhiana (binary logistic regression with HAdV qPCR detection as the dependent variable and tehsil as a categorical covariate).

Fig. 2

Frequency of detection of adenoviruses (HAdV/BAdV/PAdV) in each tehsil of Ludhiana

Some sites in tehsil Ludhiana showed the highest HAdV loads (around 10² copy no./L), possibly because of the higher population density of tehsil Ludhiana compared with the other tehsils of Ludhiana district.

4.2 Analysis of Physicochemical and Bacteriological Parameters of River and Canals

Physicochemical and bacteriological parameters differed significantly between water samples collected from different tehsils of Ludhiana (p < .05, one-way ANOVA) (Table 1). Post hoc analysis using Tukey's test showed significant differences in enterococci count, coliform count, turbidity and TDS (p < .05). Mean enterococci and coliform concentrations at Samrala were significantly higher than those at Jagroan and in the river Sutlej (p < .05, ANOVA). Although mean enterococci and coliform concentrations did not differ significantly among Jagroan, Ludhiana and the river Sutlej (Table 1), the turbidity and TDS of river Sutlej samples differed significantly from those of all other water samples (p < .05).

Table 1 Summary of physicochemical and bacteriological parameters of water samples

In the present study, high coliform and enterococci counts were observed in canal water at all sites, most likely because of faecal contamination from sewage leakage, animal sources and environmental sources. Coliform counts were generally about 10-fold higher than enterococci counts at most sites.

The BOD and COD values in Table 1 show large standard deviations, indicating high variation in physicochemical parameters among river Sutlej samples. This is probably due to the inflow of heavily contaminated water from the Budha nallah, a highly polluted seasonal stream that drains into the river Sutlej. Water quality parameters differed significantly between sites upstream and downstream of the confluence of the Budha nallah with the river Sutlej (Table 2, Fig. 3).

Table 2 Variation in mean values of various physicochemical parameters at six selected sites (S1–S6)
Fig. 3

Scatter plot showing outliers, which represent samples from the river Sutlej downstream of its confluence with the Budha nallah

4.3 Correlation Between Physicochemical, Bacteriological and Viral Markers in Surface Water Samples

To the best of our knowledge, this study is among the first in India to analyse the extent and frequency of detection of culturable FIB, multiple physicochemical parameters and viral markers in recreational waters. Among the most significant findings were the relationships of human-associated MST markers with one another and with adenovirus detection. Data from all canal and river Sutlej sites were analysed, except samples collected from the Sutlej downstream of the Budha nallah confluence; these values were outliers and were therefore not analysed statistically with the other samples (Fig. 3). The mean log10-transformed concentrations of bacterial and viral markers at each site are summarized in Fig. 4, and the relationships between the physicochemical, bacteriological and viral indicators are presented in Table 3.

Fig. 4

Log10-transformed concentrations of enterococci, coliform and adenoviruses at different tehsils. Error bars represent standard deviations. Indicator bacteria are reported as log10 CFU/100 ml and AdVs are reported as log10 copy number/100 ml

Table 3 Relationship between indicators and markers of surface water samples from canals and river Sutlej

Correlations between analytes at individual sites were observed occasionally but not consistently. Overall, only sporadic significant correlations were observed for a few components, and there was no consistent, stable correlation between the microbiological and physicochemical parameters (Table 3). Positive correlations were found between coliforms and enterococci and between COD and BOD (Table 3). In river water samples, qPCR detection of HAdVs and PAdVs was weakly to strongly correlated with log coliform concentrations (HAdV: Nagelkerke's R² = 0.446, odds ratio = 6.829, p = .019; PAdV: Nagelkerke's R² = 0.688, odds ratio = 19.764, p = .05). PAdV detection was also strongly correlated with TDS (Nagelkerke's R² = 0.686, odds ratio = 0.886, p = .044). Nested PCR detection of BAdVs was not correlated with any water quality parameter except COD (Nagelkerke's R² = 0.632, odds ratio = 1.487, p = .021). Such correlations may be significant at some sites or in some samples and not in others. No significant relationships were observed at sites on the Sidhwan and Neelon-Ropar canals (multinomial logistic regression with canal as the dependent variable, all other quantitative water quality parameters as covariates and qualitative water quality parameters as factors). However, when coliform, BOD, DO and pH were considered together to classify samples according to CPCB (2008), the odds of a sample falling in class D (propagation of wildlife and fisheries, irrigation, industrial cooling, controlled waste disposal) were 13.93 and 15.83 times higher in the Neelon-Ropar canal than in the Sidhwan canal and the river Sutlej, respectively (p = .001).

The lack of correlation between physicochemical, bacteriological and virological parameters has been well documented by other studies. Lee et al. (2013) showed that total and faecal coliforms were not related to the presence of enteric viruses in a 4-year study of surface water in South Korea. Likewise, Vecchia et al. (2015) observed correlations with microbiological parameters for only a few components when evaluating 78 parameters (physicochemical parameters, metals, pesticides and organic compounds); in their study, DO was inversely correlated with canine AdV but positively correlated with E. coli count. Unlike in the study reported by Waso et al. (2018), a positive correlation was observed here between coliforms and human adenovirus.
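The odds ratios and Nagelkerke's R² values above come from binary logistic regression. The sketch below shows how these quantities are defined for a single predictor: the model is fitted by Newton-Raphson, the odds ratio is exp(b1), its 95% Wald CI is exp(b1 ± 1.96 SE), and Nagelkerke's R² is the Cox-Snell R² divided by its maximum possible value. The eight observations are hypothetical (a binary predictor, such as high versus low coliform, against marker presence), not study data, and the function names are illustrative; SPSS's full output is not reproduced.

```python
import math


def fit_logistic(x, y, iters=50):
    """Simple logistic regression (intercept + one predictor) by Newton-Raphson.
    Returns (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p            # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: information^-1 @ score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, math.sqrt(h00 / det)     # Wald SE of b1


def log_likelihood(y, probs):
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
               for yi, p in zip(y, probs))


def nagelkerke_r2(ll_null, ll_model, n):
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical data: binary predictor and marker presence (not study data)
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1, se = fit_logistic(x, y)
odds_ratio = math.exp(b1)                                 # 9.0 for this table
ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
p_hat = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
ll_model = log_likelihood(y, p_hat)
ll_null = log_likelihood(y, [sum(y) / len(y)] * len(y))   # intercept-only model
r2_n = nagelkerke_r2(ll_null, ll_model, len(y))           # about 0.307
```

For this toy table the odds ratio is (3/1)/(1/3) = 9, yet the wide CI includes 1.0, so by the criterion in the statistical analysis section the association would not be counted as significant.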

In the present study, both the canals and the river Sutlej showed a stable, significant positive correlation between the bacteriological parameters (coliforms and enterococci) and between BOD and COD. However, regression analysis revealed no significant correlation between enterococci counts and AdV concentrations, whereas coliforms were correlated with PAdV but not with HAdVs or BAdVs, similar to studies that reported such correlations with HAdVs (Lee et al. 2013; Vecchia et al. 2015).

In summary, E. coli and Enterococcus spp. concentrations varied along the Sidhwan and Neelon-Ropar canals, but both were detected at every site evaluated. There was no correlation between HAdV and the bacterial indicators for point-source contamination.

4.4 Cluster Analysis and Principal Component Analysis

An important aim of the current study was to understand the source apportionment of the water quality parameters in the study area using cluster analysis (CA) and principal component analysis (PCA).

In the dendrogram (Fig. 5), cluster analysis separates the water quality parameters into four major clusters (C1 to C4). PAdV forms cluster C1 with BOD and COD, which in turn groups with turbidity and TDS (C2). The bacterial indicators and HAdV form separate clusters, C3 and C4, respectively, showing no similarity with the other clusters.

Fig. 5

Dendrogram representing similar groups clustered together. C1 to C4 represent the four clusters

Principal component analysis proved to be an effective data-reduction tool, as the first three principal components explained 91% of the variance in the water quality parameters. Factor analysis identified three factors of basic water quality (Kumar et al. 2016; Kumar et al. 2018; Tripathi and Singal 2019). Factor 1 comprised BOD, COD, TDS, turbidity and PAdV. Factor 2 was a bacteriological water quality determinant and explained most of the variance in coliform and enterococci. Factor 3 comprised HAdV.
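How "variance explained" is obtained can be sketched as follows: for PCA on standardized variables, the eigenvalues of the correlation matrix sum to the number of variables p, so component i explains λi/p of the total variance. The sketch below uses power iteration with deflation on a small hypothetical correlation matrix, not the study's data; a library routine such as `numpy.linalg.eigh` would be the usual choice in practice, and eigenvector signs are arbitrary.

```python
import math


def top_eigenpairs(R, k, iters=1000):
    """Top-k eigenvalues/eigenvectors of a symmetric PSD matrix (e.g. a
    correlation matrix) by power iteration with Hotelling deflation."""
    n = len(R)
    M = [row[:] for row in R]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-symmetric start
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: subtract the component just found
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


# Hypothetical 3 x 3 correlation matrix (not study data)
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
pairs = top_eigenpairs(R, k=2)
explained = [100 * lam / len(R) for lam, _ in pairs]  # trace of R = number of variables
print([round(e, 1) for e in explained])  # [66.7, 16.7]
```

The same bookkeeping applies to the factors reported later in this section: 59.70% + 19.05% + 12.06% = 90.81%, which is the roughly 91% quoted here.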

First, the factorability of the 10 water quality parameters was examined using varimax and oblimin rotation with Kaiser normalization. The oblimin rotation gave the clearest factor structure. Several well-recognized criteria for the factorability of a correlation matrix were used. First, 7 of the 10 parameters correlated at least 0.3 with at least one other parameter, suggesting reasonable factorability. Second, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was 0.775, above the commonly recommended value of 0.6, and Bartlett's test of sphericity was significant (χ²(45) = 520.502, p < .05). In addition, the diagonals of the anti-image correlation matrix were all above 0.5, and the communalities were all above 0.3 except for HAdV.

To increase the reliability of the PCA, the two parameters (DO and pH) that were not correlated with any other parameter were removed, and the factorability of the remaining eight water quality parameters was examined. Seven of the eight parameters were correlated, the KMO measure of sampling adequacy was 0.806 and Bartlett's test of sphericity was significant (χ²(28) = 484.476, p < .05). In addition, the diagonals of the anti-image correlation matrix were all above 0.5 and the communalities were all above 0.3, further confirming that every parameter shared some common variance with the other parameters. Given these overall indicators, factor analysis was considered suitable for all eight parameters. Factor loadings of the principal components with eigenvalues > 1 and their cumulative variance (%) are given in Table 4 and Fig. 6.
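The two factorability checks reported above can be sketched directly from a correlation matrix R with p variables: Bartlett's statistic is χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations taken from R⁻¹. The degrees-of-freedom formula gives 45 for 10 variables and 28 for 8, matching the values reported. The sample size `n_obs`, the example matrices and the helper names are hypothetical; a p value would still require the chi-square survival function (for example `scipy.stats.chi2.sf`).

```python
import math


def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[piv][c] == 0:
            return 0.0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d


def inverse(M):
    """Matrix inverse via Gauss-Jordan elimination on [M | I]."""
    n = len(M)
    A = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pv = A[c][c]
        A[c] = [x / pv for x in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


def bartlett_sphericity(R, n_obs):
    """Bartlett's chi-square statistic and degrees of freedom."""
    p = len(R)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det(R))
    return chi2, p * (p - 1) // 2


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    S = inverse(R)
    p = len(R)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                r2 += R[i][j] ** 2
                partial = -S[i][j] / math.sqrt(S[i][i] * S[j][j])
                q2 += partial ** 2
    return r2 / (r2 + q2)


identity10 = [[float(i == j) for j in range(10)] for i in range(10)]
print(bartlett_sphericity(identity10, n_obs=50)[1])  # 45 df for 10 variables
R2 = [[1.0, 0.5], [0.5, 1.0]]
print(round(kmo(R2), 3))  # 0.5 (always so for two variables)
```

An identity correlation matrix gives χ² = 0, the "no shared variance" case that a significant Bartlett test rules out.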

Table 4 PCA of different water quality parameters of water samples from Sutlej and its tributaries (component matrix)
Fig. 6

Component plot in the rotated space

Extraction method: principal component analysis

Factor 1 accounted for 59.70% of the total variance and had strong positive loadings on BOD, COD, TDS, turbidity and PAdV, consistent with their grouping in the CA. Since all these parameters are related to pollution from industrial effluents and farm wastewater, factor 1 is largely a pollution-loading factor leading to eutrophication. Factor 2 accounted for 19.05% of the total variance and had strong loadings on enterococci and coliform counts, reflecting the natural quality of the water and domestic waste. Factor 3 represented 12.06% of the total variance and had a strong positive loading on HAdV, as also indicated by the CA. This factor reflects the dominant impact of human activities in the catchment area of the canals and river.

Finally, in evaluating the surface water of the two canals and the river Sutlej, a weak association was found between viruses and a few other variables, but it was not consistent enough in the present study to establish a reliable relationship. In general, linear and nonlinear correlations were found only at point sources of contamination. The study covered both polluted and less polluted areas, as judged by the bacteriological indicators. Similar clustering has been reported in other studies, such as Kirschner et al. (2017), in which microbial faecal contamination of the river emerged as an independent component that did not cluster with any other measured environmental parameter in the environmental data matrix. The present study shows that combining host-associated genetic MST markers with conventional FIB-based monitoring of microbial faecal contamination substantially improves information on the extent and origin of microbial faecal contamination patterns in large rivers. This combination provides a powerful tool for target-oriented water quality management in large river basins.

In conclusion, because coliforms can be of environmental origin and markers lack complete specificity, results may be uncertain when only a single marker is detected. However, the predictive value of each marker increases when more than one marker is detected at the same site, so a higher level of certainty about the contamination source can be obtained. For all markers used in this study, an epidemiological study evaluating the associated human health risks would characterize the usefulness of each assay more accurately. Predictive statistical models offer a novel way to inform management agencies about the factors likely to influence faecal contamination intensity and how to mitigate future contamination events. Our study thus illustrates how quantifying physicochemical parameters, FIB and adenoviral markers in surface water can improve water quality evaluation and help tailor Central Pollution Control Board (CPCB) programs for impaired water bodies in the future.