Introduction

The degradation of freshwater quality is a major concern for aquatic ecosystems worldwide. Both human interference and natural hydrological processes play important roles in water quality degradation. This degradation is a strong motivation for water quality management, that is, for determining the ambient water quality, the extent of the impact of pollution from anthropogenic activities, and possible control measures to restore water quality. Water quality monitoring is the collection of representative, quantitative information on the physical, biological, and chemical characteristics of water with respect to space and time (Sanders et al. 1983; Ward et al. 1989; Strobl and Robillard 2008). According to recent reviews (Dixon and Chiswell 1996; Strobl and Robillard 2008; Mishra and Coulibaly 2009, 2010), the design of a water quality monitoring network (WQMN) consists of a set of tasks: (a) determination of the number and spatial distribution of monitoring stations, (b) selection of a sampling frequency, and (c) selection of the water quality parameters to be monitored. Defining the monitoring objectives and budgetary constraints has been found essential when designing a water quality monitoring network.

On a global scale, various countries and bodies have established their own protocols and guidelines for the design of water quality monitoring networks. These include the European Union member states, Norway, and the European Commission (Water Framework Directives of the European Community 2003); the USA (United States Environmental Protection Agency (USEPA) 2003; United States Geological Survey 2005); India (Ministry of Environment and Forest of the Government of India 2005; Central Pollution Control Board of India 2007); Canada (Canadian Council of Ministers of the Environment 2006); and Australia (Department of Water of the Government of Western Australia 2009). Most recently, the USEPA published a report on the National Rivers and Streams Assessment 2013–2014. In that assessment, sampling sites were selected randomly using a statistical survey design to represent the population of streams and rivers in each ecological region, i.e., a geographic area with similar ecological features, climate, and plant and animal communities (USEPA 2014).

Apart from these guidelines and frameworks, previous researchers have proposed various approaches to the design of WQMNs. Sharp (1971) proposed a systematic approach to uniform sampling of rivers and streams, in which optimum sampling locations are selected by successively dividing the stream network into halves at its centroids. Sanders et al. (1983) proposed a modified form of Sharp’s method in which tributaries were replaced by pollution loadings and the number of outfalls. Both the Sharp and Sanders approaches are based on the topology of the river basin. Various researchers have used the Sanders et al. (1983) modification of Sharp’s approach to select sampling locations (Varekar et al. 2012; Do et al. 2011, 2012) and sub-basins (Cetinkaya and Harmancioglu 2012). Water quality parameters are inherently stochastic (Sanders et al. 1983; Harmancioglu and Alphasan 1992, 1994; Harmancioglu et al. 1999; Ozkul et al. 2000). Researchers have therefore used various statistical techniques effectively for the design and assessment of sampling sites. These include statistical entropy (Harmancioglu and Alphasan 1992, 1994; Harmancioglu et al. 1999; Ozkul et al. 2000; Karamouz et al. 2009a; Alameddine et al. 2013), kriging (Lo et al. 1996; Karamouz et al. 2009b; Hudak and Loaiciga 1993), and multivariate statistical techniques (Ouyang 2005; Varol et al. 2012; Wang et al. 2012; Musthafa et al. 2012, 2014). Optimization approaches have also been applied as effective tools for the design of sampling locations (Telci et al. 2009; Cetinkaya and Harmancioglu 2012; Asadollahfardi et al. 2014). Other methods have also been used for the design of water quality monitoring networks (Table 1): fuzzy logic and genetic algorithms on geographic information system (GIS) platforms (Park et al. 2006; Strobl et al. 2006a, b), fuzzy theory with multiple criteria analysis (Chang and Lin 2014a), and the VIKOR method (Chang and Lin 2014b).

Table 1 Summary of literature on design of water quality monitoring network

Anthropogenic activities are the key drivers of alteration in surface water quality, and their effects are accounted for in terms of pollution loads. Considering the pollution potential of anthropogenic activities is therefore vital for effective water quality management. Hence, the principal objective of the proposed river water quality monitoring design is to evaluate the potential pollution load entering the river and the need for corresponding pollution control measures. The present monitoring design is also consistent with the monitoring objectives defined by the Central Pollution Control Board of India (CPCB 2007):

(i) rational planning of pollution control strategies;
(ii) identification of the nature and magnitude of pollution control measures;
(iii) evaluation of the effectiveness of existing pollution control efforts;
(iv) identification of the mass flow of contaminants in surface water and effluents;
(v) formulation of standards and permit requirements;
(vi) testing of compliance with standards and classifications for waters; and
(vii) early warning and detection of pollution.

Reviews of the guidelines and frameworks for water quality monitoring networks in various countries, and of the numerical and statistical approaches to network design, have shown that there is no universally accepted approach to the design of water quality monitoring networks (Strobl and Robillard 2008). The guidelines are mostly based on expert judgment. They do not consider analyses of existing water quality inventory data or of watershed and river characteristics, and they may lack a mathematical basis for design. The numerical and statistical approaches, on the other hand, may be too theoretical and mathematically demanding. Moreover, statistical approaches are more efficient for assessing existing monitoring networks than for designing new ones.
Most of the approaches discussed above consider the effect of point (discrete) sources and the existing status of river water quality. However, intensive use of fertilizers on agricultural land may generate large amounts of nutrient-rich agricultural runoff that enters the river as diffuse-source pollution (Jha et al. 2005). In tropical countries, where monsoons predominate, river water quality may vary significantly between the monsoon (June to September) and non-monsoon (October to May) seasons (Tsirkunov et al. 1992; Bhangu and Whitfield 1997; Vega et al. 1998; Hanrahana et al. 2003; Singh et al. 2004; Ouyang et al. 2006; Chang 2008; Altin et al. 2009). This variation is mainly due to the impact of diffuse pollution loads. Hence, both point and diffuse pollution sources must be considered to achieve a realistic design for a surface water quality monitoring network.

The comprehensive review conducted in the present study shows that the design of a water quality monitoring network should consider two factors: (1) the location and magnitude of both point and diffuse pollution loadings and (2) their temporal variation between the monsoon (June to September) and non-monsoon (October to May) seasons. This gap motivates the present study. To account for this seasonal variability, Indian applications divide the annual cycle into two seasons (Jain et al. 2007; Central Ground Water Board, Government of India 2004). A periodic assessment of the monitoring network has also been found necessary to capture this seasonal variation. This may result in a pre-monsoon (before the monsoon season starts) and a post-monsoon (immediately after the end of the monsoon season) design of the network. GIS, the global positioning system (GPS), and remote sensing tools are useful for exploiting the various types of data involved. These include the hydraulic and hydrological characteristics of river basins, land use practices, agricultural cropping patterns, fertilizer use, and the relative locations of pollution sources and monitoring stations. Such a design approach helps a watershed manager to design a network through a user-friendly interface. The present study is the first effort to address the effect of monsoonal and non-monsoonal variations in the design of a water quality monitoring network considering both point and diffuse sources of pollution. The proposed approach conceptualizes water quality monitoring as a two-stage process. The first stage considers all potential water quality sampling sites, selected on the basis of existing guidelines or frameworks and the locations of both point and diffuse pollution sources.
Monitoring at all sampling sites should continue for a sufficiently long period to capture the effect of the monsoon season on water quality. The monitoring network is then designed separately for the monsoon and non-monsoon periods by optimizing the number of sampling sites. This optimization is subject to budgetary constraints and to the importance of each station, which is quantified using optimal hierarchy levels (Sanders et al. 1983; Do et al. 2011, 2012) based on the binary tree structure of graph theory (Knuth 1968). The second stage involves establishing a monitoring program that uses the optimized network structures for the monsoon and non-monsoon seasons. The network structure should be modified or redesigned in subsequent years as water quality data are updated. The Kali River basin in western Uttar Pradesh, India (Jha et al. 2005, 2007), was selected as the study area. The potential sampling sites were identified from the points at which point and diffuse pollution enter the river, using geospatial techniques. The modified Sanders approach (Sanders et al. 1983), which is well accepted among researchers (Do et al. 2011, 2012; Varekar et al. 2012), was applied to the design of the sampling sites.

Materials and methods

Proposed methodology

The proposed methodology consists of three components: (I) field investigations, primary data collection and geospatial analysis; (II) pollution load estimation; and (III) design of water quality sampling locations for monsoon and non-monsoon periods. Figure 1 shows the organization and flow of the proposed method.

Fig. 1 Framework of proposed methodology

Field investigations, primary data collection, and geospatial analysis

The proposed methodology introduces a realistic approach to the selection of sampling locations. It is based on field surveys and on geospatial techniques that account for the impact of human activities. The river system considered in this study is the Kali River basin in Uttar Pradesh, India (Fig. 2). The Kali River is a tributary of the Hindon River. It originates near the Saharanpur district and travels 125 km before joining the Hindon River. The catchment area of the Kali River is 1,475.50 km². The Kali River has significant socioeconomic value: people residing in the watershed depend on the river water for drinking, bathing, washing clothes, and agriculture. However, the river water quality is gradually degrading due to the disposal of municipal wastewater and the large volume of agricultural runoff entering the river (Jha et al. 2007). Therefore, the proposed monitoring network design framework was applied to the Kali River system to evaluate the potential pollution load entering the river and the need for pollution control measures. The sampling locations were selected at the mouths of sub-watersheds, because the water quality monitored at a mouth should represent the characteristics of the respective sub-watershed. Hence, in the current study, the 16 sampling locations were representative monitoring sites, both spatially and temporally, and were not restricted to any particular nodal information. The exact location of the catchment area was identified using topo sheets (Survey of India: 53 G/9, 53 G/10, 53 G/11, 53 G/12, 53 G/13, 53 G/14), a location map, a district map, and a political map. A digital elevation model (DEM) with 20-m resolution was obtained from the Indian Space Research Organization (ISRO). The watershed was delineated using ArcGIS 9.3. The main stream of the Kali River was delineated using the terrain processing tool in Arc-Hydro by defining a threshold value.
Field visits were conducted for three purposes: to collect details of the cropping pattern and fertilizer use, to locate the point pollution sources (using GPS), and to check the accessibility of sampling sites. A total of 16 sampling stations were identified based on the available guidelines for water quality monitoring (CPCB 2007), the locations of sub-watershed outlets, and the relative locations of point pollution sources. Six of these 16 monitoring stations are located downstream of point sources that join the Kali River as natural open drains. All 16 monitoring stations capture the effects of diffuse pollution sources, because they are located at the outlets of sub-watersheds. All 16 sampling sites (details in Table 2) are accessible throughout the year. The land use map (Fig. 3a) was generated from satellite images of the study area (source: United States Geological Survey (USGS) Earth Explorer) by supervised classification in the image processing software ERDAS IMAGINE 9.1. The land use map shows that 80.11 % of the watershed area is used for agriculture (Fig. 3).

Fig. 2 a Location map of the study area. b Location of potential sampling sites in Kali River basin

Table 2 Details of sampling stations and river reaches
Fig. 3 a Land use map of the study area. b Relative fraction of land use for each sub-watershed. c Area under each land use category for each sub-watershed (km²)

Pollution load estimation

Extensive sampling was conducted at the 16 locations on the river. As mentioned in the “Introduction” section, the proposed method consists of two stages of monitoring, and the initial monitoring in the first year should be more comprehensive than the subsequent seasonal monitoring. Therefore, the hydraulic and water quality characteristics of the river were monitored for 1 year, from March 1999 to February 2000. During the non-monsoon season (November to June), sampling was conducted three times daily on the 10th and/or the 11th of each month. During the monsoon season (July to October), sampling was conducted according to the occurrence of storm events. Note that the Kali River basin is located in northern India, where the monsoon arrives later than in other parts of the country. The river water samples were collected by grab sampling. According to APHA (2000), “grab samples are single samples collected at a specific spot at a site over a short period of time (typically seconds or minutes).” The samples were collected at a depth of 15 cm from three points across the sampling location, at 1/3, 1/2, and 2/3 of the distance across the river cross section. The cross-sectional area of the river was estimated at each location using a measuring tape and a leveling staff. The velocity and depth of the water were measured using a current meter (electromagnetic current meter WTW 197, Germany) and a leveling staff, respectively. The cross-sectional area, depth, and velocity were used to estimate the discharge at various locations along the river. The water samples were analyzed for pH, temperature, electrical conductivity, and chemical properties following standard methods (APHA/AWWA/WEF 2000). The chemical properties were nitrate, phosphate, biochemical oxygen demand (BOD), and dissolved oxygen (DO).
For example, the azide modification method was used to determine dissolved oxygen (APHA/AWWA/WEF 2000). Portable meters (Hach sensION1 pH meter and Hach sensION5 conductivity meter) were used to measure the physical parameters in the field, and the remaining parameters were analyzed in the laboratory. The diffuse-source BOD inflow to the river ($L_d$) is difficult to measure directly. Therefore, in the present analysis, $L_d$ was estimated from the available data using a mass balance approach (Jha et al. 2007). A set of nonparametric cubic splines (Stasinopoulos and Rigby 2007) was fitted to the observed data to represent the continuous spatiotemporal variation of river flow and water quality characteristics (Fig. 4). The exact locations of the six point pollution sources were determined during the field visits. The BOD loadings of these point sources were simulated using hydraulic and water quality data for the downstream river water (Table 3). The diffuse source loadings (nitrate and phosphate) were also estimated from the data collected at the 16 sampling locations (Table 4). Appendix 1 describes the procedures used to estimate the discharge from the pollution sources and to simulate the characteristics of the effluent.
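The discharge and diffuse-load steps described above can be illustrated with a short sketch. It is not the procedure of Appendix 1 or of Jha et al. (2007). The sketch assumes the velocity-area method, with the cross section split into sub-sections that are each represented by one measured vertical. It also assumes a conservative (no-decay) BOD balance over a reach. The function names and all numbers are hypothetical.

```python
"""Minimal sketch: velocity-area discharge and a conservative reach mass balance.

Assumptions (not taken from the study): each sub-section of the cross
section is represented by one measured vertical, and BOD is treated as
conservative over the reach (no decay or settling term).
"""
from typing import Sequence


def discharge_velocity_area(widths_m: Sequence[float],
                            depths_m: Sequence[float],
                            velocities_m_s: Sequence[float]) -> float:
    """Return discharge Q (m3/s) = sum over sub-sections of width * depth * velocity."""
    if not (len(widths_m) == len(depths_m) == len(velocities_m_s)):
        raise ValueError("widths, depths and velocities must have the same length")
    return sum(w * d * v for w, d, v in zip(widths_m, depths_m, velocities_m_s))


def diffuse_load_mass_balance(load_downstream: float,
                              load_upstream: float,
                              point_loads: Sequence[float]) -> float:
    """Hypothetical reach balance: L_d = L_down - L_up - sum(point-source loads).

    All loads must share the same unit (e.g., kg/day).
    """
    return load_downstream - load_upstream - sum(point_loads)


if __name__ == "__main__":
    # Hypothetical 12 m wide section split into three 4 m sub-sections.
    q = discharge_velocity_area([4.0, 4.0, 4.0], [0.6, 0.9, 0.7], [0.30, 0.45, 0.35])
    print(f"Q = {q:.2f} m3/s")  # 0.72 + 1.62 + 0.98 = 3.32
    # Hypothetical loads in kg/day for one reach with one point source.
    print(diffuse_load_mass_balance(5000.0, 3000.0, [1200.0]))
```

A real application would also need a decay term for BOD over longer reaches. That refinement is deliberately left out here.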

Fig. 4 Spatiotemporal variation of flow and water quality characteristics using nonparametric cubic splines: a discharge, b depth, c velocity, d BOD, e nitrate, and f phosphate

Table 3 Seasonal pollution loading of point sources
Table 4 Seasonal pollution loading of diffuse sources

Design of water quality sampling locations for monsoon and non-monsoon periods

As mentioned in the “Introduction” section, the proposed approach conceptualizes water quality monitoring as a two-stage process. In the first stage, all 16 sampling locations were identified as potential sampling sites, based on the guidelines for water quality monitoring (CPCB 2007), the locations of sub-watershed outlets, and the relative locations of point pollution sources. After completion of 1 year of comprehensive water quality monitoring, a modified Sanders approach was used to select the optimal locations and number of sampling sites, taking into consideration the seasonal variation in the point and diffuse source pollution loadings.

Determination of optimum locations for sampling

The Sanders approach (Sanders et al. 1983) is based on Sharp’s (1971) sampling method. In Sharp’s method, a river network is divided into a number of interior and exterior links. An exterior link is typically a tributary that has a minimum mean discharge and is not fed by other defined streams. It is also called a first-order tributary. An interior link is formed by the intersection of two exterior links and is called a second-order tributary (Sanders et al. 1983). Each exterior tributary or link that contributes to the main stretch of a river is assigned a magnitude, or weight, of one. The magnitude of an interior link equals the sum of the magnitudes of the intersecting upstream links. The magnitude at the mouth of a river watershed therefore equals the number of contributing exterior tributaries. After the entire river network has been numbered, the optimal sampling sites are selected at the centroids of the river network.
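The magnitude-numbering rules above can be sketched in a few lines. The sketch assumes a hypothetical network stored as a mapping from each link to the link it drains into. It takes the centroidal link to be the non-outlet link whose magnitude is closest to (N_o + 1)/2, mirroring Eq. 1 of the modified Sanders approach with magnitudes counted in exterior links. It is an illustration under these assumptions, not Sharp's (1971) original formulation.

```python
"""Sketch of Sharp-style link magnitudes and the first centroidal link.

The example network is hypothetical. Each link maps to the link it drains
into, and None marks the outlet link at the mouth of the watershed.
"""
from collections import defaultdict
from typing import Dict, Optional


def link_magnitudes(downstream_of: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Exterior links get magnitude 1; every other link sums its upstream links."""
    feeders = defaultdict(list)
    for link, down in downstream_of.items():
        if down is not None:
            feeders[down].append(link)

    magnitudes: Dict[str, int] = {}

    def magnitude(link: str) -> int:
        if link not in magnitudes:
            ups = feeders[link]
            magnitudes[link] = 1 if not ups else sum(magnitude(u) for u in ups)
        return magnitudes[link]

    for link in downstream_of:
        magnitude(link)
    return magnitudes


def centroidal_link(magnitudes: Dict[str, int], outlet: str) -> str:
    """Non-outlet link whose magnitude is closest to (N_o + 1) / 2."""
    target = (magnitudes[outlet] + 1) / 2  # N_o = magnitude of the outlet link
    candidates = [link for link in magnitudes if link != outlet]
    return min(candidates, key=lambda link: abs(magnitudes[link] - target))


if __name__ == "__main__":
    # Five exterior links (a, b, c, d, h) feeding a single outlet link "out".
    network = {"a": "e", "b": "e", "e": "f", "c": "f",
               "f": "g", "d": "g", "g": "out", "h": "out", "out": None}
    mags = link_magnitudes(network)
    print(mags["out"], centroidal_link(mags, "out"))
```

In this example the outlet has magnitude 5, so the target is 3 and link `f` (magnitude 3) is chosen.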

The centroid of river network

According to Sharp’s approach, a river network is analogous to a binary tree structure in graph theory. The tree is a vital nonlinear structure in computer programming (Knuth 1968): “Tree structure means the branching relations between nodes (data point) much as that found in the trees of nature” (Knuth 1968). The start (origin) and end points of links (i.e., the points of intersection of links and the terminating points of links) are termed vertices, and the links joining vertices are the edges of the tree. The weight of a vertex is the number of vertices in the largest of its leading sub-trees, that is, the sub-trees that remain when the vertex is removed. The centroid is the vertex with the minimum weight. In other words, the centroid of a tree is the vertex that has approximately equal numbers of vertices on its upstream and downstream sides. A tree and its centroid are shown in Fig. 5a. The hypothetical tree in Fig. 5a, conceptualized from Knuth (1968), has 10 vertices (A to K). In this tree, t denotes the number of leading sub-trees of a given vertex, and $a_1, a_2, \ldots, a_t$ are the numbers of vertices in the respective sub-trees. After the weight of each vertex is estimated, vertex D has the minimum weight, 3, and is therefore the centroid of the tree. Sharp (1971) used the concept of a centroid (Knuth 1968) to divide a river network into approximately equal halves. He proposed a mathematical expression for determining the centroidal link, i.e., the link that divides the network into two equal halves. In nature, each river tributary has been observed to have, in general, two sub-branches. Hence, the bifurcation ratio of a river system is taken as 2, and a river system can be considered analogous to a binary tree. The Sanders approach is a modification of Sharp’s method in which tributaries are replaced by the number of outfalls and pollution loadings (Sanders et al. 1983).
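The centroid definition above can be checked on a small example. The sketch below assumes the tree is given as an undirected adjacency list and uses the weight defined above: the size of the largest sub-tree left after removing a vertex. The example tree is hypothetical and is not the tree of Fig. 5a.

```python
"""Sketch: centroid of a free tree using the vertex weight described above.

weight(v) = number of vertices in the largest sub-tree left after removing v;
the centroid(s) are the vertices of minimum weight. The input must be a tree
(connected, no cycles) given as a symmetric adjacency list.
"""
from typing import Dict, List, Tuple


def tree_centroids(adjacency: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, int]]:
    nodes = list(adjacency)
    n = len(nodes)
    root = nodes[0]
    parent = {root: None}
    order = [root]
    # Breadth-first traversal; the list grows while it is being iterated.
    for v in order:
        for w in adjacency[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    # Sub-tree sizes, accumulated from the leaves towards the root.
    size = {v: 1 for v in nodes}
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]
    weights = {}
    for v in nodes:
        largest = n - size[v]  # vertices on the root side of v
        for w in adjacency[v]:
            if w != parent[v]:
                largest = max(largest, size[w])  # each child sub-tree of v
        weights[v] = largest
    best = min(weights.values())
    return [v for v in nodes if weights[v] == best], weights


if __name__ == "__main__":
    # Hypothetical five-vertex chain A-B-C-D-E.
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
    print(tree_centroids(chain))
```

For the chain, vertex C has weight 2 and is the unique centroid. A tree can have two adjacent centroids, which is why the function returns a list.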

Fig. 5 a Centroid of tree. b Modified Sanders approach for a hypothetical river tree

Modified Sanders approach

In the present analysis, the modified Sanders approach of Do et al. (2011) was used to determine the optimum locations of the sampling sites. The potential sampling locations are the nodes (data points), or vertices, of a river tree, as shown in Fig. 5b. The magnitude of each node equals the pollution loading at the respective sampling point. The mouth of the watershed is the root of the tree, and its magnitude equals the sum of the pollution loads at all nodes within the watershed. The first centroid, i.e., the first-order station (first hierarchy), is the node with a magnitude closest to $M_h$ for h = 1 (Eq. 1):

$$ M_h = \left[\frac{N_{\mathrm{o}} + 1}{2}\right] $$
(1)

where $M_h$ is the magnitude of the node/sampling location at the hth hierarchy, and $N_{\mathrm{o}}$ is the total pollution load at the mouth of the river basin. The first sampling location is placed at the first centroid identified using Eq. 1. This centroid defines the main branch of the tree, i.e., the link joining the centroidal node to the root of the tree. The first centroid is the unique node with a magnitude closest to half of the magnitude of the whole river network. It therefore lies at the first hierarchy level, on the principal branch of the tree, and deserves the highest sampling priority. The centroids at successive hierarchy levels are estimated using Eq. 2. For example, there are two centroids at the second hierarchy level (the first-order sub-branches of the tree), with the second highest sampling priority.

$$ M_{h+1} = \left[\frac{M_h + 1}{2}\right] $$
(2)

where $M_{h+1}$ is the magnitude of the node/sampling location at the (h + 1)th hierarchy. Because the river system is a binary tree, the maximum number of centroids used to determine sampling locations up to a given hierarchy level is estimated using Eq. 3:

$$ n = 2^h - 1 $$
(3)

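where n is the maximum number of sampling stations up to and including the hth hierarchy level. A binary tree has one station at the first level, two at the second, four at the third, and so on, so that $1 + 2 + \cdots + 2^{h-1} = 2^h - 1$.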
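Eqs. 1 to 3 can be evaluated directly. This sketch treats the square brackets as ordinary grouping, so $M_h$ is not rounded; reading the brackets as an integer part would be an alternative. It reads n as the cumulative number of stations up to level h. The total load used in the example is hypothetical.

```python
"""Sketch of Eqs. 1-3 for a hypothetical total load N_o.

Assumption: the square brackets in Eqs. 1 and 2 are treated as ordinary
grouping, so M_h is not rounded; flooring would be an alternative reading.
"""
from typing import List


def hierarchy_magnitudes(total_load: float, levels: int) -> List[float]:
    """Return [M_1, ..., M_levels] with M_1 = (N_o + 1)/2 and M_{h+1} = (M_h + 1)/2."""
    magnitudes = []
    m = (total_load + 1) / 2  # Eq. 1
    for _ in range(levels):
        magnitudes.append(m)
        m = (m + 1) / 2  # Eq. 2
    return magnitudes


def max_stations(h: int) -> int:
    """Eq. 3: n = 2**h - 1, the cumulative station count of a full binary tree to level h."""
    return 2 ** h - 1


if __name__ == "__main__":
    for h, m in enumerate(hierarchy_magnitudes(99.0, 3), start=1):
        print(h, m, max_stations(h))
```

Each level roughly halves the target magnitude. $M_h$ is therefore close to $M_1 / 2^{h-1}$ when the loads are large.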

Determination of optimum number of sampling sites

After the sampling locations have been delineated, the optimum number of sampling sites is estimated. The availability of funding is the governing factor in this choice. If the budget is known, the optimum number of sampling locations is the total available funding divided by the operating cost of a single monitoring site. If the budget is unknown, the $M_h$ values are first calculated from the pollution loading (Eqs. 1 and 2). Do et al. (2011) proposed that sampling should stop at the hth hierarchy, where $M_h$ is much smaller than $M_1$ (the first centroid). The optimum number of sampling locations then corresponds to this hierarchy level (Eq. 3). Another approach is to consider the effluent disposal standards for wastewater. In that case, the optimum number of sampling sites is the number of sampling locations that violate the effluent disposal standards.
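The three rules above for fixing the number of sites can be written as small functions. The one-fifth guideline is attributed to Do et al. (2011) in the Results section. The "closest to 0.2 M_1" reading used here is an assumed operationalization, not a formula quoted from that paper. The budget, cost, standard, and site names in the example are hypothetical.

```python
"""Sketch of three ways to fix the number of sampling sites.

All example numbers are hypothetical. Picking the level whose M_h / M_1 is
closest to 0.2 is an assumed reading of the one-fifth guideline.
"""
import math
from typing import Dict, List


def sites_from_budget(budget: float, cost_per_site: float) -> int:
    """Known budget: available funding / cost of one site, rounded down."""
    return math.floor(budget / cost_per_site)


def design_level(magnitudes: List[float], ratio: float = 0.2) -> int:
    """Unknown budget: hierarchy level h (1-based) whose M_h / M_1 is closest to ratio."""
    m1 = magnitudes[0]
    return min(range(1, len(magnitudes) + 1),
               key=lambda h: abs(magnitudes[h - 1] / m1 - ratio))


def sites_violating_standard(concentrations: Dict[str, float], standard: float) -> List[str]:
    """Standards-based rule: sites whose effluent concentration exceeds the limit."""
    return [site for site, c in concentrations.items() if c > standard]


if __name__ == "__main__":
    print(sites_from_budget(100000.0, 12000.0))
    # M_1..M_4 generated with Eq. 2 from M_1 = 1000.
    print(design_level([1000.0, 500.5, 250.75, 125.875]))
    print(sites_violating_standard({"S1": 25.0, "S2": 40.0, "S3": 31.0}, 30.0))
```

Because successive $M_h$ values roughly halve, a 0.2 ratio usually lands on the third level. This is consistent with the design level reported later for both seasons.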

Results and discussion

The proposed approach was applied to the Kali River basin in western Uttar Pradesh, India. Two distinct networks of sampling locations were designed, one for the monsoon season and one for the non-monsoon season, accounting for the seasonal variation of both point and diffuse pollution loads. The design process involves selecting the optimum number and locations of sampling points. The design based on monsoon-period loading is termed the pre-monsoon design, and the design based on non-monsoon-period loading is termed the post-monsoon design.

Design of sampling locations based on point pollution load

The major point sources of pollution in the study area are six open drains carrying municipal and industrial wastewater. The sampling locations downstream of these sources are considered potential sampling sites, i.e., P5, P7, P9, P11, P13, and P15, as shown in Table 2. These six sampling sites were analyzed for the design of the water quality sampling locations. The objective of the design process was to select the optimum number and locations of sampling points from the six potential sampling sites.

First, the optimum number of sampling locations was selected. In the present study, the budget was unknown and effluent disposal standards were not defined. The optimum number of sampling locations was therefore determined on the basis of pollution loadings. The number of sampling locations for each hierarchy level was estimated using Eq. 3. For the point pollution sources, the analysis was carried out up to the sixth hierarchy level because there were six potential sampling locations. $M_h$ values (Eqs. 1 and 2) were calculated for h = 1 to 6 for both the monsoon and non-monsoon seasons. The estimated $M_h$ values and the corresponding numbers of sampling locations are shown in Table 5. Do et al. (2011) proposed that the optimum hierarchy level should be the one at which $M_h$ is around one fifth of the first-hierarchy value (i.e., $M_h \approx 0.2\,M_1$). Hence, in the present analysis, the third hierarchy level (h = 3) was selected as the design level for both seasons. The significant locations and number of sampling sites were then selected by comparing the BOD loadings at the six potential sampling sites with the $M_h$ values up to the third hierarchy level. The significant sampling locations at hierarchy h are those with the smallest differences between the estimated $M_h$ and the measured BOD loadings. The optimum number and locations of sampling sites and their priorities for both seasons are shown in Table 6.
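The matching of BOD loadings to $M_h$ can be sketched with the seasonal point-source loads quoted later in this section (Table 3). Two assumptions are made. First, $N_{\mathrm{o}}$ is taken as the sum of the six point-source loads; Table 5 may use a different total. Second, the sketch picks one site per level greedily. It does not decide how many sites share the design level. The function name is hypothetical.

```python
"""Sketch: rank point-source sites against M_h, one site per hierarchy level.

Loads (kg/day) are the seasonal point-source BOD loads quoted in the text.
Assumptions: N_o = sum of these six loads, and a greedy one-site-per-level
rule; neither is stated as the study's exact procedure.
"""
from typing import Dict, List, Tuple

MONSOON = {"P5": 42296.34, "P7": 303830.65, "P9": 25968.48,
           "P11": 62713.91, "P13": 258498.85, "P15": 51556.49}
NON_MONSOON = {"P5": 37846.53, "P7": 206290.02, "P9": 30095.90,
               "P11": 39666.18, "P13": 302959.78, "P15": 58488.51}


def greedy_hierarchy(loads: Dict[str, float], levels: int) -> List[Tuple[int, str, float]]:
    """For h = 1..levels, pick the unassigned site whose load is closest to M_h."""
    m = (sum(loads.values()) + 1) / 2  # Eq. 1 with the assumed N_o
    remaining = dict(loads)
    picks = []
    for h in range(1, levels + 1):
        site = min(remaining, key=lambda s: abs(remaining[s] - m))
        picks.append((h, site, m))
        del remaining[site]
        m = (m + 1) / 2  # Eq. 2
    return picks


if __name__ == "__main__":
    for season, loads in (("monsoon", MONSOON), ("non-monsoon", NON_MONSOON)):
        for h, site, m in greedy_hierarchy(loads, 3):
            print(f"{season}: h={h} M_h={m:,.0f} kg/day -> {site}")
```

With these loads, the sketch selects P7, P13, and P11 for the monsoon season and P13, P7, and P15 for the non-monsoon season at levels 1 to 3. The monsoon design in Table 6 also keeps P15 at the third level. This one-site-per-level rule does not encode that choice.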

Table 5 Estimation of M h values
Table 6 Design of sampling locations for point source of pollution based on BOD loading

Monsoonal design of sampling locations based on point pollution loads

The monsoonal design of the sampling locations is illustrated in Fig. 6a. The four significant sampling locations in the monsoon period are P7, P11, P13, and P15. Here, “significant” refers to the sampling locations at hierarchy h with the smallest differences between the estimated $M_h$ values and the measured BOD loadings. P7 is at the first hierarchy level, P13 at the second, and P11 and P15 at the third. In the present study, all point pollution sources carry municipal wastewater, and some also carry industrial wastewater. Their pollution potential is therefore measured in terms of BOD (organic) loadings. These sources are open drains, whose hydraulic behavior differs from that of closed conduits. The open drains carry both surface runoff and wastewater, so the pollution load is considerably affected by dilution. The point sources discharge untreated wastewater directly into the river, so the pollution load is quite high. The dilution effect is significant during the monsoon season and results in higher discharge and lower BOD concentration at the point sources (Table 3).

Fig. 6 Effect of seasonal variation on design of sampling locations. a Monsoon season. b Non-monsoon season

The point source of pollution just upstream of P7 consists of both municipal and industrial wastewater discharges. The sub-watershed of P7 has a settlement area of 42.67 km² (19.50 % of the total sub-watershed area; Fig. 3). The BOD loadings from both the point source and the overland flow from the settlement area pass through the open drain, so P7 has a high BOD concentration of 1,124.94 mg/l (Table 3). The contributing sub-watershed is large (218.81 km²; Fig. 3), so the monsoon-season flow through the open drain at its confluence at P7 is 3.13 m³/s. This produces a considerable BOD load of 303,830.65 kg/day (Table 3), and P7 is therefore placed at the first hierarchy level (Fig. 6a and Table 6). Similarly, the point source upstream of P13 carries both industrial and municipal wastewater, with industrial effluent predominating. The sub-watershed of P13 has a settlement area of 7.99 km² (22.21 % of the total sub-watershed area; Fig. 3). The effluent discharged just upstream of P13 therefore has a high BOD concentration of 1,378.24 mg/l (Table 3). The contributing sub-watershed area of P13 (35.96 km²; Fig. 3) is smaller than that of P7. Consequently, the monsoon-season flow through the open drain at P13 is 2.17 m³/s (Table 3), which is less than the flow at P7. This results in a lower BOD load of 258,498.85 kg/day and places P13 at the second hierarchy level (Fig. 6a and Table 6).
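The loads quoted above can be cross-checked against the concentrations and flows. Because 1 mg/L equals 1 g/m³, the load in kg/day is 86.4 × C (mg/L) × Q (m³/s). In this sketch, that conversion reproduces the reported P7 and P13 loads to within a fraction of a percent. The small residual is presumably due to rounding of the reported flows.

```python
"""Sketch: BOD load from concentration and flow.

1 mg/L = 1 g/m3, so C [g/m3] * Q [m3/s] gives g/s; multiplying by
86,400 s/day and dividing by 1,000 g/kg gives kg/day, i.e. 86.4 * C * Q.
"""


def bod_load_kg_per_day(conc_mg_per_l: float, flow_m3_per_s: float) -> float:
    """Return the BOD load in kg/day for a concentration in mg/L and a flow in m3/s."""
    return 86.4 * conc_mg_per_l * flow_m3_per_s


if __name__ == "__main__":
    # Monsoon-season values for P7 and P13 quoted in the text (Table 3).
    for site, conc, flow, reported in (("P7", 1124.94, 3.13, 303830.65),
                                       ("P13", 1378.24, 2.17, 258498.85)):
        estimate = bod_load_kg_per_day(conc, flow)
        print(f"{site}: {estimate:,.0f} kg/day (reported {reported:,.2f})")
```

The same conversion shows why dilution matters. For a fixed load, a larger monsoon flow implies a proportionally lower concentration.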

Sampling location P11 has the largest contributing sub-watershed area (261.33 km²), and the monsoon-season flow through the open drain at P11 is the highest, at 4.57 m³/s (Table 3). This sub-watershed also has a larger settlement area of 73.36 km² (approximately 28.07 % of the total sub-watershed area; Fig. 3). Even so, the effluent BOD load is only approximately 62,713.91 kg/day because of the greater dilution of the pollution loads. As a result, monitoring station P11 is placed at the third hierarchy level (Fig. 6a). Likewise, monitoring station P15 is placed at the third hierarchy level because of its lower BOD load of 51,556.49 kg/day (Tables 3 and 6).

The remaining monitoring stations that receive BOD from point sources, P5 and P9, have lower BOD loads (approximately 42,296.34 and 25,968.48 kg/day, respectively). They do not meet the optimality criterion for inclusion in the monitoring network during the monsoon season. These two stations are therefore redundant and may be eliminated from the monsoon-season network.

Non-monsoonal design of sampling locations based on point pollution loads

The design of the sampling locations for the non-monsoon season is illustrated in Fig. 6b. The three significant sampling locations during the non-monsoon season are P13, P7, and P15, at the first, second, and third hierarchy levels, respectively. They were selected on the basis of BOD loadings. During the non-monsoon season, the dilution effect of surface runoff is insignificant, and the pollution loading depends on the discharge and BOD concentration of each point source. The point source just upstream of P7 carries municipal and industrial wastewater. Their combined effluent BOD load (206,290.02 kg/day) is significant, so P7 is placed at the second hierarchy level (Table 6). The point source just upstream of P13 also carries municipal wastewater and a significant amount of industrial wastewater. Its effluent BOD load, 302,959.78 kg/day, is the highest BOD discharge in the river watershed, so P13 is placed at the first hierarchy level (Table 6).

The point source upstream of sampling location P15 carries municipal wastewater and has a considerable effluent BOD load of 58,488.51 kg/day (Table 3); hence, P15 is placed at the third hierarchy level (Table 6 and Fig. 6b). The point sources upstream of P5, P9, and P11 carry municipal wastewater from approximately 40 % of the unsewered settlement area of their respective sub-watersheds (based on the field study). Their effluent BOD loadings are 37,846.53, 30,095.90, and 39,666.18 kg/day, respectively (Table 3), which are insignificant in comparison with \( M_1 \), the loading derived at the first hierarchy level (Table 5). Thus, sampling locations P5, P9, and P11 fail to meet the optimality criterion and are removed from the design of sampling locations for the non-monsoon season (Fig. 6b).

Effect of seasonal variation on the design of sampling locations for point sources of pollution

The BOD loads of point sources were taken into account in the design of the sampling locations. As mentioned in the “Monsoonal design of sampling locations based on point pollution loads” section, the point sources are open drains, so surface runoff during the monsoon season strongly affects their effluent BOD loads. The BOD load of a point source depends on the surface runoff and the BOD concentration of the effluent, and the runoff is directly proportional to the area of the sub-watershed. Consequently, seasonal variation changes the network in two major ways: it alters the priority order of the sampling locations and reduces the number of sampling locations.

During the monsoon season, sampling location P7 is placed at the first hierarchy level and P13 at the second, whereas during the non-monsoon season, P13 is at the first hierarchy level and P7 at the second (Fig. 6). The sub-watershed area contributing to sampling location P7 (218.81 km²) is larger than that contributing to P13 (35.96 km²). Therefore, the resulting discharge of the point source just upstream of P7 (3.13 m³/s) is greater than that of the point source upstream of P13 (2.17 m³/s; Table 3). Hence, during the monsoon season, the effluent BOD loading upstream of P7 (303,830.65 kg/day) is greater than that upstream of P13 (258,498.85 kg/day), and P7 is placed at the first hierarchy level, while P13 is placed at the second (Table 6).

As mentioned in the “Non-monsoonal design of sampling locations based on point pollution loads” section, during the non-monsoon season, the loading is governed by the discharge and effluent concentration of individual point sources rather than by surface runoff. The point sources just upstream of P7 and P13 both carry industrial and municipal wastewater, and both discharge considerable BOD loads into the river. However, industrial wastewater dominates the effluent load at P13. Therefore, during the non-monsoon season, the point source upstream of P13 has a higher BOD load (302,959.78 kg/day) than that upstream of P7 (206,290.02 kg/day; Table 3). Hence, P13 is positioned at the first hierarchy level, while P7 is positioned at the second.

The second major impact of seasonal variation is a reduction in the number of sampling sites. Sampling location P11 is significant during the monsoon season but insignificant during the non-monsoon season. During the monsoon season, pollution loading is at its maximum because of surface runoff, whereas during the non-monsoon season, the discharge and effluent characteristics of the point sources determine the pollution loadings. The sub-watershed of P11 is the largest (261.33 km²; Fig. 3), which results in the maximum monsoon discharge (4.57 m³/s; Table 3) from the point source upstream of P11. In the non-monsoon season, however, the BOD load upstream of P11 is only 39,666.18 kg/day (Table 3), which is insignificant compared with \( M_1 \). Therefore, P11 is at the third hierarchy level during the monsoon season and is insignificant during the non-monsoon season, and the optimum number of sampling locations is reduced from four to three. By accounting for seasonal variation, the proposed network design approach reduces the number of sampling locations required during the non-monsoon season and thereby improves cost-effectiveness.
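The seasonal swap between P7 and P13 and the removal of P11 can be checked against the loads quoted above with a short Python sketch. It only sorts stations by their reported seasonal BOD loads and compares the station sets reported in Table 6; it does not re-implement the hierarchy-level criterion of Eqs. 1 and 2, and the names `BOD_LOAD`, `SELECTED`, `rank_by_load`, and `dropped_stations` are illustrative.

```python
# Seasonal effluent BOD loads (kg/day) of the point sources upstream of each
# sampling location, as quoted in the text (Table 3).
BOD_LOAD = {
    "monsoon": {"P5": 42296.34, "P7": 303830.65, "P9": 25968.48,
                "P11": 62713.91, "P13": 258498.85, "P15": 51556.49},
    "non-monsoon": {"P5": 37846.53, "P7": 206290.02, "P9": 30095.90,
                    "P11": 39666.18, "P13": 302959.78, "P15": 58488.51},
}

# Stations retained in each season, as reported in the text (Table 6).
# The optimality criterion itself (Eqs. 1 and 2) is not re-implemented here.
SELECTED = {
    "monsoon": {"P7", "P13", "P11", "P15"},
    "non-monsoon": {"P13", "P7", "P15"},
}


def rank_by_load(loads):
    """Return station names sorted from the largest to the smallest load."""
    return sorted(loads, key=loads.get, reverse=True)


def dropped_stations(selected, wet="monsoon", dry="non-monsoon"):
    """Stations monitored in the wet season but not in the dry season."""
    return sorted(selected[wet] - selected[dry])


if __name__ == "__main__":
    for season, loads in BOD_LOAD.items():
        print(season, rank_by_load(loads))
    print("dropped in non-monsoon:", dropped_stations(SELECTED))
```

Running it lists P7 ahead of P13 in the monsoon season and P13 ahead of P7 in the non-monsoon season, and reports P11 as the only station dropped.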

The present analysis shows a cause-and-effect relationship between the results obtained and the anthropogenic activities in the study area. Because pollution loading is the optimality criterion, the first priority of sampling is given to locations downstream of major pollution sources. The effect of surface runoff was successfully accounted for in the design of the sampling locations. Municipal and industrial wastewater is discharged into the river without treatment, and rapid industrialization and urbanization in the study area have adversely affected the river ecosystem and considerably degraded the river water quality. The results obtained therefore reflect actual site conditions, which makes the present analysis more realistic.

Design of sampling locations based on diffuse pollution loads

The study area consists of 16 sub-watersheds, which are considered the principal diffuse sources of pollution. The outlets of these 16 sub-watersheds, i.e., P1 to P16, are the potential sampling sites, and they were analyzed to select the optimum number and locations of water quality sampling sites.

First, the optimum number of sampling locations was selected by following the procedure described in the “Design of sampling locations based on point pollution load” section. \( M_h \) values were calculated using Eqs. 1 and 2 for h = 1 to 16 for both the monsoon and non-monsoon seasons. For diffuse pollution sources, the analysis was carried out up to the 16th hierarchy level because there are 16 potential sampling locations. The estimated \( M_h \) values and the number of sampling locations are shown in Table 5. According to the optimality criterion proposed by Do et al. (2011), the design hierarchy level is h = 3. The optimum number and locations of sampling sites and their priority order for both seasons are shown in Table 7.

Table 7 Design of sampling locations for diffuse sources of pollution based on nitrate and phosphate loadings

Monsoonal design of sampling locations based on diffuse pollution loads

The design of the sampling locations for the monsoon season is illustrated in Fig. 6a. The significant sampling locations for diffuse pollution loadings during the monsoon period are P1, P7, P8, P9, P11, P15, and P16: P1 is at the first hierarchy level, P7 and P11 are at the second, and P8, P9, P15, and P16 are at the third (Fig. 6a). These locations are selected based on nutrient pollution loadings, given that 80.11 % of the watershed area is used for agriculture (Fig. 3). Nitrate and phosphate are the principal nutrients required for plant growth and are key ingredients of fertilizers. Fertilizers are used excessively to increase crop yields, which makes agricultural runoff nutrient-rich. Hence, during the monsoon season, large amounts of nutrients are carried to the river by agricultural runoff. The pollution potential of diffuse sources is measured in terms of nitrate and phosphate (nutrient) loadings. Surface runoff is directly proportional to the watershed area, and its nutrient concentration depends on the extent of agricultural land use in the watershed. Therefore, during the monsoon season, the surface runoff and agricultural area of a sub-watershed are the key factors governing its nutrient loading and the priority order of the sampling locations. The presence of a point pollution source in a sub-watershed also has a significant impact on nutrient loadings and the design of sampling locations.

The area of the sub-watershed contributing to sampling location P1 is 239.59 km², of which 214.83 km² (89.67 %) is used for farming (Fig. 3). The nutrient pollution load at sampling location P1 is 332.42 kg of nitrate/day and 164.04 kg of phosphate/day (Table 4). Because agricultural land use dominates this sub-watershed, sampling location P1 discharges the maximum nutrient load in the basin during the monsoon season and is at the first hierarchy level for both nutrients (Fig. 6a and Table 7).

The sub-watershed area contributing to sampling location P7 is 218.81 km², of which 171.59 km² (78.42 %) is used for farming (Fig. 3). The point pollution source just upstream of P7 discharges both municipal and industrial wastewater. Owing to the combined effect of point and diffuse pollution sources, the nutrient load at sampling location P7 is 277.97 kg of nitrate/day and 139.31 kg of phosphate/day (Table 4). Therefore, sampling location P7 is significant and is located at the second hierarchy level for both nutrients during the monsoon season (Fig. 6a and Table 7). Similarly, sampling location P11 has a considerable agricultural area of 182.26 km² (69.74 %) in a sub-watershed of 261.33 km² (Fig. 3), and the point pollution source just upstream of P11 carries municipal wastewater, so point and diffuse pollution sources act together at P11. The nutrient pollution loads at P11 are 308.84 kg of nitrate/day and 157.03 kg of phosphate/day (Table 4). Hence, P11 is placed at the second hierarchy level for both nutrient loadings during the monsoon period (Fig. 6a and Table 7).

The sampling locations P8, P9, P15, and P16 are placed at the third hierarchy level (Fig. 6a and Table 7) because they discharge significant loads of both nutrients. The sub-watershed of sampling location P8 has an area of 160.65 km², of which 141.36 km² (87.99 %) is used for farming (Fig. 3). The nutrient loads at P8 are 219.69 kg of nitrate/day and 108.58 kg of phosphate/day (Table 4). The sub-watershed of P9 has an area of 173.11 km², of which 129.26 km² (74.67 %) is used for farming (Fig. 3), and a point pollution source carrying municipal wastewater discharges just upstream of P9. Hence, point and diffuse sources act together at P9, and the resulting nutrient loads are 213.12 kg of nitrate/day and 107.43 kg of phosphate/day (Table 4). The sub-watershed contributing to sampling location P15 has an area of 109.54 km², of which 75.81 km² (69.21 %) is used for farming (Fig. 3), and the point source upstream of P15 discharges municipal wastewater. Hence, P15 also receives the joint effect of point and diffuse pollution loads and is located at the third hierarchy level (Table 7); its nutrient loads are 128.68 kg of nitrate/day and 65.46 kg of phosphate/day (Table 4). The sub-watershed contributing to sampling location P16 has an area of 96.11 km², of which 82.49 km² (85.83 %) is used for agriculture (Fig. 3). Sampling location P16 is therefore significant because of the nutrient-rich agricultural runoff from its contributing sub-watershed and is placed at the third hierarchy level, with considerable nutrient loads of 129.28 kg of nitrate/day and 64.08 kg of phosphate/day during the monsoon season (Table 7). Sampling locations P2, P3, P4, P5, P6, P10, P12, and P13 discharge nutrient loads that are insignificant compared with \( M_1 \) (Tables 5 and 7). These locations do not meet the optimality criterion described in the “Design of sampling locations based on diffuse pollution loads” section; hence, monitoring at these locations is not necessary during the monsoon season.
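The agricultural shares quoted for the monsoon diffuse-source stations follow directly from the sub-watershed and farming areas. The Python sketch below recomputes them as a simple arithmetic check; the dictionary `AREAS_KM2` and the function `agricultural_share` are hypothetical names.

```python
# Sub-watershed area and agricultural area (km²) for the stations found
# significant in the monsoon season, as quoted in the text (Fig. 3).
AREAS_KM2 = {
    # station: (sub-watershed area, agricultural area)
    "P1": (239.59, 214.83),
    "P7": (218.81, 171.59),
    "P8": (160.65, 141.36),
    "P9": (173.11, 129.26),
    "P11": (261.33, 182.26),
    "P15": (109.54, 75.81),
    "P16": (96.11, 82.49),
}


def agricultural_share(total_km2, agri_km2):
    """Percentage of a sub-watershed used for agriculture, rounded to 0.01 %."""
    return round(100.0 * agri_km2 / total_km2, 2)


if __name__ == "__main__":
    for station, (total, agri) in AREAS_KM2.items():
        print(f"{station}: {agricultural_share(total, agri):.2f} %")
```

The recomputed values agree with the percentages given in the text (89.67, 78.42, 87.99, 74.67, 69.74, 69.21, and 85.83 %).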

Non-monsoonal design of sampling locations based on diffuse pollution loads

The design of sampling locations for the non-monsoon season is illustrated in Fig. 6b. The significant sampling locations during the non-monsoon period are P1, P7, P8, P9, P11, and P15. Sampling location P11 is at the first hierarchy level, P1 and P7 are at the second, and P8, P9, and P15 are at the third for both nitrate and phosphate loadings. The significant sampling locations are selected based on nutrient (i.e., nitrate and phosphate) loadings. In the non-monsoon season, surface runoff is insignificant, and nutrients are mainly transported to the river by sub-surface flow; point sources are also key sources of nutrient pollution. The sub-watershed contributing to P11 has an agricultural area of 182.26 km² (69.74 % of the total; Fig. 3), and its point pollution source carries municipal wastewater. Therefore, considerable amounts of nutrients, i.e., 91.00 kg of nitrate/day and 54.34 kg of phosphate/day, are discharged at P11, and it is placed at the first hierarchy level (Fig. 6b and Table 7).

The sub-watershed contributing to sampling location P1 has an agricultural area of 214.83 km² (Fig. 3). The nutrient loads at P1 during the non-monsoon season are 72.41 kg of nitrate/day and 39.14 kg of phosphate/day (Table 4). Significant amounts of nutrients are carried to P1 by sub-surface flow, and it is therefore located at the second hierarchy level during the non-monsoon season (Fig. 6b and Table 7).

The sub-watershed contributing to sampling location P7 has an agricultural area of 171.59 km² (Fig. 3), and the point pollution source just upstream of P7 discharges both municipal and industrial wastewater. Owing to the joint effect of point source pollution and nutrient transport by sub-surface flow, the nutrient loadings at P7 are 71.61 kg of nitrate/day and 41.10 kg of phosphate/day (Table 4). Therefore, sampling location P7 is significant and is placed at the second hierarchy level for both nutrients during the non-monsoon season (Fig. 6b and Table 7).

Sampling locations P8, P9, and P15 are at the third hierarchy level (Fig. 6b and Table 7) because they discharge significant amounts of both nutrients. The sub-watershed contributing to sampling location P8 has an agricultural area of 141.36 km² (Fig. 3), so sub-surface flow carries considerable amounts of nutrients to P8 during the non-monsoon period. The nutrient loads at P8 are 48.72 kg of nitrate/day and 26.52 kg of phosphate/day (Table 4). The sub-watershed of P9 has an agricultural area of 129.26 km² (Fig. 3), and a point pollution source discharges municipal wastewater just upstream of P9. Hence, point source pollution and nutrient transport by sub-surface flow act together at P9, and the resulting nutrient loads are 58.04 kg of nitrate/day and 33.69 kg of phosphate/day (Table 4). The sub-watershed contributing to sampling location P15 has an agricultural area of 75.81 km² (Fig. 3), and the point source just upstream of P15 discharges municipal wastewater. Hence, a considerable nutrient load occurs at P15 because of the combined effect of the point source and sub-surface flow. The nutrient loads at P15 are 38.11 kg of nitrate/day and 22.78 kg of phosphate/day, so P15 is located at the third hierarchy level (Table 7). Sampling locations P2, P3, P4, P5, P6, P10, P12, P13, and P16 discharge nutrient loads that are insignificant compared with \( M_1 \). Hence, these sampling locations do not meet the optimality criterion and do not need to be monitored during the non-monsoon season (Fig. 6b).

Effect of seasonal variation on design of sampling locations for diffuse sources of pollution

The nutrient loads (i.e., nitrate and phosphate) of diffuse pollution sources were accounted for in the design of the sampling locations. Nutrient-rich agricultural runoff and point sources carrying municipal and industrial wastewater are the major sources of nutrient pollution in the river. As mentioned in the “Monsoonal design of sampling locations based on diffuse pollution loads” section, in the monsoon season, nutrients are transported to the river by nutrient-rich surface runoff from agricultural areas. In the non-monsoon season, surface runoff is insignificant because there is little or no rainfall; however, some of the nutrients percolate deep into the ground and are eventually transported to the river by sub-surface flow. Point sources play an important role in nutrient transport in both seasons. Seasonal variation therefore changes the network in two major ways: it alters the priority order of the sampling locations and reduces the number of sampling locations.

In the monsoon season, sampling location P1 is at the first hierarchy level and P11 is at the second, whereas in the non-monsoon season, P11 is at the first hierarchy level and P1 is at the second (Fig. 6 and Table 7). The agricultural area of the sub-watershed contributing to P1 (214.83 km²) is greater than that of the sub-watershed contributing to P11 (182.26 km²; Fig. 3), so in the monsoon season, the surface runoff at P1 has a greater nutrient potential than that at P11. Therefore, in the monsoon season, the nutrient loads at P1 (332.42 kg of nitrate/day and 164.04 kg of phosphate/day) are higher than those at P11 (308.84 kg of nitrate/day and 157.03 kg of phosphate/day), and P1 is placed at the first hierarchy level, while P11 is placed at the second (Fig. 6 and Table 7).

As mentioned in the “Non-monsoonal design of sampling locations based on diffuse pollution loads” section, in the non-monsoon season, nutrients are transported by sub-surface flow instead of surface runoff. Both P1 and P11 receive nutrient loads through sub-surface flow, but P11 also receives municipal wastewater from the point pollution source just upstream. Sampling location P11 is therefore affected by both point and diffuse pollution sources and has higher nutrient loadings (91.00 kg of nitrate/day and 54.34 kg of phosphate/day) than P1 (72.41 kg of nitrate/day and 39.14 kg of phosphate/day; Table 4). Hence, in the non-monsoon season, P11 is placed at the first hierarchy level, while P1 is placed at the second (Fig. 6b and Table 7).

The second major impact of seasonal variation is a reduction in the number of sampling sites. Sampling location P16 is significant in the monsoon season but insignificant in the non-monsoon season. In the monsoon season, nutrient loading is highest because of surface runoff, whereas in the non-monsoon season, surface flow is insignificant and nutrients are transported by sub-surface flow. The nutrient loads at P16 are significant in the monsoon season (129.28 kg of nitrate/day and 64.08 kg of phosphate/day) because of surface runoff, whereas in the non-monsoon season they are insignificant (29.60 kg of nitrate/day and 16.32 kg of phosphate/day) compared with \( M_1 \) (Tables 5 and 7). Hence, P16 is located at the third hierarchy level during the monsoon season and is insignificant during the non-monsoon season, and the optimum number of sampling locations is reduced from seven to six. By taking seasonal variation into account, the network design approach proposed in the present study therefore improves cost-effectiveness.
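The reordering of P1 and P11 and the removal of P16 can be reproduced from the nutrient loads quoted above with a short Python sketch. It ranks stations by each reported seasonal load and compares the station sets reported in Table 7; it does not apply the hierarchy-level criterion itself, and the names `NUTRIENT_LOAD`, `SELECTED`, and `rank` are illustrative.

```python
# Nitrate and phosphate loads (kg/day) quoted in the text (Table 4),
# stored as (nitrate, phosphate) tuples.
NUTRIENT_LOAD = {
    "monsoon": {
        "P1": (332.42, 164.04), "P7": (277.97, 139.31),
        "P8": (219.69, 108.58), "P9": (213.12, 107.43),
        "P11": (308.84, 157.03), "P15": (128.68, 65.46),
        "P16": (129.28, 64.08),
    },
    "non-monsoon": {
        "P1": (72.41, 39.14), "P7": (71.61, 41.10),
        "P8": (48.72, 26.52), "P9": (58.04, 33.69),
        "P11": (91.00, 54.34), "P15": (38.11, 22.78),
        "P16": (29.60, 16.32),
    },
}

# Stations reported as significant in each season (Table 7).
SELECTED = {
    "monsoon": {"P1", "P7", "P8", "P9", "P11", "P15", "P16"},
    "non-monsoon": {"P1", "P7", "P8", "P9", "P11", "P15"},
}

NUTRIENTS = ("nitrate", "phosphate")


def rank(season, nutrient):
    """Stations sorted from the largest to the smallest load of one nutrient."""
    loads = NUTRIENT_LOAD[season]
    idx = NUTRIENTS.index(nutrient)
    return sorted(loads, key=lambda station: loads[station][idx], reverse=True)


if __name__ == "__main__":
    for season in NUTRIENT_LOAD:
        for nutrient in NUTRIENTS:
            print(season, nutrient, rank(season, nutrient))
    print("dropped in non-monsoon:", sorted(SELECTED["monsoon"] - SELECTED["non-monsoon"]))
```

For both nutrients, P1 heads the monsoon ranking, P11 heads the non-monsoon ranking, and P16 has the smallest non-monsoon load.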

As mentioned in the “Effect of seasonal variation on the design of sampling locations for point sources of pollution” section, the present analysis shows a cause-and-effect relationship between the results obtained and the anthropogenic activities in the study area. Because pollution loading is the optimality criterion, the sampling locations that discharge considerable nutrient loads have the highest sampling priority. The excessive use of chemical fertilizers, improper agricultural practices, and the disposal of untreated municipal and industrial wastewater into the river adversely affect the river ecosystem and continuously degrade river water quality. Because the effect of surface runoff is taken into account in the design of the sampling locations, the results reflect actual site conditions, which makes the present analysis more realistic.

Conclusions

The present study investigates the effect of seasonal variations in point and diffuse pollution loadings on river water quality and quantifies their impact on the optimal design of sampling locations. The approach is cost-effective, as it identifies the optimal number of monitoring stations separately for the monsoon and non-monsoon seasons. An effect-based monitoring network is delineated directly through the estimation of pollution loadings, and the optimal sampling locations are selected based on the optimal hierarchy level of a binary tree structure from graph theory. The study shows that the number and locations (hierarchy/priority) of sampling sites may vary seasonally, which justifies changing or redesigning the monitoring network before and after the monsoon. As discussed in the “Introduction” section, numerous frameworks/guidelines proposed by different countries and mathematical approaches proposed by various researchers have been applied to the design of sampling locations. Most of the frameworks/guidelines are simple and not case-study specific, and they make it easy to locate sampling sites by accounting for the effect of pollution sources and the accessibility of sampling sites, whereas the mathematical approaches are comparatively complicated and case-study specific but are capable of allocating sampling sites optimally. The present study recognizes the importance of both approaches and makes a first attempt to combine the existing guidelines with the modified Sanders approach for the design of water quality sampling locations. The proposed approach was applied to the 1,475.50-km² catchment of the Kali River in northern India, which includes a 125-km polluted stretch of the river.

In this study, the impacts of agriculture and urban development on river water quality are quantified in a GIS framework. The optimal numbers of sampling locations for BOD and nutrient loadings are four and seven, respectively, in the monsoon season, and three and six, respectively, in the non-monsoon season. Excessive use of fertilizers in the river watershed, 80.11 % of which is used for agriculture, results in massive amounts of nutrients being discharged into the river through surface and sub-surface flows. Untreated domestic and industrial wastewater is discharged into the river through open drains. Wastewater needs to be treated before disposal into the river, and farmers need to be made aware of the proper use of fertilizers. The diffuse pollution load is a function of surface runoff, which is particularly high during the monsoon season; hence, more sampling locations are required in the monsoon season than in the non-monsoon season to capture the effect of surface runoff. Estimating seasonal pollution loads and determining the optimal number of monitoring stations for each season is cost-effective because water quality sampling is expensive. This approach might be more effective for a larger river with significant spatiotemporal variability in water quality. The effect of seasonal variability of pollution loads on the design of sampling sites is successfully demonstrated in the present study for both point and diffuse sources of pollution, and the approach is applicable to any region with a pronounced monsoon season.

The major shortcoming of the proposed approach is its extensive watershed data requirement, as mentioned in the “Introduction” section. The required data sets include the hydraulic and hydrological characteristics of the river basin, land use practices, agricultural cropping patterns, fertilizer use, and the relative locations of pollution sources and monitoring stations. The approach also requires expertise in various disciplines, viz. remote sensing and GIS applications, water quality testing, instrumentation, and watershed hydrology. Moreover, implementing the modified Sanders approach for a large river basin with many sampling locations may be complicated; a computational tool or software package could make the implementation of the proposed framework more systematic and easier. Other design components of a monitoring network, e.g., sampling frequency and parameters, may also be considered to evaluate the impact of seasonal variability of river water quality.

Appendix 1

Estimation of discharge from pollution sources

Along the entire stretch of the Kali River, six drains carrying municipal and industrial wastewater empty into the river; these drains are considered the major point sources of pollution. The river stretch is divided into 15 reaches (\( R_i \)) based on the relative locations of the 16 sampling sites (see Table 2 and Fig. 2). These reaches are classified into two categories based on the locations of the point sources of pollution. The first category (\( {R}_i^{\tilde{D}} \)) comprises reaches that contain no point source of pollution; nine reaches along the Kali River fall into this category, i.e., R1, R2, R3, R5, R7, R9, R11, R13, and R15. The second category (\( R_i^D \)) comprises reaches that contain a point source of pollution; six reaches fall into this category, i.e., R4, R6, R8, R10, R12, and R14. Every river reach (\( R_i \)) therefore falls into exactly one of the two categories, \( {R}_i^{\tilde{D}} \) or \( R_i^D \). The field study found the groundwater table in the study area to be quite high (approximately 5.49 to 6.10 m below the ground surface), which indicates that the river may lose water and may also gain water through subsurface flow and precipitation. The field study also found that the discharges from the point sources, which flow through heavily sedimented natural open drains, were discontinuous and therefore difficult to measure directly. In addition, four of the six point sources (natural open drains) were inaccessible. The discharges from the point sources were therefore estimated using a mass balance approach. In general, the inflows and outflows of the intermediate river reaches can be estimated using the following equation:

$$ {Q}_j^{P_{i+1}}={Q}_j^{P_i}\kern0.5em \pm \kern0.5em \varDelta {Q}_j^{R_i} $$
(4)

where \( {Q}_j^{P_{i+1}} \) (m³/s) is the river discharge in month j (j = 1 to 12, i.e., January to December) at sampling location \( P_{i+1} \), i.e., downstream of river reach \( R_i \) (i = 1 to 15); \( {Q}_j^{P_i} \) (m³/s) is the river discharge in month j at sampling location \( P_i \), i.e., upstream of river reach \( R_i \); and \( \varDelta {Q}_j^{R_i} \) (m³/s) represents the losses (negative quantities, e.g., evaporation from the river reach) and/or additions (positive quantities, e.g., discharges from aquifers adjacent to the river) of discharge in reach \( R_i \), estimated using the following generalized equation:

$$ \varDelta {Q}_j^{R_i}=\kern0.34em \varDelta {Q_D}_j^{R_i}\pm \varDelta {q}_j^{R_i}{L}_{R_i} $$
(5)

where \( \varDelta {q}_j^{R_i} \) (m³/s per unit length of river reach) represents the additions to and losses from river discharge in the form of evaporation, sub-surface flow or seepage from the river, flow from diffuse sources, precipitation, etc.; \( \varDelta {Q_D}_j^{R_i} \) (m³/s) is the discharge added by the point pollution source in reach \( R_i \); and \( {L}_{R_i} \) (m) is the length of reach \( R_i \). Two situations arise, depending on whether a river reach contains a point pollution source.

Situation 1: for river reaches in the first category (i.e., those with no point sources of pollution, \( \varDelta {Q_D}_j^{R_i} = 0 \)), Eq. 4 becomes

$$ {Q}_j^{P_{i+1}}={Q}_j^{P_i}\kern0.5em \pm \kern0.5em \varDelta {q}_j^{R_i^{\tilde{D}}}{L}_{R_i} $$
(6)

where \( \varDelta {q}_j^{R_i^{\tilde{D}}} \) is the difference between \( {Q}_j^{P_{i+1}} \) and \( {Q}_j^{P_i} \) per unit length of reach R i , which has no point source of pollution (\( {R}_i^{\tilde{D}} \)).

Situation 2: for river reaches in the second category (i.e., those with point sources of pollution), Eq. 4 becomes

$$ {Q}_j^{P_{i+1}}={Q}_j^{P_i}\kern0.5em \pm \kern0.5em \varDelta {q}_j^{R_i^D}{L}_{R_i}+\kern0.5em \varDelta {Q_D}_j^{R_i^D} $$
(7)

where \( R_i^D \) denotes river reach \( R_i \) with a point source of pollution. The discharge from the point source (\( \varDelta {Q_D}_j^{R_i^D} \)) is unknown and must be estimated. In this study, a 125-km stretch of the Kali River was considered, and the weather and topographical conditions were assumed not to vary significantly among the river reaches. Hence, the monthly average losses and additions per unit length of river (\( \varDelta {q}_{{\mathrm{avg}}_j} \)) were estimated as the arithmetic means of \( \varDelta {q}_j^{R_i^{\tilde{D}}} \) (Eq. 6) over the nine river reaches with no point sources of pollution. The monthly losses and additions \( \varDelta {q}_j^{R_i^D} \) in Eq. 7 were then replaced by \( \varDelta {q}_{{\mathrm{avg}}_j} \) to estimate the discharge from the point sources. For the river reaches with point sources of pollution (R4, R6, R8, R10, R12, and R14), monthly point source discharges (\( \varDelta {Q_D}_j^{R_i^D} \)) were estimated by substituting into Eq. 7 the river discharges at both ends of each reach \( R_i^D \), i.e., \( {Q}_j^{P_i} \) and \( {Q}_j^{P_{i+1}} \), together with the estimated \( \varDelta {q}_{{\mathrm{avg}}_j} \). The monsoonal and non-monsoonal discharges from each point source, i.e., the arithmetic averages of the monthly values \( \varDelta {Q_D}_j^{R_i^D} \) over the monsoon season (j = 7 to 10) and the non-monsoon season (j = 1 to 6 and 11 to 12), respectively, were then calculated and are tabulated in Table 3. As mentioned earlier, the sampling sites are located at the outlets of the sub-watersheds; therefore, the difference between the river discharges at the upstream and downstream ends of each reach (Eq. 4) yields the discharge contribution of each sub-watershed, which corresponds to the diffuse source of pollution from that sub-watershed. The flows from all point and diffuse sources were routed according to this procedure and used to estimate the pollution loads through simulation.
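The discharge-estimation procedure of Eqs. 6 and 7 can be sketched in Python as follows. The sketch assumes that the exchange term is signed (positive for a net gain, negative for a net loss), so the ± of Eqs. 6 and 7 becomes a plus sign; the function names and the example numbers are hypothetical and do not come from the study.

```python
from statistics import mean

# Monsoon months as defined in the text: j = 7 to 10 (July to October).
MONSOON_MONTHS = range(7, 11)


def unit_exchange(q_up, q_down, length_m):
    """Eq. 6 rearranged for a reach without a point source.

    Returns the net addition (+) or loss (-) of flow per metre of reach
    (m3/s per m) from the discharges at its upstream and downstream ends.
    """
    return (q_down - q_up) / length_m


def point_source_discharge(q_up, q_down, length_m, dq_avg):
    """Eq. 7 rearranged for a reach with a point source (m3/s).

    The reach's own exchange term is replaced by the monthly average dq_avg
    obtained from the reaches without point sources.
    """
    return q_down - q_up - dq_avg * length_m


def seasonal_means(monthly):
    """Monsoon and non-monsoon arithmetic means of a {month: value} mapping."""
    wet = [v for j, v in monthly.items() if j in MONSOON_MONTHS]
    dry = [v for j, v in monthly.items() if j not in MONSOON_MONTHS]
    return mean(wet), mean(dry)


if __name__ == "__main__":
    # Hypothetical one-month data (not from the study): (Q_up, Q_down, L in m)
    # for three reaches without point sources.
    clean_reaches = [(10.0, 10.2, 8000.0), (12.0, 12.3, 10000.0), (9.0, 9.1, 5000.0)]
    dq_avg = mean(unit_exchange(*reach) for reach in clean_reaches)
    # Hypothetical reach with a point source.
    q_drain = point_source_discharge(10.0, 13.5, 9000.0, dq_avg)
    print(f"dq_avg = {dq_avg:.2e} m3/s per m; drain discharge = {q_drain:.3f} m3/s")
```

Applying `seasonal_means` to the twelve monthly point-source discharges of one reach gives the monsoonal and non-monsoonal averages of the kind reported in Table 3.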

Simulation for estimating effluent characteristics

The modified BOD model (Jha et al. 2007) and the refined diffuse source model (Jha et al. 2005) were applied to simulate the effluent characteristics of the point (BOD) and diffuse (nitrate and phosphate) sources, respectively. Both models were successfully calibrated and validated using the same data set as that used by Jha et al. (2005, 2007), which is the primary reason for choosing them to simulate the effluent characteristics of the point and diffuse sources. The modified BOD model (Eq. 8) is based on the first-order decay model for organic matter proposed by Streeter and Phelps (1925), and the concept of the Jolankai (1997) model was used in the mass flux calculations to account for diffuse sources of pollution and benthic oxygen demand. The hydraulic and water quality characteristics of the river water for the reaches \( R_i^D \) (R4, R6, R8, R10, R12, and R14) were used as inputs to the modified BOD model to simulate the BOD concentrations of the river water upstream and downstream of each point source. The effluent BOD of each point source (Table 3) was then estimated from a BOD mass balance at the point where its effluent mixes with the river water, using the simulated river water BOD and the estimated discharges of the point source and the river water at the point of mixing for a given month. The monthly pollution loading was then estimated from the BOD concentration and discharge of each point source. The monsoonal and non-monsoonal BOD loadings from each point source (i.e., the arithmetic averages of the monthly BOD loadings for the monsoon and non-monsoon seasons, respectively) were then calculated and are tabulated in Table 3.
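The effluent-BOD back-calculation described above amounts to a complete-mixing mass balance. A minimal Python sketch follows, assuming that the flow just downstream of the mixing point equals the river flow plus the drain flow (no other inflow at that point) and using 1 mg/l = 1 g/m³ for the load conversion; the function names and the example numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def effluent_bod(c_up, c_down, q_river, q_drain):
    """Effluent BOD (mg/l) from a complete-mixing mass balance.

    c_down * (q_river + q_drain) = c_up * q_river + c_eff * q_drain,
    where c_up and c_down are the (simulated) river BOD just upstream and
    just downstream of the outfall and the flows are in m3/s.
    """
    return (c_down * (q_river + q_drain) - c_up * q_river) / q_drain


def load_kg_per_day(conc_mg_per_l, flow_m3_per_s):
    """Load = concentration x discharge. 1 mg/l = 1 g/m3, so the product is
    in g/s; multiplying by 86 400 s/day and dividing by 1000 g/kg gives kg/day."""
    return conc_mg_per_l * flow_m3_per_s * SECONDS_PER_DAY / 1000.0


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    c_eff = effluent_bod(c_up=10.0, c_down=40.0, q_river=20.0, q_drain=3.0)
    print(c_eff, load_kg_per_day(c_eff, 3.0))  # 240.0 62208.0
```

A concentration in mg/l multiplied by a discharge in m³/s therefore converts to kg/day through a factor of 86.4.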

$$ C_j = C_{j_{\mathrm{o}}}\, e^{-(K_1+K_2)t} + \frac{C_{j_d}\,\varDelta q_j^{R_i}\, l\left(1-e^{-(K_1+K_2)t}\right)}{(K_1+K_2)\left(Q_j^{P_i}+\varDelta q_j^{R_i}\, l\right)} + \frac{B_j\, Q_j^{P_i}\left(1-e^{-(K_1+K_2)t}\right)}{(K_1+K_2)\left(Q_j^{P_i}+\varDelta q_j^{R_i}\, l\right)} \tag{8} $$

where \( C_j \) (mg/l) is the river water BOD at a distance \( l \) (m) from the origin of a river reach in a given month \( j \) (\( j = 1 \) to 12), \( C_{j_{\mathrm{o}}} \) (mg/l) is the initial BOD concentration of the river water at the origin of the river reach, \( K_1 \) is the reaction rate constant for biochemical decomposition of organic matter (day\(^{-1}\)), \( K_2 \) is the reaction rate constant for BOD removal by sedimentation (day\(^{-1}\)), \( C_{j_d} \) is the BOD concentration of the lateral inflow to the stream (mg/l), \( \varDelta q_j^{R_i} \) is the lateral inflow rate (m\(^3\)/s per unit length of river reach), \( Q_j^{P_i} \) (m\(^3\)/s) is the flow rate at the origin of the river reach, \( B_j \) is the benthic oxygen demand (mg/l), and \( t \) is the time of travel (days).
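Eq. 8 can be evaluated directly; the Python sketch below is a term-by-term transcription with argument names that mirror the symbols defined above (the function name is hypothetical). Units follow the text, and the travel time \( t \) is taken as a given input because this section does not state how it is derived from reach velocity.

```python
import math


def bod_eq8(c0, c_lat, dq, l, q0, k1, k2, b, t):
    """River BOD C_j (mg/l) at distance l from the reach origin, per Eq. 8.

    c0    : C_jo, initial BOD at the reach origin (mg/l)
    c_lat : C_jd, BOD of the lateral inflow (mg/l)
    dq    : lateral inflow rate (m3/s per metre), l: distance (m)
    q0    : Q_j^{P_i}, flow at the reach origin (m3/s)
    k1, k2: decomposition and sedimentation rate constants (1/day), k1 + k2 > 0
    b     : B_j, benthic oxygen demand, t: travel time (days)
    """
    k = k1 + k2
    decay = math.exp(-k * t)
    q_total = q0 + dq * l  # flow at distance l after lateral inflow
    first_order = c0 * decay                                      # Streeter-Phelps decay term
    lateral = c_lat * dq * l * (1.0 - decay) / (k * q_total)     # diffuse lateral inflow term
    benthic = b * q0 * (1.0 - decay) / (k * q_total)             # benthic demand term
    return first_order + lateral + benthic
```

Setting the lateral inflow and benthic demand to zero reduces the expression to the classical first-order decay \( C_{j_{\mathrm{o}}} e^{-(K_1+K_2)t} \), which is a convenient check on any implementation.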

The refined diffuse source model (Jha et al. 2005), as expressed in Eq. 9, is used to estimate nitrate and phosphate loadings.

$$ \mathrm{NL}_j = Q_j^{P_{i+1}} N_j^{P_{i+1}} - Q_j^{P_i} N_j^{P_i}\, e^{-kt} \tag{9} $$

where \( \mathrm{NL}_j \) is the diffuse (non-point) source load for a given month \( j \); \( Q_j^{P_{i+1}} \) (m\(^3\)/s) and \( N_j^{P_{i+1}} \) (mg/l) are the discharge and nutrient concentration, respectively, at the downstream end of river reach \( R_i \) for month \( j \); \( Q_j^{P_i} \) (m\(^3\)/s) and \( N_j^{P_i} \) (mg/l) are the discharge and nutrient concentration, respectively, at the upstream end of river reach \( R_i \) for month \( j \); \( k \) is a decay constant (day\(^{-1}\)); and \( t \) is the time of travel (days). The decay rates for phosphate and nitrate were 0.22 and 0.10 day\(^{-1}\), respectively, at 20 °C (Jha et al. 2005), and a temperature correction was applied to estimate the decay constants. The measured discharges and effluent concentrations (nitrate and phosphate) at all river reaches were used as model inputs to estimate the monthly diffuse pollution loading. The seasonal nutrient loading was then estimated from the nutrient concentrations and discharges of the diffuse sources. The monsoonal and non-monsoonal nutrient loadings from each diffuse source were then calculated from the set of monthly values and are tabulated in Table 4.
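A minimal Python sketch of Eq. 9 with a temperature-corrected decay constant is given below. The section does not state which temperature correction was used, so the sketch assumes the common Arrhenius-type form \( k_T = k_{20}\,\theta^{(T-20)} \) with \( \theta = 1.047 \), a value often used for BOD decay and adopted here purely as a placeholder. The 20 °C rates are those quoted above; the function names are hypothetical. Because the load is in m\(^3\)/s × mg/l (equivalent to g/s), a factor of 86.4 converts it to kg/day.

```python
import math

# Decay rates at 20 degC quoted in the text (1/day)
K20 = {"phosphate": 0.22, "nitrate": 0.10}
# ASSUMPTION: Arrhenius-type temperature coefficient; not given in the source
THETA = 1.047


def decay_constant(nutrient, temp_c, theta=THETA):
    """Temperature-corrected decay constant k_T = k_20 * theta**(T - 20), in 1/day."""
    return K20[nutrient] * theta ** (temp_c - 20.0)


def diffuse_load(q_down, n_down, q_up, n_up, k, t):
    """Eq. 9: NL_j = Q_down * N_down - Q_up * N_up * exp(-k t).

    Discharges in m3/s and concentrations in mg/l, so the result is in g/s.
    """
    return q_down * n_down - q_up * n_up * math.exp(-k * t)


def g_per_s_to_kg_per_day(load):
    """g/s x 86,400 s/day / 1000 g/kg."""
    return load * 86.4


# Hypothetical month: nitrate at 28 degC, 0.5 day travel time through the reach
k = decay_constant("nitrate", 28.0)
nl = diffuse_load(q_down=12.0, n_down=2.0, q_up=10.0, n_up=1.5, k=k, t=0.5)
print(g_per_s_to_kg_per_day(nl))
```

The upstream term is decayed over the travel time before it is subtracted, so the difference attributes to the sub-watershed only the nutrient mass not already explained by upstream inflow.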