Stormwater runoff from urban areas can be highly polluted with a variety of materials, making it a significant non-point source (NPS) of pollution to receiving water bodies (Brezonik and Stadelmann 2002; Kim et al. 2005). Site and event parameters (total event rainfall, cumulative seasonal rainfall, drainage area, annual average daily traffic and antecedent dry period) have been found to significantly influence urban runoff quality. Irish et al. (1998) and Brezonik and Stadelmann (2002) determined that the load of each constituent depends on a unique subset of variables. The processes responsible for the generation, accumulation, and wash-off of urban runoff pollutants are also constituent-specific. Geographic and physical factors such as the type and intensity of urban land use, degree of imperviousness, tree cover, soil type and slope are further important parameters that affect the quality of urban runoff (Graves et al. 2004; Kayhanian et al. 2007).

Because of the impacts on receiving waters and the expense of obtaining monitoring data on non-point source pollution, interest has grown in analyzing existing measured data to develop estimation models for urban stormwater loads and concentrations (Thomson et al. 1997; Phillips and Thompson 2002). Such models are very helpful in estimating concentrations for unmonitored watersheds. Parameters that are relatively easy to monitor can then serve as indicators for other constituents, reducing labour and time. Efforts were therefore made along similar lines to develop regression models for a few selected sub-watersheds of Patiala city, Punjab. Patiala (between 29°49′ and 30°47′ north latitude and 75°58′ and 76°54′ east longitude), a city in Punjab (northern India), has no provision for stormwater drainage. As a result, even modest rainfall events produce severe flooding in many parts of the city (Arora and Reddy 2012). Avoiding discharge of this polluted stormwater into the river Ghaggar, preventing flooding within the city and maintaining the groundwater table should be treated as matters of utmost importance when planning a stormwater management system for the city.

This research was conducted to develop estimation models of pollutant concentrations in urban stormwater runoff, as a function of rainfall variables, using multiple regression analysis. The variability of average rainfall and antecedent dry period was explored using 6 years of data gathered by the Indian Meteorological Department. Stormwater characteristics were then predicted for five sub-watersheds of Patiala city through regression modeling for best, average and worst case conditions. The models developed were validated using the Chi square test applied to six additional storm events monitored during the next rainy season.

Materials and Methods

Five urban sub-watersheds were identified within Patiala city to serve as sample catchments for the present study (Fig. 1). These represent the diversity of urban watershed characteristics and conditions across the city. The catchments differed in the form of land development and in housing density. The Civil Lines area is predominantly a residential catchment. Manjit Nagar, located near Seona village, is a mix of rural and residential acreage. Preet Nagar contains mixed urban development. Model Town, near Tagore theatre, is a predominantly commercial area. Lastly, the Bus Stand area is a heavily travelled and polluted catchment.

Fig. 1 Map showing the locations of the sub-watersheds studied

The study catchments were chosen within a 5 km radius so as to have uniformity in the geological, topographical, and climatic variables.

For each catchment, detailed background information was collected on:

  • Watershed area and land use patterns within the catchments

  • Slope and hydrological path lengths towards the point of flooding

  • Demographic features of the watersheds

  • Socioeconomic and other activities in the watersheds

Information on the watershed characteristics was gathered through physical surveys of the catchments, interviews with local people, topographical sheets and local zoning maps. Google Earth Pro was used for demarcating the catchments and assessing the area contributing to the runoff. The data on land use types were obtained from the Patiala municipal council and the Department of Town and Country Planning, Punjab, and the data on storm/rainfall events were obtained from the Indian Meteorological Department.

For each of the five sample catchments, grab samples of stormwater were collected from the stormwater accumulation ditch immediately after the rainfall event. This research aims at designing sub-watershed level stormwater treatment systems, which involve first collecting or storing the stormwater and then treating it. Hence, grab sampling was preferred over flow-weighted measurements. A total of six storm events were sampled during the period from April 2010 to January 2012. Rain events averaging less than 2.5 mm yielded an insufficient volume of runoff to sample. A minimum gap of 4 days between any two successive sampled storm events was maintained, as per the guidelines specified in the Caltrans Stormwater Monitoring Protocol Guidance Manual (Caltrans 2000). Information on the rainfall event, such as the depth of rainfall and the number of antecedent dry days (ADD), was recorded for each of the events sampled. The samples were analyzed in the Environmental Laboratories of Thapar University for the parameters pH, Conductivity, Total Suspended Solids (TSS), Total Dissolved Solids (TDS), 5-day Biochemical Oxygen Demand at 20 °C (BOD5), Chemical Oxygen Demand (COD), Total Kjeldahl Nitrogen (TKN), Total Phosphorus (TP), Oil and Grease, Total Coliform count, Fecal Coliform count and heavy metals: Zn, Cd, Ni, Pb, Fe and Cu. Field and laboratory blanks (ultrapure water) were maintained throughout the sampling, preparation and analytical steps. The techniques given in APHA (1999) were used for analysis of the samples.

Statistical tools were used to assess and formulate relationships between stormwater quality characteristics and the average rainfall and antecedent dry period. Multiple regression analysis was undertaken with pollutant concentration as the dependent variable and ADD and rainfall as the two independent variables. Regression models were developed using the software Minitab version 16, and regression equations were formulated for all the identified significant parameters, separately for each sub-watershed.

The general multiple linear regression (MLR) equation used to develop estimation equations for pollutant concentrations is shown as:

$$ \text{Pollutant Conc. or Load} = f\left(\text{ADD}, \text{Rainfall}\right) $$
$$ \text{Pollutant Conc. or Load} = \alpha \pm \beta \times \text{ADD} \pm \gamma \times \text{Rainfall} \tag{1} $$

where α is the regression constant (intercept); β and γ are the regression coefficients of ADD and Rainfall, respectively; and ADD (days) and Rainfall (mm) are the input variables.
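As a minimal sketch of how Eq. (1) can be fitted, the following Python snippet estimates α, β and γ by ordinary least squares through the normal equations. The study itself used Minitab; plain Python is substituted here purely for illustration. The function names `solve_linear` and `fit_mlr` and the six data points are hypothetical and are not taken from the study.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(add_days, rainfall_mm, conc):
    """Fit conc = alpha + beta*ADD + gamma*Rainfall by ordinary least squares.

    Returns (alpha, beta, gamma, r_squared).
    """
    rows = [[1.0, float(a), float(r)] for a, r in zip(add_days, rainfall_mm)]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, conc)) for i in range(3)]
    alpha, beta, gamma = solve_linear(xtx, xty)
    pred = [alpha + beta * a + gamma * r
            for a, r in zip(add_days, rainfall_mm)]
    mean_y = sum(conc) / len(conc)
    ss_res = sum((y - p) ** 2 for y, p in zip(conc, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in conc)
    return alpha, beta, gamma, 1.0 - ss_res / ss_tot


# Hypothetical calibration data for one sub-watershed (six storm events).
add_days = [5, 12, 20, 30, 45, 60]        # antecedent dry days
rainfall_mm = [40, 25, 60, 15, 35, 10]    # event rainfall depth, mm
bod_mg_l = [18, 31, 35, 62, 80, 110]      # made-up BOD values, mg/L

alpha, beta, gamma, r2 = fit_mlr(add_days, rainfall_mm, bod_mg_l)
print(f"BOD = {alpha:.2f} + ({beta:.2f})*ADD + ({gamma:.2f})*Rainfall, "
      f"R^2 = {r2:.2f}")
```

With only six events and three fitted constants, such a fit leaves three residual degrees of freedom, so a high coefficient of determination on the calibration data alone should be read with caution.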

To confirm the accuracy of the regression equations obtained and to assess their ability to predict pollutant concentrations, Chi square tests were applied to evaluate the goodness of fit at the 0.05 level of significance.

The Chi square value obtained for a particular parameter (BOD, COD, TSS, etc.) is compared with the tabulated value of the Chi square distribution at the α = 0.05 significance level and n − 1 degrees of freedom (5 in the present study). If the calculated value is less than the tabulated value, the null hypothesis is not rejected. This means that the predicted concentrations do not differ significantly from the observed concentrations at the 95 % confidence level, and the regression equation is considered valid.
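The comparison described above can be sketched in a few lines of Python. The statistic is assumed to take the usual goodness-of-fit form, the sum of (observed − predicted)² / predicted over the validation events, since the exact form used in the study is not stated. The critical value 11.07 is the standard tabulated Chi square value for 5 degrees of freedom at the 0.05 level. The function names and the concentrations below are hypothetical.

```python
CHI2_CRITICAL_DF5 = 11.07  # tabulated Chi square value, df = 5, 0.05 level


def chi_square_statistic(observed, predicted):
    """Sum of (O - E)^2 / E, with model predictions as expected values."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, predicted))


def fit_is_acceptable(observed, predicted, critical=CHI2_CRITICAL_DF5):
    """True when the statistic is below the tabulated critical value."""
    return chi_square_statistic(observed, predicted) < critical


# Hypothetical validation set: six storm events, concentrations in mg/L.
observed_bod = [22.0, 35.0, 41.0, 58.0, 77.0, 102.0]
predicted_bod = [20.0, 33.0, 45.0, 60.0, 80.0, 98.0]

stat = chi_square_statistic(observed_bod, predicted_bod)
print(f"Chi square = {stat:.2f}, acceptable: "
      f"{fit_is_acceptable(observed_bod, predicted_bod)}")
```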

Models are often used to predict pollutant concentrations. Different models have been used, including regression, stochastic, and deterministic simulations (Irish et al. 1998). The main difference between these models lies in the mechanisms they consider for the generation and transport of the pollutants. To demonstrate the appropriateness of the regression models developed, a sensitivity analysis was undertaken for an organic parameter (BOD) and one of the metal pollutants (Zn) at each of the stormwater sites. The analysis was carried out by varying one model input (e.g. ADD) through a range of values (mean ± 2SD) while holding the other input (e.g. rainfall) constant, and vice versa. Sensitivity of the model to these varying inputs was assessed by comparing the coefficients of variation (CV) of the model output. Model sensitivity to input variance was deemed high for changes in CV > 50 %, moderate for changes in CV between 20 % and 50 %, and low for changes in CV < 20 %.
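A minimal sketch of this one-at-a-time procedure is given below. It assumes that the CV of the model output over the sweep is classified directly against the 20 % and 50 % thresholds, which is one reading of the procedure described above. The coefficients of the model and the input means and standard deviations are hypothetical.

```python
import statistics


def sweep(mean, sd, steps=9):
    """Evenly spaced values spanning mean - 2*SD to mean + 2*SD."""
    low, high = mean - 2 * sd, mean + 2 * sd
    return [low + i * (high - low) / (steps - 1) for i in range(steps)]


def cv_percent(values):
    """Coefficient of variation (population SD / mean), in percent."""
    return 100.0 * statistics.pstdev(values) / abs(statistics.mean(values))


def classify(cv):
    """Sensitivity classes: > 50 % high, 20-50 % moderate, < 20 % low."""
    if cv > 50:
        return "high"
    if cv >= 20:
        return "moderate"
    return "low"


def model(add_days, rainfall_mm, alpha=20.0, beta=1.5, gamma=-0.2):
    """Eq. (1) with hypothetical coefficients."""
    return alpha + beta * add_days + gamma * rainfall_mm


# Hypothetical input statistics (mean, SD) for ADD and rainfall.
add_mean, add_sd = 30.0, 10.0
rain_mean, rain_sd = 56.0, 20.0

# Vary one input over mean +/- 2SD while the other stays at its mean.
cv_add = cv_percent([model(a, rain_mean) for a in sweep(add_mean, add_sd)])
cv_rain = cv_percent([model(add_mean, r) for r in sweep(rain_mean, rain_sd)])
print(f"ADD varied:      CV = {cv_add:.1f} % ({classify(cv_add)})")
print(f"Rainfall varied: CV = {cv_rain:.1f} % ({classify(cv_rain)})")
```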

Results and Discussion

A comparison of key runoff quality constituents reported in other countries with those of this study is given in Table 1. Most of the studies listed were conducted in urban areas, except for those in France and at some sites in the USA, which were carried out in urban edge areas.

Table 1 Comparison of stormwater pollutant concentrations obtained in this study and previous publications

Inspection of the table shows that, at almost all the sites, mean concentrations of the listed constituents are higher in Patiala than those reported for the USA (Smullen et al. 1999) and China (Gan et al. 2008), except for TSS, which is the lowest among the compared areas. Runoff from the catchments in the USA is less polluted than that from the others. The studied areas in Paris and Isfahan were urban watersheds, where the quality of stormwater runoff was closely related to the layout of land uses, the drainage system and the environmental background. In general, concentrations of all the parameters except Cu were much higher in Paris than in our study (Kafi et al. 2008). Mean values of COD and TSS in the present study (168 and 114 mg/L, respectively) were much lower than those reported for Iran (561 and 161 mg/L, respectively), whereas the concentrations of TKN, TP and Zn observed in our study were higher than those for Iran (Taebi and Droste 2004).

The low or negligible concentration of Pb observed in the present study, compared with the studied areas of other countries, can be attributed to the fact that the present study was conducted an appreciable time after the ban on leaded gasoline. The differences among the studies demonstrate the uncertainty and randomness of stormwater pollution and underscore the need for site-specific observations before planning a stormwater management system.

Besides the quality, the quantity of stormwater generated from these sub-watersheds was estimated using the US-SCS CN method. The Curve Number (CN) for each catchment was assessed with the help of the hydrologic model WIN TR-55 Small Watershed Hydrology software. This was then used to calculate the runoff volume generated for different rainfall depths, i.e. 12 mm (70th percentile), 21 mm (80th percentile), 34 mm (90th percentile), 64, 122, and 157 mm (Table 2).
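For illustration, the standard SCS curve number relation can be sketched as follows: S = 25400/CN − 254 (mm), Ia = 0.2 S and Q = (P − Ia)² / (P − Ia + S) for P > Ia, with Q = 0 otherwise. The conventional initial abstraction ratio of 0.2 is assumed here, and the curve number and catchment area are hypothetical. Only the rainfall depths are taken from the text, and the study's own values were obtained with WIN TR-55.

```python
def runoff_depth_mm(rainfall_mm, curve_number, ia_ratio=0.2):
    """Direct runoff depth Q (mm) from the SCS curve number equation."""
    s = 25400.0 / curve_number - 254.0   # potential maximum retention, mm
    ia = ia_ratio * s                    # initial abstraction, mm
    if rainfall_mm <= ia:
        return 0.0                       # all rainfall is abstracted
    return (rainfall_mm - ia) ** 2 / (rainfall_mm - ia + s)


def runoff_volume_m3(rainfall_mm, curve_number, area_ha):
    """Runoff volume; 1 mm of runoff over 1 ha equals 10 m^3."""
    return runoff_depth_mm(rainfall_mm, curve_number) * area_ha * 10.0


# Hypothetical catchment: CN = 85, area = 50 ha.
for depth in (12, 21, 34, 64, 122, 157):   # rainfall depths from the text, mm
    volume = runoff_volume_m3(depth, curve_number=85, area_ha=50)
    print(f"P = {depth:3d} mm -> runoff volume = {volume:8.0f} m^3")
```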

Table 2 Volume of runoff for urban sub-watersheds

One can estimate the volume of stormwater to be managed for a particular storm event, but it is practically very difficult to analyse every event to determine its quality. In this context, regression models can help predict pollutant concentrations, which depend largely on both build-up and wash-off processes. Build-up depends on the ADD, land use, wind speed and traffic. Wash-off is a function of rainfall intensity and other factors (Kim et al. 2005, 2006). A few researchers have examined the effect of the antecedent dry period on stormwater discharge (Saget et al. 1996; Lee et al. 2002) and have even tried to derive a build-up model using ADD (Ball et al. 1998). The modeling in this study was carried out with these findings in mind.

Among all the parameters, BOD, COD, TSS, Oil and Grease and the heavy metals Zn, Fe and Cu showed a strong dependence on ADD and on wash-off resulting from the amount of rainfall. The regression equations obtained for these parameters, with the respective coefficients for each sub-watershed, are listed in Table 3.

Table 3 Results of the multiple regression analysis

A large variation in pollutant concentrations was observed between sites, which can be attributed to a complex interaction among many factors. In general, ADD positively influenced the pollutant concentrations: the more dry days, the higher the concentration. It was also observed that in the smaller catchments, such as Preet Nagar, the measured endpoints showed less correlation with ADD. This is evident from the negative correlations in the case of COD and the low regression coefficients for the other parameters in that sub-watershed. In contrast, the Bus Stand sub-watershed, which has a larger area and higher traffic density, showed positive correlations between the antecedent dry days and the measured endpoints. The degree to which ADD and rainfall affect stormwater pollutant concentrations in different sub-watersheds is difficult to determine because of the influence of other variables such as the rainfall pattern, traffic density, type of pavement, existence of curbs, maintenance practices, and the land use and size of the surrounding catchment (FHWA 1996).

Parameters for which the regression showed coefficients of determination (R²) of less than 0.5 in almost all the sub-watersheds were dropped. R² values for some of the parameters are greater than 0.8, suggesting that the equations and data are well matched. To confirm the accuracy of the regression equations obtained using six storm events, and to validate and test the predictive power of the calibrated regression models, Chi square tests were applied to six additional samples collected during the next rainy season, and the goodness of fit was evaluated at a significance level of α = 0.05 (Table 3). Thus, a total of 12 storm events were monitored: 6 were used for model development and the remaining 6 for validation. The Chi square values were compared with the tabulated values at n − 1 degrees of freedom (i.e. 5) for all the sub-watersheds. In most cases the null hypothesis was not rejected, as the Chi square values obtained were less than the tabulated value (11.07 at df = 5).

The results indicate that the pollutant load can largely be explained by two major factors, namely ADD and rainfall, and that appropriate regression equations can be formulated for different sub-watersheds to aid in predicting the concentrations of those pollutant parameters that fit well within the model. Most models treat concentrations or loads of pollutants as variables dependent upon runoff volume, rainfall intensity, traffic intensity, antecedent dry days, surrounding land use, etc. It is generally difficult to consider all factors, because many site-specific conditions exist, such as the presence or absence of street sweeping, soil saturation, wind direction, etc. The present research aimed to develop a model that takes into account only those factors that are common to all sites, so that it can be applied to sub-watersheds with similar characteristics. Hence the study confined its approach to the major factors ADD and rainfall.

The results of the sensitivity analysis are presented in Table 4.

Table 4 Results of the sensitivity analysis performed for BOD and Zn

This analysis showed that, in most cases, the model input ADD generated greater output variability (higher CV) than rainfall, and ADD can thus be regarded as the primary sensitivity parameter of the model. Rainfall, on the other hand, showed moderate to low variability, except for BOD in Manjit Nagar and Bus Stand. In these two sub-watersheds, rainfall was the primary sensitivity parameter.

Multiple regression models were also used for stormwater quality assessment under three sets of conditions.

  • Worst case conditions for stormwater quality: minimum rainfall (13 mm) and maximum dry period (98 days)

  • Average or moderate conditions: average rainfall (56 mm) and average length of dry period (30 days)

  • Best case conditions for stormwater quality: maximum rainfall (115 mm) and minimum dry days (5 days).
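As a sketch of how the three scenarios listed above can be evaluated with a fitted equation of the form of Eq. (1), the snippet below substitutes each ADD and rainfall pair into the equation. The scenario values are those given in the list; the coefficients are hypothetical and do not correspond to any sub-watershed in Table 3.

```python
# Scenario inputs from the text: (antecedent dry days, rainfall in mm).
SCENARIOS = {
    "worst": (98, 13),
    "average": (30, 56),
    "best": (5, 115),
}


def predict(alpha, beta, gamma, add_days, rainfall_mm):
    """Evaluate Eq. (1): conc = alpha + beta*ADD + gamma*Rainfall."""
    return alpha + beta * add_days + gamma * rainfall_mm


def predict_scenarios(alpha, beta, gamma):
    """Predicted concentration for each scenario, keyed by scenario name."""
    return {name: predict(alpha, beta, gamma, add, rain)
            for name, (add, rain) in SCENARIOS.items()}


# Hypothetical coefficients for one pollutant in one sub-watershed.
for name, conc in predict_scenarios(alpha=20.0, beta=1.5, gamma=-0.2).items():
    print(f"{name:8s} case: predicted concentration = {conc:.1f} mg/L")
```

Because a linear equation can return negative values when extrapolated to large rainfall depths and short dry periods, predictions for the best case should be checked against the physically meaningful lower bound of zero.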

Table 5 lists the values of the various stormwater quality parameters predicted for the above sets of conditions.

Table 5 Stormwater quality of the sub-watersheds predicted using regression models

The values were obtained from the regression equations in cases where the R² value exceeds 0.68, the fraction of a normal distribution lying within one standard deviation of the mean. These criteria are intended to distinguish strong, moderate and weak relationships in terms of R² values and to indicate the reliability of the results obtained through regression modeling. Where R² values are less than 0.5, the relationship is assumed to be weak or absent, and the regression equations would give misleading results. In such cases, as for total and fecal coliform counts, the maximum, minimum and average measured values are reported instead. The regression models help to predict the quality before the water is actually analyzed, and thus aid in selecting the treatment schemes most suitable for handling the stormwater quality even under worst case conditions.

The regression models developed had better estimation ability than the empirical models reported in previous studies in most, but not all, cases. However, there is still a need to understand the mechanisms of generation and transport of pollution in urban systems during a storm event, as these involve many media and many space and time scales. Nevertheless, the regression models are useful in predicting future trends of loads and concentrations of pollutants in stormwater for sub-watersheds, and hence aid in deciding on the most appropriate and cost-effective treatment schemes for managing the stormwater.