1 Introduction

Trihalomethanes (THMs), a group of halogenated compounds, are formed by the reaction between the organic substances present in surface waters and oxidizing disinfectants such as chlorine and chlorine dioxide (Richardson 2002). In general, the THMs comprise four major compounds, viz. chloroform (CHCl3), bromodichloromethane (CHCl2Br), chlorodibromomethane (CHBr2Cl), and bromoform (CHBr3). Drinking water is disinfected to destroy pathogenic organisms and prevent waterborne diseases, and chlorine is the predominant disinfectant (Elshorbagy et al. 2000). However, chlorine reacts with the organic matter commonly present in surface waters to form disinfection by-products (DBPs) (Rodriguez and Serodes 2001). Several DBPs, including the THMs, have been classified as probable or possible human carcinogens and are regulated by regulatory agencies worldwide (Krasner et al. 2001). The US Environmental Protection Agency (USEPA) has set a maximum contaminant level of 80 μg/l for the sum of the four THM species in water (USEPA 2003). Because organic substances are inherently present in surface waters and disinfection is essential to make the water potable, quality control procedures are needed to prevent the formation of THMs or to keep their levels within the recommended safe limits (Platikanov et al. 2007).

This study investigates the THMs formation potential (THMFP) of the Gomti River water, which meets 40–50% of the total domestic water demand (approximately 500 million liters per day) of Lucknow city, India. The river originates in the foothills of the Himalaya and joins the Ganga River after traversing a distance of about 750 km. The raw river water is processed, disinfected by chlorination, and distributed through a network of pipelines. Halides and organic matter are detected in the river water throughout the year (Singh et al. 2004), and their presence may lead to the formation of halogenated compounds during the conventional disinfection process. The THMFP measures the extent of formation of total trihalomethanes (TTHMs) in water spiked with chlorine and incubated for a period of up to 7 days (168 h). THMFP is a useful conservative indicator of the THMs that could develop in the presence of the total available organic precursors (Elshorbagy et al. 2000).

In general, the raw water characteristics (dissolved organic carbon, pH, and bromide concentration) and the disinfection conditions (chlorine dose, temperature, reaction time) largely control the formation of THMs in water. To predict the THMFP of water, the individual and interactive roles of these variables have to be quantified. Development of reliable models is increasingly recognized as an essential methodological basis for predicting DBPs formation (Rodriguez et al. 2003). Several empirical models (Elshorbagy et al. 2000; Abdullah et al. 2003; Sadiq and Rodriguez 2004; Golfinopoulos and Arhonditsis 2002; Nikolaou et al. 2004; Milot et al. 2002; Rodriguez and Serodes 2004; Platikanov et al. 2007) have been proposed for DBPs formation in water. However, many of these are site specific, their predictive capabilities for other water sources under varying conditions remain limited (Uyak et al. 2007), and most of these studies have been performed under rather limited experimental conditions. Hence, there is still a need for reliable and robust models that predict THM formation in surface waters and allow the independent variables and conditions to be optimized so that THMs in water are minimized. These models should be capable of predicting the conditions under which THMs formation stays within safe limits.

Optimization of the process variables is required to identify the set of conditions that controls the formation of THMs in chlorinated water. The conventional one-variable-at-a-time approach requires the dependent variable to be determined at every combination of the independent variables, varying only one at a time and keeping all others constant in batch studies. It therefore requires a very large number of experiments, which is expensive and time consuming. Moreover, it does not reveal the influence of the interactions between the process variables on the dependent variable (Sahu et al. 2009). Statistical experimental design may reduce the number of experiments and provide an appropriate model for process optimization, allowing the influence of inter-variable interactions on the process outcome to be evaluated. Recently, several types of experimental design methods have been employed in multivariable chemical process optimization (Alam et al. 2007). Box–Behnken design (BBD) combined with response surface modeling (RSM) is a useful method for studying the effect of several variables on the responses by varying them simultaneously and carrying out a limited number of experiments.

The main objectives of this study were (1) to investigate the individual and combined effects of selected operational variables, viz. chlorine dosage/dissolved organic carbon ratio, water pH, bromide concentration, temperature, and reaction time, on the formation of THMs in raw water during chlorination and (2) to develop a quadratic model for the prediction and optimization of THMs levels in chlorinated water for their effective control. Accordingly, BBD-based chlorination experiments were conducted using raw surface water. Key operational variables relating to water characteristics and chlorination conditions were identified, and a quadratic model was established for the prediction and optimization of THMs formation in water during chlorination.

2 Response surface modeling

RSM is a multivariate technique that mathematically fits the experimental domain studied in the theoretical design through a response function (Santelli et al. 2006). As an empirical statistical technique, it evaluates the relationship between a set of controlled experimental factors and the observed results (Mustafa 2009). The graphical representations of these functions are referred to as response surfaces, and the approach is used to describe the individual and interactive effects of the process variables on the response. The main objective of RSM is to determine the optimum set of operational variables of the process (Myers and Montgomery 2001). Statistical experimental design of a water disinfection process can reduce the process variability, experimentation time, and overall cost while improving the process output (Annadurai et al. 2003). The RSM approach has been widely applied in chemical engineering and sorption process optimization (Ricou-Hoeffer et al. 2001).

A number of factorial designs are available to estimate the response surface, a useful scheme for the optimization of variables with a limited number of experiments (Azargohar and Dalai 2005). Central composite design (CCD) and the BBD of RSM are suitable for exploring quadratic response surfaces and constructing second-order polynomial models (Nazzal and Khan 2002). In CCD, for a two-level study, the total number of experiments (N) to be performed is given as \( N = 2^n + 2n + n_{\text{c}} \), where n is the number of independent variables. The axial points (2n) are equidistant from the design center and ensure that the variance of the model prediction remains constant at all points equidistant from it. The replicates at the center (\( n_{\text{c}} \)) provide an independent estimate of the experimental error. In CCD, the \( 2^n \) factorial points are coded as ±1 and are augmented by the 2n axial points (±α, 0, 0, …, 0), (0, ±α, 0, …, 0), …, (0, 0, …, ±α) and the \( n_{\text{c}} \) center points (0, 0, 0, …, 0). The number of center points depends on the number of variables in the design. The value of α is computed as the fourth root of the number of factorial runs, \( \alpha = (2^n)^{1/4} \) (Singh et al. 2010a). Each variable is investigated at two levels and, as the number of variables (n) increases, the number of experimental runs for a complete replicate of the design increases rapidly. On the other hand, BBD is a spherical, revolving design, consisting of a central point and the middle points of the edges of the cube circumscribed on the sphere (Evans 2003). It is a three-level fractional factorial design consisting of a full \( 2^2 \) factorial seeded into a balanced incomplete block design. It consists of three interlocking \( 2^2 \) factorial designs whose points all lie on the surface of a sphere surrounding the center of the design. BBD requires relatively few combinations of variables for determining the complex response function, and it does not contain those combinations for which all variables are at their highest or lowest levels simultaneously. BBD is therefore useful in avoiding experiments performed under extreme conditions, for which unsatisfactory results might occur (Ferreira et al. 2007). The N required in BBD is given as \( N = 2k\left( k - 1 \right) + C_0 \), where k is the number of variables and \( C_0 \) is the number of central points. For a five-factor BBD, three to six center points are recommended (Esbensen 2005). Thus, for a five-factor design with five central points, a total of 45 experimental runs is required (Ferreira et al. 2007). Here, we used BBD, and the coded values of the five factor levels are given in Table 1.
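The run counts quoted above can be checked with a short sketch. The function name `box_behnken` and the pairwise construction (a \( 2^2 \) factorial on every pair of factors, all others at the center level) are illustrative assumptions; they reproduce the \( N = 2k(k-1) + C_0 \) count for five factors but are not taken from the software used in the study.

```python
from itertools import combinations, product

def box_behnken(k, center_points):
    """Return the coded runs of a pairwise Box-Behnken design as tuples."""
    runs = []
    # For every pair of factors, run a 2^2 factorial at +/-1
    # while all remaining factors stay at the center level 0.
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Replicated center points give an estimate of the experimental error.
    runs.extend([(0,) * k] * center_points)
    return runs

design = box_behnken(k=5, center_points=5)
n_runs = len(design)                # 2k(k-1) + C0 for k = 5, C0 = 5
ccd_runs = 2**5 + 2 * 5 + 5         # 2^n + 2n + n_c for a full five-factor CCD
max_active = max(sum(abs(v) for v in run) for run in design)
```

No run sets more than two factors away from their center level, which is the property that keeps the design away from the extreme corners of the experimental region.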

Table 1 Coded factor levels for BBD for optimization involving five variables

The optimization involves estimation of the coefficients in a mathematical model and predicting the response and checking the adequacy of the model. The response model may be expressed as

$$ Y = f\left( X_1, X_2, X_3, \ldots, X_n \right) \pm e $$
(1)

where Y is the response, f is the response function, \( X_i \) are the independent variables, and e is the experimental error. The form of the response function f largely depends on the nature of the relationship between the response and the independent variables. A higher-order polynomial, such as a quadratic model, may be expressed (Singh et al. 2010a) as

$$ Y = \beta_0 + \sum_{i=1}^{n} \beta_i X_i + \sum_{i=1}^{n} \beta_{ii} X_i^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \beta_{ij} X_i X_j + e $$
(2)

where Y is the predicted response, \( \beta_0 \) is the constant coefficient, \( \beta_i \) are the linear coefficients, \( \beta_{ii} \) the quadratic coefficients, \( \beta_{ij} \) the interaction coefficients, \( X_i \) and \( X_j \) are the coded values of the independent process variables, and e is the residual error.

The response surface modeling helps to investigate the response over the entire variables space and to identify the region where it reaches its optimum value. A response surface plot can provide information about the combination of process variables, which gives the best response.

3 Experimental

3.1 Water samples

A bulk raw water sample was collected from the water abstraction point on the Gomti River, upstream of Lucknow city (India). Raw water samples were also collected on a bimonthly basis (February, April, June, August, October, and December) from the same site over a period of 1 year (2006) in order to cover the seasonal variations. The samples were collected from 0.5 m below the water surface in a high-quality pre-cleaned polyethylene container and transported to the laboratory immediately in an ice box. The raw water samples were filtered through 0.45 μm cellulose acetate filters and analyzed for selected physico-chemical parameters. The filters were first pre-washed in the filtration apparatus by passing 500 ml of double distilled water (DDW) to prevent possible release of organic materials from the filters during sample filtration. Temperature was measured on site during sample collection. The dissolved organic carbon (DOC), bromide, and pH were determined immediately. DOC was determined by a TOC analyzer, and the bromide (Br) concentration was measured by the phenol red colorimetric method with a quantification limit of 10 μg/l (APHA 1992). All solutions were prepared in DDW.

3.2 Process variables and experimental design

In general, the THMs formation in water during disinfection by chlorination largely depends on the variables, such as chlorine dosage (Cl2), DOC, water pH, Br concentration, temperature (T), and the reaction time (t). Here, we selected five process variables, viz. DOC normalized Cl2 dose (Cl2/DOC), pH, T, Br concentration, and t to investigate their influence on THMs formation in raw surface water.

Accordingly, the number of batch experiments to be performed for optimization of the selected five process variables was determined by the BBD approach as 45 (Ferreira et al. 2007). After the range of each process variable was defined through pre-trial experimentation, the variables were coded to lie at ±1 for the factorial points and 0 for the center points. The actual values of the variables were transformed into their respective coded values as (Desai et al. 2008):

$$ {z_i} = \frac{{{X_i} - {X_0}}}{{\Delta {X_i}}} $$
(3)

where \( z_i \) is the dimensionless coded value of the ith independent variable, \( X_i \) is the uncoded value of the ith independent variable, \( X_0 \) is the uncoded value of the ith independent variable at the center point, and \( \Delta X_i \) is the step change value. The selected variables with their limits, units, and notations are given in Table 2.
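As an illustration of Eq. 3, the sketch below converts between actual and coded values. It assumes that the lower and upper limits of a variable map to the coded levels −1 and +1, so that \( X_0 \) is the midpoint of the range and \( \Delta X_i \) is half its width; the helper names `to_coded` and `to_actual` are hypothetical. The pH range of 6–10 used in the example is the one reported for the chlorination experiments.

```python
def to_coded(x, low, high):
    """Eq. 3: z = (X - X0) / dX, assuming low -> -1 and high -> +1."""
    center = (low + high) / 2.0   # X0, value at the center point
    step = (high - low) / 2.0     # dX, step change between adjacent levels
    return (x - center) / step

def to_actual(z, low, high):
    """Inverse of Eq. 3: recover the uncoded value from a coded level."""
    return (low + high) / 2.0 + z * (high - low) / 2.0

ph_center = to_coded(8.0, low=6.0, high=10.0)   # center of the pH range
ph_high = to_coded(10.0, low=6.0, high=10.0)    # upper pH limit
ph_low_actual = to_actual(-1.0, low=6.0, high=10.0)
```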

Table 2 Process variables, their coded values and limits

3.3 Chlorination experiments

All the chlorination experiments were carried out according to the BBD (Table 2). A set of 45 batch experiments was conducted covering the selected process variables in ranges: free chlorine at a Cl2/DOC ratio, 0.8–5 mg/mg; temperature, 15–40°C; reaction time, 4–168 h; and pH 6–10. A bromide concentration of 10–100 μg/l was maintained through external addition. Minimum and maximum levels of the water quality variables defined here were based on their annual variations in surface water. Temperature, pH, and Br concentration in the surface water range between 17°C and 36°C, 6.5–9.5, and 10–80 μg/l, respectively. Range for the Cl2/DOC ratio was worked out from the actual chlorination doses applied for disinfection and the measured DOC concentrations in the surface water during different seasons. The maximum contact time of 168 h during chlorination has been considered sufficient for completion of the THMs formation reaction (Hong et al. 2007).

The pH of the water samples for chlorination was adjusted with an appropriate buffer (phosphate or borate). Chlorination experiments were conducted in a series of pre-cleaned 100 ml amber glass bottles with Teflon-lined screw caps. The bottles were washed with phosphate-free detergent, soaked in 10% sulfuric acid for 24 h, and rinsed with deionized and Milli-Q water. Chlorine was spiked into each bottle from a stock solution (NaOCl), which was standardized for available free chlorine by the N,N-diethyl-p-phenylenediamine titrimetric method (APHA 1992). After the chlorination experiments were terminated, residual chlorine concentrations were also measured to ensure that the samples were not chlorine limited. The chlorination reaction was stopped at the desired duration by quenching with Na2SO3 prior to analysis for THMs. The six natural water samples collected on a bimonthly basis were chlorinated with free chlorine at three different (lowest, mid, and highest) Cl2/DOC levels under the natural water pH, bromide level, and temperature conditions, and incubated for 168 h.

To determine the THMs (CHCl3, CHCl2Br, CHBr2Cl, and CHBr3), the chlorinated water samples were extracted with n-pentane and analyzed on a GC/ECD system equipped with a DB-5 capillary column (30 m × 0.25 mm ID). Calibration standards were prepared using standards procured from Supelco (USA). The minimum detection limit for each of the four THMs was up to 0.1 μg/l. Water samples spiked with a known mixture of THMs standards were processed following the same protocol to determine the method recovery for each compound and to compensate for the extraction efficiency. The TTHMs concentration in the samples was obtained by summing the concentrations of all four halogenated compounds. All experiments were conducted in duplicate, and the mean of the two values was used for all calculations.

3.4 Response surface modeling

The response variable Y (TTHMs concentration) can be expressed as \( Y = f\left( X_{\text{Cl2/DOC}}, X_{\text{pH}}, X_{\text{Br}}, X_T, X_t \right) \), where \( X_{\text{Cl2/DOC}} \), \( X_{\text{pH}} \), \( X_{\text{Br}} \), \( X_T \), and \( X_t \) are the coded values of the process variables (Cl2/DOC dosage, pH, Br concentration, temperature, and reaction time). The selected relationship, being a second-degree response surface, is expressed as

$$ \begin{array}{l} Y = \beta_0 + \beta_1 X_{\text{Cl2/DOC}} + \beta_2 X_{\text{Cl2/DOC}}^2 + \beta_3 X_{\text{pH}} + \beta_4 X_{\text{pH}}^2 + \beta_5 X_{\text{Br}} + \beta_6 X_{\text{Br}}^2 + \beta_7 X_T + \beta_8 X_T^2 \\ \quad + \beta_9 X_t + \beta_{10} X_t^2 + \beta_{11} X_{\text{Cl2/DOC}} X_{\text{pH}} + \beta_{12} X_{\text{Cl2/DOC}} X_{\text{Br}} + \beta_{13} X_{\text{Cl2/DOC}} X_T + \beta_{14} X_{\text{Cl2/DOC}} X_t \\ \quad + \beta_{15} X_{\text{pH}} X_{\text{Br}} + \beta_{16} X_{\text{pH}} X_T + \beta_{17} X_{\text{pH}} X_t + \beta_{18} X_{\text{Br}} X_T + \beta_{19} X_{\text{Br}} X_t + \beta_{20} X_T X_t \end{array} $$
(4)

The model coefficients (\( \beta_i \)) were estimated and used for predicting the response values for different combinations of the coded values of the variables.
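How such coefficients can be estimated is sketched below with ordinary least squares in NumPy. This is an illustration only: the study used Statistica, the response here is synthetic and noise free, and the helper names (`box_behnken`, `quadratic_terms`) and the column order (intercept, linear, squared, interaction terms) are assumptions that differ from the interleaved numbering of Eq. 4.

```python
from itertools import combinations, product
import numpy as np

def box_behnken(k, center_points):
    """Pairwise Box-Behnken design in coded units (one row per run)."""
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(run)
    runs.extend([[0] * k for _ in range(center_points)])
    return np.array(runs, dtype=float)

def quadratic_terms(X):
    """Model matrix with intercept, linear, squared, and interaction columns."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)

X = box_behnken(5, 5)                     # 45 coded runs
A = quadratic_terms(X)                    # 45 x 21 model matrix
rng = np.random.default_rng(0)
true_beta = rng.normal(size=A.shape[1])   # synthetic coefficients, not study data
y = A @ true_beta                         # synthetic, noise-free response
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

With 45 runs and 21 coefficients, 24 degrees of freedom remain for the residual, which matches the degrees of freedom quoted for the F test in the results.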

The adequacy of the selected model and the statistical significance of the regression coefficients were tested using the analysis of variance (ANOVA) and the Student's t test statistics (Gunaraj and Murugan 1999). The measured and the model-predicted values of the response variable (Y) were used to compute the coefficient of determination (\( R^2 \)), the root mean square error of prediction (RMSEP), and the relative standard error of prediction (RSEP). The correlation between the measured and predicted values indicates the goodness of fit of the model, whereas the RMSEP and RSEP values are used to evaluate the predictive ability of the selected model. The RMSEP and RSEP were computed as

$$ {\text{RMSEP}} = \sqrt {{\frac{{\mathop{\sum }\nolimits_{{i = 1}}^N {{({y_{{{\text{pred}},i}}} - {y_{{{\text{meas}},i}}})}^2}}}{N}}} $$
(5)
$$ {\text{and}}\;{\text{RSEP}} = \sqrt {{\frac{{\mathop{\sum }\nolimits_{{i = 1}}^N {{({y_{{{\text{pred}},i}}} - {y_{{{\text{meas}},i}}})}^2}}}{{\mathop{\sum }\nolimits_{{i = 1}}^N {{({y_{{{\text{meas}},i}}})}^2}}}}} \times 100 $$
(6)

where \( y_{\text{pred},i} \) and \( y_{\text{meas},i} \) represent the predicted and measured values of the variable Y, and N represents the number of experimental observations. The RMSEP and RSEP provide an average measure of the error made by the selected model in predicting the dependent variable (Singh et al. 2010b). Modeling was performed using Statistica 8.0.
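Equations 5 and 6 translate directly into code. The sketch below is a minimal illustration; the function names and the three measured/predicted pairs are made up for the example and are not data from this study.

```python
import math

def rmsep(pred, meas):
    """Eq. 5: root mean square error of prediction."""
    n = len(meas)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / n)

def rsep(pred, meas):
    """Eq. 6: relative standard error of prediction, in percent."""
    num = sum((p - m) ** 2 for p, m in zip(pred, meas))
    den = sum(m ** 2 for m in meas)
    return math.sqrt(num / den) * 100.0

# Illustrative values only (ug/l), not measurements from the paper.
measured = [20.0, 50.0, 100.0]
predicted = [22.0, 49.0, 98.0]
err_abs = rmsep(predicted, measured)
err_rel = rsep(predicted, measured)
```

RMSEP carries the units of the response, whereas RSEP is scaled by the magnitude of the measured values and is therefore easier to compare across data sets.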

3.5 Model application for predicting THMFP of the raw surface water

A total of 18 experiments were conducted on the six bimonthly water samples collected from the same water abstraction site. The DOC, temperature, pH, and Br concentration were taken as measured in these samples, and three different chlorine/DOC ratios (lowest, middle, and highest) were selected for chlorination, as in the bulk water chlorination experiments. Chlorinated samples were incubated for a period of 168 h at the water temperature. Concentrations of all four THM compounds (CHCl3, CHCl2Br, CHBr2Cl, and CHBr3) were determined in all the chlorinated samples. Thus, a data set of 18 observations of the independent and response variables was obtained. The developed and validated five-variable quadratic model was then applied to this data set from the natural raw water chlorination experiments to predict the TTHMs levels during different months. The measured and predicted TTHMs levels were compared through the correlation between them and through the prediction errors, RMSEP and RSEP.

3.6 Statistical analysis

The significance of the independent variables and their interactions was tested by means of the ANOVA. Results were assessed with various descriptive statistics such as the t, p, and F values, degrees of freedom (df), \( R^2 \), adjusted coefficient of determination (\( R_{\text{adj}}^2 \)), sum of squares (SS), mean sum of squares (MSS), and the chi-square (\( \chi^2 \)) test to reflect the statistical significance of the quadratic model. The quadratic model equation was solved using Microsoft Excel 97. A non-parametric Mann–Whitney U test and a parametric two-sample (unpaired) t test were conducted to evaluate the relationship between the additional experimental data (test set) and the corresponding model-predicted responses.

3.7 Process optimization modeling

Optimization modeling helps to identify the levels of the independent process variables that generate minimum, maximum or any desired level of the response factor. Hence, optimization in present case of THMs formation would allow selecting the levels of the independent variables which keep the THMs formation and concentration below the prescribed safe limit. With this point in view, optimization modeling was performed using the developed quadratic model. The identified quadratic model was used to optimize the five process variables within their studied experimental ranges for the THMs formation. Optimization was performed in Microsoft Excel 97.

4 Results and discussion

4.1 THMs levels

The TTHMs concentration in chlorinated water varied between 19.79 and 136.96 μg/l. Among the four trihalomethanes studied here, CHCl3 was the most abundantly formed compound in chlorinated water, with concentrations between 13.84 and 74.12 μg/l. CHCl2Br was second in order and ranged between 4.71 and 62.69 μg/l. CHBr2Cl, although low in amount, formed in all the chlorinated water samples at concentrations between 1.10 and 8.46 μg/l. CHBr3 was present in only 60% of the chlorinated water samples, at concentrations between 1.06 and 5.23 μg/l.

4.2 Modeling

4.2.1 Experimental design and regression model

The individual and interactive effects of the selected variables on the THMs formation potential of the surface water due to chlorination were investigated using the BBD approach. The measured TTHMs levels in water corresponding to different combinations of the selected independent variables are presented in Table 3. A quadratic model was selected for developing the mathematical relationship between the response and the process variables, viz. Cl2/DOC ratio, pH, bromide concentration, temperature, and reaction time. The BBD-based experiments used to obtain the quadratic model consisted of \( 2k(k-1) \) standard factorial runs and five replicates at the center point (\( C_0 \)). The maximum concentration of TTHMs was observed to be 136.96 μg/l (Table 3). Polynomial regression modeling was performed between the response variable (TTHMs concentration) and the corresponding coded values (\( X_{\text{Cl2/DOC}} \), \( X_{\text{pH}} \), \( X_{\text{Br}} \), \( X_T \), and \( X_t \)) of the five variables (Cl2/DOC, pH, Br, T, and t), and the best fitted model equation was obtained as

$$ \begin{array}{l} Y = 71.439 + 8.159 X_{\text{Cl2/DOC}} - 2.411 X_{\text{Cl2/DOC}}^2 + 31.014 X_{\text{pH}} + 4.921 X_{\text{pH}}^2 + 3.747 X_{\text{Br}} + 3.061 X_{\text{Br}}^2 \\ \quad + 16.086 X_T - 2.424 X_T^2 + 26.467 X_t - 12.474 X_t^2 + 2.775 X_{\text{Cl2/DOC}} X_{\text{pH}} - 2.036 X_{\text{Cl2/DOC}} X_{\text{Br}} \\ \quad + 1.785 X_{\text{Cl2/DOC}} X_T + 2.537 X_{\text{Cl2/DOC}} X_t + 5.704 X_{\text{pH}} X_{\text{Br}} + 8.309 X_{\text{pH}} X_T + 14.961 X_{\text{pH}} X_t \\ \quad - 0.706 X_{\text{Br}} X_T + 1.717 X_{\text{Br}} X_t + 6.434 X_T X_t \end{array} $$
(7)
Table 3 Box–Behnken design matrix with five independent variables (natural and coded values) and corresponding measured and predicted TTHMs levels

Model Eq. 7 was used to evaluate the dependence of TTHMs formation on the process variables. The ANOVA was performed to test the significance of the model (Singh et al. 2010a). The ANOVA results (Table 4) of the quadratic regression model (Eq. 7) suggest that the quadratic model was highly significant, as evident from the Fisher's F test (\( F_{\text{model}} = 204.326 \)) with a very low probability value (\( p_{\text{model}} < 0.00001 \)). Furthermore, the calculated F value was compared with the critical F value (\( F_{0.05,\,\text{df},\,(n - \text{df} - 1)} \)) for the considered probability (p = 0.05) and degrees of freedom. The critical F value (\( F_{0.05,\,20,\,24} = 2.027 \)) is much less than the calculated F value of 204.326. This suggests that the computed Fisher's variance ratio was large enough to justify a high degree of adequacy of the quadratic model and the significance of the variable combinations (Sen and Swaminathan 2004; Liu et al. 2004; Yetilmezsoy et al. 2009).
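Equation 7 can be evaluated at any coded point with a few lines of code. The sketch below is an illustration, not the authors' Excel implementation: the names `EQ7` and `predict_tthm` are hypothetical, and it assumes that the coefficients −2.424 and −12.474 belong to \( X_T^2 \) and \( X_t^2 \), as the term order of Eq. 4 implies.

```python
from math import prod

# Coefficients of Eq. 7 keyed by the coded variables each term multiplies.
# Assumption: -2.424 and -12.474 are the X_T^2 and X_t^2 coefficients (per Eq. 4).
EQ7 = {
    (): 71.439,
    ("cl",): 8.159, ("cl", "cl"): -2.411,
    ("ph",): 31.014, ("ph", "ph"): 4.921,
    ("br",): 3.747, ("br", "br"): 3.061,
    ("T",): 16.086, ("T", "T"): -2.424,
    ("t",): 26.467, ("t", "t"): -12.474,
    ("cl", "ph"): 2.775, ("cl", "br"): -2.036,
    ("cl", "T"): 1.785, ("cl", "t"): 2.537,
    ("ph", "br"): 5.704, ("ph", "T"): 8.309, ("ph", "t"): 14.961,
    ("br", "T"): -0.706, ("br", "t"): 1.717, ("T", "t"): 6.434,
}

def predict_tthm(cl=0.0, ph=0.0, br=0.0, T=0.0, t=0.0):
    """Predicted TTHMs (ug/l) for coded Cl2/DOC, pH, Br, temperature, time."""
    x = {"cl": cl, "ph": ph, "br": br, "T": T, "t": t}
    # prod(()) is 1, so the empty key contributes the intercept.
    return sum(b * prod(x[v] for v in term) for term, b in EQ7.items())

y_center = predict_tthm()            # all variables at their center level
y_high_ph = predict_tthm(ph=1.0)     # only pH raised to its upper level
```

At the design center the prediction equals the intercept, and raising only pH to its upper coded level adds the linear and quadratic pH coefficients to it.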

Table 4 Analysis of variance of the response surface quadratic model for the prediction of TTHMs formation

The goodness of fit of the model was checked by the \( R^2 \) between the experimental and predicted values of the response variable (Fig. 1a). A high value of \( R^2 \) (0.994) indicated a close agreement between the measured and the predicted values of the response. Moreover, a similarly high value of the adjusted coefficient of determination (\( R_{\text{adj}}^2 = 0.989 \)) also showed a high significance of the model: about 99% of the total variation in THMs formation was attributed to the independent variables, and only about 1% of the total variation cannot be explained by the model. The \( R_{\text{adj}}^2 \) corrects the \( R^2 \) value for the sample size and the number of terms in the model. If there are many terms in the model and the sample size is not very large, the \( R_{\text{adj}}^2 \) may be noticeably smaller than the \( R^2 \). In our case, the values of \( R_{\text{adj}}^2 \) and \( R^2 \) were found to be close. A similar pattern has been reported by others for second-order RSM experiments based on central composite (Liu et al. 2004) and Box–Behnken (Yetilmezsoy et al. 2009) designs. Further, the considerably low RMSEP (2.26) and RSEP (3.04) values support the adequacy of the fitted quadratic model. The \( \chi^2 \) test was also performed to check whether there was a significant difference between the model responses and the experimental data. The calculated chi-square value (\( \chi_{\text{cal}}^2 = 4.54 \)) was found to be less than the critical value (\( \chi_{\text{crit}}^2 = 61.65 \)), suggesting that there was no significant difference between the observed and the expected responses. The \( \chi^2 \) test concluded with 95% certainty that the quadratic model provided a satisfactory fit to the experimental data (Yetilmezsoy et al. 2009).
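The correction for sample size and number of terms can be illustrated with the standard textbook definition of the adjusted \( R^2 \). The paper does not state the formula used by its software, so the function below is an assumption; with the reported \( R^2 \) of 0.994, 45 runs, and 20 model terms besides the intercept, it gives a value that agrees with the reported 0.989.

```python
def adjusted_r2(r2, n_obs, n_terms):
    """Adjusted R^2 for a model with n_terms predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)

# Values reported in the text: R^2 = 0.994, 45 runs, 20 non-intercept terms.
r2_adj = adjusted_r2(0.994, n_obs=45, n_terms=20)
```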

Fig. 1 Plot of (a) the measured and model-predicted values of the response variable (TTHMs, μg/l) and (b) the normal probability of the raw residuals

The normal probability plot of the residuals is an important diagnostic tool to detect and explain systematic departures from the assumptions that the errors are normally distributed and independent of each other, and that the error variance is homogeneous. Information regarding the lack of fit of the selected model is contained in the residuals (Liu et al. 2004). The normal probability plot of the residuals (Fig. 1b) suggests that there was no serious violation of the assumptions underlying the analyses, and it confirmed the normality and independence of the residuals. Moreover, comparison of the residuals with the error variance showed that none of the individual residuals exceeded twice the square root of the error variance (Sen and Swaminathan 2004).

4.2.2 Effects of model components and their interactions on THMs formation

The significance of the quadratic model coefficients was evaluated by the Student’s t test and p values listed in Table 5. The t value is the ratio of the estimated parameter effect and the estimated parameter standard deviation. The parameter effect is estimated as twice the regression coefficient value for that parameter. The p value is used as a tool to check the significance of the coefficient. The larger the magnitude of the t value and the smaller the p value, the more significant is the corresponding parameter in the regression model (Yetilmezsoy et al. 2007).

Table 5 Multiple regression results and significance of the components for the quadratic model

From Table 5, it is evident that all the linear and quadratic terms are statistically significant (p < 0.05). Moreover, pH with temperature, Br concentration, and time, and temperature with time exhibited significant interactions. The statistical results (Table 5) further suggested that water pH, T, and t are the process variables with the most significant effect on the THMs formation process. The first-order main effects of all five variables, viz. Cl2/DOC ratio (\( X_{\text{Cl2/DOC}} \)), temperature (\( X_T \)), pH (\( X_{\text{pH}} \)), bromide concentration (\( X_{\text{Br}} \)), and reaction time (\( X_t \)), were found to be relatively more significant than their respective quadratic effects (\( X_{\text{Cl2/DOC}}^2,\;X_T^2,\;X_{\text{pH}}^2,\;X_{\text{Br}}^2,\;\text{and}\;X_t^2 \)). The t and p values (Tables 4 and 5) suggest that pH, reaction time, and temperature have a direct relationship with the THMs formation potential of the raw water during chlorination. It may be noted that pH was the most significant component of the regression model for the present application. Moreover, the linear terms of all five variables, the quadratic terms of pH (\( X_{\text{pH}}^2 \)) and bromide concentration (\( X_{\text{Br}}^2 \)), and the interaction terms other than \( X_{\text{Cl2/DOC}} X_{\text{Br}} \) and \( X_T X_{\text{Br}} \) exhibited a positive relationship with the chlorination by-products formation process, whereas the remaining terms showed a negative effect on the process.

A wide range of variation in response factor (19.79 to 136.96 μg/l) measured in the batch experiments (Table 3) suggests that the THMs formation process was strongly influenced by the selected process variables. The standardized effects (t ratio) of the independent variables and their interactions on the THMs formation potential of water were investigated by the Pareto graphic analysis. The Pareto chart is shown in Fig. 2. It is evident that the water pH produced the largest effect on THMs formation process, followed by the reaction time, temperature and chlorine dose. A similar pattern is also reflected by the regression coefficients in quadratic model Eq. 7.

Fig. 2 Pareto chart showing the standardized effect of independent variables and their interactions on TTHMs formation in water during chlorination

The percent contribution (PC) of each of the individual term in final model was computed (Table 5) using the SS values of the corresponding term. The PC of a term is obtained as the ratio of SS of an individual term to that of sum of SS for all the terms (Yetilmezsoy et al. 2009), as;

$$ {\text{PC}} = \frac{\text{SS}}{{\mathop{\sum }\nolimits {\text{SS}}}} \times 100 $$
(8)

As evident from Table 5, the pH (X pH) showed the highest level of significance with a contribution of >48% as compared with the other components.

The total percent contributions by the first-order, quadratic and interaction terms in the model were also computed (Yetilmezsoy et al. 2009; Meng et al. 2007), as

$$ {\text{TP}}{{\text{C}}_i} = \frac{{\mathop{\sum }\nolimits_{{i = 1}}^n {\text{S}}{{\text{S}}_i}}}{{\mathop{\sum }\nolimits_{{i = 1}}^n \mathop{\sum }\nolimits_{{j = 1}}^n {\text{S}}{{\text{S}}_i} + {\text{S}}{{\text{S}}_{{ii}}} + {\text{S}}{{\text{S}}_{{ij}}}}} \times 100 $$
(9)
$$ {\text{TP}}{{\text{C}}_{{ii}}} = \frac{{\mathop{\sum }\nolimits_{{i = 1}}^n {\text{S}}{{\text{S}}_{{ii}}}}}{{\mathop{\sum }\nolimits_{{i = 1}}^n \mathop{\sum }\nolimits_{{j = 1}}^n {\text{S}}{{\text{S}}_i} + {\text{S}}{{\text{S}}_{{ii}}} + {\text{S}}{{\text{S}}_{{ij}}}}} \times 100 $$
(10)
$$ {\text{TP}}{{\text{C}}_{{ij}}} = \frac{{\mathop{\sum }\nolimits_{{i = 1}}^n {\text{S}}{{\text{S}}_{{ij}}}}}{{\mathop{\sum }\nolimits_{{i = 1}}^n \mathop{\sum }\nolimits_{{j = 1}}^n {\text{S}}{{\text{S}}_i} + {\text{S}}{{\text{S}}_{{ii}}} + {\text{S}}{{\text{S}}_{{ij}}}}} \times 100 $$
(11)

where TPC i, TPC ii, and TPC ij are the total percent contributions (TPC) of the first-order (linear), quadratic (second-order), and interaction terms, respectively. The SS i, SS ii, and SS ij correspond to the sums of squares (SS) of the linear, quadratic, and interaction terms, respectively. The first-order linear terms exhibited the highest contribution (91.65%), followed by the quadratic (4.26%) and interaction (4.09%) terms. This suggests that the first-order independent variables have a direct relationship with the response factor, whereas the quadratic and interaction terms did not show a large effect on the THMs formation process in raw water due to chlorination.
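The calculations in Eqs. 8 to 11 can be sketched as follows. This is a minimal illustration, reading the denominator of Eqs. 9 to 11 as the sum of SS over all linear, quadratic, and interaction terms. The term names and SS values are placeholders chosen for easy arithmetic, not the values of Table 5.

```python
def percent_contributions(ss_terms):
    """Return per-term PC (Eq. 8) and per-group TPC (Eqs. 9-11).

    ss_terms: list of (name, kind, ss) tuples, where kind is one of
    "linear", "quadratic", or "interaction".
    """
    total_ss = sum(ss for _, _, ss in ss_terms)
    pc = {name: 100.0 * ss / total_ss for name, _, ss in ss_terms}
    tpc = {"linear": 0.0, "quadratic": 0.0, "interaction": 0.0}
    for _, kind, ss in ss_terms:
        tpc[kind] += 100.0 * ss / total_ss
    return pc, tpc


# Placeholder SS values (hypothetical, NOT taken from Table 5).
example_terms = [
    ("X_a", "linear", 70.0),
    ("X_b", "linear", 20.0),
    ("X_a^2", "quadratic", 6.0),
    ("X_a*X_b", "interaction", 4.0),
]
pc, tpc = percent_contributions(example_terms)
```

By construction the three TPC values sum to 100, which is consistent with the reported split of 91.65%, 4.26%, and 4.09%.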

4.2.3 Three-dimensional response surface plots

Three-dimensional (3D) response surface plots show the dependent variable as a function of two independent variables varied within their experimental ranges, while all other variables are held at fixed (center) levels. Such plots provide information on the relationships among the variables and help in understanding both the main and the interaction effects of the two plotted variables (Wu et al. 2009; Adinarayana and Ellaiah 2002). Therefore, 3D response surface plots for the measured responses were constructed based on the quadratic model. Since the quadratic model in this study has five independent variables, three variables were held constant at their center levels for each plot, giving a total of ten 3D plots (one for each pair of variables) for the response.
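The construction of these surfaces can be sketched as below: each pair of factors is varied over a grid of coded levels while the remaining three are fixed at the center level (coded 0). The coefficients in `PLACEHOLDER_MODEL` are hypothetical and stand in for the fitted quadratic model (Eq. 7), which is not reproduced here. The function and variable names are also illustrative.

```python
from itertools import combinations

FACTORS = ["Cl2/DOC", "pH", "T", "Br", "t"]

# Hypothetical coefficients for illustration only (NOT the fitted Eq. 7).
PLACEHOLDER_MODEL = {
    "b0": 80.0,
    "linear": {f: 5.0 for f in FACTORS},
    "quadratic": {f: 1.0 for f in FACTORS},
    "interaction": {("pH", "T"): 2.0},
}


def quadratic_response(x, b0, linear, quadratic, interaction):
    """Evaluate a second-order model at coded levels x (dict factor -> level)."""
    y = b0
    for f in FACTORS:
        y += linear[f] * x[f] + quadratic[f] * x[f] ** 2
    for (fi, fj), b in interaction.items():
        y += b * x[fi] * x[fj]
    return y


def surface_slice(fi, fj, model, steps=3):
    """Response grid over factors fi and fj; all other factors at center (0)."""
    levels = [-1.0 + 2.0 * k / (steps - 1) for k in range(steps)]
    grid = []
    for a in levels:
        row = []
        for b in levels:
            x = {f: 0.0 for f in FACTORS}
            x[fi], x[fj] = a, b
            row.append(quadratic_response(x, **model))
        grid.append(row)
    return levels, grid


# Five factors taken two at a time give one surface per pair.
pairs = list(combinations(FACTORS, 2))
levels, grid = surface_slice("pH", "T", PLACEHOLDER_MODEL)
```

Each `grid` can then be passed to any 3D plotting routine. The number of pairs matches the panels a to j listed for Fig. 3.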

The influence of the five different variables on the formation of TTHMs is visualized in the 3D response surface plots (Fig. 3).

Fig. 3 The 3D plots showing the combined effect of a Cl2/DOC dose and pH, b Cl2/DOC dose and temperature, c Cl2/DOC dose and Br concentration, d Cl2/DOC dose and reaction time, e pH and temperature, f pH and Br concentration, g pH and reaction time, h temperature and Br concentration, i temperature and reaction time, and j Br concentration and reaction time on THMs formation

Figure 3a shows the 3D plot for the TTHMs formation as a function of pH and Cl2/DOC dose at constant temperature (27.5°C), Br concentration (55 μg/l), and reaction time (86 h). As evident, TTHMs formation increased with increase in both of these variables. At higher pH, the hydrolysis of precursors of THMs may result in higher THMs formation. Also, higher chlorine dose provided higher inorganic precursors to react with organic carbon in water enhancing the THMs formation process (Hong et al. 2007).

The interactive effect of temperature and Cl2/DOC dose on THMs formation at constant pH (8), Br concentration (55 μg/l) and reaction time (86 h) is shown in Fig. 3b. It may be noted that both the reaction temperature and chlorine dose favored the THMs formation process. It is expected that higher temperature may promote chlorination reaction rates.

The combined effect of Br concentration and chlorine dose on THMs formation at constant pH (8), temperature (27.5°C), and reaction time (168 h) is shown in Fig. 3c. Higher concentrations of both halogen species favored THMs formation, as both species provide more inorganic precursors to react with the organic carbon in water.

Figure 3d shows the interactive influence of chlorine dose and contact time on THMs formation. Both these variables also favored the THMs formation. The longer reaction time may lead to complete and higher production of THMs.

The 3D plot for combined effect of the pH and temperature (Fig. 3e) suggests that the THMs formation increases with both the pH and temperature. A maximum TTHMs level (>120 μg/l) is observed at highest pH and temperature. A similar pattern of combined effects on THMs formation process was observed for Br concentration and pH (Fig. 3f), pH and reaction time (Fig. 3g), Br concentration and temperature (Fig. 3h), temperature and reaction time (Fig. 3i), and Br concentration and time (Fig. 3j).

4.2.4 Prediction of THMs formation potential of natural water samples

The experimentally measured TTHMs levels in the test set samples varied between 74.25 and 122.56 μg/l. This narrow range of TTHMs concentrations is due to the close values of the independent variables. The quadratic model, applied to the bimonthly data set (18 samples) of the measured independent variables, yielded values of the response factor close to the experimental values of the TTHMs formed due to chlorination of the samples (Table 6), with a high correlation (R 2 = 0.95) and low prediction errors, RMSEP (3.13) and RSEP (3.11). This suggests that the model is suitable for application to new data sets to predict the THMFP of chlorinated water.
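The two prediction-error measures can be computed as in the sketch below. Their definitions are not restated in this section, so the forms used here are assumptions: RMSEP as the root mean squared difference between predicted and measured values, and RSEP as the root of the summed squared differences relative to the summed squared measured values, expressed in percent. The sample numbers are placeholders, not the Table 6 data.

```python
import math


def rmsep(measured, predicted):
    """Root mean squared error of prediction (assumed definition)."""
    sq_err = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(sq_err / len(measured))


def rsep(measured, predicted):
    """Relative standard error of prediction in percent (assumed definition)."""
    sq_err = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    sq_meas = sum(m ** 2 for m in measured)
    return 100.0 * math.sqrt(sq_err / sq_meas)


# Placeholder values in ug/l (hypothetical, NOT from Table 6).
measured = [100.0, 80.0]
predicted = [103.0, 76.0]
err_abs = rmsep(measured, predicted)
err_rel = rsep(measured, predicted)
```

RMSEP carries the units of the response (μg/l), whereas RSEP is dimensionless, so the two need not be numerically similar for other data sets.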

Table 6 The actual (A) and coded (C) values of the independent variables in test set and corresponding measured and predicted values of the response variable

To examine whether the experimental data (TTHMs levels) in the test set differ from the corresponding model-predicted values of the response variable, a non-parametric Mann–Whitney U test was conducted (Yetilmezsoy et al. 2009). The test is based on ranking the two samples together and then summing the rank scores (U) of each sample separately. An expected score value is determined as (Hamilton 2004);

$$ E(U) = {n_{{U }}}(N + 1)/2 $$
(12)

where E(U) is the expectation of U, n U is the sample size of the data set being tested, and N is the total number of samples (\( N = {n_1} + {n_2} \)). A z-score under the normal curve is then calculated as (Yetilmezsoy et al. 2009);

$$ z = \frac{{{U_{{{ \max }}}} - E(U)}}{{\sqrt {{{n_1}{n_2}(N + 1)/2}} }} $$
(13)

where U max is the maximum total rank score value and n 1 and n 2 are the sample sizes of the two independent data sets. The z-score for the present test data set was determined to be 0.013 and the two-tailed probability associated with this z-score was obtained as p = 0.990, which is greater than the chosen probability level of p = 0.05, suggesting that there was no significant difference between the measured and the model-predicted responses in test set (Liu et al. 2004).
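The procedure of Eqs. 12 and 13 can be sketched as follows. The code transcribes Eq. 13 as printed, including its denominator. Assigning average ranks to tied values and obtaining the two-tailed probability from the standard normal distribution via `math.erfc` are assumptions of this sketch, and the sample data are placeholders.

```python
import math


def average_ranks(values):
    """1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(sample_1, sample_2):
    """Return (U_max, E(U), z, two-tailed p) following Eqs. 12 and 13."""
    n1, n2 = len(sample_1), len(sample_2)
    n_total = n1 + n2
    ranks = average_ranks(list(sample_1) + list(sample_2))
    u1, u2 = sum(ranks[:n1]), sum(ranks[n1:])
    # Keep the larger rank sum and the size of the sample it belongs to.
    u_max, n_u = (u1, n1) if u1 >= u2 else (u2, n2)
    expected = n_u * (n_total + 1) / 2                                   # Eq. 12
    z = (u_max - expected) / math.sqrt(n1 * n2 * (n_total + 1) / 2)      # Eq. 13
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return u_max, expected, z, p_two_tailed


# Placeholder samples (hypothetical, NOT the test set data).
u_max, expected, z, p = rank_sum_z([1.0, 3.0, 5.0], [2.0, 4.0, 6.0])
```

A z-score near zero, as reported for the test set, gives a two-tailed probability close to 1 under this calculation.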

A parametric two-sample (unpaired) t test was also performed to evaluate the relationship between the model-predicted and the experimental responses (Yetilmezsoy et al. 2009). Since the predicted and experimental data sets have almost equal variances, the standard error of the two sets can be pooled (Hamilton 2004) as

$$ {\text{s}}{{\text{e}}_p} = \frac{{{s_1} + {s_2}}}{2} \sqrt {{\frac{1}{{{n_1}}} + \frac{1}{{{n_2}}}}} $$
(14)

where se p is the pooled standard error, s 1 and s 2 are the standard deviations, and n 1 and n 2 are the sample sizes of the two data sets. The t statistic (t cal) was then calculated as (Hamilton 2004);

$$ {t_{\text{cal}}} = \frac{{{{\bar{y}}_1} - {{\bar{y}}_2}}}{{{\text{s}}{{\text{e}}_p}}} $$
(15)

where t cal is the calculated t statistic, and \( {\bar{y}_1} \) and \( {\bar{y}_2} \) are the mean values of the two independent data sets. The value was compared with the critical t value (t crit) for the corresponding degrees of freedom (\( df = {n_1} + {n_2} - 1 \)). The calculated t value (t cal = 0.089) was less than the critical value (t crit = 2.03), suggesting that there is no statistically significant difference between the two sets of independent samples. Moreover, the two-tailed probability associated with the computed t value (t cal = 0.089, df = 35) was p = 0.930, which is greater than the chosen probability level (p = 0.05), again suggesting no significant difference between the experimental and predicted values of the response factor. The t test therefore also indicated, at the 95% confidence level, that the proposed quadratic model provided a satisfactory fit to the field test data set (Yetilmezsoy et al. 2009).
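Equations 14 and 15 can be sketched as below. The code follows the equations and the degrees of freedom as stated in the text, and uses sample standard deviations from Python's `statistics` module, which is an assumption about how s 1 and s 2 were obtained. The data are placeholders.

```python
import math
import statistics


def pooled_t(sample_1, sample_2):
    """Return (se_p, t_cal, df) following Eqs. 14 and 15 as given in the text."""
    n1, n2 = len(sample_1), len(sample_2)
    s1 = statistics.stdev(sample_1)  # sample standard deviation (assumed)
    s2 = statistics.stdev(sample_2)
    se_p = (s1 + s2) / 2 * math.sqrt(1 / n1 + 1 / n2)                          # Eq. 14
    t_cal = (statistics.mean(sample_1) - statistics.mean(sample_2)) / se_p    # Eq. 15
    df = n1 + n2 - 1  # degrees of freedom as stated in the text
    return se_p, t_cal, df


# Placeholder samples (hypothetical, NOT the test set data).
se_p, t_cal, df = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The absolute value of `t_cal` is then compared with the tabulated critical value for `df` at the chosen significance level.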

4.2.5 Optimization modeling for TTHMs formation

Optimization modeling was performed using the developed quadratic model for two purposes: to assess the maximum level of TTHMs in chlorinated raw water under the worst-case condition of the highest (+1) levels of all five variables within their experimental ranges, and to identify raw water characteristics and chlorination conditions (in terms of the five selected variables) that keep the THMs concentration within the safe limit of 80 μg/l. The model predicted a maximum TTHMs level of 192.06 μg/l at the highest (+1) level of each of the five variables, which was very close to the experimental value of 186.8 ± 1.72 μg/l determined under these conditions.
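Both uses of the model can be illustrated with a simple enumeration over coded levels. This is a sketch only: the optimization method is not specified here, a coarse grid search is swapped in for it, and all coefficients are hypothetical placeholders rather than the fitted model. The factor order assumed is (Cl2/DOC, pH, T, Br, t).

```python
from itertools import product

LIMIT = 80.0  # regulatory limit for TTHMs in ug/l, as quoted in the text

# Hypothetical coefficients for illustration only (NOT the fitted model).
B0 = 90.0
LINEAR = [8.0, 20.0, 10.0, 4.0, 12.0]
QUADRATIC = [-1.0, 2.0, -1.0, 1.0, -2.0]
INTERACTION = {(1, 2): 1.5, (1, 4): 2.0}  # keys are factor index pairs


def predict(x):
    """Second-order response at coded levels x (sequence of five values)."""
    y = B0
    y += sum(b * xi for b, xi in zip(LINEAR, x))
    y += sum(b * xi ** 2 for b, xi in zip(QUADRATIC, x))
    y += sum(b * x[i] * x[j] for (i, j), b in INTERACTION.items())
    return y


# Worst case: every factor at its highest coded level (+1).
worst_case = predict((1, 1, 1, 1, 1))

# Coarse search: coded-level combinations whose prediction meets the limit.
safe_settings = [x for x in product((-1, 0, 1), repeat=5) if predict(x) <= LIMIT]
```

A finer grid or a constrained numerical optimizer could replace the three-level enumeration; the coarse version is kept for readability.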

5 Conclusions

THMs formation in chlorinated raw water (source of drinking water) was studied as a function of DOC normalized chlorine dose, water pH, temperature, bromide concentration, and reaction time. Application of a BBD combined with the RSM and optimization helped in reaching the global solution for optimum THMs formation in disinfected water. The proposed mathematical approach provided a critical analysis of the individual and simultaneous interactive influences of the selected independent variables, such as the chlorine dose, pH, temperature, bromide concentration, and reaction time on THM formation. The model-predicted maximum level of TTHMs (192.06 μg/l) in chlorinated water under highest levels (+1) of all the five independent variables was very close to the experimental value of 186.8 ± 1.72 μg/l.

The adequacy of the developed quadratic model for THMs formation in chlorinated water was checked through various statistical diagnostics. The model-predicted values of the response variable were in very good agreement with the experimentally determined values (R 2 = 0.994, RMSEP = 2.26, RSEP = 3.04). The statistical analysis results suggested that the first-order main effects of the independent variables were relatively more significant than their respective quadratic effects, indicating that the selected process variables had a direct relationship with the THMs formation in water. The pH of water was the most significant component of the quadratic model for the present system followed by reaction time and temperature.

The diagnostic results of the model application to the test set samples also supported the suitability of the developed model (R 2 = 0.95, RMSEP = 3.11, RSEP = 3.13) for application to new data sets. The test results confirmed that the proposed five-factor BBD combined with RSM and optimization is an effective approach for modeling THMs formation in chlorinated water, for understanding the relationships among the independent and response variables, and for optimizing the process to achieve safe levels of TTHMs in chlorinated water. The modeling approach may be employed to predict and control the TTHMs level in chlorinated water using the raw water characteristics and disinfection conditions.