Introduction

Over the last four decades, chemical disinfection of drinking water has significantly reduced the incidence of infectious waterborne disease. However, reactions of disinfectants such as chlorine with the natural organic matter contained in source waters produce mixtures of undesirable compounds known as disinfection byproducts (DBPs). To date, more than 600 DBPs have been identified in drinking water (Richardson et al. 2007), and this number continues to grow. Most drinking water treatment plants use chlorine for disinfection, and therefore several types of chlorine-containing DBPs are generated. Among them, trihalomethanes (TTHMs) and haloacetic acids (HAAs) are found at the highest concentrations in treated drinking water (Hamidin et al. 2008; Richardson 2003).

DBPs can enter the human body by multiple pathways, such as oral intake through water ingestion, inhalation, and dermal contact during regular indoor activities (showering, bathing, and cooking). This chronic exposure to DBPs may pose risks to human health (Siddique et al. 2015), although inconsistent results have been reported across different epidemiological studies. In the case of TTHMs, the highest risk of total cancer in both males and females is associated with chloroform occurrence, mainly by inhalation (80–90% of the total risk), followed by oral exposure and dermal contact (Basu et al. 2011; Mishra et al. 2014). Several studies have reported associations between DBP exposure and increased risk of adverse developmental outcomes, including low birth weight or small for gestational age births (Hoffman et al. 2008; Kumar et al. 2014), congenital anomalies, and birth defects such as cardiovascular and neural disorders (Grazuleviciene et al. 2013; Levallois et al. 2012; Nieuwenhuijsen et al. 2008). Other studies have found elevated rates of bladder, colon, rectum, and brain cancers (Cantor et al. 2010; Melnick et al. 1994; Salas et al. 2013). For all these reasons, many countries have regulated the permitted levels of the most prevalent DBPs in drinking water. The US Environmental Protection Agency set the limit for the total concentration of four TTHMs at 80 μg L−1 and of five HAAs at 60 μg L−1 (USEPA 2001), and the European Union has limited the total concentration of TTHMs to 100 μg L−1 since 2008 (98/83/EC 1998).

Since Johannes Rook (1974) discovered that TTHMs are formed by the reaction of chlorine with natural organic matter (NOM) in drinking water, hundreds of studies have been carried out to determine the effects of TTHMs on health, their mechanisms of formation within the treatment plant, and their evolution along distribution systems. The earliest models for predicting TTHM and chloroform formation in drinking water were reported in 1983 (Engerholm and Amy 1983). To date, more than 150 models have been developed from field-scale and laboratory-scale studies to predict DBP formation in drinking water (Brown et al. 2011; Chowdhury et al. 2009).

These models have investigated the effects of different quality and operational parameters on DBP formation under a variety of environmental conditions. However, mechanistic DBP models are exceedingly difficult to derive because of seasonal, locational, and temporal variations in water quality, as well as the complexity of aquatic chemistry in terms of both disinfection kinetics and the interactions in natural water matrices arising from heterogeneous NOM (Kulkarni and Chellam 2010).

Thus, most of these empirical models are site specific, and consequently their predictive capability under different water conditions remains limited (Elshorbagy 2000). The development of a mathematical model that predicts the formation of DBPs under different water qualities and treatment conditions is of great interest and usefulness in the drinking water field (Lu et al. 2011). The large number of models proposed illustrates how challenging it is to obtain a universally applicable model (Golfinopoulos and Arhonditsis 2002).

In a recent study (Mayer et al. 2015), a large number of existing models were evaluated and overall poor performance was found. According to this study, most of the models are based on specific boundaries related to source water, water quality parameters, and treatment conditions using untreated, coagulated, or finished conventionally treated water. Satisfactory model performance may therefore be limited to a narrow range of treatment scenarios, and the overall poor performance of the models tested may result from applying them to data sets that did not satisfy all boundary conditions. Among the characteristics of distribution systems, piping materials (especially iron or copper) can affect the formation of DBPs. A recent study showed that reactions of certain organic precursors with zero-valent iron may contribute to the formation of dichloroacetamide (DCAcAm) in distribution networks that contain unlined cast iron pipes, even in the absence of chlorinated disinfectants (Chu et al. 2016a). Metallic Cu alone did not affect HAcAm concentrations, but copper increases the reductive dehalogenation of haloacetamides by zero-valent iron in drinking water, reducing the integrated toxic risk (Chu et al. 2016b). A few studies have focused on the consequences of suitable water treatment processes (Badawy et al. 2012; Mouly et al. 2010), concluding that DBP formation and its spatial and seasonal variations depend largely on the efficient removal of NOM.

A previous paper demonstrated that most TTHMs are formed during the treatment process, by reaction of chlorine with organic precursor compounds not removed in the stages preceding the addition of the disinfectant. Trihalomethane formation continues throughout the distribution system as the residual organic matter reacts with the free residual chlorine and with the chlorine applied in successive re-chlorination stages. This reaction is affected by environmental conditions as well as by the operational and morphological characteristics of the distribution system. The range of seasonal and spatial variation of trihalomethanes depends on the effectiveness of organic matter removal during the treatment process (Domínguez-Tello et al. 2015). This is in good agreement with other authors working at bench scale (Badawy et al. 2012), real scale (Summerhayes et al. 2011), and laboratory scale (Sadrnourmohamadi and Gorczyca 2015), the latter evaluating the effect of ozonation on the total trihalomethane formation potential (THMFP) in river water samples.

The aim of this work is to develop a model of trihalomethane formation with predictive capability across different treatment scenarios and different distribution systems. To this end, the model was developed in two stages (treatment process and distribution system) using a wide-ranging historical database of water samples treated in Aljaraque WTP with three different treatment processes. The model was developed using multiple regression analysis and validated using a recent database from the same distribution system. Subsequently, the model was tested on three water supply systems located in the surroundings of Huelva city (southwest Spain), comparing the predicted and measured TTHM values. The results showed good predictive capability, good adaptability to boundary conditions and to different treatment scenarios, and low prediction errors. In addition, 17 published models were evaluated, comparing their capability to predict TTHM concentration as the treatment process applied in the plant varied.

Materials and methods

Study sites

Water samples were collected from January 2011 to May 2015 in four water distribution systems (WDSs) located in the province of Huelva (southwest Spain): Aljaraque (Alj WDS), Lepe (Lep WDS), Riotinto (Rti WDS), and La Palma (Lpa WDS) (Fig. 1). All the systems used a surface water source and chlorine in the form of sodium hypochlorite. Alj and Lep WDS supply water to populations of 60,000 and 150,000 inhabitants, respectively. Both plants operated with two conventional treatment processes in a seasonal scheme: (a) from October to May, pre-oxidation with potassium permanganate (pre-KMnO4), coagulation–flocculation–sedimentation (CFS), rapid sand filtration (SF), second-step filtration/adsorption with granular activated carbon (GAC), and disinfection and (b) from May to September, the same treatment process but replacing the pre-oxidation with an advanced treatment with ozone (pre-O3). Of the other two WDSs, Lpa supplies water to a population of 70,000 inhabitants using advanced treatment (pre-O3, CFS, SF, GAC) and disinfection, and Rti supplies a population of 20,000 inhabitants using a conventional treatment process (pre-KMnO4, CFS, SF, GAC) prior to disinfection. All the WDSs studied are affected by significant climatic and population variations, mainly due to seasonal coastal tourism. Table 1 summarizes the general characteristics of the water distribution systems studied.

Fig. 1 Study zone

Table 1 Description of water distribution systems

Sampling strategy

An intensive monthly sampling campaign was performed in each supply system from January 2011 to May 2015. Sampling points were located at the water treatment plants (raw water and finished water), with two additional sampling points in the reservoirs along the distribution system. Thus, 720 samples were taken, 180 in each WDS under study (45 raw water, 45 finished water, and 90 from the distribution system). Samples were taken on the same day of the week in the first week of each month, using the same sampling route and the same sampling point at each selected location, to ensure the stability of hydraulic conditions and the representativeness of the samples in the studied system.

Samples were taken at the tap of each sampling point. To guarantee the representativeness of the sample, the water contained in the section of pipeline between the sampling point and the reservoir or supply network to be sampled must be renewed, so the water was allowed to flow for at least 5 min before filling the sample bottles. To analyze TTHMs, duplicate samples were collected in 125 mL amber glass bottles with Teflon-lined screw caps, filling the bottle completely to avoid any headspace. A volume of 1.5 mL of 0.1 M aqueous sodium thiosulfate solution was added to eliminate any remaining residual chlorine, quenching the sample to prevent further THM formation. Temperature, pH, turbidity, conductivity, and residual chlorine were measured in situ, while dissolved organic carbon, UV254, bromide, calcium, and trihalomethane species were determined in the laboratory. The samples were stored at 4 °C and analyzed within 2 days of collection. During each campaign, the operational parameters of the treatment plant and distribution system were collected to calculate the operational variables (chlorine dose in WTP and rechlorination, treatment flow, flow rate, water consumption, and water level in the storage tanks).

Throughout the study, several actions were implemented to reduce the effect of uncertainties on the quality of the developed model. The coagulation/flocculation/sedimentation and oxidation processes were controlled and adjusted on the basis of jar tests. The sedimentation process was controlled to prevent floc leakage. Throughout the sampling period, the quality of finished water was maintained at values of Fe, Mn, and turbidity lower than 50 μg L−1, 10 μg L−1, and 0.7 NTU, respectively. Stable process conditions were maintained, avoiding sampling during occasional process fluctuations. Sodium hypochlorite was used in the disinfection process; the concentration of chlorine in the solution (150–123 g L−1) was used for calculating the chlorine dose and the accumulated dose of chlorine. The contact time of water in reservoirs was calculated daily from the flow of water supplied to each population nucleus, assuming complete mixing inside the reservoirs. The contact time considered for model development was the weekly average under stable conditions.
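The contact time calculation described above can be illustrated with a minimal sketch. It assumes that, for a completely mixed reservoir, the mean contact time equals the stored volume divided by the outflow; the function name, the variable names, and the sample records are hypothetical and are not taken from the study.

```python
from statistics import mean

def contact_time_h(stored_volume_m3, flow_m3_per_h):
    """Mean residence time (h) of a completely mixed reservoir: t = V / Q.
    This relation is an assumption; the paper only states complete mixing."""
    if flow_m3_per_h <= 0:
        raise ValueError("flow must be positive")
    return stored_volume_m3 / flow_m3_per_h

# Hypothetical daily records for one reservoir: (stored volume in m3, flow in m3/h)
daily_records = [
    (5000.0, 250.0), (4800.0, 240.0), (5200.0, 260.0), (5000.0, 200.0),
    (4500.0, 225.0), (5100.0, 255.0), (4900.0, 245.0),
]

# One contact time per day, then the weekly average used as model input
daily_times_h = [contact_time_h(v, q) for v, q in daily_records]
weekly_average_h = mean(daily_times_h)
```

Averaging the daily values, rather than dividing a weekly volume by a weekly flow, mirrors the daily calculation followed by a weekly average described in the text.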

Analytical methods

Water samples were analyzed for the regulated TTHMs using headspace solid-phase microextraction (HS-SPME) coupled to gas chromatography-mass spectrometry (GC-MS), with a Varian CP-3800 gas chromatograph coupled to a Varian Saturn 2000 ion trap mass spectrometer (Varian, Sunnyvale, CA, USA). The analytical method has been published elsewhere (Domínguez-Tello et al. 2015). Briefly, a DB-5 ms 30 m × 0.25 mm × 0.25 μm capillary column (Agilent Technologies) was used for the chromatographic separation of TTHMs, with the following temperature program: 40 °C for 4 min, ramped to 120 °C at 10 °C min−1 and held for 1.5 min, and finally ramped to 250 °C at 25 °C min−1 and held for 5 min (total run time 23.7 min). Injections were made in split mode (1:10) for 3 min at 220 °C. Helium was used as carrier gas at a constant pressure of 29 kPa and a constant flow rate of 1 mL min−1.
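The stated total run time can be checked from the oven program itself. The short sketch below is only an arithmetic illustration; the function name and the tuple layout are hypothetical.

```python
def run_time_min(initial_temp_c, initial_hold_min, ramps):
    """Total run time of a GC oven program.

    ramps: list of (rate in C/min, target temperature in C, hold in min).
    """
    total = initial_hold_min
    temp = initial_temp_c
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp duration plus hold
        temp = target
    return total

# 40 C for 4 min; 10 C/min to 120 C, hold 1.5 min; 25 C/min to 250 C, hold 5 min
program_total_min = run_time_min(40.0, 4.0, [(10.0, 120.0, 1.5), (25.0, 250.0, 5.0)])
# 4 + 8 + 1.5 + 5.2 + 5 = 23.7 min
```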

The HS-SPME extraction method has also been described in the previous publication (Domínguez-Tello et al. 2015). Briefly, the fiber used was made of carboxen/polydimethylsiloxane (CAR/PDMS 85 μm) purchased from Sigma-Aldrich. Before use, each fiber was conditioned at 250 °C for 30 min. For HS-SPME extraction, 2 mL of sample was transferred to a 4-mL sample vial together with a magnetic bar, 250 μL of saturated sodium chloride solution, and 2 μL of a 5 mg L−1 internal standard solution (1,2-dibromopropane). The vials were sealed using a screw cap containing a PTFE-faced rubber septum. The analytes were extracted at 40 °C for 30 min with a stirring speed of 250 rpm. The fiber was then introduced into the GC injection port at 270 °C for 4 min for desorption.

Electron ionization mass spectra were recorded in scan mode over the m/z range 29–300 at 3.5 scans per second. Each compound was quantified from the peak area of the target analyte relative to that of the internal standard. The limits of detection (LOD) and quantification (LOQ) were 1.3, 0.8, 1.1, and 0.9 μg/L and 4.2, 2.5, 3.6, and 3.0 μg/L for CHCl3, CHCl2Br, CHClBr2, and CHBr3, respectively. The features of the method and the validation data are listed in Table S1 of the Supplementary information. Milli-Q water, purified in a Gradient system (Millipore, Watford, UK), was used throughout. All the standards were of analytical grade and purchased from Sigma-Aldrich (Madrid, Spain). Solvents were of HPLC grade and obtained from Sigma-Aldrich.

Sampling campaigns were performed monthly, and the quality control of the analytical method was evaluated using four external standards with different concentrations of TTHMs, which confirmed the reliability of the results. At least 25% of the samples were analyzed in replicate for THMs to evaluate the method precision. Blanks were also used for background correction and detection of error sources.

Conventional water quality parameters were analyzed as follows. Free residual chlorine was measured using a Hanna HI-93711 photometer following the DPD colorimetric method according to Standard Method 4500-Cl-G. Turbidity was measured using a HACH 2100P turbidimeter. Bromide was analyzed using an ion chromatograph (METROHM 861 Advanced compact IC) with chemical suppression and a conductivity detector. Water pH was measured using a Metrohm 744 pH meter equipped with a gel-filled electrode. Conductivity was measured with a Crison CM35 conductivity meter. Samples were filtered through a 0.45-μm nylon membrane filter prior to the measurement of UV absorbance and DOC. UV254 absorbance spectra were measured using a PerkinElmer Lambda 18 spectrophotometer with a 5-cm quartz cell, and the spectra were subsequently normalized to a 1-cm cell length. DOC concentrations were obtained using a TOC-5000 Shimadzu analyzer, according to EPA Standard Method 5310C. Specific ultraviolet absorbance (SUVA) was calculated by normalizing UV254 values with respect to DOC.
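As an illustration of the path-length normalization and the SUVA calculation, a minimal sketch follows. It assumes the commonly used definition SUVA (L mg−1 m−1) = 100 × UV254 (cm−1) / DOC (mg L−1); the function names and the sample readings are hypothetical.

```python
def normalize_absorbance(absorbance, path_length_cm):
    """Convert an absorbance read in a cell of the given length to a 1-cm basis."""
    return absorbance / path_length_cm

def suva(uv254_per_cm, doc_mg_per_l):
    """SUVA in L mg^-1 m^-1, assuming SUVA = 100 * UV254 (cm^-1) / DOC (mg/L)."""
    return 100.0 * uv254_per_cm / doc_mg_per_l

# Hypothetical reading: absorbance 0.25 in a 5-cm cell, DOC of 3.0 mg/L
uv254 = normalize_absorbance(0.25, 5.0)  # 0.05 cm^-1
example_suva = suva(uv254, 3.0)          # about 1.67 L mg^-1 m^-1
```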

Modeling and validation

In this study, the predictive model of TTHMs was developed in two stages: (1) the treatment process and (2) the distribution system. As explained before, THM formation occurs during the treatment process (first stage), starting with the addition of chlorine, which reacts with organic precursor compounds not removed during the earlier stages of treatment. As noted above, the TTHM variation in the water distribution system is influenced by the trihalomethane concentration and organic precursors in finished water, water pH and environmental variables, and the operational and morphological characteristics of the distribution system (Fig. 2).

Fig. 2 Model scheme

Modeling

The approach aims to predict the formation of trihalomethanes in the water treatment process and subsequently their variation throughout the distribution system. The model was developed using a database (Domínguez-Tello et al. 2015) comprising 198 samples taken in a monthly sampling campaign conducted between January 2007 and July 2012 in Alj WTP and in three reservoirs and networks of its distribution system. During the sampling campaign, the plant worked with four treatment processes, of which only three were selected for the study. The database covers a wide range of trihalomethane values (22.6–123.9 μg L−1), which gives the model high statistical potential and a wide range of application. Subsequently, the model was validated using the results from a recent sampling campaign between February 2011 and July 2015 in Alj WDS.

A recent review by Ged et al. discusses the predictive capability and application of 87 DBP models published over the last 30 years and focused on chlorine disinfection. The results showed that multivariable power law models had the highest predictive capability for TTHM. Additionally, the best models for predicting TTHM were those including at least five of the seven explanatory variables (DOC, UV254, Br, pH, chlorine dose, reaction time, and temperature) (Ged et al. 2015). In accordance with this study, the direct explanatory variables initially considered in the predictive model were DOC, SUVA, UV254, bromide ion (Br), pH, chlorine dose (d), accumulated chlorine dose (D), reaction time (t), and temperature (T), together with three composite explanatory variables associated with the disinfection reaction: REf, RDS, and δ. REf is the product of the following variables: chlorine dose in WTP (d), reaction time (t), temperature (T), and UV254 absorbance in finished water. Similarly, the composite variable RDS is the product of the variables involved in the formation of TTHM along the distribution system (stage 2), including the TTHM calculated in finished water (TTHMEf), which can potentially be produced by reaction of unoxidized precursor organic matter (UV254), the accumulated dose of chlorine (D), the contact time (tDS), and the water temperature (TDS) at the selected point of the distribution system. Finally, δ is an explanatory variable expressed as the difference between the dose of chlorine added in the treatment process and the free residual chlorine in the finished water, relative to the contact time between the two points (Mouly et al. 2010). The detailed description of the explanatory variables used in the two stages of the predictive model is shown in Fig. 2.
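The composite variables can be sketched in code as follows. This is only an illustration: units are not specified here, RDS is coded in the four-factor form given with the reaction variables (D × tDS × TDS × UV254), δ is read as (dose minus free residual chlorine) divided by contact time, and all function names and sample values are hypothetical.

```python
def r_ef(dose, t_ef, temp_ef, uv254):
    """Stage 1 reaction variable: chlorine dose x contact time x temperature x UV254."""
    return dose * t_ef * temp_ef * uv254

def r_ds(accumulated_dose, t_ds, temp_ds, uv254):
    """Stage 2 reaction variable, assumed to be D x tDS x TDS x UV254."""
    return accumulated_dose * t_ds * temp_ds * uv254

def delta(dose, free_residual_chlorine, contact_time):
    """Chlorine consumed between dosing point and finished water per unit contact time."""
    return (dose - free_residual_chlorine) / contact_time

# Hypothetical sample: d = 2.0, t = 1.5, T = 20.0, UV254 = 0.05, residual = 0.8
example_r_ef = r_ef(2.0, 1.5, 20.0, 0.05)
example_delta = delta(2.0, 0.8, 1.5)
```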

After verifying statistical significance using the Pearson correlation matrix at the 95% confidence level (p < 0.05), the explanatory variables were selected. Combinations of explanatory variables were tested, and those giving the best statistical results and model reproducibility were selected following the criteria of maximum R2, minimum standard error (s), and Mallows' Cp.
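A minimal sketch of the screening step is given below. It computes the Pearson coefficient for one candidate variable and the associated t value, which would then be compared with the tabulated critical value for n − 2 degrees of freedom; the data are hypothetical and the helper names are not from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for the null hypothesis of zero correlation (n - 2 degrees of freedom).
    Not defined for |r| = 1."""
    return r * sqrt((n - 2) / (1.0 - r * r))

# Hypothetical explanatory variable and response
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
t_value = t_statistic(r, len(x))
```

With only five points, the critical two-sided t value at p = 0.05 is about 3.18, so a correlation of this size would not pass the screen; larger data sets lower that threshold.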

For development of the model, multiple regression analysis of data was carried out. Polynomial, $Y = K + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$, and logarithmic, $Y = K\,X_1^{b_1} X_2^{b_2} \cdots X_p^{b_p}$, forms were tested, where $Y$ is the variable to be modeled (TTHM), $X_i$ ($i = 1$ to $p$) are the explanatory variables, $b_i$ ($i = 1$ to $p$) represent the statistical coefficients to be estimated, and $K$ is a constant term. The model was developed according to the best combination of variables obtained in the statistical analysis.
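Both functional forms can be fitted by ordinary least squares, the second one after taking logarithms. The sketch below is a generic illustration using the normal equations, not the statistical software used in the study; it assumes strictly positive data for the logarithmic form, and all names are hypothetical.

```python
from math import exp, log

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(rows, y):
    """Least-squares fit of Y = K + b1*X1 + ... + bp*Xp; returns [K, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def fit_power(rows, y):
    """Fit Y = K * X1^b1 * ... * Xp^bp by linear regression on the logarithms."""
    coef = fit_linear([[log(v) for v in r] for r in rows], [log(v) for v in y])
    return [exp(coef[0])] + coef[1:]

# Hypothetical data generated from known coefficients, to check the fits
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y_lin = [3.0 + 2.0 * a + 0.5 * b for a, b in rows]
y_pow = [2.0 * a ** 0.5 * b ** 1.5 for a, b in rows]
linear_coef = fit_linear(rows, y_lin)  # close to [3.0, 2.0, 0.5]
power_coef = fit_power(rows, y_pow)    # close to [2.0, 0.5, 1.5]
```

Fitting the logarithmic form on log-transformed data minimizes relative rather than absolute errors, which is one reason the two forms can rank differently on the same data set.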

Comparative statistical analyses of measured and predicted data from the model options included the F test, Student's T test, linear correlation coefficient (R2), analytical variance (AV, the percentage of the absolute difference between the measured and predicted values), and standard error (SE) or root mean square error. The F test determined the similarity of variance between observed and predicted values. If the F test value was >0.5, the Student's T test with equal variance was conducted; otherwise (F test <0.5), the Student's T test with unequal variance was conducted. If the Student's T test result was <0.5, the two data sets had no statistical similarity and were not equivalent. If instead the Student's T test result was >0.5, the two data sets had no significant statistical differences, that is, they were equivalent, and the uncertainty analyses were then calculated: SE, AV, and linear correlation coefficient (R2) (Chen and Westerhoff 2010). Both AV and SE reflect the deviation or uncertainty of predicted data relative to measured data, and R2 indicates the correlation between predicted and experimental data. According to the statistical analysis, the model with the highest Student's T test value, the highest R2, and the lowest AV and SE was selected. Moreover, the statistical significance of the selected model was checked by the F value and the Durbin–Watson estimate (1.5–2.5).
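The uncertainty measures can be sketched as follows. The exact formulas are assumptions where the text leaves them open: AV is taken as the mean absolute difference expressed as a percentage of the measured value, SE as the root mean square error with division by n, and R2 as the squared Pearson correlation; the data are hypothetical.

```python
from math import sqrt

def analytical_variance(measured, predicted):
    """Mean absolute difference as a percentage of the measured value (assumed form)."""
    n = len(measured)
    return 100.0 * sum(abs(m - p) / m for m, p in zip(measured, predicted)) / n

def standard_error(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Square of the Pearson correlation between measured and predicted values."""
    n = len(measured)
    mm, mp = sum(measured) / n, sum(predicted) / n
    sxy = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    sxx = sum((m - mm) ** 2 for m in measured)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy * sxy / (sxx * syy)

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)

# Hypothetical TTHM values in ug/L
measured = [50.0, 60.0, 80.0, 100.0]
predicted = [55.0, 57.0, 84.0, 95.0]
residuals = [m - p for m, p in zip(measured, predicted)]
av = analytical_variance(measured, predicted)  # 6.25 %
se = standard_error(measured, predicted)       # sqrt(18.75)
dw = durbin_watson(residuals)                  # 194 / 75
```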

Validation and application

The purpose of validation is to measure the goodness of fit of the values predicted by the model against the measured experimental data. To validate the new two-step model, the TTHMEf and TTHMDS values were calculated using the model equations for an independent data set obtained from Aljaraque WDS between January 2011 and May 2015 (35 samples).

To evaluate the adaptability of the new model to different treatment processes and different conditions of the distribution system, the new model was applied to three different WDSs (Lepe, La Palma, and Riotinto).

The predicted values were compared with the measured values by calculating the difference between them, using AV, SE, and R2. A T test was performed on the model predictions to determine the bias, by comparing the t value calculated for each model with the critical t value (if t value < t critical, the bias is considered not significant).

Results and discussion

Occurrence of TTHMs: seasonal and spatial variations

Seasonal variations in temperature and raw water quality cause variations in DBP concentrations in the water supply. Additionally, seasonal changes in water consumption habits, human activities, and environmental changes favor such variations (Fokmare and Musaddiq 2001; Karapinar et al. 2014). Therefore, to maintain suitable water quality according to the established regulations, the water treatment processes must be matched to the seasonal raw water conditions and the characteristics of the supply system. Considerable seasonal variations in drinking water quality have been reported in many drinking water systems, including small water distribution systems (Scheili et al. 2015).

In this work, both the seasonal (from winter to summer) and spatial (from the water treatment plant to the end points of the distribution system) variations of TTHM were evaluated in the four WDSs located in important areas of southwest Spain (Aljaraque, Lepe, Riotinto, and La Palma) during the period in which the model was developed and validated. The seasonal and spatial variations of TTHMs are shown in Table 2.

Table 2 Occurrence TTHM formation

TTHM levels were highest in summer, followed by spring, and lower in autumn and winter. The average levels of TTHMs measured in summer at the water treatment plants of Alj, Lep, Rti, and Lpa were 1.41, 1.34, 1.41, and 1.15 times higher than the average levels in winter, respectively. The lowest range of seasonal variation occurred in Lpa WTP, where an advanced treatment process with ozonation and GAC was used. The ranges of spatial variation in the Alj, Lep, Rti, and Lpa water distribution systems were 1.1, 1.26, 1.34, and 1.13 times the concentration of TTHM in treated water, respectively.

Influence of the oxidation process: ozonation test

The natural organic matter present in the source of water is the major precursor to the formation of DBPs. Water utilities need to apply treatment technology and optimize the treatment processes to remove organic precursors to effectively reduce the formation of DBPs (Hua et al. 2015).

To evaluate the influence of the treatment process and especially the oxidation process on the formation of TTHMs in distribution system, a real scale test was carried out in the Lep WTP varying the ozone dose. A sampling campaign for 2 months with daily sampling of raw water, treated water, and three sampling points of the distribution system (R1, R2, and R3) was performed. The results are shown in Fig. 3.

Fig. 3 Effect of ozonation

In agreement with other authors (Bond et al. 2014; Galapate et al. 2001; Sadrnourmohamadi and Gorczyca 2015), trihalomethane formation depends on both the oxidation kinetics and the halogenation steps. The transformation of dissolved organic carbon during ozonation results in a greater reduction in TTHM, through the conversion of hydrophobic fractions (the main contributors to the formation of TTHM) into hydrophilic fractions.

As a result, it was found that increasing the ozone dosage (1, 2, and 3 mg L−1) reduces the content of DOC (by 31.2, 32.8, and 38.3%, respectively) and of trihalomethanes (by 14, 34, and 48%, respectively) in the distribution system. Furthermore, as shown in Fig. 3, a stronger oxidation treatment helps to reduce the variability of TTHM throughout the distribution system, suggesting less seasonal and spatial variation and greater stability of supply water quality. The results suggest the importance of considering the effect of the treatment process in the development of DBP predictive models.

Effect of natural organic matter

The natural organic matter present in the source water serves as the major precursor to the formation of DBPs. Aquatic NOM is a complex mixture of heterogeneous organic compounds varying in structure and functionality from source to source. Therefore, surrogate parameters such as total organic carbon (TOC), dissolved organic carbon (DOC), ultraviolet (UV) absorbance, and SUVA are used to predict its removal through treatment and to estimate its reactivity toward DBP formation (Hua et al. 2015; Matilainen et al. 2011; Reckhow et al. 1990). SUVA is a good indicator of the formation of unknown DBPs, but the correlation between SUVA and TTHMs is generally poor, because TTHMs are produced from diverse types of precursors including UV and non-UV absorbing organic compounds. NOM with high SUVA values is rich in humic substances, hydrophobic compounds, and high molecular weight organic matter (Ates et al. 2007; Kitis et al. 2002). The SUVA values of the raw waters from the four reservoirs used for this study were lower than 2 L mg−1 m−1, suggesting the presence of low molecular weight NOM compounds and a low humic acid content.

In our study, DOC, SUVA, and UV254 data were tested using the Pearson correlation test (Table 3). The highest correlation was found for UV254. Therefore, UV254, which can be easily measured, was adopted as a critical variable of the model. In the individual study of each treatment process, positive correlations between the variable UV254 in treated water and the formation of TTHM were obtained (r −0.696, −0.365, and −0.704 in TP1, TP2, and TP3, respectively); however, a non-significant correlation (r 0.140) was found in the pooled data from the three process schemes. The reason for this effect could be the different NOM removal efficiency of each treatment scheme (Badawy et al. 2012).

Table 3 Pearson correlation matrix

UV254 varies according to the treatment applied in the WTP and is a good indicator of the potential formation of TTHMs. High UV254 values in the finished water indicate poor oxidation and a high potential for formation of TTHMs throughout the distribution system. Thus, the variable UV254 was applied in both stages of the model as a significant indicator of the trihalomethane formation reaction.

Effect of pH

The concentration of TTHMs increases at high pH as a result of numerous hydrolysis reactions occurring in these compounds and of the increasing formation of hypochlorite ions at these pH values, which reduces the effectiveness of chlorine disinfection. As a consequence, at higher pH values, more TTHMs are formed (Hong et al. 2007; Zimoch et al. 2015). In this study, a positive Pearson correlation was obtained between water pH and TTHM concentration (Table 3), both in the global data (r 0.562) and in the data groups of each treatment process (0.561, 0.874, and 0.877 in TP1, TP2, and TP3, respectively). A good correlation between water pH and TTHM formation in the distribution system (stage 2) was also found (r 0.687).

Effect of water temperature

In the area under study, a clear relationship between water temperature and trihalomethane formation was found (Table 3). This can be explained by the effect of temperature on the effectiveness of organic matter removal during the treatment process, as confirmed by the results obtained for the conventional treatment processes used in this study, TP1 and TP3 (r 0.944 and 0.918, respectively). However, a positive but less marked correlation was observed for the advanced process TP2 (r 0.220), which is consistent with the low UV254 values in finished water due to the strong oxidation of organic matter by ozone. Despite the clear importance of temperature in the TTHM formation reaction, no direct correlation was found when the Pearson test was applied to the global data.

Effect of chlorine dose

Using the Pearson correlation method, a strong relationship (r 0.748) was obtained between THM formation in finished water (TTHMEf) and the chlorine dose used in the treatment processes. A strong relationship (r 0.881) was also obtained between TTHM formation in the distribution system (TTHMDS) and the accumulated dose of chlorine (Table 3).

Effect of contact time

In this study, the contact time (tEf) for the reaction of chlorine with organic matter was measured from the chlorine dosing point to the finished water in the WTP. The contact time from the point of finished water to the different sampling points in the distribution system (tDS) was also measured. The contact times in the studied water treatment plants (tEf) were between 0.10 and 3.25 h, and those in the distribution systems (tDS) were between 19.7 and 30.0 h.

In both cases, a strong relationship was obtained between contact time and TTHM: tEf with TTHMEf (r 0.951) and tDS with TTHMDS (r 0.965).

Effect of bromide

In the chlorination process, DBP concentration increases with the level of bromide. This is because the bromide ion can be oxidized by free chlorine to produce hypobromous acid (HBrO), which reacts with NOM with greater substitution ability than HClO. When the level of bromide increases, more bromine can be incorporated into DBPs, and consequently the formation of chlorine-containing species decreases. Moreover, the atomic weight of bromine is higher than that of chlorine, so DBP formation increases more significantly with increasing bromide (Bougeard et al. 2010; Hong et al. 2013; Watson et al. 2015).

In the present study, a good relationship (r 0.657) between bromide in finished water and TTHMEf was obtained. However, peak values of bromide were observed in the raw water that are not always neutralized during the treatment process, which can cause elevations of TTHM in the distribution system. Therefore, the bromide variable was included in the model, despite its low background level in the water supply systems studied.

Reaction variables

TTHM formation behaves as a first-order reaction with respect to chlorine dose and humic acid precursors. Therefore, TTHM formation can be formulated as a function of the concentration of THMFP (humic acid precursor), residual chlorine, reaction time, and reaction temperature (Li and Zhao 2006). Following this criterion, the reaction variable (R Ef) was established as an indirect indicator of the reactivity of organic matter with chlorine in treated water in the plant. R Ef represents the product of the dose of chlorine, contact time, water temperature, and UV254 absorbance measured in finished water (R Ef = d × tEf × T × UV254). Using the Pearson method (Table 3), a strong relationship (r = 0.965) was obtained between R Ef and TTHMEf. Similarly, R DS represents the product of the dose of chlorine, contact time, and water temperature in the distribution system and the UV254 absorbance measured in finished water (R DS = D × tDS × TDS × UV254). A strong relationship (r = 0.749) was obtained between R DS and TTHMDS.

TTHM in the distribution system was inversely related to the variable δ. Good relationships were obtained between δ, δ × T, and UV254 × δ × T and TTHMEf (r = −0.861, −0.713, and −0.611, respectively). However, these variables were not selected for the proposed model, since the correlation obtained between predicted and calculated values was lower (R2 0.789–0.878) and the standard error higher (19.7–14.9) than with the variable R.
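The composite reaction variable can be sketched as a simple product of its four factors. The function name `reaction_variable`, the argument names, and the example values are hypothetical. The units (mg L−1, h, °C, cm−1) are an assumption based on the units in which the variables are reported in this paper.

```python
def reaction_variable(chlorine_dose_mg_l, contact_time_h, temperature_c, uv254_per_cm):
    """Composite reaction variable R = dose x contact time x temperature x UV254.

    The same product is used for R_Ef (plant: d, tEf, T) and for
    R_DS (distribution system: D, tDS, TDS); only the inputs differ.
    Units are assumed, not stated explicitly for R in the source.
    """
    return chlorine_dose_mg_l * contact_time_h * temperature_c * uv254_per_cm


# Hypothetical operating point for illustration only
r_ef = reaction_variable(2.0, 1.5, 20.0, 0.040)   # plant stage
r_ds = reaction_variable(4.0, 24.0, 20.0, 0.040)  # distribution stage
print(r_ef, r_ds)
```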

Modeling developed

After verifying the effect of the different variables on the formation of trihalomethanes in the two stages, their statistical significance, and their Pearson correlations, the explanatory variables were selected. Among the possible combinations of explanatory variables (pHEf, d, tEf, TEf, Br, UV254, R Ef, δ, δ T), pHEf, Br, and R Ef were selected because they gave the best statistical results for the model: R2 0.948, SE 8.67, and Mallows' Cp 4.0 (Table 4).

Table 4 Model options

Different model options (linear and polynomial) were evaluated according to the best combinations of variables obtained in the statistical analysis and the accuracy of the predictions relative to the measured values. Based on the results obtained, the linear model was selected. The summary of TTHM models is shown in Table 5. The p values of Student's t test (0.99 and 0.90 for stages 1 and 2, respectively) were well above 0.05, showing no statistically significant difference between measured and predicted values. The analytical variance (AV), standard error (SE), and linear correlation coefficient (R2) were 13.6, 8.67, and 0.948 for the stage 1 model and 9.9, 6.08, and 0.9 for the stage 2 model. The new model is statistically significant, and the value of the Durbin–Watson statistic was 1.74 and 1.62 for stages 1 and 2, respectively. A Durbin–Watson value between 1.5 and 2.5 is preferred for a statistically sound model (Kumari and Gupta 2015; Uyak et al. 2007). The selected linear model is:

$$ {\mathrm{TTHM}}_{\mathrm{Ef}}=165-21.3\ {\mathrm{pH}}_{\mathrm{Ef}}+0.232\ {\mathrm{Br}}^{-}+5.84\ {R}_{\mathrm{Ef}} $$
$$ {\mathrm{TTHM}}_{\mathrm{DS}}=14.9+1.01\ {\mathrm{TTHM}}_{\mathrm{Ef}}+0.20\ {\mathrm{pH}}_{\mathrm{DS}}-0.104\ {R}_{\mathrm{DS}} $$
$$ \mathrm{where}\ {R}_{\mathrm{Ef}}=d\times {t}_{\mathrm{Ef}}\times T\times {\mathrm{UV}}_{254};\ {R}_{\mathrm{DS}}=D\times {t}_{\mathrm{DS}}\times {T}_{\mathrm{DS}}\times {\mathrm{UV}}_{254}. $$
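The two equations can be chained in a short sketch in which the stage 1 output feeds stage 2. The function names and example inputs are hypothetical. The sketch also assumes that the effluent term in the second equation is the TTHM concentration in finished water from stage 1, and that bromide and TTHM are expressed in μg L−1.

```python
def tthm_ef(ph_ef, bromide_ug_l, r_ef):
    """Stage 1: TTHM in finished water (WTP), using the reported coefficients."""
    return 165.0 - 21.3 * ph_ef + 0.232 * bromide_ug_l + 5.84 * r_ef


def tthm_ds(tthm_ef_ug_l, ph_ds, r_ds):
    """Stage 2: TTHM in the distribution system.

    Assumption: the effluent term of the equation is the stage 1 TTHM value.
    """
    return 14.9 + 1.01 * tthm_ef_ug_l + 0.20 * ph_ds - 0.104 * r_ds


# Hypothetical inputs for illustration only
r_ef = 2.0 * 1.5 * 20.0 * 0.040    # d x tEf x T x UV254
r_ds = 4.0 * 24.0 * 20.0 * 0.040   # D x tDS x TDS x UV254
stage1 = tthm_ef(ph_ef=7.2, bromide_ug_l=50.0, r_ef=r_ef)
stage2 = tthm_ds(tthm_ef_ug_l=stage1, ph_ds=7.3, r_ds=r_ds)
print(round(stage1, 2), round(stage2, 2))
```

Chaining the stages this way reflects the structure of the model, in which the distribution system prediction depends on the outcome of the treatment process.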
Table 5 Statistical evaluation of TTHM models

The analysis of variance (Garcia-Villanova et al. 2010) showed that the model was statistically significant (p < 0.05). Examination of the statistical residuals of the model showed a normal distribution, with data evenly distributed above and below the zero baseline and no visible trends. The range of application of the developed model is restricted by the range of quality and operational variables taken as the basis of design and validation: TTHMEf (22.6–125.5 μg L−1), COD (1.40–5.09 μg L−1), UV254 (0.017–0.076 cm−1), pHEf (6.50–7.80), T (10.6–26.6 °C), t (0.10–3.25 h), Br (20.0–176 μg L−1), d (0.70–5.80 mg L−1), TTHMDS (27.3–130.1 μg L−1), tDS (19.7–30.0 h), D (2.97–6.31 mg L−1), and pHDS (6.73–7.75).

In developing the model, some effects were observed that could be limiting and affect the quality of the predicted results. Therefore, the data were screened and those corresponding to unstable or anomalous behavior were removed. Thus, data in which trihalomethanes in the distribution system were lower than in finished water were discarded. This effect was observed in some reservoirs far away from the WTP, where the inlet water was cascading and was probably affected by air stripping. Furthermore, the quality of measurement of certain operational variables, such as contact time and cumulative dose of chlorine, could be a source of uncertainty in the predictive model. It should also be pointed out that the source water used in the development, validation, and application of the model contains low levels of bromide, and it would be advisable to check the effectiveness of the model with high levels of this ion.
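A simple guard against extrapolation could check each input against these ranges before the model is applied. The sketch below is illustrative: the dictionary keys and the function name `out_of_range` are hypothetical, and only the input variables of the two model equations are included.

```python
# Reported ranges of application (min, max); key names are hypothetical.
APPLICATION_RANGES = {
    "uv254_per_cm": (0.017, 0.076),
    "ph_ef": (6.50, 7.80),
    "temperature_c": (10.6, 26.6),
    "t_ef_h": (0.10, 3.25),
    "bromide_ug_l": (20.0, 176.0),
    "d_mg_l": (0.70, 5.80),
    "t_ds_h": (19.7, 30.0),
    "D_mg_l": (2.97, 6.31),
    "ph_ds": (6.73, 7.75),
}


def out_of_range(inputs):
    """Return the sorted names of inputs that fall outside the reported ranges.

    Inputs whose names are not in APPLICATION_RANGES are ignored.
    """
    flagged = []
    for name, value in inputs.items():
        if name in APPLICATION_RANGES:
            low, high = APPLICATION_RANGES[name]
            if not (low <= value <= high):
                flagged.append(name)
    return sorted(flagged)


# Hypothetical sample: bromide exceeds the upper limit of 176 ug/L
print(out_of_range({"ph_ef": 7.0, "bromide_ug_l": 300.0, "t_ef_h": 1.0}))
```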
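The screening rule described above, which discards samples where TTHM in the distribution system is lower than in finished water, can be sketched as a simple filter. The record layout and field names are hypothetical, as are the sample values.

```python
def screen_samples(samples):
    """Keep samples where TTHM does not decrease from finished water to the DS.

    Each sample is a dict with hypothetical keys "tthm_ef" and "tthm_ds" (ug/L).
    """
    return [s for s in samples if s["tthm_ds"] >= s["tthm_ef"]]


# Hypothetical records for illustration only
samples = [
    {"site": "A", "tthm_ef": 40.0, "tthm_ds": 52.0},
    {"site": "B", "tthm_ef": 45.0, "tthm_ds": 38.0},  # lower in the DS, discarded
]
print([s["site"] for s in screen_samples(samples)])
```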

Model validation

The new predictive model was validated in the same WDS in which it was developed (Aljaraque WDS) during the period January 2011 to May 2015 (35 samples). During the validation period, Aljaraque WTP operated with a conventional treatment process from October to May (pre-KMnO4, CFS, SF, GAC, and disinfection) and an advanced treatment process between May and September (pre-O3, CFS, SF, GAC, and disinfection). The results of the validation analysis are shown in Table 6. A t test was applied to determine the bias of the model. The t values for the two stages (Alj WTP and Alj DS) were less than the critical t value and the p values were greater than 0.05, which indicates that the model bias was not significant. The uncertainty analysis shows a low deviation of predicted data relative to measured data. The standard errors (SE) of the two stages (Alj WTP and Alj DS), measured as root mean standard errors, were 2.57 and 3.00, respectively. The AV, measured as the average percentage of the absolute difference between measured values and those calculated by the model, was 7.38 and 7.22%, respectively. The maximum differences between measured and calculated TTHM in the two stages of the model were 17 and 12%, slightly lower in the DS, indicating some adjustment in the second stage model. The analysis of variance of the first stage (WTP) showed prediction errors (AV) greater than 10% in the range of values studied (TTHM 28.3–56.5 μg L−1). Likewise, in the second stage model (DS), 23% of the predictions had a variance >10%. The two-stage model validation indicates very satisfactory predictions, with R2 of 0.91 and 0.87. Fig. 4 shows measured vs. predicted TTHM values in the water treatment plant (stage 1) and the distribution system (stage 2).
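The validation statistics named above can be sketched as follows. Several choices here are assumptions rather than the authors' exact procedure: SE is taken as the root mean square error, R2 as the squared Pearson correlation between measured and predicted values, and bias is summarized by a paired t statistic (the p value is not computed). Function names and sample values are hypothetical.

```python
import math


def validation_metrics(measured, predicted):
    """Return SE (as RMSE, an assumption), AV (mean absolute % error) and R2."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    diffs = [m - p for m, p in zip(measured, predicted)]
    se = math.sqrt(sum(d * d for d in diffs) / n)
    av = 100.0 * sum(abs(d) / m for d, m in zip(diffs, measured)) / n
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("R2 is undefined when one series is constant")
    return {"SE": se, "AV": av, "R2": cov ** 2 / (var_m * var_p)}


def paired_t_statistic(measured, predicted):
    """Paired t statistic of the differences (measured - predicted)."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    if sd == 0:
        raise ValueError("t is undefined when all differences are equal")
    return mean_d / (sd / math.sqrt(n))


# Hypothetical values for illustration only (ug/L)
measured = [30.0, 42.0, 55.0, 48.0]
predicted = [32.0, 40.0, 53.0, 50.0]
print(validation_metrics(measured, predicted))
print(paired_t_statistic(measured, predicted))
```

The t statistic would then be compared with the critical value for n − 1 degrees of freedom to judge whether the bias is significant.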

Table 6 Model validation and application
Fig. 4 Model validation. Aljaraque WTP and DS

Model application to different water supply systems

In order to check the suitability of the new two-stage model and its adaptability to different boundary conditions, the model developed was applied in three different water supply systems: Lepe, Riotinto, and La Palma water distribution systems with substantial differences in treatment processes, water source, and distribution system characteristics. The quality characteristics of raw water, treated water and water distribution system of Aljaraque, Lepe, Riotinto, and La Palma are shown in Tables S2, S3, and S4 of supplementary information.

The numerical results of the application are shown in Table 6. The measured and calculated values for the different distribution systems studied are represented graphically in Fig. 5. To analyze the results of applying the model, a statistical procedure similar to that used for validation was followed. A t test was used to determine possible bias in the data. The predicted and measured values were compared by calculating the differences between them, using AV, SE, and R2.

Fig. 5 Model application in three WDSs

The uncertainty analysis shows low deviations of predicted data relative to measured data in the three WDSs studied, with SE of 2.98–4.94% and AV of 6.67–9.80%. The maximum differences (AV) between measured and calculated TTHM in the three WDSs were between 14.0 and 26.6%. The largest variance was obtained in Lpa WDS, coinciding with the widest range of TTHM values (14.8–64.8 μg L−1). The R2 values obtained were between 0.82 and 0.96. The new model was tested globally by pooling all the data used for validation and application (Fig. 6), obtaining very good predictive capability for TTHM in the water treatment plant (R2 0.940, SE 3.18, AV 7.79%) and in the distribution system (R2 0.87, SE 4.05, AV 8.30%).

Fig. 6 Global predictive model. TTHM

The results obtained show good adaptability of the two-stage model to the boundary conditions of the three supply systems studied, despite their differences in source water, treatment processes, and distribution system characteristics.

Comparison of the predictive capability of different models according to the treatment processes

Most empirical DBP models proposed in the literature are based on databases from specific treatment conditions, water quality, and distribution systems. As a consequence, they carry specific value ranges tied to their boundary conditions and cannot be applied to every real situation (Amy et al. 1987). In the development of predictive models, the treatment process is a critical factor that should be taken into account by using databases from different treatment scenarios. To demonstrate the influence of the treatment process on the accuracy and applicability of predictive models of TTHMs, a comparative study of the predictive efficiency of 17 models was performed.

The mathematical expressions of the models were applied by substituting the explanatory variables with values from the Alj WDS database, grouped according to the treatment process (TP1, TP2, and TP3). TP1: pre-chlorination with conventional treatment process; TP2: advanced treatment process with pre-ozonation, inter-chlorination, filtration, and GAC; and TP3: conventional process using potassium permanganate pre-oxidation and chlorination. The prediction results from the different TTHM models were compared with measured values. The comparative statistical analysis was performed using SE, AV, and R2. The predictive models evaluated are shown in Table 7, and the comparative results obtained are shown in Table 8.
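The comparison procedure can be sketched as grouping the samples by treatment process and scoring a given model within each group. The record layout, the function name `evaluate_by_process`, and the toy model are hypothetical. SE is computed here as root mean square error, which is an assumption about the statistic used.

```python
import math
from collections import defaultdict


def evaluate_by_process(records, model):
    """Score one model separately for each treatment process.

    records: iterable of dicts with hypothetical keys "process" (e.g. "TP1"),
             "measured" (TTHM, ug/L) and "inputs" (dict passed to the model).
    model:   callable that takes the inputs dict and returns predicted TTHM.
    Returns {process: {"SE": rmse, "AV": mean absolute percentage error}}.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["process"]].append((rec["measured"], model(rec["inputs"])))

    scores = {}
    for process, pairs in grouped.items():
        n = len(pairs)
        rmse = math.sqrt(sum((m - p) ** 2 for m, p in pairs) / n)
        av = 100.0 * sum(abs(m - p) / m for m, p in pairs) / n
        scores[process] = {"SE": rmse, "AV": av}
    return scores


# Hypothetical toy model and records for illustration only
def toy_model(inputs):
    return 10.0 + 20.0 * inputs["dose"]


records = [
    {"process": "TP1", "measured": 50.0, "inputs": {"dose": 2.0}},
    {"process": "TP1", "measured": 70.0, "inputs": {"dose": 3.0}},
    {"process": "TP2", "measured": 25.0, "inputs": {"dose": 1.0}},
]
print(evaluate_by_process(records, toy_model))
```

Running each literature model through the same routine would give per-process errors that are directly comparable across models.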

Table 7 Predictive models TTHMs evaluated
Table 8 Predictive capability of models according to treatment processes

Of the 17 predictive models evaluated, seven obtained good prediction capability for TTHM with only one treatment process, especially the conventional process with pre-chlorination (TP1); however, high errors were obtained with the other treatment processes. Among them, models M11, M7, M3, M2, and M13 obtained small prediction errors, and M14 and M4 obtained moderate errors. Also, other predictive models developed in WDSs with conventional treatment processes, high concentrations of organic matter in the water source, and significant seasonal variations (Kumari and Gupta 2015) showed good predictive capacity for the conventional process TP1.

Seven models (M5, M15, M16, M8, M10, M6, and M9) satisfy the boundary conditions with two treatment processes simultaneously, obtaining acceptable predictions for TTHM with the advanced treatment process TP2 and the conventional TP1. The M10 model obtained the best result with the conventional pre-chlorination process and acceptable values with the conventional process with permanganate. The M12 model obtained low predictive capacity with the three treatment processes studied.

Of the 17 models studied, only two (M1 and M17) provided acceptable TTHM predictions for any treatment process. M17 obtained the best overall predictive capacity across the three treatment processes studied (SE 18.66, AV 27.36, and R2 0.86). The prediction results with the M1 model are also good but with relatively high errors.

The new model developed in this work clearly provides the best results, with good individual predictive capability in each treatment process (SE 7.97, 6.37, and 11.51%; AV 4.99, 12.13, and 25.06% in TP1, TP2, and TP3, respectively) and good overall predictive capability (SE 8.88, AV 16.63%, and R2 0.94).

According to the results obtained, most of the models are specific in their application and do not satisfy the boundary conditions of all the treatment processes. Therefore, most of these models cannot be applied globally. The results suggest the need to develop predictive models of DBPs from databases that include different treatment scenarios, in order to obtain a wide range of application.

Conclusions

In this paper, a two-stage (WTP and DS) predictive model of trihalomethane formation was developed that shows good predictive capability for a wide range of water treatment and distribution system scenarios. The two-stage model predicts, with low error, the formation of TTHM in the treatment process and in the water distribution system from quality and operational variables. The model links for the first time the formation of trihalomethanes in the distribution system with the effectiveness of the treatment process applied in the plant. Thus, the model can be used as a preventive tool for treatment process control, alerting operators to the setting requirements that prevent high levels of TTHM in drinking water.

The new predictive model includes two direct explanatory variables, pH and bromide ion (Br), and two composite variables, R Ef and R DS, associated with the disinfection reaction in the WTP and DS, respectively. Both composite variables were calculated as the product of other direct variables: organic matter (UV254), contact time (tEf and tDS), chlorine dose (d and D), and temperature (T and TDS).

In this work, it has been verified that the treatment processes applied in the WTP have a strong influence on the predictive capability for TTHM in the distribution system. It was shown that an efficient oxidation treatment in the WTP contributes to reducing the range of TTHM values in the distribution system, thus reducing the effect of seasonal and spatial variation and achieving greater stability of supply water quality. This result underscores the importance of considering the effect of the treatment process in the development of predictive models of DBPs, using databases that include different treatment scenarios.

The strategy of developing two-stage DBP predictive models using data from different treatment processes can contribute to improving the adaptability of future models to different boundary conditions and to increasing their range of application.