Introduction

Water scarcity is a common feature of the modern world, and this threat is predicted to worsen in the future (Alcamo et al. 1997; MED WS &D WG 2007). Groundwater makes up about 60% of the world’s freshwater supply, which is about 0.6% of the entire world’s water (EPA 2009). Groundwater is also recognized as one of the most valuable natural resources, an immensely important and dependable source of water in all climatic regions of the world (Todd and Mays 2005; Carreira 2010). This situation of growing scarcity may induce a severe water crisis (Varghese et al. 2012). The accurate prediction of groundwater quality is essential for the sustainable utilization and management of vital groundwater resources at both local and regional scales. In addition, for an effective watershed management strategy, water-quality assessment is essential to protect groundwater resources from unavoidable climate change impacts and from other pressures such as industrialization, urbanization, and agricultural expansion (Subba Rao 2008; De Fraiture and Wichelns 2009; Liao et al. 2012). To that end, it is useful to provide information on water quality, the classification of water for various purposes, the identification of different groundwater aquifers, the assessment of groundwater potential, and the investigation of different chemical processes (Trabelsi et al. 2007; Liu et al. 2008; Chenini and Khemiri 2009). Thus, water-quality indicators must reflect mineralization processes, integrate reservoir properties, and be sensitive to groundwater recharge rate and flow direction (Andre et al. 2005; Subba Rao 2008). Water chemistry is an important factor determining the use of water for domestic and irrigation purposes (WHO 1984; Memon et al. 2011).
The chemical composition of groundwater is controlled by many factors, including the composition of precipitation, climate, the groundwater flow path through the rock types, the topography of the region, saline water intrusion in coastal areas, and human activities on the ground surface (Reghunath et al. 2002; Trabelsi et al. 2007; Subba Rao 2008; Liao et al. 2012; Arslan 2013; Bhat and Jeelani 2015). These factors can combine to create diverse water types that change in composition spatially and temporally. Different techniques have been used to evaluate water quality, essentially based on correlations between chemical ions and on ion ratios (Pazand and Pazand 2014; Ben Alaya et al. 2014).

Therefore, predictor models offer a cost-effective option for water-quality management, and they have the potential to be applied elsewhere. Multiple regression analysis is a statistical methodology that uses the relation between two or more quantitative variables so that a response or outcome variable can be predicted from the others. Predictor models can supplement regular monitoring by identifying areas that need health warnings or more frequent monitoring, and they are useful between sampling periods (USEPA 2010). This methodology is widely used in the assessment of some contaminant indicators: viral and bacterial water pollution (Yates et al. 1985; González-Ramón et al. 2012; Gonzalez and Noble 2014; Herrig et al. 2015), harmful substances (Ozekin 1994; Golfinopoulos and Arhonditsis 2002; Yu et al. 2015), nitrate (Liao et al. 2012), metals (Manzoor et al. 2006; Kumaresan and Riyazuddin 2007; Ahsan et al. 2008), and rare earth elements (Janssen and Verweij 2003). Nevertheless, only a limited number of studies have focused on total dissolved solids (TDS) as a groundwater-quality indicator. WHO (1984) considered TDS as the criterion for the classification of groundwater for domestic purposes.

Given this context, the main objective of this work was to investigate the origins of mineralization and to test and validate the applicability of multiple regression analysis for estimating groundwater quality based on the TDS. The model developed was based on chemical analysis data from Nadhour-Saouaf aquifer samples. The data set was divided into a developing set, used for setting up the models, and an independent set, used for the validation. Then, a sensitivity analysis was performed to evaluate the reliability and uncertainty of the estimated TDS values.

Materials and methods

Study area

The Nadhour-Saouaf syncline, located southeast of Zaghouan City in Tunisia, extends over an area of 400 km² and lies between mountain ranges in the north and the northwest and the alluvial range in the south (Fig. 1). The elevation varies from 78 to 923 m (Fig. 1). The climate is mostly semiarid, with hot, dry summers and wet winters. The mean annual rainfall is about 400 mm, much lower than the potential evaporation, which exceeds 1560 mm/year. The mean annual temperature is around 18 °C. The region has a rather unstable climate, with irregular rainfall amounts and a highly variable spatial distribution.

Fig. 1 Study area and groundwater sampling well locations

Geology and hydrogeology

The geological outcrops in the study area range from Jurassic to Quaternary. Quaternary deposits are largely present to the south of the Zaghouan area, and Plio-Quaternary deposits form the syncline fill, whereas the Miocene shows a thick sequence of lignite to the north and medium to coarse sandstone to the south (Hamza 1990; Kacem 2008). The Upper Oligocene unit, situated along the bordering edges of infiltration, is mainly composed of coarse sandstone and clay. The Lower Eocene deposits are composed of limestone beds. The Upper Eocene deposits, a thick sequence of clay and marl, constitute the substrate of the superficial aquifer and crop out near the borders of the syncline. The topographic heights of the Mesozoic formations dominate the north of the study area (Fig. 2a, b). The hydrogeological SW-NE cross section reveals an aquifer system characterized by a synclinal structure and shows a depositional sequence of three main hydrostratigraphic series: the Plio-Quaternary, formed by conglomerate interstratified with sand and clay beds; the Miocene sandstone layer; and highly permeable Oligocene sandstones interstratified with sand and gravel beds (Fig. 3). The transmissivity ranges between 0.012 and 0.036 m²/s, with a hydraulic conductivity of 30 × 10⁻⁶ to 8.7 × 10⁻³ m/s (SCET 2002). In the Nadhour-Saouaf watershed, the aquifer has been exploited since the 1980s. In the 2000s, the number of pumping wells increased rapidly, and consequently the extraction volume reached 5.2 × 10⁶ m³ in 2007. Thus, since 1982, the groundwater level has continuously declined, with a maximum drawdown of 4.5 m during the period of 1995–1996. As a direct impact of increased water abstraction, the groundwater level decline spread over the entire aquifer. Consequently, the aquifer has become overexploited, as its natural recharge by rainwater cannot maintain a safe yield.
To improve the natural recharge regime of the water table, an artificial groundwater recharge infrastructure was implemented (Saadine Essahel and El Ogla hill dams), which consists of releasing water downstream of the hill dams into the wadi bed.

Fig. 2 Geological map (a) and synthetic litho-stratigraphic column of the study area (b)

Fig. 3 Hydrogeological cross section along transect BB′ (see Fig. 1 for location)

Under natural conditions, this aquifer is recharged by water flowing from the surrounding mountains, as well as by local infiltration from several rivers (Zammouri and Feki 2005). The groundwater flows toward the over-pumping zone in the SE area (Fig. 4). The highest piezometric levels are in the NE of the study area, and the lowest ones are close to the Nadhour area. Water of the Nadhour-Saouaf aquifer is used erratically by different economic sectors. However, drinking-water supply and irrigation remain the primary uses of groundwater resources.

Fig. 4 Piezometric map of Nadhour-Saouaf aquifer in 2010

Groundwater sampling and analysis

Groundwater samples from 36 pumping wells in the Nadhour-Saouaf multilayered aquifer system were collected in 2010. All samples were obtained from existing water-supply boreholes used for domestic and agricultural purposes. The samples were divided into three main groups corresponding to three reservoir levels: the Plio-Quaternary, the Miocene, and the Oligocene. An attempt was made to choose sampling locations that are uniformly distributed throughout the study area (Fig. 1). The samples were collected after 10 min of pumping and stored in polyethylene bottles. Immediately after sampling, temperature (T, °C) and electrical conductivity (EC) were measured with a conductivity meter, and pH was measured with a pH meter. Chemical analyses of major elements (Na⁺, Ca²⁺, Mg²⁺, K⁺, Cl⁻, SO₄²⁻, and HCO₃⁻) and nitrates (NO₃⁻) were carried out in the Laboratory at the National School of Engineers of Sfax (LARSEN) employing the standard methods reported in Table 1. The total dissolved solids (TDS) in mg/l were determined gravimetrically.

Table 1 Analytical methods used

The analytical data quality was ensured through careful standardization; the ionic charge balance of each sample was within ± 5%.
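The ± 5% criterion can be illustrated with a short sketch. It assumes the usual charge balance error, 100 × (Σcations − Σanions)/(Σcations + Σanions) with concentrations in meq/l; the text does not state which formula was used. The sample values are hypothetical and are not taken from the Nadhour-Saouaf data set.

```python
# Minimal sketch of an ionic charge balance check (assumed formula, see lead-in).
# Equivalent weights in g/eq = molar mass / |charge|.
EQ_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Ca": 40.08 / 2, "Mg": 24.305 / 2,
    "Cl": 35.45, "SO4": 96.06 / 2, "HCO3": 61.02, "NO3": 62.00,
}
CATIONS = ("Na", "K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3", "NO3")


def to_meq(sample_mg_l):
    """Convert concentrations from mg/l to meq/l."""
    return {ion: conc / EQ_WEIGHT[ion] for ion, conc in sample_mg_l.items()}


def charge_balance_error(sample_mg_l):
    """Return the charge balance error in percent."""
    meq = to_meq(sample_mg_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical sample in mg/l (illustrative only).
sample = {"Na": 230.0, "K": 5.0, "Ca": 120.0, "Mg": 45.0,
          "Cl": 355.0, "SO4": 300.0, "HCO3": 250.0, "NO3": 8.0}
cbe = charge_balance_error(sample)
print(f"charge balance error: {cbe:.2f} %, acceptable: {abs(cbe) <= 5.0}")
```

A sample whose error falls outside ± 5% would normally be re-analyzed or excluded before any statistical modeling.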

Multiple linear regression

One of the classical problems in engineering investigation is to make predictions. Usually, such predictions require finding a formula that explains the relationship between a response variable and a set of regression variables (Tabari et al. 2011). Multiple linear regression (MLR) analysis is commonly used to describe quantitative relationships between the dependent variable and one or more independent variables (Shirsath and Singh 2010). This method has been used successfully by different authors to establish statistical models (Ghasemi and Saaidpour 2007). TDS is a common parameter in water-quality monitoring, as it constitutes an excellent indicator of groundwater quality. Therefore, the MLR method provides an equation linking the independent variables Vi ([Na], [Mg], [Ca], [K], [SO4], [HCO3], [Cl], and [NO3]) to the dependent variable Vd (TDS), in this case using a relationship of the following type:

$$ {V}_d={\beta}_0+{\beta}_1{V}_{i1}+\cdots +{\beta}_n{V}_{in} $$
(1)

where the intercept (β0) and the regression coefficients of the descriptors (βi) are determined by the least squares method (Green and Carroll 1996) and n is the number of independent variables.
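As a minimal sketch of how the coefficients of Eq. (1) can be obtained by least squares, the following code solves the normal equations (XᵀX)β = Xᵀy with plain Gaussian elimination. It is not the STATGRAPHICS procedure used in this study; the function names and the synthetic data (TDS generated exactly from two hypothetical predictors) are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X, y):
    """Return [beta0, beta1, ..., betan] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]  # leading 1 carries the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, x):
    """Evaluate Eq. (1) for one sample."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


# Synthetic, noise-free data: TDS = 100 + 2.0*Ca + 1.5*Cl (hypothetical values, mg/l).
X = [(100, 200), (150, 180), (120, 300), (90, 250), (200, 400), (160, 220)]
y = [100 + 2.0 * ca + 1.5 * cl for ca, cl in X]
beta = fit_mlr(X, y)
print([round(b, 4) for b in beta])
```

With real, noisy data the fitted coefficients would not reproduce the generating values exactly, and a numerically more robust solver (for example a QR-based routine) would usually be preferred over the normal equations.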

Before establishing the statistical model, the influential (significant) input variables were selected. Generally, not all potential input variables are equally informative, because some variables may be correlated, noisy, or have no significant relationship with the output to be modeled (Maier and Dandy 1998; Hassan et al. 2014). The selection of model inputs among the chemical properties of the sampled groundwater was carried out according to multiple variable correlations. This multivariate technique gives the correlation coefficient r for each pair of variables. The correlation coefficient r is a number between − 1 and + 1 that measures the strength of the linear relationship between two variables. For r > 0.7, 0.5 ≤ r ≤ 0.7, and r < 0.5, the variables were considered, respectively, strongly correlated, moderately correlated, and poorly correlated. The standardized skewness and standardized kurtosis were used to determine whether the sample comes from a normal distribution. Values of these statistics outside the range of − 2 to + 2 indicate significant departures from normality, which would tend to invalidate many of the statistical procedures normally applied to these data. A variable with standardized skewness and standardized kurtosis values outside this range was considered not influential. Then, variables were retained in the model at a significance level of P < 0.05.
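The screening rules described above can be sketched as follows. The correlation classes follow the thresholds given in the text. The standardized skewness and kurtosis are computed here as the moment-based skewness and excess kurtosis divided by their large-sample standard errors, √(6/n) and √(24/n); this definition is an assumption, and the statistical package used in the study may apply bias-corrected estimators. All function names and data are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_class(r):
    """Classify r with the thresholds quoted in the text."""
    if r > 0.7:
        return "strong"
    if r >= 0.5:
        return "moderate"
    return "poor"


def standardized_shape(values):
    """Return (standardized skewness, standardized kurtosis); assumed definition."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew / math.sqrt(6.0 / n), excess_kurt / math.sqrt(24.0 / n)


def within_normal_range(values, limit=2.0):
    """True when both standardized statistics lie inside [-limit, +limit]."""
    z_skew, z_kurt = standardized_shape(values)
    return abs(z_skew) <= limit and abs(z_kurt) <= limit


# Hypothetical concentration series (illustrative only).
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
print(within_normal_range(symmetric), within_normal_range(skewed))
```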

The effectiveness of the developed MLR models was measured with a set of standard statistical indicators and the ANOVA table (Makridakis et al. 2008), namely the coefficient of multiple determination (R²), adjusted R², P value, mean absolute error (MAE), and Durbin–Watson statistic (DWS). The relationships between variables were established using the stepwise regression method (Bernstein 1988; Thayer 2002). Multiple variable analysis and multiple regression modeling were performed using the STATGRAPHICS XVI.I statistical program (SPT 2009).
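For reference, the indicators listed above can be computed from observed and predicted values as in the following sketch. It uses the textbook definitions of R², adjusted R², MAE, and the Durbin–Watson statistic d = Σ(e_t − e_{t−1})² / Σe_t²; the function name and the numbers are hypothetical and are not results of this study.

```python
def fit_statistics(observed, predicted, n_predictors):
    """Return R2, adjusted R2, MAE, and Durbin-Watson statistic as a dict."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    mae = sum(abs(e) for e in residuals) / n
    # Values near 2 suggest no lag-1 autocorrelation of the residuals.
    dw = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / ss_res
    return {"r2": r2, "adj_r2": adj_r2, "mae": mae, "dw": dw}


# Hypothetical measured and simulated TDS values (g/l).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
stats = fit_statistics(observed, predicted, n_predictors=1)
print({k: round(v, 4) for k, v in stats.items()})
```

The Durbin–Watson statistic ranges from 0 to 4, so values far above 2 point toward negative autocorrelation and values far below 2 toward positive autocorrelation.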

Results and discussion

Groundwater chemistry

A statistical summary of the total dissolved solids (TDS) and of the major ion and nitrate concentrations in the groundwater is presented in Table 2. The TDS values range from 0.9 to 1.8, 0.7 to 1.8, and 0.7 to 2.4 g/l, respectively, for the Plio-Quaternary, Miocene, and Oligocene levels. In order to identify the geochemical processes responsible for the groundwater salinization, the relationships of major elements versus TDS and of ions versus saturation index were analyzed.

Table 2 Descriptive statistics of water-quality parameters

TDS versus major elements

In order to identify the geochemical processes contributing to the groundwater salinization in the multilayered Nadhour-Saouaf aquifer system, the relationship between the concentrations of major elements and TDS was considered (Fig. 5). For the Plio-Quaternary level, these diagrams indicate a well-defined correlation of Ca²⁺, SO₄²⁻, Na⁺, and Cl⁻ with TDS, with correlation coefficients of 0.83, 0.82, 0.81, and 0.79, respectively. These relationships suggest that these major elements contribute to the water mineralization of the Plio-Quaternary level. For the Miocene level, all major elements except HCO₃⁻ are well correlated with TDS, with correlation coefficients exceeding 0.85, which indicates that the total salt content is mainly controlled by the concentrations of these elements. For the Oligocene level, Cl⁻, Na⁺, SO₄²⁻, and Mg²⁺ are well correlated with TDS, with correlation coefficients of 0.9, 0.88, 0.8, and 0.7, respectively, which identifies the elements controlling mineralization in this level.

Fig. 5 Major elements concentration versus TDS relationships

Relationship between ions

In order to identify the mechanisms of groundwater mineralization, relationships between the major elements were investigated. The Na⁺/Cl⁻ relationship shows a high correlation, pointing to halite dissolution as a major process of mineralization, with relatively high concentrations of Na⁺ compared to Cl⁻ (Fig. 6a). The excess of Na⁺ relative to Cl⁻ may be explained by other processes such as cation exchange between groundwater and the clay fraction of the aquifer material (Trabelsi et al. 2007; Kraiem et al. 2012). SO₄²⁻ versus (Ca²⁺ + Mg²⁺) shows a good positive correlation for all samples of the various aquifer levels (Fig. 6b), with a Ca²⁺ deficiency relative to the SO₄²⁻ concentration that is probably generated by cation exchange. Figure 6c shows that dolomite dissolution is probably the source of Mg²⁺ and Ca²⁺, in addition to other sources of these ions, for all aquifer levels. The plot of (Ca²⁺ + Mg²⁺) versus (HCO₃⁻ + SO₄²⁻) shows that all Miocene samples and some samples from both the Oligocene and the Plio-Quaternary plot close to the 1:1 line, indicating that Nadhour-Saouaf groundwater mineralization is controlled by gypsum dissolution. A large number of Plio-Quaternary groundwater samples and a few from the Oligocene level plot to the right of this line because of an excess of HCO₃⁻ and SO₄²⁻, indicating that groundwater mineralization is controlled by ion exchange in addition to mineral dissolution (Fig. 6d).

Fig. 6 Ion relationships. a Na/Cl. b (Ca + Mg)/SO₄. c (Ca + Mg)/HCO₃. d (Ca + Mg)/(HCO₃ + SO₄)

Furthermore, this exchange is confirmed by the plot of ((Ca²⁺ + Mg²⁺) − (HCO₃⁻ + SO₄²⁻)) versus (Na⁺ + K⁺ − Cl⁻) shown in Fig. 7, which reveals an inversely proportional evolution with a slope of about − 1 (Mc Lean et al. 2000; Dassi 2004; Kamel et al. 2005; Kraiem et al. 2012). In the absence of these reactions, all data should plot close to the origin (Mc Lean et al. 2000). Figure 7 shows that the groundwater samples are distributed on both sides of the origin, indicating that Nadhour-Saouaf groundwater mineralization is also controlled by ion exchange and reverse ion exchange with the clay minerals present in the aquifer material.
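The two quantities plotted in this diagram, and the slope of their relationship, can be sketched as below. Concentrations are assumed to be expressed in meq/l, as is usual for this type of diagram; the function names and the three samples are hypothetical and were constructed to lie on a line of slope − 1.

```python
def exchange_coordinates(meq):
    """Return (Na + K - Cl, (Ca + Mg) - (HCO3 + SO4)) for one sample in meq/l."""
    x = meq["Na"] + meq["K"] - meq["Cl"]
    y = (meq["Ca"] + meq["Mg"]) - (meq["HCO3"] + meq["SO4"])
    return x, y


def ols_slope(points):
    """Ordinary least squares slope of y on x."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical samples in meq/l (illustrative only).
samples = [
    {"Na": 10.0, "K": 0.2, "Cl": 8.0, "Ca": 5.0, "Mg": 3.0, "HCO3": 4.0, "SO4": 6.2},
    {"Na": 6.0, "K": 0.1, "Cl": 7.0, "Ca": 6.0, "Mg": 2.9, "HCO3": 4.0, "SO4": 4.0},
    {"Na": 12.0, "K": 0.3, "Cl": 9.0, "Ca": 4.0, "Mg": 2.7, "HCO3": 5.0, "SO4": 5.0},
]
points = [exchange_coordinates(s) for s in samples]
slope = ols_slope(points)
print(f"slope = {slope:.3f}")
```

Samples with positive x and negative y are consistent with direct exchange (Ca and Mg taken up, Na released), and the opposite quadrant with reverse exchange.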

Fig. 7 [(Ca + Mg) − (SO₄ + HCO₃)] versus (Na + K − Cl) relationship

Saturation index

The saturation indices (SI) of calcite, dolomite, halite, anhydrite, and gypsum were calculated using the software Diagrammes 6.51. The results are presented in Fig. 8. SI evaluates the degree of equilibrium between the water and the minerals of the rock (Elango et al. 2003). If SI is greater than zero (SI > 0), the solution is supersaturated with respect to the mineral, and precipitation from the groundwater is theoretically possible. When SI is below zero (SI < 0), the solution is undersaturated and dissolution continues. If SI is equal to zero (SI = 0), the solution is in equilibrium with the mineral, which neither precipitates nor dissolves on balance (Qiyan and Baoping 2002; Trabelsi et al. 2007).
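The interpretation rules above can be written down as a brief sketch, assuming the standard definition SI = log10(IAP/Ksp), where IAP is the ion activity product and Ksp the solubility product of the mineral. This is not the Diagrammes implementation, which also handles activity corrections and temperature; the function names and numerical values are placeholders for illustration.

```python
import math


def saturation_index(iap, ksp):
    """SI = log10(IAP / Ksp) for one mineral (assumed standard definition)."""
    return math.log10(iap / ksp)


def saturation_state(si, tolerance=0.0):
    """Classify a saturation index; tolerance widens the equilibrium band."""
    if si > tolerance:
        return "supersaturated"
    if si < -tolerance:
        return "undersaturated"
    return "equilibrium"


# Placeholder values: an ion activity product ten times below the solubility product.
si = saturation_index(iap=1e-5, ksp=1e-4)
print(round(si, 3), saturation_state(si))
```

In practice a small tolerance around zero is often used, because analytical and thermodynamic uncertainties make an exact SI of zero unlikely.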

Fig. 8 Relationships between saturation indices and major elements in the analyzed groundwater samples. a SI (calcite)/(Ca²⁺ + HCO₃⁻). b SI (dolomite)/(Ca²⁺ + Mg²⁺ + HCO₃⁻). c SI (anhydrite)/(Ca²⁺ + SO₄²⁻). d SI (gypsum)/(Ca²⁺ + SO₄²⁻). e SI (halite)/(Na⁺ + Cl⁻)

As shown in Fig. 8a, b, all samples except a few are supersaturated with respect to calcite and dolomite. For gypsum, anhydrite, and halite, the SI values indicate undersaturation, suggesting that the concentrations of their soluble components (Na, Cl, Ca, and SO₄) are not limited by mineral equilibrium (Guler et al. 2002). For the Miocene and Oligocene levels, the saturation indices of anhydrite and gypsum vary in inverse proportion to the sum (Ca²⁺ + SO₄²⁻) (Fig. 8c, d).

The saturation indices of halite versus (Na⁺ + Cl⁻) indicate that the water evolves from a state close to undersaturation with respect to halite (Fig. 8e). Calcite precipitation, together with gypsum, anhydrite, and halite dissolution, is the principal reaction set that determines the chemical evolution in the Nadhour-Saouaf aquifer system.

Nitrates

Nitrate is a very important parameter for assessing the contamination of groundwater (Einsiedl and Mayer 2006; Ameur et al. 2016). The limit of nitrate content in water is set by the World Health Organization (WHO) at 50 mg/l. High concentrations of nitrates can cause serious health problems. In the study area, this parameter ranges between 0 and 54 mg/l, with an average of 8 mg/l. Only one sample exceeds the nitrate concentration limit (50 mg/l) established by the WHO standards.

MLR model development

In developing the MLR model, the appropriate number of explanatory variables required to provide more precise simulation results was determined. This approach was used to extract the relevant variables controlling groundwater quality. The range of the considered variables, the average, the standard deviation, the standardized skewness, and the standardized kurtosis are shown in Table 2. Standardized skewness and standardized kurtosis values outside the range of − 2 to + 2 are shown in bold and mark variables considered not relevant to groundwater quality. For the Plio-Quaternary aquifer level, all variables can be considered significant model inputs. Na was deleted from the input variable list for the Miocene level, and Mg, Cl, and NO3 were excluded from the list of variables considered for the Oligocene level.

The results of MLR analysis for the three aquifer levels in Nadhour-Saouaf syncline using a stepwise regression technique are summarized in Table 3. The “beta” (standardized regression coefficients) values show the relative contribution of each independent variable in the prediction of groundwater quality. The “P value” refers to the significant variables, which are included in the regression equation.

Table 3 MLR models and their performance

After fitting the MLR models to the data set, the adequacy of the fit was assessed for each model. For the Plio-Quaternary aquifer level, the value of R² is 0.96, showing that about 96% of the total variation in the TDS can be accounted for by the independent variables. The fitted model 1 retained three explanatory variables (Ca, Cl, and NO3). For the Miocene water samples, MLR model 2 required other explanatory variables (Cl, SO4, and Mg). The high R² shows that about 98% of the total variation in the TDS is explained by these variables. However, for the Oligocene level, only two explanatory variables (Ca and Na) appear necessary to explain the TDS variation in model 3. The value of R² is 0.98, showing that about 98% of the total variation in the TDS can be accounted for by the independent variables. Goodness of fit between measured and simulated TDS was assessed by the mean absolute error. A low MAE was recorded for all three models. The autocorrelation of residuals was checked using the Durbin–Watson statistic (Makridakis et al. 2008). The values of the Durbin–Watson statistic for the residuals obtained at the Plio-Quaternary, Miocene, and Oligocene levels range between 1.54 and 3.34, which is significant at a 95% confidence level, thereby satisfying the condition of no autocorrelation at lag 1 in the groundwater samples of these aquifer levels. These diagnostic checks were applied to all three aquifer levels, and the results indicated that all MLR models satisfy the basic assumptions of the MLR technique (Courville and Thompson 2001).

In order to test whether the dependent variable TDS is related to the predictor variables, the ANOVA table was used (Williams 2015). Since the P value is less than the significance level (5%), TDS is related to the predictor variables for all aquifer levels. Table 4 shows that, at the 5% significance level, none of the predictor variables needs to be removed from the full models. All the coefficients of the models are significant, i.e., the P value of the t statistic for each coefficient is less than the significance level (5%), so all the considered variables are useful as predictors of the dependent variable TDS.

Table 4 Statistical results obtained from the MLR analysis

For the qualitative evaluation of the models’ performance, graphical indicators were used: a comparison of measured and predicted TDS is shown in Fig. 9a, b, and c for the Plio-Quaternary, Miocene, and Oligocene aquifer levels, respectively. It is apparent from this figure and from Table 5 that all models show reasonable correlation between measured and predicted TDS. Hence, the produced MLR models are viable tools for monitoring and assessing groundwater-quality status, which can enhance groundwater resources management in the area.

Fig. 9 Comparison of measured and simulated TDS for a Plio-Quaternary level, b Miocene level, and c Oligocene level

Table 5 Statistical results of measured and predicted TDS in Nadhour-Saouaf aquifer system

Sensitivity analysis

The groundwater-quality assessment was carried out using chemical data for each aquifer level. Unfortunately, there is considerable uncertainty in the process of chemical parameter analysis. Thus, the water-quality uncertainties are compounded by uncertainties in the chemical parameters explaining the TDS variation for each aquifer level. In light of these probable uncertainties, a sensitivity analysis was conducted for the MLR models. Its aim was to investigate the relative importance of each input variable for accurately predicting groundwater quality. This analysis was carried out for all MLR models by imposing changes on individual inputs and observing their effects on the model output. Because of the low NO3 concentrations in the Plio-Quaternary water samples, the effect of NO3 uncertainty on the TDS assessment was not included in this analysis. The magnitude of the perturbation was ± 10% with respect to the original data; each input was perturbed while the other inputs were kept at their original values, and the change in the model output TDS was then calculated. The uncertainties associated with the chemical parameters were evaluated by computing a relative sensitivity (S) defined as AS/CP. CP is the relative change of a given variable or parameter, defined as |Vs − Vm|/Vm × 100, and AS is the relative change in the output TDS value, defined as |Cs − Cm|/Cm × 100. Vs and Vm are the variable values used for the sensitivity run and in the original MLR model, respectively, and Cs and Cm are the output TDS values computed in the sensitivity run and by the original MLR model, respectively (Jiménez-Martínez et al. 2010). Figure 10a, b, and c present the results of the sensitivity analyses for the MLR models of the Plio-Quaternary, Miocene, and Oligocene levels, respectively. The variation of the sensitivity values with the inputs at each water sample is shown for all aquifer levels. It can be seen from Fig. 10a that, in the Plio-Quaternary level groundwater, all the inputs have reasonably low sensitivity values (ranging between 0.05 and 0.64). Compared to Ca, the variation of Cl affects the TDS significantly at this level. At the Miocene aquifer level, however, the inputs Cl and SO4 have irregular sensitivity values compared to those of the Mg input (Fig. 10b), which has a low sensitivity value (< 0.2). Therefore, the greatest change in the TDS of Miocene water samples is not due to perturbations in the input Mg but to variations in SO4 and Cl. For almost all the Oligocene water samples, the TDS was found to be most sensitive to Na (Fig. 10c). Therefore, the inputs having the highest sensitivity should be determined with greater accuracy so as to ensure reliable prediction of TDS by the MLR model.

Fig. 10 Sensitivity of the MLR models to the input variables for a Plio-Quaternary level, b Miocene level, and c Oligocene level

External validation of the TDS models

The next task was to validate the calibrated models with data different from those used for model formulation (Table 3). The prediction was made for each aquifer level separately, based on the previously defined predictor variables. Figure 11a–c depicts the predicted TDS values and the TDS data derived from evaluation of the various groundwater samples, based on the MLR models of the Plio-Quaternary, Miocene, and Oligocene aquifer levels, respectively. Thus, Fig. 11 shows a “perfect” simulation. The TDS values predicted for the Nadhour-Saouaf aquifer are in reasonable agreement with the measured data.

Fig. 11 Validation of the TDS-models for a Plio-Quaternary level, b Miocene level, and c Oligocene level

Conclusions

The investigation of groundwater quality was based on two approaches: (i) hydrochemical characterization using major elements and their relationships and (ii) statistical modeling with MLR for simulating and predicting TDS using physicochemical parameters (pH, EC, and TDS); the major ion concentrations [HCO3], [Cl], [SO4], [Ca], [Mg], [Na], and [K]; and nitrates. Calcite precipitation and gypsum, anhydrite, and halite dissolution are the main reactions driving the chemical evolution in the Nadhour-Saouaf aquifer system, in addition to direct and reverse ion exchange with clay minerals. For the MLR approach, the standard protocols for MLR modeling, as well as all the pertinent and influential input variables, were used. The performance of the MLR models developed for the three aquifer levels was assessed both quantitatively and qualitatively using appropriate statistical and graphical indicators. Analysis of the results indicated that the developed TDS models provide accurate prediction of TDS, with high values of R² and low values of MAE at a 95% confidence level. The fitted MLR models are reasonably good for all aquifer levels, as shown by their low residuals. The MLR technique has important practical advantages: its implementation is much easier and less time-consuming than that of other predictor methods. Consequently, the MLR technique can serve as an alternative and cost-effective tool for groundwater-quality prediction. The methodology presented in this study is useful in groundwater management and protection and may be easily applied in other regions.