Introduction

Groundwater is a major source of drinking water among the Earth's natural resources. Water demand for domestic, industrial, and agricultural activities keeps increasing in the Sidi Bouzid region, an arid region located in central Tunisia. According to Castellazzi and Schmid (2021), if groundwater abstraction exceeds groundwater recharge over extensive areas, groundwater levels decline significantly, resulting in groundwater depletion. Water shortages due to the excessive use and continued mismanagement of water resources have long been a critical constraint on the region's future development. Therefore, early and responsible groundwater resource management is essential to preserve the quality of this natural resource. These problems have increased interest in determining the key factors that make groundwater resources vulnerable to anthropogenic pressures. The term “vulnerability” refers to a state of change where there is a risk or possibility of any harmful effect on society (Pal et al. 2019).

The concept of groundwater vulnerability was proposed decades ago, and vulnerability maps are critical references in decision-making for land-use planning, groundwater management, monitoring, and remediation (Wang et al. 2012). The notion of groundwater vulnerability was initially introduced by J. Margat in 1960 (Albinet and Margat 1970), who stated that aquifer vulnerability is “the possibility of percolation and diffusion of contaminants from the ground surface into natural water-table reservoirs under natural conditions.” However, this definition has gradually changed; for example, Hrkal (2001) describes groundwater vulnerability as the tendency and the probability that a general contaminant reaches the water table after introduction at the ground surface.

There are two types of groundwater vulnerability: intrinsic and specific. Intrinsic vulnerability can be described as the “relative inability” of the soil-rock-groundwater system to protect its water against contamination (Zaporozec 2002). Specific vulnerability considers, in addition to the components of intrinsic vulnerability, the properties of the contaminant and their role in groundwater modification (Gogu and Dassargues 2000).

Different approaches have been proposed to assess groundwater vulnerability (Javadi et al. 2011a and b; Allouche et al. 2017; Ayed et al. 2017 and 2018; Nasri et al. 2021). These methods are grouped into three categories: process-based simulation models, statistical models, and overlay and index models (Dixon 2004).

The index-overlay methods, such as the DRASTIC method, use a numerical rating system and are based on the selection of several parameters to assess intrinsic vulnerability with a geographical information system (GIS), which provides effective tools to overlay and integrate multiple maps (Vu et al. 2019). The DRASTIC method is one of the simplest and most widely used of these methods. It considers seven hydrogeological parameters: depth to groundwater, net recharge, aquifer media, soil media, topography, impact of the vadose zone, and hydraulic conductivity. However, the choice of this method depends to a great extent on the available data and on the type of aquifer being studied, and it must take into account the hydrological, geological, and hydrogeological characteristics of the study area. The plain of Sidi Bouzid, situated in the center of Tunisia, is composed almost entirely of agricultural land, and the shallow aquifer has been exploited to satisfy agricultural needs. For this reason, the DRASTIC method is considered the most suitable for the study region. This method has frequently been modified using nitrate concentrations, so that its applications can be classified into the standard method and the specific method.

The use of process-based models, incorporating appropriate scaling procedures, provides an adequate approach in which a mathematical model forms the key tool. The application of these methods on a regional scale is more difficult.

Statistical methods are an effective tool to determine which factors have the highest influence on groundwater vulnerability. Moreover, they allow identifying the likely important sources of contamination and assessing the relative probability of contamination occurrence (i.e., groundwater vulnerability), considering the simultaneous presence, or absence, of these factors (Stevenazzi et al. 2017). Such methods, like factorial analysis of correspondence “FCA” and multiple linear regression “MLR,” can classify each parameter that potentially influences the probability of pollution of the aquifer and lead to a score which designates the vulnerability of the groundwater (Focazio et al. 2002; Panagopoulos et al. 2006). Usually, these methods aim to use the fewest predictors to explain the greatest variability in the response variable. Stepwise approaches are used to select the most relevant predictors in regression models (Ouedraogo 2017).

In the current study, we applied the MLR approach to select the predictors. The rationale of this selection method is to combine the measure of fit with a penalty term based on the number of parameters used in the model. If more parameters (i.e., the number of trends or explanatory variables) are used, the model fit can be better, but the penalty for the extra parameters is higher as well (Boy-Roura et al. 2013; Brindha and Elango 2015; Ouedraogo 2017).

Concerning the specific vulnerability assessment to pollution by nitrate, different researchers have made a modification to the original DRASTIC method using nitrate concentrations in order to provide a reliable groundwater vulnerability model (Rupert 2001; McLay et al. 2001; Panagopoulos et al. 2006; Antonakos and Lambrakis 2007; Huan et al. 2012; Javadi et al. 2011a, b; Neshat et al. 2013; Neshat et al. 2014; Sinha et al. 2016; Jmal et al. 2017; Roohollah et al. 2019; Ncibi et al. 2020; Nasri et al. 2021).

Nitrate, originating mainly from agricultural activities, has been considered an indicator of pollution in the Sidi Bouzid shallow aquifer. This anion results from the leaching of nitrogen used to fertilize agricultural land. Therefore, it is necessary to estimate whether certain locations in this aquifer are more susceptible to receiving and transmitting pollution.

The objective of this study is to assess the most vulnerable areas to nitrate pollution of the Sidi Bouzid region using a comparative study between the standard and the modified DRASTIC methods.

Study area

The Sidi Bouzid shallow aquifer, the subject of this study, covers around 580 km2 and is located in the western part of Central Tunisia (Fig. 1). The study area is classified among arid regions, with an average annual temperature and rainfall of about 19.7 °C and 228 mm, respectively, for the period 1975 to 2018 (NIM 2018). This region is composed almost entirely of agricultural land, and the Sidi Bouzid shallow aquifer has been exploited to satisfy agricultural needs. The development of agricultural activities relies on the intensive use of chemical fertilizers, which leads to groundwater contamination. Nitrate is considered the most widespread chemical contaminant in the Sidi Bouzid shallow aquifer and was selected as an indicator of the impact of urban and agricultural lands. The highest nitrate contents (reaching 80 mg/l) were recorded in the North and the East of the study area, which are characterized by irrigated croplands and a shallow water table (Jmal et al. 2017). Figure 2 represents the spatial distribution of nitrate content in the study area.

Fig. 1 Localization of the study area

Fig. 2 Spatial distribution of nitrate in Sidi Bouzid shallow aquifer (Jmal et al. 2017)

The plain of Sidi Bouzid is relatively flat and covered essentially by the mixing of sand, clay, and gravel intercalations of the Pleistocene age and recent alluvial deposits. The study area boundaries are covered by the marl of Triassic, the clay and gypsum of Cretaceous, and the clay and sand of Neogene and Paleogene (Fig. 3). The Sidi Bouzid shallow aquifer is formed by detrital deposits of the Mio-Plio-Quaternary layer system (Smida 2008).

Fig. 3 Geological and piezometric map of the study area (Jmal et al. 2017)

The piezometric study was based on 27 static water level measurements made in 2014. The piezometric map of the Sidi Bouzid shallow aquifer shows head values ranging from 290 to 365 m and a general flow direction from the South to the North. It also shows hydrodynamic convergence areas, located especially in the North and the Southwest parts of the study area (Fig. 3), which can be explained by the excessive exploitation of the aquifer in these irrigated zones.

The study area comprises two of the 11 hydrogeological basins in the entire region of Sidi Bouzid: the El Fakka Wadi and Serg Edhiba Wadi basins. The extent of these basins coincides with the study area boundaries.

Material and method

Dataset

To evaluate the groundwater vulnerability to pollution and to develop a vulnerability map, a detailed study was conducted in the Sidi Bouzid region by collecting available data (Table 1). All these data were processed in ArcMap 10.1, which was used for creating, viewing, querying, editing, composing, and publishing maps.

Table 1 Data sources of DRASTIC parameters

DRASTIC method

Groundwater vulnerability was assessed using the DRASTIC model, which was created by the US Environmental Protection Agency to evaluate groundwater pollution potential (Rahman 2008). Its acronym derives from the initials of its seven parameters: Depth to water table, Recharge to aquifer, Aquifer media, Soil media, Topography (slope), Impact of the vadose zone media (vadose media), and hydraulic Conductivity of the aquifer (Aller et al. 1987). Each of the seven DRASTIC parameters is mapped and classified either into ranges or into significant media types, and a numerical rating between 1 (least pollution potential) and 10 (highest pollution potential) is assigned to each class. The rating represents the degree of susceptibility to groundwater pollution. Because some hydrogeological parameters are more important than others in determining aquifer vulnerability, each DRASTIC rating is multiplied by a weighting factor ranging from 1 to 5. The DRASTIC index is then computed as a linear combination of all factors according to the following equation:

$$V(\mathrm{intrinsic}) = D_R \times D_W + R_R \times R_W + A_R \times A_W + S_R \times S_W + T_R \times T_W + I_R \times I_W + C_R \times C_W$$
(1)

where V (intrinsic) is the DRASTIC index for intrinsic vulnerability.

“D” is the groundwater depth parameter

It is one of the most important factors because it determines the depth of material through which a contaminant must travel before reaching the aquifer. The depth to groundwater was measured in 71 surface wells recorded during March 2014. The depth to water table map is obtained using the Kriging interpolation extension of the Arc Toolbox “spatial analyst tools.”

“R” is the net recharge parameter

It represents the portion of rainfall, irrigation return flow, and water from absorption wells that infiltrates and migrates down to the groundwater. We used the water table fluctuation (WTF) method to determine the recharge values. This method is applicable to shallow aquifers that show large rises and falls in water level (Sophocleous 1991). It links the variation in groundwater storage to the water table fluctuations:

$$R_{(t_j)} = S_y \times \mathit{DH}_{(t_j)}$$
(2)

where R(tj) (mm) is the recharge occurring between the times t0 and tj, Sy is the specific yield (storage) or the effective porosity of the unconfined aquifer, and DH(tj) is the peak water level rise attributed to the recharge period.
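As a minimal sketch of Eq. (2), the function below multiplies a specific yield by a peak water level rise. The function name, the choice to pass the rise in metres, and the numerical values are assumptions made for illustration; they are not taken from the study.

```python
def wtf_recharge_mm(specific_yield, water_level_rise_m):
    """Recharge (mm) = Sy * DH, with DH converted from metres to millimetres."""
    if not 0.0 < specific_yield < 1.0:
        raise ValueError("specific yield must lie between 0 and 1")
    if water_level_rise_m < 0.0:
        raise ValueError("the peak water level rise must be non-negative")
    return specific_yield * water_level_rise_m * 1000.0


# Hypothetical values: Sy = 0.05 and a 0.2 m peak rise over the recharge period.
recharge = wtf_recharge_mm(0.05, 0.2)
print(f"recharge = {recharge:.1f} mm")
```

The unit conversion is kept explicit because water level rises are usually logged in metres while recharge is reported in millimetres.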

“A” is the aquifer media parameter

The aquifer permeability was determined based on the drilling logs of the saturated zone for each well using the horizontal permeability equation (Castany 1982). The aquifer media describes the rock types (consolidated and unconsolidated) (Aller et al. 1987).

“S” is the soil media parameter

It controls the rate at which recharge can reach the groundwater and the soil's ability to retain pollutants moving into the vadose zone. It was obtained by digitizing the existing soil maps of the study area, with a scale of 1:50,000. These maps were collected from the Regional Agency of Agriculture Laboratory (CRDA).

“T” is the topography parameter

It refers to the slope of the land surface. The surface slope was derived from the 1:50,000 scale topography maps covering the study area.

“I” is the impact of the vadose zone parameter

It influences the pollution potential of the aquifer. It depends on the permeability and the attenuation characteristics of the soil cover. The impact of the vadose zone was determined based on the drilling logs and using the vertical permeability equation (Castany 1982).

“C” is the hydraulic conductivity parameter

The hydraulic conductivity value is controlled by the aquifer properties. It determines the ability of the aquifer formation to transmit water. The hydraulic conductivity values were obtained from the aquifer thickness and the transmissivity measured in field pumping tests in the Sidi Bouzid region.
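The relation implied here is the standard definition of transmissivity for an aquifer of a given saturated thickness. The symbols K, T, and b are notation assumed for this sketch rather than taken from the original text:

$$K = \frac{T}{b}$$

where K is the hydraulic conductivity (m/s), T is the transmissivity (m2/s), and b is the aquifer thickness (m).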

The subscripts R and W are the corresponding ratings and weights, respectively. Computed maps result from the multiplication between the selected rating and the assigned weight to each parameter.

Once the DRASTIC Index has been computed, it is possible to identify areas which are more likely to be susceptible to groundwater contamination. Values of DRASTIC index for intrinsic vulnerability may range between 23 and 226 (lowest and highest vulnerabilities, respectively).
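The linear combination in Eq. (1) can be sketched for a single map cell as follows. The weights used are the standard DRASTIC weights as commonly cited from Aller et al. (1987), and they are assumed here to match Table 2. The ratings of the example cell are hypothetical.

```python
# Standard DRASTIC weights as commonly cited from Aller et al. (1987);
# assumed here to correspond to Table 2 of this study.
STANDARD_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings, weights=STANDARD_WEIGHTS):
    """Return the sum of rating * weight over the seven DRASTIC parameters."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(ratings[p] * weights[p] for p in weights)


# Hypothetical ratings for one raster cell (not taken from the study).
cell = {"D": 5, "R": 3, "A": 6, "S": 9, "T": 10, "I": 6, "C": 6}
print(drastic_index(cell))  # 131
```

With every rating set to 1, the function returns 23, the lower bound quoted above. In a GIS workflow the same sum is applied cell by cell to the seven rated rasters.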

Nitrate measurements

The DRASTIC model was calibrated using the nitrate concentration. Thirty-eight surface wells were selected for sampling and analysis, with two nitrate samples taken from each well. The first sampling was carried out in March 2013 to determine the correlation coefficient between nitrate concentration and groundwater vulnerability, whereas the second sampling was conducted in March 2014 to calibrate the model.

Modified DRASTIC method

Aiming for a better prediction, a modification of the DRASTIC method has been elaborated using the primary control parameter (Nitrate) according to Panagopoulos et al. (2006). To use nitrate to calibrate the model, the following conditions should be satisfied (Javadi et al. 2011b):

  • the nitrate originates from agricultural activities;

  • the area distribution should be homogeneous;

  • the nitrate leaching results from surface recharge over a long period of time, so that the correlation between contamination and human activities can be established.

In the modified DRASTIC method, we applied simple statistical procedures to consider the modification of the rating scales and the factor weights of each parameter, and the addition of the land use parameter. This modification is based on their correlation to nitrates concentration.

Statistical approaches

Chemometric approaches, in particular correspondence analysis (CA) and quality of representation, were carried out using the ITCF statistical software package (STATIT-CF 1987) to evaluate the associations among the investigated variables in the samples and to recognize the pollution sources.

Correspondence analysis was applied to identify the sources and the processes that may describe the vulnerability of groundwater. Eigenvector techniques such as CA convert the original variables into a new set of variables (vectors) so that the main portion of the data variation is concentrated on just two or three of them. This reduced set of vectors can then be plotted on a single diagram and grouped into clusters that are interpreted in terms of some physical or chemical process. Fundamentally, the vectors are a new set of orthogonal axes characterized geometrically by the eigenvectors and eigenvalues: (1) the elements of the eigenvectors are the cosines of the angles between the original and transformed axes; (2) the eigenvalues measure the spread of the data in the direction of each vector (Pacheco 1998).

The representation quality of the parameters was established by a coefficient giving the maximum variance in the F1 × F2 plane, computed as the sum of the squares of the correlation coefficients between each parameter and the factorial axes of the considered plane (Dutot et al. 1983; Azri et al. 2009). It is defined by the following equation:

$$\mathrm{QLT}(j) = \sum_{i=1}^{n} r_{i}^{2}(j)$$

where QLT (j): the representation quality of the parameter j (‰); ri: the correlation coefficient between the parameter j and the factorial axis i; n: the number of considered factorial axes.
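The representation quality defined above, read as a sum over the n retained factorial axes, can be sketched as follows. The function name and the two correlation values are hypothetical. The result is returned as a fraction, so reporting it in per mille as in the text would presumably require multiplying by 1000.

```python
def representation_quality(correlations):
    """Sum of squared correlations between one parameter and each retained axis."""
    for r in correlations:
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlation coefficients must lie in [-1, 1]")
    return sum(r * r for r in correlations)


# Hypothetical correlations of one parameter with the axes F1 and F2.
quality = representation_quality([0.8, -0.5])
print(f"QLT = {quality:.2f}")
```

A value close to 1 would indicate that the parameter is well represented in the considered factorial plane.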

MLR is one of the most flexible statistical tools available to the researcher. It allows the modeling of multiple influences on an outcome and the understanding of the relationship between a dependent variable and one or more independent variables (Mustapha and Abdu 2012; Pathak 2012; Basu and Lokesh 2014).

The rationale of this selection method is to combine the measure of fit with a penalty term based on the number of parameters used in the model. If more parameters (i.e., the number of trends or explanatory variables) are used, the model fit can be better, but the penalty for the extra parameters is higher as well.

In the present study, the MLR approach is used to describe the relationship between the nitrate concentrations and the six other variables which could affect groundwater quality and, accordingly, to assess the groundwater vulnerability to nitrate pollution in arid and semi-arid regions like the Sidi Bouzid shallow aquifer.

According to these researchers, MLR can be expressed using the following equation:

$$Y= {\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\dots +{\beta }_{m}{X}_{m}+\varepsilon$$
(3)

where Y represents the dependent variable; X1,⋅⋅⋅, Xm represent the independent variables (predictor variables); β0,⋅⋅⋅, βm represent the regression coefficients; ε represents the random error.
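To make Eq. (3) concrete, the sketch below estimates the regression coefficients by ordinary least squares, solving the normal equations with Gauss-Jordan elimination. It is a plain-Python illustration rather than the SPSS procedure used in the study; the function name and the noise-free data set are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bm*xm.

    X is a list of rows (one per observation) and y a list of outcomes.
    Returns [b0, b1, ..., bm] by solving (A^T A) b = A^T y.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 without noise.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefficients = fit_mlr(X, y)
print([round(b, 6) for b in coefficients])
```

Because the example data contain no random error, the fit recovers the generating coefficients; with field data the residual term ε would absorb the unexplained variation.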

From the above model, we can determine the coefficient of determination (R2, or R-squared), a statistical metric that measures how much of the variation in the outcome can be explained by the variation in the independent variables. In other words, R2 gives the percentage of variability in the dependent variable accounted for by all the independent variables together. R2 never decreases as more predictors are added to the MLR model, even when the predictors are not related to the outcome values.

For this reason, R2 by itself cannot be used to identify which predictors should be included in a model and which should be excluded. R2 lies between 0 and 1, where 0 indicates that the outcome cannot be predicted by any of the selected independent variables and 1 indicates that the outcome can be predicted without error from the independent variables. A value of 1.0 thus indicates a perfect fit to the data, while a value of 0.0 indicates that the model fails to explain the data at all. A value of 0.20, for example, suggests that 20% of the variance in the dependent variable is explained by the independent variables, while a value of 0.50 suggests that 50% is explained, and so forth.

R2 measures the explanatory power of the model, which in turn reflects the goodness of fit of the model. It reflects the model adequacy in the sense of how much is the explanatory power of the explanatory variables.

The R2 value is given as output by the software which will be used.
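As an illustration of how such a value is obtained, the sketch below computes R2 from observed and predicted values. It also computes the adjusted R2, a common way of penalizing extra predictors that is shown here as an assumption, since the text does not state which penalized criterion the software applied. All names and numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """R2 penalized for the number of predictors in the model."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("not enough samples for this many predictors")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical observed and predicted values for four samples.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
print(f"R2 = {r2:.2f}, adjusted R2 = {adjusted_r_squared(r2, 4, 1):.2f}")
```

Unlike R2, the adjusted value can decrease when a predictor that adds little explanatory power is included.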

The most commonly used procedure for selecting the best regression equation is stepwise linear regression analysis (using an F probability of 0.05 for the selected factor), as described by Landau and Everitt (2003) and performed using SPSS.

Results and discussion

Application of the standard DRASTIC model

The major drawback of the DRASTIC method is the subjectivity of the rating scale and of the weighting coefficients assigned to the several factors (Table 2). The seven parameter maps of the DRASTIC method were produced in raster format with GIS tools (Fig. 4).

Table 2 Assigned weights for DRASTIC features (Aller et al. 1986)

Fig. 4 Seven layers of standard DRASTIC model: 1 groundwater depth; 2 recharge; 3 aquifer media; 4 soil media; 5 topography; 6 impact of vadose zone; 7 hydraulic conductivity

The depth to groundwater shows values ranging from 10 to 61.5 m which were divided into six intervals. According to this parameter, the East of the Sidi Bouzid shallow aquifer is considered the most vulnerable zone with a low water depth. The computed depth to groundwater map (Fig. 4.1) corresponds to the first component of DRASTIC model.

The resulting map of the net recharge parameter, which was calculated based on the WTF method, reveals three ranges of net recharge: 2.9–5 mm/year, 5–10 mm/year, and 10–17 mm/year. The recharge map shows that the high values of aquifer recharge are recorded in the upstream of the Sidi Bouzid shallow aquifer. This can be explained by the water supply from the El Fakka Wadi during floods. The net recharge map corresponds to the second component of DRASTIC model (Fig. 4.2).

The aquifer rock is composed of clayey sand, sand, and sand and gravel, which occupies almost all of the study area (Fig. 4.3).

The soil media map consists of mineral soil, rendzina and hydromorphic soil, calcareous brown soil, gypsum soil, soil with little evolution, isohumic soil, halomorphic soil, complex unit soil, and urban zones. The highest rate is attributed to the mineral soil (9), and the lowest rate is attributed to the urban zones (1). The soil media map represents the fourth component of DRASTIC model (Fig. 4.4).

In our case, the topography was divided into six classes (0–3%), (3–5%), (5–10%), (10–15%), (15–25%) and > 25%. Most of the study area is characterized by a slope oscillating from 0 to 3%. The low slope observed in the majority of the study area tends to retain water in the ground surface and favors the infiltration or recharge of water through the vadose zone leading to a great potential for contaminant penetration. The topography map corresponds to the fifth component of the DRASTIC model (Fig. 4.5).

The vadose zone is mainly constituted of sandy clay, clayey sand, fine sand, medium to coarse sand, and gravel with high permeability. The corresponding rates system ranges between 1 and 10. The impact of the vadose zone map characterizes the sixth component of the DRASTIC model (Fig. 4.6).

The hydraulic conductivity values vary between 2.6 × 10−5 and 9.9 × 10−5 m/s and were divided into three ranges. Ratings of 6, 7, and 8 were attributed to this parameter. The hydraulic conductivity map corresponds to the seventh component of the DRASTIC model (Fig. 4.7).

The groundwater vulnerability map in the study area results from the sum of the different obtained components using GIS tools. It illustrates the presence of three vulnerability classes (Fig. 5 and Table 3):

  • low vulnerability (index 74–100), which occupies a very small area (3.5% of the study area). This result can be related to a large groundwater depth (> 33 m) and a low permeability of the vadose zone (2 × 10−7 to 4 × 10−4 m/s);

  • moderate vulnerability (index 100–139), spatially scattered, which covers most of the study area (71%). Hydrogeological protection seems to be unspecified;

  • high vulnerability (index 139–163), which covers 25.5% of the study area. It is located mainly in the East and the Northeast and particularly in the center of the study area. It characterizes the shallow areas (10–18 m) and areas with high permeability of the vadose zone and aquifer. Indeed, high vulnerability classes coincide with the Sebkha and urban zone, where industrial activities are developed.

Fig. 5 Distribution of the standard DRASTIC vulnerability and nitrates concentrations for the study area

Table 3 Classes of groundwater vulnerability in Sidi Bouzid area

The corresponding vulnerability map represents a decision support tool for future land uses. For more reliable results, the map must be validated, once established, using a parameter that is not part of the vulnerability equation and does not act directly on the groundwater vulnerability to pollution.

Validation of the DRASTIC model using nitrate concentration

An unvalidated model can produce biased and erroneous results in assessing an environmental system. Thus, the validation process is necessary to avoid wrong conclusions (Kura et al. 2015; Saidi et al. 2011). Nitrate concentration was used to validate the DRASTIC vulnerability model in the Sidi Bouzid shallow aquifer.

Thirty-eight samples taken in March 2013 were plotted on the DRASTIC map; the nitrate concentration of each sample was extracted and divided into five categories according to DRASTIC classes (Fig. 6). The highest nitrate concentrations (79 and 73 mg/l) are located in the high vulnerability class, in the agricultural lands near the urban areas, industrial areas, and sebkha. However, the low nitrate concentrations occur in areas of low and moderate vulnerability, in groundwater around the Kebar and Hamra mountains.

Fig. 6 Correlation between the DRASTIC original vulnerability index and groundwater nitrates concentration for the study area

Furthermore, nitrate concentration values and the vulnerability index were correlated to evaluate the original DRASTIC method using Pearson’s correlation coefficient (r). The correlation is low, on the order of 44% (Fig. 6), with a P-value equal to 0.005 (Table 4). Since the P-value is less than 0.01, the correlation is statistically significant at the 99% confidence level. Half of the nitrate concentration values increase with the vulnerability index. However, low nitrate concentration values do not always correspond to a low vulnerability index. This means that the intrinsic vulnerability indexes are only weakly correlated with the nitrate concentration values. Following the same approach, some authors, such as Panagopoulos et al. (2006) and Neshat et al. (2014), have used the correlation between nitrate concentrations and the vulnerability index to validate their vulnerability model. They found correlation coefficients comparable to the coefficient found in our case study.
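The validation step described above can be sketched in plain Python as follows. The helper names are hypothetical, and the t statistic uses the standard relation t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom, which is assumed to be the test behind the reported P-value. The paired values are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical vulnerability indexes and nitrate concentrations (mg/l).
index = [90, 110, 120, 140, 150]
nitrate = [10, 30, 20, 45, 60]
print(f"r = {pearson_r(index, nitrate):.2f}")

# t statistic for the reported r = 0.44 with the 38 samples of the study.
print(f"t = {correlation_t_statistic(0.44, 38):.2f}")
```

For r = 0.44 and n = 38 the t statistic is about 2.9, which is in line with a two-sided P-value below 0.01.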

Table 4 Correlation between DRASTIC vulnerability index and nitrate concentrations in groundwater in the study region

To obtain a better correlation, reflecting the actual groundwater vulnerability and a realistic assessment of the potential pollution in the study area, a modification of the original DRASTIC model using nitrate is required.

Application of the modified DRASTIC model in the study area

Modification of the rating scales of each parameter

The modified DRASTIC model was proposed as an effective tool to obtain more meaningful results for the study area. Since agricultural land occupies most of the Sidi Bouzid plain, the nitrate concentration was used to improve the vulnerability map. The rating scale for each parameter in the original model was modified using the median of every class and the corresponding median nitrate concentration. This modification is based on the Wilcoxon rank-sum nonparametric statistical test (Wilcoxon 1945), which was used to study the statistical significance of the differences between classes according to the distribution of nitrate concentrations in each class. The test yields a P-value, whose significance is judged by comparing it to a previously defined threshold (traditionally 5%). For the continuous parameters (depth to groundwater, recharge to aquifer, topography, and hydraulic conductivity), neighboring classes may be combined when their medians do not differ statistically. For the groundwater depth parameter, four classes (11–14 m, 14–18 m, 18–25 m, and 25–33 m) were combined because they do not differ significantly (P-value > 0.05) (Fig. 7). The nitrate concentrations differ significantly between the “25–33 m” and “33–61.5 m” groundwater depth classes, whereas the differences between the other neighboring classes are not significant (the classes colored in red in the 1st row) (Fig. 8). When two neighboring classes show nonsignificant results, they are combined, and the combined class is tested against the next neighboring class, and so forth. The final result of the Wilcoxon test applied to the groundwater depth parameter is a grouping into three classes: “8–11 m,” “11–33 m,” and “33–61.5 m” (Fig. 8).
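The class-merging procedure can be sketched as follows. This is a simplified left-to-right pass that uses the normal approximation of the rank-sum test without tie correction, which is crude for small classes and is not necessarily the exact variant applied in the study. The class labels and nitrate values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p_value(a, b):
    """Two-sided rank-sum P-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs((w - mean) / sd) / math.sqrt(2))


def merge_neighbouring_classes(classes, alpha=0.05):
    """Merge adjacent classes whose nitrate distributions do not differ."""
    merged = [(label, list(values)) for label, values in classes]
    i = 0
    while i < len(merged) - 1:
        (la, va), (lb, vb) = merged[i], merged[i + 1]
        if rank_sum_p_value(va, vb) > alpha:
            merged[i:i + 2] = [(f"{la}+{lb}", va + vb)]  # retest merged class
        else:
            i += 1
    return merged


# Hypothetical nitrate concentrations (mg/l) in three neighbouring classes.
classes = [("A", [5, 6, 7, 8, 9, 10]),
           ("B", [6, 7, 8, 9, 10, 11]),
           ("C", [40, 45, 50, 55, 60, 65])]
print([label for label, _ in merge_neighbouring_classes(classes)])
```

In this example the first two classes are merged because their distributions overlap, while the third class remains separate.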

Fig. 7 Application of the Wilcoxon nonparametric test to the groundwater depth parameter

Fig. 8 Groundwater depth of the modified DRASTIC model

For the noncontinuous parameters (aquifer media, vadose zone media, and soil media), all the media present in the zone were kept regardless of statistical differences (Rupert 2001). The box plots of the seven DRASTIC parameters are presented in Fig. 9. They show the distributions of nitrate concentrations in the different parameter classes. The box plots are the graphical representation of the statistical analysis applied to the different parameters to assess the importance of each class and to determine the new rating through the median nitrate concentration calculated for each class. A box plot is designed to display the location, spread, skewness, and tails of the data. It is considered more reliable for symmetric data and thus less appropriate for skewed data, such as data with extreme values. The box plots thus show the distribution of nitrate concentrations in each parameter class and the location of the highest concentrations exceeding the threshold of 24.5 mg/l.

Fig. 9 Box plots showing the distribution of groundwater nitrates concentration for the statistically different classes of all the DRASTIC parameters

The rating system of each parameter was performed according to previous studies (Panagopoulos et al. 2006; Antonakos and Lambrakis 2007; Huan et al. 2012; Javadi et al. 2011b; Neshat et al. 2014; Kazakis and Voudouris, 2015) and the following equation on a scale of 10:

$$r = \frac{C_{(\mathrm{NO}_3^-)}}{C_{(\mathrm{NO}_3^-)\,\max}} \times 10$$
(4)

where r is the value of the rating, C(NO3) is the median nitrate concentration in the classes of corresponding parameter, and C(NO3) Max is the highest median nitrate concentration in the corresponding parameter.
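A minimal sketch of Eq. (4) follows: the median nitrate concentration of each class is divided by the highest class median and scaled to 10. The function name and the concentrations are hypothetical; only the class labels echo the depth classes mentioned in the text.

```python
import statistics


def modified_ratings(nitrate_by_class):
    """Rate each class as (class median / highest class median) * 10."""
    medians = {label: statistics.median(values)
               for label, values in nitrate_by_class.items()}
    highest = max(medians.values())
    if highest <= 0:
        raise ValueError("the highest median concentration must be positive")
    return {label: 10.0 * m / highest for label, m in medians.items()}


# Hypothetical nitrate concentrations (mg/l) per groundwater depth class.
samples = {
    "8-11 m": [30.0, 40.0, 50.0],
    "11-33 m": [10.0, 20.0, 30.0],
    "33-61.5 m": [5.0, 10.0, 15.0],
}
print(modified_ratings(samples))
```

By construction, the class with the highest median nitrate concentration always receives a rating of 10.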

The ratings obtained from this equation, corresponding to the ratings from the modified DRASTIC model, are illustrated in Fig. 4 and Table 5.

Table 5 Modified ratings of the seven DRASTIC factors

Modification of the factor weights

The weighting factors of all the DRASTIC parameters defined in Eq. (1) were modified using the median nitrate concentration for the thirty-eight sampling points of the study area. The discrete (nominal) parameters were correlated with the nitrate concentrations using their factor scores rather than their natural range values, because the factor scores vary on an interval scale (Panagopoulos et al. 2006). The correlation was calculated using Spearman’s (ρ) and Kendall’s (τ) correlation coefficients (Kendall 1975), which are recommended for this type of parameter. The new weighting factors were derived from these coefficients after their values were rescaled to a maximum value of 5, as in the DRASTIC model (Panagopoulos et al. 2006). When the coefficients of a parameter are not statistically significant, the parameter is excluded from the vulnerability equation.
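A rank correlation of this kind can be sketched in plain Python as follows. This is a minimal illustration of Spearman’s ρ (Pearson correlation of average ranks), not the software used in the study; the function names and the sample values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical factor scores of one parameter and nitrate concentrations (mg/l)
scores = [2, 4, 4, 6, 8, 10]
nitrate = [12.0, 18.5, 16.0, 30.0, 41.0, 55.0]
rho = spearman_rho(scores, nitrate)
```

The significance test of the coefficient is not covered here; in practice a statistical package would report the coefficient together with its P-value.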

Among the DRASTIC parameters, groundwater depth records the highest Spearman’s and Kendall’s correlation coefficients, 0.724 and 0.594, respectively (Table 6). The relationship between groundwater depth and nitrate concentrations is therefore strong and statistically significant at the 99% confidence level, which justifies the importance of this parameter for groundwater vulnerability to pollution.

Table 6 Modified weights of the DRASTIC factors and correlation coefficients between DRASTIC factors and nitrates concentration

The recharge and soil media parameters show the lowest values (Table 6), with critical probability values (P-values) of about 0.12 and 0.15, respectively. They are therefore excluded from the vulnerability equation because their Spearman’s and Kendall’s coefficients are not statistically significant. The same conclusion was reported by other researchers (Panagopoulos et al. 2006; Antonakos and Lambrakis 2007).

The other parameters are statistically significant at the 95% confidence level, with the exception of the topography parameter, which yields no correlation result. The weighting of each parameter was recalculated on a scale of 5 using the following equation:

$$w = \frac{T}{T_{\mathrm{Max}}} \times 5$$
(5)

where w is the modified weighting of each parameter, T is the correlation value of each parameter with nitrate concentrations, and TMax is the highest correlation value of all parameters with nitrate concentrations. The parameters, the original weighting, the correlation coefficient values, and the modified weighting factors are indicated in Table 6.
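Equation (5) can be illustrated with a short Python sketch. Only the value 0.724 for groundwater depth is quoted in the text; the other coefficients below are placeholders rather than values from Table 6, and the function name `modified_weights` is hypothetical. The sketch assumes that only statistically significant, positive coefficients are passed in.

```python
def modified_weights(correlations, max_weight=5.0):
    """Apply Eq. (5): w = (T / T_Max) * 5 for each retained parameter."""
    t_max = max(correlations.values())
    return {name: max_weight * t / t_max for name, t in correlations.items()}


# 0.724 (D) is quoted in the text; the values for A and I are placeholders
rho = {"D": 0.724, "A": 0.500, "I": 0.362}
weights = modified_weights(rho)
```

The parameter with the highest correlation receives the maximum weight of 5, and the others are reduced in proportion to their coefficients.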

The weights obtained from this equation will be used for the evaluation of the modified DRASTIC model. In fact, the weighting factors are comparable to the standard DRASTIC method with the exception of the “impact of the vadose zone” parameter, which has been weakened although it remains relatively high.

Addition of the land use parameter

Land use was used as an indicator of pollutant loading in the groundwater vulnerability assessment. It has a major impact on the nitrate concentrations of the groundwater. To obtain a more objective vulnerability assessment and to evaluate groundwater pollution risk, land use type should, in certain cases, be combined with the groundwater vulnerability indexes.

The initial land use distribution was rated according to Secunda et al. (1998) and includes seven land use classes (Fig. 10 and Table 7). An initial weight of five was assigned to the land use parameter because of the potential impact of this factor on groundwater nitrate concentrations.

Fig. 10 The land use map with the original rating of the study area

Table 7 Original and modified land use ratings

The rating of each land-use type was modified by revising the rating scale of the initial land use parameter using the median nitrate concentration corresponding to each land-use type, in the same way as for the rating scales of the DRASTIC parameters (Panagopoulos et al. 2006).

The resulting ratings of the modified land use parameter are given in Table 7. The box plot for each land use category is shown in Fig. 11. It shows different distributions for the three land-use categories, with asymmetric boxes. Irrigated field cropland presents the highest nitrate concentration values (median equal to 51) (Fig. 11), owing to the intensive use of chemical fertilizer for this crop type. Olive land has low nitrate concentration values (median of about 15.5).

Fig. 11 Box plot showing the distribution of groundwater nitrate concentrations for the land use parameter

The land use parameter records the highest Spearman’s and Kendall’s correlation coefficients, about 0.766 and 0.620, respectively, both statistically significant at the 99% confidence level (Table 6). Land use thus shows a stronger relationship with groundwater nitrate concentrations than the other parameters and is very important in assessing groundwater vulnerability to pollution.

The land use parameter, the original weighting and the correlation coefficient values are indicated in Table 6.

The modified DRASTIC map was calculated using the new rating and weighting system (Fig. 12). The modified DRASTIC model was classified into three classes of vulnerability:

  • low vulnerability class (48.6–100), which occupies 37% of the study area. This proportion is higher than in the standard DRASTIC model (Fig. 13), which can be explained by the presence of low nitrate concentration values;

  • moderate vulnerability class (100–139), which covers 23% of the study area. This proportion is lower than in the standard DRASTIC model, owing to the scarcity of intermediate nitrate values;

  • high vulnerability class (139–193), which represents 40% of the study area and mainly covers the urban area and the agricultural lands.
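The index computation and the three-class split can be sketched in Python as follows. The weighted sum is the usual DRASTIC form; the treatment of the class boundaries (a value of exactly 100 or 139 assigned to the upper class) is an assumption, since the text does not state it, and the ratings and weights of the example cell are hypothetical.

```python
def drastic_index(ratings, weights):
    """Weighted sum of ratings over the parameters listed in `weights`."""
    return sum(weights[p] * ratings[p] for p in weights)


def vulnerability_class(index):
    """Map an index to the three classes reported for the modified map.
    Assumption: boundary values 100 and 139 fall in the upper class."""
    if not 48.6 <= index <= 193:
        raise ValueError("index outside the range reported for the modified map")
    if index < 100:
        return "low"
    if index < 139:
        return "moderate"
    return "high"


# Hypothetical ratings and weights for a single grid cell
ratings = {"D": 8.0, "A": 6.0, "I": 5.0, "Lu": 10.0}
weights = {"D": 5.0, "A": 3.0, "I": 3.0, "Lu": 5.0}
index = drastic_index(ratings, weights)  # 40 + 18 + 15 + 50
label = vulnerability_class(index)
```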

Fig. 12 Distribution of the modified DRASTIC vulnerability and nitrate concentrations for the study area

Fig. 13 The percentage of priority between standard DRASTIC and modified DRASTIC

Validation of the modified DRASTIC model using nitrate concentration

The application of the modified DRASTIC method produced a markedly higher correlation coefficient with nitrate concentration, around 82% (Fig. 14). The correlation was statistically significant at the 99% confidence level.
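A validation of this kind amounts to correlating the vulnerability index at each sampling point with the measured nitrate concentration. The following minimal Python sketch computes a Pearson coefficient; the function name `pearson_r` and the paired values are hypothetical and do not reproduce the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical index values and nitrate concentrations (mg/l) at five wells
index_values = [60.0, 95.0, 120.0, 150.0, 180.0]
nitrate = [8.0, 20.0, 18.0, 45.0, 60.0]
r = pearson_r(index_values, nitrate)
```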

Fig. 14 Correlation between the modified DRASTIC vulnerability index and groundwater nitrate concentrations for the study area

The calibration results using the nitrate concentrations indicated that the modified DRASTIC method substantially changes the vulnerability assessment of the study area. These significant results of the vulnerability map confirmed that nitrate is crucial in the calibration method because most of the Sidi Bouzid region is occupied by agricultural lands. The nitrate concentrations in the study area were compared with the maximum permissible limit of 50 mg/l recommended by the CRDA. The nitrate concentration values exceeding this limit are located in the highest vulnerability areas (Fig. 12).

The modified DRASTIC map indicates that the high vulnerability class represents 40% of the study area, compared with 25% before the modification. The moderate vulnerability class covers 71% and 23% of the area before and after modification, respectively, and the low vulnerability class increases from 4% to 37%. These changes reflect the effect of the model modification.

In addition, the map comparison shows the spatial distribution of the index before and after modification (Fig. 13). It indicates that 52% of the results kept the same class and 48% moved to another class, thus confirming the success of the proposed model.

The application of the modified model demonstrated the effect of the modification: the area of higher vulnerability in the modified DRASTIC map is situated around Sidi Bouzid center, which is characterized by the presence of the urban zone, the Sebeka, the major sources of pollution, and the agricultural activities.

Factorial analysis of correspondences

To establish a new vulnerability model based on the modified DRASTIC method, six parameters (groundwater depth, net recharge, aquifer media, soil media, impact of the vadose zone media, and land use) were chosen for the factorial analysis of correspondences (FACs) study.

The topography and hydraulic conductivity parameters were excluded since: (i) topography was considered statistically homogeneous because nitrate concentrations do not show a significant variability under the same slope class (0–5%), and (ii) hydraulic conductivity is comparable to the aquifer media parameter.

In this section, the statistical analysis takes into account the nature of the parameters introduced into the FACs model. For this reason, the categorical parameters (aquifer media, soil media, impact of the vadose zone media, and land use) are specified by their rating classes, extracted from the modified DRASTIC model, and introduced into the data set as binary numbers indicating the recorded classes. The continuous parameters, such as groundwater depth and net recharge, were introduced into the program as they were (Jmal et al. 2017).
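The data preparation described above can be sketched as a simple binary (indicator) encoding. This is a minimal Python illustration under stated assumptions: the function name `encode_sample` is hypothetical, only class labels that appear in the text (I4, I5, Lu1, Lu3) are used, and the numeric values are placeholders.

```python
def encode_sample(depth, recharge, categories, levels):
    """Build one data row: continuous values are kept unchanged and each
    categorical class becomes a 0/1 indicator."""
    row = {"D": depth, "R": recharge}
    for param, classes in levels.items():
        for cls in classes:
            row[cls] = 1 if categories[param] == cls else 0
    return row


# Hypothetical subset of classes for two categorical parameters
levels = {"I": ["I4", "I5"], "Lu": ["Lu1", "Lu3"]}

# One hypothetical sampling point: vadose zone class I5, land use class Lu3
row = encode_sample(12.0, 30.0, {"I": "I5", "Lu": "Lu3"}, levels)
```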

The FACs, applied to all data using the ITCF statistical software package (STATIT-CF, 1987), yielded three principal components. A varimax rotation with Kaiser normalization was applied to all data. According to a Student’s test (n = 38), the threshold of significance (p < 0.05) is equal to 0.304. The correlations among the selected parameters (variables) and the components are significant, and the first two components together represent about 62.6% of the total variance. The first three FACs represent 44.52, 18.08, and 8.3% of the total variance, respectively.

The projection over the factorial plane for all selected variables (which displays the maximum of inertia; ≈62.60%) shows a distinct data cluster (Fig. 15).

Fig. 15 Distribution of correspondent parameters in the correlation circle (for p < 0.05 and n = 38)

Table 8 presents the correlation matrix between the nitrate concentrations and the different parameters. The results show a significant correlation between nitrate concentration and groundwater depth (R2 = 0.708), indicating that a shallower water table corresponds to potentially higher nitrate concentrations. Moreover, the irrigated field croplands (Lu3) and the olive tree cultivation (Lu1) present correlation coefficients of 0.765 and 0.633, respectively, attributed to the large quantities of chemical fertilizers, such as nitrate, used for these crops. Nitrate concentration is slightly correlated with the net recharge (R2 = 0.398). The gravel and coarse sand permeability class of the vadose zone (I5) and the medium to coarse sand permeability class of the vadose zone (I4) present correlation coefficients of 0.497 and 0.392, respectively, as a result of the arid climate in the study area.

Table 8 Correlation matrix between the different parameters and the nitrate concentrations

The parameters are arranged on the factorial plane (F1 × F2) (Fig. 15) in two principal groups:

  • the first group (G1) comprises the third class of land use (Lu3), which characterizes the irrigated agriculture areas, and the fifth class of the vadose zone (I5), which represents high permeability and strongly favors the presence of nitrate in groundwater;

  • the second group (G2) includes the D, R, I4, and Lu1 classes, which are negatively correlated with nitrate concentrations. First, the deeper the water table, the less nitrate is present in groundwater, which explains the antagonistic effect of the groundwater depth parameter (D). Second, olive tree cultivation (Lu1) does not require chemical fertilizers such as nitrate. Third, the net recharge (R) and the medium to coarse sand class of the vadose zone media (I4) are negatively correlated with the nitrate concentration because of the low precipitation and soil permeability.

The other parameters are statistically insignificant. These results highlight the major effect of the groundwater depth and land use parameters in the vulnerability assessment of the Sidi Bouzid shallow aquifer.

For a more precise analysis, the six parameters (groundwater depth, net recharge, aquifer media, soil media, impact of the vadose zone, and land use) were examined through their representation quality (QLT) (Table 9) in order to identify the parameters that most significantly affect NO3 concentrations in different areas and under different conditions.

Table 9 Representation qualities (QLT) of selected parameters as a function of the two first axes of the (1 × 2) factorial plane (‰) (the threshold of significance = 24.5 mg/l for p < 0.05 and n = 38 (Jmal et al. 2017))

The representation quality of the parameters shows that the “groundwater depth” parameter is the best represented (QLT equal to 879.213). The “land use” parameter also has a meaningful quality of representation (QLT equal to 708 and 664.191 for Lu3 and Lu1, respectively). The role of groundwater depth and land use is thus reflected not only in their relationship with nitrate concentrations in the groundwater but also in the vulnerability assessment of the Sidi Bouzid shallow aquifer.

Multiple linear regression

In the present study, MLR analysis was applied to the six most significant parameters in the vulnerability hypothesis (caused by nitrate concentration in analyzed water) for all sampling sites. The MLR applied to all selected data has been expressed as follows:

$$\mathrm{NO_3^-} = 0.380\,(\mathrm{OS3}) + 0.021\,(R) - 0.130\,(\mathrm{I4}) - 0.050\,(\mathrm{I5}) - 0.030\,(\mathrm{OS1}) - 0.007\,(D) + 1.434$$
$$R^2 = 0.754$$
(6)

The above model shows the R2 value, which gives the percentage of variability in the dependent variable accounted for by all the independent variables together. According to the R2 statistic, 75.4% of the total variance for the estimation of NO3 is explained by the MLR model.

The positive sign of the beta coefficients for OS3 and R indicates a positive relationship between NO3 concentrations and these variables. In contrast, the beta coefficients for I4, I5, OS1, and D are negative, indicating a negative relationship between these variables and NO3 concentrations. These results are in good agreement with the previous ones.
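For illustration, Eq. (6) can be evaluated with a few lines of Python. The coefficients and intercept are those of Eq. (6); the function name `predict_no3` and the input values are hypothetical. The section does not state the units or scaling of the predictors and of the response, so the output should be read in the model’s own units, and the variable names OS3 and OS1 are kept exactly as written in the equation.

```python
# Coefficients and intercept as given in Eq. (6)
COEFFS = {
    "OS3": 0.380,
    "R": 0.021,
    "I4": -0.130,
    "I5": -0.050,
    "OS1": -0.030,
    "D": -0.007,
}
INTERCEPT = 1.434


def predict_no3(x):
    """Evaluate the linear model of Eq. (6) for one site.
    `x` maps each variable name in COEFFS to its value."""
    return INTERCEPT + sum(COEFFS[name] * x[name] for name in COEFFS)


# Hypothetical site: binary class indicators plus placeholder R and D values
site = {"OS3": 1, "R": 2.0, "I4": 0, "I5": 1, "OS1": 0, "D": 10.0}
estimate = predict_no3(site)
```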

Overall, the specific vulnerability model obtained characterizes the Sidi Bouzid shallow aquifer using nitrate concentrations as the pollutant factor. Several authors have used other specific methods in the study region: Safa et al. (2020) applied DRASTICU in the “assessment of urban groundwater vulnerability in arid areas: case of Sidi Bouzid aquifer (central Tunisia),” and Ncibi et al. (2021) used the SVI method to determine the “spatial variation of groundwater vulnerability to nitrate pollution under excessive fertilization using index overlay method in Central Tunisia (Sidi Bouzid basin).” The contribution of the modified DRASTIC method consists of modifying the ratings and the weights of each parameter using the nitrate concentrations, and of eliminating the hydrogeological parameters that do not directly affect the water table. This leads to five parameters with the modified DRASTIC method and to two parameters with the FACs model. Groundwater depth and land use play an important role in groundwater vulnerability to pollution in this area. Finally, the specific vulnerability model considered characterizes the study region using a small number of hydrogeological variables.

Conclusion

In the study area, the application of the DRASTIC model provided a relatively satisfactory evaluation of the intrinsic vulnerability of groundwater to contamination, given the pollution caused by the use of chemical fertilizers and expressed by nitrate leaching from the surface to the groundwater. The intrinsic vulnerability map obtained with the standard DRASTIC method subdivides the study area into three vulnerability classes: low (3.5% of the study area), medium (71%), and high (25.5%). The latter characterizes especially the eastern part of the study area. The original DRASTIC model required modification to obtain more accurate results. The modification consisted of revising the rating and weighting scale of each parameter and adding the land use parameter, using nitrate concentration. The modified DRASTIC method was established to estimate the specific vulnerability and the pollution risk of nitrates in the Sidi Bouzid shallow aquifer. This method subdivides the study region into three vulnerability zones: a zone of low vulnerability covering 37% of the study area, a zone of medium vulnerability distributed to the north and the northeast of the region, which occupies 23% of the total surface, and a zone of high vulnerability covering 40% of the study area.

The Pearson’s correlation with nitrate concentrations improved to 82%, compared with 44% for the original DRASTIC method, as a result of using nitrate concentration in the model calibration. Therefore, the nitrate concentration could be used as a modifying parameter to improve the vulnerability index. This modification could lead to more realistic management of groundwater quality.

The statistical analysis of the relationship between NO3 concentrations and the selected parameters gives significant results principally for the groundwater depth and land use parameters.

The computed results showed that nitrate concentration was successfully used as a modifying parameter, with substantial improvement in the resulting index. This method is suggested for agricultural areas characterized by extensive nitrate use.