Mapping specific groundwater vulnerability to nitrate using random forest: case of Sais basin, Morocco

Lahjouj, Abdelhakim; El Hmaidi, Abdellah; Bouhafa, Karima; Boufala, M’hamed

doi:10.1007/s40808-020-00761-6

Original Article
Published: 04 April 2020

Volume 6, pages 1451–1466, (2020)

Modeling Earth Systems and Environment

Abdelhakim Lahjouj¹,², Abdellah El Hmaidi¹, Karima Bouhafa² & M’hamed Boufala¹

Abstract

The objective of this study was to assess groundwater vulnerability to nitrate (NO₃⁻) pollution in the Sais basin, based on the drinking-water threshold (50 mg/L), using the random forest (RF) model. A spatial dataset consisting of the nitrate concentrations observed in 154 water samples and 14 explanatory variables was considered. These variables are rainfall, texture (sand, silt, and clay), lithology, organic matter, piezometric level, altitude, land use, calcium carbonate (CaCO₃), carbon/nitrogen ratio (C/N), slope, hydraulic gradient, and soil classification. Of the dataset, 80% was randomly selected for training and validation, and the remaining 20% for testing the RF model. The RF model was validated using the out-of-bag (OOB) error and tested using the receiver operating characteristic (ROC) curve. The OOB error and the area under the curve (AUC) were 0.11 and 82.2%, respectively. In addition, the RF result revealed that rainfall, sand content, clay content, piezometric level, organic matter, and lithology are the key factors determining groundwater vulnerability to NO₃⁻ in the Sais basin. When only these most important factors were used as RF inputs, the prediction accuracy was very similar to that obtained using all variables. The groundwater vulnerability maps were created from the predicted groundwater vulnerability indexes. The most reliable maps of groundwater vulnerability to NO₃⁻ showed that about 48 and 63% of the surface area of the basin are under high to very high vulnerability, using all explanatory variables and only the most important ones, respectively. This study serves to determine the most vulnerable areas and to identify the factors affecting NO₃⁻ pollution in the Sais basin, in order to properly control and protect groundwater.

Introduction

Groundwater is an important natural resource. It constitutes the main source of water for industry and irrigated agriculture in arid and semiarid areas (Nampak et al. 2014). Effective management of groundwater quality and quantity has become a major issue, since climate change, rapid population growth, and overuse of groundwater for irrigation can all have major effects on the resource. Therefore, to ensure the sustainable management of groundwater, the assessment of groundwater resources and the associated pressures at the local scale is strongly required (Hasiniaina et al. 2010).

Nitrate (NO₃⁻) is the most abundant pollutant in groundwater (Laftouhi et al. 2003; Moore et al. 2006). Indeed, NO₃⁻ concentrations increase with the expansion and intensification of agricultural activities, owing to the overuse of nitrogen fertilizers (Nolan 2001; Puckett et al. 2011; Ki et al. 2015). The consumption of water polluted by NO₃⁻ can be associated with health problems, such as methemoglobinemia and cancers in adults (Ward et al. 2005).

In the Sais basin, two aquifers are present: the lias and the plioquaternary aquifers. They are used mainly for drinking and irrigation purposes. These aquifers have been the subject of several geomorphological, geological, hydrogeological, and geophysical studies (Taltasse 1953; Chamayou et al. 1975; Fassi 1999; Essahlaoui et al. 2001; Amraoui 2005). The plioquaternary aquifer is more heavily used for irrigation and for the drinking water supply of the rural population, owing to its shallow depth compared to the lias aquifer.

The geological and hydrogeological characteristics of the Sais basin may favor NO₃⁻ pollution of the plioquaternary aquifer (Tabyaoui et al. 2003). Therefore, assessing its degree of vulnerability can be an important tool for groundwater resource management, since it allows the identification of the areas of the basin that are most affected or that present a high risk of contamination by NO₃⁻.

Groundwater vulnerability is defined as the degree of protection that the natural environment provides against groundwater pollution (National Research Council 1993). There are two types of groundwater vulnerability. The first type is the intrinsic vulnerability, which is assessed based on the characteristics of the natural environment, including aquifer, soil and climatic characteristics (Schnebelen et al. 2002). This type of vulnerability is considered static and invariable. Several methods have been proposed for assessing the intrinsic vulnerability, among them the DRASTIC, GOD, SI, and SINTACS frameworks (Ghazavi and Ebrahimi 2015; Al-Shatnawi et al. 2015; Baghapour et al. 2016; El Himer et al. 2013). The second type is the specific vulnerability, which concerns a specific pollutant or group of pollutants. This type is assessed using the intrinsic properties of the basin and the characteristics of the pollutant, as well as anthropogenic factors related to the pollutant (Ribeiro et al. 2017). The specific vulnerability is assumed to be dynamic and closer to reality. Unlike the first type, the specific vulnerability can change over time.

In recent years, machine learning techniques such as artificial neural networks (ANN), support vector machines (SVM), random forest (RF), and decision trees (CART) have been applied in several fields. The RF model is robust and easy to apply compared to other machine learning techniques, and it has the particular ability to determine the importance of each explanatory variable in the prediction result. In addition, the RF model can provide good results compared to multivariate statistics or other machine learning methods such as SVM and ANN (Breiman 2001; Liaw and Wiener 2002; Loosvelt et al. 2012; Ouedraogo et al. 2018).

In groundwater research, the RF method has been used to predict NO₃⁻ and arsenic (As) concentrations in groundwater (Anning et al. 2012; Wheeler et al. 2015) and to assess groundwater vulnerability (Rodriguez-Galiano et al. 2014; Mendes et al. 2016). These studies revealed that RF has good prediction performance.

To the best of our knowledge, there are no previous studies that have assessed groundwater vulnerability using machine learning in Morocco. Furthermore, no study aimed to assess the specific groundwater vulnerability to NO₃⁻ in the Sais basin. However, Sadkaoui et al. (2013) have applied intrinsic methods in the Sais basin to assess groundwater vulnerability. Nevertheless, the rating proposed by some intrinsic frameworks such as DRASTIC (Aller et al. 1987) may differ depending on the study area specificities. Additionally, intrinsic vulnerability may ignore some important parameters which may affect the groundwater vulnerability. Consequently, the RF model may be a novel technique for the groundwater vulnerability assessment in Morocco.

The main objective of this study was to develop an accurate RF model to assess the specific groundwater vulnerability to NO₃⁻ of the plioquaternary aquifer of the Sais basin, using 14 parameters that may contribute to NO₃⁻ pollution.

The output of this research will help to:

1. Identify the areas most vulnerable to NO₃⁻ pollution in the Sais basin;
2. Determine the most important factors that control the groundwater vulnerability to NO₃⁻ pollution of the plioquaternary aquifer.

Materials and methods

Research area

The research area is the Sais basin, part of the Fez-Meknes region in Morocco (Fig. 1). The surface area of the basin is approximately 2100 km². The basin is located between latitudes 33°38′ and 34°4′N and longitudes 5°49′ and 4°53′W. It is bounded by the Middle Atlas ranges in the south and the Rif ranges in the north (Fig. 2). The geological setting is mainly dominated by the lacustrine limestone of the lias. The altitude of the study area varies between 185 m in the north and 1047 m in the south at the Middle Atlas ranges, with an average of 600 m. The study area is characterized by a Mediterranean climate (Amraoui 2005). The mean annual rainfall recorded by three stations located at Douyet (northeast of the basin), Meknes and Ain Taoujdate during the period 1981–2018 is 468 mm. The Sais basin is characterized by high agricultural activity owing to good soil fertility. Agriculture is conducted under both rainfed and irrigated conditions. Moreover, the Sais basin contains several lithological classes, including sandstone, siltstone, marlstone, alluvium and oncolite limestone, representing, respectively, 39, 18.4, 18, 11.6, and 10.6% of the total surface area of the basin.

In the Sais plain, two aquifers are distinguished: the lias aquifer, which is constituted by dolomitic limestone, and the plioquaternary aquifer, composed of pliovillafranchien sandstone, conglomerate sand, and lake limestone (Essahlaoui et al. 2001; Tabyaoui et al. 2004; Amraoui 2005; Belhassan et al. 2010). The latter has a substratum from the upper Miocene, with a depth exceeding 1000 m in some parts of the northeastern Sais basin. The plioquaternary aquifer is recharged mainly by the infiltration of rainfall and irrigation water, as well as by drainage from the lias aquifer in the southern part of the Sais basin (Sadkaoui et al. 2013).

Random forest

Random forest is a supervised nonparametric machine learning method developed by Breiman (2001). It is based on an ensemble of decision trees (Rodriguez-Galiano et al. 2014; Catani et al. 2013; Micheletti et al. 2013). The method is used for data prediction and interpretation purposes. The RF model can be built from classification trees or from regression trees (Zabihi et al. 2016).

RF can compute an unbiased error estimate by bootstrapping (Siroky 2009). For each tree, the dataset is divided into two parts: a training sub-dataset containing about 2/3 of the dataset, randomly chosen with replacement, and a validation sub-dataset containing the remaining 1/3. The validation sub-dataset is called out-of-bag (OOB) (Breiman 2001; Catani et al. 2013). It can be used to assess the prediction performance of RF and the importance of the input variables. In addition, RF presents some other interesting characteristics which justify its application in groundwater vulnerability assessment:

It can manage both categorical and numerical variables;
It can determine the importance of each explanatory variable in the prediction result;
It can learn complex patterns, without a linear relationship between the explanatory variables and dependent variable;
It can handle outliers;
It can handle a large dataset with high dimensionality;
Its implementation is less complex compared to other machine learning techniques such as ANN and SVM.
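
The bootstrap/OOB partition described before this list can be illustrated with a minimal sketch. It uses only the Python standard library rather than the R package used in the study; the function name `bootstrap_split`, the seed, and the training size of 123 (80% of 154 samples, rounded down) are assumptions made for illustration. Drawing n observations with replacement leaves out, on average, a fraction close to 1/e (about 0.37) of the distinct observations, which is the origin of the approximate 1/3 OOB share.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample of size n_samples with replacement.

    Returns (in_bag, oob): the drawn indices (repeats allowed) and the
    indices that were never drawn, i.e. the out-of-bag observations.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
n_train = 123           # assumed size of the 80% training/validation subset

# Repeat the draw for many hypothetical trees and average the OOB share.
fractions = []
for _ in range(1000):
    _, oob = bootstrap_split(n_train, rng)
    fractions.append(len(oob) / n_train)

mean_oob_fraction = sum(fractions) / len(fractions)
print(f"mean OOB fraction over 1000 draws: {mean_oob_fraction:.3f}")
```

Because every tree sees a different bootstrap sample, each observation is out-of-bag for roughly a third of the trees, so an error estimate is available without setting aside a separate validation set.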

The RF model offers two measures of the importance of the explanatory variables used in the prediction. The first is the mean decrease in accuracy (MDA), an indirect measure of the effect of each explanatory variable on the prediction accuracy (Calle and Urrea 2010). To compute MDA, RF uses the out-of-bag (OOB) dataset and permutes each explanatory variable in turn while the others are held fixed. An increase in the RF model error after permutation indicates that the permuted variable is important (Naghibi et al. 2017). The RF model error is calculated from the OOB sub-dataset using the following formula (Grömping 2009):

$$\text{OOB-MSE} = \frac{1}{n_{\text{OOB},t}} \sum\limits_{\begin{subarray}{l} i = 1 \\ i \in \text{OOB}_t \end{subarray}}^{n} (y_i - \hat{y}_i)^{2}$$

(1)

where $y_i$ and $\hat{y}_i$ are, respectively, the observed value and the mean of the predicted values from all trees; $n_{\text{OOB},t}$ is the number of OOB observations in tree $t$; and $i$ indexes the OOB observations for the tree. Therefore, MDA can be an accurate tool for variable selection.
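
As a worked illustration of Eq. (1), the following sketch computes the mean squared error over a set of OOB observations, before and after a variable has been permuted. All numbers are made up for illustration, and the names `oob_mse`, `predicted` and `predicted_permuted` are hypothetical; the sketch does not reproduce how the R package scales its importance scores.

```python
def oob_mse(observed, predicted, oob_indices):
    """Mean squared error restricted to the out-of-bag observations (Eq. 1)."""
    if not oob_indices:
        raise ValueError("no out-of-bag observations")
    squared = [(observed[i] - predicted[i]) ** 2 for i in oob_indices]
    return sum(squared) / len(squared)


# Toy data: binary pollution labels and hypothetical model outputs.
observed = [0, 1, 1, 0, 1]
predicted = [0.1, 0.8, 0.6, 0.3, 0.9]           # predictions with intact inputs
predicted_permuted = [0.1, 0.4, 0.5, 0.6, 0.9]  # predictions after permuting one variable
oob = [1, 2, 3]                                 # indices that are out-of-bag for this tree

baseline = oob_mse(observed, predicted, oob)
permuted = oob_mse(observed, predicted_permuted, oob)
print(f"baseline OOB-MSE: {baseline:.4f}")
print(f"OOB-MSE after permutation: {permuted:.4f}")
print(f"increase attributable to the permuted variable: {permuted - baseline:.4f}")
```

A larger increase after permutation means the model relied more heavily on that variable, which is the logic behind MDA.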

The second method is the mean decrease in the GINI, based on the decrease in node heterogeneity (impurity). This measure determines the importance of explanatory variable $j$ as the weighted sum of the decreases in node heterogeneity, averaged over all trees using the GINI index. The GINI index can be used to explain the strength of a variable used as input in the RF model (Al-Abadi and Shahid 2016). A higher GINI value assigned to a variable indicates that it is more important in the prediction than other variables (Yang et al. 2019).
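
The heterogeneity decrease at a single node can be sketched as follows, assuming the usual Gini impurity for class labels (one minus the sum of squared class proportions). The function names and the toy labels are illustrative assumptions; in a forest, such decreases would be accumulated per variable over all nodes and averaged over all trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Toy node with four polluted (1) and four unpolluted (0) samples.
parent = [1, 1, 1, 0, 0, 0, 0, 1]
left = [1, 1, 1, 0]   # samples sent to the left child by a hypothetical split
right = [0, 0, 0, 1]  # samples sent to the right child

print(gini(parent))                       # impurity before the split
print(gini_decrease(parent, left, right))  # decrease credited to the split variable
```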

Observed nitrate concentrations

A total of 154 water samples from the plioquaternary aquifer in the rural area of the Sais basin were collected for NO₃⁻ analysis. Sampling campaigns were carried out in the spring and autumn seasons of 2013 (56 samples) and 2018 (98 samples). The samples were collected and stored at 2–4 °C and then analyzed within 24 h using the UV-spectrophotometric method. The distribution of observed NO₃⁻ concentrations in the different sampling campaigns is shown in Fig. 3. The mean NO₃⁻ concentrations in spring and autumn were, respectively, 60 and 64 mg/L in 2013 and 77 and 70 mg/L in 2018. Overall, the highest NO₃⁻ concentrations were observed in the northern, northwestern and central parts of the basin.

Explanatory variables

In order to assess the groundwater vulnerability to NO₃⁻ in the Sais basin, a total of 14 explanatory variables related to the intrinsic and specific groundwater vulnerability to NO₃⁻ were used as RF model inputs (Fig. 4). All variables were mapped using a geographic information system (GIS). Table 1 presents the 14 explanatory variables, their data sources, and their estimation methods. These variables are rainfall, texture (sand, silt, and clay), lithology, organic matter, piezometric level, altitude, land use, calcium carbonate (CaCO₃), carbon/nitrogen ratio (C/N), slope, hydraulic gradient, and soil classification. All variables were compiled within a 500-m-radius circular area.

Table 1 Explanatory variables used in the RF model

The explanatory variables were selected based on the following reasons:

The slope is an important parameter that controls the runoff. A low slope contributes to water retention and therefore increases the probability of groundwater contamination (Tilahun and Merkel 2009).
The altitude was selected based on the hydrogeology of the Sais basin. Part of the plioquaternary recharge is provided by the lias aquifer in the southern part of the basin, where the altitude is high, which may reduce NO₃⁻ pollution by dilution.
The piezometric level indicates whether the NO₃⁻ can rapidly reach the groundwater surface. The shallower water depth can increase the probability of NO₃⁻ contamination (Stigter et al. 2005).
The rainfall contributes positively to groundwater recharge, which leads to the leaching of soil NO₃⁻ (Aslam et al. 2018).
The hydraulic gradient is related to the groundwater flow direction (Rodriguez-Galiano et al. 2014), which may contribute to NO₃⁻ accumulation.
Lithology can affect groundwater quality. It influences the facility of pollutant transfer to the aquifer (Chenini et al. 2015).
Soil classification and texture can influence NO₃⁻ loss. NO₃⁻ leaching may be more important in sandy soils (Ahirwar and Shukla 2018). The texture components (sand, silt, and clay) were introduced in the RF model separately, to determine the most important component.
Organic matter and C/N ratio are parameters related to the soil nitrogen cycle, which can contribute to NO₃⁻ losses. Moreover, Berdai et al. (2004) considered these two parameters important in the specific groundwater vulnerability to NO₃⁻.
Calcareous soils are characterized by high CaCO₃ content. The latter is considered as a factor dominating the ammonification and nitrification processes, which may increase NO₃⁻ leaching (Zarabi and Jalali 2012; Kutiel and Shaviv 1992).
Land use is a parameter that represents a potential anthropogenic factor related to NO₃⁻ pollution (Huang et al. 2017).

Modeling approach using RF

Groundwater vulnerability is generally understood as a contamination probability. Therefore, to obtain the map of groundwater vulnerability to NO₃⁻, the first step was to rescale the NO₃⁻ concentrations. The NO₃⁻ concentrations observed in the 154 samples were divided into two groups based on the threshold value of 50 mg/L. Concentrations exceeding the threshold were given a value of 1 (nitrate pollution), and concentrations lower than or equal to the threshold were given a value of 0 (no nitrate pollution). The rescaled NO₃⁻ concentrations were used in the RF as the output variable, and the specific and intrinsic parameters as input variables. Secondly, the dataset (input and output) was split randomly into two sub-datasets. The first sub-dataset, containing 80% of the dataset, was used for training and validation, and the remaining 20% was used for testing the RF model. It should be mentioned that the RF model splits the first sub-dataset (80% of the dataset) into two groups: 2/3 for training and the remaining 1/3 for validation. Figure 5 shows the methodology flowchart used for this study. The distribution of the training, validation, and testing samples is shown in Fig. 6.

The RF implementation requires the number of trees and the number of variables (m) used to determine the split at each node. Breiman (2001) recommends using a value of m close to 1/3 of all input variables. For this study, we used a maximum of 10,000 trees and tested different numbers of random input variables (m_try = 1, 2, 3 and 4) at each node. The optimal m_try is the one that yields the lowest error. The randomForest package in R software (V 1.1.4) was used for the RF model. The optimal m_try was determined using the tuneRF function in R.
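
The rescaling and the random 80/20 split can be sketched in a few lines. This is a standard-library illustration, not the R workflow used in the study: the function names, the seed, the sample concentrations, and the choice to round the training size down are assumptions.

```python
import random

THRESHOLD_MG_L = 50.0  # drinking-water threshold for nitrate used in the study


def binarize(concentrations, threshold=THRESHOLD_MG_L):
    """Map each concentration to 1 (above threshold) or 0 (at or below it)."""
    return [1 if c > threshold else 0 for c in concentrations]


def split_indices(n_samples, train_fraction, rng):
    """Shuffle the sample indices and cut them into two disjoint subsets."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(train_fraction * n_samples)  # rounding down is an assumption
    return indices[:n_train], indices[n_train:]


# Hypothetical concentrations in mg/L; exactly 50 counts as "no pollution".
labels = binarize([12.0, 50.0, 50.1, 77.0])

# Seed chosen only for reproducibility.
train_idx, test_idx = split_indices(154, 0.8, random.Random(42))
print(labels, len(train_idx), len(test_idx))
```

With 154 samples this gives 123 samples for training and validation and 31 for testing; the exact counts used in the study are not stated in the text.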

Two modeling approaches based on variable importance were used for this study. The first approach (RF1) used all 14 selected explanatory variables as model input. The second approach selected the most important variables in the RF1 model result and used them as input for a new RF implementation (RF2). The predicted values obtained by both RF models were considered as groundwater vulnerability indexes (GVI).

Validation and testing are essential steps in any modeling study that uses machine learning techniques. First, the GVI predicted by RF1 and RF2 were validated and compared based on the error computed for each m_try used, and the result with the lowest error was retained. Second, the predictive accuracy of the RF model was tested using receiver operating characteristic (ROC) analysis, through the ROCR package in R. The ROC curve allows calculation of the area under the curve (AUC). The ROC curve plots the false-positive rate on the X-axis and the true-positive rate on the Y-axis, and describes the trade-off between the two rates (Sezer et al. 2011; Ozdemir and Altural 2013; Akgun 2011). The prediction accuracy can be classified based on AUC as follows: AUC > 0.9, excellent; 0.8 < AUC < 0.9, very good; 0.7 < AUC < 0.8, good; 0.6 < AUC < 0.7, average; and 0.5 < AUC < 0.6, poor (Pourghasemi and Kerle 2016; Bradley 1997; Fawcett 2006).
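
For readers unfamiliar with AUC, the following sketch computes it through its rank interpretation: the probability that a randomly chosen polluted sample receives a higher score than a randomly chosen unpolluted one, with ties counted as one half. It is a standard-library illustration and not the ROCR routine used in the study. The function names and toy scores are hypothetical, and the handling of values falling exactly on a class boundary (assigned to the lower class here) is an assumption, since the scale above leaves it open.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy_class(value):
    """Qualitative class for an AUC value, following the scale quoted in the text."""
    if value > 0.9:
        return "excellent"
    if value > 0.8:
        return "very good"
    if value > 0.7:
        return "good"
    if value > 0.6:
        return "average"
    if value > 0.5:
        return "poor"
    return "no better than chance"  # label added here, not part of the cited scale


toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc, accuracy_class(toy_auc))
print(accuracy_class(0.822), accuracy_class(0.82))
```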

Mapping groundwater vulnerability to nitrate

After validation and testing, the vulnerability maps were created from all GVI predicted by the RF1 and RF2 models, using the Kriging interpolation method in GIS. The interpolation retained as the most reliable was the one that generated the lowest error. Where several GVI values coincided, their average was used in the interpolation.

The GVI were categorized into four vulnerability classes, namely low, medium, high and very high. The most accurate map was obtained by comparing the different classification methods proposed by the GIS (quantile, natural breaks, geometric interval, and equal interval), using the Spearman rank correlation (ρ) and one-way ANOVA between the vulnerability classes and the observed NO₃⁻ concentrations. All statistical tests were carried out in R.
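
Of the four classification methods, equal interval is the simplest to illustrate: the value range is cut into classes of equal width. The sketch below assumes a GVI range of 0 to 1 and assigns boundary values to the lower class; both choices, like the function names, are assumptions for illustration, since GIS software typically derives the range from the data.

```python
LABELS = ["low", "medium", "high", "very high"]


def equal_interval_breaks(lo, hi, n_classes):
    """Upper bounds of all classes except the last, for equal-width classes."""
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(1, n_classes)]


def classify(value, breaks, labels=LABELS):
    """Return the label of the first class whose upper bound is not exceeded."""
    for upper, label in zip(breaks, labels):
        if value <= upper:
            return label
    return labels[-1]


breaks = equal_interval_breaks(0.0, 1.0, 4)  # assumed GVI range of 0 to 1
for gvi in (0.1, 0.3, 0.6, 0.9):             # hypothetical GVI values
    print(gvi, classify(gvi, breaks))
```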

Results and discussion

Random forest results

Accuracy of the random forest

Figure 7 shows the error computed as a function of the number of trees, for each number of explanatory variables randomly sampled (m_try) at each node, using the RF1 model. The error decreased as more trees were used and, from 2000 trees onward, it was low and stable for every m_try tested. The m_try that yielded the lowest error was 4, which is consistent with the recommendation of Breiman (2001). Furthermore, the mean error value obtained was 0.1100, with minimum and maximum values of 0.1091 and 0.1545, respectively.

Selection of the most important explanatory variables

The variable importance of the RF model is an output indicating the relative contribution of each input variable to the prediction result. The comparison of variable importance was based on MDA (% increase in MSE) and the mean decrease in the GINI (% increase in node purity). The importance of each explanatory variable is presented in Fig. 8. A higher value indicates that the variable is more important.

As shown in Fig. 8a, the relative increase in MSE was high for all explanatory variables, varying between 55 and 130.3%. This finding indicates that all selected explanatory variables can be considered controlling factors of groundwater NO₃⁻ pollution. Nevertheless, rainfall, sand, clay, piezometric level, organic matter, and lithology are the most important explanatory variables. The same set of variables was obtained using the mean decrease in GINI (Fig. 8b), although with different importance ranks.

According to the MDA results, rainfall has the highest importance in GVI prediction, followed by sand and clay contents, with values of 130, 118, and 116%, respectively. These results can be explained by the fact that rainfall contributes to groundwater recharge and therefore to NO₃⁻ leaching. Indeed, the areas where NO₃⁻ concentrations are high are located within areas of high soil sand content, mainly in the central and western parts of the basin. The importance of clay can be explained by its capability to protect groundwater against NO₃⁻ contamination owing to its high retention capacity. The piezometric level was also an important variable, with a value of 115%. For organic matter, the increase in MSE was 103.5%. This finding suggests that groundwater may receive high loads of organic nitrogen, since NO₃⁻ leaching increases as a result of high mineralization where soil organic matter content is high (Hoffmann and Johnsson 1999; Kulabako et al. 2007). The same importance value was observed for lithology. The silt, CaCO₃, C/N and altitude variables showed medium importance. In contrast, land use, slope, hydraulic gradient, and soil classification are the least important parameters, with values of 54.91, 72.86, 76.81, and 76.82%, respectively.

According to the RF1 importance result, we selected the most important variables (Increase in MSE above 100%) as input for RF2, which are: rainfall, sand content, clay content, piezometric level, organic matter, and lithology.
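
The selection rule (increase in MSE above 100%) amounts to a simple filter, sketched here with the importance values quoted in the text. The values for silt, CaCO₃, C/N and altitude are not reported in the text and are therefore omitted; the value for lithology follows the statement that it equals that of organic matter. The dictionary and function names are hypothetical.

```python
# Percent increase in MSE as quoted in the text for RF1 (four variables of
# medium importance are omitted because their values are not reported).
inc_mse = {
    "rainfall": 130.0,
    "sand": 118.0,
    "clay": 116.0,
    "piezometric level": 115.0,
    "organic matter": 103.5,
    "lithology": 103.5,  # stated to equal the organic matter value
    "land use": 54.91,
    "slope": 72.86,
    "hydraulic gradient": 76.81,
    "soil classification": 76.82,
}


def select_important(importance, threshold=100.0):
    """Names of variables whose importance exceeds the threshold, most important first."""
    selected = [name for name, value in importance.items() if value > threshold]
    return sorted(selected, key=lambda name: -importance[name])


rf2_inputs = select_important(inc_mse)
print(rf2_inputs)
```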

For RF2, the error trend was similar to that of RF1, and the lowest errors were obtained from 2000 trees onward (Fig. 9). However, the best m_try for RF2 was 2, which yielded the lowest error among the m_try values tested. The mean error value obtained was 0.1099, with minimum and maximum values of 0.1083 and 0.1750, respectively. Therefore, using only the most important parameters can slightly decrease the OOB error.

Receiver operating characteristic (ROC) curve

The ROC curves for both RF models are shown in Fig. 10. The AUC results are very similar for both models: the AUC values were 0.822 and 0.82, corresponding to prediction accuracies of 82.2 and 82% for the RF1 and RF2 models, respectively. Therefore, both RF models produce very good prediction performance.

Mapping groundwater vulnerability to nitrate

As seen in Fig. 11, the predicted GVI increase significantly as a function of the observed NO₃⁻ concentrations for both models (RF1 and RF2). However, RF2 predicted the GVI more accurately than RF1. The predicted values range from 0.003 to 0.998 for RF1 and from 0.0019 to 0.999 for RF2. Therefore, the removal of the less important explanatory variables caused a slight increase in the GVI prediction accuracy. This finding is consistent with the OOB errors computed.

The GVI predicted using both RF models were classified into four vulnerability classes (low, medium, high and very high). The comparison between the classification methods, based on the Spearman rank correlation (ρ) and the Eta coefficient (η), showed that geometric interval and equal interval are the most appropriate methods for RF1 and RF2, respectively (Tables 2 and 3). These classification methods were used to create the vulnerability classes.

Table 2 Comparison between classification methods applied to RF1

Table 3 Comparison between classification methods applied to RF2

The observed NO₃⁻ concentrations for each vulnerability class are presented as boxplots in Fig. 12. These plots summarize the observed NO₃⁻ concentrations with a central point indicating the median, a box indicating the variability around the median (25th and 75th percentiles), whiskers indicating the range of values, and points indicating the outliers. The vulnerability classes are consistent with the observed NO₃⁻ concentrations: the low class presents the lowest concentrations, while the very high class contains the highest. However, the comparison between the two RF models (Tables 2 and 3) showed that the RF2 model was more reliable in GVI prediction. The Spearman rank correlation (ρ) and Eta coefficient (η) between vulnerability classes and observed NO₃⁻ concentrations reached 0.6645 and 0.2837, respectively, slightly greater than the values obtained with the RF1 model (0.6547 and 0.2800, respectively).

The vulnerability maps obtained using both RF models are shown in Fig. 13. They show that the northern, central, northeastern and western parts of the basin are the areas where the groundwater vulnerability to NO₃⁻ is classified as high to very high. These two classes cover, respectively, 25.04 and 22.9% of the total area for RF1, and 36.38 and 26.5% for RF2 (Table 4). In these areas, the annual rainfall varies between 430 and 550 mm, the sand content between 40 and 84%, the clay content between 2 and 30%, and the organic matter content between 1.5 and 5%. Moreover, three lithological classes are dominant in these areas, namely sandstone, marlstone, and oncolite limestone.
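
The headline figures of about 48 and 63% of the basin under high to very high vulnerability follow directly from the class shares quoted above, as this short check shows. The dictionary layout and function name are illustrative.

```python
# Share of the total basin area (%) per class, as quoted in the text.
area_share = {
    "RF1": {"high": 25.04, "very high": 22.9},
    "RF2": {"high": 36.38, "very high": 26.5},
}


def high_to_very_high(shares):
    """Combined share of the high and very high vulnerability classes."""
    return shares["high"] + shares["very high"]


for model, shares in area_share.items():
    total = high_to_very_high(shares)
    print(f"{model}: {total:.2f}% (about {round(total)}%)")
```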

Table 4 Statistics of the groundwater vulnerability surface area

The medium vulnerability class occupies 27.71 and 26.14% of the total area for RF1 and RF2, respectively. This class is located mainly in some eastern, northern and western parts of the basin. The area of low vulnerability does not exceed 24.80 and 11% of the total area for RF1 and RF2, respectively, and is located mainly in some southern (Middle Atlas limits) and eastern parts of the basin. These areas are characterized by high clay content, which varies between 30 and 52%. Moreover, two lithological classes are dominant there: marlstone and siltstone.

Based on these results, the RF model performs well in determining groundwater vulnerability, owing to its ability to learn non-linear relationships between NO₃⁻ concentrations and the explanatory variables used in this study. However, the maps of groundwater vulnerability to NO₃⁻ can be improved over time as new input variables are considered, such as groundwater recharge and nitrogen fertilizer application.

Conclusion

Improving water management strategies requires a robust method to assess groundwater vulnerability. The present study aimed to develop an accurate RF model for the prediction of groundwater vulnerability to NO₃⁻. The observed NO₃⁻ concentrations in the Sais basin were rescaled to 0 and 1, based on the drinking-water threshold for NO₃⁻ (50 mg/L). The predicted values were considered as GVI. Fourteen explanatory variables related to the intrinsic and specific groundwater vulnerability were used as inputs in the RF model. These variables were rainfall, organic matter, soil texture (sand, clay, and silt), altitude, lithology, land use, C/N ratio, piezometric level, CaCO₃, slope, hydraulic gradient, and soil classification. The OOB error and AUC were 0.1100 and 82.2%, respectively. Moreover, the study revealed that all explanatory variables used can be considered controlling factors of groundwater NO₃⁻ pollution, with differing degrees of importance. Rainfall, sand content, clay content, organic matter, piezometric level, and lithology were the most important predictors of GVI. Using only these important parameters as RF input gave an OOB error and AUC of 0.1099 and 82%, respectively. The comparison between the observed NO₃⁻ concentrations and the vulnerability classes obtained showed that the RF2 model can produce a slightly more accurate groundwater vulnerability map.

The results revealed that about 48 and 63% of the total surface area are under high to very high vulnerability to NO₃⁻, using RF1 and RF2, respectively. About 27.7 and 26.1% of the surface area are under medium vulnerability, and 24.8 and 11% under low vulnerability, using RF1 and RF2, respectively.

Based on the RF results, the most important factors in the prediction result should be taken into consideration when recommending nitrogen fertilization, since agricultural activity is intense in the Sais basin.

Nevertheless, NO₃⁻ pollution can be affected by other variables related to the biogeochemical process, overuse of nitrogen fertilizers and the land use change. Consequently, including these factors in the RF model may also improve the groundwater vulnerability map to NO₃⁻ in the Sais basin.

The current study is a novel application of a machine learning technique to groundwater vulnerability assessment in Morocco. In the future, the RF model performance can be compared with that of other machine learning methods. This study provides valuable information for groundwater management in the study area.

References

Ahirwar S, Shukla JP (2018) Assessment of groundwater vulnerability in upper Betwa river watershed using GIS based DRASTIC model. J Geol Soc India 91(3):334–340. https://doi.org/10.1007/s12594-018-0859-0
Article Google Scholar
Akgun A (2011) A comparison of landslide susceptibility maps produced by logistic regression, multi-criteria decision, and likelihood ratio methods: a case study at İzmir, Turkey. Landslides 9(1):93–106. https://doi.org/10.1007/s10346-011-0283-7
Article Google Scholar
Al-Abadi AM, Shahid S (2016) Spatial mapping of artesian zone at Iraqi southern desert using a GIS-based random forest machine learning model. Model Earth Syst Environ. https://doi.org/10.1007/s40808-016-0150-6
Article Google Scholar
Aller L, Bennett T, Lehr JH, Petty RH, Hackett G (1987) DRASTIC: a standardized system for evaluating groundwater pollution potential using hydrogeologic settings. USEPA Report 600/2- 87/035, Robert S. Kerr Environmental Research Laboratory, Ada, Oklahoma
Al-Shatnawi AM, El-Bashir MS, Khalaf RMB, Gazzaz NM (2015) Vulnerability mapping of groundwater aquifer using SINTACS in Wadi Al-Waleh Catchment, Jordan. Arab J Geosci. https://doi.org/10.1007/s12517-015-2080-4
Article Google Scholar
Amraoui F (2005) Contribution à la connaissance des aquifères Karstiques cas du Lias da la plaine du Sais et du causse moyen atlasique tabulaire. Université Hassan II Ain Chock, Faculté des Sciences, Casablanca, Maroc, Thèse de Doctorat d’Etat, p 249p
Google Scholar
Anning DW, Paul AP, McKinney TS, Huntington JM, Bexfield LM, Thiros SA (2012) Predicted Nitrate and arsenic concentrations in basin-fill aquifers of the southwestern United States. US Geological Survey Scientific Investigations Report 2012–5065
Aslam RA, Shrestha S, Pandey VP (2018) Groundwater vulnerability to climate change: a review of the assessment methodology. Sci Total Environ 612:853–875. https://doi.org/10.1016/j.scitotenv.2017.08.237
Article Google Scholar
Baghapour MA, Nobandegani AF, Talebbeydokhti N, Bagherzadeh S, Nadiri AA, Gharekhani M, Chitsazan N (2016) Optimization of DRASTIC method by artificial neural network, nitrate vulnerability index, and composite DRASTIC models to assess groundwater vulnerability for unconfined aquifer of Shiraz Plain, Iran. J Environ Health Sci Eng. https://doi.org/10.1186/s40201-016-0254-y
Article Google Scholar
Belhassan K, Hessane MA, Essahlaoui A (2010) Interactions eaux de surface–eaux souterraines: bassin versant de l'Oued Mikkes (Maroc). Hydrol Sci J 55(8):1371–1384. https://doi.org/10.1080/02626667.2010.528763
Article Google Scholar
Berdai H, Soudi B, Bellouti A (2004) Contribution à l’étude de la pollution nitrique des eaux souterraines en zones irriguées: Cas du Tadla. Revue H.T.E. N° 128-Mars
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30(7):1145–1159. https://doi.org/10.1016/s0031-3203(96)00142-2
Article Google Scholar
Breiman L (2001) Random forests. Mach Learn 45:5–32
Article Google Scholar
Calle ML, Urrea V (2010) Letter to the editor: stability of random forest importance measures. Brief Bioinf 12(1):86–89. https://doi.org/10.1093/bib/bbq011
Article Google Scholar
Catani F, Lagomarsino D, Segoni S, Tofani V (2013) Landslide susceptibility estimation by random forest technique: sensitivity and scaling issues. Nat Hazards Earth Syst Sci 13:2815–2831. https://doi.org/10.5194/nhess-13-2815-2013
Article Google Scholar
Chamayou J, Combe M, Genetier B, Leclercn C (1975) Le bassin de Fès-Meknès, ressource en eau du Maroc. Notes et mémoire Service Géologique, Maroc, Rabat
Google Scholar
Chenini I, Zghibi A, Kouzana L (2015) Hydrogeological investigations and groundwater vulnerability assessment and mapping for groundwater resource protection and management: state of the art and a case study. J Afr Earth Sc 109:11–26. https://doi.org/10.1016/j.jafrearsci.2015.05.008
Article Google Scholar
El Himer H, Fakir Y, Stigter TY, Lepage M, El Mandour A, Ribeiro L (2013) Assessment of groundwater vulnerability to pollution of a wetland watershed: the case study of the Oualidia-Sidi Moussa wetland, Morocco. Aquat Ecosyst Health Manag 16(2):205–215. https://doi.org/10.1080/14634988.2013.788427
Article Google Scholar
Essahlaoui A, Sahbi H, Bahi L, El-Yamine N (2001) Reconnaissance de la structure géologique du bassin de Saiss occidental, Maroc, par sondages électriques. J Afr Earth Sci 32(4):777–789. https://doi.org/10.1016/s0899-5362(02)00054-4
Article Google Scholar
Fassi O (1999) Les formations superficielles du Saiss de Fès et de Meknès des temps géologique à l’utilisation actuelle des sols. Notes et mémoire Services Géologique, Maroc, Rabat, n°389.
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Article Google Scholar
Ghazavi R, Ebrahimi Z (2015) Assessing groundwater vulnerability to contamination in an arid environment using DRASTIC and GOD models. Int J Environ Sci Technol 12(9):2909–2918. https://doi.org/10.1007/s13762-015-0813-2
Article Google Scholar
Grömping U (2009) Variable importance assessment in regression: linear regression versus random forest. Am Stat 63(4):308–319. https://doi.org/10.1198/tast.2009.08199
Article Google Scholar
Hasiniaina F, Zhou J, Guoyi L (2010) Regional assessment of groundwater vulnerability in Tamtsag basin, Mongolia using drastic model. J Am Sci 6(11):65–78
Google Scholar
Hoffmann M, Johnsson H (1999) Environ Model Assess 4(1):35–44. https://doi.org/10.1023/a:1019087511708
Article Google Scholar
Huang L, Zeng G, Liang J, Hua S, Yuan Y, Li X, Dong H, Liu J, Nie S, Liu J (2017) Combined impacts of land use and climate change in the modeling of future groundwater vulnerability. J Hydrol Eng 22(7):05017007. https://doi.org/10.1061/(asce)he.1943-5584.0001493
Article Google Scholar
Ki MG, Koh DC, Yoon H, Kim H (2015) Temporal variability of nitrate concentration in groundwater affected by intensive agricultural activities in a rural area of Hongseong, South Korea. Environ Earth Sci 74(7):6147–6161. https://doi.org/10.1007/s12665-015-4637-7
Article Google Scholar
Kulabako N, Nalubega M, Thunvik R (2007) Study of the impact of landuse and hydrogeological settings on the shallow groundwater quality in a peri-urban area of Kampala, Uganda. Sci Total Environ 381(1):180–199
Article Google Scholar
Kutiel P, Shaviv A (1992) Effects of soil type, plant composition and leaching on soil nutrients following a simulated forest fire. For Ecol Manag 53(1–4):329–343. https://doi.org/10.1016/0378-1127(92)90051-a
Article Google Scholar
Laftouhi NE, Vanclooster M, Jalal M, Witam O, Aboufirassi M, Bahir M, Persoons E (2003) Groundwater nitrate pollution in the Essaouira Basin (Morocco). C R Geosci 335(3):307–317. https://doi.org/10.1016/s1631-0713(03)00025-7
Article Google Scholar
Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(3):18–22
Google Scholar
Loosvelt L, Petersb J, Skriverc H, Lievensa H, Van Coillied FMB, De Baets B, Verhoesta NEC (2012) Random forests as a tool for estimating uncertainty at pixel-level in SAR image classification. Int J Appl Earth Obs Geoinf 19:173–184. https://doi.org/10.1016/j.jag.2012.05.011
Article Google Scholar
Mendes MP, Rodriguez-Galiano V, Luque-Espinar JA, Ribeiro L, Chica-Olmo M (2016) Applying random forest to assess the vulnerability of groundwater to pollution by nitrate. Geo ENV 2016. In: The 11th international conference on geostatistics for environmental applications. Lisbon, Portugal. geoENV2016BookofAbstractsMPM
Micheletti N, Foresti L, Robert S, Leuenberger M, Pedrazzini A, Jaboyedoff M, Kanevski M (2013) Machine learning feature selection methods for landslide susceptibility mapping. Math Geosci 46(1):33–57. https://doi.org/10.1007/s11004-013-9511-0
Article Google Scholar
Moore KB, Ekwurzel B, Esser BK, Hudson GB, Moran JE (2006) Sources of groundwater nitrate revealed using residence time and isotope methods. Appl Geochem 21(6):1016–1029. https://doi.org/10.1016/j.apgeochem.2006.03.008
Article Google Scholar
Naghibi SA, Ahmadi K, Daneshi A (2017) Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Resour Manag 31(9):2761–2775. https://doi.org/10.1007/s11269-017-1660-3
Article Google Scholar
Walsh ES, Kreakie BJ, Cantwell MG, Nacci D (2017) A Random Forest approach to predict the spatial distribution of sediment pollution in an estuarine system. PLoS ONE 12(7):e0179473. https://doi.org/10.1371/journal.pone.0179473
Article Google Scholar
Nampak H, Pradhan B, Manap MA (2014) Application of GIS based data driven evidential belief function model to predict groundwater potential zonation. J Hydrol 513:283–300. https://doi.org/10.1016/j.jhydrol.2014.02.053
Article Google Scholar
National Research Council (1993) Ground water vulnerability assessment: predicting relative contamination potential under conditions of uncertainty. The National Academies Press, Washington, D.C.
Google Scholar
Nolan BT (2001) Relating nitrogen sources and aquifer susceptibility to nitrate in shallow ground waters of the United States. Ground Water 39(2):290–299. https://doi.org/10.1111/j.1745-6584.2001.tb02311.x
Article Google Scholar
Ouedraogo I, Defourny P, Vanclooster M (2018) Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale. Hydrogeol J. https://doi.org/10.1007/s10040-018-1900-5
Article Google Scholar
Ozdemir A, Altural T (2013) A comparative study of frequency ratio, weights of evidence and logistic regression methods for landslide susceptibility mapping: Sultan Mountains, SW Turkey. J Asian Earth Sci 64:180–197. https://doi.org/10.1016/j.jseaes.2012.12.014
Article Google Scholar
Pourghasemi HR, Kerle N (2016) Random forests and evidential belief function-based landslide susceptibility assessment in Western Mazandaran Province, Iran. Environ Earth Sci. https://doi.org/10.1007/s12665-015-4950-1
Article Google Scholar
Puckett LJ, Tesoriero AJ, Dubrovsky NM (2011) Nitrogen contamination of surficial aquifer-a growing legacy. Environ Sci Technol 45:839–844. https://doi.org/10.1021/es1038358
Article Google Scholar
Ribeiro L, Pindo JC, Dominguez-Granda L (2017) Assessment of groundwater vulnerability in the Daule aquifer, Ecuador, using the susceptibility index method. Sci Total Environ 574:1674–1683. https://doi.org/10.1016/j.scitotenv.2016.09.004
Article Google Scholar
Rodriguez-Galiano V, Mendes MP, Garcia-Soldado MJ, Chica-Olmo M, Ribeiro L (2014) Predictive modeling of groundwater nitrate pollution using random forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (southern Spain). Sci Total Environ 476–477:189–206. https://doi.org/10.1016/j.scitotenv.2014.01.001
Article Google Scholar
Sadkaoui N, Boukrim S, Bourak A, Lakhili F, Mesrar L, Chaouni A, Lahrach A, Jabrane R, Akdim B (2013) Groundwater pollution of Sais basin (Morocco), vulnerability mapping by DRASTIC, GOD and PRK methods, involving Geographic Information System(GIS). Present Environ Sustain Dev 7:296–309
Google Scholar
Schnebelen N, Platel JP, Nindre Y, Baudry D (2002) Gestion des eaux souterraines en Aquitaine Année 5. Opération sectorielle. Protection de la nappe de l’Oligocène en région bordelaise, Rapport, BRGM, Orléans, France
Google Scholar
Sezer EA, Pradhan B, Gokceoglu C (2011) Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Syst Appl 38(7):8208–8219. https://doi.org/10.1016/j.eswa.2010.12.167
Article Google Scholar
Siroky DS (2009) Navigating random forests and related advances in algorithmic modeling. Stat Surv 3:147–163. https://doi.org/10.1214/07-ss033
Article Google Scholar
Stigter TY, Ribeiro L, Dill AMMC (2005) Evaluation of an intrinsic and a specific vulnerability assessment method in comparison with groundwater salinisation and nitrate contamination levels in two agricultural regions in the south of Portugal. Hydrogeol J 14(1–2):79–99. https://doi.org/10.1007/s10040-004-0396-3
Article Google Scholar
Tabyaoui FZ, Sahbi H, Elouazzani A, Chadli K, Essahlaoui A, Elouali A, Rouai M (2004) Etat de la pollution par les nitrates dans des eaux de la nappe plio-quaternaire du plateau de Meknès (Maroc). Geomaghreb, n°2, 63-75
Taltasse P (1953) Recherche géologique et hydrogéologique dans le bassin de Fès-Meknès. Notes et mémoires Service Géologique, Maroc, n°115, p 300
Tilahun K, Merkel BJ (2009) Assessment of groundwater vulnerability to pollution in Dire Dawa, Ethiopia using DRASTIC. Environ Earth Sci 59(7):1485–1496. https://doi.org/10.1007/s12665-009-0134-1
Article Google Scholar
Ward MH, Dekok TM, Levallois P, Brender J, Gulis G, Nolan BT, VanDerslice J (2005) Workgroup report: drinking-water nitrate and health-recent findings and research needs. Environ Health Perspect 113(11):1607–1614. https://doi.org/10.1289/ehp.8043
Article Google Scholar
Wheeler DC, Nolan BT, Flory AR, DellaValle CT, Ward MH (2015) Modeling groundwater nitrate concentrations in private wells in Iowa. Sci Total Environ 536:481–488. https://doi.org/10.1016/j.scitotenv.2015.07.080
Article Google Scholar
Yang J, Griffiths J, Zammit C (2019) National classification of surface–groundwater interaction using random forest machine learning technique. River Res Appl. https://doi.org/10.1002/rra.3449
Article Google Scholar
Zabihi M, Pourghasemi HR, Pourtaghi ZS, Behzadfar M (2016) GIS-based multivariate adaptive regression spline and random forest models for groundwater potential mapping in Iran. Environ Earth Sci. https://doi.org/10.1007/s12665-016-5424-9
Article Google Scholar
Zarabi M, Jalali M (2012) Leaching of nitrogen from calcareous soils in western Iran: a soil leaching column study. Environ Monit Assess 184(12):7607–7622. https://doi.org/10.1007/s10661-012-2522-3
Article Google Scholar

Download references

Author information

Authors and Affiliations

Laboratory of Geo-Engineering and Environment, Department of Geology, Faculty of Sciences, Moulay Ismail University, P.O. Box 11202, Zitoune, Meknes, Morocco
Abdelhakim Lahjouj, Abdellah El Hmaidi & M’hamed Boufala
Laboratory of Water, Soil, and Plant, National Institute of Agricultural Research, Regional Center of Meknes, km 10, Haj Kaddour Road, P.O. Box 578 (VN), Meknes, Morocco
Abdelhakim Lahjouj & Karima Bouhafa

Corresponding author

Correspondence to Abdelhakim Lahjouj.

About this article

Cite this article

Lahjouj, A., El Hmaidi, A., Bouhafa, K. et al. Mapping specific groundwater vulnerability to nitrate using random forest: case of Sais basin, Morocco. Model. Earth Syst. Environ. 6, 1451–1466 (2020). https://doi.org/10.1007/s40808-020-00761-6

Received: 24 December 2019
Accepted: 21 March 2020
Published: 04 April 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s40808-020-00761-6

Introduction