Introduction

Groundwater sources are the most crucial, dependable and valuable sources of water in all climatic regions in the world (Sacco et al. 2006). The demand for groundwater continues to increase because of population growth, agricultural requirements, urbanisation (Ettazarini 2007) and rapid industrialisation (Pradhan 2009). Groundwater has more benefits than surface water. Groundwater exhibits better quality, is less exposed to seasonal and perennial fluctuations and is more protected from pollutants and infections than surface water (Green et al. 2008). Hydro-technical facilities for surface water require larger investment than groundwater facilities that can be developed gradually (Nampak et al. 2014). Many models have been developed to assess groundwater quality and quantity, such as DRASTIC, neuro-fuzzy classifier and SINTAC, which are designed for groundwater risk and vulnerability assessment (Neshat and Pradhan 2015; Neshat et al. 2015). The Italian experience with SINTAC was a good example, particularly when it was associated with the hazard parameters for nitrate risk assessment (Sacco et al. 2007; Capri et al. 2009; Ghiglieri et al. 2009).

Padovani and Trevisan (2002) proposed IPNOA as the nitrate hazard index (HI) to detect the potential hazard to groundwater of nitrate contamination from agricultural activities at the provincial and regional scales. This approach involves the parametric analysis of all of the parameters under consideration in the potential groundwater contamination hazard (Raaz Maheshwari et al. 2012). Then, a progressive score is allocated to each parameter in accordance with its importance determined through the evaluation (Padovani and Trevisan 2002). This method is aimed at evaluating the nitrate HI from domestic sources (Capri et al. 2009).

Leakages from sewage, mishandling of wastewater effluents, improper denitrification during wastewater treatment and inadequate curing using fertilisers and wastes of animals are the major sources of nitrate contents in groundwater (Zhao et al. 2013). These sources lead to the contamination of aquifers, particularly where groundwater replenishment occurs directly from surface water (DeSimone and Howes 1998; Vikas et al. 2015). Nitrate contamination is often attributed to human activities on the ground that involve fertilisation of agricultural products (Gross 2008). Contaminated groundwater is not easily remediated after occurrence; thus, prevention is the most effective strategy in water quality management (Kowal and Polik 1987; Re et al. 2017). According to Into (2011), the concentrations of nitrate contamination in groundwater exceed the standard limits for freshwater (50 mg N–NO3/L) in Europe in approximately 22% of planted areas (Sacco et al. 2007). Comparable levels, which can cause cancer for those who consume such water, were also observed in China and USA (Cantor 1997).

The application of geographic information systems (GIS) and remote sensing (RS) techniques allows the exploration of groundwater resources in a range of hydrogeological settings (Gupta and Srivastava 2010; Mishra et al. 2014; Nampak et al. 2014). Artificial intelligence and machine learning in GIS have been applied to a vast range of industries and applications with large amounts of data, and deep learning can help automate the extraction of information from visual datasets, which was previously impossible (Neshat and Pradhan 2015; Mezaal et al. 2017). The analysis of large amounts of hydrogeological data and the simulation of complex subsurface flow and transport processes can be conducted using the combination of GIS and RS systems (Mojaddadi et al. 2009; Althuwaynee et al. 2012, 2014; Bui et al. 2017a, b; Chen et al. 2017; Oh and Pradhan 2011; Pradhan 2013; Tehrany et al. 2013, 2014, 2015; Umar et al. 2014; Abdulkareem et al. 2018a, b; Rizeei et al. 2018a, b; Kordestani et al. 2018; Golkarian et al. 2018).

The integration of RS and GIS techniques to determine aquifer system suitability in Ghana was proposed by Gumma and Pavelic (2013). The main aim was to improve the development of groundwater in agricultural and urban purposes (Alwathaf and El Mansouri 2011). The significance of the integration of RS and GIS techniques in the assessment of groundwater projects has been emphasised in many studies (Shahid et al. 2002; Sener et al. 2005; Shaban et al. 2006; Nampak et al. 2014).

Most studies related to groundwater pollution have concentrated on vulnerability assessment rather than hazard evaluation; however, high groundwater vulnerability does not necessarily represent a high level of hazard to groundwater (Spalding and Exner 1993; Lake et al. 2003; Raaz Maheshwari et al. 2012). Generally, most nitrate contamination in groundwater originates from agricultural areas, but residential and industrial areas also contribute because of leakages from sewers and discharges from domestic activities not connected to sewage systems (Boy Roura 2013). The IPNOA model does not consider the soil types and roles of land use/land cover (LULC) in nitrate risk and hazard modelling. The IPNOA model runs based on parametric sample points from wells (Ghiglieri et al. 2009). The contributing hazard and control factors (HFs and CFs, respectively) are calculated with an equal degree of importance, which is not an efficient method of conducting nitrate concentration analysis.

In the current research, we tried to overcome the previous drawbacks of groundwater nitrate modelling. Therefore, the IPNOA model was used to determine the nitrate concentration in agricultural lands using HFs and CFs. The soil type parameter was added to the IPNOA equation in addition to a detailed LULC extracted from SPOT-5 satellite imagery to improve the quality of contributing factors. The IPNOA model was optimised by the logistic regression (LR) model for the first time to assign the proper weight to HFs and CFs to deal with skewness and uncertainty from the standalone IPNOA model.

Material and methods

Study area

The investigation was conducted in Saladin Province of Iraq, which is located north of the capital Baghdad. Saladin Province covers an area of approximately 24,363 km2 and has a population of 1,042,200 persons. Saladin Province has two main cities, that is, the capital city Tikrit and the city of Samarra. Saladin Province was a part of the capital Baghdad before 1976. The study area is geographically located at 43° 20′ 19.14″ E, 43° 60′ 59.74″ E and 34° 41′ 30.14″ N, 34° 28′ 30.74″ N. The area has a variety of LULC types (i.e. urban, bare and agricultural lands). The study area is composed of sandy to gravel soil types, and the average humidity is 33%. The amount of precipitation ranges from 600 to 800 mm/year, and the annual average temperature ranges between 15 and 23 °C. The elevation varies from 69 to 178 m. The formation of the study area shows almost similar lithological composition of limestone material (Abdula 2016).

Nitrate contamination of groundwater in the study area is considered moderate with regard to its geological structure, that is, sandy dunes soil (AL-Dulaimi and Younes 2017). Several studies conducted in Tehama Basin have reported nitrate pollution in groundwater, which is one of the major sources of drinking water for inhabitants. The high concentration of nitrate is a result of decomposing organic matters and increasing use of fertiliser and sewage water because most of the cities and villages in the study area have inadequate sanitation system coupled with shallow water content. Figure 1 illustrates the location of study area.

Fig. 1 Location of study area

Datasets

The data used in this work are divided into three groups, that is, independent, dependent and analysis data. The dependent data comprised groundwater nitrate concentrations (in mg/L) sampled from 185 wells. The data were obtained from hydrogeological and hydrochemical investigations conducted by the government of Saladin. The samples were analysed in the laboratory. The field-data-collection campaign commenced on 25 November 2014 and was completed on 31 December 2016. Parameters, such as rainfall, TEM, pH, EC and TDO, were recorded from the field as dependent data.

High-resolution SPOT-5 satellite image was used to extract LULC as independent explanatory data (Aal-shamkhi et al. 2017). Other data, such as well depth (shallow and deep wells), were also recorded.

Nitrate groundwater quality in Saladin ranged from 1.3 to 203 mg/L, and the average of 46 mg/L was reported from 185 well samples, of which 65 showed values higher than 50 mg/L. The inventory points of nitrate samples were split into two groups, with the first group being the training group (70%) and the second group being the testing group (30%) (Fig. 2). The model was trained using 70% of the samples and tested using the remaining 30%.
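The 70/30 partition of the inventory points can be sketched as follows. This is only an illustration under stated assumptions: the helper name `split_wells`, the fixed seed and the use of simple random sampling are hypothetical, because the text does not state how wells were assigned to each group.

```python
import random


def split_wells(well_ids, train_fraction=0.7, seed=42):
    """Randomly split well identifiers into training and testing groups.

    Hypothetical helper: simple random sampling and the seed are assumptions,
    not details reported in the study.
    """
    ids = list(well_ids)
    rng = random.Random(seed)  # local generator so the split is reproducible
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor; the remainder is used for testing
    return ids[:n_train], ids[n_train:]


# 185 wells, labelled 1..185 purely for illustration
train, test = split_wells(range(1, 186))
print(len(train), len(test))
```

With 185 wells, flooring the training share leaves 129 wells for training and 56 for testing.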

Fig. 2 Inventory points of nitrate samples in the study area

Methods

Three main technical parts comprise the framework of the current study, that is, satellite image classification, IPNOA parameter extraction and statistical model construction, as shown in Fig. 3.

Fig. 3 Methodology flowchart applied in this study

Several programs and models were used in this analysis. ENVI 5.4 was used for satellite image processing, ArcMap 10.5 was used for spatial modelling and mapping and SPSS 9.2 was used for statistical-data-driven modelling.

LULC extraction

LULC is one of the primary factors that contributes immensely to the geohazard and risk in groundwater and surface water (Hossein Mojaddadi et al. 2017). The nitrates percolate through the soil depending on the type of soil and LULC. In a densely populated environment, nitrates from septic tanks seep down and fertilisers from LULC accumulate and contaminate shallow groundwater until it exceeds the safe drinking water standard level (Wick et al. 2009). For optimum outcome, the variables and subdivision should be selected precisely because of the sensitive correlation between nitrate concentration and LULC (Min et al. 2003).

High-resolution satellite imagery, such as IKONOS and SPOT, has been recommended for extracting a detailed LULC map (Aal-shamkhi et al. 2017; Abdullahi et al. 2017). The SPOT-5 satellite image has high-resolution characteristics and provides a 5-m multispectral resolution. The image was captured in December 2015. All necessary pre-processing approaches were implemented on the image (i.e. radiometric, geometric and atmospheric corrections). In object-based image analysis (OBIA), multiresolution segmentation is the first step of processing and has three components, namely, scale, merge and compactness, which should be defined carefully before applying the support vector machine (SVM) classifier to the image (Rizeei et al. 2018a, b). The Taguchi method was employed to select the best combination of scale, merge and compactness factors to optimise the segments. Meanwhile, the radial basis function kernel type was selected for the SVM classifier. The optimal parameters of segmentation and the applied classifier details are shown in Table 1.

Table 1 OBIA segmentation and classification in detail

The extracted LULC map in this study area is categorised into five classes, namely, urban land, irrigated agricultural land, bare land, water body and rain-fed agricultural land. However, the study area is mostly covered by agricultural lands, as shown in Fig. 4.

Fig. 4 Classified LULC map by OBIA-SVM classifier

Implementation of the IPNOA model

The distribution of nitrogen sources in agricultural lands exhibits a potential negative effect on groundwater (Bone et al. 2010). HFs comprise waste treatment sludge (HFfd), organic fertilisers (HFfo) and non-organic fertilisers (HFfm). Conversely, CFs are employed to assess the influence of nitrate activities in terms of site and farm situation in the area under consideration (Padovani and Trevisan 2002). Leaching of nitrogen content in soil (CFa), agronomic practices (CFpa), climate (CFc) and irrigation techniques (CFi) are some of the factors that regulate water balance. The factors were combined by assigning a score that classifies the level of nitrogen or the effect (positive, negative or neutral) of the factors involved in the leaching of nitrates (CF). The scores assigned to the used factors can be calculated using the conversion tables shown in Tables 2 and 3. The HF score ranges from zero to five, whereas the CF score ranges from 0.94 to 1.10. A score of zero represents non-agricultural land usage (urban and natural environments). Therefore, nitrate contamination from agricultural hazard is finally estimated by multiplying the sum of the scores of the HFs by the product of the CFs.

Table 2 Hazard factors score contributed in nitrate concentration
Table 3 Control factors score contributed in nitrate concentration

HF of mineral fertilisers

The average amount of nitrogen used for each crop grown was used to estimate the nitrogen input from mineral fertilisers (HFfm), which is based on the agronomic practices plotted on the LULC map.

HF of organic fertilisers

The information related to organic fertilisers on agricultural lands was extracted from the site survey. Hence, the identification of cattle farm location, which is responsible for animal waste (HFfo), is easy. However, only a few were observed to operate in such farms in the area under consideration and the disposal of animal waste was easily delineated geographically. The type and amount of waste and their corresponding scores were utilised to estimate nitrogen concentrations in farmlands. The organic fertiliser map was created based on the geostatistical interpolation method (i.e. kriging interpolation technique) using the spatial analyst tool in the GIS framework.

HF of sludge

Generally, sludge spreading (HFfd) is not practiced in all farms in the study area. Therefore, a score of 1 was assigned to this factor. However, in several farms near streams, the amount of sludge was assigned a score of 2. Figure 5 shows the combined HFs of mineral fertilisers, organic fertilisers and sludge, which were implemented by the spatial analyst tool in ArcMap.

Fig. 5 Combined hazard factors (HF) index in study area

Nitrate contamination originating from agricultural activities can be estimated by the potential HI (Raaz Maheshwari et al. 2012).

CF of soil nitrogen content

This factor, which reflects nitrates of household and agricultural origin, is designed to measure the N loading and leakage from wastewater pipes. Analytical data collected during soil surveys were used to calculate the soil nitrogen content (CFa). The total nitrogen content of the arable stratum was extracted from the data and spatialised through geostatistical interpolation using the kriging method (Vendrusculo et al. 2002). Figure 6a shows the map of soil nitrogen content.

Fig. 6 CFs of a soil nitrogen content, b climate, c agronomic practices, d irrigation types, and e soil types

CF of climate

The spatial data covering a 20-year time series (from 1995 to 2015) were used for temperature and precipitation. In the original procedure, temperature and rainfall in the baseline class are assigned a score of one, and areas with low annual mean temperature and/or high rainfall are regarded as more hazardous and receive a score higher than one. In the study area, precipitation ranges from 600 to 800 mm/year and the annual average temperature ranges between 15 and 23 °C. According to the IPNOA climate scheme, a score of 0.94 applies where rainfall is less than 600 mm/year and annual mean temperature is more than 16 °C. The entire area obtained the same value of 0.98 because no significant spatial variation of rainfall was observed over the study area (Fig. 6b).

CF of agronomic practices

The agronomic practices CFpa were established based on LULC depending on field surveys conducted for each crop type. Traditional agronomic practices, such as fertile irrigation or tillage/local fertilisation, are usually implemented in several areas. In certain circumstances, the score of the most effective agricultural practice is used. Figure 6c shows the agronomic map of the study area.

CF of irrigation

The integration of layer information and field observation was used to produce the irrigation control factor (CFi) of LULC. This factor is appropriate for the irrigation technique because of its capability to transfer pollutants towards aquifers in direct connection with the efficiency of irrigation drainage. From the surveys conducted on different irrigation types on agricultural land, the scores assigned to LULC classes are shown in Fig. 6d. The detailed LULC extracted from satellite images can considerably enrich the accuracy of the CFi value.

CF of soil characteristics

Soil is the major channel through which contaminants penetrate groundwater, depending on the type of contaminant and the soil texture. Soil texture and nitrate contamination exhibit a positive correlation and are, thus, considered important factors affecting nitrate concentration in groundwater (DeSimone and Howes 1998). Conversely, soil texture is observed to be related to water input, nitrate application rate and evapotranspiration, which are regarded as the major factors determining the nitrate fluxes in groundwater (Liao et al. 2012). In this study, the soil comprises six types ranging from sandy to gravel, with the gypsiferous gravel type covering most of the area (Fig. 6e).

The soil type factor is added to the IPNOA model to improve nitrate concentration analysis. The soil characteristic CF data were collected during the surveys and the classes were determined based on the soil media, as shown in Table 3 and Eq. (1).

The hazard index result is achieved by multiplying the HFs by the CFs, as shown in Eq. (1):

$$ \mathrm{HI}=\left({\mathrm{HF}}_{\mathrm{f}}+{\mathrm{HF}}_{\mathrm{m}}+{\mathrm{HF}}_{\mathrm{s}}\right)\times {\mathrm{CF}}_{\mathrm{a}}\times {\mathrm{CF}}_{\mathrm{c}}\times {\mathrm{CF}}_{\mathrm{a}\mathrm{p}}\times {\mathrm{CF}}_{\mathrm{i}}\times {\mathrm{CF}}_{\mathrm{s}}, $$
(1)

where the subscripts f, m, s, a, c, ap, i and s represent fertilisers, manure, sludge, nitrogen content, climate, agronomic practices, irrigation and soil, respectively. The total incidence of the CFs is obtained by multiplying the single factors to restrict the weight of the parameters. The HFs are evaluated with respect to the real influence of the nitrogen load. The range between 0 and 5 is specified for HF and that between 0.94 and 1.10 is specified for CF. Finally, the IPNOA is estimated from the HI through the classification of the resultant values based on the percentile of the 135,125 possible combinations, which is scaled between 1 and 6 (Capri et al. 2009).
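The arithmetic of Eq. (1) can be sketched per pixel or per land parcel as follows. The function name `hazard_index` and the example scores are illustrative assumptions; they are not taken from the study's conversion tables.

```python
from math import prod


def hazard_index(hf_scores, cf_scores):
    """Eq. (1): sum of the hazard-factor scores times the product of the control-factor scores."""
    return sum(hf_scores) * prod(cf_scores)


# Illustrative scores only (hypothetical values, not the study's data).
hf = {"fertilisers": 3, "manure": 2, "sludge": 1}
cf = {
    "nitrogen_content": 1.02,
    "climate": 0.98,
    "agronomic_practices": 1.04,
    "irrigation": 1.06,
    "soil": 1.00,
}

hi = hazard_index(hf.values(), cf.values())
print(round(hi, 3))

# A non-agricultural parcel has HF scores of zero, so its hazard index is zero.
hi_urban = hazard_index([0, 0, 0], cf.values())
```

Because each CF is multiplied in rather than added, a single CF can shift the index by only a few percent, which is how the product form restricts the weight of the control parameters.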

Therefore, the IPNOA raw data are ranked based on the 135,125 possible combinations into the classes previously mentioned (Capri et al. 2009) according to their hazard level shown in Table 4. A value of 1 represents areas with the least hazard, whereas a value of 6 represents the most hazardous zone.

Table 4 Hazard indices, IPNOA, and relative classification by Capri et al. (2009)

Implementation of the LR model

The LR model is considered one of the most commonly used multivariate statistical models and is often cited as one of the most efficient data-driven techniques in several applications (Pradhan and Lee 2010; Hossein Mojaddadi et al. 2017). The LR model defines rigid assumptions, which are considered obstacles to the approaches used in this study (Benediktsson et al. 1990). The LR model is also difficult to use in real-life applications. However, statistical approaches based on LR could overcome these obstacles and create an easy approach for analysis that does not need a pre-assumption and can be used with other bivariate statistical analysis (BSA) methods, such as frequency ratio (Ayalew and Yamagishi 2005). Although the multivariate LR method is stronger than the other statistical methods, it still has several disadvantages that hinder the analysis of the classes of each nitrate conditioning factor. To overcome these weaknesses, many studies used LR as bivariate method to solve the problem; however, LR has some limitations in performing BSA as it uses the classes as an indicator and does not consider it in the analysis (Süzen and Doyuran 2004).

The dependent variable in this method is considered a binary variable that represents the absence or presence of nitrate (0 and 1, respectively). The binary model is a model that uses logical expressions to select spatial features from composite feature layers or multiple rasters. The output of the binary model is in binary format, that is, 1 (true) for spatial features that meet the selection criteria and 0 (false) for features that do not. This method also represents the probability in the range of [0, 1] on an S-shaped curve. The use of a few parameters with small pixels is recommended to obtain fast and reliable results (Bai et al. 2012). With the derived logistic coefficients, the probability (p) of nitrate concentration was calculated using Eq. (2):

$$ p=1/\left(1+{e}^{-z}\right) $$
(2)

where p is the probability of nitrate occurrence, with a value between 0 and 1 on an S-shaped curve. z denotes a linear combination of the conditioning factors, and LR involves fitting Eq. (3) to the data, as follows:

$$ z={b}_0+{b}_1{x}_1+{b}_2{x}_2+\dots +{b}_n{x}_n $$
(3)

where b0 is the intercept of the model, bi (i = 1, 2, …, n) represents the coefficients of the LR model and xi (i = 1, 2, …, n) denotes the conditioning factors (Lee and Sambath 2006).
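Equations (2) and (3) can be evaluated together in a few lines. The sketch below assumes the coefficients have already been fitted; the function name `logistic_probability` and the numeric values are hypothetical.

```python
import math


def logistic_probability(intercept, coefficients, factors):
    """Eqs. (2)-(3): p = 1 / (1 + exp(-z)) with z = b0 + sum(b_i * x_i)."""
    if len(coefficients) != len(factors):
        raise ValueError("one coefficient is needed per conditioning factor")
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values chosen so that z = -1.0 + 0.5 * 2.0 + 1.5 * 0.0 = 0
p = logistic_probability(-1.0, [0.5, 1.5], [2.0, 0.0])
print(p)
```

When z is zero the probability sits at the midpoint of the S-shaped curve; positive z pushes it towards 1 and negative z towards 0.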

Results and discussions

OBIA-SVM-extracted LULC result

As shown in Table 5, most of the study area is covered by bare land (681.43 km2), followed by irrigated agricultural land (471.15 km2). Rain-fed agricultural land is the smallest among all classes, showing that the agricultural industry in Saladin depends on watering and irrigation systems.

Table 5 Extracted LULC areas using OBIA-SVM approach

The LULC overall accuracy and kappa coefficient were calculated as 87.05% and 0.839, respectively, using the confusion matrix and ground truth points.
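For readers unfamiliar with these two measures, the following sketch shows how overall accuracy and the kappa coefficient are derived from a confusion matrix. The matrix values are made up for illustration and are not the study's ground-truth results; the function name is hypothetical.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of ground-truth points of class i labelled as class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up two-class matrix for illustration only
example = [[40, 10],
           [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(accuracy, kappa)
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance alone.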

IPNOA results

When the hazard and control parameters have been extracted, the nitrate concentration index map is calculated using the IPNOA model (Fig. 7a) and classified based on the hazard level scheme, as shown in Fig. 7b.

Fig. 7 a Nitrate concentration index map and b agricultural nitrate hazard map using IPNOA model

Correlation analysis was conducted between the parameters using different forms and methods, and p values, calculated at the 95% confidence level, were used to test the statistical significance of the results.

No parts of this study area were categorised as high or very high hazard, whereas most of the study area (81.54%) is in the safe zone with respect to nitrate hazard. The moderate nitrate hazard zone, which covers less than 1% of the area, represents the smallest class, as shown in Table 6.

Table 6 Area of nitrate hazard classes derived from IPNOA model

LR optimisation results

Weightage of the conditioning parameters was derived to determine the significance of the sequence of predictors. The rank of each parameter was determined by LR in the statistical software Weka and subsequently overlaid using spatial analyst tools. Figure 8 shows the final weightage derived by the LR model for all conditioning parameters.

Fig. 8 Weightage for CFs and HFs calculated by LR

From the results of LR modelling shown in Fig. 8, all HFs achieved less than 0.5 weightage, which is considered insignificant to nitrate concentration. Meanwhile, CFap achieves the highest value (3.28), which is the most significant factor to trigger nitrate concentration. CFa and CFs are ranked second and third, with values of 2.35 and 2.13, respectively. From the calculated weightages for each parameter using the statistical LR, the IPNOA model was optimised.

$$ \mathrm{HI}=\left[\left({\mathrm{HF}}_{\mathrm{f}}\times 0.09\right)+\left({\mathrm{HF}}_{\mathrm{m}}\times 0.07\right)+\left({\mathrm{HF}}_{\mathrm{s}}\times 0.04\right)\right]\times \left({\mathrm{CF}}_{\mathrm{a}}\times 2.351\right)\times \left({\mathrm{CF}}_{\mathrm{c}}\times 0.75\right)\times \left({\mathrm{CF}}_{\mathrm{ap}}\times 3.287\right)\times \left({\mathrm{CF}}_{\mathrm{i}}\times 1.567\right)\times \left({\mathrm{CF}}_{\mathrm{s}}\times 2.127\right) $$
(4)

Values were computed by the raster calculator of the ArcGIS software using Eq. (4).
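Outside a GIS raster calculator, the same weighted index can be reproduced for a single cell as sketched below. The weights are those reported in Eq. (4); the dictionary keys, the function name `weighted_hazard_index` and the example scores are assumptions introduced for illustration.

```python
from math import prod

# LR-derived weights as reported in Eq. (4); the key names are illustrative.
HF_WEIGHTS = {"fertilisers": 0.09, "manure": 0.07, "sludge": 0.04}
CF_WEIGHTS = {
    "nitrogen_content": 2.351,
    "climate": 0.75,
    "agronomic_practices": 3.287,
    "irrigation": 1.567,
    "soil": 2.127,
}


def weighted_hazard_index(hf, cf):
    """Eq. (4): weighted sum of HF scores times the product of weighted CF scores."""
    weighted_hf = sum(hf[name] * weight for name, weight in HF_WEIGHTS.items())
    weighted_cf = prod(cf[name] * weight for name, weight in CF_WEIGHTS.items())
    return weighted_hf * weighted_cf


# Hypothetical scores for one raster cell (not taken from the study's data)
hf_cell = {"fertilisers": 3, "manure": 2, "sludge": 1}
cf_cell = {"nitrogen_content": 1.02, "climate": 0.98,
           "agronomic_practices": 1.04, "irrigation": 1.06, "soil": 1.00}
hi_cell = weighted_hazard_index(hf_cell, cf_cell)
print(round(hi_cell, 3))
```

Applied to whole rasters, the same expression would be evaluated cell by cell, which is what a raster calculator does.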

Optimised LR-IPNOA results

For the nitrate concentration index, after calculating the LR coefficients from eight nitrate contributing factors, the optimised IPNOA was calculated (Fig. 9a) and classified based on the hazard level scheme, as shown in Fig. 9b.

Fig. 9 Map showing a nitrate concentration index and b agricultural nitrate hazard index using LR-IPNOA model

The classes with moderate potential hazard (major scores) are in the central part of the plain, where the most productive aquifers and the irrigated agricultural lands are situated; these lands are at the same time potential pollution concentration sources. Figure 9a shows the nitrate concentration index generated from the integrated LR and IPNOA model, and the nitrate index ranges from 2.8 to 8.04 for groundwater. This index indicates the estimated probability of groundwater nitrate contamination at each pixel in the presence of a given set of conditioning factors.

The quantified information shown in Table 7 indicates that 45.35% of the study area is unpolluted and unlikely to be affected by the nitrate hazard. Meanwhile, approximately 5% of the entire area is in the moderate hazard zone for nitrate concentration.

Table 7 Area of nitrate hazard classes derived from LR-IPNOA model

As with the standalone IPNOA model, no parts of this study area were categorised as high or very high hazard (Table 7).

Compared with the results obtained by the standalone IPNOA model, the area of the moderate zone derived using the LR-IPNOA model is 68 km2 larger. Moreover, nearly 45% of the study area is classified as “unlikely to nitrate hazard” by the LR-IPNOA model, whereas 81% is placed in the same class by the IPNOA model. The field survey shows that the LR-IPNOA accurately models the nitrate hazard where more than 5% of the study area is affected by nitrate pollution.

Validation

According to Shahid et al. (2000), existing groundwater borehole wells can be used to assess the precision of a GIS model of the anticipated groundwater potential map. Pradhan (2009) mentioned that validation is one of the most important processes of modelling, and without validation, the models will be of no scientific significance (Neshat et al. 2014).

The validation of groundwater nitrate concentration was conducted using information extracted from 30% of the inventory wells, that is, 56 wells sampled over the study area. The nitrate concentration indices projected by the IPNOA and LR-IPNOA models were compared with the sampling measurements for validation. As shown in Fig. 10, the calculated area under the curve indicates accuracies of 85.3% and 91.32% for the nitrate concentration indices from the IPNOA and LR-IPNOA models, respectively. This result is reasonable for regional nitrate hazard analysis and indicates that the optimised LR-IPNOA model determines nitrate hazard contamination more accurately than the standalone IPNOA model.
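The area under the ROC curve can be computed directly from the model index at each validation well and a binary label for nitrate presence. The sketch below uses the rank-based definition of the AUC. The function name `roc_auc` and the toy data are assumptions, and the text does not state how the measured concentrations were converted to presence or absence.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive well outscores a random negative one.

    labels are 1 (nitrate present) or 0 (absent); tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (not the study's validation wells)
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a model that ranks wells no better than chance, and 1.0 to a perfect ranking.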

Fig. 10 ROC curve accuracy assessment (standalone IPNOA and integrated LR and IPNOA models)

Conclusion

An approach for the complicated issue of groundwater management was presented in this study. During periods of acute water shortages in the catchment area, groundwater remains as the most viable source of potable drinking water, domestic, irrigation, livestock watering and industrial usage. Therefore, the protection of groundwater is of paramount importance.

To reduce uncertainty and skewness, the parametric IPNOA model was extended with soil and LULC types and optimised using the data-driven LR model. We examined conditioning factors, such as waste treatment sludge (HFfd), organic (HFfo) and non-organic fertilisers (HFfm), climate (CFc), nitrogen content in soil (CFa), irrigation techniques (CFi), agronomic practices (CFpa) and soil types (CFs), to determine their degree of contribution to groundwater nitrate hazard. LULC was extracted from a high-resolution SPOT-5 satellite image using the OBIA-SVM method with 87.05% overall accuracy.

No part of this study area was categorised as “high” or “very high” hazard area for nitrate concentration. However, the moderate zone to nitrate hazard represented the smallest class. The moderate potential hazard classes were in the irrigated agricultural plain, where the most productive aquifers were situated. The optimised LR-IPNOA predicted a larger area with moderate hazard than the IPNOA model. The field surveys showed that the LR-IPNOA accurately modelled the nitrate hazard concentration, and more than 5% of the study area was affected by nitrate pollution. With regard to receiver operating characteristic assessment, the optimised LR-IPNOA model showed better accuracy than the standalone IPNOA model for regional nitrate hazard analysis (i.e. 91.32% and 85.3%, respectively).

Our results showed that CFs and HFs do not contribute equally to nitrate concentration modelling. Thus, IPNOA should be integrated with machine learning, deep learning, data-driven or multivariate statistical models to achieve the ideal performance.

This research was necessary because of the rapid development of human activities in the Saladin area, which increases water demand during certain times of the year and/or during periods of drought. This demand often exceeds the supplies stored in reservoirs. However, groundwater pollution remains a challenging phenomenon and must be examined at a large scale with several models under different climate conditions.