Introduction

The Harran Plain has been undergoing large land use changes due to population growth and the accompanying industrial, commercial and agricultural development. These activities have produced multiple potential sources of contaminants from manure and artificial fertilizers, landfills, accidental spills, and domestic or industrial effluent discharges. Among these sources, agriculture-related activities are well-known non-point sources of pollution. Agricultural activities may degrade groundwater quality in small to large watersheds, especially due to uncontrolled use of fertilizers and various carcinogenic pesticides (Almasri and Kaluarachchi 2005). Variation in groundwater quality is a function of physical and chemical parameters that are greatly influenced by geological formations and anthropogenic activities (Subramani et al. 2005).

Turkey is currently engaged in a large integrated water resources development project in its semi-arid southeastern region. Commonly referred to by its Turkish acronym GAP, the Southeastern Anatolia Project includes 22 dams in the upper Euphrates–Tigris Basin, and aims to provide irrigation for 1.7 million hectares of land by 2015 (Yesilnacar and Gulluoglu 2008). As in other semi-arid and arid parts of the world, water is a valuable resource in the GAP region. Despite the large quantities of water currently available from the Euphrates and Tigris rivers, it has become increasingly important to improve management of these resources (Ozdogan et al. 2006), as the GAP region, and the Harran Plain in particular, faces problems of salinity, excessive and uncontrolled irrigation, an insufficient drainage system and an increased groundwater level caused by irrigation (Kendirli et al. 2005). An unconfined aquifer lacks a low-permeability confining layer overlying the aquifer. This makes unconfined aquifers susceptible to contamination from various human activities. Hence, unconfined aquifers, such as that of the Harran Plain, should be monitored to protect these vulnerable water resources from contamination.

Electrical conductivity (EC) and sodium adsorption ratio (SAR) are two significant parameters to be considered when assessing irrigation water quality for potential water infiltration problems. Waters containing high Na+ and low Ca2+ and Mg2+ have a high SAR value and may destroy the soil structure through dispersion of clay particles. Dispersion of clay particles in turn reduces the number of large pores, which are responsible for aeration and drainage, in the soil. SAR can be computed using the equation (Eq. 1) below (Devadas et al. 2007):

$$ {\text{SAR}} = [{\text{Na}}^{ + } ]/(( [ {\text{Ca}}^{ 2+ } ]+ [{\text{Mg}}^{ 2+ } ])/ 2)^{ 1/ 2} $$
(1)

where the ion concentrations are in meq/L.
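As a minimal sketch of Eq. 1, the following Python function computes SAR from ion concentrations in meq/L. The helper `mg_per_l_to_meq_per_l`, the equivalent weights (molar mass divided by ionic charge) and the sample concentrations are illustrative assumptions, not values from the study.

```python
import math

# Equivalent weights in mg/meq (molar mass / ionic charge). These are
# assumed here for illustration; they are not given in the text.
EQUIVALENT_WEIGHT = {"Na": 22.99, "Ca": 40.08 / 2, "Mg": 24.305 / 2}


def mg_per_l_to_meq_per_l(ion: str, mg_per_l: float) -> float:
    """Convert a concentration from mg/L to meq/L."""
    return mg_per_l / EQUIVALENT_WEIGHT[ion]


def sar(na_meq: float, ca_meq: float, mg_meq: float) -> float:
    """Sodium adsorption ratio (Eq. 1); all inputs in meq/L."""
    return na_meq / math.sqrt((ca_meq + mg_meq) / 2.0)


# Hypothetical sample: 2 meq/L Na+, 5 meq/L Ca2+, 3 meq/L Mg2+
print(sar(2.0, 5.0, 3.0))  # 2 / sqrt(4) = 1.0
```

Concentrations reported in mg/L would first be passed through the conversion helper so that Eq. 1 receives values in meq/L.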

According to regulations in Turkey (Turkish Ministry of Environment and Forestry 2004), waters having SAR value below 18 can be classified as “good” for irrigation purposes and waters with SAR value higher than 26 cannot be used for irrigation. Hence, the quality of groundwater should be monitored as the aquifers are vulnerable to pollution by various sources.

Sulfate may originate from point and non-point sources. According to Turkish regulations (TS 266 2005), the sulfate concentration in drinking water should be below 250 mg/L. High sulfate concentrations also affect the taste of the water. Furthermore, sulfate concentrations above 500 mg/L may have a laxative effect on humans (Yesilnacar and Gulluoglu 2008; Hudak 2000). Generally, sulfate is beneficial in irrigation water, especially in the presence of calcium. However, high concentrations may be unsuitable and, with calcium, sulfate may form a hard scale in steam boilers (Hudak 2000). Sulfate in groundwater can be used by sulfate-reducing bacteria as an electron acceptor in the presence of an organic carbon source, producing sulfide, which causes precipitation of metals (Sahinkaya et al. 2007).

Consequently, it is very important to monitor or predict the SAR and sulfate concentration in groundwater by means of cost-effective technologies. In this context, black-box models like artificial neural networks (ANNs) are very attractive, as they do not require prior knowledge of the structure and relationships that exist between important variables. Moreover, their learning abilities make them adaptive to system changes (Strik et al. 2005). ANNs have already been used to simulate the effect of climate change on discharge and the export of dissolved organic carbon and nitrogen from river basins (Clair and Ehrman 1996), forecast salinity (Maier and Dandy 1996), simulate and forecast residual chlorine concentrations within urban water systems (Rodriguez and Sérodes 1998), determine the relationship between sewage odor and BOD (Onkal-Engin et al. 2005), determine the performance of a sulfidogenic bioreactor (Sahinkaya et al. 2007) and determine the leachate amount from a municipal solid waste landfill (Karaca and Özkaya 2006). There are also other similar applications of ANNs in the fields of environmental engineering and geosciences.

In the literature, there are also some ANN studies aiming to predict soil conditions and groundwater quality. Yesilnacar et al. (2008) developed an ANN model predicting the concentration of nitrate, the most common pollutant in shallow aquifers, in the groundwater of the Harran Plain. Das et al. (2010) used two computational intelligence techniques, ANN and support vector machine, to develop models predicting swelling pressure from the following inputs: natural moisture content, dry density, liquid limit, plasticity index and clay fraction. In another study, Benerjee et al. (2009) used a feed-forward ANN model to predict groundwater levels. Due to the complexity of hydrogeologic conditions in fractured rock and the scale of interest of the study domain, Mohammed et al. (2010) used a gray model that combines the finite element method (FEM) and ANN for more precise prediction of pore pressure changes.

As both sulfate and SAR are significant parameters for assessing groundwater quality, they should be monitored regularly by cost-effective and easy methods. To our knowledge, few studies (Lischeid 2001) have been conducted on ANN-based prediction of SAR and sulfate in groundwater. In this sense, this study aimed at ANN prediction of SAR values and sulfate concentrations in 24 representative observation wells in the Harran Plain.

Materials and methods

Description of the study area

The Harran Plain is located in the south-central part of the GAP Project within the Sanliurfa–Harran Irrigation District (Fig. 1), which is 30 × 50 km and located in a region of rolling hills and a broad plateau that extends south into Syria. The plain, the largest in the GAP region, has 141,500 ha of irrigable land, 3,700 km² of drainage area and 1,500 km² of plain area. Groundwater samples were taken monthly for 1 year from 24 representative observation wells (Fig. 1), which were drilled on the Pleistocene-aged unit during the 2006 water year. Detailed information on the geology and hydrogeology of the study area can be found elsewhere (Yesilnacar and Gulluoglu 2008).

Fig. 1

Location map of the study area (a) and the location of the sampling wells (b). Blue lines show the roads in the study area

Analytical methods

Electrical conductivity (EC), temperature, pH and groundwater level were measured with a SevenGo pro-SG7 conductivity meter, YSI 6600 sonde, a portable pH meter and an electric contact meter immediately after sampling in the field.

Concentrations of Ca2+, Mg2+ and Na+ were determined with a Varian flame atomic absorption spectrometer. The concentrations of sulfate (SO42−) and chloride (Cl−) were determined using Merck Spectroquant® test kits and a Merck Nova 60 photometer.

Total hardness (TH) of groundwater was calculated using the formula given below:

$$ {\text{TH (as mg CaCO}}_{3}/{\text{L)}} = ({\text{Ca}}^{2+} + {\text{Mg}}^{2+})\ {\text{meq}}/{\text{L}} \times 50 $$
(2)
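Eq. 2 translates directly into code. The sketch below is illustrative only; the function name and the sample concentrations are hypothetical.

```python
def total_hardness(ca_meq: float, mg_meq: float) -> float:
    """Total hardness in mg CaCO3/L (Eq. 2); both inputs in meq/L."""
    # One milliequivalent of CaCO3 weighs about 50 mg (100.09 g/mol / 2).
    return (ca_meq + mg_meq) * 50.0


# Hypothetical sample: 5 meq/L Ca2+ and 3 meq/L Mg2+
print(total_hardness(5.0, 3.0))  # (5 + 3) * 50 = 400.0
```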

Modeling

A neural network is defined as a system of simple processing elements, called neurons, which are connected into a network by a set of weights (Fig. 2). The network is determined by its architecture, the magnitude of the weights and the processing element’s mode of operation. The neuron is a processing element that takes a number of inputs, weights them, sums them up, adds a bias and uses the result as the argument for a single-valued function, the transfer function, which results in the neuron’s output (Strik et al. 2005). At the start of training, the output of each node tends to be small. Consequently, the derivatives of the transfer function with respect to the input, and the changes in the connection weights, are large. As learning progresses and the network reaches a local minimum in the error surface, the node outputs approach stable values. Consequently, the derivatives of the transfer function with respect to the input, as well as the changes in the connection weights, are small (Maier and Dandy 1996).
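The operation of a single neuron described above can be sketched as follows. The tan-sigmoid (hyperbolic tangent) transfer function is used as the default because it is the hidden-layer function named later in the text; the input values, weights and bias are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias, transfer=math.tanh):
    """Weight the inputs, sum them, add the bias, apply the transfer function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(net)


# Hypothetical values chosen so that the net input is exactly zero
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # tanh(0) = 0.0
```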

Fig. 2

The neural network structure for the prediction of SAR and sulfate concentration in monitoring wells

Back-propagation (BP) algorithms use input vectors and corresponding target vectors to train ANN. The standard BP algorithm is a gradient descent algorithm, in which the network weights are changed along the negative of the gradient of the performance function (Abdi et al. 1996; Nguyen and Widrow 1990). There are a number of variations in the basic BP algorithm that are based on other optimization techniques such as conjugate gradient and Newton methods. For properly trained BP networks, a new input leads to an output similar to the correct output. This ANN property enables training of a network on a representative set of input/target pairs and achieves sound forecasting results.
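The gradient descent rule behind standard BP, in which weights move along the negative of the gradient of the performance function, can be illustrated on a deliberately small problem. The sketch below fits a one-weight linear model rather than a full network; the data, the learning rate of 0.05 and the function names are assumptions made for the example.

```python
def gradient_descent_step(weights, gradients, learning_rate):
    """Move each weight along the negative of its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


def mse_gradient(w, xs, ts):
    """Gradient of the mean squared error of the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - t) * x for x, t in zip(xs, ts)) / n


# Hypothetical input/target pairs generated by t = 3 * x
xs, ts = [1.0, 2.0, 3.0], [3.0, 6.0, 9.0]
w = 0.0
for _ in range(200):
    (w,) = gradient_descent_step([w], [mse_gradient(w, xs, ts)], 0.05)
print(round(w, 6))  # approaches the generating slope 3.0
```

In a multi-layer network the same update is applied to every weight, with the gradients obtained by propagating the output error backwards through the layers.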

Although some researchers suggest that one hidden layer is usually sufficient (El-Din and Smith 2002), the introduction of additional hidden layers allows the fit of a larger variety of target functions and enables approximations of complex functions with fewer connection weights (Toth et al. 2000). In this study, a two-layer ANN with a tan-sigmoid transfer function for the hidden layer and a linear transfer function for the output layer was used. Figure 2 shows the ANN structure used in the study. The input and output parameters used in the ANN modeling are shown in Table 1. The data were divided into training, validation and test subsets. Half of the data were used for training, and one-fourth each for validation and testing.
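A minimal sketch of the network structure and the data division described above is given below. It assumes six inputs and 20 hidden neurons, as stated elsewhere in the text. The total of 288 samples (24 wells × 12 monthly samples, if none are missing), the random shuffling before the split and all weight values are assumptions for illustration; the text does not say how samples were assigned to the subsets.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer network: tan-sigmoid hidden layer, linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def split_half_quarter_quarter(samples, seed=0):
    """Shuffle, then split 50 % / 25 % / 25 % into train/validation/test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_val = len(shuffled) // 4
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Assumed sample count: 24 wells x 12 monthly samples
train_set, val_set, test_set = split_half_quarter_quarter(list(range(288)))
print(len(train_set), len(val_set), len(test_set))  # 144 72 72

# Hypothetical weights: 6 inputs, 20 hidden neurons, one output
y = forward([0.0] * 6, [[0.1] * 6] * 20, [0.0] * 20, [1.0] * 20, 0.5)
print(y)  # all hidden outputs are tanh(0) = 0, so only the output bias remains: 0.5
```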

Table 1 Input and output parameters in ANN modeling and linear correlation coefficients (R²) between input and output variables

As a preliminary statistical analysis before ANN study, the correlation matrix was used to explore the degree of linear correlation between the input and output variables (Table 1).
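The linear correlation between one input and one output variable can be computed as in the following sketch, which squares the Pearson coefficient to obtain R². The hardness and sulfate values are hypothetical and do not come from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total hardness (mg CaCO3/L) and sulfate (mg/L) values
hardness = [100.0, 200.0, 300.0, 400.0]
sulfate = [20.0, 45.0, 55.0, 80.0]
r = pearson_r(hardness, sulfate)
print(round(r ** 2, 3))  # 0.976
```

Repeating the calculation for every input/output pair yields the entries of a correlation matrix.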

Selection of back-propagation algorithm

BP neural networks have become a popular tool for modeling environmental systems (Maier and Dandy 1996). In this study, 12 BP algorithms were compared to select the best-fitting one. For all algorithms, we used a two-layer network with a tan-sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. During the selection of the BP algorithm, the number of neurons was kept constant at 20. The learning rate parameter may also play an important role in the convergence of the network, depending on the application and network architecture. The learning rate can be tuned to reduce the chance of the training process being trapped in a local minimum instead of the global minimum (Hamed et al. 2004). A larger learning rate involves a bigger step. If the learning rate is too large, the algorithm becomes unstable. If the learning rate is set too small, the algorithm takes a long time to converge. In addition, the momentum allows a network to respond not only to the local gradient, but also to recent trends in the error surface. Without momentum, a network may get stuck in a shallow local minimum (Hagan et al. 1996). In this study, the learning rate and the momentum constant were 0.1 and 0.9, respectively. The performance results of the model with each back-propagation algorithm are provided in Table 2. The performance of the BP algorithms was evaluated with the mean square error (MSE) and the correlation coefficient (R) between the modeled output and the measured data set. The best BP algorithm, with minimum training error and maximum R, was the Levenberg–Marquardt (trainlm) algorithm for both SAR values and sulfate concentrations (Table 2).
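The roles of the learning rate and the momentum constant can be illustrated with the sketch below, which uses the values reported in the text (0.1 and 0.9). The update rule shown is one common textbook formulation of gradient descent with momentum and is an assumption here; the text does not give the exact rule used by each of the 12 algorithms. The quadratic error function is hypothetical.

```python
LEARNING_RATE = 0.1  # value reported in the text
MOMENTUM = 0.9       # value reported in the text


def momentum_step(weight, gradient, previous_delta,
                  learning_rate=LEARNING_RATE, momentum=MOMENTUM):
    """One gradient-descent-with-momentum update for a single weight."""
    # The new change blends the previous change with the current gradient step.
    delta = momentum * previous_delta - learning_rate * gradient
    return weight + delta, delta


def mean_squared_error(outputs, targets):
    """Mean of the squared differences between model outputs and targets."""
    return sum((a - t) ** 2 for a, t in zip(outputs, targets)) / len(targets)


# Minimise the hypothetical error function E(w) = (w - 2)^2, gradient 2(w - 2)
w, delta = 0.0, 0.0
for _ in range(300):
    w, delta = momentum_step(w, 2.0 * (w - 2.0), delta)
print(round(w, 4))  # settles at the minimum, 2.0
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2 = 2.0
```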

Table 2 Comparison of back-propagation algorithms for predicting SAR and sulfate concentration in monitoring wells (neuron number was 20)

Optimization of neuron number

After selecting the best BP algorithm, the Levenberg–Marquardt (trainlm) algorithm, the number of neurons was optimized while keeping all other parameters constant (Table 3). For both output variables (SAR and sulfate concentration), the mean square error for the training set decreased with increasing neuron number (Table 3). However, beyond the optimum neuron number (20 in our case), the mean square error did not change significantly. Therefore, all the modeling was carried out using the Levenberg–Marquardt (trainlm) algorithm with 20 neurons for both output parameters (SAR and sulfate concentration).
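The selection logic described above, choosing the neuron number beyond which the error no longer changes significantly, could be automated along the following lines. The 5 % relative tolerance and the MSE values are hypothetical; the text does not state a numerical criterion, and the figures below are not those of Table 3.

```python
def select_neuron_count(mse_by_neurons, rel_tolerance=0.05):
    """Smallest neuron count beyond which the MSE no longer drops noticeably.

    A count is accepted when no larger network improves its MSE by more
    than rel_tolerance (a hypothetical threshold, not taken from the text).
    """
    counts = sorted(mse_by_neurons)
    for i, n in enumerate(counts):
        best_later = min((mse_by_neurons[m] for m in counts[i + 1:]),
                         default=mse_by_neurons[n])
        if mse_by_neurons[n] - best_later <= rel_tolerance * mse_by_neurons[n]:
            return n
    raise ValueError("no neuron counts given")


# Hypothetical training MSE values, not the ones in Table 3
mse_by_neurons = {5: 0.120, 10: 0.060, 15: 0.035, 20: 0.0270, 25: 0.0268}
print(select_neuron_count(mse_by_neurons))  # 20
```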

Table 3 R values, mean square errors and iteration numbers at different neuron numbers for predicting SAR and sulfate concentration in monitoring wells (Levenberg–Marquardt BP algorithm was used)

Results and discussion

The linear correlation coefficients (Table 1) between sulfate and pH, temperature and groundwater level were very small, and sulfate exhibited the highest correlation with total hardness (R² = 0.66). As with sulfate, the linear correlations between SAR and the input variables were weak, and SAR exhibited the highest correlation with chloride (R² = 0.28). The correlation analyses indicated the weakness of the linear relationship between the input and output (SAR and sulfate) variables. In addition to standard linear regression, a multiple linear regression analysis was performed using POLYMATH 6.10, taking conductivity, pH, groundwater level, temperature, total hardness and chloride as independent variables and SAR or sulfate as the dependent variable. The R² values of the multiple linear regression models were 0.75 and 0.46 for sulfate and SAR predictions, respectively (data not shown). Hence, conventional regression techniques are inadequate for predicting the sulfate and SAR variations from easily measurable parameters (conductivity, pH, groundwater level, temperature, total hardness and chloride), and more powerful methods, such as ANN, are needed (Mjalli et al. 2007). In this context, the applicability of ANN was investigated to predict SAR values and sulfate concentrations in 24 observation wells in the Harran Plain. The groundwater quality in the observation wells was previously described in detail by Yesilnacar and Gulluoglu (2008). In this study, we have used the aforementioned water quality parameters (Table 1) in the predictions.

The variation of input parameters is provided in Fig. 3. The groundwater of the study area is neutral to slightly alkaline, as the pH values of groundwater ranged from 7.0 to 7.5. The average temperature of the wells was around 20°C. The electrical conductivity in the wells varied between 400 and 8,235 μS/cm, with an average of 1,526 μS/cm. The maximum allowable conductivity value is 2,500 μS/cm in TS 266 and the European Union (EU) directives (Yesilnacar and Gulluoglu 2008). Hence, the groundwater conductivity in most of the observation wells exceeds the maximum allowable value. Previous studies have reported the soil salinity of this area to be fairly high as well. Excessive amounts of dissolved ions in irrigation water affect plants and agricultural soil physically and chemically, thus reducing productivity (Şahinci 1991). The groundwater levels (depth below surface) in the wells were between 0.6 and 15 m except for well 4, in which the groundwater level was around 50 m (Fig. 3). The total hardness in the wells was between 107 and 1,612 mg CaCO3/L and averaged 450 mg CaCO3/L. Similarly, chloride concentrations showed significant variation, between 9 and 760 mg/L, with an average of 112 mg/L.

Fig. 3

The variation of input parameters in ANN modeling

The maximum allowable concentration of sulfate in drinking water is 250 mg/L according to TS 266 (Yesilnacar and Gulluoglu 2008). Sulfate concentrations in the 24 observation wells ranged from 3 to 1,330 mg/L, with an average value of 183 mg/L. Except for well nos. 22, 21, 17, 18 and 23, the average sulfate concentrations were found to be below the maximum allowable value of 250 mg/L designated by the TS 266 standard, the WHO guidelines and the EU directive. The natural source of sulfate in the study area is the thin gypsiferous layers within the Pliocene-aged deposits. In particular, gypsum and anhydrite appear in evaporite deposits in the center of Anatolia and in southeast Anatolia (Erguvanli and Yüzer 1987). The anthropogenic sources of sulfate are household sewage and the excessive use of artificial fertilizers in intensive agricultural activities, especially in the southeast part of Şanlıurfa and in the vicinity of Harran.

SAR values in 24 observation wells were between 0.0048 and 10.94 with an average value of 1.37. The SAR values were below the maximum allowable value of 26.

As an example, the training, validation and test mean square errors (MSE) for the Levenberg–Marquardt algorithm with 20 neurons in the prediction of SAR are illustrated in Fig. 4. The training was stopped after 12 iterations as the MSE did not change significantly. We also performed a regression analysis between the network output (A) and the corresponding target (T) (Fig. 5). The R and MSE values were 0.956 and 0.0266 for SAR and 0.98 and 0.0031 for sulfate, respectively (Table 3). The variations of measured and predicted data are also presented in Fig. 6; the model data tracked the experimental data closely for both output parameters. Hence, ANN is a powerful tool for the prediction of SAR and sulfate in groundwater. Using ANN predictions, the suitable uses of the wells can be determined, which simplifies the management of large groundwater resources.

Fig. 4

Training, validation and test mean square errors for the Levenberg–Marquardt algorithm with 20 neurons for SAR prediction

Fig. 5

Linear regression between the network outputs (A) and the corresponding targets (T) using Levenberg–Marquardt algorithm with 20 neurons for SAR (a) and sulfate (b) predictions

Fig. 6

Measured and neural network-predicted SAR (a) and sulfate concentration (b) in groundwater of 24 observation wells

Similarly, Yesilnacar et al. (2008) presented an ANN model predicting the concentration of nitrate, the most common pollutant in shallow aquifers, in the groundwater of the Harran Plain. The samples from 24 observation wells were analyzed monthly for 1 year. The Levenberg–Marquardt algorithm was selected as the best of 12 BP algorithms and optimal neuron number was determined as 25. The model tracked the experimental data very closely (R = 0.93).

The effect of eliminating each input variable on the ANN prediction performance was also analyzed based on the correlation index (R) using the expression given below (Eq. 3) (Gontarski et al. 2000).

$$ \%\ {\text{influence}} = 100\,(1 - R_{\text{i}}/R_{\text{CB}}) $$
(3)

where R_CB is the correlation index between predicted and observed values for the base case and R_i is the correlation index for the case in which one input variable is eliminated. In the estimation of sulfate, total hardness is the most important parameter, and the elimination of total hardness decreased the R value by 12%. In the estimation of SAR, chloride is the most important of the input parameters, as the elimination of chloride decreased the R value by 14% (Table 4). The results in Table 4 also suggest that acceptable predictions may still be possible even if a parameter whose determination is relatively costly or difficult (such as chloride) is eliminated from the input data. In the linear regression analyses (Table 1), total hardness and chloride exhibited the highest correlation with sulfate and SAR, respectively. Hence, linear regression analyses may be used as a preliminary study to select the significant input parameters in ANN modeling. A similar finding was also reported in the study of Moral et al. (2008) for the prediction of a full-scale biological treatment plant performance using ANN modeling.
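Eq. 3 can be evaluated with a one-line function, as in the sketch below. The correlation indices in the example are hypothetical and are not the values in Table 4; they were chosen so that the result matches the order of magnitude discussed above.

```python
def percent_influence(r_without_input: float, r_base: float) -> float:
    """Eq. 3: relative loss in R when one input variable is eliminated."""
    return 100.0 * (1.0 - r_without_input / r_base)


# Hypothetical indices: base case R = 0.95, R = 0.836 without one input
print(round(percent_influence(0.836, 0.95), 2))  # 12.0
```

An input whose removal leaves R unchanged has an influence of 0 %, and larger values indicate inputs on which the prediction depends more strongly.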

Table 4 Influence of each variable on the performance of ANN prediction (Levenberg–Marquardt BP algorithm with 20 neurons was used)

Conclusions

This study demonstrates that ANN provides a robust tool for predicting sulfate and SAR values in 24 observation wells of the Harran Plain. To our knowledge, this is the first study optimizing the architecture of an ANN to predict sulfate and SAR in groundwater. The sulfate concentration and SAR value may increase due to intensive agricultural practices and the excessive use of artificial fertilizers. Both SAR and sulfate can easily be predicted with the designed, trained and validated neural network model. Although the model was applied to a specific site, its excellent fit to a wide range of data suggests that it can also be applied to other sites after optimizing the model architecture. Chloride and total hardness were found to be the most sensitive parameters for SAR and sulfate prediction, respectively. With the proposed model, it is possible to manage groundwater resources in a more cost-effective and easier way.