Introduction

Fluorine is the most electronegative of all chemical elements and is therefore never found in nature in its elemental form. Chemically combined in the form of fluorides, it ranks 17th in abundance among the elements of the earth’s crust, representing about 0.06–0.09% of the crust (Tebutt 1983; WHO 1994). Fluorspar, cryolite, fluorapatite, mica, and hornblende are the minerals richest in fluorine. Volcanic rocks, mica minerals (sirolite, fluorite, fluorapatite), and thermal sources cause high fluoride concentrations in natural waters (Brindha and Elango 2011; Yesilnacar et al. 2013, 2016).

The fluoride level in surface waters is generally less than 1 mg/L. In deep groundwaters or in hot spring waters in contact with fluoride-rich minerals, this amount can be up to 20–53 mg/L (WHO 1994; Selinus 2005; Msonda et al. 2007; Yesilnacar et al. 2013, 2016; Yetis et al. 2019).

Fluoride occurs in rocks, soil, air, water, plants, and animals as well as in the human body. While a low fluoride intake by humans can benefit teeth (e.g., caries prevention) and bone growth, long-term excessive absorption of fluoride can lead to fluorosis of teeth and bones, and to a multitude of other health-related problems including adverse impacts on the intellectual development of children (Harvard 2015). High fluoride levels in humans result from exposure to many sources, such as fluoride-emitting industries, coal burning, volcanic ash, excessive fluoride in brick tea (tea compressed into block form), and fluoride-contaminated drinking water. Fluorosis is becoming an environmental toxicological problem in many parts of the world, and is most commonly found in water-stressed regions (Chen et al. 2012).

Over 200 million people worldwide use drinking water that exceeds the optimal fluoride level recommended by the WHO (2006). Of this figure, 70 million are in India, 45 million are in China, and about five million are in Mexico (UNICEF 1999; Yesilnacar et al. 2016). In India, an estimated 62 million people, of whom six million are children, have dental fluorosis because of consuming fluoride-contaminated water (Brindha and Elango 2011; Gowrisankar et al. 2017). Known fluoride belts on land include one that stretches from Syria through Jordan, Egypt, Libya, Algeria, Sudan, and Kenya, and another that stretches from Turkey through Iraq, Iran, Afghanistan, India, Bengal, northern Thailand, and China. There are also similar belts in the Americas and Japan. Fluorosis has been reported in each of these areas (Selinus 2005; Dar et al. 2012; Yesilnacar et al. 2016; Thapa et al. 2018).

Fluoride found in natural drinking water is the largest source of fluorine entering the human body. Endemic fluorosis is a major public health problem seen in individuals living within geographical areas with a higher fluoride concentration than the daily optimal dose. Fluoride is taken into the human body generally by food and water consumption. Studies show that soluble fluoride in drinking water is an important source, with potable water accounting for the majority of daily fluoride intake for humans (Yesilnacar et al. 2016).

The U.S. Department of Health and Human Services (US HHS) recommends reducing the current range of 0.7–1.2 mg of fluoride per liter of water to 0.7 mg/L (HHS 2020). Fluoride concentration in drinking water is determined with chemical analytical techniques, including ion chromatography, spectrophotometry, ion-selective electrodes, and inductively coupled plasma. Although these techniques provide accurate and quantitative results, they are, in general, expensive, labor-intensive, and cumbersome. In recent years, machine learning methods have also been widely used in medical research (Groznik et al. 2013; Nakano et al. 2014; Pérez et al. 2015). Chemical measurements or analyses remain necessary as the basis for machine learning; however, as machine learning techniques improve, fewer such analyses are needed, which makes the overall approach more cost-effective. Detecting fluoride rapidly and cost-effectively is therefore essential, particularly for researchers, as well as for technical and medical staff.

Machine learning (ML) is a term applied to the development of machines that can approach tasks in a way similar to the human brain, such as speech recognition, driving cars, and interpreting complex data (Sanuade et al. 2020). An artificial neural network (ANN) is an information processing paradigm inspired by biological nervous systems such as the human brain (Yang et al. 2015), and it imitates the problem-solving process of the brain (Khashei-Siuki and Sarbazi 2015). The support vector machine (SVM) is a classification method based on statistical learning theory that rests on two main concepts: it creates a hyperplane that best separates the two classes, and it maximizes the distance between them (Gasmi et al. 2016). The Naïve Bayes classifier estimates the probability that a new observation belongs to a predefined category, using a probability model defined according to Bayes’ theorem (Tsangaratos and Ilia 2016).

In this context, the soft computing methods of ANN, support vector machine (SVM), and Naïve Bayes algorithms have been adopted for the current study. To the best of the researchers’ knowledge, very few studies (Sirisha et al. 2008; Asghari Moghaddam et al. 2010; Amini et al. 2010; Dar et al. 2012; Nadiri et al. 2013; Kumari and Pathak 2015; Barzegar et al. 2017; Charulatha et al. 2017; Kheradpisheh et al. 2018) have been conducted on ANN-based prediction of fluoride in groundwater.

High fluoride content in groundwater as the cause of the dental fluorosis observed in Karataş and Sarım villages, in western Şanlıurfa, Southeastern Anatolia, Turkey, was first reported by Yesilnacar et al. (2011). In their study, the geological structure, hydrogeology, and hydrobiology of these villages and their surrounding area were investigated. The acquired data enabled a comparison of the dentition of the regional population (Yesilnacar et al. 2013, 2016). However, there have been no studies estimating dental fluorosis occurrence resulting from high-fluoride groundwater using ANN and rock data obtained through XRD and XRF measurements.

In the remainder of this paper, the “Materials and methods” section provides detailed information about the study area, data preparation, data handling, and data modeling. Outcomes of the proposed feature ranking and classification models are discussed in the “Results and discussion” section. Concluding remarks and future projections are put forward in the final section.

Materials and methods

Study area

The region between the Suruc and Bozova districts in Sanliurfa province, which is situated in Turkey’s southeast Anatolia region, was selected as the study area. It covers an area of 33 km by 41 km and has the typical continental climate type of the Southeast Anatolian region. The long-term average annual precipitation is 432.50 mm, with an average temperature of 18.40 °C and an average relative humidity of 51.60%. The digital elevation model (DEM) map showing the sampling points of the study area is given in Fig. 1. The land is mostly stony, and the most common land use is dry field agriculture. Cereal and pistachio crops are the dominant agricultural production, in addition to some livestock. Drinking water in the region is provided by groundwater. Even though there is an elementary school at each settlement (village), students are usually bussed to larger settlements for their education. The settlement populations range from 65 to 894 inhabitants (Yesilnacar et al. 2016).

Fig. 1 DEM map showing study area sample points

Two types of rock occur in the study area: sedimentary and volcanic. Pleistocene basalts were formed during the Karacadag intrusion as magma rose to the surface through fractures, cracks, and chimneys. This unit, which lies at or close to the surface, does not have aquifer properties in this region. In the Siverek Karacadag region, however, basalts have been used as a source of spring water for years. The Pliocene formation does not have aquifer properties either, and water quality is very low in the sections that do supply small amounts of water. The Oligocene–Lower Miocene formation has no aquifer properties; hydrogeologically, it can be defined as an aquitard. The Eocene formation is generally composed of crystallized limestone. Because its karstic structures are well developed, it is an important aquifer in the region, and the wells drilled for drinking water in the region draw on this formation.

Fieldwork and data collection

The locations and routes of the study’s sampling points were defined according to 1:25,000 scale topographic and digital geological maps. A total of 63 village/sampling points were selected for fluoride level testing. Groundwater samples were taken seasonally for a period of 1 year. GPS was used to record the coordinates of the sampling points. A portable Hach-Lange HQ40d multi-measurement device was used in situ to measure the fluoride concentration. Fluoride analyses of the groundwater samples were carried out using the procedures described in EPA Method 340.2 (EPA 1993), APHA method 4500-F (APHA 1998), and ASTM D1179 - 99 (ASTM 1999) (Yesilnacar et al. 2016).

Fluorosis classifications and dental examinations were undertaken by local dentists practicing within the study area. Exploration drilling was performed at two villages (Sarım and Karataş) to investigate the geological structure. During drilling, samples were taken every 2 m from wells 200 m deep. The drilling samples were preserved and transported following the procedure described in ASTM D5079 - 08 (ASTM 2008) (Yesilnacar et al. 2016).

X-ray diffraction (XRD) and X-ray fluorescence (XRF) analyses were performed on a total of 108 samples, 62 from Karataş village and 46 from Sarım, in order to determine their mineralogical compositions and chemical constituents. The mineral composition according to depth is illustrated in Fig. 2. The chemical substances obtained from the XRF analysis were SiO2, Al2O3, Fe2O3, MgO, CaO, K2O, TiO2, P2O5, and LOI (loss on ignition). Similarly, Calcite, Quartz, Cu, Ni, S, Sr, Zn, and Zr were determined from the XRD analysis. It should be noted that in the XRD analysis, the Cu, Ni, S, Sr, Zn, and Zr parameters are expressed in ppm and the remaining parameters as percentages. Basic statistics of these compounds and substances for the XRD and XRF modalities are listed in Table 1.

Fig. 2 Mineral composition variation of Karataş (a) and Sarım (b) villages according to depth (Yesilnacar et al. 2016)

Table 1 Basic statistical information for the variables of XRD and XRF modalities

Modeling

As Table 1 indicates, there are nine parameters for the XRF analysis and eight for the XRD analysis. These variables were used as the input feature vector for the proposed models. Two options were considered for the output variable. The first is to use the fluoride concentrations, 2.17 and 2.63, directly as the output. The second is to assign the fluoride concentrations of 2.17 and 2.63 to dental fluorosis classes (DFC) 4 and 5, as determined by field experts, and to use those classes as the output. Because only two distinct output values exist in either case, the problem is inherently a two-class (binary classification) problem, and the choice between fluoride concentration and DFC has no effect on the prediction performance of the learning algorithm. DFC was therefore selected as the output of the proposed models for the sake of simplicity.

As Table 2 shows, several learning models were developed in order to evaluate the classification performance of each modality (XRF and XRD) and of its input variables. Model-1 and Model-2 are based on XRF and XRD variables, respectively. Model-3, on the other hand, combines both XRF and XRD constituents. It should be noted that Model-1, Model-2, and Model-3 use all of the variables. Conversely, Model-4, Model-5, and Model-6 seek the best variables from the variable pool. Model-4 and Model-5 were developed from the best feature subset of the XRF and XRD variables, respectively, found via a global (exhaustive) search. Running an exhaustive search on Model-6 (XRF+XRD) was not deemed feasible because of its higher computational cost. As a result, the simulated annealing (SA) meta-heuristic, a suboptimal search method, was applied (Kirkpatrick et al. 1983) to the Model-6 dataset.
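To make the suboptimal search used for Model-6 more concrete, the following is a minimal sketch of simulated annealing over feature subsets. It is not the implementation used in the study: the function names, the cooling schedule, the iteration count, and the toy scoring function are all illustrative assumptions. In practice the score would be the cross-validated accuracy of a classifier trained on the selected features.

```python
import math
import random

def simulated_annealing_subset(n_features, score, n_iter=2000,
                               t0=1.0, cooling=0.995, seed=0):
    """Search for a high-scoring feature subset encoded as a list of 0/1 flags."""
    rng = random.Random(seed)
    # Start from a random non-empty subset.
    current = [rng.randint(0, 1) for _ in range(n_features)]
    if not any(current):
        current[rng.randrange(n_features)] = 1
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(n_iter):
        # Neighbour: flip one randomly chosen feature in or out of the subset.
        cand = current[:]
        i = rng.randrange(n_features)
        cand[i] = 1 - cand[i]
        if not any(cand):
            continue  # the empty subset is not a valid candidate
        cand_score = score(cand)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        temp *= cooling
    return best, best_score

# Hypothetical stand-in for cross-validated accuracy: features 0 and 3 are
# "useful", and every additional feature carries a small penalty.
def toy_score(mask):
    useful = mask[0] + mask[3]
    return useful - 0.1 * (sum(mask) - useful)

# 17 flags, matching the 9 XRF + 8 XRD variables of the joint modality.
best_mask, best_val = simulated_annealing_subset(17, toy_score)
print(best_mask, best_val)
```

The early, high-temperature iterations allow the search to leave poor subsets, while the late, low-temperature iterations behave like hill climbing. The result is not guaranteed to be the global optimum, which is why the method is described as suboptimal.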

Table 2 List of the proposed models

Another significant contribution of the current study is to determine the influence of each input variable so that the variables can be ranked according to their saliency. Various variable ranking methods exist in the literature, and they treat variables either individually or jointly with other variables. Traditional feature selection schemes, including Fisher discrimination power (FDP), best first, forward selection and backward elimination, and independent component analysis, search on a single variable; that is, they evaluate the saliency of each feature individually in a greedy fashion (Blachnik 2009). In contrast, global combinatorial searches such as exhaustive, genetic, and information-theoretic approaches aim to find groups of variables that correlate better with the output. As Guyon and Elisseeff (2003) indicated, there may be circumstances where ranking input variables one by one cannot reveal the discrimination power or weakness of a variable. Figure 3 illustrates this significant inference from their work.

Fig. 3 Projection visualization of multivariate (2 variables [X, Y]) Gaussian distribution of binary classification. Projection on diagonal line exhibits relatively good class separation

Note that for binary classification, both of the dimensions (X and Y axes) exhibit relatively weaker classification potential based on the distribution projections on the individual axes, because class distributions projected on the axes overlap. On the other hand, by taking these two features as a complementary group and projecting them onto the diagonal line, good class separation (see lower left portion) can be achieved. Furthermore, as mentioned by Ataş et al. (2012), although the FDP and correlation coefficient of each variable with respect to target class can provide preliminary intuition about feature saliency, evaluating complementary features in the feature space can also provide significant information. For feature-ranking issues, the feature saliency score should be extracted from the list of classification performances of feature subsets. Therefore, in the current study, Normalized Weighted Voting Map (NWVM) was used, which can be considered an extension to the generic vote map algorithm used by Ataş et al. (2012).
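The effect shown in Fig. 3 can be reproduced numerically. The sketch below draws two synthetic Gaussian classes that are widely spread along one diagonal and tightly clustered along the other, then scores the projections on the X axis, the Y axis, and the separating diagonal with the Fisher score of Eq. 1. All distribution parameters and function names are assumptions chosen only for illustration; they are not taken from the study data.

```python
import math
import random
from statistics import mean, pvariance

def fisher_score(a, b):
    """Squared gap between class means over the sum of class variances."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))

def sample_class(rng, offset, n=500):
    """Draw n points: wide spread along (1, 1), narrow cluster centred on
    `offset` along (1, -1). The numbers are illustrative assumptions."""
    pts = []
    for _ in range(n):
        u = rng.gauss(0.0, 3.0)      # shared, wide direction
        v = rng.gauss(offset, 0.3)   # class-specific, narrow direction
        pts.append(((u + v) / math.sqrt(2), (u - v) / math.sqrt(2)))
    return pts

def project(points, direction):
    """Project 2-D points onto the unit vector along `direction`."""
    dx, dy = direction
    norm = math.hypot(dx, dy)
    return [(x * dx + y * dy) / norm for x, y in points]

rng = random.Random(0)
class_a = sample_class(rng, -1.0)
class_b = sample_class(rng, +1.0)

score_x = fisher_score(project(class_a, (1, 0)), project(class_b, (1, 0)))
score_y = fisher_score(project(class_a, (0, 1)), project(class_b, (0, 1)))
score_diag = fisher_score(project(class_a, (1, -1)), project(class_b, (1, -1)))

print(f"X axis: {score_x:.2f}  Y axis: {score_y:.2f}  diagonal: {score_diag:.2f}")
```

Each axis alone mixes the wide shared spread into the projection, so its score stays low, whereas the diagonal projection isolates the direction in which the classes differ. A ranking that scores X and Y one at a time would therefore undervalue both.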

Three different machine learning algorithms were selected as learning models. In order to investigate whether the problem is linearly separable, Naïve Bayes was used as a simple classifier. To handle non-linearity, relatively complex models, namely a single-hidden-layer feed-forward multi-layer perceptron (SHFF-MLP) and a support vector machine (SVM) classifier, were employed. All free parameters of the classifiers were left at their default values, and no tuning or optimization was carried out.

Results and discussion

Classification results

XRD and XRF features were used as the input parameters of the proposed models, as shown in Table 2. Field experts specified two distinct fluorosis levels based on the fluoride content of the drinking water obtained from the Karataş and Sarım wells. These levels constituted the class labels, or output, of the proposed models. The models were evaluated with Naïve Bayes as a linear classifier, and with ANN and SVM as non-linear classifiers, using 10-fold cross-validation. The learning algorithms were run in Weka, a machine learning software package written in Java (Hall et al. 2009). Classification performances of the proposed models are shown in Table 3.
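The evaluation protocol can be sketched as follows. This is not the Weka workflow used in the study: it is a small, self-contained illustration of 10-fold cross-validation around a Gaussian Naïve Bayes classifier, run on synthetic data. The Gaussian likelihood, the unstratified shuffling of folds, the function names, and the data are all assumptions.

```python
import math
import random
from statistics import mean, pvariance

def fit_gnb(X, y):
    """Per class: prior, per-feature mean and per-feature variance."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        model[c] = (len(rows) / len(X),
                    [mean(col) for col in cols],
                    [pvariance(col) + 1e-9 for col in cols])  # floor avoids /0
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_post(c):
        prior, mus, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mus, variances))
    return max(model, key=log_post)

def k_fold_accuracy(X, y, k=10, seed=0):
    """Shuffle once, split into k folds, and test each fold on a model
    trained on the remaining k - 1 folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_gnb(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

# Synthetic two-class data (labels 4 and 5 mimic the two DFC classes).
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(54)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(54)])
y = [4] * 54 + [5] * 54

acc = k_fold_accuracy(X, y, k=10)
print(f"10-fold accuracy: {acc:.3f}")
```

Every sample is used for testing exactly once and is never part of the model that classifies it, so the reported accuracy is an out-of-sample estimate even for a small dataset.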

Table 3 Classification performance of the proposed models

The highest classification score is highlighted in italic type for each model in Table 3. Model-1, Model-2, and Model-3 evaluate the classification performance of the parameters as a whole. In contrast, Model-4, Model-5, and Model-6 concentrate on the classification performance of combinatorial subsets of parameters, found by exhaustive search (Model-4 and Model-5) and by the simulated annealing suboptimal search (Model-6). In the subset search process, subsets with different numbers of features can sometimes yield the same result. In this case, the subset containing fewer features was preferred, which also achieves dimensionality reduction (a widely accepted practice in the machine learning community). For example, the CaO parameter was chosen as a single input feature because it could classify the fluoride problem by itself. These outcomes are also in accordance with Figs. 4 and 5.

Fig. 4 FDP scores of XRD (a) and XRF (b) variables

Fig. 5 NWVM-based ranking of variables for the XRD (a), XRF (b), and joint (c) modalities

Table 3 reveals that ANN outperforms the SVM and Naïve Bayes classifiers in terms of the percentage of correctly classified samples, and also indicates that non-linear classifiers best fit the current problem. This outcome is in accordance with previous studies in which researchers used ANN as the machine learning algorithm (Yesilnacar et al. 2008; Sirisha et al. 2008; Asghari Moghaddam et al. 2010; Amini et al. 2010; Dar et al. 2012; Nadiri et al. 2013; Kumari and Pathak 2015).

Feature influence analysis

In order to assess the influence of each feature, the FDP and Normalized Weighted Voting Map (NWVM) methods were applied and their results compared. The NWVM method and other software modules were developed in Java using the NetBeans 8.0 integrated development environment. As mentioned previously, FDP measures the class separation capability of each feature individually and therefore gives only limited information about the class separation of feature subsets. Another significant difference between the two methods is that FDP is a filter method and is therefore independent of the classifier, whereas NWVM is a wrapper approach and depends strongly on the classification algorithm used. The Fisher discriminant (FD) was first proposed by Fisher in 1936 (Fisher 1939); it projects data from an n-dimensional space onto a one-dimensional space (i.e., a line or axis) in which between-class scatter is at its maximum and within-class scatter is at its minimum (Ataş et al. 2012). FDP can be computed as:

$$ \mathrm{FDP}=\frac{{\left|{\mu}_1-{\mu}_2\right|}^2}{\sigma_1^2+\sigma_2^2} \tag{1} $$

where FDP is the Fisher discrimination power, μ₁ and μ₂ are the class means, and σ₁² and σ₂² are the class variances. Figures 4 and 6 highlight the influence of each variable based on between-class scatter/distribution, as Fisher scores and as a box-plot visualization, respectively. In order to unify the distribution ranges of the parameters, z-score normalization was applied. It should be noted that in Fig. 6, the notches represent medians and the edges of the boxes mark the 25th and 75th percentiles. In addition, “+” markers indicate the outliers in the distribution. Class separation quality is determined by the distance between the notches and by the degree of overlap between the boxes. A large gap between notches together with a small overlap indicates good class separation power of the feature. In Fig. 4, the features with relatively higher discrimination power have higher FDP scores, and from this perspective Figs. 4 and 6 are in agreement. It can therefore be inferred from these figures that the XRF variables have higher discrimination power potential than the XRD attributes, and that the most salient features are Zr for XRD and CaO for XRF.
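As a worked illustration of Eq. 1, the sketch below computes the FDP of a single feature for a two-class problem, with and without z-score normalization. The feature values are made up for the example, and the use of population variances is an assumption, since the text does not state which variance estimator was used.

```python
from statistics import mean, pstdev, pvariance

def z_score(values):
    """Standardize a feature to zero mean and unit standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def fdp(values, labels):
    """Fisher discrimination power of one feature for two classes (Eq. 1)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("FDP as defined here needs exactly two classes")
    a = [v for v, lab in zip(values, labels) if lab == classes[0]]
    b = [v for v, lab in zip(values, labels) if lab == classes[1]]
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))

# Hypothetical feature values for eight samples, four per class.
labels = [4, 4, 4, 4, 5, 5, 5, 5]
strong = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]   # classes far apart
weak = [1.0, 3.0, 2.0, 4.0, 1.5, 3.5, 2.5, 4.5]     # classes overlap

fdp_strong = fdp(strong, labels)
fdp_weak = fdp(weak, labels)
fdp_strong_z = fdp(z_score(strong), labels)

print(fdp_strong, fdp_weak, fdp_strong_z)
```

Z-scoring shifts and rescales both classes by the same amounts, so the shift cancels in the numerator and the scale factor divides out of the ratio. The FDP of a feature is therefore unchanged by the normalization, which mainly serves to put the boxplots of different variables on a common axis.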

Fig. 6 Visualization of normalized distribution of XRD (a) and XRF (b) modalities as boxplots

Although the FDP approach provides good preliminary intuition about feature saliency/influence, it is inadequate in terms of feature complementarity. As Fig. 3 shows, weak features sometimes exhibit stronger class separation power when they are combined. Thus, a new feature-ranking method based on the voting approach was used, the Normalized Weighted Voting Map (NWVM). The proposed NWVM method can be considered an extension of the classical voting map. In the classical voting method, every component contributes equally to the vote, whereas in the NWVM method each component has a weighted contribution and the weights sum to one. The main idea of NWVM is shown in Fig. 7. If there are N variables, then theoretically there are 2^N − 1 non-empty subsets available. The objective is to find the best subset (complementary feature group) within the subset pool. Figure 7 indicates that all combinatorial groups of variables were evaluated, and that the classification accuracy of each trial was recorded in a dynamic list structure.
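The size of the subset pool follows directly from the number of variables. As a short worked example, the sketch below enumerates the non-empty subsets of the nine XRF and eight XRD variables named in the “Fieldwork and data collection” section; the helper name is an illustrative assumption.

```python
from itertools import combinations

def non_empty_subsets(features):
    """Yield every non-empty subset of `features`, smallest subsets first."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)

xrf = ["SiO2", "Al2O3", "Fe2O3", "MgO", "CaO", "K2O", "TiO2", "P2O5", "LOI"]
xrd = ["Calcite", "Quartz", "Cu", "Ni", "S", "Sr", "Zn", "Zr"]

n_xrf = sum(1 for _ in non_empty_subsets(xrf))    # 2^9 - 1
n_xrd = sum(1 for _ in non_empty_subsets(xrd))    # 2^8 - 1
n_joint = 2 ** (len(xrf) + len(xrd)) - 1          # 2^17 - 1, counted not enumerated

print(n_xrf, n_xrd, n_joint)
```

The pool holds 511 subsets for XRF and 255 for XRD, but 131,071 for the joint modality, and each subset requires its own classifier evaluation.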

Fig. 7 Schematic diagram for NWVM method

Since the contribution of each variable in the subset was unknown, equal scores of classification accuracy were assigned to each feature. Next, the scores gained for each variable were summed for all of the subsets. Then, weights of the variables were normalized to within the 20–100 range through min-max normalization, and finally the ranking procedure was performed. The researchers selected the 20–100 range because differences of influence score among the variables could be more easily visualized within that range. It should be noted that, as the XRD and XRF modalities each have only a small number of variables, NWVM computation was deemed to be straightforward due to its low computational cost. However, if these modalities were combined, as in the case of Model-6, the computational cost would increase exponentially and only the suboptimal search would be feasible. Simulated annealing seeks to find suboptimal results for the relatively larger search space where global search is infeasible. Figure 5 illustrates the NWVM graph for (a) XRD, (b) XRF, and (c) a combination of both XRD and XRF.
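The ranking steps described above can be sketched as follows. This is a hedged reading of the procedure, not the authors’ Java implementation. In particular, it assumes that “equal scores” means the accuracy of a subset is split into equal shares among its members so that the weights within a subset sum to one; crediting each member with the full accuracy would be an alternative reading. The function names and the toy accuracy function are hypothetical.

```python
from itertools import combinations

def nwvm_scores(features, accuracy, lo=20.0, hi=100.0):
    """Rank features by votes accumulated over all non-empty subsets.

    `accuracy(subset)` stands in for the cross-validated classification
    accuracy obtained with that subset of features.
    """
    votes = {f: 0.0 for f in features}
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            # ASSUMPTION: the subset accuracy is shared equally by its members.
            share = accuracy(subset) / len(subset)
            for f in subset:
                votes[f] += share
    # Min-max normalization of the summed votes into the [lo, hi] range.
    v_min, v_max = min(votes.values()), max(votes.values())
    span = (v_max - v_min) or 1.0   # guard against identical vote totals
    scaled = {f: lo + (v - v_min) * (hi - lo) / span for f, v in votes.items()}
    return sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical accuracy: feature "B" helps a lot, "C" a little, "A" not at all.
def toy_accuracy(subset):
    return min(1.0, 0.5 + (0.4 if "B" in subset else 0.0)
                        + (0.1 if "C" in subset else 0.0))

ranking = nwvm_scores(["A", "B", "C"], toy_accuracy)
for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

Because of the min-max step, the least influential feature always receives the lower bound and the most influential the upper bound, so the scores express relative rather than absolute influence.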

The influence of the attribute can be monitored both by the score located on the top of the bars and the gray color gradients. Light color indicates a weak influence, and dark color a strong influence of the features. It should also be noted that NWVM ranking scores do not represent actual scores; rather they exhibit relative influences among the features.

Conclusion

The influence of rock chemistry on the occurrence of high fluoride concentrations in groundwater, and the dental and skeletal fluorosis cases that result from it, have been widely studied in many parts of the world (Barzegar et al. 2017; Sracek et al. 2015; Li et al. 2015; He et al. 2013; Fekri and Kasmaei 2013; Singaraja et al. 2013; Srinivasamoorthy et al. 2012; Naseem et al. 2010). However, there have been no studies estimating dental fluorosis occurrence resulting from high-fluoride groundwater using machine learning techniques and rock data obtained through XRD and XRF measurements.

Detection of fluoride content in groundwater is commonly conducted via chemical approaches. Although these have been shown to provide accurate, consistent, and reliable results, they are acknowledged to be labor-intensive, cumbersome, and costly. An alternative approach is proposed in the current study. In this context, ANN, SVM, and Naïve Bayes classifiers were used, and a novel feature selection and ranking method known as NWVM was presented. According to the FDP scores, XRF variables have higher discrimination power potential than XRD attributes, and the most salient features are Zr (0.464) for XRD and CaO (219.993) for XRF. When the XRD and XRF parameters are ranked separately by NWVM score for their effect on fluoride values in groundwater and on dental fluorosis, CaO, SiO2, MgO, Fe2O3, P2O5, and K2O (for XRF) and Quartz and Zr (for XRD) show the strongest effects. When the two modalities are ranked together, the same XRF parameters come first, followed by the XRD parameters. Experiments reveal that X-ray fluorescence (XRF) constituents including CaO, SiO2, MgO, P2O5, and K2O have higher class discrimination power than the X-ray diffraction (XRD) variables.

The proposed NWVM feature-ranking method provides a simple yet powerful tool that assesses the influence of the variables based on their complementarity/grouping potential, and it can therefore be used as a generic ranking tool for the analysis of multivariate problems. Based on the FDP method, Quartz has an influence score of 0.237, which makes it the second most salient variable in the XRD modality. According to the NWVM approach, however, it is the most significant feature because it contributes to the classification performance of most feature subsets. Al2O3 is the least salient feature among the XRF parameters under both the NWVM and FDP methods, and as a result it should be removed from the parameter list. Experimental results have shown that the classification of dental fluorosis stemming from groundwater and rocks by machine learning techniques is possible, and that the proposed NWVM method provided promising results for feature influence analysis.