Introduction

Groundwater makes up about 20 % of the world’s fresh water supply, which is about 0.61 % of the entire world’s water including oceans and permanent ice. In India, almost 80 % of the rural population depends on untreated groundwater for domestic and agricultural purposes. In recent years, the rate of discharge of pollutants into the environment has been increasing continuously due to rapid population growth, urbanization and the accelerated pace of industrialization. This causes contamination of both fresh water and groundwater. Groundwater pollution can occur where industrial waste is discharged into pits, ponds, lagoons and rivers, enabling the waste to percolate to the water table. Polluted water not only endangers human life but also causes considerable biological disorder in organisms. In addition, the pollutants cause serious environmental pollution responsible for health hazards. Groundwater contamination and its management have become important because of their far-reaching impact on human health. Therefore, knowledge of water quality and evaluation of the water quality index (WQI) play a significant role in water quality control and management. WQI is generally considered a means of summarizing the various water quality parameters into a simple index. The index helps in interpreting the quality of water as a single numerical value (Horton, 1965; Brown et al., 1970; Dinius, 1972; Lohani and Todino, 1984).

Classification of WQI is useful for quantifying the heterogeneity existing in the system. Researchers such as Bhargava (1983), Swamee and Tyagi (2000), Sarkar and Abbasi (2006) and Tiwari and Mishra (1985) have worked in this direction and proposed mathematical and statistical models for classification of groundwater. Recently, Lumb et al. (2011) have reviewed various water quality classification models based on WQI. The various WQIs differ in the manner in which statistical integration and interpretation of parameter values are made. A totally different approach was adopted in the Canadian Water Quality Index, also known as the Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI) (Lumb et al., 2011, 2006). CCME WQI is used by many countries all over the world and was endorsed by the United Nations Environment Programme (UNEP) in 2007 as a model for the Global Drinking Water Quality Index (GDWQI). The most commonly used parameters in this model are dissolved oxygen, pH, turbidity, total dissolved solids, nitrates, phosphates, and metals, among others.

The present study proposes an empirical method of classification scheme for assessing water quality of groundwater. The correlation of water quality parameters sometimes results in unjustified classifications if water quality is expressed by an index. The proposed method is quite efficient and prediction quality is reasonably good. The method first calculates similarity coefficients for all the members in the data set using Euclidean distance as a similarity measure. The Euclidean distance matrix is used as input to the Q-mode Principal Component Analysis (PCA) to classify the data (Albadawi et al. 2005). The factor loadings of each member of data set on principal component are taken into account to cluster the water sample into appropriate group. The number of clusters is decided on the basis of percentage variation explained by the principal components.

Literature Review

The quality of groundwater has been studied earlier by various researchers. Among them, Karnchanawong and Ikeguchi (1993) have evaluated the quality of well water near the Mae-Hia waste disposal site. Zhang et al. (1996) have predicted the water quality index in 14 cities of China where the groundwater is polluted due to the use of fertilizers for agricultural purposes. Lind et al. (1998) have studied the impact of mining activity on the pH of groundwater and its effect on water quality. Maticie (1999) has observed the impact of agriculture on groundwater quality in Slovenia. Shamruck et al. (2001) have studied the impact of environmental parameters on the quality of the Nile Valley aquifer. Ammann et al. (2003) have evaluated groundwater pollution by run-off and its impact on water quality. Almasri and Kaluarachchi (2004) have reported on the occurrence of nitrate in the groundwater of agricultural watersheds in Whatcom County, Washington. WQI is strongly dependent on the various correlated parameters taken for the study.

However, identification of the suitability of the parameters is critical for accurate evaluation of WQI. Water quality is generally ascertained from physical, chemical, and biological indicators such as pH, electrical conductivity (EC), total dissolved solids (TDS), total suspended solids (TSS), hardness, turbidity and contaminant concentrations, based on guidelines provided by agencies such as the World Health Organization (WHO, 2006) and the Bureau of Indian Standards (BIS, 1991). Nagarajan and Priya (1999) have studied the groundwater quality deterioration in Tiruchirapalli, Tamil Nadu and found that TSS, iron and magnesium values are beyond the permissible limit. Singh and Parwana (1999) have studied the pollution load in the groundwater of Punjab state due to industrial waste water and found chromium and cyanide in groundwater beyond the permissible limits of drinking water standards. Jha and Verma (2000) have studied the physicochemical properties of drinking water in the town area of Godda district under Santal Pargana (Bihar). They have reported that most of the surface water quality parameters are within the limits of drinking water standards. However, well water is characterized by a very high concentration of chloride, chromium and selenium. Srinivas et al. (2000) have studied the groundwater quality of Hyderabad using 32 tube well water samples and reported that electrical conductivity, TDS, total alkali, hardness, calcium, magnesium, sodium and chlorides are above the permissible limits according to WHO and Indian Standards. Chaudhari et al. (2004) have studied the quality of groundwater near an industrial area at Jalgaon (Maharashtra), where the WQI of the samples suggests that the water is not suitable for direct consumption. Shaji et al. (2009) have studied the quality of water in a mineral and industrial area of Kerala in India.
Waste materials near the factories are subjected to reaction with percolating rain water and reach the aquifer system and, hence, degrade the groundwater quality (Tyagi et al. 2003). Heavy metals constitute a very heterogeneous group of elements widely varied in their chemical properties and biological functions. They are persistent in nature; therefore, they get accumulated in soil and plants. Dietary intake of many heavy metals through consumption of plants and drinking water has long-term detrimental effect on human health (Sharma and Agarwal, 2005; Ubala et al., 2001; Sabal and Khan, 2008).

However, quality is a vague term that cannot easily be described using crisp data set e.g. good quality water cannot simply be described as having a pH value of 7.0 or above. Instead, water quality can best be described based on its degree of potability and potential usages rather than expressing its constituents in numerical terms. Fuzzy reasoning technique has also been applied in groundwater and surface water quality forecasting (Dahiya et al. 2007; Parinet et al. 2004; Singh et al. 2008). Some of the artificial neural network as well as multivariate analysis methods are also found to be more useful for determination of water quality based on fuzzy and principal component analysis (PCA) techniques (Iscen et al., 2008; Shrestha and Kazama, 2007).

A number of indices have been developed to summarize water quality data in an easily expressible and easily understood format (Couillard and Lefebvre 1985). Horton (1965) proposed the first water quality index (WQI), and a great deal of consideration has since been given to the development of index methods. The basic difference among these indices is the way their sub-indices were developed. Walski and Parker (1974) used an exponential function to represent the sub-indices of various quality variables. Landwehr (1979) suggested the Pearson type 3 distribution function to represent the sub-indices of all the quality variables. Bhargava (1987) modified the exponential formula; Dinius (1987) used a power function for the majority of sub-indices. Nives (1999) and Swamee and Tyagi (2000) proposed aggregate indices for water quality description. In addition, Harrison et al. (2000), Faisal et al. (2003), Ahmed et al. (2004) and Shiow-Mey et al. (2004) have each recently modified a water quality index. Some of the sub-indices have since been incorporated into water quality indices used by agencies such as the National Sanitation Foundation (NSF) (Ahmed et al. 2004). The most important WQIs belonging to environmental departments or agencies are the National Sanitation Foundation (NSF), British Columbia Water Act Quality Index, Oregon Water Quality Index, Stream Watch (Southern Indiana), Malaysian Water Quality Index, France Water Quality Index, French Creek Quality Index, Florida Stream Water Quality Index, British Columbia Water Quality Index, Canadian Water Quality Index, Taiwan Water Quality Index and Washington State Water Quality Index. A comparison among several WQI systems currently in use showed that none of them describes the quality of water from mining effluent, because most of the sub-indices in current WQIs are not relevant to indicating changes in water quality brought about by mining activities.

Study Area

Odisha is a state in India located at an elevation of about 219 meters above mean sea level. In this study, groundwater quality of wells of urban area of Rourkela in Sundergarh district which is located at 84.54E longitude and 22.12N latitude is considered. Rourkela comes under tropical monsoon climate and is more like that of the Deccan Plateau. Being in the north-eastern corner of the Deccan Plateau, the climate is milder than the climate of the main Deccan region. The area of Rourkela is 200 square kilometers approximately. Red and laterite soils are found here which are quite rich in minerals. The area near Rourkela is rich in iron ore; hence a steel plant and other iron and steel industries are situated in the region. These industries are polluting the surrounding areas including groundwater resources. Large number of motor vehicles may also contribute in the release of heavy metals into surrounding environment. The climate is hot and dry during summer season. Normally, there is heavy rainfall due to south-west monsoon and light rainfall during the pre-monsoon seasons. The south-west monsoon usually onsets during second week of June and retreats by mid September. The humidity is generally high mostly in the monsoon and post-monsoon periods. The relative humidity is low during summer season. The mean values of the humidity, however, in a year range from 35 to 85 % and the annual average is 66 %. The Koel and Sankha Rivers meet at Vedvyas, Rourkela and flow as a single river called Brahmani. Hence, Rourkela is the confluence of three rivers: Koel, Sankha and Brahmani. The geographical location of study area is shown in Fig. 1.

Fig. 1 The study area

Methodology for Sampling

In order to classify water quality into different clusters, water samples were collected from 10 wells, whose locations are shown in Table 1. Water samples from three years have been taken into consideration for the study, with season-wise data for each year. Samples from some places and seasons were not included because of seasonal effects such as heavy rainfall and extreme heat. Water samples from the different sampling stations were collected in standardized PET (polyethylene terephthalate) bottles, which are thermostated bottles. PET bottles of 1.5 liter capacity with stoppers were used for collecting samples. PET bottles can be used for collection of samples to analyze both organic and inorganic constituents in water. The bottles were washed thoroughly with 2 % nitric acid and subsequently rinsed with distilled water. The bottles were then kept in a clean place. Before the water samples were taken, all the bottles were rinsed with sample water 2–3 times. As all the physicochemical parameters were measured within 24 hours of sample collection, there is very little possibility of a change in the concentration of any parameter, including heavy metals. Each sampled bottle was sealed airtight under water, and precautions were taken to remove any air bubbles present. Each container was clearly marked with the name and date of sampling. Physicochemical parameters such as pH, turbidity, total dissolved solids (TDS), hardness, biochemical oxygen demand (BOD), dissolved oxygen (DO), chloride, sulfate, iron and calcium have been taken for analysis. Of these, pH, turbidity, DO and TDS were measured using water analysis kit model 191 E. The methodologies adopted for determination of the water quality parameters of the collected samples are shown in Table 2.

Table 1 The location of wells used for collection of water samples
Table 2 Test methods for various parameters of water samples

Water samples were collected from groundwater sources on a monthly basis. However, the study of physicochemical characteristics was made through seasonal observations. The year is broadly divided into three seasons: summer (March to June), rainy (July to October) and winter (November to February). The summer is too hot (maximum temperature 48 °C) in this part of the country; hence, data have not been collected in the summer season. The procedure for estimating the 10 quality parameters, namely pH, DO (dissolved oxygen), turbidity, TDS (total dissolved solids), hardness, calcium ions, chloride ions, BOD (biochemical oxygen demand), iron and sulfate, is shown in Table 2. The samples were collected in three years: 2008, 2009, and 2010. The physicochemical characteristics of the water samples are given in Tables 3a and 3b for the rainy and winter seasons, respectively. The notation used to denote samples is as follows: W stands for the well and the number following W denotes the well number. The letters r and w denote the rainy and winter season, respectively. The numbers 01, 02, and 03 denote the year of sampling: 2008, 2009, and 2010, respectively.

Table 3a Values of physicochemical parameters in rainy season
Table 3b Values of physicochemical parameters in winter season

The average values of the water quality parameters in the two seasons are compared with the permissible limits prescribed in IS:10500, shown in Table 4. From Tables 3a and 3b it can be observed that most of the parameters are within the permissible limit. Parameters such as TDS, hardness, calcium, chloride, and iron lie below the permissible limit, whereas DO lies above the limit. It may be noted that the value of turbidity decreases to almost half during the winter season as compared to the rainy season. Pearson correlation coefficients for the parameters in the rainy and winter seasons are shown in Tables 5a and 5b, respectively. A correlation coefficient of 0.5–0.75 is considered a moderate correlation between two variables (Montgomery and Runger 1999). In the rainy season, the highest correlation is observed between turbidity and pH (0.731), followed by calcium and hardness (0.699). In the winter season, the strongest correlation is observed between calcium and pH (0.529). Outliers in the data are removed through examination of scatter plots. The changes in average parameter values and correlation coefficients between seasons are attributed to human and industrial activities.
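The correlation step can be illustrated with a minimal sketch of the Pearson coefficient and the 0.5–0.75 "moderate" band quoted above. The function names and the sample values are hypothetical and are not taken from Tables 3a, 3b, 5a or 5b; the study itself does not state how the coefficients were computed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def is_moderate(r):
    """True when |r| lies in the 0.5-0.75 band described as moderate."""
    return 0.5 <= abs(r) <= 0.75

# Hypothetical readings for illustration only (not data from the study)
turbidity = [4.0, 6.5, 5.0, 8.0, 7.5]
ph = [7.1, 7.4, 7.0, 7.6, 7.3]
r = pearson_r(turbidity, ph)
print(f"r = {r:.3f}, moderate: {is_moderate(r)}")
```

Applying `pearson_r` to every pair of parameter columns would give a symmetric matrix of the kind reported in Tables 5a and 5b.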

Table 4 Permissible limits of parameters (IS:10500)
Table 5a Correlation matrix of parameters (rainy season)
Table 5b Correlation matrix of parameters (winter season)

Determination of Water Quality Index (WQI)

In the formulation of water quality index, the importance of various parameters depends on the intended use of water and water quality parameters are studied from the point of view of suitability for human consumption. The ‘standards’ (permissible values of various pollutants) for the drinking water are recommended by the Indian Council of Medical Research (ICMR). When the ICMR standards are not available, the standards of United States Public Health Services (USPHS), World Health Organization (WHO), Indian Standards Institution (ISI) and European Economic Community (EEC) are being quoted.

The quality rating \(q_{i}\) for the ith water quality parameter is obtained from the relation

$$ q_{i} = 100(v_{i}/s_{i}) $$
(1)

where \(v_{i}\) = value of the ith parameter at a given sampling station and \(s_{i}\) = standard permissible value of the ith parameter. This equation ensures that \(q_{i} = 0\) when a pollutant (the ith parameter) is absent from the water, while \(q_{i} = 100\) if the value of this parameter is just equal to its permissible value for drinking water. Thus, the larger the value of \(q_{i}\), the more polluted the water is with the ith pollutant. However, quality ratings for pH and DO require special handling. The permissible range of pH for drinking water is 7.0 to 8.5. Therefore, the quality rating for pH may be

$$ q_{\mathrm{pH}} = 100\bigl[|v_{\mathrm{pH}} - 7.0|/(8.5 - 7.0)\bigr] $$
(2)

where \(v_{\mathrm{pH}}\) = value of pH and \(|v_{\mathrm{pH}} - 7.0|\) is the numerical difference between \(v_{\mathrm{pH}}\) and 7.0, ignoring algebraic sign. Equation (2) ensures that \(q_{\mathrm{pH}} = 0\) for pH = 7.0. In contrast to other pollutants, the case of DO is slightly more complicated because the quality of water is enhanced if it contains more DO. Therefore, the quality rating \(q_{\mathrm{DO}}\) has been calculated from the relation

$$ q_{\mathrm{DO}} = 100\bigl[(14.6 - v_{\mathrm{DO}})/(14.6 - 5) \bigr] $$
(3)

where \(v_{\mathrm{DO}}\) = value of DO.

In Eq. (3), 14.6 is the solubility of oxygen (mg/l) in distilled water at 0 °C and 5.0 mg/l is the standard for drinking water. Equation (3) gives \(q_{\mathrm{DO}} = 0\) when \(v_{\mathrm{DO}}\) = 14.6 mg/l and \(q_{\mathrm{DO}} = 100\) when \(v_{\mathrm{DO}}\) = 5.0 mg/l. The more harmful a given pollutant is, the smaller its permissible value for drinking water. So the ‘weights’ for various water quality parameters are assumed to be inversely proportional to the recommended standards for the corresponding parameters, i.e.

$$ W_{i} = \frac{k}{s_{i}} $$
(4)

where \(W_{i}\) = unit weight for the ith parameter (i = 1, 2, 3, …, 10) and k = constant of proportionality, which is determined from the following condition (k = 1 may be taken for the sake of simplicity):

$$ \sum_{i = 1}^{10} W_{i} = 1 $$
(5)

To calculate the WQI, first the sub-index \(\mathrm{(SI)}_{i}\) corresponding to the ith parameter is calculated. It is given by the product of the quality rating \(q_{i}\) and the unit weight \(W_{i}\) of the ith parameter, i.e.

$$ \mathrm{(SI)}_{i} = q_{i}W_{i} $$
(6)

The overall WQI is then calculated by aggregating these sub-indices (SI) linearly. Thus, WQI can be written as

$$ \mathrm{WQI} = \Biggl[ \sum_{i = 1}^{10} q_{i}W_{i} \Biggl/ \sum_{i = 1}^{10} W_{i} \Biggr] $$
(7)

which, since the unit weights sum to one by Eq. (5), gives

$$ \mathrm{WQI} = \sum_{i = 1}^{10} q_{i}W_{i} $$
(8)

Water quality can be categorized into five classifications depending on WQI values. Water quality can be treated as excellent, good, poor, very poor, and unsuitable for drinking if WQI lies in the ranges of 0–25, 26–50, 51–75, 76–100, and above 100, respectively.
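The calculation described by Eqs. (1) to (8) and the five categories above can be sketched as follows. This is only an illustration under stated assumptions: the permissible values in `STANDARDS` are hypothetical placeholders (the study takes its limits from IS:10500, Table 4), only five of the ten parameters are included, the constant k is fixed by the normalization condition of Eq. (5), and category boundaries that fall between the integer ranges are assigned to the lower category.

```python
# Hypothetical permissible values for illustration only; the study uses the
# IS:10500 limits of Table 4, which are not reproduced here.
STANDARDS = {
    "pH": 8.5,        # upper end of the permissible pH range
    "DO": 5.0,        # mg/l, drinking water standard used in Eq. (3)
    "turbidity": 10.0,
    "TDS": 500.0,
    "hardness": 300.0,
}

def quality_rating(name, value, standard):
    """Quality rating q_i following Eqs. (1)-(3)."""
    if name == "pH":
        return 100.0 * abs(value - 7.0) / (8.5 - 7.0)
    if name == "DO":
        return 100.0 * (14.6 - value) / (14.6 - 5.0)
    return 100.0 * value / standard

def unit_weights(standards):
    """Unit weights W_i = k / s_i, with k chosen so that the weights sum to one."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}

def water_quality_index(sample, standards):
    """WQI as the weighted sum of quality ratings, Eq. (8)."""
    weights = unit_weights(standards)
    return sum(
        quality_rating(name, sample[name], standards[name]) * weights[name]
        for name in standards
    )

def classify(wqi):
    """Map a WQI value to the five categories listed in the text."""
    if wqi <= 25:
        return "excellent"
    if wqi <= 50:
        return "good"
    if wqi <= 75:
        return "poor"
    if wqi <= 100:
        return "very poor"
    return "unsuitable for drinking"

# Hypothetical sample, not taken from Tables 3a or 3b
sample = {"pH": 7.3, "DO": 6.2, "turbidity": 4.0, "TDS": 350.0, "hardness": 180.0}
index = water_quality_index(sample, STANDARDS)
print(round(index, 2), classify(index))
```

Because the weights are inversely proportional to the standards, parameters with small permissible values dominate the index, which is the behaviour the weighting scheme is intended to produce.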

The water quality index values for all the data shown in Tables 3a and 3b are obtained using Eqs. (1) through (8) and are shown in Table 6. It should be noted that WQI for all the data lies in the range from excellent to very poor for human consumption.

Table 6 The water quality index (WQI) values for all samples

Generation of Euclidean Distance Matrix

The parameters of the water samples shown in Tables 3a and 3b are measured on different scales. Therefore, they need to be normalized to reduce the scaling effect. A simple normalization procedure of dividing each selected variable by its maximum value is adopted here. After normalization, all data vary from zero to one. Implementation of factor analysis requires the correlation matrix of the initial data set. Here, a Euclidean distance matrix is used in its place (Hair et al. 2009). Euclidean distance is taken as the similarity measure and is defined as the square root of the sum of the squares of the differences between the attribute values of two water samples. Mathematically, it may be given as

$$ d(x,y)= \sqrt{\sum_{i=1}^{m}(x_i-y_i)^2} $$
(9)

where d(x, y) = Euclidean distance, and \(x = (x_{1}, x_{2}, \ldots, x_{m})\) and \(y = (y_{1}, y_{2}, \ldots, y_{m})\) represent the m attribute values of two samples. If the distance is zero, the two water samples are identical with respect to the measured attributes. If it is above zero, the Euclidean distance indicates the intensity of dissimilarity between the two water samples. The Euclidean distance matrix is generated considering all the water samples. The entry in row p and column q of the matrix denotes the Euclidean distance between the pth and qth water samples. The Euclidean distance matrix is thus generated for use in Q-mode PCA.
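A minimal sketch of the normalization and of the distance matrix of Eq. (9) is given below. The function names and the three sample rows are hypothetical, and the sketch assumes every column has a positive maximum so that division by the maximum is well defined.

```python
import math

def normalize_by_max(rows):
    """Divide every column by its maximum value (assumes positive maxima)."""
    n_cols = len(rows[0])
    col_max = [max(row[j] for row in rows) for j in range(n_cols)]
    return [[row[j] / col_max[j] for j in range(n_cols)] for row in rows]

def euclidean(x, y):
    """Euclidean distance between two attribute vectors, Eq. (9)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(rows):
    """Symmetric matrix whose (p, q) entry is the distance between samples p and q."""
    return [[euclidean(p, q) for q in rows] for p in rows]

# Hypothetical samples with two attributes each (for example pH and TDS)
samples = [
    [7.0, 200.0],
    [8.0, 400.0],
    [7.0, 200.0],
]
D = distance_matrix(normalize_by_max(samples))
for row in D:
    print([round(v, 3) for v in row])
```

The diagonal of `D` is zero, and identical samples (the first and third rows here) are also at zero distance from each other.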

Q-mode Principal Component Analysis

PCA is the most widely used, straightforward and quantitatively involved method for transforming a given set of interrelated variables into a new set of variables called the principal components (corresponding to factors in factor analysis). The set of principal components generated presents uncorrelated linear combinations of the original variables and accounts for the total variance of the original data. In this method, all the principal components are generated in such a way that they are orthogonal to each other; hence, correlation between them is zero. The principal components are generated in a sequentially ordered manner with decreasing contributions to the variance, i.e. the first principal component explains most of the variations present in the original data, and successive principal components account for decreasing proportions of the variance. This property means that the data points can be rigorously separated into distinct clusters when projected into a space spanned by the first few principal components, which are called factors. This achieves the dimensionality reduction objective of factor analysis. PCA can be broadly classified into two categories, viz., R-mode and Q-mode, based on application. If PCA is used to develop a structure among variables, it is referred to as an R-mode PCA. When PCA analysis is used to group cases, it is called a Q-mode PCA. It is customary to use rotation methods to transform the factors to simpler and more interpretable constructs. After rotation, each variable will be only related to one of the factors and each factor will have high correlation with only a small set of variables. In recent years, Q-mode PCA has been widely adopted by the researchers for classification of groundwater quality, coffee preference, gene regulatory process, and machines in cellular manufacturing (Albadawi et al., 2005; Dijksterhuis, 1998; Park et al., 2001; Singh et al., 2008; Woocay and Walton, 2008).

Assuming the sample parameters as the original set of variables, and the Euclidean distance matrix as an estimate of the correlation matrix explaining the correlations between each pair of samples, we proceed to use the PCA framework for grouping the samples into separate independent clusters. In the PCA method, the initial clusters are extracted out by the eigenvalue-eigenvector analysis of the similarity coefficient matrix as presented in Eq. (10):

$$ (S-I\lambda_{i}) Y_{i} = 0,\quad i = 1, 2, \ldots, P $$
(10)

where S is a P×P Euclidean distance matrix, I is the identity matrix, \(\lambda_{i}\) are the characteristic roots (eigenvalues), and \(Y_{i}\) are the corresponding eigenvectors.

Equation (10) is an eigenvalue-eigenvector equation, where \(\lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{P}\) are the real, nonnegative roots of the determinantal polynomial of degree P given as

$$ |S - I\lambda_{i}| = 0 $$
(11)

This equation is solved for \(\lambda_{i}\), and then \(Y_{i}\) can be calculated using the values of \(\lambda_{i}\) in Eq. (10). It is proven that the eigenvectors thus computed represent the unique set of P independent principal components (factors) of the data set, which maximize the variance (Basilevsky 1994). According to the PCA method, each of the P independent principal components (factors) can be written as a linear combination of the original variables (water samples), with the elements of the P eigenvectors as the coefficients of these linear combinations. Furthermore, the elements of these eigenvectors reflect the degree of association between each principal component (factor) and the sample, and are called the ‘factor loadings’ of the samples on the ith principal component in factor analysis terminology. Each of the P independent principal components represents a cluster. There should be low similarities among samples that are associated with different clusters and high similarities among samples strongly associated with the same cluster. Regarding sample size, Basilevsky (1994) suggests that the data set should contain three to four times as many observations as variables.
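To make the eigenvalue-eigenvector relation of Eq. (10) concrete, the sketch below uses power iteration to approximate only the dominant eigenpair of a small symmetric matrix and then checks that S·Y is close to λ·Y. This is a swapped-in illustrative technique, not the procedure used in the study (which relied on SPSS); the function names and the 3×3 matrix are hypothetical, and a full analysis would need all P eigenpairs from a numerical library routine.

```python
import math

def mat_vec(S, y):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(s_ij * y_j for s_ij, y_j in zip(row, y)) for row in S]

def dominant_eigenpair(S, iterations=200):
    """Approximate the largest-magnitude eigenvalue and its unit eigenvector.

    Plain power iteration with a fixed iteration count; it assumes a single
    dominant eigenvalue and a start vector not orthogonal to its eigenvector.
    """
    n = len(S)
    y = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        z = mat_vec(S, y)
        norm = math.sqrt(sum(v * v for v in z))
        if norm == 0.0:
            raise ValueError("matrix maps the iterate to the zero vector")
        y = [v / norm for v in z]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate
    lam = sum(a * b for a, b in zip(y, mat_vec(S, y)))
    return lam, y

# Hypothetical symmetric distance-like matrix with a zero diagonal
S = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
lam, Y = dominant_eigenpair(S)
residual = [sy - lam * yi for sy, yi in zip(mat_vec(S, Y), Y)]
print(round(lam, 4), [round(v, 4) for v in Y])
```

The residual vector measures how well the pair satisfies (S − Iλ)Y = 0; for a converged pair it is close to zero in every component.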

Results and Discussion

Considering the water sample parameters shown in Tables 3a and 3b as variables, and applying the above methodology, the corresponding eigenvalues and eigenvectors of the Euclidean distance matrix were calculated using SPSS version 14.0 software. It is customary to rotate the components, and varimax rotation was applied here to obtain an optimal distribution of variance among the components. The number of factors (clusters) can then be selected based on the principal components showing eigenvalues above one, the number of principal components forming the cliff in the scree plot, or Akaike’s information criterion (AIC) (Basilevsky, 1994; Kaiser, 1960; Valarmathie et al., 2009). In this work, the scree plot is used to select the number of clusters. It can be observed from the scree plots (Figs. 2a and 2b) for the two seasons that only four clusters are needed to group the water samples. These four components explain 85.62 % of the variation in the rainy season and 88.22 % in the winter season. Therefore, the water sample data for the two seasons can be clustered into four groups. In order to make a fair comparison with WQI values, water quality is categorized into four classes depending on WQI values. Water quality is treated as excellent, good, poor, and very poor if WQI lies in the ranges 0–25, 26–50, 51–75, and 76–100, respectively.
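Two of the selection rules mentioned above, eigenvalues above one and the percentage of variation explained, can be sketched numerically. The eigenvalues below are hypothetical and are not the values obtained in the study; the function names are likewise illustrative, and the sketch assumes the eigenvalues have a positive sum.

```python
def explained_percentages(eigenvalues):
    """Percentage of total variation attributed to each eigenvalue."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

def cumulative(values):
    """Running totals of a sequence."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out

def kaiser_count(eigenvalues):
    """Number of components with eigenvalue above one."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

def components_for_coverage(eigenvalues, target_percent):
    """Smallest number of leading components reaching the target percentage."""
    ordered = sorted(eigenvalues, reverse=True)
    running = cumulative(explained_percentages(ordered))
    for count, cum in enumerate(running, start=1):
        if cum >= target_percent:
            return count
    return len(ordered)

# Hypothetical eigenvalues for illustration only
eigenvalues = [8.0, 4.0, 3.0, 2.0, 1.0, 1.0, 0.5, 0.5]
print(cumulative(explained_percentages(eigenvalues)))
print(kaiser_count(eigenvalues), components_for_coverage(eigenvalues, 80.0))
```

A scree plot is simply these ordered eigenvalues plotted against component number; the visual "cliff" it shows is a judgment call that these numeric rules only approximate.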

Fig. 2a Scree plot for data of rainy season

Fig. 2b Scree plot for data of winter season

The next step is to assign the water samples to the various clusters. In this study, the absolute values of the elements of the eigenvectors (the factor loadings) are used to identify the clusters for the water samples. The rotated factor loadings are shown in Table 7. For example, for water sample W1.r.01, the factor loading in cluster 1 is 0.93, which is higher than its loadings in the other clusters, indicating a stronger relationship with cluster 1 than with clusters 2, 3, and 4. It is to be noted that all the samples of well number 1 are clustered into group 1 irrespective of season. From Table 6 it can be seen that the WQI of the samples from well number 1 lies in the range 0–25, indicating that these water samples belong to the excellent category. Therefore, water samples belonging to cluster 1 (principal component 1 or PC1), cluster 2 (PC2), cluster 3 (PC3), and cluster 4 (PC4) are treated as of excellent, good, poor, and very poor quality, respectively. This procedure was repeated for all the samples to find their respective clusters. By the same argument, sample W4.w.03 belongs to cluster 4 (very poor quality).
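The assignment rule described above, choosing the component with the largest absolute loading, can be sketched as follows. The loadings are hypothetical except for the 0.93 loading of W1.r.01 on cluster 1, which is quoted in the text; the function name and dictionary layout are illustrative assumptions rather than the study's implementation.

```python
# Quality label attached to each cluster, as stated in the text
QUALITY_BY_CLUSTER = {1: "excellent", 2: "good", 3: "poor", 4: "very poor"}

def assign_cluster(loadings):
    """Return the 1-based index of the component with the largest |loading|."""
    best = max(range(len(loadings)), key=lambda j: abs(loadings[j]))
    return best + 1

# Hypothetical rotated loadings on PC1..PC4; only the 0.93 value for
# W1.r.01 on PC1 is taken from the text, the rest are made up.
rotated = {
    "W1.r.01": [0.93, 0.12, -0.20, 0.05],
    "W4.w.03": [0.10, -0.15, 0.22, -0.88],
}
labels = {
    sample: QUALITY_BY_CLUSTER[assign_cluster(loads)]
    for sample, loads in rotated.items()
}
for sample, label in labels.items():
    print(sample, "->", label)
```

Taking the absolute value matters because a rotated loading can be strongly negative, as in the hypothetical W4.w.03 row, while still indicating a strong association with that component.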

Table 7 Rotated factor loadings for water samples

If the classification of water samples by Q-mode PCA (four clusters) is compared with that by the WQI method, it is observed that both methods result in the same classification. For the rainy season data, nine samples belong to cluster 1, four samples to cluster 2, three samples to cluster 3, and three samples to cluster 4. Similarly, for the winter season data, nine samples belong to cluster 1, three samples to cluster 2, two samples to cluster 3, and three samples to cluster 4. The major advantage of clustering water quality data in an empirical manner lies in the fact that it avoids subjectivity in the assignment of weights to parameters for WQI calculation. Furthermore, it provides a computationally elegant method with less dependence on the choice of parameters. However, quality is a vague term that cannot easily be described using a crisp data set, e.g. good quality water cannot simply be described as having a pH value of 7.0 or above. Instead, water quality can best be described based on its degree of potability and potential usages rather than by expressing its constituents in numerical terms. Therefore, the non-parametric empirical method proposed here is efficient for such an application. Any new water sample can be placed in one of the above categories from the constituents determined by physicochemical analysis, which is a routine analysis in the field and laboratory. The method is quite generic and can handle any number of parameters. Although the method classifies water samples into proper groups, the performance of Q-mode clustering can be improved if the size of the data set is increased.

Conclusions

In this work, a PCA-based method has been proposed for classification of water samples. It has been demonstrated that the methodology efficiently classifies the samples into various clusters as far as the present data set is concerned. A similar classification is obtained when WQI is calculated for the data set. The approach presented here is efficient and computationally elegant for classification of water samples. Importantly, it can be used in the field and laboratory because statistical packages are readily accessible and available. The present approach has several advantages over other approaches:

  • The physicochemical analysis of any water sample can be determined in a laboratory conveniently, since no sophisticated and costly experimental setup is required for the purpose. The classification system matches closely with the classification system based on calculation of WQI.

  • It can be supported by available commercial software programs such as SPSS in order to facilitate industrial applications.

  • It has the flexibility in allowing the user to identify the required number of clusters in advance, or consider it as a dependent variable.

The method of classification of water samples proposed in this work is quite generic and works well for the present data set. Since such a structured approach has already been applied in various fields of engineering due to its strong foundation, it is expected to work efficiently irrespective of the data set. However, the efficiency of the method needs to be tested with water samples from other parts of the world.