Geoenvironmental zoning evaluation and optimization

Zamani, Ahmad; Farahi, Sedigheh; Boostani, Reza; Hassani-Saadi, Hamed

doi:10.1007/s12145-014-0184-0

Research Article
Published: 19 October 2014

Volume 8, pages 583–593, (2015)

Earth Science Informatics


Abstract

Fast pattern classification by automatic analytical and sorting techniques is relevant across a wide range of scientific and technical disciplines. With massive transactional and experimental data sets now available to Earth and environmental scientists, the validation of zoning has become an important topic. In this paper, we propose a new approach for producing Automatic Integrated Self-Organized Optimum Zoning (AISOOZ) maps from comprehensive (multivariate) geological and geophysical data. Unlike conventional zoning, the new approach includes techniques for finding the optimal structure that best fits the natural pattern of a given area without any a priori class information. While there are obvious similarities between the conventional and the new optimal zoning maps, the automatic optimal approach reveals insights into the geological evolution of the study area that could not be observed on the conventional maps. The success of the AISOOZ case study encourages extending its scope to rapid online as well as offline interactive multivariate pattern discovery in the Earth and environmental sciences. Finally, two widely used stopping criteria for optimal zoning and pattern recognition are compared.


Introduction

One of the important capabilities of the human brain is its ability to classify entities based on the characteristics that the brain either recognizes or judges to be of importance. However, conventional methods of classification are characterized by two deficiencies. The first is the large uncertainty involved in classification based on subjective analysis. The second is the failure to interpret a massive amount of data accurately by visual inspection. Automatic Integrated Self-Organized Optimum Zoning (AISOOZ) provides researchers with an empirical and objective method for performing this task, which comes naturally to humans (Zamani and Hashemi 2004; Zamani et al. 2011, hereafter referred to as I and II). Typically, the characteristics of the Earth and its natural phenomena are not only correlated with each other, but each characteristic is also influenced by other characteristics. Thus, in many instances the characteristics are interwoven in such a way that, when analyzed individually, they produce little information about the Earth and its environment. Furthermore, the development of more sophisticated technology for collecting information about the area under investigation has outpaced the human ability to use it to its full potential. Today, Earth scientists have access to large quantities of data that contain far more useful information about the Earth and its environment than can be extracted by conventional methods. As the pace of decision making increases, researchers must also become better acquainted with this new field of data analysis. Sifting through massive amounts of data requires both statistical reduction and the ability to compute theoretical solutions in Earth models with many parameters (I, II). Data mining (sometimes called data surfing or knowledge discovery) is defined as the process of automatically collecting and searching through large amounts of data in a database (Witten et al. 2011; Zamani et al. 2009, 2012, 2013b).
As the term suggests, this perspective on data analysis is exploratory rather than confirmatory in nature. The technique is directed toward searching deeply into the characteristics of large databases for patterns and relationships. Data mining uses artificial intelligence techniques, neural networks, and advanced statistical methods to reveal trends, patterns, and relationships that might otherwise remain obscured by complex patterns of association and the sheer amount of data. To do this, data mining draws on computational techniques from statistics and pattern recognition (Zamani et al. 2012). Such data processing makes large volumes of data digestible, discovers hidden rules underlying the data, and facilitates the deduction of patterns and trends. Because of these properties, automated pattern recognition plays an essential practical role in many fields such as scientific data exploration, information retrieval and text mining, business, census, and education (Ashok Kumar and Kannathasan 2011). Among the various data mining methods, the statistical method of hierarchical classification is an important tool for “unsupervised” learning procedures, that is, for the problem of finding groups in a data set without reference to external information. Unsupervised evaluation measures used in these procedures are often called internal indices because they use only information present in the data set, whereas supervised learning procedures use external evaluation measures, often called external indices. In the context of data mining and pattern recognition, hierarchical classification divides a data set into groups by automatic analytical sorting techniques, so that, at every level of partitioning, entities of the same group are more similar to each other than to entities of different groups. Nonhierarchical techniques, in contrast, assign objects or entities to groups only after the number of groups to be formed has been specified.
A nonhierarchical procedure begins by selecting class seeds, which are used as initial guesses of the means of the classes. Class seeds can be specified by the researcher or selected randomly from all observations (i.e. objects or entities). In a hierarchical classification of n sampled sites in the study area, the first level is a partition of the n sites into n groups, each containing one site, and at level n all sites form one group. If zones in the same groups stay together at higher levels, then the sequence is called a hierarchical classification (Zhu et al. 2008, 2010). As a result, a series of nested partitions is generated, from the trivial partition with a single zone (containing all sampled sites) to the trivial partition where each sampling site is a zone. Between these two extremes are many candidate partitions or zonings from which the researcher has to choose. The problem of determining the appropriate number of zones in an area is relevant across a wide variety of disciplines such as business, social sciences, biological sciences, Earth sciences, medicine, and engineering, to name a few. The use of a hierarchical classification method therefore requires a decision regarding the stage or level that best reproduces the underlying structure. Since there is no internal statistical criterion used for inference, researchers have developed a number of criteria and guidelines for approaching the problem. These techniques, sometimes referred to as stopping rules, are ad hoc procedures and must be computed by the researcher. The resulting pattern or underlying structure can be considered to consist of “natural” or optimal zones in a given area. This underlying structure exhibits internal cohesion (compactness) of the sampled sites in a zone and external isolation (separation) of a zone from other zones (Halkidi et al. 2001; Tan et al. 2006). The interpretation of the resulting pattern depends very much on the correct choice of the number of zones.
Therefore, to select the “best” or optimal number of zones based on the distribution of a zoning criterion, a stopping rule is desired (Chaimontree et al. 2010). Virtually all zoning procedures provide little if any information as to the number of zones present in the study area. The Silhouette criterion is a zone validity index that is used to judge the quality of any zoning solution (partition). It reflects both the cohesion within zones and the separation between zones. This criterion has gained popularity by showing greater stability and robustness in multivariate zoning classification. To illustrate the successful application of AISOOZ, Ward’s classification method and the Silhouette validation criterion were used to produce the optimal comprehensive (multivariate) geologic zoning (AISOOZ) map of Iran as a case study. While there are similarities between the AISOOZ map and conventional (current) geologic maps, the comprehensive optimal zoning map reveals some remarkable features not found in the conventional maps. Finally, the performance of the Wilks’ lambda test statistic (II) and the Silhouette validation index in determining the optimum number of geological zones was compared.

Method of analysis

Unsupervised hierarchical classification procedures are among the best known pattern recognition techniques (Grira et al. 2005). These procedures seek and separate zones with similar patterns in the dataset at different levels. In supervised classification, validating the resulting classification model is an integral part of developing the model. However, because of its very nature, validation is not a commonly used part of unsupervised zoning analysis. Let us consider a hierarchical classification of n sampled sites or entities into k zones. The first level is a partition into n singletons, each containing one site. The next level is a partition into n − 1 zones, and so on until all the sampled sites form one zone. At level m in the sequence, the number of zones is c = n − m + 1. Therefore, level one corresponds to n singletons, each containing one site, and level n to a single zone containing all the sites (Zhu et al. 2008, 2010). Hierarchical classification is divided into agglomerative (bottom up or clumping) and divisive (top down or splitting) methods. Agglomerative methods begin with n singletons and successively merge zones until all sites finally join together in a single zone. Divisive methods proceed in the opposite direction: they begin with one zone containing all the sampled sites, which in succeeding steps is split into smaller zones until each site is in a separate zone (singleton). Given a dataset, one of the major challenges in a classification process is to choose the classification model that is most appropriate and best explains the underlying structure of the data set (Ertöz et al. 2003). Another important objective in classification is to assess the “natural” or optimum level of partitioning in a given dataset, which is even more challenging when no response variable or classification model is available.
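The agglomerative sequence of nested partitions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the five two-attribute sites are made-up toy data, the function names are hypothetical, and the merge rule assumed here is Ward's increase in the within-zone error sum of squares, the method the paper adopts. The loop also checks the relation c = n − m + 1 between level m and the number of zones.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def ward_cost(a, b):
    """Increase in within-zone error sum of squares (ESS) if zones a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def agglomerate(sites):
    """Return the nested partitions, from n singletons (level 1) to one zone (level n)."""
    zones = [[s] for s in sites]
    levels = [[list(z) for z in zones]]
    while len(zones) > 1:
        # Pick the pair of zones whose merger increases ESS the least.
        i, j = min(combinations(range(len(zones)), 2),
                   key=lambda p: ward_cost(zones[p[0]], zones[p[1]]))
        merged = zones[i] + zones[j]
        zones = [z for k, z in enumerate(zones) if k not in (i, j)] + [merged]
        levels.append([list(z) for z in zones])
    return levels

# Hypothetical toy sites (two attributes each), for illustration only.
sites = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
levels = agglomerate(sites)
n = len(sites)
for m, partition in enumerate(levels, start=1):
    # At level m the number of zones is c = n - m + 1.
    print(m, len(partition), n - m + 1)
```

This naive version recomputes every pairwise cost at each step, which is adequate for a few hundred sites but would be replaced by an optimized library routine for larger data sets.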
Virtually all classification methods provide little if any information as to the optimum level of partitioning in the dataset. In this paper, therefore, a new approach based on Ward’s minimum variance method for hierarchical classification and the Silhouette validation criterion (also known as Silhouette width) is proposed to construct optimum geoenvironmental zoning maps. Ward’s minimum variance method (Ward 1963) minimizes the loss of information associated with each grouping of sites into geoenvironmental zones. Information loss is expressed in terms of a within-zone error sum-of-squares criterion, ESS, which is a built-in dissimilarity criterion (I). The Silhouette statistic is an internal criterion for unsupervised zoning validity that combines both cohesion (compactness) and separation to evaluate a classification (Fig. 1). It assesses zone coherence by comparing inter-zone with intra-zone variability (Halkidi et al. 2001; Tan et al. 2006). The cohesion and separation functions can be expressed using Eqs. (1) and (2), respectively (Jegatha Deborah et al. 2010; Tan et al. 2006).

$$ \mathrm{cohesion}\left(C_i\right)=\sum_{\substack{x\in C_i \\ y\in C_i}} \mathrm{proximity}\left(x,y\right) $$

(1)

$$ \mathrm{separation}\left(C_i,C_j\right)=\sum_{\substack{x\in C_i \\ y\in C_j}} \mathrm{proximity}\left(x,y\right) $$

(2)

The evaluation function can be a similarity, dissimilarity, or some combination of these quantities. Equations 1 and 2 give us some well-defined measures of classification evaluation that can be combined into an overall measure of classification evaluation for a set of k zones, using a weighted sum of the validity of individual zones (Eq. 3).

$$ \mathrm{overall\ validity}=\sum_{i=1}^{k} w_i\, \mathrm{validity}\left(C_i\right) $$

(3)
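As a rough illustration of Eqs. (1) to (3), the following Python sketch evaluates cohesion, separation, and a weighted overall validity on two tiny zones. It assumes Euclidean distance as the proximity function and uses arbitrary example weights; neither choice is specified in the text, and the function names are hypothetical.

```python
import math

def proximity(x, y):
    """Assumed proximity measure: Euclidean distance (a dissimilarity)."""
    return math.dist(x, y)

def cohesion(zone):
    """Eq. (1): sum of proximities over all ordered pairs (x, y) inside one zone."""
    return sum(proximity(x, y) for x in zone for y in zone)

def separation(zone_i, zone_j):
    """Eq. (2): sum of proximities between members of two different zones."""
    return sum(proximity(x, y) for x in zone_i for y in zone_j)

def overall_validity(zones, validity, weights):
    """Eq. (3): weighted sum of a per-zone validity measure."""
    return sum(w * validity(z) for w, z in zip(weights, zones))

# Hypothetical toy zones, for illustration only.
zone_a = [(0.0, 0.0), (3.0, 4.0)]
zone_b = [(6.0, 8.0)]

print(cohesion(zone_a))            # 0 + 5 + 5 + 0
print(separation(zone_a, zone_b))  # 10 + 5
print(overall_validity([zone_a, zone_b], cohesion, weights=[0.5, 1.0]))
```

With a dissimilarity as the proximity, smaller cohesion values indicate more compact zones and larger separation values indicate better isolated zones; with a similarity measure the interpretation is reversed.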

The cohesion and separation validity functions used in the overall validation of a group of zones (Eq. 3) can also be applied to evaluate individual zones or objects. This information can be used to improve the quality of a classification process. The contribution of individual entities or sites within a zone can be evaluated in the same way (Jegatha Deborah et al. 2010; Tan et al. 2006). Objects that contribute more to the cohesion and separation are near the “interior” of the zone, whereas objects that contribute less are near the “edge” of the zone. The Silhouette validation criterion is a dimensionless stopping rule that has been used in a wide variety of disciplines for assessing the “natural” number of partitions (Rousseeuw 1987; Kaufman and Rousseeuw 1990, 2005). Its application to geoenvironmental zoning and pattern discovery in the Earth and environmental sciences is a novel approach to this problem.

Data analysis

To illustrate the successful application of AISOOZ, a geological zoning map of Iran has been produced as a case study. For this purpose, a large number of new and updated geophysical and geological attributes of Iran (Table 1) have been compiled for 175 quadrangular sites, each covering a 1° area.

Table 1 Characteristics used for producing automatic integrated self-organized optimum zoning (AISOOZ) map, measured within 1° quadrangular sites^a

The quadrangles are numbered from west to east, beginning with 1 for the quadrangle between the 44° E and 45° E meridians and increasing to 175 for the quadrangle between the 61° E and 62° E meridians (I, II). None of offshore Iran is included in the dataset. These quadrangles are used as items or observations (input samples). Each sampling site is characterized by 49 attributes (Table 1) that describe the intensity and degree of contrast between the tectonic, neotectonic, and seismotectonic characteristics of Iran. All attributes have been standardized to have a mean of zero and a standard deviation of one. This removes the bias introduced by differences in the scales of the attributes and ensures that all have equal weight in the calculation of similarity. The standardized data set is then classified using Ward’s minimum variance method for hierarchical classification. This procedure, which is intuitive and computationally efficient, minimizes the information loss associated with zoning classification and tends to create small, evenly sized zones (I, II). At each step it merges the two zones whose union least increases the within-zone sum of squared deviations from the mean. Finally, the unsupervised Silhouette index, which is based on a comparison of compactness and separation, is applied here for the first time as an internal validation index for determining the optimum number of zones. The Silhouette coefficient combines the cohesion and separation validity functions to evaluate individual sampling sites, zones, and the entire set of zones within the study region (Rousseeuw 1987; Kaufman and Rousseeuw 1990, 2005). The following steps explain how to perform zoning validation.

1) For the i-th site, let a_i be the average dissimilarity between i and all other sites within the same zone. We can interpret a_i as how well matched i is to the zone to which it is assigned (the smaller the value, the better the match).
2) For the i-th site and any zone not containing the site, calculate the average dissimilarity of i to all the sites in that zone. Repeat this for every zone of which i is not a member. Denote the lowest such average dissimilarity by b_i. This value can be seen as the dissimilarity between i and its nearest “neighbor” zone.
3) Finally, the Silhouette index of the i-th site (observation) is defined as:
$$ SC_i=\frac{b_i-a_i}{\max \left(a_i,b_i\right)} $$
(4)

This can be written as:
$$ SC_i=\begin{cases} 1-a_i/b_i, & \text{if } a_i<b_i \\ 0, & \text{if } a_i=b_i \\ b_i/a_i-1, & \text{if } a_i>b_i \end{cases} $$
(5)

From the above definition it is clear that the Silhouette coefficient SC_i can vary between −1.0 and +1.0. If the coefficient is close to +1.0, the sampling site is “well-partitioned”. If the coefficient is close to zero, the site could equally well be assigned to the nearest neighboring zone (i.e. it lies on a boundary between zones). If the coefficient is close to −1.0, the sampled site is in the wrong zone.
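The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Matlab routine used in the study: it assumes Euclidean distance as the dissimilarity, at least two zones, no coincident sites (so that max(a_i, b_i) is never zero), and it sets SC_i to 0 for a site that is alone in its zone, a common convention that the text does not state. The function name and toy sites are hypothetical.

```python
import math

def silhouette_values(sites, labels):
    """Silhouette coefficient SC_i of every site, following steps 1 to 3."""
    zones = {}
    for idx, lab in enumerate(labels):
        zones.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in zones[lab] if j != i]
        if not own:
            values.append(0.0)  # singleton zone: assumed convention
            continue
        # Step 1: average dissimilarity to the other sites of the same zone.
        a_i = sum(math.dist(sites[i], sites[j]) for j in own) / len(own)
        # Step 2: lowest average dissimilarity to any other zone.
        b_i = min(
            sum(math.dist(sites[i], sites[j]) for j in members) / len(members)
            for other, members in zones.items() if other != lab
        )
        # Step 3: Eq. (4).
        values.append((b_i - a_i) / max(a_i, b_i))
    return values

# Hypothetical toy sites: two compact pairs, four units apart.
sites = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
good = silhouette_values(sites, [0, 0, 1, 1])  # natural zoning
bad = silhouette_values(sites, [0, 1, 0, 1])   # zones cut across the pairs
print(good)
print(bad)
```

In the natural zoning every site has a clearly positive coefficient, while the deliberately poor zoning yields negative values for all sites, matching the interpretation given above.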

Result and discussion

For a given number of zones, the global or overall average Silhouette coefficient for classification $ \overline{SC} $ is simply the average of Silhouette index over all sampling sites i,

$$ \overline{SC}=\frac{1}{n}\sum_{i=1}^{n} SC_i $$

(6)

where n is the number of observations (sampling sites).

The average Silhouette width is a measure of how tightly clustered all the sites in the zoning are. Thus the average Silhouette width of the entire data set measures how appropriately the data have been partitioned and can be used to determine the optimum number of zones in a data set (Rousseeuw 1987; Kaufman and Rousseeuw 2005; Trauwaert et al. 1989). The largest overall average Silhouette score indicates the best zoning (Fig. 2). In the present study, the 13-zone model has the largest overall average Silhouette score and is therefore taken as the optimum; that is, the highest zone separability is achieved with 13 zones. The average Silhouette coefficient of a zone, SC_j, is calculated by averaging the Silhouette coefficients of the sampling sites belonging to that zone.

$$ SC_j=\frac{1}{m_j}\sum_{i=1}^{m_j} SC_i^j $$

(7)

where m_j is the number of sampling sites within the j-th zone, j = 1, 2, …, K.

Finally, the overall or global Silhouette coefficient for the full or entire zoning SC can be determined by calculating the average Silhouette coefficient of all zones (Petrović 2006).

$$ SC=\frac{1}{K}\sum_{j=1}^{K} SC_j $$

(8)
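The three averages in Eqs. (6) to (8) can be computed from the per-site coefficients as sketched below. The coefficient values and zone labels are invented for illustration, and the function names are hypothetical.

```python
def overall_average(sc):
    """Eq. (6): mean Silhouette coefficient over all n sampling sites."""
    return sum(sc) / len(sc)

def zone_averages(sc, labels):
    """Eq. (7): mean Silhouette coefficient SC_j of each zone j."""
    grouped = {}
    for value, lab in zip(sc, labels):
        grouped.setdefault(lab, []).append(value)
    return {lab: sum(vals) / len(vals) for lab, vals in grouped.items()}

def global_coefficient(sc, labels):
    """Eq. (8): mean of the K zone averages."""
    per_zone = zone_averages(sc, labels)
    return sum(per_zone.values()) / len(per_zone)

# Hypothetical per-site coefficients for five sites in two zones.
sc = [0.8, 0.6, 0.4, -0.2, 0.9]
labels = ["A", "A", "A", "B", "B"]

print(overall_average(sc))             # average over sites
print(zone_averages(sc, labels))       # one value per zone
print(global_coefficient(sc, labels))  # average over zones
```

Note that Eq. (6) weights every site equally whereas Eq. (8) weights every zone equally, so the two values coincide only when all zones contain the same number of sites.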

Clearly, both a zone’s Silhouette and the global Silhouette take values between −1.0 and 1.0, both inclusive. The overall Silhouette index value compares all the sampling sites globally (Bizhani and Tarokh 2011). Therefore, local trends and outliers do not hinder locating the true overall average Silhouette coefficient. Pollard and van der Laan (2002) showed that the overall Silhouette index tends to be a global criterion in the sense that it is not necessarily maximized at the level of partitioning that we would select visually, but usually higher up in the model. In this study, the zone validity analysis is performed with the Matlab statistics toolbox routines for cluster analysis (Kaufman and Rousseeuw 1990, 2005). As mentioned above, the Silhouette method is a comprehensive method with which we can both find the number of clusters and determine the association of each sample with the resulting clusters. It also enables us to correct a clustering by identifying poorly assigned samples. More importantly, the Silhouette method is not a clustering method; it is a robust evaluation method that indicates how well the samples are distributed among the clusters. A negative Silhouette value indicates that a sample is wrongly assigned to a cluster. Conversely, a Silhouette value close to its maximum demonstrates that a sample is properly assigned to a specific cluster. Our results show that, unlike other validation methods, the Silhouette method does not depend on the clustering method employed. Moreover, the Silhouette method evaluates each sample individually, whereas other methods evaluate all the samples within a cluster as a group. A simpler and more general classification, defining the most basic structure of the area under investigation, can be produced by comparing the zoning classifications globally. The Silhouette plot of the zoning (Fig. 3) shows the Silhouette index values across the zones used to establish the 13 optimal geologic zones, each with specific numerical characteristics. It displays the Silhouettes of all zones next to each other, so that the quality of the zones can be compared.

For each zone, the plot shows the Silhouette coefficients of all the sampling sites within that zone, ranked in decreasing order. Each site is drawn as a horizontal line whose length is proportional to its Silhouette index value. The Silhouette plot shows which sampling sites lie well within a zone and which ones lie somewhere between zones. A wide Silhouette indicates large Silhouette index values and hence a pronounced zone. The height of a zone is simply equal to the number of sampling sites in the zone. An overall Silhouette width of 0.26 was found for the 13 optimal zones. As a rule of thumb, the Silhouette coefficient should be around or larger than 0.25 in order to claim that a pattern has been found in the data set (Boroš 2011). Therefore, the Silhouette averages and plots can be used to select the optimal number of zones, assess how well individual observations are classified, and determine the underlying structure of the data set. As a case study of the successful application of the new approach, an optimal 13-zone geologic map of Iran is produced (Fig. 4). The geologic zones are numbered according to their hierarchical order. The following list describes the features of each of these 13 optimal zones. The specific numerical geophysical and geological characteristics of these zones are presented in Table 2.

Table 2 Geophysical and geological characteristics of the 13 zones of the automatic integrated self-organized optimum zoning (AISOOZ) map shown in Fig. 4 (units and symbols are given in Table 1)

1) The Urumiyeh Zone: This zone is characterized by moderate magnetic field intensity, moderate seismicity, relatively low earthquake energy release, moderate topographic relief, and moderate fault length density.
2) The Eastern Alborz - Kopeh Dagh Zone: Moderate exposure of Precambrian and metamorphic rocks, moderate magnetic field intensity and topographic relief, moderate seismicity, minimum area of Cenozoic rocks, and high average elevation.
3) The Western Alborz Zone: This zone is distinguished from other zones by low crustal thickness, maximum outcrop of igneous rocks, and high gravity and isostatic anomalies.
4) The Hashtrud-Natanz Zone: Moderate magnetic field intensity and topographic relief, and a high percentage area of Precambrian, metamorphic, and igneous rock exposures. The highest average elevation plays an important role in separating this zone from the other zones.
5) The Central Alborz Zone: The main geological characteristics of this zone are moderate fault length density, maximum area of Paleozoic rock exposures, high gravity anomaly, relatively high seismic severity, relatively high average Bouguer anomaly, and low average magnetic field intensity.
6) The Central - East Iran Zone: This zone is differentiated from surrounding areas by the maximum area of Cenozoic rock exposures, relatively low seismicity, low magnetic field intensity, moderate seismic severity, and moderate unconsolidated sediments (Zamani et al. 2013a).
7) The Sanandaj - Baft Zone: Relatively high crustal thickness, high fault length density, moderate elevation, and moderate seismicity are important features of this zone.
8) The Zagros Fold - Thrust Zone: This zone is characterized by relatively low maximum isostatic anomaly (Zamani and Farahi 2011), high crustal thickness, moderate seismic activity, and moderate elevation.
9) The Zagros Simply Folded Zone: This zone is differentiated from surrounding regions by moderate gravity anomaly, minimum outcrops of igneous and metamorphic rocks (Zamani and Farahi 2011), low magnetic field intensity, and high seismicity. The maximum number of earthquakes occurred in this zone.
10) The Khozestan Plain Zone: This unfolded zone is characterized by low seismic activity, minimum fault length density, low isostatic anomaly, low topographic relief, and maximum exposures of unconsolidated sediments.
11) The Minab - Baft Zone: This zone is characterized by a moderate area of Paleozoic rocks, the largest percentage area of ophiolite exposures, and the highest values of residual Bouguer anomaly and magnetic field intensity.
12) The Makran Zone: Moderate regional Bouguer anomaly, minimum area of Precambrian rock exposures, maximum area of sedimentary rocks, and high fault length density are the main characteristics of this zone.
13) The Chabahar Zone: Relatively low elevation, the highest value of regional Bouguer anomaly, low crustal thickness, low seismicity, and minimum percentage area of Mesozoic rock exposures are the general characteristics of this zone.

Although there are some similarities between the AISOOZ map and conventional maps, the AISOOZ map (Fig. 4) reveals some remarkable features that could not be found in the conventional maps (e.g. Stöcklin 1968; Stöcklin and Nabavi 1973). These conventional maps, which are both original and widely used in Iran, have been considered for illustrative purposes only. New features revealed by the robust multivariate geological zoning (AISOOZ) method (Fig. 4) include the following: (1) Contrary to the current maps, the AISOOZ map makes a clear distinction between the Makran Mountains (Zone 12) and the East Iranian Ranges. (2) The extent and rigidity of the disputed microplate in the central and eastern parts of Iran (Zone 6) are revealed. (3) The robust multivariate zoning method displays striking similarities between the Eastern Alborz - Kopeh Dagh Zone (Zone 2) and the Zagros Simply Folded Zone (Zone 9), even though these zones are geographically far apart. They have probably gone through similar geologic processes during their geological evolution. (4) On the other hand, the Sanandaj - Baft Zone (Zone 7) and the Zagros Fold - Thrust Zone (Zone 8), which are geographically close, are geologically differentiated from each other. This indicates that these zones followed separate paths during their geologic evolution.

Our primary concern in this manuscript is the mechanics of producing an Automatic Integrated Self-Organized Optimum Zoning (AISOOZ) map based on overall similarities, across a list of variables, within individual zones. The new approach has been applied to tectonic zoning for illustrative purposes only. It is important to recognize that the geological zoning generated by the AISOOZ method is based purely on the geological and geophysical characteristics presented in Table 1. The correspondences and differences between AISOOZ and a given zoning based on conventional methods must therefore be interpreted with care (I, II).

However, the attributes used could readily be adapted to take additional or alternative data sets. The same approach could also be used to produce zoning maps of any entity, regardless of its context and/or scale.

Assessment of geoenvironmental zoning

Apart from primary validity, which ensures that the zoning classification is representative of the general population, there is a measure of secondary validity, which assesses how stable and robust the classification is. According to Williams (1967), a classification should be stable and robust in that slight alterations of the information should not produce major changes in the classification. Clustering is naturally a “blind” process if it does not include any explicit semantic knowledge of the dataset. The most direct approach to evaluating the stability and robustness of the Silhouette validation index is to alter the data set slightly by adding or removing a few data samples and then to repeat the classification. For this purpose, the Silhouette validation criterion has also been applied to the data set used in our previous paper (II), which differs slightly from the updated data set of the current research (Table 1). Since the procedure produced the same zoning classification on repeated trials (Fig. 4), one can conclude that the Silhouette validation index is a reliable measure for determining the optimum number of zones. Finally, the same reliability test was performed on the Wilks’ lambda validation index (II). The resulting plots are presented in Fig. 5.

Comparison of Fig. 5(a) and (b) indicates that the Wilks’ lambda criterion tends to produce a different number of zones after a slight modification of the data set. This is a serious error because information is lost by merging distinct zones. Therefore, in comparison with the Wilks’ lambda criterion, the Silhouette index performs better for optimum geoenvironmental zoning because it produces more stable and robust zoning classifications.
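The kind of stability check described in this section can be sketched end to end in Python. This is only an illustration under stated assumptions, not the authors' Matlab workflow: the twelve two-attribute sites are invented toy data, Euclidean distance is assumed as the dissimilarity, singleton zones are given a Silhouette of 0, every attribute is assumed to have non-zero variance, and all function names are hypothetical. The sketch standardizes the attributes, applies a naive Ward agglomeration, selects the number of zones with the largest mean Silhouette, and repeats the selection after dropping one site.

```python
import math
from itertools import combinations

def standardize(rows):
    """Scale each attribute (column) to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

def centroid(rows, members):
    """Mean attribute vector of the sites whose indices are in members."""
    return [sum(rows[i][d] for i in members) / len(members)
            for d in range(len(rows[0]))]

def ward_labels(rows, k):
    """Merge zones by Ward's minimum-ESS-increase rule until k zones remain."""
    zones = [[i] for i in range(len(rows))]

    def cost(p):
        a, b = zones[p[0]], zones[p[1]]
        sq = sum((u - v) ** 2
                 for u, v in zip(centroid(rows, a), centroid(rows, b)))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    while len(zones) > k:
        i, j = min(combinations(range(len(zones)), 2), key=cost)  # i < j
        zones[i] += zones[j]
        del zones[j]
    labels = [0] * len(rows)
    for lab, members in enumerate(zones):
        for i in members:
            labels[i] = lab
    return labels

def mean_silhouette(rows, labels):
    """Average of SC_i = (b_i - a_i) / max(a_i, b_i) over all sites."""
    zones = {}
    for i, lab in enumerate(labels):
        zones.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in zones[lab] if j != i]
        if not own:
            continue  # singleton zone: SC_i taken as 0 (assumed convention)
        a = sum(math.dist(rows[i], rows[j]) for j in own) / len(own)
        b = min(sum(math.dist(rows[i], rows[j]) for j in m) / len(m)
                for other, m in zones.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(labels)

def best_k(rows, candidates):
    """Return the candidate number of zones with the largest mean Silhouette."""
    z = standardize(rows)
    scores = {k: mean_silhouette(z, ward_labels(z, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Toy data: three compact groups of four sites, two attributes each.
sites = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 0), (10, 1), (11, 0), (11, 1),
         (5, 9), (5, 10), (6, 9), (6, 10)]
k_full, scores_full = best_k(sites, range(2, 6))
k_perturbed, _ = best_k(sites[:-1], range(2, 6))  # drop one site and repeat
print(k_full, k_perturbed)
```

If the selected number of zones is the same for the full and the perturbed data, the stopping rule is stable for that perturbation; a fuller check would repeat this over many randomly added or removed sites.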

Conclusion

Zoning classification provides us with an empirical and objective technique for performing one of the most inherent human capabilities: classification. It aims at identifying groups of similar entities or sampling sites and helps to discover the distribution of patterns and interesting correlations in the study area. Whether for purposes of simplification, exploration, or confirmation, zoning classification is a powerful analytical method that has a wide range of applications in virtually every scientific and technical discipline. With the availability of massive transactional and experimental data sets, validating zoning results has become an important topic in the Earth and environmental sciences. Once a suitable classification method has been chosen, consideration needs to be given to how to find a partition into groups that optimizes the number of zones. Because there is no internal statistical criterion used for inference, researchers have developed several criteria and guidelines for approaching the problem (Everitt and Dunn 2010). These criteria are ad hoc procedures and must be computed by the researcher after the experiment is completed. When applied to the results of zoning classifications, these procedures are sometimes referred to as stopping rules. In this paper, Ward’s unsupervised classification method is used to produce geological zoning maps. The Silhouette validation criterion is then applied, for the first time, as a stopping rule to measure the relative usefulness of geological zoning maps. By comparing the sampling sites globally and ignoring local dependencies, the average Silhouette index of the zoning classification as a whole can determine the optimal or natural number of geologic zones within the study area. This criterion led to an AISOOZ map of Iran displaying 13 optimal geologic zones.
Finally, a comparative study has been performed on the geological zoning classifications produced by the Silhouette validation coefficient and by the Wilks' lambda criterion. Compared with Wilks' lambda, the geological zoning produced by the Silhouette criterion is more reliable in that the resulting classification is not disturbed by adding or removing a few sites or attributes in the data set. Therefore, this criterion could be built into computer programs as a validation index or stopping rule to assess the natural or optimum number of zones, each with specific numerical characteristics. AISOOZ maps not only can be reevaluated frequently but also provide a means to create online as well as offline interactive zoning maps in the Earth and environmental sciences. The attributes utilized could easily be adapted to take additional or alternative data sets. Therefore, the AISOOZ approach could also be used to produce zoning maps of any entity, regardless of its context and/or scale.
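The procedure summarized in this section, Ward's classification followed by the average Silhouette index as a stopping rule, can be sketched as follows. This is an illustrative sketch only, not the code used to produce the AISOOZ map: the function names (`ward_labels`, `mean_silhouette`, `optimal_zone_count`), the use of Euclidean distance, the naive merging loop and the synthetic three-zone data are all assumptions.

```python
import numpy as np

def ward_labels(X, k):
    """Naive Ward agglomeration down to k zones.

    At each step, merge the pair of clusters whose union gives the smallest
    increase in the within-cluster sum of squares.
    """
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = X[clusters[a]], X[clusters[b]]
                na, nb = len(ca), len(cb)
                diff = ca.mean(axis=0) - cb.mean(axis=0)
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for zone, members in enumerate(clusters):
        labels[members] = zone
    return labels

def mean_silhouette(X, labels):
    """Average Silhouette index s(i) = (b - a) / max(a, b) over all sites.

    Requires at least two zones.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    zones = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton zone: s(i) = 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own zone
        b = min(D[i, labels == z].mean() for z in zones if z != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

def optimal_zone_count(X, k_range):
    """Return the k with the highest average Silhouette index, plus all scores."""
    scores = {k: mean_silhouette(X, ward_labels(X, k)) for k in k_range}
    return max(scores, key=scores.get), scores

# Hypothetical data: 3 synthetic zones, 12 sampling sites each, 2 attributes.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(12, 2)) for c in centers])

best_k, scores = optimal_zone_count(X, range(2, 7))
```

With real multivariate geological and geophysical attributes, the columns would normally be brought to a common scale before distances are computed, and a library implementation of Ward's method would be preferable to the cubic-time loop shown here.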

References

Ashok Kumar D, Kannathasan N (2011) A survey on data mining and pattern recognition techniques for soil data mining. Int J Comput Sci 8:422–428
Bizhani M, Tarokh MJ (2011) Behavioral rules of bank’s point-of-sale for segments description and scoring prediction. Int J Ind Eng Comput 2:337–350
Boroš M (2011) Cluster analysis, methodology and statistics. University of Groningen. World Wide Web Address: WWW.let.rug.nl/~nerbonne
Chaimontree S, Atkinson K, Coenen F (2010) Best clustering configuration metrics: towards multi-agent based clustering. ADMA, proceedings of the 6th international conference on advanced data mining and applications: part 1. Springer 6440:48–59
Dehghani GA, Makris J (1984) The gravity field and crustal structure of Iran. N Jahrbuch für Geol Paläontol 168:215–229
Engdahl ER, Jackson JA, Myers SC, Bergman EA, Priestley K (2006) Relocation and assessment of seismicity in the Iran region. Geophys J Int 167:761–778
Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. International Conference on Data Mining, San Francisco, California, USA
Everitt BS, Dunn G (2010) Applied multivariate data analysis, 2nd edn. Wiley, New York, p 354
Grira N, Crucianu M, Boujemaa N (2005) Techniques for processing multimedia content. Report of the MUSCLE European Network of Excellence (6th Framework Programme)
Gutenberg B, Richter CF (1954) Seismicity of the earth and associated phenomena. Princeton University Press, New Jersey
Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Intell Inf Syst 17:107–145
ISC: International Seismological Centre (2011) On-line Bulletin. http://www.isc.ac.uk, Internatl. Seis. Cent., Thatcham, United Kingdom
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc
Jegatha Deborah L, Baskaran R, Kannan A (2010) A survey on internal validity measure for cluster validation. Int J Comput Sci Eng Surv (IJCSES) 1(fir):85–102
Kaufman L, Rousseeuw PJ (1990) Clustering large data sets. In: Gelsema ES, Kanal LN (eds) Pattern recognition in practice II. Elsevier, North Holland, pp 425–437
Kaufman L, Rousseeuw PJ (2005) Finding groups in data: An introduction to cluster analysis. Wiley-Interscience, New York, p 368
National atlas of thematic geological & exploration maps (2004) Geological Survey of Iran, Ministry of Industries and Mines
NEIC: National Earthquake Information Center (2011) Earthquake information bulletin Colorado, USA. http://earthquake.usgs.gov/regional/neic/
Petrović S (2006) Proceedings of the 11th Nordic workshop on secure IT-systems. NORDSEC, Linkoping, pp 53–64
Pollard KS, van der Laan MJ (2002) A method to identify significant clusters in gene expression data. Paper 107, division of biostatistics working paper series. University of California, Berkeley
Rousseeuw PJ (1987) Silhouettes, a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
SRTM: Hole-filled seamless SRTM data V, International Centre for Tropical Agriculture (CIAT) available from http://gisweb.ciat.cgiar.org/sig/90m_data_tropics.htm (2008)
Stöcklin J (1968) Structural history and tectonics of Iran: a review. Am Assoc Petrol Geol Bull 52:1229–1258
Stöcklin J, Nabavi MH (1973) 1:2,500,000 sheet, tectonic map of Iran. Geological Survey of Iran
Tan PN, Steinbach M, Kumar V (2005) Introduction to Data Mining. Addison-Wesley; 1 edition, ISBN-13: 9780321321367
Tan P, Steinbach M, Kumar V (2006) Introduction to Data Mining. Addison-Wesley, pp. 769
Trauwaert E, Rousseeuw PJ, Kaufman L (1989) Some silhouette-based graphics for clustering interpretation. Belg J Oper Res Stat Comp Sci 29(3):35–55
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236–244
Williams WT (1967) Numbers, taxonomy, and judgment. Bot Rev 33:379–386
Witten IH, Frank E, Hall MA (2011) Data mining, practical machine learning tools and techniques. 3rd ed., Elsevier, pp. 629
World Digital Magnetic Anomaly Map (WDMAM) (2007) UNESCO Commission
Zamani A, Farahi Ghasre Aboonasr S (2011) The significance of parameters used for self-organized tectonic zoning of Iran. Sci Q J Geosci 79:165–170
Zamani A, Hashemi N (2004) Computer-based self-organized tectonic zoning: a tentative pattern recognition for Iran. Comput Geosci 30:705–718
Zamani A, Nedaei M, Boostani R (2009) Tectonic zoning of Iran based on self organizing map. J Appl Sci 9:4099–4114
Zamani A, Khalili M, Gerami A (2011) Computer-based self-organized tectonic zoning revisited: Scientific criterion for determining the optimum number of zones. Tectonophysics 510(1):207–216
Zamani A, Sami A, Khalili M (2012) Multivariate rule-based seismicity map of Iran: a data-driven model. Bull Earthq Eng 10:1667–1683
Zamani A, Samiee J, Kirby JF (2013a) Estimating the mechanical anisotropy of the Iranian lithosphere using the wavelet coherence method. Tectonophysics 601:139–147
Zamani A, Sorbi MR, Safavi AA (2013b) Application of neural network and ANFIS model for earthquake occurrence in Iran. Earth Sci Inform 6(2):71–85
Zhu L, Lin C, Huang H, Chen Y, Yuille AL (2008) Unsupervised structure learning: hierarchical recursive composition, suspicious coincidence and competitive exclusion. Proc. of the European Conference on Computer Vision (ECCV)
Zhu L, Chen Y, Yuille A (2010) Latent hierarchical structure learning for object detection. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)

Acknowledgments

This work was supported by the Center of Excellence for Environmental Geohazards, and the Research Council of Shiraz University. One of the authors (S.F.) is grateful to the Ministry of Science, Research and Technology of Iran and Iranian National Institute of Oceanography for financial support.

Author information

Authors and Affiliations

Department of Earth Sciences, College of Sciences, Shiraz University, Shiraz, Iran
Ahmad Zamani & Sedigheh Farahi
Department of Computer Science and IT, Faculty of Engineering, Shiraz University, Shiraz, Iran
Reza Boostani & Hamed Hassani-Saadi

Corresponding author

Correspondence to Ahmad Zamani.

Additional information

Communicated by: H. A. Babaie

About this article

Cite this article

Zamani, A., Farahi, S., Boostani, R. et al. Geoenvironmental zoning evaluation and optimization. Earth Sci Inform 8, 583–593 (2015). https://doi.org/10.1007/s12145-014-0184-0

Received: 26 March 2014
Accepted: 06 October 2014
Published: 19 October 2014
Issue Date: September 2015
DOI: https://doi.org/10.1007/s12145-014-0184-0
