Introduction

One of the important capabilities of the human brain is its ability to classify entities based on the characteristics that it either recognizes or judges to be important. However, conventional methods of classification suffer from two deficiencies. The first is the large uncertainty involved in classification based on subjective analysis. The second is the failure to interpret a massive amount of data accurately by visual inspection. Automatic Integrated Self-Organized Optimum Zoning (AISOOZ) provides researchers with an empirical and objective method for performing this task, which comes naturally to humans (Zamani and Hashemi 2004; Zamani et al. 2011, hereafter referred to as I and II). Typically, the characteristics of the Earth and its natural phenomena are not only correlated with each other, but each characteristic is also influenced by other characteristics. Thus, in many instances the characteristics are interwoven in such a way that, when analyzed individually, they yield little information about the Earth and its environment. Furthermore, the development of more sophisticated technology for collecting information about the area under investigation has outpaced the human ability to use it to its full potential. Today, Earth scientists have access to large quantities of data that contain far more useful information about the Earth and its environment than can be extracted by conventional methods. As the pace of decision making increases, researchers must also become better acquainted with the new “field” of data analysis. Sifting through massive amounts of data requires both statistical reduction and the ability to compute theoretical solutions in Earth models with many parameters (I, II). Data mining (sometimes called data surfing or knowledge discovery) is defined as the process of automatically collecting and searching through large amounts of data in a database (Witten et al. 2011; Zamani et al. 2009, 2012, 2013b).
As the term suggests, this new perspective on data analysis is exploratory rather than confirmatory in nature. The technique is directed toward searching deeply into large databases for patterns and relationships. Data mining uses artificial intelligence techniques, neural networks, and advanced statistical methods to reveal trends, patterns, and relationships that might otherwise remain obscured by complex patterns of association and the sheer amount of data. To do this, data mining uses computational techniques from statistics and pattern recognition (Zamani et al. 2012). Such data processing makes large volumes of data digestible, discovers hidden rules underlying the data, and facilitates the deduction of patterns and trends. Because of these properties, automated pattern recognition plays an essential practical role in many fields such as scientific data exploration, information retrieval and text mining, business, census, and education (Ashok Kumar and Kannathasan 2011). Among the various data mining methods, the statistical method of hierarchical classification is an important tool for “unsupervised” learning procedures, that is, for the problem of finding groups in a data set without reference to external information. Unsupervised evaluation measures used in these procedures are often called internal indices because they use only information present in the data set. Supervised learning procedures, in contrast, use external evaluation measures, often called external indices. In the context of data mining and pattern recognition, hierarchical classification divides a data set into groups by automatic analytical sorting techniques. As a result, entities in the same group are more similar to each other than to entities in different groups at each level of partitioning. Nonhierarchical techniques assign objects or entities to groups once the number of groups to be formed has been specified, before any classification is carried out.
The nonhierarchical procedure begins by selecting class seeds, which are used as initial guesses of the class means. Class seeds can be specified by the researcher or selected randomly from all observations (i.e. objects or entities). In hierarchical classification, assuming n sampled sites in the study area, the first level is a partition of the n sites into n groups, each containing one site. At level n, all sites form one group. If sites that share a group stay together at all higher levels, the sequence is called a hierarchical classification (Zhu et al. 2008, 2010). As a result, a series of nested partitions is generated, from the trivial partition with a single zone (containing all sampled sites) to the trivial partition where each sampling site is a zone. In between these two extremes are many candidate partitions or zonings from which the researcher has to choose. The problem of determining the appropriate number of zones in an area is relevant across a wide variety of disciplines such as business, social sciences, biological sciences, Earth sciences, medicine, and engineering, among others. Therefore, the use of a hierarchical classification method requires a decision regarding the stage or level that best reproduces the underlying structure. Since there is no internal statistical criterion used for inference, researchers have developed a number of criteria and guidelines for approaching the problem. These techniques, sometimes referred to as stopping rules, are ad hoc procedures and must be computed by the researcher. The resulting pattern or underlying structure can be considered to consist of “natural” or optimal zones in a given area. This underlying structure exhibits the properties of internal cohesion (compactness) of the sampled sites in a zone and external isolation (separation) of a zone from other zones (Halkidi et al. 2001; Tan et al. 2006). The interpretation of the resulting pattern depends very much on the correct choice of the number of zones.
Therefore, a stopping rule is needed to select the “best” or optimal number of zones based on the distribution of a zoning criterion (Chaimontree et al. 2010). Virtually all zoning procedures provide little if any information as to the number of zones present in the study area. The Silhouette criterion is a zone validity index that is used to judge the quality of any zoning solution (partition). It reflects the cohesion (similarity) within zones and the separation between zones. This criterion has gained popularity because it shows more stability and robustness in multivariate zoning classification. To illustrate the successful application of AISOOZ, Ward’s classification method and the Silhouette validation criterion were used to produce the optimal comprehensive (multivariate) geologic zoning (AISOOZ) map of Iran as a case study. While there are similarities between the AISOOZ map and conventional (current) geologic maps, the comprehensive optimal zoning map reveals some remarkable features not found in the conventional maps. Finally, a comparative performance study of the Wilks’ lambda test statistic (II) and the Silhouette validation index for determining the optimum number of geological zones was conducted.

Method of analysis

Unsupervised hierarchical classification procedures are among the best known pattern recognition techniques (Grira et al. 2005). These procedures seek and separate zones with similar patterns in the dataset at different levels. In supervised classification, validation of the resulting classification model is an integral part of developing the model. However, because of its very nature, validation is not a commonly used part of unsupervised zoning analysis. Let us consider a hierarchical classification of n sampled sites or entities into k zones. The first level is a partition into n singletons, each containing one site. The next level is a partition into n − 1 zones, and so on until all the sampled sites form one zone. At level m in the sequence, the number of zones is c = n − m + 1. Therefore, level 1 corresponds to n singletons, each containing one site, and level n to a single zone containing all the sites (Zhu et al. 2008, 2010). Hierarchical classification is divided into agglomerative (bottom-up or clumping) and divisive (top-down or splitting) methods. Agglomerative methods begin with n singletons and successively merge zones until all sites finally join together in a single zone. Divisive methods begin with one zone containing all the sampled sites, which in succeeding steps is split into smaller zones until each site is in a separate zone (singleton). These procedures are the opposite of the agglomerative procedures. Given a dataset, one of the major challenges in a classification process is to choose the classification model that is most appropriate and best explains the underlying structure of the data set (Ertöz et al. 2003). Another important objective in classification is to assess the “natural” or optimum level of partitioning in a given dataset, which is even more challenging when no response variable or classification model is available.
Virtually all classification methods provide little if any information as to the optimum level of partitioning in the dataset. In this paper, therefore, a new approach based on Ward’s minimum variance method for hierarchical classification and the Silhouette validation criterion (also known as the Silhouette width) is proposed to construct optimum geoenvironmental zoning maps. Ward’s minimum variance method (Ward 1963) minimizes the loss of information associated with each grouping of geoenvironmental zones. Information loss is expressed in terms of a within-zone error sum-of-squares criterion, ESS, which is a built-in dissimilarity criterion (I). The Silhouette statistic is an internal criterion for unsupervised zoning validity, which combines both cohesion (compactness) and separation to evaluate a classification (Fig. 1). It assesses zone coherence by comparing inter-zone to intra-zone variability (Halkidi et al. 2001; Tan et al. 2006). The cohesion and separation functions can be expressed using Eqs. (1) and (2), respectively (Jegatha Deborah et al. 2010; Tan et al. 2006).

Fig. 1 Graphical representation of the cohesion and separation based on the Silhouette criterion (Jain and Dubes 1988; Tan et al. 2005)

$$ \mathrm{cohesion}\left({C}_i\right)=\sum_{\substack{x\in {C}_i \\ y\in {C}_i}} \mathrm{proximity}\left(x,y\right) $$
(1)
$$ \mathrm{separation}\left({C}_i,{C}_j\right)=\sum_{\substack{x\in {C}_i \\ y\in {C}_j}} \mathrm{proximity}\left(x,y\right) $$
(2)

The evaluation function can be a similarity, dissimilarity, or some combination of these quantities. Equations 1 and 2 give us some well-defined measures of classification evaluation that can be combined into an overall measure of classification evaluation for a set of k zones, using a weighted sum of the validity of individual zones (Eq. 3).

$$ \mathrm{overall\ validity}={\displaystyle \sum_{i=1}^k}{w}_i\,\mathrm{validity}\left({C}_i\right) $$
(3)
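
As an illustration of Eqs. (1) to (3), the following minimal Python sketch evaluates cohesion, separation, and a weighted overall validity for two small zones. It assumes Euclidean distance as the proximity function and equal weights; the function names and the toy coordinates are hypothetical and are not taken from the data set of this study.

```python
import math

def proximity(x, y):
    # Assumption: Euclidean distance is used as the proximity (dissimilarity).
    return math.dist(x, y)

def cohesion(zone):
    # Eq. (1): sum of proximities over all ordered pairs (x, y) within one zone.
    return sum(proximity(x, y) for x in zone for y in zone)

def separation(zone_i, zone_j):
    # Eq. (2): sum of proximities between the sites of two different zones.
    return sum(proximity(x, y) for x in zone_i for y in zone_j)

def overall_validity(zones, validity, weights):
    # Eq. (3): weighted sum of a per-zone validity measure.
    return sum(w * validity(zone) for w, zone in zip(weights, zones))

# Hypothetical toy zones with two sites each.
zone_a = [(0.0, 0.0), (0.0, 1.0)]
zone_b = [(4.0, 0.0), (4.0, 1.0)]

print(cohesion(zone_a))
print(separation(zone_a, zone_b))
print(overall_validity([zone_a, zone_b], cohesion, [0.5, 0.5]))
```

With a dissimilarity as the proximity, a compact zone has a small cohesion value and a well-isolated pair of zones has a large separation value.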

The cohesion and separation validity functions used in the overall validation of a group of zones (Eq. 3) can also be applied to evaluate individual zones or objects. This information can be used to improve the quality of a classification process. The contribution of individual entities or sites within a zone can likewise be evaluated using the cohesion and separation validity functions (Jegatha Deborah et al. 2010; Tan et al. 2006). Objects that contribute more to the cohesion and separation are near the “interior” of the zone. On the other hand, objects that contribute less to the cohesion are near the “edge” of the zone. The Silhouette validation criterion is a dimensionless stopping rule that has been used in a wide variety of disciplines for assessing the “natural” number of partitions (Rousseeuw 1987; Kaufman and Rousseeuw 1990, 2005). Its application to geoenvironmental zoning and pattern discovery in the Earth and environmental sciences is a novel approach to this problem.

Data analysis

To illustrate the successful application of AISOOZ, a geological zoning map of Iran has been produced as a case study. For this purpose, a large number of new and updated geophysical and geological attributes of Iran (Table 1) have been compiled for 175 quadrangular sites, each 1° in extent.

Table 1 Characteristics used for producing automatic integrated self-organized optimum zoning (AISOOZ) map, measured within 1° quadrangular sites

The quadrangles are numbered from west to east, beginning with 1 for the quadrangle between the 44° E and 45° E meridians and increasing to 175 for the quadrangle between the 61° E and 62° E meridians (I, II). None of offshore Iran is included in the dataset. These quadrangles are used as items or observations (input samples). Each sampling site is characterized by 49 attributes (Table 1) that seem to characterize the intensity and degree of contrast between the tectonic, neotectonic, and seismotectonic characteristics of Iran. The attributes have all been standardized to have a mean of zero and a standard deviation of one. This removes the bias introduced by the differences in the scales of the attributes and ensures that all have equal weight in the calculation of similarity. The normalized data set is then classified using Ward’s minimum variance method for hierarchical classification. This procedure, which is intuitive and computationally efficient, minimizes the information loss associated with zoning classification and creates small, evenly sized zones (I, II). At each step it merges the two elements whose merger least increases the sum of squared differences from the mean. Finally, the unsupervised statistical measure, namely the Silhouette index, which is based on the comparison of compactness and separation, is applied here for the first time as an internal validation index for determining the optimum number of zones. The quantitative statistical Silhouette coefficient combines both cohesion and separation validity functions to evaluate individual sampling sites, zones, and the entire set of zones within the study region (Rousseeuw 1987; Kaufman and Rousseeuw 1990, 2005). The following steps explain how to perform zoning validation.

  1. For each site i, let \( a_i \) be the average dissimilarity between i and all other sites within the same zone. We can interpret \( a_i \) as a measure of how well site i is matched to the zone to which it is assigned (the smaller the value, the better the match).

  2. For site i and any zone not containing it, calculate the average dissimilarity of i to all the sites in that zone. Repeat this for every zone of which i is not a member. Denote the lowest of these average dissimilarities by \( b_i \). This value can be seen as the dissimilarity between i and its nearest “neighbor” zone.

  3. Finally, the Silhouette index of the ith site (observation) is defined as:

    $$ {\mathrm{SC}}_{\mathrm{i}}=\left({\mathrm{b}}_{\mathrm{i}}-{\mathrm{a}}_{\mathrm{i}}\right)/ \max \left({\mathrm{a}}_{\mathrm{i}},{\mathrm{b}}_{\mathrm{i}}\right) $$
    (4)

    This can be written as:

    $$ S{C}_i=\left\{\begin{array}{ccc}\hfill 1-{a}_i/{b}_i,\hfill & \hfill if\hfill & \hfill {a}_i<{b}_i\hfill \\ {}\hfill 0,\hfill & \hfill if\hfill & \hfill {a}_i={b}_i\hfill \\ {}\hfill {b}_i/{a}_i-1,\hfill & \hfill if\hfill & \hfill {a}_i>{b}_i\hfill \end{array}\right. $$
    (5)

From the above definition it is clear that the Silhouette coefficient \( SC_i \) can vary between −1.0 and +1.0. If the Silhouette coefficient is close to 1.0, the sampling site is “well-partitioned”. If the coefficient is close to zero, the site could equally well be assigned to the nearest neighboring zone (i.e. it lies on a boundary between zones). If the coefficient is close to −1.0, the sampled site is probably in the wrong zone.
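
The three steps above and Eq. (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the routine used in this study: Euclidean distance is assumed as the dissimilarity, at least two zones are assumed to exist, and a site that is alone in its zone is given a Silhouette of 0 by convention. The function name `silhouette_values` and the toy coordinates are hypothetical.

```python
import math

def silhouette_values(sites, labels):
    """Return the Silhouette index SC_i of every site (Eq. 4)."""
    # Group site indices by zone label.
    zones = {}
    for idx, lab in enumerate(labels):
        zones.setdefault(lab, []).append(idx)

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in zones[lab] if j != i]
        if not own:
            values.append(0.0)  # assumed convention for a singleton zone
            continue
        # Step 1: average dissimilarity to the other sites of the same zone.
        a_i = sum(math.dist(sites[i], sites[j]) for j in own) / len(own)
        # Step 2: lowest average dissimilarity to any other zone.
        b_i = min(
            sum(math.dist(sites[i], sites[j]) for j in members) / len(members)
            for other, members in zones.items()
            if other != lab
        )
        # Step 3: Eq. (4), guarding against identical coordinates.
        denom = max(a_i, b_i)
        values.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return values

# Hypothetical example: the third site sits next to zone 0 but is labeled 1.
sites = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (4.0, 0.0), (4.0, 1.0)]
labels = [0, 0, 1, 1, 1]
for site, sc in zip(sites, silhouette_values(sites, labels)):
    print(site, round(sc, 3))
```

In this toy layout the third site receives a negative value, which flags it as a candidate for reassignment to its neighboring zone.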

Result and discussion

For a given number of zones, the global or overall average Silhouette coefficient for classification \( \overline{SC} \) is simply the average of Silhouette index over all sampling sites i,

$$ \overline{SC}=1/n{\displaystyle \sum_{i=1}^n}S{C}_i $$
(6)

where n is the number of observations (sampling sites).

The average Silhouette width is a measure of how tightly clustered all the sites in the zoning are. Thus the average Silhouette width of the entire data set measures how appropriately the data have been partitioned and can be used to determine the optimum number of zones in a data set (Rousseeuw 1987; Kaufman and Rousseeuw 2005; Trauwaert et al. 1989). The largest overall average Silhouette score indicates the best zoning (Fig. 2). In the current research, the 13-zone model, which has the largest overall average Silhouette score, is taken as the optimum. That is, the highest zone separability is achieved with 13 zones. The average Silhouette coefficient of a zone, \( SC_j \), is calculated by averaging the Silhouette coefficients of the sampling sites belonging to that zone.

Fig. 2 Overall average Silhouette coefficient versus number of zones for the current research. There is a distinct peak in the Silhouette coefficient when the number of zones is equal to 13. This suggests that the 13-zone map is best, since it has the largest overall average Silhouette coefficient

$$ S{C}_j=1/{m}_j{\displaystyle \sum_{i=1}^{m_j}}S{C}_i^j $$
(7)

where \( m_j \) is the number of sampling sites within the j-th zone, and j = 1, 2, …, K indexes the zones.

Finally, the overall or global Silhouette coefficient for the full or entire zoning SC can be determined by calculating the average Silhouette coefficient of all zones (Petrović 2006).

$$ SC=1/K{\displaystyle \sum_{j=1}^K}S{C}_j $$
(8)
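
The difference between the site-level average of Eq. (6) and the zone-level averages of Eqs. (7) and (8) can be made concrete with a short Python sketch. The Silhouette values and zone labels below are invented for illustration only, and the function names are hypothetical.

```python
def average_silhouette(values):
    # Eq. (6): mean of the Silhouette index over all sampling sites.
    return sum(values) / len(values)

def zone_silhouettes(values, labels):
    # Eq. (7): mean Silhouette index of the sites within each zone.
    grouped = {}
    for value, label in zip(values, labels):
        grouped.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in grouped.items()}

def global_silhouette(values, labels):
    # Eq. (8): mean of the zone-level Silhouette coefficients.
    per_zone = zone_silhouettes(values, labels)
    return sum(per_zone.values()) / len(per_zone)

# Invented values: three sites in zone "A", one site in zone "B".
values = [0.8, 0.6, 0.7, 0.1]
labels = ["A", "A", "A", "B"]

print(average_silhouette(values))
print(zone_silhouettes(values, labels))
print(global_silhouette(values, labels))
```

When the zones contain different numbers of sites, the two averages generally differ, because Eq. (8) gives every zone the same weight whereas Eq. (6) gives every site the same weight.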

Clearly, both a zone’s Silhouette and the global Silhouette take values between −1.0 and 1.0, both inclusive. The overall Silhouette index value compares all the sampling sites globally (Bizhani and Tarokh 2011). Therefore, local trends and outliers do not hinder locating the true overall average Silhouette coefficient. Pollard and van der Laan (2002) showed that the overall Silhouette index tends to be a global criterion, in the sense that it is not necessarily maximized at the level of partitioning that we would select visually but rather, usually, higher up in the model. In this study, the zone validity analysis is performed using the statistics toolbox for cluster analysis written in Matlab (Kaufman and Rousseeuw 1990, 2005). As mentioned above, the Silhouette method is a comprehensive method with which we can both find the number of clusters and determine the association of each sample with the resulting clusters. It also enables us to correct a clustering process by evaluating poorly assigned samples. More importantly, the Silhouette method is not a clustering method; it is a robust evaluation method that indicates how well the samples are distributed among the clusters. A negative Silhouette value indicates that a sample is wrongly assigned to a cluster. Conversely, a high Silhouette value indicates that a sample is properly assigned to a specific cluster. Our results show that, unlike other validation methods, the Silhouette method does not depend on the clustering method employed. Moreover, the Silhouette method evaluates each sample individually, while other methods evaluate all the samples within a cluster as a group. A simpler and more general classification, defining the most basic structure of the area under investigation, can be produced by comparing the zoning classifications globally. The Silhouette plot of the zoning (Fig. 3) shows the Silhouette index values across the zones used to establish the 13 optimal geologic zones, each with specific numerical characteristics. It presents the Silhouettes of all zones next to each other, so that the quality of the zones can be compared.
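
The selection procedure described above (standardize the attributes, build the Ward hierarchy, and keep the level with the largest overall average Silhouette coefficient) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Matlab routine used in this study: Euclidean distance is assumed as the dissimilarity, singleton zones are given a Silhouette of 0 by convention, and the function names and the toy sites are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    # Scale every attribute (column) to zero mean and unit standard deviation.
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # constant columns stay at zero
    return [tuple((v - m) / s for v, m, s in zip(r, mu, sd)) for r in rows]

def centroid(rows, members):
    return [mean([rows[i][d] for i in members]) for d in range(len(rows[0]))]

def ward_cost(rows, a, b):
    # Increase in the error sum of squares (ESS) caused by merging a and b.
    ca, cb = centroid(rows, a), centroid(rows, b)
    d2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_partitions(rows):
    # Return {number of zones: labels} for every level of the hierarchy.
    zones = [[i] for i in range(len(rows))]
    levels = {}
    while True:
        labels = [0] * len(rows)
        for z, members in enumerate(zones):
            for i in members:
                labels[i] = z
        levels[len(zones)] = labels
        if len(zones) == 1:
            return levels
        pairs = [(p, q) for p in range(len(zones)) for q in range(p + 1, len(zones))]
        p, q = min(pairs, key=lambda pq: ward_cost(rows, zones[pq[0]], zones[pq[1]]))
        zones[p] = zones[p] + zones[q]  # merge the cheapest pair
        del zones[q]

def mean_silhouette(rows, labels):
    # Overall average Silhouette (Eq. 6); requires at least two zones.
    total = 0.0
    for i, lab in enumerate(labels):
        sums, counts = {}, {}
        for j, other in enumerate(labels):
            if j != i:
                sums[other] = sums.get(other, 0.0) + math.dist(rows[i], rows[j])
                counts[other] = counts.get(other, 0) + 1
        if lab not in counts:
            continue  # singleton zone contributes 0 (assumed convention)
        a = sums[lab] / counts[lab]
        b = min(sums[o] / counts[o] for o in counts if o != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(labels)

def best_number_of_zones(rows, k_min=2, k_max=None):
    rows = standardize(rows)
    levels = ward_partitions(rows)
    k_max = k_max or len(rows) - 1
    scores = {k: mean_silhouette(rows, levels[k]) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores, levels

# Hypothetical toy sites forming three well-separated groups.
toy_sites = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_k, scores, levels = best_number_of_zones(toy_sites)
print(best_k, levels[best_k])
```

The naive pairwise search is adequate for a few hundred sites such as the 175 quadrangles considered here; for larger data sets a library implementation of Ward’s method would be the more practical choice.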

Fig. 3 Plot of Silhouette index values across the zones used to establish the 13-zone optimal geologic zoning map of Iran. The Silhouette plot shows the sorted Silhouette indices (x-axis) for all sampling sites in each zone (y-axis). The global Silhouette coefficient of the zoning is the average of the zones’ Silhouette widths

For each zone, the Silhouette plot displays the Silhouette coefficients of all the sampling sites within that zone, ranked in decreasing order. Each site is drawn as a horizontal line whose length is proportional to its Silhouette index value. The Silhouette plot shows which sampling sites lie well within their zone and which ones lie somewhere in between zones. A wide Silhouette plot indicates large Silhouette index values and hence a pronounced zone. The height of a zone is simply equal to the number of sampling sites in the zone. A global or overall Silhouette width of 0.26 was found for the 13 optimal zones. As a rule of thumb, the Silhouette coefficient should be around or larger than 0.25 in order to claim that a pattern has been found in the data set (Boroš 2011). Therefore, the Silhouette averages and plots can be used to select the optimal number of zones, assess how well individual observations are classified, and determine the underlying structure of the data set. As a case study of the successful application of the new approach, an optimal 13-zone geologic map of Iran is produced (Fig. 4). The geologic zones are numbered according to their hierarchical order. The following sections describe the features of each of these 13 optimal zones. The specific numerical geophysical and geological characteristics of these zones are presented in Table 2.

Fig. 4 Distinct Automatic Integrated Self-Organized Optimum Zoning (AISOOZ) of Iran based on Ward’s minimum variance method for hierarchical classification and the Silhouette validation criterion. Zoning has been made purely by reference to the specified geophysical and geological characteristics shown in Table 1. The geologic zones are numbered according to their hierarchical order. The corners of zones on the map have been rounded slightly for cosmetic reasons

Table 2 Geophysical and geological characteristics of 13 automatic integrated self-organized optimum zoning (AISOOZ) map shown in Fig. 4 (units and symbols are shown in Table 1)

  1) The Urumiyeh Zone: This zone is characterized by moderate magnetic field intensity, moderate seismicity, relatively low earthquake energy release, moderate topographic relief, and moderate fault length density.

  2) The Eastern Alborz - Kopeh Dagh Zone: Moderate exposure of Precambrian and metamorphic rocks, moderate magnetic field intensity and topographic relief, moderate seismicity, minimum area of Cenozoic rocks, and high average elevation.

  3) The Western Alborz Zone: This zone is distinguished from other zones by low crustal thickness, maximum outcrop of igneous rocks, and high gravity and isostatic anomalies.

  4) The Hashtrud-Natanz Zone: Moderate magnetic field intensity and topographic relief, and a high percentage area of Precambrian, metamorphic, and igneous rock exposures. The highest average elevation plays an important role in separating this zone from the other zones.

  5) The Central Alborz Zone: The main geological characteristics of this zone are moderate fault length density, maximum area of Paleozoic rock exposures, high gravity anomaly, relatively high seismic severity, relatively high average Bouguer anomaly, and low average magnetic field intensity.

  6) The Central - East Iran Zone: This zone is differentiated from surrounding areas by the maximum area of Cenozoic rock exposures, relatively low seismicity, low magnetic field intensity, moderate seismic severity, and moderate unconsolidated sediments (Zamani et al. 2013a).

  7) The Sanandaj - Baft Zone: Relatively high crustal thickness, high fault length density, moderate elevation, and moderate seismicity are important features of this zone.

  8) The Zagros Fold - Thrust Zone: This zone is characterized by relatively low maximum isostatic anomaly (Zamani and Farahi 2011), high crustal thickness, moderate seismic activity, and moderate elevation.

  9) The Zagros Simply Folded Zone: This zone is differentiated from surrounding regions by moderate gravity anomaly, minimum outcrops of igneous and metamorphic rocks (Zamani and Farahi 2011), low magnetic field intensity, and high seismicity. The maximum number of earthquakes occurred in this zone.

  10) The Khozestan Plain Zone: This unfolded zone is characterized by low seismic activity, minimum fault length density, low isostatic anomaly, low topographic relief, and maximum exposures of unconsolidated sediments.

  11) The Minab - Baft Zone: This zone is characterized by a moderate area of Paleozoic rocks, the largest percentage area of ophiolite exposures, and the highest values of residual Bouguer anomaly and magnetic field intensity.

  12) The Makran Zone: Moderate regional Bouguer anomaly, minimum area of Precambrian rock exposures, maximum area of sedimentary rocks, and high fault length density are the main characteristics of this zone.

  13) The Chabahar Zone: Relatively low elevation, the highest value of regional Bouguer anomaly, low crustal thickness, low seismicity, and the minimum percentage area of Mesozoic rock exposures are the general characteristics of this zone.

Although there are some similarities between the AISOOZ map and conventional maps, the AISOOZ map (Fig. 4) reveals some remarkable features that cannot be found in the conventional maps (e.g. Stöcklin 1968; Stöcklin and Nabavi 1973). These conventional maps, which are not only original but also widely used in Iran, have been considered for illustrative purposes only. New features revealed by the robust multivariate geological zoning (AISOOZ) method (Fig. 4) include the following: (1) Contrary to the current maps, the AISOOZ map makes a clear distinction between the Makran Mountains (Zone 12) and the East Iranian Ranges. (2) The extent and rigidity of the disputed microplate in the central and eastern parts of Iran (Zone 6) are revealed. (3) The robust multivariate zoning method displays striking similarities between the Eastern Alborz - Kopeh Dagh Zone (Zone 2) and the Zagros Simply Folded Zone (Zone 9), even though these zones are geographically far apart. They have probably gone through similar geologic processes during their geological evolution. (4) On the other hand, the Sanandaj - Baft Zone (Zone 7) and the Zagros Fold - Thrust Zone (Zone 8), which are geographically close, are geologically differentiated from each other. This indicates that these zones followed separate paths during their geologic evolution.

What we are primarily concerned with in this manuscript is the mechanics of producing an Automatic Integrated Self-Organized Optimum Zoning (AISOOZ) map based on overall similarities, across a list of variables, within individual zones. This new approach has been applied to tectonic zoning for illustrative purposes only. It is important to recognize that the geological zoning generated by the AISOOZ method is based purely on the geological and geophysical characteristics presented in Table 1. Therefore, the correspondences and differences between AISOOZ and a given zoning based on conventional methods must receive careful thought (I, II).

However, the attributes used could readily be adapted to take additional or alternative data sets. The same approach could also be used to produce zoning maps of any entity, regardless of its context and/or scale.

Assessment of geoenvironmental zoning

Apart from primary validity, which assures that the zoning classification is representative of the general population, there is a measure of secondary validity. It assesses how stable and robust the classification is. According to Williams (1967), a classification should be truly stable and robust in that slight alterations of information should not produce major changes in the classification. Clustering is naturally a “blind” process if it does not include any explicit semantic knowledge of the dataset. The most direct approach to evaluating and testing the stability and robustness of the Silhouette validation index is to alter the data set slightly by adding or removing a few data samples and then repeat the classification. For this purpose, the Silhouette validation criterion has also been applied to the data set used in our previous paper (II). That data set differs slightly from the up-to-date data set of the current research (Table 1). Since the procedure produced the same zoning classification on repeated trials (Fig. 4), one can conclude that the Silhouette validation index is a reliable measure for determining the optimum number of zones. Finally, the same reliability testing was performed on the Wilks’ lambda validation index (II). The resulting plots are presented in Fig. 5.

Fig. 5 Application of the Wilks’ lambda criterion as a stopping rule may result in a different optimal number of geologic zones after a slight alteration of the data samples. (a) Wilks’ lambda discrepancy calculated for the data set used in our previous paper (Zamani et al. 2011). (b) Wilks’ lambda discrepancy calculated for the up-to-date data from the current research. This indicates that, in comparison with the Wilks’ lambda stopping criterion, the Silhouette index is a more reliable and robust criterion

Comparison of Fig. 5(a) and (b) indicates that the Wilks’ lambda criterion tends to produce a different number of zones after a slight modification of the data set. This is a serious error because information is lost by merging distinct zones. Therefore, in comparison with the Wilks’ lambda criterion, the Silhouette index performs better for optimum geoenvironmental zoning because it produces more stable and robust zoning classifications.
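
The kind of stability test discussed in this section can be outlined generically in Python. The sketch below is not the procedure used in this study, which compared two versions of the data set; it assumes instead that a few randomly chosen sites are removed in each trial. The function `stability_check` and its parameters are hypothetical, and `select_k` stands for any stopping rule that maps a list of sites to a chosen number of zones.

```python
import random

def stability_check(select_k, sites, n_drop=2, trials=20, seed=0):
    """Re-run a zone-number selector on slightly reduced data sets.

    Returns the number of zones chosen for the full data set and the
    fraction of trials in which the reduced data set gives the same number.
    """
    rng = random.Random(seed)  # fixed seed so that the trials are repeatable
    reference = select_k(sites)
    agree = 0
    for _ in range(trials):
        dropped = set(rng.sample(range(len(sites)), n_drop))
        subset = [s for i, s in enumerate(sites) if i not in dropped]
        if select_k(subset) == reference:
            agree += 1
    return reference, agree / trials

# Toy selectors standing in for real stopping rules (assumptions for illustration).
stable_rule = lambda data: 3                # always selects three zones
fragile_rule = lambda data: len(data) % 5   # changes whenever sites are removed

print(stability_check(stable_rule, list(range(10))))
print(stability_check(fragile_rule, list(range(10))))
```

An agreement fraction close to 1 would correspond to the behavior reported here for the Silhouette index, whereas a low fraction would correspond to a criterion that is sensitive to slight alterations of the data.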

Conclusion

Zoning classification provides us with an empirical and objective technique for performing one of the most inherent human capabilities: classification. It aims at identifying groups of similar entities or sampling sites and helps to discover the distribution of patterns and interesting correlations in the study area. Whether for purposes of simplification, exploration, or confirmation, zoning classification is a powerful analytical method that has a wide range of applications in virtually every scientific and technical discipline. With the availability of massive transactional and experimental data sets, validating the zoning results has become an important topic in the Earth and environmental sciences. Once a suitable classification method has been chosen, consideration needs to be given to how to find a partition into groups that optimizes the number of zones. Because there is no internal statistical criterion used for inference, researchers have developed several criteria and guidelines for approaching the problem (Everitt and Dunn 2010). These criteria are ad hoc procedures and must be computed by the researcher after the experiment is completed. When applied to the results of zoning classifications, these procedures are sometimes referred to as stopping rules. In this paper, Ward’s unsupervised classification method is used to produce geological zoning maps. The Silhouette validation criterion is then applied for the first time as a stopping rule to measure the relative usefulness of geological zoning maps. By comparing the sampling sites globally and ignoring local dependencies, the average Silhouette index of the zoning classification as a whole can determine the optimal or natural number of geologic zones within the study area. This criterion eventually led to an AISOOZ map of Iran displaying 13 optimal geologic zones.
Finally, a comparative study has been performed on the geological zoning classifications produced by the Silhouette validation coefficient and by the Wilks’ lambda criterion. In comparison with the Wilks’ lambda, the geological zoning produced by the Silhouette criterion is more reliable in that the resulting geological classification is not disturbed by adding or removing a few sites or attributes. Therefore, this criterion could be built into computer programs as a validation index or stopping rule to assess the natural or optimum number of zones, each with specific numerical characteristics. AISOOZ maps not only can be reevaluated quite often, but also provide us with a means to create online as well as offline interactive zoning maps in the Earth and environmental sciences. The attributes utilized could easily be adapted to take additional or alternative data sets. Therefore, the AISOOZ approach could also be used to produce zoning maps of any entity, regardless of its context and/or scale.