1 Introduction

The objective of clustering or cluster analysis is to divide a data set into groups (so-called clusters) in such a way that data points in the same cluster are as similar as possible and data points from different clusters are as dissimilar as possible (see, e.g., [5, 10]), where the notion of similarity is often formalized by defining a distance measure on the data points. Even in classical clustering the resulting grouping need not be a partition (that is, in some approaches not all data points need to be assigned to a group and the formed groups may overlap), but only if points can be assigned to different groups with different degrees of membership does one arrive at fuzzy clustering [2, 3, 8, 14].

However, the term fuzzy clustering usually refers to a fairly limited set of methods, which are prototype-based and optimize some objective function to find a good (fuzzy) partition of the given data. Although classical clustering comprises many more methods than the well-known c-means algorithm (by which most fuzzy clustering approaches are inspired), these other methods are only rarely “fuzzified”. This is particularly true for hierarchical agglomerative clustering (HAC) [16], of which only a few fuzzy versions have been proposed.

Exceptions include [1, 7, 11]. Ghasemigol et al. [7] describe HAC for trapezoidal fuzzy sets with either single or complete linkage, but their method is restricted to one dimension due to its special distance function. Konkol [11] proposes an HAC algorithm for crisp data based on fuzzy distances, which are effectively distances weighted by a function of membership degrees; it mixes single and complete linkage. Bank and Schwenker [1] merge clusters in the spirit of HAC, but keep the original clusters available for possible additional mergers, so that the hierarchy takes the form of a directed acyclic graph (while standard HAC produces a tree). Also noteworthy is [15], which suggests a mixed approach, re-partitioning the result of fuzzy c-means clustering and linking the partitions of two consecutive steps.

Related approaches include [6, 12] as well as the extension [9] of the latter. The first uses a competitive agglomeration scheme and an extended objective function for fuzzy c-means in order to reduce an overly large initial number of clusters to an “optimal” number. The latter two change the term in the objective function that penalizes many clusters from a quadratic expression to an entropy expression. Although fairly different from hierarchical agglomerative clustering approaches, they share the property that clusters are merged to find a good final partition, but they do not necessarily produce a hierarchy.

Our approach is closest in spirit to [11], as it also relies on the standard scheme of hierarchical agglomerative clustering, although we treat the original data points as clusters already, while [11] keeps data points and clusters clearly separate. Furthermore, [11] focuses on single and complete linkage while we use a centroid scheme. Our approach also bears some relationship to [15] concerning the distances of fuzzy sets, which [15] divides into three categories: (1) comparing membership values, (2) considering spatial characteristics, and (3) characteristic indices. While [15] relies on (2), we employ (1).

The remainder of this paper is structured as follows: in Sects. 2 and 3 we briefly review standard fuzzy clustering and hierarchical agglomerative clustering, indicating which elements we use in our approach. In Sect. 4 we present our method and in Sect. 5 we report experimental results. Finally, we draw conclusions from our discussion in Sect. 6.

2 Fuzzy Clustering

The input to our clustering algorithm is a data set \(\mathbf {X} = \{\mathbf {x}_1,\ldots ,\mathbf {x}_n\}\) with n data points, each of which is an m-dimensional real-valued vector, that is, \(\forall j; 1 \le j \le n:\) \(\mathbf {x}_j = (x_{j1},\ldots ,x_{jm}) \in \mathbb {R}^m\). Although HAC usually requires only a distance or similarity matrix as input, we assume metric data, since a centroid scheme requires the ability to compute new center vectors.

In standard fuzzy clustering one tries to minimize the objective function

$$ J(\mathbf {X},\mathbf {C},\mathbf {U}) = \sum _{i=1}^c \sum _{j=1}^n u_{ij}^w\, d_{ij}^2, $$

where \(\mathbf {C} = \{\mathbf {c}_1,\ldots ,\mathbf {c}_c\}\) is the set of prototypes (often merely cluster centers), the \(c \times n\) matrix \(\mathbf {U} = (u_{ij})_{1 \le i \le c;1 \le j \le n}\) is the partition matrix containing the degrees of membership with which the data points belong to the clusters, the \(d_{ij}\) are the distances between cluster \(\mathbf {c}_i\) and data point \(\mathbf {x}_j\), and w, \(w > 1\), is the fuzzifier (usually \(w = 2\)), which controls the “softness” of the cluster boundaries (the larger w, the softer the cluster boundaries). In order to avoid the trivial solution of setting all membership degrees to zero, the constraint \(\forall j; 1 \le j \le n: \sum _{i=1}^c u_{ij} = 1\) is introduced; the additional constraint \(\forall i; 1 \le i \le c: \sum _{j=1}^n u_{ij} > 0\) ensures that no cluster is left empty.
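For illustration, the objective function can be evaluated directly; the following is a minimal NumPy sketch (the array layout and the use of Euclidean distances are our assumptions, not taken from any specific implementation):

```python
import numpy as np

def fcm_objective(X, C, U, w=2.0):
    """Evaluate J(X, C, U) = sum_i sum_j u_ij^w * d_ij^2.

    X: (n, m) data points, C: (c, m) cluster centers,
    U: (c, n) membership degrees, w: fuzzifier (w > 1).
    Euclidean distances are assumed for the d_ij.
    """
    # squared distances d_ij^2 between every center c_i and every data point x_j
    d2 = ((C[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # shape (c, n)
    return float((U ** w * d2).sum())
```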

A fuzzy clustering algorithm optimizes the above function, starting from a random initialization of either the cluster prototypes or the partition matrix, in an alternating fashion: (1) optimize membership degrees for fixed prototypes and (2) optimize prototypes for fixed membership degrees. From this scheme we take the computation of membership degrees for \(w = 2\), namely

$$ u_{ij} = \frac{d_{ij}^{-2}}{{\textstyle \sum _{k=1}^c d_{kj}^{-2}}}. $$

We compute membership degrees for our fuzzy clusters in an only slightly modified fashion; these degrees are then compared to decide which clusters to merge.
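As a sketch, this membership update for \(w = 2\) can be written as follows (the small constant eps, which guards against a zero distance when a data point coincides with a center, is our addition):

```python
import numpy as np

def fcm_memberships(X, C, eps=1e-12):
    """Membership degrees u_ij = d_ij^-2 / sum_k d_kj^-2 (fuzzifier w = 2)."""
    d2 = ((C[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (c, n) squared distances
    inv = 1.0 / (d2 + eps)          # d_ij^-2, guarded against division by zero
    return inv / inv.sum(axis=0)    # normalize per data point (columns sum to 1)
```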

3 Hierarchical Agglomerative Clustering

As its name already indicates, hierarchical agglomerative clustering produces a hierarchy of clusters in an agglomerative fashion, that is, by merging clusters (in contrast to divisive approaches, which split clusters). It starts by letting each data point form its own cluster and then iteratively merges those two clusters that are most similar (or closest to each other).

While the similarity (or distance) of the data points is an input to the procedure, how the distances of (non-singleton) clusters are to be measured is a matter of choice. Common options include (1) single linkage (cluster distances are minimum distances of contained data points), (2) complete linkage (maximum distances of contained data points), and (3) centroid (distances of cluster centroids). Note that the centroid method requires that one can somehow compute a cluster center (or at least an analog), while single and complete linkage only require the initial similarity or distance matrix of the data points. Because of this we assume metric data as input.

In the single and complete linkage methods, clusters are merged by simply pooling the contained data points. In the centroid method, clusters are merged by computing a new cluster centroid as the weighted mean of the centroids of the clusters to be merged, where the weights are provided by the relative number of data points in the clusters to be merged.
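For the centroid method, merging two clusters thus amounts to a size-weighted mean of their centroids, as in the following sketch (variable names are ours):

```python
import numpy as np

def merge_centroids(c_i, c_k, n_i, n_k):
    """Centroid of the merged cluster: mean of c_i and c_k weighted by cluster sizes."""
    return (n_i * np.asarray(c_i) + n_k * np.asarray(c_k)) / (n_i + n_k)

# e.g. merging a 3-point cluster at (0, 0) with a 1-point cluster at (4, 0)
# yields the centroid (1, 0)
print(merge_centroids([0.0, 0.0], [4.0, 0.0], 3, 1))
```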

4 Agglomerative Fuzzy Clustering

Our algorithm builds on the idea of treating the given data points as the initial cluster centers (as in standard HAC) and computing membership degrees of all data points to these cluster centers. However, for this the membership computation reviewed in Sect. 2 is not quite appropriate, since it leads to each data point being assigned to itself and to itself only (only one membership degree is 1, all others are 0). As a consequence, there would be no similarity between any two clusters (at least in the initial partition) and thus no proper way to choose a cluster merger. In order to circumvent this problem, we draw on the concept of a “raw” membership degree, which is computed from a distance via a radial function, where “raw” means that its value is not normalized to sum 1 over the clusters [4]. Possible choices for such a radial function (with parameters \(\alpha \) and \(\sigma ^2\), respectively) are

$$ f_\mathrm{Cauchy} (r;\alpha ) = \frac{1}{r^2+\alpha } \qquad \text {and}\qquad f_\mathrm{Gauss} (r;\sigma ^2) = e^{-\frac{r^2}{2\sigma ^2}}, $$

where r is the distance to a cluster center. Using these functions (with \(\alpha > 0\)) prevents singularities at the cluster centers that occur with the simple inverted squared distance (that is, for \(\alpha = 0\)) and thus allows us to compute suitable membership degrees even for the initial set of clusters, namely as

$$ u_{ij}^{(\alpha )} = \frac{f_\mathrm{Cauchy}(d_{ij};\alpha )}{\sum _{k=1}^c f_\mathrm{Cauchy}(d_{kj};\alpha )} \qquad \text {or}\qquad u_{ij}^{(\sigma ^2)} = \frac{f_\mathrm{Gauss}(d_{ij};\sigma ^2)}{\sum _{k=1}^c f_\mathrm{Gauss}(d_{kj};\sigma ^2)}. $$
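A minimal NumPy sketch of these membership computations might look as follows (function and parameter names are ours):

```python
import numpy as np

def cauchy(r, alpha):
    """Raw Cauchy membership f(r; alpha) = 1 / (r^2 + alpha)."""
    return 1.0 / (r ** 2 + alpha)

def gauss(r, sigma2):
    """Raw Gaussian membership f(r; sigma^2) = exp(-r^2 / (2 sigma^2))."""
    return np.exp(-(r ** 2) / (2.0 * sigma2))

def memberships(X, C, radial, **params):
    """Normalized membership degrees u_ij of data points X to cluster centers C."""
    d = np.sqrt(((C[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))  # (c, n) distances
    raw = radial(d, **params)        # "raw" membership degrees
    return raw / raw.sum(axis=0)     # normalize per data point (columns sum to 1)

# usage: U = memberships(X, C, cauchy, alpha=1.0)  or  memberships(X, C, gauss, sigma2=1.0)
```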

Based on these membership degrees two clusters \(\mathbf {c}_i\) and \(\mathbf {c}_k\) can now be compared by aggregating (here: simply summing) point-wise comparisons:

$$\begin{aligned} \delta _{ik} = \sum _{j=1}^n g(u_{ij},u_{kj}), \end{aligned}$$

where g is an appropriately chosen difference function. Here we consider

$$ g_\mathrm{abs}(x,y) = |x-y|, \quad g_\mathrm{sqr}(x,y) = (x-y)^2 \quad \text {and}\quad g_\mathrm{wgt}(x,y) = (x-y)(x+y). $$

The first function, \(g_\mathrm{abs}\), may appear the most natural choice, while \(g_\mathrm{sqr}\) generally weights large differences more strongly and \(g_\mathrm{wgt}\) emphasizes large differences of large membership degrees and thus focuses on close neighbors.
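Given a membership matrix U of shape \(c \times n\), the pairwise cluster dissimilarities can be computed as in the following sketch (a straightforward implementation of the formulas above; names are ours):

```python
import numpy as np

def g_abs(x, y): return np.abs(x - y)
def g_sqr(x, y): return (x - y) ** 2
def g_wgt(x, y): return (x - y) * (x + y)

def dissimilarities(U, g=g_abs):
    """delta_ik = sum_j g(u_ij, u_kj) for all pairs of clusters (i, k)."""
    c = U.shape[0]
    delta = np.zeros((c, c))
    for i in range(c):
        for k in range(c):
            if i != k:
                delta[i, k] = g(U[i], U[k]).sum()
    return delta
```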

A fuzzy HAC algorithm can now be derived in a standard fashion: compute the initial membership degrees by using each data point as a cluster center. Compute the cluster dissimilarities \(\delta _{ik}\) for this initial set of clusters. Merge the two clusters \(\mathbf {c}_i\) and \(\mathbf {c}_k\), for which \(\delta _{ik}\) is smallest, according to

$$ \mathbf {c}_* = \frac{1}{\sum _{j=1}^n (u_{ij} +u_{kj})} \bigg (\mathbf {c}_i \sum _{j=1}^n u_{ij} + \mathbf {c}_k \sum _{j=1}^n u_{kj}\bigg ). $$

That is, the sum of membership degrees for each cluster is used as the relative weight of the cluster for the merging and thus (quite naturally) replaces the number of data points in the classical HAC scheme. The merged clusters \(\mathbf {c}_i\) and \(\mathbf {c}_k\) are removed and replaced by the result \(\mathbf {c}_*\) of the merger.

For the next step membership degrees and cluster dissimilarities are re-computed and again the two least dissimilar clusters are merged. This process is repeated until only one cluster remains. From the resulting hierarchy a suitable partition may then be chosen to obtain a final result (if so desired), which may be further optimized by applying standard fuzzy c-means clustering.

Note that this agglomerative fuzzy clustering scheme is computationally considerably more expensive than standard HAC, since all membership degrees and cluster dissimilarities need to be re-computed in each step.
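Putting the pieces together, the overall scheme can be sketched as follows; this is a minimal, unoptimized illustration that fixes the Cauchy radial function and \(g_\mathrm{abs}\) (all names and the history bookkeeping are our choices):

```python
import numpy as np

def fuzzy_hac(X, alpha=1.0):
    """Agglomerative fuzzy clustering sketch; returns the sequence of merge steps."""
    C = X.copy()                          # every data point starts as a cluster center
    history = []
    while len(C) > 1:
        # membership degrees via the Cauchy radial function, normalized per data point
        d2 = ((C[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        U = 1.0 / (d2 + alpha)
        U /= U.sum(axis=0)
        # find the pair of clusters with the smallest dissimilarity (here: g_abs)
        best, pair = np.inf, None
        for i in range(len(C)):
            for k in range(i + 1, len(C)):
                delta = np.abs(U[i] - U[k]).sum()
                if delta < best:
                    best, pair = delta, (i, k)
        i, k = pair
        # merge: centers weighted by their summed membership degrees
        w_i, w_k = U[i].sum(), U[k].sum()
        c_new = (w_i * C[i] + w_k * C[k]) / (w_i + w_k)
        history.append((C[i].copy(), C[k].copy(), c_new))
        C = np.vstack([np.delete(C, [i, k], axis=0), c_new])
    return history
```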

5 Experimental Results

We implemented our agglomerative fuzzy clustering method prototypically in Python, allowing for the two radial functions (Cauchy and Gauss, with parameters \(\alpha \) and \(\sigma ^2\)) to compute membership degrees and the three cluster dissimilarity measures (\(g_\mathrm{abs}\), \(g_\mathrm{sqr}\) and \(g_\mathrm{wgt}\)) to decide which clusters to merge. As a simple first test of functionality, we applied this implementation to two well-known data sets from the UCI machine learning repository [13], namely the Iris data and the Wine data. For the clustering runs we used attributes petal_length and petal_width for the Iris data and attributes 7, 10 and 13 for the Wine data, since these are the most informative attributes w.r.t. the class structure of these data sets. This restriction of the attributes also allows us to produce (low-dimensional) diagrams with which the cluster hierarchies can easily be compared. The latter is important, since it is difficult to find an undisputed way of evaluating clustering results. Visual representations in diagrams at least allow a subjective comparison of the results and provide some insight into the properties of the different variants.

As a comparison we applied standard hierarchical agglomerative clustering (HAC) with the centroid method for linking clusters. As it also produces a hierarchy of clusters (cluster tree) the results can be displayed in the same manner and thus are easy to compare. For both standard HAC and agglomerative fuzzy clustering we z-normalized the data (that is, we normalized each attribute to mean 0 and standard deviation 1) in order to avoid effects resulting from different scales (which is particularly important for attribute 13 of the Wine data set, which spans a much larger range than all other attributes and thus would dominate the clustering without normalization).
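The z-normalization itself is straightforward, for example (a sketch):

```python
import numpy as np

def z_normalize(X):
    """Scale each attribute (column) to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```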

Fig. 1: Result of standard hierarchical agglomerative clustering (i.e. crisp partitions) with the centroid method on the well-known Iris data, attributes petal_length (horizontal) and petal_width (vertical). The colors encode the step in which clusters are merged (from bottom to top on the color bar shown on the right); the data points are shown in gray

Fig. 2: Results of different versions of agglomerative fuzzy clustering on the Iris data, attributes petal_length (horizontal) and petal_width (vertical)

Fig. 3: Result of standard hierarchical agglomerative clustering (i.e. crisp partitions) with the centroid method on the Wine data, attributes 7 and 10 (left), 7 and 13 (middle) and 10 and 13 (right). The colors encode the step in which clusters are merged (from bottom to top on the color bar shown on the right); the data points are shown in gray

Fig. 4: Results of different versions of agglomerative fuzzy clustering on the Wine data, projections to attributes 7 and 10 (left), 7 and 13 (middle) and 10 and 13 (right)

A selection of the results we obtained is shown in Figs. 1 and 2 for the Iris data and in Figs. 3 and 4 for the Wine data. Since in our approach the cluster dissimilarity depends on all data points, the distribution of the data points in the data space has a stronger influence on which mergers are carried out. For example, for the Iris data, which is mainly located along a diagonal of the data space, mergers with our algorithm tend to be carried out more often in a direction perpendicular to this diagonal. How strong this effect is depends on the parameters: a smaller \(\alpha \) or \(\sigma ^2\) reduces this effect. For the Wine data, which has a more complex data distribution, the resulting cluster trees appear to respect the distribution of the data points better than those of standard HAC.

6 Conclusions

We described a (hierarchical) agglomerative fuzzy clustering algorithm (fuzzy HAC) that is based on a cluster dissimilarity measure derived from aggregated point-wise membership differences. Although it is computationally more expensive than classical (crisp) HAC, a subjective evaluation of its results suggests that it may be able to produce cluster hierarchies that better fit the spatial distribution of the data points than the hierarchy obtained with classical HAC. Future work includes a more thorough investigation of the effects of its parameters (\(\alpha \) and \(\sigma ^2\) and the choice of the dissimilarity function, as well as the fuzzifier, which we neglected in this paper). Furthermore, an intermediate (partial) optimization of the cluster centers with fuzzy c-means is worth examining and may make it possible to return to simple inverted squared distances to compute the membership degrees.