Partitional Clustering

Jin, Xin; Han, Jiawei

doi:10.1007/978-1-4899-7687-1_637

Xin Jin &
Jiawei Han

Abstract

Partitional clustering is a class of clustering algorithms that divide a set of data points into disjoint subsets, so that each data point belongs to exactly one subset.

Analysis of Clustering Algorithms

Synonyms

Objective function

Definition

Partitional clustering (Han et al. 2011) decomposes a data set into a set of disjoint clusters. Given a data set of N points, a partitioning method constructs K partitions of the data (K ≤ N), with each partition representing a cluster. That is, it classifies the data into K groups that satisfy the following requirements: (1) each group contains at least one point, and (2) each point belongs to exactly one group. Fuzzy partitioning relaxes requirement (2), so a point can belong to more than one group. The quality of a solution is measured by clustering criteria.
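The two requirements can be checked mechanically. The sketch below is only an illustration: the function name `is_hard_partition` and the representation of clusters as collections of point indices 0..N-1 are assumptions made here, not part of the entry.

```python
def is_hard_partition(clusters, n_points):
    """Return True if `clusters` is a valid hard partition of points 0..n_points-1.

    `clusters` is a list of collections of point indices (hypothetical
    representation chosen for this example).
    """
    # Requirement (1): each group contains at least one point.
    if any(len(c) == 0 for c in clusters):
        return False

    # Requirement (2): each point belongs to exactly one group.
    membership_count = [0] * n_points
    for c in clusters:
        for idx in c:
            if not 0 <= idx < n_points:
                return False  # index does not refer to a point in the data set
            membership_count[idx] += 1
    return all(count == 1 for count in membership_count)


# Example with N = 3 points and K = 2 groups.
valid = is_hard_partition([{0, 1}, {2}], 3)        # disjoint and covering
overlapping = is_hard_partition([{0, 1}, {1, 2}], 3)  # point 1 is in two groups
```

A fuzzy partitioning would fail this check by design, since it allows a point to appear in more than one group.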

Some partitional clustering algorithms work by minimizing an objective function. For example, in K-means and K-medoids, the function (also referred to as the distortion function) is

$$\displaystyle{ \sum_{i=1}^{K}\sum_{j=1}^{\vert C_{i}\vert }\mathrm{Dist}(x_{j},\mathrm{center}(i)) }\tag{1}$$

where |C_i| is the number of points in cluster i, x_j is the j-th point of that cluster, and Dist(x_j, center(i)) is the distance between point x_j and the center of cluster i. Depending on the application, different distance functions can be used, such as the Euclidean distance or the L₁ norm.
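As a rough illustration of the distortion function, the following sketch evaluates it for a given clustering with a pluggable distance. The names `distortion`, `euclidean`, and `manhattan`, and the list-of-lists layout of the clusters, are assumptions introduced for this example.

```python
import math


def euclidean(p, q):
    """Euclidean (L2) distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """L1 distance between two points given as tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distortion(clusters, centers, dist=euclidean):
    """Sum of Dist(x_j, center(i)) over every cluster i and every point x_j in it.

    clusters[i] is the list of points assigned to cluster i and
    centers[i] is that cluster's center (hypothetical data layout).
    """
    return sum(
        dist(x, centers[i])
        for i, points in enumerate(clusters)
        for x in points
    )


# Two clusters: one with two points around (1, 1), one singleton at its own center.
clusters = [[(0.0, 0.0), (2.0, 2.0)], [(5.0, 5.0)]]
centers = [(1.0, 1.0), (5.0, 5.0)]

d_l2 = distortion(clusters, centers, dist=euclidean)  # 2 * sqrt(2)
d_l1 = distortion(clusters, centers, dist=manhattan)  # 4.0
```

The same clustering yields different distortion values under different distance functions, so the choice of distance affects which partition is considered best. Note that K-means implementations commonly take Dist to be the squared Euclidean distance, which could be passed in the same way.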

Major Algorithms

Many algorithms can be used to perform partitional data clustering; representative methods include K-means (Lloyd 1957), K-medoids (Kaufman and Rousseeuw 2005), quality threshold (QT) (Heyer et al. 1999), expectation-maximization (EM) (Dempster et al. 1977), mean shift (Comaniciu and Meer 2002), locality-sensitive hashing (LSH) (Gionis et al. 1999), and K-way spectral clustering (Luxburg 2007). In the K-means algorithm, each cluster is represented by the mean value of the points in the cluster. In the K-medoids algorithm, each cluster is represented by one of the points located near the center of the cluster. Instead of setting the cluster number K, the QT algorithm uses the maximum cluster diameter as a parameter to find clusters with guaranteed quality. Expectation-maximization clustering performs expectation-maximization analysis based on statistical modeling of the data distribution, and it has more parameters than K-means. Mean shift is a nonparametric algorithm that finds clusters of arbitrary shape using a density estimator. The locality-sensitive hashing-based method performs clustering by hashing similar points to the same bin. The K-way spectral clustering algorithm represents the data as a graph and performs graph partitioning to find clusters.
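To make the K-means description concrete, here is a minimal sketch of the usual alternating scheme: assign each point to its nearest center, then replace each center by the mean of its assigned points. Several choices are assumptions made for this example rather than details from the entry: the function name `kmeans`, the use of squared Euclidean distance, seeding with the first K points (practical implementations typically use random or more careful seeding), and keeping the old center when a cluster becomes empty.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Minimal K-means sketch; returns (labels, centers)."""
    # Assumption: seed the centers with the first k points.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda i: sq_dist(p, centers[i]))
            for p in points
        ]

        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                dim = len(members[0])
                mean = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                )
                new_centers.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[i])

        # Stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers

    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, k=2)
```

Replacing the mean in the update step with the cluster member that minimizes the total distance to the other members would give a simple K-medoids-style variant, in which each cluster is represented by one of its own points.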

Author information

Authors and Affiliations

PayPal Inc., San Jose, CA, USA
Xin Jin
University of Illinois at Urbana-Champaign, 61801, Urbana, IL, USA
Jiawei Han

Authors

Xin Jin
Jiawei Han

Corresponding author

Correspondence to Xin Jin.

Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Claude Sammut
Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
Geoffrey I. Webb

About this entry

Cite this entry

Jin, X., Han, J. (2017). Partitional Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_637

DOI: https://doi.org/10.1007/978-1-4899-7687-1_637
Published: 14 April 2017
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
