Synonyms

Objective function

Definition

Partitional clustering (Han et al. 2011) decomposes a data set into a set of disjoint clusters. Given a data set of N points, a partitioning method constructs K (NK) partitions of the data with each partition representing a cluster. That is, it classifies the data into K groups by satisfying the following requirements: (1) each group contains at least one point, and (2) each point belongs to exactly one group. For fuzzy partitioning, a point can belong to more than one group. The quality of the solution is measured by clustering criteria.

Some partitional clustering algorithms work by minimizing an objective function. For example, in K-means and K-medoids, the function (also referred as the distortion function) is

$$\displaystyle{ \sum _{i=1}^{K}\sum _{ j=1}^{\vert C_{i}\vert }Dist(x_{ j},center(i)) }$$
(1)

where | C i | is the number of points in cluster i and Dist(x j , center(i)) is the distance between point x j and center i. Depending on the need of the applications, different distance functions can be used, such as Euclidean distance and L1 norm.

Major Algorithms

Many algorithms can be used to perform partitional data clustering; representative technologies include K-means (Lloyd 1957), K-medoids (Kaufman and Rousseeuw 2005), quality threshold (QT) (Heyer et al. 1999), expectation-maximization (EM) (Dempster et al. 1977), mean shift (Comaniciu and Meer 2002), locality-sensitive hashing (LSH) (Gionis et al. 1999), K-way spectral clustering (Luxburg 2007), etc. In the K-means algorithm, each cluster is represented by the mean value of the points in the cluster. For the K-medoids algorithm, each cluster is represented by one of the points located near the center of the cluster. Instead of setting the cluster number K, the QT algorithm uses the maximum cluster diameter as a parameter to find clusters with guaranteed quality. Expectation-maximization clustering performs expectation-maximization analysis based on statistical modeling of the data distribution, and it has more parameters. Mean shift is a nonparameter algorithm to find any shape of clusters using density estimator. Locality-sensitive hashing-based method performs clustering by hashing similar points to the same bin. K-way spectral clustering algorithm represents the data as a graph and performs graph partitioning to find clusters.

Cross-References