Comparison of Four Initialization Techniques for the K-Medians Clustering Algorithm

Juan, A.; Vidal, E.

doi:10.1007/3-540-44522-6_87

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1876))

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)

Abstract

Clustering in metric spaces can be conveniently performed by the so-called k-medians method, a variant of the popular k-means algorithm in which cluster medians (the most centered cluster points) are used instead of the conventional cluster means. Two main aspects of the k-medians algorithm deserve special attention: computing efficiency and initialization. Efficiency issues have been studied in previous works. Here we focus on initialization. Four techniques are studied: Random selection, Supervised selection, the Greedy-Interchange algorithm and the Maxmin algorithm. The capabilities of these techniques are assessed through experiments in two typical applications of clustering, namely Exploratory Data Analysis and Unsupervised Prototype Selection. Results clearly show the importance of a good initialization of the k-medians algorithm in all cases. Random initialization too often leads to bad final partitions, while the best results are generally obtained using Supervised selection. The Greedy-Interchange and Maxmin algorithms generally lead to partitions of high quality without the manual effort of Supervised selection. Of these two algorithms, the latter is generally preferred because of its better computational behaviour.
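
As an illustration of the ideas summarized in the abstract, the following is a minimal Python sketch of k-medians with a Maxmin-style initialization. It is not the authors' implementation: it assumes that "Maxmin" means farthest-first selection (each new median is the point farthest from its nearest already-chosen median), that the first median is simply the point at index `first`, and that the median of a cluster is the member minimizing the sum of distances to the other members. The function names `maxmin_init`, `k_medians`, `assign` and `partition_cost` are hypothetical, and the brute-force median search ignores the efficiency techniques the abstract refers to.

```python
from typing import Callable, List, Sequence, Tuple

Dist = Callable[[object, object], float]


def maxmin_init(points: Sequence, k: int, dist: Dist, first: int = 0) -> List[int]:
    """Assumed Maxmin (farthest-first) selection; returns indices of k initial medians."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [first]
    # Distance from every point to its nearest already-chosen median.
    nearest = [dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # Pick the point whose nearest chosen median is farthest away.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], dist(p, points[nxt])) for i, p in enumerate(points)]
    return chosen


def assign(points: Sequence, medians: List[int], dist: Dist) -> List[List[int]]:
    """Assign each point index to the cluster of its nearest median."""
    clusters: List[List[int]] = [[] for _ in medians]
    for i, p in enumerate(points):
        j = min(range(len(medians)), key=lambda c: dist(p, points[medians[c]]))
        clusters[j].append(i)
    return clusters


def k_medians(points: Sequence, init: List[int], dist: Dist,
              max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    """Alternate assignment and median update until the medians stop changing."""
    medians = list(init)
    for _ in range(max_iter):
        clusters = assign(points, medians, dist)
        new_medians = []
        for c, members in enumerate(clusters):
            if not members:  # keep the old median if a cluster becomes empty
                new_medians.append(medians[c])
                continue
            # Median: the member with the smallest sum of distances to the others.
            new_medians.append(min(
                members,
                key=lambda m: sum(dist(points[m], points[o]) for o in members)))
        if new_medians == medians:
            break
        medians = new_medians
    return medians, assign(points, medians, dist)


def partition_cost(points: Sequence, medians: List[int],
                   clusters: List[List[int]], dist: Dist) -> float:
    """Sum of distances from every point to the median of its cluster."""
    return sum(dist(points[i], points[medians[c]])
               for c, members in enumerate(clusters) for i in members)
```

For example, with `pts = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]` and `d = lambda a, b: abs(a - b)`, `maxmin_init(pts, 2, d)` selects the two extreme points (indices 0 and 5), and `k_medians(pts, [0, 5], d)` then moves the medians to the central point of each group (indices 1 and 4). Only a distance function is required, so the same sketch applies to non-vectorial data such as strings under an edit distance.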

Author information

Authors and Affiliations

Institut Tecnològic d’Informàtica, Universitat Politècnica de València, 46071, València, Spain
A. Juan & E. Vidal

Editor information

Editors and Affiliations

Department of Computer Science, University of València, 46100, Burjassot (València), Spain
Francesc J. Ferri
Department of Computer Languages and Systems, University of Alicante, 03071, Alicante, Spain
José M. Iñesta
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, 2052, Australia
Adnan Amin
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, 182 08, Prague 8, Czech Republic
Pavel Pudil

Cite this paper

Juan, A., Vidal, E. (2000). Comparison of Four Initialization Techniques for the K-Medians Clustering Algorithm. In: Ferri, F.J., Iñesta, J.M., Amin, A., Pudil, P. (eds) Advances in Pattern Recognition. SSPR /SPR 2000. Lecture Notes in Computer Science, vol 1876. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44522-6_87

DOI: https://doi.org/10.1007/3-540-44522-6_87
Published: 21 December 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67946-2
Online ISBN: 978-3-540-44522-7
eBook Packages: Springer Book Archive
