Abstract
Several standardization methods are investigated in conjunction with the K-means algorithm under various conditions. We find that traditional standardization methods (i.e., z-scores) are inferior to alternative standardization methods. Suggestions for future work on combining standardization with variable selection are considered.
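To make the comparison concrete, the sketch below contrasts z-score standardization with one possible alternative, division by the range. The abstract does not name the alternative methods studied, so range scaling is an assumption chosen only for illustration, and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev

def z_score(column):
    """Center on the mean and divide by the population standard deviation.

    Assumes the column is not constant (otherwise the divisor is zero).
    """
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]

def range_scale(column):
    """Divide each value by the range (max - min).

    Illustrative alternative only; not necessarily a method from the chapter.
    Assumes the column is not constant.
    """
    r = max(column) - min(column)
    return [x / r for x in column]

# Hypothetical toy variable with two well-separated groups of values.
scores = [10.0, 12.0, 14.0, 100.0, 102.0, 104.0]

z = z_score(scores)      # mean 0, standard deviation 1
r = range_scale(scores)  # spread (max - min) equal to 1
```

Either transformed column would then be passed to K-means in place of the raw variable. The two rescalings differ in which quantity they equalize across variables: z-scores force equal variance, while range scaling forces equal spread between the extreme values.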
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Steinley, D. (2004). Standardizing Variables in K-means Clustering. In: Banks, D., McMorris, F.R., Arabie, P., Gaul, W. (eds) Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organisation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17103-1_6
DOI: https://doi.org/10.1007/978-3-642-17103-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22014-5
Online ISBN: 978-3-642-17103-1
eBook Packages: Springer Book Archive