Abstract
In a real-world data set there is always a possibility, rather high in our opinion, that different features have different degrees of relevance. Most machine learning algorithms deal with this by selecting or discarding features in the data preprocessing phase. However, we maintain that even among the relevant features there may be different degrees of relevance, and that this should be taken into account during the clustering process.
With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm. The first K-Means-based clustering algorithm to compute feature weights was introduced just over 30 years ago. Various such algorithms have been designed since, but to our knowledge there has been no survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analyzing some of the most popular, or most innovative, feature weighting mechanisms based on K-Means.
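Although the surveyed algorithms differ in their weight-update rules, most share a common skeleton: alternate the usual K-Means assignment and centroid steps with a step that recomputes feature weights from within-cluster dispersions. The sketch below is a minimal illustration of that idea, loosely following the automated variable weighting update of Huang et al.'s W-k-means (2005); the exponent `beta`, the farthest-point initialization, and all identifiers here are our own assumptions for illustration, not a definitive implementation of any one published algorithm.

```python
import numpy as np

def weighted_kmeans(X, k, beta=2.0, iters=50, seed=0):
    """Lloyd-style K-Means in which each feature j carries a weight w_j,
    so the distance is sum_j w_j**beta * (x_j - c_j)**2, with beta > 1."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    rng = np.random.default_rng(seed)
    # farthest-point initialization: spreads the k seed centroids out
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    centroids = X[idx].copy()
    w = np.full(m, 1.0 / m)                      # start with uniform weights
    for _ in range(iters):
        # assignment step under the weighted distance
        d = (((X[:, None, :] - centroids[None]) ** 2) * w ** beta).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: centroids are within-cluster means
        for c in range(k):
            if (labels == c).any():
                centroids[c] = X[labels == c].mean(axis=0)
        # within-cluster dispersion of each feature
        D = np.array([((X[:, j] - centroids[labels, j]) ** 2).sum()
                      for j in range(m)])
        D = np.maximum(D, 1e-12)                 # guard against zero dispersion
        # low dispersion -> high weight, normalized to sum to one
        inv = D ** (-1.0 / (beta - 1.0))
        w = inv / inv.sum()
    return labels, centroids, w
```

With `beta = 2` the update reduces to weights proportional to the inverse within-cluster dispersion of each feature, so features that vary little inside clusters, and hence discriminate between them, receive more weight.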
de Amorim, R.C. A Survey on Feature Weighting Based K-Means Algorithms. J Classif 33, 210–242 (2016). https://doi.org/10.1007/s00357-016-9208-4