Abstract
Standard clustering algorithms can completely fail to identify clear cluster structure if that structure is confined to a subset of the variables. A forward selection procedure for identifying the subset is proposed and studied in the context of complete linkage hierarchical clustering. The basic approach can be applied to other clustering methods, too.
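As a rough illustration of the idea, the sketch below runs a forward selection over variables, clustering each candidate subset with complete linkage. It is not the procedure studied in the article: the scoring criterion (the between-cluster share of the total sum of squares), the fixed number of clusters `k`, and the function names `complete_linkage`, `separation`, and `forward_select` are all assumptions made for this example.

```python
import random
from itertools import combinations


def complete_linkage(points, k):
    """Merge points into k clusters with naive complete linkage."""
    n = len(points)
    # Pairwise Euclidean distances, computed once.
    d = [[sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
          for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest pairwise distance.
            link = max(d[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def separation(points, clusters):
    """Assumed criterion: between-cluster share of the total sum of squares."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    total = sum((p[j] - mean[j]) ** 2 for p in points for j in range(dim))
    within = 0.0
    for c in clusters:
        cm = [sum(points[i][j] for i in c) / len(c) for j in range(dim)]
        within += sum((points[i][j] - cm[j]) ** 2 for i in c for j in range(dim))
    return 1.0 - within / total if total > 0 else 0.0


def forward_select(data, k):
    """Add, at each step, the variable whose inclusion scores best."""
    remaining = list(range(len(data[0])))
    chosen, path = [], []
    while remaining:
        scored = []
        for v in remaining:
            cols = chosen + [v]
            sub = [[row[c] for c in cols] for row in data]
            scored.append((separation(sub, complete_linkage(sub, k)), v))
        score, v = max(scored)
        chosen.append(v)
        remaining.remove(v)
        path.append(score)
    return chosen, path


# Synthetic data: variables 0 and 1 carry two clusters, 2 and 3 are noise.
random.seed(0)
data = []
for i in range(40):
    centre = 0.0 if i < 20 else 5.0
    data.append([random.gauss(centre, 0.3), random.gauss(centre, 0.3),
                 random.gauss(0, 1), random.gauss(0, 1)])

order, path = forward_select(data, k=2)
print("selection order:", order)
print("criterion path:", [round(s, 3) for s in path])
```

In this toy setting the criterion tends to fall once only noise variables remain, so a drop in the path is one possible signal for where to stop; how to choose that cut-off is left open here.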
Cite this article
Fowlkes, E.B., Gnanadesikan, R. & Kettenring, J.R. Variable selection in clustering. Journal of Classification 5, 205–228 (1988). https://doi.org/10.1007/BF01897164