Latent class analysis variable selection

Dean, Nema; Raftery, Adrian E.

doi:10.1007/s10463-009-0258-9

Published: 24 July 2009

Volume 62, pages 11–35 (2010)

Annals of the Institute of Statistical Mathematics

Nema Dean¹ &
Adrian E. Raftery²

Abstract

We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable’s usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
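
The search described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: it assumes a headlong-style search in which the first candidate whose evidence clears a threshold is accepted, rather than the best one. The names `headlong_select`, `evidence`, `threshold`, `max_iter`, and `toy_evidence` are hypothetical, and the scoring function is a stand-in for the comparison between the two models (for example, a BIC difference from fitted latent class models).

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical signature: evidence(candidate, selected) returns a score that is
# positive when `candidate` appears to carry clustering information beyond the
# variables already in `selected` (for example, a BIC difference).
Evidence = Callable[[str, FrozenSet[str]], float]


def headlong_select(variables: Sequence[str], evidence: Evidence,
                    threshold: float = 0.0, max_iter: int = 100) -> List[str]:
    """Headlong-style search: act on the first variable that qualifies."""
    selected: List[str] = []
    for _ in range(max_iter):
        changed = False
        # Inclusion step: add the first unselected variable above the threshold.
        for v in variables:
            if v in selected:
                continue
            if evidence(v, frozenset(selected)) > threshold:
                selected.append(v)
                changed = True
                break
        # Removal step: drop the first selected variable that no longer
        # clears the threshold given the other selected variables.
        for v in list(selected):
            others = frozenset(selected) - {v}
            if evidence(v, others) <= threshold:
                selected.remove(v)
                changed = True
                break
        if not changed:
            break  # neither step changed the set, so the search has converged
    return selected


def toy_evidence(candidate: str, selected: FrozenSet[str]) -> float:
    """Illustrative stand-in only: x1 and x2 are informative, the rest are noise."""
    return 1.0 if candidate in {"x1", "x2"} else -1.0


chosen = headlong_select(["x1", "noise", "x2"], toy_evidence)
```

In a real analysis `evidence` would fit and compare the two latent class models for each candidate, which is the expensive part; accepting the first qualifying variable is what keeps the number of such fits down compared with an exhaustive greedy step.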

Author information

Authors and Affiliations

Department of Statistics, University of Glasgow, Glasgow, G12 8QQ, Scotland, UK
Nema Dean
Department of Statistics, University of Washington, Box 354320, Seattle, WA, 98195-4320, USA
Adrian E. Raftery

Corresponding author

Correspondence to Adrian E. Raftery.

About this article

Cite this article

Dean, N., Raftery, A.E. Latent class analysis variable selection. Ann Inst Stat Math 62, 11–35 (2010). https://doi.org/10.1007/s10463-009-0258-9

Received: 19 July 2008
Revised: 22 April 2009
Published: 24 July 2009
Issue Date: February 2010
DOI: https://doi.org/10.1007/s10463-009-0258-9
