The notion of mixed membership arises naturally in the context of multivariate data analysis (see Multivariate Data Analysis: An Overview) when attributes collected on individuals or objects originate from a mixture of different categories or components. Consider, for example, an individual with both European and Asian ancestry whose mixed origins correspond to a statement of mixed membership: “1/4 European and 3/4 Asian ancestry.” This description is conceptually very different from a probability statement of “25% chance of being European and 75% chance of being Asian”. The assumption that individuals or objects may combine attributes from several basis categories in a stochastic manner, according to their proportions of membership in each category, is a distinctive feature of mixed membership models. In most applications, the number and the nature of the basis categories, as well as individual membership frequencies, are typically considered latent or unknown. Mixed membership models are closely related to latent class and finite mixture models in general. Variants of these models have recently gained popularity in many fields, from genetics to computer science.
Early Developments
Mixed membership models arose independently in at least three different substantive areas: medical diagnosis and health, genetics, and computer science. Woodbury et al. (1978) proposed one of the earliest mixed membership models in the context of disease classification, known as the Grade of Membership or GoM model. The work of Woodbury and colleagues on the GoM model is summarized in the volume Statistical Applications Using Fuzzy Sets (Manton et al. 1994).
Pritchard et al. (2000) introduced a variant of the mixed membership model which became known in genetics as the admixture model for multilocus genotype data and produced remarkable results in a number of applications. For example, in a study of human population structure, Rosenberg et al. (2002) used admixture models to analyze genotypes from 377 autosomal microsatellite loci in 1,056 individuals from 52 populations. Findings from this analysis indicated a typology structure that was very close to the “traditional” five main racial groups.
Among the first mixed membership models developed in computer science and machine learning for analyzing words in text documents were a multivariate analysis method named Probabilistic Latent Semantic Analysis (Hofmann 2001) and its random effects extension by Blei et al. (2003a, b). The latter model became known as Latent Dirichlet Allocation (LDA) because it assumes a Dirichlet distribution for the mixture proportions. Variants of the LDA model in computer science are often referred to as unsupervised generative topic models. Blei et al. (2003a, b) and Barnard et al. (2003) used LDA to combine different sources of information in the context of analyzing complex documents that included words in main text, photographic images, and image annotations. Erosheva et al. (2004) analyzed words in abstracts and references in bibliographies from a set of research reports published in the Proceedings of the National Academy of Sciences (PNAS), exploring an internal mixed membership structure of articles and comparing it with the formal PNAS disciplinary classifications. Blei and Lafferty (2007) developed another mixed membership model, replacing the Dirichlet assumption with a more flexible logistic normal distribution for the mixture proportions. Mixed membership developments in machine learning have spurred a number of applications and further developments of this class of models in psychology and cognitive sciences, where they became known as topic models for semantic representations (Griffiths et al. 2007).
Basic Structure
The basic structure of a mixed membership model follows from the specification of assumptions at the population, individual, and latent variable levels, and the choice of a sampling scheme for generating individual attributes (Erosheva et al. 2004). Variations in these assumptions can provide us with different mixed membership models, including the GoM, admixture, and generative topic models referred to above.
Assume \(K\) basis subpopulations. For each subpopulation \(k = 1,\ldots ,K\), specify \(f({x}_{j}\vert {\theta }_{kj}),\) a probability distribution for attribute \({x}_{j}\), conditional on a vector of parameters \({\theta }_{kj}\). Denote the individual-level membership score vector by \(\lambda = ({\lambda }_{1},\ldots ,{\lambda }_{K})\), representing the mixture proportions in each subpopulation. Given \(\lambda \), the subject-specific conditional distribution for the \(j\)th attribute is
\[\Pr ({x}_{j}\vert \lambda ) =\sum_{k=1}^{K}{\lambda }_{k}f({x}_{j}\vert {\theta }_{kj}).\]
In addition, assume that attributes \({x}_{j}\) are independent, conditional on membership scores. Assume membership scores, the latent variables, are random realizations from some underlying distribution \({D}_{\alpha }\), parameterized by \(\alpha \). Finally, specify a sampling scheme by picking the number of observed distinct attributes, \(J\), and the number of independent replications for each attribute, \(R\).
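The assumptions above can be read as a recipe for generating data. The following is a minimal sketch, not the authors' implementation: it assumes binary attributes, so that \(f\) is Bernoulli with success probability `theta[k, j]`, and it assumes a Dirichlet distribution for \({D}_{\alpha }\). The function name `simulate_mixed_membership` and all numerical values are hypothetical.

```python
import numpy as np

def simulate_mixed_membership(n, alpha, theta, R=1, seed=0):
    """Draw binary attributes for n individuals.

    alpha : (K,) Dirichlet parameters (assumed choice for D_alpha)
    theta : (K, J) Bernoulli probabilities theta[k, j] (assumed choice for f)
    R     : number of replications per attribute
    Returns lam with shape (n, K) and x with shape (n, J, R).
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    theta = np.asarray(theta, dtype=float)
    K, J = theta.shape
    lam = rng.dirichlet(alpha, size=n)        # membership scores, rows sum to 1
    x = np.empty((n, J, R), dtype=int)
    for i in range(n):
        # For each (attribute, replication), pick a subpopulation with probs lam[i]
        z = rng.choice(K, size=(J, R), p=lam[i])
        # Success probability of attribute j under the chosen subpopulation
        p = theta[z, np.arange(J)[:, None]]
        x[i] = rng.random((J, R)) < p
    return lam, x

# Hypothetical example: K = 2 subpopulations, J = 3 attributes, R = 2 replications
theta = np.array([[0.9, 0.8, 0.1],
                  [0.1, 0.2, 0.7]])
lam, x = simulate_mixed_membership(n=5, alpha=[0.5, 0.5], theta=theta, R=2)
```

Drawing a subpopulation separately for every attribute and replication is what distinguishes this scheme from a finite mixture, where a single subpopulation would be drawn once per individual.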
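Combining these assumptions, the marginal probability of observed responses \({\left \{{x}_{1}^{(r)},\ldots ,{x}_{J}^{(r)}\right \}}_{r=1}^{R}\), given model parameters \(\alpha \) and \(\theta \), is
\[\Pr \left ({\left \{{x}_{1}^{(r)},\ldots ,{x}_{J}^{(r)}\right \}}_{r=1}^{R}\vert \alpha ,\theta \right ) =\int \left (\prod_{j=1}^{J}\prod_{r=1}^{R}\sum_{k=1}^{K}{\lambda }_{k}f({x}_{j}^{(r)}\vert {\theta }_{kj})\right )d{D}_{\alpha }(\lambda ).\]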
In general, the number of observed attributes need not be the same across subjects, and the number of replications need not be the same across attributes. In addition, instead of placing a probability distribution on membership scores, some mixed membership model variants may treat latent variables as fixed but unknown constants. Finally, other extensions can be developed by specifying further dependence structures among sampled individuals or attributes that may be driven by particular data forms, e.g., relational or network data (Airoldi et al. 2008a; Chang and Blei 2010; Xing et al. 2010).
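The marginal probability of the observed responses involves an integral over the membership scores that rarely has a convenient closed form. One simple way to approximate it is to average the conditional likelihood over draws from \({D}_{\alpha }\). The sketch below assumes binary attributes with Bernoulli \(f\) and a Dirichlet \({D}_{\alpha }\); the function name `marginal_prob_mc` and the parameter values are hypothetical, and plain Monte Carlo is used here only for illustration, not as a recommended estimation method.

```python
import numpy as np

def marginal_prob_mc(x, alpha, theta, n_draws=20000, seed=0):
    """Monte Carlo estimate of Pr(x | alpha, theta) for binary x of shape (J, R)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    theta = np.asarray(theta, dtype=float)
    lam = rng.dirichlet(alpha, size=n_draws)   # (S, K) draws from D_alpha
    p1 = lam @ theta                           # (S, J): sum_k lam_k * theta_kj
    # Bernoulli likelihood of each replication: p1 if x = 1, else 1 - p1
    lik = np.where(x[None, :, :] == 1, p1[:, :, None], 1.0 - p1[:, :, None])
    # Product over attributes and replications, then average over the draws
    return lik.prod(axis=(1, 2)).mean()

# Hypothetical example: J = 3 attributes observed once each (R = 1)
theta = np.array([[0.9, 0.8, 0.1],
                  [0.1, 0.2, 0.7]])
x_obs = np.array([[1], [1], [0]])
est = marginal_prob_mc(x_obs, alpha=[0.5, 0.5], theta=theta)
```

Because the same seed gives the same draws of \(\lambda \), the estimates for all \({2}^{J}\) response patterns sum to one, which is a convenient sanity check.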
Estimation
A number of estimation methods have been developed for mixed membership models that are, broadly speaking, of two types: those that treat membership scores as fixed and those that treat them as random. The first group includes the numerical methods introduced by Hofmann (2001), joint maximum likelihood type methods described in Manton et al. (1994) and Cooil and Varki (2003), and related likelihood approaches in Potthoff et al. (2000) and Varki et al. (2000). The statistical properties of the estimators in these approaches, such as consistency, identifiability, and uniqueness of solutions, are yet to be fully understood (Haberman 1995); empirical evidence suggests that the likelihood function is often multimodal and can have bothersome ridges. The second group uses a Bayesian hierarchical structure for direct computation of the posterior distribution, e.g., with Gibbs sampling based on simplified assumptions (Pritchard et al. 2000; Griffiths and Steyvers 2004) or with fully Bayesian MCMC sampling (Erosheva 2003). Variational methods used by Blei et al. (2003a, b), or expectation-propagation methods developed by Minka and Lafferty (2002), can be used to approximate the posterior distribution. The Bayesian hierarchical methods solve some of the statistical and computational problems, and variational methods in particular scale well for higher dimensions. Many other aspects of working with mixed membership models remain open challenges, e.g., dimensionality selection (Airoldi et al. 2008b).
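As an illustration of the second group of methods, the following is a minimal sketch of a collapsed Gibbs sampler for an LDA-type topic model, written in the spirit of the sampler used by Griffiths and Steyvers (2004) rather than reproducing their code. Here documents play the role of individuals, topics play the role of basis categories, and the words of a document are treated as replications. Symmetric Dirichlet hyperparameters `alpha` and `beta`, the function name `collapsed_gibbs`, and the toy corpus are assumptions made for this example.

```python
import numpy as np

def collapsed_gibbs(docs, K, V, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """docs: list of lists of word ids in range(V).

    Returns document-topic counts (D, K) and topic-word counts (K, V)
    from the final sweep of the sampler.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # topic counts per document
    nkw = np.zeros((K, V))           # word counts per topic
    nk = np.zeros(K)                 # total words assigned to each topic
    z = [rng.integers(K, size=len(d)) for d in docs]  # random initial assignments
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional of the token's topic given all other assignments
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    return ndk, nkw

# Hypothetical toy corpus: 3 documents over a vocabulary of 5 word ids
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 3, 4]]
ndk, nkw = collapsed_gibbs(docs, K=2, V=5, n_iter=50)
# Smoothed membership estimates from the last sweep (rows sum to 1)
membership = (ndk + 0.1) / (ndk + 0.1).sum(axis=1, keepdims=True)
```

A practical analysis would average over many sweeps after burn-in and would have to deal with label switching and the multimodality noted above; this sketch returns only the final state.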
Relationship to Other Methods of Multivariate Analysis
It is natural to compare mixed membership models with other latent variable methods, and, in particular, with factor analysis and latent class models (Bartholomew and Knott 1999). For example, the GoM model for binary outcomes can be thought of as a constrained factor analysis model: \(E(x\vert \lambda ) = A\lambda \), where \(x\) is a column-vector of observed attributes \(x = {({x}_{1},\ldots ,{x}_{J})}^{{\prime}}\), \(\lambda = {({\lambda }_{1},\ldots ,{\lambda }_{K})}^{{\prime}}\) is a column-vector of factor (i.e., membership) scores, and \(A\) is a \(J \times K\) matrix of factor loadings. The respective constraints in this factor model are \({\lambda }^{{\prime}}{I}_{K} = 1\) and \(A{I}_{K} = {I}_{K}\), where \({I}_{K}\) is a \(K\)-dimensional vector of 1s.
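For binary attributes the factor-analytic form can be checked directly, since the conditional mean of attribute \(j\) is \({\sum }_{k}{\lambda }_{k}{\theta }_{kj}\). The short sketch below assumes that the loading matrix holds the subpopulation response probabilities, `A[j, k] = theta[k, j]`; the numerical values are hypothetical.

```python
import numpy as np

theta = np.array([[0.9, 0.8, 0.1],   # extreme profile 1: Pr(x_j = 1), j = 1..3
                  [0.1, 0.2, 0.7]])  # extreme profile 2
A = theta.T                          # J x K loading matrix, A[j, k] = theta[k, j]
lam = np.array([0.25, 0.75])         # membership scores, nonnegative, summing to 1

expected_x = A @ lam                 # E(x | lambda) = A lambda
# The same quantity written attribute by attribute as sum_k lam_k * theta_kj
by_hand = np.array([sum(lam[k] * theta[k, j] for k in range(2)) for j in range(3)])
# expected_x is approximately [0.30, 0.35, 0.55]
```

Each entry of `expected_x` lies between the smallest and largest loading in its row, because the membership scores form a convex combination of the extreme profiles.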
Mixed membership models can also address objectives similar to those in Correspondence Analysis and Multidimensional Scaling methods for contingency tables. Thus, one could create a low-dimensional map from contingency table data and graphically examine membership scores (representing table rows or individuals) in the convex space defined by basis or extreme profiles (representing columns or attributes) to address questions such as whether some table rows have similar distributions over the column categories.
Finally, there is a special relationship between the sets of mixed membership and latent class models, where each set of models can be thought of as a special case of the other. Manton et al. (1994) and Potthoff et al. (2000) described how the GoM model can be thought of as an extension of latent class models. On the other hand, Haberman (1995) first pointed out that the GoM model can be viewed as a special case of latent class models. The fundamental representation theorem of equivalence between mixed membership and population-level mixture models clarifies this nonintuitive relationship (Erosheva et al. 2007).
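The idea behind this equivalence can be illustrated for fixed membership scores: expanding the product over attributes of the sums over subpopulations gives a sum over \({K}^{J}\) latent classes, one for each assignment of attributes to subpopulations. The sketch below checks that identity numerically for binary attributes. It is an illustration only, not the general theorem of Erosheva et al. (2007); the function names and parameter values are hypothetical.

```python
import itertools
import numpy as np

def prob_mixed_membership(x, lam, theta):
    """prod_j sum_k lam_k f(x_j | theta_kj) for a binary vector x of length J."""
    lam = np.asarray(lam, dtype=float)
    f = np.where(np.asarray(x)[None, :] == 1, theta, 1.0 - theta)  # (K, J)
    return float(np.prod(lam @ f))

def prob_latent_class(x, lam, theta):
    """The same probability written as a mixture over K**J latent classes."""
    lam = np.asarray(lam, dtype=float)
    K, J = theta.shape
    f = np.where(np.asarray(x)[None, :] == 1, theta, 1.0 - theta)
    total = 0.0
    # z[j] is the subpopulation that generates attribute j
    for z in itertools.product(range(K), repeat=J):
        idx = list(z)
        weight = np.prod(lam[idx])                       # class weight prod_j lam_{z_j}
        total += weight * np.prod(f[idx, np.arange(J)])  # prod_j f(x_j | theta_{z_j, j})
    return float(total)

theta = np.array([[0.9, 0.8, 0.1],
                  [0.1, 0.2, 0.7]])
lam = np.array([0.25, 0.75])
x = [1, 0, 1]
a = prob_mixed_membership(x, lam, theta)
b = prob_latent_class(x, lam, theta)   # agrees with a up to rounding
```

When membership scores are random, integrating both sides over \({D}_{\alpha }\) preserves the equality, and the class weights become expectations of the products of membership scores.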
About the Authors
Elena Erosheva is a Core member of the Center for Statistics and the Social Sciences, University of Washington. For the biography of Professor Fienberg, see the entry Data Privacy and Confidentiality.
Acknowledgments
Supported in part by National Institutes of Health grant No. R03 AG030605-01 and by National Science Foundation grant DMS-0631589.
References and Further Reading
Airoldi EM, Blei DM, Fienberg SE, Xing EP (2008a) Mixed-membership stochastic blockmodels. J Mach Learn Res 9:1981–2014
Airoldi EM, Fienberg SE, Joutard C, Love TM (2008b) Discovery of latent patterns with hierarchical Bayesian mixed-membership models and the issue of model choice. In: Poncelet P, Masseglia F, Teisseire M (eds) Data mining patterns: new methods and applications. pp 240–275
Barnard K, Duygulu P, Forsyth D, de Freitas N, Blei DM, Jordan MI (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135
Bartholomew DJ, Knott M (1999) Latent variable models and factor analysis, 2nd edn. Arnold, London
Blei DM, Lafferty JD (2007) A correlated topic model of Science. Ann Appl Stat 1:17–35
Blei DM, Ng AY, Jordan MI (2003a) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Blei DM, Ng AY, Jordan MI (2003b) Modeling annotated data. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp 127–134
Chang J, Blei DM (2010) Hierarchical relational models for document networks. Ann Appl Stat 4:124–150
Cooil B, Varki S (2003) Using the conditional Grade-of-Membership model to assess judgement accuracy. Psychometrika 68:453–471
Erosheva EA (2003) Bayesian estimation of the Grade of Membership Model. In: Bernardo J et al (eds) Bayesian statistics 7. Oxford University Press, Oxford, pp 501–510
Erosheva EA, Fienberg SE (2004) Partial membership models with application to disability survey data. In: Weihs C, Gaul W (eds) Classification – the ubiquitous challenge. Springer, Heidelberg, pp 11–26
Erosheva EA, Fienberg SE, Lafferty J (2004) Mixed membership models of scientific publications. Proc Natl Acad Sci 101 (suppl 1):5220–5227
Erosheva EA, Fienberg SE, Joutard C (2007) Describing disability through individual-level mixture models for multivariate binary data. Ann Appl Stat 1:502–537
Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci 101 (suppl 1):5228–5235
Griffiths TL, Steyvers M, Tenenbaum JB (2007) Topics in semantic representation. Psychol Rev 114(2):211–244
Haberman SJ (1995) Book review of “Statistical applications using fuzzy sets,” by K.G. Manton, M.A. Woodbury and H.D. Tolley. J Am Stat Assoc 90:1131–1133
Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42:177–196
Manton KG, Woodbury MA, Tolley HD (1994) Statistical applications using fuzzy sets. Wiley, New York
Minka TP, Lafferty JD (2002) Expectation-propagation for the generative aspect model. In: Uncertainty in Artificial Intelligence: Proceedings of the Eighteenth Conference (UAI–2002), Morgan Kaufmann, San Francisco, pp 352–359
Potthoff RF, Manton KG, Woodbury MA (2000) Dirichlet generalizations of latent-class models. J Classif 17:315–353
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW (2002) Genetic structure of human populations. Science 298:2381–2385
Varki S, Cooil B, Rust RT (2000) Modeling fuzzy data in qualitative marketing research. J Market Res 37:480–489
Woodbury MA, Clive J, Garson A (1978) Mathematical typology: a grade of membership technique for obtaining disease definition. Comput Biomed Res 11:277–298
Xing E, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4, in press
© 2011 Springer-Verlag Berlin Heidelberg
About this entry
Cite this entry
Erosheva, E.A., Fienberg, S.E. (2011). Mixed Membership Models. In: Lovric, M. (eds) International Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04898-2_367
DOI: https://doi.org/10.1007/978-3-642-04898-2_367
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04897-5
Online ISBN: 978-3-642-04898-2