The notion of mixed membership arises naturally in the context of multivariate data analysis (see Multivariate Data Analysis: An Overview) when attributes collected on individuals or objects originate from a mixture of different categories or components. Consider, for example, an individual with both European and Asian ancestry whose mixed origins correspond to a statement of mixed membership: “1/4 European and 3/4 Asian ancestry.” This description is conceptually very different from a probability statement of “25% chance of being European and 75% chance of being Asian.” The assumption that individuals or objects may combine attributes from several basis categories in a stochastic manner, according to their proportions of membership in each category, is a distinctive feature of mixed membership models. In most applications, the number and nature of the basis categories, as well as the individual membership scores, are treated as latent or unknown. Mixed membership models are closely related to latent class models and, more generally, to finite mixture models. Variants of these models have recently gained popularity in many fields, from genetics to computer science.

Early Developments

Mixed membership models arose independently in at least three different substantive areas: medical diagnosis and health, genetics, and computer science. Woodbury et al. (1978) proposed one of the earliest mixed membership models in the context of disease classification, known as the Grade of Membership or GoM model. The work of Woodbury and colleagues on the GoM model is summarized in the volume Statistical Applications Using Fuzzy Sets (Manton et al. 1994).

Pritchard et al. (2000) introduced a variant of the mixed membership model that became known in genetics as the admixture model for multilocus genotype data and produced remarkable results in a number of applications. For example, in a study of human population structure, Rosenberg et al. (2002) used admixture models to analyze genotypes from 377 autosomal microsatellite loci in 1,056 individuals from 52 populations. Findings from this analysis indicated a typology that closely matched the “traditional” five main racial groups.

Among the first mixed membership models developed in computer science and machine learning for analyzing words in text documents were a multivariate analysis method named Probabilistic Latent Semantic Analysis (Hofmann 2001) and its random effects extension by Blei et al. (2003a, b). The latter model became known as Latent Dirichlet Allocation (LDA) because of the Dirichlet distribution imposed on the mixture proportions. Variants of the LDA model in computer science are often referred to as unsupervised generative topic models. Blei et al. (2003a, b) and Barnard et al. (2003) used LDA to combine different sources of information in the analysis of complex documents that included words in the main text, photographic images, and image annotations. Erosheva et al. (2004) analyzed words in abstracts and references in bibliographies from a set of research reports published in the Proceedings of the National Academy of Sciences (PNAS), exploring the internal mixed membership structure of articles and comparing it with the formal PNAS disciplinary classifications. Blei and Lafferty (2007) developed another mixed membership model, replacing the Dirichlet assumption with a more flexible logistic normal distribution for the mixture proportions. Mixed membership work in machine learning has spurred a number of applications and further developments of this class of models in psychology and the cognitive sciences, where they became known as topic models for semantic representations (Griffiths et al. 2007).

Basic Structure

The basic structure of a mixed membership model follows from the specification of assumptions at the population, individual, and latent variable levels, and the choice of a sampling scheme for generating individual attributes (Erosheva et al. 2004). Variations in these assumptions can provide us with different mixed membership models, including the GoM, admixture, and generative topic models referred to above.

Assume \(K\) basis subpopulations. For each subpopulation \(k = 1,\ldots ,K\), specify \(f({x}_{j}\vert {\theta }_{kj})\), a probability distribution for attribute \({x}_{j}\), conditional on a vector of parameters \({\theta }_{kj}\). Denote the individual-level membership score vector by \(\lambda = ({\lambda }_{1},\ldots ,{\lambda }_{K})\), representing the individual's mixture proportions in each subpopulation. Given \(\lambda \), the subject-specific conditional distribution for the \(j\)th attribute is

$$Pr({x}_{j}\vert \lambda ) ={ \sum \nolimits }_{k=1}^{K}{\lambda }_{k}\,f({x}_{j}\vert {\theta }_{kj}).$$

In addition, assume that attributes \({x}_{j}\) are independent, conditional on membership scores. Assume membership scores, the latent variables, are random realizations from some underlying distribution \({D}_{\alpha }\), parameterized by \(\alpha \). Finally, specify a sampling scheme by picking the number of observed distinct attributes, \(J\), and the number of independent replications for each attribute, \(R\).

Combining these assumptions, the marginal probability of observed responses \({\left \{{x}_{1}^{(r)},\ldots ,{x}_{J}^{(r)}\right \}}_{r=1}^{R}\), given model parameters \(\alpha \) and \(\theta \), is

$$Pr\left ({\left \{{x}_{1}^{(r)},\ldots ,{x}_{J}^{(r)}\right \}}_{r=1}^{R}\vert \alpha ,\theta \right ) = \int \nolimits \left ({\prod }_{j=1}^{J}{\prod }_{r=1}^{R}{\sum }_{k=1}^{K}{\lambda }_{k}\,f\left ({x}_{j}^{(r)}\vert {\theta }_{kj}\right )\right )\,d{D}_{\alpha }(\lambda ).\qquad (1)$$
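
The sampling scheme and the marginal probability in Eq. (1) can be sketched directly in code. The following is a minimal simulation, assuming a Dirichlet distribution for \({D}_{\alpha }\) and categorical attribute distributions (as in the admixture and topic models above); all dimensions and parameter values are illustrative, and the integral in Eq. (1) is approximated by simple Monte Carlo over draws of \(\lambda \).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: K subpopulations, J attributes,
# R replications per attribute, V categories per attribute.
K, J, R, V = 3, 4, 5, 6

alpha = np.full(K, 0.5)                          # parameter of D_alpha (Dirichlet)
theta = rng.dirichlet(np.ones(V), size=(K, J))   # f(x_j | theta_kj): categorical

def sample_individual():
    """Generate {x_j^(r)} for one individual under the mixed membership model."""
    lam = rng.dirichlet(alpha)          # membership scores lambda ~ D_alpha
    x = np.empty((J, R), dtype=int)
    for j in range(J):
        for r in range(R):
            k = rng.choice(K, p=lam)    # pick a subpopulation per replication
            x[j, r] = rng.choice(V, p=theta[k, j])
    return x

def marginal_prob(x, n_mc=20000):
    """Monte Carlo estimate of Eq. (1): average the integrand over lambda draws."""
    lams = rng.dirichlet(alpha, size=n_mc)        # (n_mc, K) draws from D_alpha
    like = np.ones(n_mc)
    # integrand: prod_j prod_r sum_k lambda_k * f(x_j^(r) | theta_kj)
    for j in range(J):
        for r in range(R):
            like *= lams @ theta[:, j, x[j, r]]
    return like.mean()

x = sample_individual()
p = marginal_prob(x)
```

Note that the attributes are conditionally independent given \(\lambda \) but not marginally independent, which is why the sum over \(k\) sits inside the product in the integrand.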

In general, the number of observed attributes need not be the same across subjects, and the number of replications need not be the same across attributes. In addition, instead of placing a probability distribution on membership scores, some mixed membership model variants may treat latent variables as fixed but unknown constants. Finally, other extensions can be developed by specifying further dependence structures among sampled individuals or attributes that may be driven by particular data forms as, e.g., in relational or network data (Airoldi et al. 2008b; Chang and Blei 2010; Xing et al. 2010).

Estimation

A number of estimation methods have been developed for mixed membership models; broadly speaking, they are of two types: those that treat membership scores as fixed and those that treat them as random. The first group includes the numerical methods introduced by Hofmann (2001), joint maximum likelihood-type methods described in Manton et al. (1994) and Cooil and Varki (2003), and related likelihood approaches in Potthoff et al. (2000) and Varki et al. (2000). The statistical properties of the estimators in these approaches, such as consistency, identifiability, and uniqueness of solutions, are yet to be fully understood (Haberman 1995); empirical evidence suggests that the likelihood function is often multi-modal and can have bothersome ridges. The second group uses a Bayesian hierarchical structure for direct computation of the posterior distribution, e.g., with Gibbs sampling based on simplified assumptions (Pritchard et al. 2000; Griffiths and Steyvers 2004) or with fully Bayesian MCMC sampling (Erosheva 2003). Variational methods used by Blei et al. (2003a, b) and expectation-propagation methods developed by Minka and Lafferty (2002) can be used to approximate the posterior distribution. The Bayesian hierarchical methods solve some of the statistical and computational problems, and variational methods in particular scale well to higher dimensions. Many other aspects of working with mixed membership models remain open challenges, e.g., dimensionality selection (Airoldi et al. 2008a).
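
As a concrete illustration of the Gibbs sampling approach, the sketch below implements a collapsed Gibbs sampler for an LDA-style topic model in the spirit of Griffiths and Steyvers (2004). The data, priors, and dimensions here are hypothetical, and a real implementation would add burn-in and convergence diagnostics; the sketch only shows the core update of the latent subpopulation assignments.

```python
import numpy as np

rng = np.random.default_rng(1)

def collapsed_gibbs(docs, K, V, alpha=0.5, beta=0.1, n_iter=200):
    """Collapsed Gibbs sampling for an LDA-style model.

    docs: list of arrays of word (attribute) indices in 0..V-1;
    K: number of topics (basis subpopulations); V: vocabulary size.
    Symmetric Dirichlet priors with parameters alpha (topics per document)
    and beta (words per topic) are assumed for illustration.
    """
    D = len(docs)
    ndk = np.zeros((D, K))                  # document-topic counts
    nkv = np.zeros((K, V))                  # topic-word counts
    nk = np.zeros(K)                        # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d in range(D):                      # initialize counts at random z
        for w, k in zip(docs[d], z[d]):
            ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d in range(D):
            doc, zd = docs[d], z[d]
            for i, w in enumerate(doc):
                k = zd[i]                   # remove the current assignment
                ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
                # full conditional for z_i, up to a normalizing constant
                p = (ndk[d] + alpha) * (nkv[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                zd[i] = k                   # add the new assignment back
                ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    # posterior-mean membership scores (topic proportions per document)
    lam = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    return lam

docs = [rng.integers(0, 10, size=30) for _ in range(5)]
lam = collapsed_gibbs(docs, K=3, V=10)
```

Collapsing (integrating out the topic and proportion parameters analytically) is what makes the per-token conditional distribution depend only on the count tables, which is the source of the method's simplicity.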

Relationship to Other Methods of Multivariate Analysis

It is natural to compare mixed membership models with other latent variable methods, and, in particular, with factor analysis and latent class models (Bartholomew and Knott 1999). For example, the GoM model for binary outcomes can be thought of as a constrained factor analysis model: \(E(x\vert \lambda ) = A\lambda \), where \(x = {({x}_{1},\ldots ,{x}_{J})}^{{\prime}}\) is a column-vector of observed attributes, \(\lambda = {({\lambda }_{1},\ldots ,{\lambda }_{K})}^{{\prime}}\) is a column-vector of factor (i.e., membership) scores, and \(A\) is a \(J \times K\) matrix of factor loadings. The respective constraints in this factor model are \({\lambda }^{{\prime}}{I}_{K} = 1\) and \(A{I}_{K} = {I}_{J}\), where \({I}_{m}\) denotes an \(m\)-dimensional vector of 1s.
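
A small numerical sketch of this constrained factor representation (with illustrative values): membership scores drawn from the simplex satisfy \({\lambda }^{{\prime}}{I}_{K} = 1\), so \(E(x\vert \lambda ) = A\lambda \) is a convex combination of the \(K\) basis-profile probability columns and each expected binary attribute stays in \([0,1]\).

```python
import numpy as np

rng = np.random.default_rng(2)
J, K = 5, 3

# Factor-loading matrix A: column k holds the response probabilities of
# basis profile k for the J binary attributes (illustrative values).
A = rng.random((J, K))

# Membership scores on the simplex: lambda' 1_K = 1.
lam = rng.dirichlet(np.ones(K))

# Constrained factor model: E(x | lambda) = A lambda,
# a convex combination of the profile columns of A.
Ex = A @ lam
```

Unlike ordinary factor analysis, the scores are nonnegative and sum to one, so they admit a direct interpretation as partial memberships rather than unconstrained latent positions.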

Mixed membership models can also address objectives similar to those of Correspondence Analysis and Multidimensional Scaling methods for contingency tables. For example, one can construct a low-dimensional map from contingency table data and graphically examine membership scores (representing table rows, or individuals) in the convex space defined by the basis, or extreme, profiles (representing table columns, or attributes), addressing questions such as whether some table rows have similar distributions over the column categories.

Finally, there is a special relationship between the sets of mixed membership and latent class models, where each set of models can be thought of as a special case of the other. Manton et al. (1994) and Potthoff et al. (2000) described how the GoM model can be thought of as an extension of latent class models. On the other hand, Haberman (1995) first pointed out that the GoM model can be viewed as a special case of latent class models. The fundamental representation theorem of equivalence between mixed membership and population-level mixture models clarifies this nonintuitive relationship (Erosheva et al. 2007).

About the Authors

Elena Erosheva is a Core member of the Center for Statistics and the Social Sciences, University of Washington. For the biography of Professor Fienberg, see the entry Data Privacy and Confidentiality.

Acknowledgments

Supported in part by National Institutes of Health grant No. R03 AG030605-01 and by National Science Foundation grant DMS-0631589.

Cross References

Correspondence Analysis

Factor Analysis and Latent Variable Modelling

Multidimensional Scaling

Multivariate Data Analysis: An Overview