Abstract
We introduce a Bayesian extension of the latent block model for model-based block clustering of data matrices. Our approach considers a block model in which the block parameters can be integrated out. The result is a posterior defined over the numbers of row and column clusters and the cluster memberships. The numbers of row and column clusters need not be known in advance, as these are sampled along with the cluster memberships using Markov chain Monte Carlo. This differs from existing work on latent block models, where the number of clusters is assumed known or is chosen using an information criterion. We analyze both simulated and real data to validate the technique.
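To illustrate what integrating out the block parameters can look like, the following is a minimal sketch under assumptions that the abstract does not state: binary data, an independent Bernoulli parameter per block, and a Beta(a, b) prior on each. Under those assumptions the marginal likelihood of a block with n1 ones and n0 zeros is B(a + n1, b + n0) / B(a, b). The function names and the toy matrix are hypothetical and are not taken from the article.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def collapsed_log_likelihood(data, row_labels, col_labels, a=1.0, b=1.0):
    """Log marginal likelihood of a binary matrix given cluster labels.

    Assumption (illustrative only): each block has its own Bernoulli
    parameter with a Beta(a, b) prior, which is integrated out.
    """
    # Count ones and zeros falling in each (row cluster, column cluster) block.
    counts = {}
    for i, row in enumerate(data):
        for j, x in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones, zeros = counts.get(key, (0, 0))
            counts[key] = (ones + x, zeros + (1 - x))
    # Each block contributes log B(a + n1, b + n0) - log B(a, b).
    return sum(
        log_beta(a + n1, b + n0) - log_beta(a, b)
        for n1, n0 in counts.values()
    )


# Toy matrix: one row of ones, one row of zeros, a single column cluster.
data = [[1, 1], [0, 0]]
ll_two = collapsed_log_likelihood(data, row_labels=[0, 1], col_labels=[0, 0])
ll_one = collapsed_log_likelihood(data, row_labels=[0, 0], col_labels=[0, 0])
```

In this toy case the two-row-cluster labelling has the higher collapsed likelihood. Because no block parameters remain, a sampler only has to move over labels and cluster counts, and a quantity of this kind is what it would evaluate at each step.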
Cite this article
Wyse, J., Friel, N. Block clustering with collapsed latent block models. Stat Comput 22, 415–428 (2012). https://doi.org/10.1007/s11222-011-9233-4
DOI: https://doi.org/10.1007/s11222-011-9233-4