Abstract
We introduce a novel Markov chain Monte Carlo algorithm for estimating posterior probabilities over discrete model spaces. Our learning approach is applicable to families of models for which the marginal likelihood can be calculated analytically, either exactly or approximately, given any fixed structure. We argue that for certain model neighborhood structures, the ordinary reversible Metropolis-Hastings algorithm does not yield an appropriate solution to the estimation problem. We therefore develop an alternative, non-reversible algorithm that avoids the scaling effect of the neighborhood. To explore a model space efficiently, a finite number of interacting parallel stochastic processes is utilized. Our interaction scheme enables simultaneous exploration of several local neighborhoods of a model space, while preventing the absorption of any particular process into a relatively inferior state. We illustrate the advantages of our method by an application to a classification model. In particular, we use an extensive bacterial database and compare our results with those obtained by other methods for the same data.
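The abstract's ingredients can be illustrated with a toy sketch (not the authors' algorithm): several chains explore a discrete model space, a proposal is occasionally borrowed from another chain (a stand-in for the interaction scheme), acceptance uses only the posterior ratio with no Hastings correction for unequal neighborhood sizes, and, since the unnormalized posterior is assumed analytically available for any fixed structure, posterior probabilities are estimated by renormalizing over the set of visited models. All function names and parameters here are illustrative assumptions.

```python
import math
import random

def estimate_posteriors(log_post, neighbors, n_chains=4, n_iters=2000, seed=0):
    """Toy parallel search over a discrete model space.

    log_post(s)  -- analytically available log unnormalized posterior of model s
    neighbors(s) -- list of models adjacent to s (neighborhood structure)
    Returns estimated posterior probabilities renormalized over visited models.
    """
    rng = random.Random(seed)
    states = [0] * n_chains          # all chains start at model 0 (arbitrary)
    visited = {0}
    for _ in range(n_iters):
        for c in range(n_chains):
            # interaction step: occasionally adopt another chain's proposal
            if rng.random() < 0.1:
                proposal = states[rng.randrange(n_chains)]
            else:
                proposal = rng.choice(neighbors(states[c]))
            # acceptance by posterior ratio only; no correction for
            # differing neighborhood sizes (hence not reversible in general)
            diff = log_post(proposal) - log_post(states[c])
            if diff >= 0 or rng.random() < math.exp(diff):
                states[c] = proposal
            visited.add(states[c])
    # renormalize the analytic unnormalized posteriors over visited models
    m = max(log_post(s) for s in visited)
    w = {s: math.exp(log_post(s) - m) for s in visited}
    z = sum(w.values())
    return {s: w[s] / z for s in w}

# usage on a ring of 10 candidate models peaked at model 3
probs = estimate_posteriors(lambda s: -abs(s - 3),
                            lambda s: [(s - 1) % 10, (s + 1) % 10])
```

Because the estimator renormalizes analytic posterior values rather than counting visit frequencies, it remains valid even though the search chain itself is not reversible.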
Corander, J., Gyllenberg, M. & Koski, T. Bayesian model learning based on a parallel MCMC strategy. Stat Comput 16, 355–362 (2006). https://doi.org/10.1007/s11222-006-9391-y