Abstract
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
Copyright information
© 1998 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Heckerman, D. (1998). A Tutorial on Learning with Bayesian Networks. In: Jordan, M.I. (eds) Learning in Graphical Models. NATO ASI Series, vol 89. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5014-9_11
DOI: https://doi.org/10.1007/978-94-011-5014-9_11
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-6104-9
Online ISBN: 978-94-011-5014-9
eBook Packages: Springer Book Archive