Abstract
Bayesian networks (BNs) and Bayesian classifiers (BCs) are traditional probabilistic techniques that have been successfully used by various machine learning methods to help solve a variety of problems in many different domains. BNs (and BCs) can be regarded as a probabilistic graphical language suitable for inducing models from data, aiming at knowledge representation and reasoning about data domains. The main goal of this chapter is the empirical investigation of a few roles played by BCs in machine learning related processes, namely (i) data pre-processing (feature selection and imputation), (ii) learning and (iii) post-processing (rule generation). In doing so, the chapter contributes by organizing, specifying and discussing the many different ways Bayes-based concepts can be successfully employed in automatic learning.
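The Bayesian classifiers the abstract refers to can be illustrated by their simplest member, the naive Bayes classifier, which assumes every feature is conditionally independent of the others given the class variable. The sketch below is purely illustrative and is not the chapter's own implementation; all names (`NaiveBayes`, `fit`, `predict`) are hypothetical, and it uses only the Python standard library.

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayes:
    """Minimal discrete naive Bayes classifier with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors = Counter(y)          # class frequencies
        self.n = len(y)
        # counts[c][j][v] = number of class-c examples whose feature j equals v
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)    # distinct values seen per feature
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, xi):
        best, best_score = None, float("-inf")
        for c in self.classes:
            # log P(c) + sum_j log P(x_j | c), with add-one (Laplace) smoothing
            score = log(self.priors[c] / self.n)
            for j, v in enumerate(xi):
                num = self.counts[c][j][v] + 1
                den = self.priors[c] + len(self.values[j])
                score += log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best

# Toy usage: four labelled examples with two discrete features.
nb = NaiveBayes().fit(
    [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")],
    ["no", "no", "yes", "yes"],
)
print(nb.predict(("rain", "hot")))  # ranks classes by log-posterior
```

The classifier picks the class maximizing the (smoothed) posterior; the more general BCs discussed in the chapter relax the independence assumption by allowing richer network structure over the feature variables.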
Keywords
- Bayesian Network
- Class Variable
- Bayesian Classifier
- Feature Subset Selection
- Probabilistic Graphical Model
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hruschka, E.R., do Carmo Nicoletti, M. (2013). Roles Played by Bayesian Networks in Machine Learning: An Empirical Investigation. In: Ramanna, S., Jain, L., Howlett, R. (eds) Emerging Paradigms in Machine Learning. Smart Innovation, Systems and Technologies, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28699-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28698-8
Online ISBN: 978-3-642-28699-5