Abstract
Classification problems have a long history in the machine learning literature. One of the simplest, yet most consistently well-performing, families of classifiers is the Naïve Bayes model. However, an inherent problem with these classifiers is the assumption that all attributes used to describe an instance are conditionally independent given the class of that instance. When this assumption is violated (as is often the case in practice), classification accuracy can suffer from “information double-counting” and from omitted interactions among attributes.
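Concretely, the Naïve Bayes assumption means the class posterior factorizes as P(C | a1, …, an) ∝ P(C) ∏i P(ai | C). The following is a minimal illustrative sketch of classification under this factorization; it is not the authors' implementation, and the function name and toy probability tables are our own:

```python
import numpy as np

def naive_bayes_posterior(priors, cond_probs, instance):
    """Class posterior under the Naive Bayes assumption:
    P(C | a_1..a_n) is proportional to P(C) * prod_i P(a_i | C)."""
    scores = np.log(priors)
    for attr, value in enumerate(instance):
        # Each attribute contributes independently given the class, so
        # correlated attributes get their evidence "double-counted".
        scores = scores + np.log(cond_probs[attr][:, value])
    scores -= scores.max()          # shift for numerical stability
    post = np.exp(scores)
    return post / post.sum()

# Two classes, two binary attributes; all probabilities are made up.
priors = np.array([0.6, 0.4])
cond_probs = [np.array([[0.9, 0.1],   # P(A1 | C): rows index the class
                        [0.3, 0.7]]),
              np.array([[0.8, 0.2],   # P(A2 | C)
                        [0.4, 0.6]])]
print(naive_bayes_posterior(priors, cond_probs, instance=[0, 1]))
```

Working in log space avoids underflow when many attributes are multiplied together; the shift by the maximum score leaves the normalized posterior unchanged.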
In this paper we focus on a relatively new class of models, termed Hierarchical Naïve Bayes models. These models extend the modeling flexibility of Naïve Bayes models by introducing latent variables that relax some of the independence assumptions. We propose a simple algorithm for learning Hierarchical Naïve Bayes models in the context of classification. Experimental results show that the learned models can significantly improve classification accuracy compared to other frameworks.
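To illustrate the role of the latent variables (a hedged sketch under our own assumptions, not the paper's algorithm): a latent variable H can be placed between the class C and a set of correlated attributes, so that the attributes are independent given H rather than given C, and H is summed out at classification time. The structure C → H → {A1, A2}, the function, and the probability tables below are purely illustrative:

```python
import numpy as np

def hnb_posterior(prior_c, p_h_given_c, p_a1_given_h, p_a2_given_h, a1, a2):
    """Class posterior for a toy hierarchical model C -> H -> {A1, A2}:
    the latent variable H absorbs the dependence between A1 and A2, so
    P(a1, a2 | c) = sum_h P(h | c) * P(a1 | h) * P(a2 | h)."""
    # Likelihood of both attributes given each class, with H summed out.
    lik = np.einsum('ch,h,h->c',
                    p_h_given_c,
                    p_a1_given_h[:, a1],
                    p_a2_given_h[:, a2])
    post = prior_c * lik
    return post / post.sum()

# Illustrative tables: 2 classes, 2 latent states, 2 binary attributes.
prior_c = np.array([0.5, 0.5])
p_h_given_c = np.array([[0.9, 0.1],    # P(H | C): rows index the class
                        [0.2, 0.8]])
p_a_given_h = np.array([[0.95, 0.05],  # P(A | H): rows index H
                        [0.10, 0.90]])
print(hnb_posterior(prior_c, p_h_given_c, p_a_given_h, p_a_given_h, a1=0, a2=0))
```

Because A1 and A2 are now mediated by H, their shared evidence enters the posterior once through H instead of being counted twice, which is the intuition behind relaxing the Naïve Bayes independence assumption.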
Additional information
Editor: Peter Flach
Cite this article
Langseth, H., Nielsen, T.D. Classification using Hierarchical Naïve Bayes models. Mach Learn 63, 135–159 (2006). https://doi.org/10.1007/s10994-006-6136-2