Abstract
Bayesian belief nets (BNs) are often used for classification tasks—typically to return the most likely class label for each specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function—viz., likelihood, rather than classification accuracy—typically by first learning an appropriate graphical structure, then finding the parameters for that structure that maximize the likelihood of the data. As these parameters may not maximize the classification accuracy, “discriminative parameter learners” follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, shows how it extends standard logistic regression, and analyzes its inherent sample and computational complexity. We then present a general algorithm for this task, ELR, that applies to arbitrary BN structures and that works effectively even when given incomplete training data. Unfortunately, ELR is not guaranteed to find the parameters that optimize conditional likelihood; moreover, even the optimal-CL parameters need not have minimal classification error. This paper therefore presents empirical evidence that ELR produces effective classifiers, often superior to the ones produced by the standard “generative” algorithms, especially in common situations where the given BN-structure is incorrect.
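To make the parameter-learning task concrete, the sketch below works through the simplest case the abstract mentions: a naive-Bayes structure over binary features, where maximizing conditional likelihood (CL) coincides with logistic regression. Each CPT row is parameterized through a softmax, so gradient ascent on the conditional log-likelihood keeps the probabilities normalized. This is an illustrative reconstruction, not the authors' ELR implementation (which also handles arbitrary structures and incomplete data); the function name fit_cl_naive_bayes, the learning rate, and the iteration count are assumptions made for the example.

import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fit_cl_naive_bayes(X, y, n_classes, lr=0.5, n_iters=300):
    """Gradient ascent on the conditional log-likelihood sum_n log P(y_n | x_n)
    for a naive-Bayes belief net. X: (N, D) binary features; y: (N,) labels."""
    N, D = X.shape
    beta_c = np.zeros(n_classes)          # logits for the class prior P(C)
    beta_f = np.zeros((n_classes, D, 2))  # logits for each CPT row P(X_d | C)
    onehot = np.eye(2)[X]                 # (N, D, 2) one-hot feature values
    for _ in range(n_iters):
        log_pc = np.log(softmax(beta_c))              # (C,)
        log_pf = np.log(softmax(beta_f, axis=-1))     # (C, D, 2)
        # joint[n, c] = log P(c) + sum_d log P(x_nd | c)
        feat = log_pf[:, np.arange(D)[None, :], X]    # (C, N, D)
        joint = log_pc[None, :] + feat.sum(axis=2).T  # (N, C)
        # gradient signal of CL: r[n, c] = 1[c = y_n] - P(c | x_n)
        r = -softmax(joint, axis=1)
        r[np.arange(N), y] += 1.0
        # analytic gradients through the softmax parameterization
        grad_c = r.sum(axis=0)                        # (C,)
        p_f = softmax(beta_f, axis=-1)
        grad_f = np.einsum('nc,ndv->cdv', r, onehot) - grad_c[:, None, None] * p_f
        beta_c += lr * grad_c / N
        beta_f += lr * grad_f / N
    return softmax(beta_c), softmax(beta_f, axis=-1)

# Toy usage on synthetic data: labels correlated with two of four features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 4))
y = (X[:, 0] | X[:, 2]).astype(int)
priors, cpts = fit_cl_naive_bayes(X, y, n_classes=2)

For this fixed structure the learned classifier is equivalent to a logistic regression model, as the paper discusses; for general structures and incomplete data the objective is harder, which is why ELR, as the abstract notes, is only guaranteed to reach a local optimum of the conditional likelihood.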
References
Abe, N., Takeuchi, J., & Warmuth, M. (1991). Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence. In Conference on Learning Theory.
Binder, J., Koller, D., Russell, S., & Kanazawa, K. (1997). Adaptive probabilistic networks with hidden variables. Machine Learning, 29, 213–244.
Bishop, C. (1998). Neural networks for pattern recognition. Oxford: Oxford University Press.
Blake, C., & Merz, C. (2000). UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Buntine, W. (1996). A guide to the literature on learning probabilistic networks from data. IEEE Transactions on Knowledge and Data Engineering, 8, 195–210.
Cerquides, J., & de Mántaras, R. L. (2003). Tractable Bayesian learning of tree augmented naïve Bayes models. In International Conference on Machine Learning (pp. 75–82).
Cheng, J., & Greiner, R. (1999). Comparing Bayesian network classifiers. In Uncertainty in Artificial Intelligence.
Cheng, J., Greiner, R., Kelly, J., Bell, D., & Liu, W. (2002). Learning Bayesian networks from data: An information-theory based approach. Artificial Intelligence, 137, 43–90.
Chickering, D. M., Geiger, D., & Heckerman, D. (1994). Learning Bayesian networks is NP-hard. Technical Report MSR-TR-94-17, Microsoft Research.
Chou, W., Juang, B., & Lee, C. (1992). Segmental GPD training of HMM based speech recognizer. In International Conference on Acoustics, Speech and Signal Processing, vol. 1 (pp. 473–476).
Chow, C., & Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14, 462–467.
Cooper, G. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42, 393–405.
Cooper, G., & Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9, 309–347.
Cox, D. R., & Snell, E. J. (1989). Analysis of binary data. London: Chapman & Hall.
Darwiche, A. (2000). A differential approach to inference in Bayesian networks. In Uncertainty in Artificial Intelligence.
Dasgupta, S. (1997). The sample complexity of learning fixed-structure Bayesian networks. Machine Learning, 29, 165–180.
Dash D., & Cooper, G. (2002). Exact model averaging with naïve Bayesian classifiers. In International Conference on Machine Learning (pp. 91–98).
Dawid, A. P. (1976). Properties of diagnostic data distributions. Biometrics, 32, 647–658.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. Wiley.
Edwards, D., & Lauritzen, S. (2001). The TM algorithm for maximising a conditional likelihood function. Biometrika, 88, 961–972.
Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In International Joint Conference on Artificial Intelligence.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, 131–163.
Greiner, R., Grove, A., & Schuurmans, D. (1997). Learning Bayesian nets that perform well. In Uncertainty in Artificial Intelligence.
Greiner, R., & Zhou, W. (2002). Structural extension to logistic regression: Discriminant parameter learning of belief net classifiers. In National Conference on Artificial Intelligence (AAAI).
Grossman, D., & Domingos, P. (2004). Learning Bayesian network classifiers by maximizing conditional likelihood. In International Conference on Machine Learning.
Hagan, M., Demuth, H., & Beale, M. (1996). Neural network design. Boston, MA: PWS Publishing.
Heckerman, D. E. (1998). A tutorial on learning with Bayesian networks. In M. I. Jordan (Ed.), Learning in graphical models.
Jaakkola, T., Meila, M., & Jebara, T. (2000). Maximum entropy discrimination. In Neural Information Processing Systems.
Jordan, M. (1995). Why the logistic function? A tutorial discussion on probabilities and neural networks.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence.
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273–324.
Kontkanen, P., Myllymäki, P., Silander, T., & Tirri, H. (1999). On supervised selection of Bayesian networks. In Uncertainty in Artificial Intelligence (pp. 334–342).
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning.
Lauritzen, S. (1995). The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19, 191–201.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
McCullagh, P., & Nelder, J. (1989). Generalized linear models. Chapman and Hall.
Minka, T. (2001). Algorithms for maximum-likelihood logistic regression. Technical report, CMU CALD, http://www.stat.cmu.edu/~minka/papers/logreg/minka-logreg.pdf.
Mitchell, T. M. (1997). Machine learning. McGraw-Hill.
Nadeau, C., & Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52, 239–281.
Nesterov, Y., & Nemirovskii, A. (1994). Interior-point polynomial algorithms in convex programming. Society for Industrial and Applied Mathematics.
Ng, A., & Jordan, M. (2001). On discriminative versus generative classifiers: A comparison of logistic regression and naive Bayes. In Neural Information Processing Systems.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo: Morgan Kaufmann.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (2002). Numerical recipes in C. Cambridge University Press, http://www.nr.com/.
Ripley, B. (1996). Pattern recognition and neural networks, Cambridge, UK: Cambridge University Press.
Roos, T., Wettig, H., Grünwald, P., Myllymäki, P., & Tirri, H. (2005). On discriminative Bayesian network classifiers and logistic regression. Machine Learning, 59, 269–298.
Schlüter, R., Macherey, W., Kanthak, S., Ney, H., & Welling, L. (1997). Comparison of optimization methods for discriminative training criteria. In Proc. of European Conference on Speech Communication and Technology.
Shen, B., Su, X., Greiner, R., Musilek, P., & Cheng, C. (2003). Discriminative parameter learning of general Bayesian network classifiers. In International Conference on Tools with Artificial Intelligence.
Sundberg, R. (2002). The convergence rate of the TM algorithm of Edwards and Lauritzen. Biometrika, 89, 478–483.
Van Allen, T., & Greiner, R. (2000). Model selection criteria for learning belief nets: An empirical comparison. In International Conference on Machine Learning (pp. 1047–1054).
Zhou, W. (2002). Discriminative learning of Bayesian net parameters. Master’s thesis, Dept of Computing Science, University of Alberta.
Additional information
Editors:
Pedro Larrañaga, Jose A. Lozano, Jose M. Peña and Iñaki Inza
This paper extends the earlier results that appear in Greiner and Zhou (2002) and Shen et al. (2003).
Electronic supplementary material
About this article
Cite this article
Greiner, R., Su, X., Shen, B. et al. Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers. Mach Learn 59, 297–322 (2005). https://doi.org/10.1007/s10994-005-0469-0