Abstract
Probabilistic classifiers are developed by assuming generative models that are product distributions over the original attribute space (as in naive Bayes) or over more involved spaces (as in general Bayesian networks). While this paradigm has proven experimentally successful on real-world applications despite its vastly simplified probabilistic assumptions, the question of why these approaches work is still open.
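To make the product-distribution assumption concrete, the following is a minimal naive Bayes sketch over binary attributes; the helper names and toy data are hypothetical and not taken from the paper. The class-conditional model keeps only one Bernoulli marginal per attribute and scores a class by the product of those marginals and the class prior.

```python
import numpy as np

# A minimal sketch of the product-distribution (naive Bayes) assumption
# described above; all data and parameter choices here are illustrative.

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors and per-attribute Bernoulli marginals with Laplace smoothing."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # marginals[c, j] = P(x_j = 1 | class c): one marginal per attribute, no joint table.
    marginals = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                          for c in classes])
    return classes, priors, marginals

def predict(x, classes, priors, marginals):
    """Score each class by the product of its attribute marginals times the prior."""
    likelihood = np.prod(np.where(x == 1, marginals, 1 - marginals), axis=1)
    return classes[np.argmax(priors * likelihood)]

# Hypothetical toy data: 6 samples, 3 binary attributes.
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])
model = fit_naive_bayes(X, y)
print(predict(np.array([1, 0, 1]), *model))
```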
This paper resolves this question. We show that almost all joint distributions with a given set of marginals (i.e., all distributions that could have given rise to the learned classifier) or, equivalently, almost all data sets that yield this set of marginals, are very close (in terms of distributional distance) to the product distribution on the marginals; the number of these distributions decreases exponentially with their distance from the product distribution. Consequently, as we show, for almost all joint distributions with this set of marginals, the penalty incurred in using the product of the marginals rather than the true joint distribution is small. In addition to resolving the puzzle surrounding the success of probabilistic classifiers, our results contribute to understanding the tradeoffs in developing probabilistic classifiers and will help in developing better classifiers.
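The counting claim, that for fixed marginals the number of joint distributions (equivalently, of data sets) at a given distance from the product distribution falls exponentially in that distance, can be checked empirically in a toy setting. The sketch below is an illustration under assumed parameters (two binary variables, a hypothetical data-set size and marginal counts), not the paper's construction or proof: it enumerates every 2x2 contingency table consistent with the fixed margins, counts the ordered data sets realizing each table via the multinomial coefficient, and prints the log fraction of data sets against the table's KL divergence from the product of the marginals.

```python
from math import comb, log
import numpy as np

# Toy counting sketch (not the paper's proof): fix the marginal counts of two
# binary attributes over m samples, enumerate every consistent joint table,
# and relate the number of data sets realizing each table to its KL divergence
# from the product of the marginals. All parameter values are assumptions.

m = 200                 # data set size (illustrative)
row_ones = 120          # number of samples with X = 1 (fixed marginal)
col_ones = 80           # number of samples with Y = 1 (fixed marginal)

def kl_to_product(table, m):
    """KL divergence of the empirical joint from the product of its marginals."""
    p = table / m
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    q = px * py
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())

results = []
# With both margins fixed, a 2x2 table has one free cell; sweep its feasible values.
for n11 in range(max(0, row_ones + col_ones - m), min(row_ones, col_ones) + 1):
    n10 = row_ones - n11
    n01 = col_ones - n11
    n00 = m - n11 - n10 - n01
    table = np.array([[n00, n01], [n10, n11]])
    # Log of the number of ordered data sets with exactly these counts (multinomial coefficient).
    log_count = log(comb(m, n11)) + log(comb(m - n11, n10)) + log(comb(m - n11 - n10, n01))
    results.append((kl_to_product(table, m), log_count))

results.sort()
total = np.logaddexp.reduce([c for _, c in results])
for d, c in results[::8]:
    print(f"KL from product = {d:.3f}   log fraction of data sets = {c - total:8.1f}")
```

Under this toy setup, the fraction of data sets decays roughly like exp(-m times the KL divergence), which is the flavor of concentration the abstract describes.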