Abstract
This paper develops a new computational model for learning stochastic rules, called the PAD (Probably Almost Discriminative)-learning model, based on statistical hypothesis testing theory. The model deals with the problem of designing a discrimination algorithm that tests whether or not a given test sequence of examples, each a pair (instance, label), has come from a given stochastic rule P*. Here the composite hypothesis \(\tilde P\) is unknown except that it belongs to a given class \(\mathcal{C}\).
In this model, we propose a new discrimination algorithm based on the MDL (Minimum Description Length) principle, and then derive upper bounds on the least test sample size required by the algorithm to guarantee that the two types of error probabilities are less than δ1 and δ2, respectively, provided that the distance between the two rules to be discriminated is not less than ε.
For the parametric case where \(\mathcal{C}\) is a parametric class, this paper shows that an upper bound on test sample size is given by \(O(\tfrac{1}{\varepsilon }\ln\tfrac{1}{{\delta _1 }} + \tfrac{1}{{\varepsilon ^2 }}\ln\tfrac{1}{{\delta _2 }} + \tfrac{{\tilde k}}{\varepsilon } + \tfrac{{\ell (\tilde M)}}{\varepsilon })\). Here \(\tilde k\) is the number of real-valued parameters of the composite hypothesis \(\tilde P\), and \(\ell (\tilde M)\) is the description length of the countable model for \(\tilde P\). Further, this paper shows that the MDL-based discrimination algorithm performs well in the sense of sample-complexity efficiency, compared with other kinds of information-criteria-based discrimination algorithms. This paper also shows how to transform any stochastic PAC (Probably Approximately Correct)-learning algorithm into a PAD-learning algorithm.
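For illustration only (this is not part of the paper), the order of this sample-size bound can be evaluated numerically. The leading constant `c` and the function name below are hypothetical, since the abstract gives only the O(·) form:

```python
import math

def pad_test_sample_bound(eps, delta1, delta2, k, model_len, c=1.0):
    """Order-of-magnitude estimate of the PAD test sample size bound
    O((1/eps) ln(1/delta1) + (1/eps^2) ln(1/delta2) + k/eps + l(M)/eps).
    The constant c is a hypothetical placeholder for the hidden O-constant."""
    return c * ((1 / eps) * math.log(1 / delta1)
                + (1 / eps ** 2) * math.log(1 / delta2)
                + k / eps
                + model_len / eps)

# Example: eps = 0.1, delta1 = delta2 = 0.05, 3 parameters, model length 10
n = pad_test_sample_bound(0.1, 0.05, 0.05, 3, 10)
```

Note how the 1/ε² term attached to δ2 dominates as ε shrinks, so the second type of error is the more expensive one to control.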
For the non-parametric case where \(\mathcal{C}\) is a non-parametric class but the discrimination algorithm uses a parametric class, this paper demonstrates that the sample complexity bound for the MDL-based discrimination algorithm is essentially related to Barron and Cover's index of resolvability. The sample complexity bound offers a new view of the relationship between the index of resolvability and the MDL principle from the PAD-learning viewpoint.
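To make the MDL-based discrimination idea concrete, here is a minimal hedged sketch for a one-parameter Bernoulli class. The function names and the fixed acceptance threshold are hypothetical, and the paper's algorithm applies to much more general classes; the two-part code length used here is just the standard negative log-likelihood plus the (k/2) ln n parametric penalty:

```python
import math

def code_length(labels, p):
    """Ideal code length (in nats) of a binary label sequence under Bernoulli(p)."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
    ones = sum(labels)
    return -(ones * math.log(p) + (len(labels) - ones) * math.log(1 - p))

def mdl_discriminate(labels, p_star, threshold):
    """Accept 'labels came from p_star' iff its code length exceeds the best
    two-part code length within the class by less than the threshold."""
    n = len(labels)
    p_hat = sum(labels) / n                               # ML estimate in the class
    mdl = code_length(labels, p_hat) + 0.5 * math.log(n)  # + (k/2) ln n with k = 1
    return code_length(labels, p_star) - mdl < threshold
```

On a balanced sequence, the test accepts the hypothesis p* = 0.5 and rejects a distant p* such as 0.01, since the latter inflates the code length far beyond the best two-part description.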
References
Barron, A.R., & Cover T.M. (1991). Minimum complexity density estimation. IEEE Trans. on Information Theory, IT-37, 1034–1054.
Blahut, R.E. (1988). Principles and Practice of Information Theory. Addison-Wesley.
Cover, T.M., & Thomas, J.A. (1991). Elements of Information Theory. Wiley-Interscience.
DeSantis, A., Markowsky, G., & Wegman, M.N. (1988). Learning probabilistic prediction functions. Proceedings of the First Annual Workshop on Computational Learning Theory (pp. 312–328), Morgan Kaufmann.
Gutman, M. (1989). Asymptotically optimal classification for multiple tests with empirically observed statistics. IEEE Trans. on Information Theory, IT-35, 2, 401–408.
Hand, D.J. (1981). Discrimination and Classification. New York: Wiley.
Haussler, D., & Barron, A. (1992). How well does the Bayes method work in on-line predictions of {+1, −1}-values. Proceedings of the Third NEC Symposium (pp. 74–100): SIAM.
Haussler, D., & Long, P. (1990). A generalization of Sauer's lemma. Technical Report UCSC-CRL-90-15, University of California at Santa Cruz.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13–30.
Hoeffding, W. (1965). Asymptotically optimal test for multinomial distributions. Annals of Mathematical Statistics, 36, 369–400.
Kearns, M., & Schapire, R. (1994). Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48, 3, 464–497.
Kraft, C. (1949). A device for quantizing, grouping, and coding amplitude modulated pulses. M.S. Thesis, Department of Electrical Engineering, MIT, Cambridge, MA.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416–431.
Rissanen, J. (1987). Stochastic complexity. J.R. Statist. Soc. B, 49, 3, 223–239.
Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry. World Scientific, Series in Computer Science, 15.
Rissanen, J., & Yu, B. (1991). MDL learning. Progress in Automation and Information Systems, Springer Verlag.
Rivest, R.L. (1987). Learning decision lists. Machine Learning, 2, 229–246.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.
Valiant, L.G. (1984). A theory of the learnable. Communications. of the ACM, 27, 1134–1142.
Wallace, C.S., & Boulton, D.M. (1968). An information measure for classification. Computer Journal, 11, 185–194.
Yamanishi, K. (1991). A loss bound model for on-line stochastic prediction strategies. Proceedings of the Fourth Annual Workshop on Computational Learning Theory (pp. 290–302), Morgan Kaufmann.
Yamanishi, K. (1992a). A learning criterion for stochastic rules. Machine Learning: Special Issues for COLT-90, 9, 165–203.
Yamanishi, K. (1992b). Probably almost discriminative learning. Proceedings of the Fifth ACM Workshop on Computational Learning Theory (pp. 164–171), ACM Press.
Yamanishi, K. (1993). On polynomial-time probably almost discriminative learnability. Proceedings of the Sixth ACM Conference on Computational Learning Theory (pp. 94–100), ACM Press.
Zeitouni, O., & Gutman, M. (1991). On universal hypothesis testing via large deviations. IEEE Trans. on Information Theory, IT-37, 285–290.
Ziv, J. (1988). On classification with empirically observed statistics and universal data compression. IEEE Trans. on Information Theory, IT-34, 278–286.
Ziv, J., & Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Trans. on Information Theory, IT-24, 530–536.
Yamanishi, K. Probably Almost Discriminative Learning. Machine Learning 18, 23–50 (1995). https://doi.org/10.1023/A:1022870506888