Abstract
We introduce a new formal model in which a learning algorithm must combine a collection of potentially poor but statistically independent hypothesis functions in order to approximate an unknown target function arbitrarily well. Our motivation includes the question of how to make optimal use of multiple independent runs of a mediocre learning algorithm, as well as settings in which the many hypotheses are obtained by a distributed population of identical learning agents.
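The core idea of combining many mediocre but statistically independent hypotheses can be illustrated with a simple majority-vote sketch. The following is a hypothetical toy simulation, not the paper's algorithm: each hypothesis is modeled as independently disagreeing with the target on any given input with some fixed error rate, and an unweighted majority vote over the population is compared against a single hypothesis.

```python
import random

def majority_vote(hypotheses, x):
    """Combine binary hypotheses on input x by unweighted majority vote."""
    votes = sum(h(x) for h in hypotheses)
    return 1 if 2 * votes >= len(hypotheses) else 0

def make_noisy_hypothesis(target, error_rate, rng):
    """Toy model of a mediocre hypothesis: it independently flips the
    target's label with probability error_rate on each input (an
    idealized independence assumption, made here for illustration)."""
    cache = {}
    def h(x):
        if x not in cache:
            cache[x] = target(x) if rng.random() >= error_rate else 1 - target(x)
        return cache[x]
    return h

rng = random.Random(0)
target = lambda x: x % 2  # toy binary target function

# A population of 101 independent hypotheses, each with 30% error.
population = [make_noisy_hypothesis(target, 0.3, rng) for _ in range(101)]

test_points = range(1000)
single_error = sum(population[0](x) != target(x) for x in test_points) / 1000
combined_error = sum(majority_vote(population, x) != target(x)
                     for x in test_points) / 1000

print(f"single hypothesis error:  {single_error:.3f}")
print(f"majority-vote error:      {combined_error:.3f}")
```

Under this idealized independence assumption, a Chernoff-style argument predicts that the majority vote's error decays exponentially in the population size, which is why the combined error is dramatically smaller than any individual hypothesis's error. The paper's formal model studies when and how such combinations can approximate the target arbitrarily well.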
Kearns, M., Seung, H.S. Learning from a population of hypotheses. Mach Learn 18, 255–276 (1995). https://doi.org/10.1007/BF00993412