Abstract
In this paper we present an experimental evaluation of a boosting-based learning system and show that it can be run efficiently over a large dataset. The system uses decision stumps as its base learners: single-attribute decision trees with only two terminal nodes. To select the best decision stump at each iteration we use an adaptive sampling method. As the boosting algorithm, we use a modification of AdaBoost that is suitable for combination with a base learner that does not use the whole dataset. We provide experimental evidence that our method is as accurate as the equivalent algorithm that uses the whole dataset, but much faster.
Thanks to the European Commission for their generous support via a EU S&T fellowship programme.
Supported in part by the Ministry of Education, Science, Sports and Culture of Japan, Grant-in-Aid for Scientific Research on Priority Areas (Discovery Science).
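As a rough illustration of the system described in the abstract, the Python sketch below combines its three ingredients: decision stumps as base learners, selection of the best stump from a progressively grown weighted sample, and an AdaBoost-style loop whose weights are capped at their initial values (the MadaBoost modification). The function names, batch schedule, and Hoeffding-style stopping rule are illustrative simplifications, not the authors' algorithm or code.

```python
import math
import random

def stump_predict(x, attr, thr):
    # A decision stump: one attribute test, two terminal nodes (+1 / -1).
    return 1 if x[attr] >= thr else -1

def select_stump_adaptive(examples, weights, candidates,
                          batch=200, max_rounds=20, delta=0.05, seed=0):
    # Score candidate stumps on progressively larger weighted samples and
    # stop once the current leader's advantage over random guessing clears
    # a Hoeffding-style confidence radius. This is only a crude stand-in
    # for the adaptive sampling analysis in the paper.
    rng = random.Random(seed)
    sample = []
    best, best_acc = candidates[0], 0.5
    for _ in range(max_rounds):
        sample += rng.choices(examples, weights=weights, k=batch)
        n = len(sample)
        scored = [(sum(stump_predict(x, a, t) == y for x, y in sample) / n, (a, t))
                  for a, t in candidates]
        best_acc, best = max(scored)
        radius = math.sqrt(math.log(2 * len(candidates) * max_rounds / delta) / (2 * n))
        if best_acc - 0.5 > radius:   # leader's edge is statistically safe
            break
    return best, best_acc

def boost(examples, candidates, rounds=10):
    # AdaBoost-style loop with each weight capped at its initial value,
    # echoing the MadaBoost modification that keeps the boosting
    # distribution easy to sample from by filtering.
    w = [1.0] * len(examples)
    ensemble = []
    for _ in range(rounds):
        (attr, thr), acc = select_stump_adaptive(examples, w, candidates)
        err = 1.0 - acc               # weighted error estimated on the sample
        if err <= 0.0 or err >= 0.5:
            break                     # stump is perfect, or no better than chance
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, attr, thr))
        for i, (x, y) in enumerate(examples):
            w[i] *= math.exp(-alpha * y * stump_predict(x, attr, thr))
            w[i] = min(w[i], 1.0)     # the cap: weights never exceed their start value
    return ensemble

def predict(ensemble, x):
    # Weighted majority vote of the selected stumps.
    s = sum(a * stump_predict(x, attr, thr) for a, attr, thr in ensemble)
    return 1 if s >= 0 else -1
```

The weight cap is, roughly, the design point that makes a sampling-based base learner viable: since every weight stays in [0, 1], the boosted distribution can be sampled by simple rejection (keep an example with probability equal to its weight), so no boosting round needs to touch the whole dataset.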
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Domingo, C., Watanabe, O. (2000). Scaling Up a Boosting-Based Learner via Adaptive Sampling. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67382-8
Online ISBN: 978-3-540-45571-4