Abstract
This paper reviews the suitability of standard machine learning algorithms, which were developed mainly in the context of small data sets, for application to large data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm than optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective with an algorithm that places greater emphasis on bias management rather than variance management.
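For context, the decomposition referred to above can be illustrated with one widely used formulation of the bias plus variance decomposition for zero-one loss, in the style of Kohavi and Wolpert; the notation below is a sketch of that standard formulation and is not drawn from the paper itself. Writing P(y|x) for the true class distribution at a point x and \hat{P}(y|x) for the distribution of the learner's prediction at x over training sets, the expected classification error decomposes as

\[
  \mathrm{E}[\mathrm{error}] \;=\; \sum_{x} P(x)\,\bigl(\sigma^{2}_{x} + \mathrm{bias}^{2}_{x} + \mathrm{variance}_{x}\bigr),
\]
\[
  \sigma^{2}_{x} = \tfrac{1}{2}\Bigl(1 - \sum_{y} P(y \mid x)^{2}\Bigr), \quad
  \mathrm{bias}^{2}_{x} = \tfrac{1}{2}\sum_{y}\bigl(P(y \mid x) - \hat{P}(y \mid x)\bigr)^{2}, \quad
  \mathrm{variance}_{x} = \tfrac{1}{2}\Bigl(1 - \sum_{y}\hat{P}(y \mid x)^{2}\Bigr).
\]

Under this view, the variance term measures how much the learner's predictions fluctuate across different training samples, and such fluctuation tends to shrink as training sets grow, which is consistent with the abstract's suggestion that bias management becomes relatively more important when learning from large data sets.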
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brain, D., Webb, G.I. (2002). The Need for Low Bias Algorithms in Classification Learning from Large Data Sets. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_6
DOI: https://doi.org/10.1007/3-540-45681-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0