Abstract
Hoeffding trees are state-of-the-art in classification for data streams. They perform prediction by choosing the majority class at each leaf. Their predictive accuracy can be increased by adding Naive Bayes models at the leaves of the trees. By stress-testing these two prediction methods using noise and more complex concepts and an order of magnitude more instances than in previous studies, we discover situations where the Naive Bayes method outperforms the standard Hoeffding tree initially but is eventually overtaken. The reason for this crossover is determined and a hybrid adaptive method is proposed that generally outperforms the two original prediction methods for both simple and complex concepts as well as under noise.
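The two leaf prediction methods, and one way they could be combined, can be sketched as follows. The abstract does not describe how the hybrid adaptive method works, so the rule used here is an assumption made for illustration: each leaf counts how often the majority class and Naive Bayes would have been correct on the training instances it has seen, and it uses Naive Bayes only while Naive Bayes is ahead. The class `Leaf`, its method names, the restriction to nominal attributes, and the Laplace smoothing are all hypothetical choices, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


class Leaf:
    """Hypothetical leaf that stores sufficient statistics for nominal attributes."""

    def __init__(self):
        self.class_counts = Counter()          # label -> number of instances seen
        self.attr_counts = Counter()           # (attr index, value, label) -> count
        self.values_seen = defaultdict(set)    # attr index -> distinct values seen
        self.mc_correct = 0                    # majority-class hits on training data
        self.nb_correct = 0                    # Naive Bayes hits on training data

    def predict_majority(self):
        """Return the most frequent class at this leaf, or None if the leaf is empty."""
        if not self.class_counts:
            return None
        return max(self.class_counts, key=self.class_counts.get)

    def predict_naive_bayes(self, x):
        """Return the class with the highest Laplace-smoothed Naive Bayes score."""
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(x):
                k = len(self.values_seen[i] | {value})  # smoothing denominator term
                score += math.log((self.attr_counts[(i, value, label)] + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict_adaptive(self, x):
        """Assumed hybrid rule: use Naive Bayes only while it has been more accurate."""
        if self.nb_correct > self.mc_correct:
            return self.predict_naive_bayes(x)
        return self.predict_majority()

    def learn(self, x, y):
        """Score both predictors on (x, y) first, then update the statistics."""
        if self.predict_majority() == y:
            self.mc_correct += 1
        if self.predict_naive_bayes(x) == y:
            self.nb_correct += 1
        self.class_counts[y] += 1
        for i, value in enumerate(x):
            self.attr_counts[(i, value, y)] += 1
            self.values_seen[i].add(value)


if __name__ == "__main__":
    leaf = Leaf()
    # Toy stream: the single attribute determines the class, class 0 is more frequent.
    for x, y in [(("a",), 0), (("a",), 0), (("b",), 1)] * 10:
        leaf.learn(x, y)
    print(leaf.predict_majority(), leaf.predict_naive_bayes(("b",)), leaf.predict_adaptive(("b",)))
```

Scoring both predictors before updating the counts keeps the comparison honest, because each instance is used for evaluation before the leaf has learned from it. A full Hoeffding tree would also need split decisions and numeric attribute handling, which this sketch leaves out.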
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Holmes, G., Kirkby, R., Pfahringer, B. (2005). Stress-Testing Hoeffding Trees. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_50
DOI: https://doi.org/10.1007/11564126_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science, Computer Science (R0)