Abstract
In this paper we study the problem of constructing histograms from high-speed, time-changing data streams. Learning in this context requires processing each example once, at the rate it arrives, maintaining a histogram consistent with the most recent data, and forgetting outdated data whenever a change in the distribution is detected. To construct histograms from high-speed data streams we use the two-layer structure of the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method for detecting when a change occurs in the distribution generating the examples. The basic idea is to monitor the distributions in two different time windows: the reference time window, which reflects the distribution observed in the past, and the current time window, which reflects the distribution observed in the most recent data. We compare the two distributions and signal a change whenever the distance between them exceeds a threshold value, using three different measures: the Entropy Absolute Difference, the Kullback-Leibler divergence and the Cosine Distance. The experimental results suggest that the Kullback-Leibler divergence achieves a high probability of detecting changes and faster detection, with few false alarms.
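The window comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the additive smoothing constant `eps`, the direction of the Kullback-Leibler divergence (reference against current) and the threshold value are all assumptions made for illustration, and the two windows are represented simply as lists of bin counts over the same bins.

```python
import math

def normalize(counts, eps=1e-9):
    """Turn bin counts into relative frequencies.
    The small eps (an assumption) avoids zero bins in the KL divergence."""
    total = sum(counts)
    k = len(counts)
    return [(c + eps) / (total + eps * k) for c in counts]

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def entropy_abs_diff(p, q):
    """Absolute difference between the entropies of two distributions."""
    return abs(entropy(p) - entropy(q))

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits (direction assumed)."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

def cosine_distance(p, q):
    """One minus the cosine similarity of the two frequency vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return 1.0 - dot / norm

def change_detected(ref_counts, cur_counts, measure, threshold):
    """Signal a change when the chosen measure between the reference
    and current windows exceeds the (hypothetical) threshold."""
    p = normalize(ref_counts)
    q = normalize(cur_counts)
    return measure(p, q) > threshold

# Hypothetical bin counts for the reference and current windows.
reference = [50, 30, 15, 5]
current = [5, 15, 30, 50]
print(change_detected(reference, current, kl_divergence, threshold=0.1))
print(change_detected(reference, reference, kl_divergence, threshold=0.1))
```

In this toy example the current window is a permutation of the reference window, so the two histograms have the same entropy and the Entropy Absolute Difference is essentially zero, while the Kullback-Leibler divergence and the Cosine Distance are clearly positive. This is only an illustration of how the measures can behave differently, not a result taken from the paper.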
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sebastião, R., Gama, J. (2007). Change Detection in Learning Histograms from Data Streams. In: Neves, J., Santos, M.F., Machado, J.M. (eds) Progress in Artificial Intelligence. EPIA 2007. Lecture Notes in Computer Science(), vol 4874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77002-2_10
DOI: https://doi.org/10.1007/978-3-540-77002-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77000-8
Online ISBN: 978-3-540-77002-2
eBook Packages: Computer Science (R0)