Abstract
Data stream classification is an important and challenging task for a wide range of applications. The spread of new technologies related to communication services, such as smartphones and sensor networks, introduces new challenges in the analysis of streaming data. Such analysis calls for approaches that need little time and space to process a single item while keeping an accurate representation of only those data characteristics that are relevant for tracking concept drift. Based on these premises, this paper introduces a set of requirements for data stream classification and proposes a new adaptive ensemble method. The outlined system employs two distinct structures for managing data aggregation and mining features, respectively. The latter are represented by a selective ensemble managed with an adaptive behavior: our approach dynamically updates the threshold value that enables the models directly involved in the classification step. The system is conceived to satisfy the proposed requirements even in the presence of concept drift events. Finally, our method is compared with several existing systems on both synthetic and real data.
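The idea of a selective ensemble with a dynamically updated threshold can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `SelectiveEnsemble`, the sliding-window accuracy used as a model weight, and the rule "threshold = best recent accuracy minus a fixed slack" are hypothetical stand-ins for the authors' actual update rule, which the abstract does not specify.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence

# A model is any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


class SelectiveEnsemble:
    """Illustrative sketch only: models vote if their recent accuracy
    reaches a threshold that moves with the best model's accuracy."""

    def __init__(self, models: Sequence[Model], window: int = 50,
                 slack: float = 0.1) -> None:
        self.models: List[Model] = list(models)
        # One bounded history of hits (1) and misses (0) per model.
        self.hits = [deque(maxlen=window) for _ in self.models]
        self.slack = slack  # hypothetical tolerance below the best model

    def accuracies(self) -> List[float]:
        # Recent accuracy per model; 0.0 while no labeled item was seen.
        return [sum(h) / len(h) if h else 0.0 for h in self.hits]

    def threshold(self) -> float:
        # Assumed adaptive rule: follow the best recent accuracy.
        return max(self.accuracies()) - self.slack

    def active(self) -> List[int]:
        # Indices of the models currently enabled for classification.
        limit = self.threshold()
        return [i for i, a in enumerate(self.accuracies()) if a >= limit]

    def predict(self, x: Sequence[float]) -> int:
        acc = self.accuracies()
        votes: Dict[int, float] = {}
        for i in self.active():
            label = self.models[i](x)
            # Accuracy-weighted vote; the floor keeps untrained ties valid.
            votes[label] = votes.get(label, 0.0) + max(acc[i], 1e-9)
        return max(votes, key=votes.get)

    def update(self, x: Sequence[float], y: int) -> None:
        # When the true label arrives, record each model's hit or miss.
        for model, history in zip(self.models, self.hits):
            history.append(1 if model(x) == y else 0)
```

In this sketch, a model that stops matching the incoming labels after a drift sees its windowed accuracy fall below the moving threshold and is excluded from the vote, while a model that fits the new concept is enabled as soon as its accuracy catches up. The bounded window keeps the per-item cost and memory constant, in the spirit of the time and space requirements stated above.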
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grossi, V., Turini, F. (2013). Data Streams Classification: A Selective Ensemble with Adaptive Behavior. In: Filipe, J., Fred, A. (eds) Agents and Artificial Intelligence. ICAART 2011. Communications in Computer and Information Science, vol 271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29966-7_14
DOI: https://doi.org/10.1007/978-3-642-29966-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29965-0
Online ISBN: 978-3-642-29966-7
eBook Packages: Computer Science, Computer Science (R0)