Abstract
Most approaches to classifying evolving data streams divide the stream into fixed-size chunks to address the infinite-length and concept-drift problems. These approaches suffer from a trade-off between performance and sensitivity. To address this problem, existing adaptive sliding-window techniques determine chunk boundaries dynamically by detecting changes in the classifier error rate, which requires true labels for all data instances. In practice, however, true labels are scarce and often delayed. In this paper, we propose an approach that determines dynamic chunk boundaries by detecting significant changes in classifier confidence scores, using only a limited number of labeled data instances. Moreover, we integrate a suitable classification technique with it to form a complete semi-supervised framework that uses dynamic chunk boundaries to address concept drift and concept evolution efficiently. Experiments on benchmark data sets show the effectiveness of the proposed framework in handling both concept drift and concept evolution.
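The core idea, cutting a chunk where classifier confidence shifts significantly rather than where the error rate changes, can be illustrated with a minimal sketch. This is not the authors' detector. It assumes confidence scores in [0, 1], oldest first, and swaps in an ADWIN-style Hoeffding bound (from Bifet and Gavaldà's adaptive windowing) to test whether the mean confidence before and after a candidate split differs significantly. The function name `find_chunk_boundary` and the parameters `delta` and `min_size` (with their defaults) are hypothetical.

```python
import math
from typing import Optional, Sequence


def find_chunk_boundary(
    scores: Sequence[float],
    delta: float = 0.01,
    min_size: int = 30,
) -> Optional[int]:
    """Return the split index with the largest significant change in mean
    confidence, or None if no split passes the test.

    scores:   classifier confidence values in [0, 1], oldest first (assumed).
    delta:    confidence parameter of the Hoeffding-style bound (hypothetical default).
    min_size: minimum number of scores on each side of a split (hypothetical default).
    """
    n = len(scores)
    if n < 2 * min_size:
        return None

    # Prefix sums give the mean of any prefix or suffix in O(1).
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    total = prefix[-1]

    # ADWIN-style threshold: eps = sqrt(ln(4n / delta) / (2m)).
    log_term = math.log(4 * n / delta)
    best_k: Optional[int] = None
    best_diff = 0.0
    for k in range(min_size, n - min_size + 1):
        n_old, n_new = k, n - k
        mean_old = prefix[k] / n_old
        mean_new = (total - prefix[k]) / n_new
        diff = abs(mean_old - mean_new)
        # m = 1 / (1/n_old + 1/n_new), as in ADWIN's cut condition.
        m = 1.0 / (1.0 / n_old + 1.0 / n_new)
        eps = math.sqrt(log_term / (2.0 * m))
        # Keep the significant split with the largest mean difference.
        if diff > eps and diff > best_diff:
            best_k, best_diff = k, diff
    return best_k
```

For example, 100 scores near 0.9 followed by 100 scores near 0.5 would be cut at index 100, closing the old chunk there, while a stream of constant confidence yields `None`. Note that the test itself needs no true labels, which is the property the abstract relies on; this sketch rescans the whole buffer on each call (O(n)), whereas ADWIN proper uses exponential histograms to keep the check cheap.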
© 2015 Springer International Publishing Switzerland
About this paper
Haque, A., Khan, L., Baron, M. (2015). Semi Supervised Adaptive Framework for Classifying Evolving Data Stream. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol. 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_30
DOI: https://doi.org/10.1007/978-3-319-18032-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)