Abstract
Data stream mining plays a vital role in Big Data analytics. Traffic management, sensor networks and monitoring, and weblog analysis are examples of dynamic environments that generate streaming data. In such environments, data arrive at high speed, and the algorithms that process them must satisfy constraints on memory, computation time, and a single scan of the incoming data. A significant challenge in data stream mining is that the data distribution changes over time, a phenomenon called concept drift. A learning model therefore needs to detect these changes and adapt to them. Ensemble classifiers adapt to change naturally and handle concept drift well. Three ensemble-based approaches are used to handle concept drift: online, block-based, and hybrid. We survey ensemble classifiers for learning from data streams and compare their performance in terms of accuracy, memory, and time on synthetic and real datasets with different drift scenarios.
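As a rough illustration of the block-based idea mentioned in the abstract, the following Python sketch trains one base learner per block of incoming examples, reweights older members by their accuracy on the newest block, and evaluates with a test-then-train loop so that each example is seen only once. It is a minimal sketch under stated assumptions, not any of the algorithms compared in the chapter: the names `BlockEnsemble`, `train_stump`, and `run`, the decision-stump base learner, the block size, and the noise-free synthetic stream with one abrupt drift are all hypothetical choices made for brevity.

```python
import random


def train_stump(block):
    """Fit a 1-D decision stump (threshold, positive_above) to (x, y) pairs."""
    best = None
    for t in [i / 10 for i in range(1, 10)]:
        for positive_above in (True, False):
            hits = sum(((x > t) == positive_above) == bool(y) for x, y in block)
            acc = hits / len(block)
            if best is None or acc > best[0]:
                best = (acc, t, positive_above)
    return best[1], best[2]


def stump_predict(stump, x):
    """Return 1 if x falls on the stump's positive side, else 0."""
    t, positive_above = stump
    return int((x > t) == positive_above)


def accuracy(stump, block):
    """Fraction of the block that the stump classifies correctly."""
    return sum(stump_predict(stump, x) == y for x, y in block) / len(block)


class BlockEnsemble:
    """Hypothetical block-based ensemble: one new member per full block."""

    def __init__(self, block_size=100, max_members=5):
        self.block_size = block_size
        self.max_members = max_members
        self.members = []  # each entry is [stump, weight]
        self.buffer = []   # examples of the block currently being filled

    def predict(self, x):
        if not self.members:
            return 0  # default class before the first block is complete
        vote = sum(w * (1 if stump_predict(s, x) == 1 else -1)
                   for s, w in self.members)
        return int(vote > 0)

    def learn(self, x, y):
        self.buffer.append((x, y))
        if len(self.buffer) < self.block_size:
            return
        block, self.buffer = self.buffer, []
        # Reweight existing members by accuracy on the newest block.
        for member in self.members:
            member[1] = accuracy(member[0], block)
        # The member trained on the newest block gets full weight (assumption).
        self.members.append([train_stump(block), 1.0])
        # Keep only the highest-weighted members to bound memory.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]


def run(n=2000, drift_at=1000, seed=0):
    """Test-then-train evaluation on a synthetic stream with one abrupt drift."""
    rng = random.Random(seed)
    model = BlockEnsemble()
    correct = []
    for i in range(n):
        x = rng.random()
        # The concept flips at drift_at: the positive class moves below 0.5.
        y = int(x > 0.5) if i < drift_at else int(x <= 0.5)
        correct.append(model.predict(x) == y)  # test first ...
        model.learn(x, y)                      # ... then train
    return correct


if __name__ == "__main__":
    correct = run()
    for start in range(0, len(correct), 100):
        window = correct[start:start + 100]
        print(start, sum(window) / len(window))
```

In this toy setting, accuracy collapses in the block immediately after the drift, because every member still encodes the old concept, and recovers once a stump trained on post-drift data joins the ensemble and the outdated members lose their weight. That lag of up to one block is the usual motivation for the online and hybrid variants the abstract mentions.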
Copyright information
© 2019 The Author(s)
About this chapter
Cite this chapter
Nagendran, N., Sultana, H.P., Sarkar, A. (2019). A Comparative Analysis on Ensemble Classifiers for Concept Drifting Data Streams. In: Soft Computing and Medical Bioinformatics. SpringerBriefs in Applied Sciences and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-13-0059-2_7
DOI: https://doi.org/10.1007/978-981-13-0059-2_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0058-5
Online ISBN: 978-981-13-0059-2
eBook Packages: Intelligent Technologies and Robotics (R0)