Abstract
Advances in hardware and software over the past decade have made it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time, as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drift. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm: an algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision-tree-based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
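The behaviour described in the abstract (a rule set that abstains on uncovered instances and drops rules that stop performing) can be illustrated with a minimal sketch. This is not the eRules algorithm itself: the class names, the `min_accuracy` and `min_trials` thresholds, and the buffering of uncovered instances are all assumptions made for illustration, and the induction of new rules is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Instance = Dict[str, str]  # attribute name -> categorical value


@dataclass
class Rule:
    """Hypothetical IF-THEN rule: all conditions must match to fire."""
    conditions: Dict[str, str]
    label: str
    hits: int = 0
    misses: int = 0

    def covers(self, x: Instance) -> bool:
        return all(x.get(a) == v for a, v in self.conditions.items())

    def accuracy(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 1.0


class AbstainingRuleSet:
    """Illustrative rule set that abstains instead of forcing a label."""

    def __init__(self, min_accuracy: float = 0.5, min_trials: int = 2):
        # Both thresholds are assumed parameters, not taken from the paper.
        self.min_accuracy = min_accuracy
        self.min_trials = min_trials
        self.rules: List[Rule] = []
        self.unclassified: List[Tuple[Instance, str]] = []

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def predict(self, x: Instance) -> Optional[str]:
        # Return the label of the first covering rule, or None (abstain).
        for rule in self.rules:
            if rule.covers(x):
                return rule.label
        return None

    def update(self, x: Instance, y: str) -> None:
        # Score the first covering rule on a labelled instance and
        # remove it if it has been tried enough and performs poorly.
        for rule in self.rules:
            if rule.covers(x):
                if rule.label == y:
                    rule.hits += 1
                else:
                    rule.misses += 1
                tried = rule.hits + rule.misses
                if tried >= self.min_trials and rule.accuracy() < self.min_accuracy:
                    self.rules.remove(rule)
                return
        # No rule covers x: keep it as material for inducing new rules.
        self.unclassified.append((x, y))
```

In this sketch, `predict` returning `None` stands in for leaving an instance unclassified, and the `unclassified` buffer is one plausible place from which new rules could later be induced.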
Copyright information
© 2012 Springer-Verlag London
About this paper
Cite this paper
Stahl, F., Gaber, M.M., Salvador, M.M. (2012). eRules: A Modular Adaptive Classification Rule Learning Algorithm for Data Streams. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_5
DOI: https://doi.org/10.1007/978-1-4471-4739-8_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science, Computer Science (R0)