Abstract
Classifiers based on emerging patterns are usually more understandable for humans than those based on more complex mathematical models. However, most classifiers based on emerging patterns achieve low accuracy on problems with imbalanced databases. This problem has been tackled through oversampling and undersampling methods; nevertheless, to the best of our knowledge, these methods have not been tested for classifiers based on emerging patterns. Therefore, in this paper we present an empirical study of the use of oversampling and undersampling methods to improve the accuracy of a classifier based on emerging patterns. We apply the most popular oversampling and undersampling methods over 30 databases from the UCI Machine Learning Repository. Our experimental results show that using oversampling and undersampling methods significantly improves the accuracy of the classifier for the minority class.
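The abstract does not say which "most popular" resampling methods were applied, so the following is only an illustrative sketch of three common ones: random undersampling, random oversampling, and a simplified SMOTE-style interpolation (SMOTE itself is due to Chawla et al., 2002). It assumes a two-class problem with purely numeric attributes, stored as plain Python lists, and it does not implement LCMine or any emerging-pattern classifier. All function names (`random_undersample`, `random_oversample`, `smote_like`, `class_accuracy`) are hypothetical helpers introduced here. `class_accuracy` measures the "accuracy for the minority class" as the fraction of that class's samples predicted correctly.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, minority, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumes a binary problem where `minority` really is the smaller class.
    """
    rng = random.Random(seed)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    maj_idx = [i for i, c in enumerate(y) if c != minority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority rows until both classes match."""
    rng = random.Random(seed)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    n_extra = (len(y) - len(min_idx)) - len(min_idx)
    extra = [rng.choice(min_idx) for _ in range(n_extra)]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like(X, y, minority, k=5, seed=0):
    """Simplified SMOTE-style oversampling for numeric attributes.

    Each synthetic row lies on the segment between a random minority row and
    one of its k nearest minority neighbours (Euclidean distance). Unlike the
    original SMOTE, seed rows are drawn at random instead of generating a
    fixed number of synthetic rows per minority row. Needs >= 2 minority rows.
    """
    rng = random.Random(seed)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    n_extra = (len(y) - len(min_idx)) - len(min_idx)
    synth = []
    for _ in range(n_extra):
        i = rng.choice(min_idx)
        others = [j for j in min_idx if j != i]
        nearest = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(nearest)
        gap = rng.random()  # interpolation factor in [0, 1)
        synth.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return X + synth, y + [minority] * len(synth)


def class_accuracy(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Toy imbalanced data: 9 majority rows, 3 minority rows.
    X = [[float(i), float(i % 3)] for i in range(12)]
    y = ["maj"] * 9 + ["min"] * 3
    for name, fn in [("under", random_undersample),
                     ("over", random_oversample),
                     ("smote-like", smote_like)]:
        _, y_res = fn(X, y, "min")
        print(name, Counter(y_res))
```

In an experiment of this kind, resampling would normally be applied only to the training partition of each fold, so that the minority-class accuracy is measured on untouched test data.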
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Loyola-González, O., García-Borroto, M., Medina-Pérez, M.A., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., De Ita, G. (2013). An Empirical Study of Oversampling and Undersampling Methods for LCMine an Emerging Pattern Based Classifier. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Rodríguez, J.S., di Baja, G.S. (eds) Pattern Recognition. MCPR 2013. Lecture Notes in Computer Science, vol 7914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38989-4_27
DOI: https://doi.org/10.1007/978-3-642-38989-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38988-7
Online ISBN: 978-3-642-38989-4
eBook Packages: Computer Science, Computer Science (R0)