An Empirical Study of Oversampling and Undersampling Methods for LCMine an Emerging Pattern Based Classifier

Loyola-González, Octavio; García-Borroto, Milton; Medina-Pérez, Miguel Angel; Martínez-Trinidad, José Fco.; Carrasco-Ochoa, Jesús Ariel; De Ita, Guillermo

doi:10.1007/978-3-642-38989-4_27


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7914)

Included in the following conference series:

Mexican Conference on Pattern Recognition


Abstract

Classifiers based on emerging patterns are usually easier for humans to understand than those based on more complex mathematical models. However, most classifiers based on emerging patterns achieve low accuracy on problems with imbalanced databases. This problem has been tackled through oversampling or undersampling methods; nevertheless, to the best of our knowledge, these methods have not been tested for classifiers based on emerging patterns. Therefore, in this paper, we present an empirical study of the use of oversampling and undersampling methods to improve the accuracy of a classifier based on emerging patterns. We apply the most popular oversampling and undersampling methods to 30 databases from the UCI Repository of Machine Learning. Our experimental results show that using oversampling and undersampling methods significantly improves the accuracy of the classifier for the minority class.
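As a rough illustration of the two resampling ideas named in the abstract, the sketch below balances a labelled data set by random oversampling (duplicating objects of the smaller classes) and by random undersampling (discarding objects of the larger classes). It is a minimal sketch under stated assumptions, not the procedure used in the paper: the abstract does not say which specific methods were evaluated, and the function names `random_oversample` and `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows that belong to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing objects with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement so no object is kept twice.
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 8 majority objects and 2 minority objects.
toy_rows = [[i] for i in range(10)]
toy_labels = ["maj"] * 8 + ["min"] * 2

_, over_labels = random_oversample(toy_rows, toy_labels)
_, under_labels = random_undersample(toy_rows, toy_labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

In a study of this kind, resampling would normally be applied to the training partition only, so that the test objects keep the original class distribution.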



Author information

Authors and Affiliations

Centro de Bioplantas, Universidad de Ciego de Ávila, Carretera a Morón km 9, Ciego de Ávila, Cuba, C.P. 69450
Octavio Loyola-González & Milton García-Borroto
Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro No. 1, Sta. María Tonanzintla, Puebla, México, C.P. 72840
Octavio Loyola-González, Miguel Angel Medina-Pérez, José Fco. Martínez-Trinidad & Jesús Ariel Carrasco-Ochoa
Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Av. San Claudio y 14 sur, Puebla, México
Guillermo De Ita


Editor information

Editors and Affiliations

Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro #1, Santa Maria Tonantzintla, Puebla, México
Jesús Ariel Carrasco-Ochoa
Computer Science Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, 72840, Sta. Maria Tonantzintla, Puebla, Mexico
José Francisco Martínez-Trinidad
Instituto Politécnico Nacional (IPN), Cerro Blanco 141, 76090, Colinas del Cimatario, Querétaro, México
Joaquín Salas Rodríguez
Institute of Cybernetics “E. Caianiello”, CNR, Via Campi Flegrei 34, 80078, Pozzuoli, Naples, Italy
Gabriella Sanniti di Baja


About this paper

Cite this paper

Loyola-González, O., García-Borroto, M., Medina-Pérez, M.A., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., De Ita, G. (2013). An Empirical Study of Oversampling and Undersampling Methods for LCMine an Emerging Pattern Based Classifier. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Rodríguez, J.S., di Baja, G.S. (eds) Pattern Recognition. MCPR 2013. Lecture Notes in Computer Science, vol 7914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38989-4_27


DOI: https://doi.org/10.1007/978-3-642-38989-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38988-7
Online ISBN: 978-3-642-38989-4
eBook Packages: Computer Science, Computer Science (R0)
