Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset

Yen, Show-Jane; Lee, Yue-Shi

doi:10.1007/978-3-540-37256-1_89

Part of the book series: Lecture Notes in Control and Information Sciences (LNCIS, volume 344)

Abstract

The training data is the most important factor in improving classification accuracy. However, data in real-world applications often have an imbalanced class distribution: most of the data belong to the majority class and only a small amount belongs to the minority class. In this case, if all the data are used as training data, the classifier tends to predict that most incoming data belong to the majority class. Hence, it is important to select suitable training data for classification in the imbalanced class distribution problem. In this paper, we propose cluster-based under-sampling approaches that select representative data as training data to improve the classification accuracy for the minority class in the imbalanced class distribution problem. The experimental results show that our cluster-based under-sampling approaches outperform the other under-sampling techniques from previous studies.
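
The abstract describes the approach only at a high level. The following is a minimal sketch of one possible cluster-based under-sampling scheme, assuming plain k-means over all samples (both classes together) and a hypothetical allocation rule that draws more majority samples from clusters with a higher majority-to-minority ratio. The function names `kmeans` and `cluster_under_sample`, and the parameters `k` and `ratio`, are illustrative assumptions, not the chapter's own procedure or API.

```python
import random
from collections import defaultdict


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - z) ** 2 for x, z in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_under_sample(X, y, k=3, ratio=1.0, minority=1, seed=0):
    """Return sorted indices of a reduced training set.

    Keeps every minority sample and draws roughly ratio * n_minority majority
    samples, allocated to clusters in proportion to each cluster's
    majority/minority count ratio (a HYPOTHETICAL rule for illustration).
    """
    rng = random.Random(seed)
    labels = kmeans(X, k, seed=seed)
    majority_idx = defaultdict(list)  # cluster -> indices of majority samples
    minority_cnt = defaultdict(int)   # cluster -> number of minority samples
    for i, (c, cls) in enumerate(zip(labels, y)):
        if cls == minority:
            minority_cnt[c] += 1
        else:
            majority_idx[c].append(i)

    n_min = sum(minority_cnt.values())
    target = int(ratio * n_min)
    # A cluster with no minority samples is treated as having one, to avoid /0.
    weights = [len(majority_idx[c]) / max(1, minority_cnt[c]) for c in range(k)]
    total = sum(weights)

    keep = [i for i, cls in enumerate(y) if cls == minority]
    if total > 0:
        for c in range(k):
            # Per-cluster quota, capped by how many majority samples it holds.
            n_c = min(len(majority_idx[c]), round(target * weights[c] / total))
            keep.extend(rng.sample(majority_idx[c], n_c))
    return sorted(keep)


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]       # majority, class 0
    X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]  # minority, class 1
    y = [0] * 90 + [1] * 10
    idx = cluster_under_sample(X, y, k=3, ratio=1.0)
    print("kept:", len(idx), "minority kept:", sum(y[i] for i in idx))
```

Clusters dominated by majority samples contribute more majority samples to the reduced training set, while clusters holding many minority samples contribute proportionally fewer. The hand-rolled k-means only keeps the sketch dependency-free; a library implementation would be the more practical choice.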

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, Ming Chuan University, 5 The-Ming Rd., Gwei Shan District, Taoyuan County, 333, Taiwan
Show-Jane Yen & Yue-Shi Lee

Editor information

Editors and Affiliations

Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui, China
De-Shuang Huang
Queen’s University, Belfast, UK
Kang Li & George William Irwin

About this chapter

Cite this chapter

Yen, SJ., Lee, YS. (2006). Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset. In: Huang, DS., Li, K., Irwin, G.W. (eds) Intelligent Control and Automation. Lecture Notes in Control and Information Sciences, vol 344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37256-1_89

DOI: https://doi.org/10.1007/978-3-540-37256-1_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37255-4
Online ISBN: 978-3-540-37256-1
eBook Packages: Engineering, Engineering (R0)
