Abstract
Our objective is to compare two data mining approaches to dealing with imbalanced data sets. The first approach keeps the original rule set, induced by the LEM2 (Learning from Examples Module, version 2) algorithm, and changes the rule strength of all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Results of our experiments show that both approaches increase sensitivity compared to the original LEM2. However, the difference in performance between the two approaches is not statistically significant. Thus the appropriate approach for dealing with imbalanced data sets should be selected individually for each specific data set.
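To make the first approach concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified voting scheme in which each class is supported by the sum of the strengths of the rules that match a case, and the strengths of rules for the smaller class are multiplied by a constant at classification time. The `Rule` type, the `multiplier` parameter, and the toy rules and cases are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # attribute name -> required value
    decision: str     # class (concept) indicated by the rule
    strength: int     # number of training cases the rule correctly covers


def matches(rule, case):
    """A rule matches a case when every condition is satisfied."""
    return all(case.get(attr) == val for attr, val in rule.conditions.items())


def classify(rules, case, minority, multiplier=1.0):
    """Vote by summed rule strength; minority-class rules are scaled by `multiplier`."""
    support = {}
    for rule in rules:
        if matches(rule, case):
            factor = multiplier if rule.decision == minority else 1.0
            support[rule.decision] = support.get(rule.decision, 0.0) + rule.strength * factor
    if not support:
        return None  # no rule matches; partial matching is out of scope for this sketch
    return max(support, key=support.get)


def sensitivity(rules, cases, labels, minority, multiplier=1.0):
    """Fraction of minority-class cases that are classified as the minority class."""
    positives = [c for c, y in zip(cases, labels) if y == minority]
    hits = sum(classify(rules, c, minority, multiplier) == minority for c in positives)
    return hits / len(positives)


# Hypothetical toy rule set: one weak minority rule, one strong majority rule.
rules = [
    Rule({"fever": "yes"}, "sick", 3),
    Rule({"age": "young"}, "healthy", 10),
]
cases = [{"fever": "yes", "age": "young"}, {"fever": "yes", "age": "old"}]
labels = ["sick", "sick"]

baseline = sensitivity(rules, cases, labels, minority="sick")
boosted = sensitivity(rules, cases, labels, minority="sick", multiplier=4.0)
print(baseline, boosted)
```

In this toy setting the first case is matched by both rules, so the stronger majority rule wins unless the minority rule's strength is scaled up. Raising the multiplier trades specificity for sensitivity, which is why its value would have to be tuned for each data set.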
Cite this article
Grzymala-Busse, J.W., Stefanowski, J. & Wilk, S. A Comparison of Two Approaches to Data Mining from Imbalanced Data. J Intell Manuf 16, 565–573 (2005). https://doi.org/10.1007/s10845-005-4362-2
DOI: https://doi.org/10.1007/s10845-005-4362-2