Abstract
In real-world classification problems, different types of misclassification errors often carry asymmetric costs, demanding cost-sensitive learning methods that minimize average misclassification cost rather than plain error rate. Instance weighting and post hoc threshold adjusting are two major approaches to cost-sensitive classifier learning. This paper compares the effects of these two approaches on several standard, off-the-shelf classification methods. The comparison indicates that the two approaches lead to similar results for some classification methods, such as Naïve Bayes, logistic regression, and backpropagation neural networks, but to very different results for others, such as decision tree, decision table, and decision rule learners. The findings have important implications for the choice of cost-sensitive classifier learning approach, as well as for the interpretation of a recently published finding about the relative performance of Naïve Bayes and decision trees.
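The two approaches contrasted in the abstract can be sketched concretely. The sketch below is a minimal illustration, not the paper's implementation: threshold adjusting moves the decision threshold on the estimated class probability to the standard cost-sensitive optimum t* = c_fp / (c_fp + c_fn) (as in Elkan's formulation), while instance weighting instead reweights training examples in proportion to the cost of misclassifying them. The function names and the two-class cost setup (false-positive cost c_fp, false-negative cost c_fn) are our own illustrative choices.

```python
def cost_sensitive_threshold(c_fp: float, c_fn: float) -> float:
    """Bayes-optimal threshold on P(y=1 | x) for a two-class cost matrix.

    Predicting positive is cheaper in expectation when
    p * c_fn_saved > (1 - p) * c_fp, which solves to p > c_fp / (c_fp + c_fn).
    """
    return c_fp / (c_fp + c_fn)


def classify_by_threshold(probs, c_fp: float, c_fn: float):
    """Post hoc threshold adjusting: train any probabilistic classifier
    as usual, then shift the decision threshold according to the costs."""
    t = cost_sensitive_threshold(c_fp, c_fn)
    return [1 if p > t else 0 for p in probs]


def reweight_instances(labels, c_fp: float, c_fn: float):
    """Instance weighting: give each training example a weight equal to
    the cost of misclassifying it, then train with the default 0.5 rule."""
    return [c_fn if y == 1 else c_fp for y in labels]


# With false negatives 4x as costly as false positives, the threshold
# drops below 0.5, so borderline positives are now predicted positive.
print(cost_sensitive_threshold(1.0, 4.0))        # 0.2
print(classify_by_threshold([0.1, 0.3, 0.6], 1.0, 4.0))
print(reweight_instances([1, 0, 1], 1.0, 4.0))
```

For classifiers whose probability estimates are well calibrated (e.g. logistic regression), the two routes tend to coincide; for learners whose structure changes under reweighting (e.g. decision trees), they can diverge, which is the phenomenon the paper studies.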
Cite this article
Zhao, H. Instance weighting versus threshold adjusting for cost-sensitive classification. Knowl Inf Syst 15, 321–334 (2008). https://doi.org/10.1007/s10115-007-0079-1