Abstract
The evaluation of classifier performance in a cost-sensitive setting is straightforward if the operating conditions (misclassification costs and class distributions) are fixed and known. When this is not the case, evaluation requires a method of visualizing classifier performance across the full range of possible operating conditions. This talk outlines the most important requirements for cost-sensitive classifier evaluation for machine learning and KDD researchers and practitioners, and introduces a recently developed technique for classifier performance visualization – the cost curve – that meets all these requirements.
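The abstract does not define the cost curve itself, so the following is only a minimal sketch under the definition commonly given for cost curves. The x-axis is the probability cost PC(+), which folds the class distribution and the misclassification costs into a single number in [0, 1]. The y-axis is the normalized expected cost, FNR * PC(+) + FPR * (1 - PC(+)). The function names and the example error rates are illustrative assumptions, not taken from the paper.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """Return PC(+), the x-axis value of a cost curve.

    p_pos   : prior probability of the positive class
    cost_fn : cost of misclassifying a positive as negative
    cost_fp : cost of misclassifying a negative as positive
    (Hypothetical helper; the parameter names are assumptions.)
    """
    pos_term = p_pos * cost_fn
    neg_term = (1.0 - p_pos) * cost_fp
    return pos_term / (pos_term + neg_term)


def normalized_expected_cost(fpr, fnr, pc_pos):
    """Return the y-axis value of a cost curve at operating condition pc_pos."""
    return fnr * pc_pos + fpr * (1.0 - pc_pos)


def cost_line(fpr, fnr, steps=11):
    """Sample one classifier's cost line at evenly spaced PC(+) values."""
    points = []
    for i in range(steps):
        pc = i / (steps - 1)
        points.append((pc, normalized_expected_cost(fpr, fnr, pc)))
    return points


if __name__ == "__main__":
    # Illustrative classifier: 20% false positive rate, 40% false negative rate.
    for pc, cost in cost_line(fpr=0.2, fnr=0.4, steps=5):
        print(f"PC(+)={pc:.2f}  normalized expected cost={cost:.3f}")
```

Under this definition a single classifier appears as a straight line running from its false positive rate at PC(+) = 0 to its false negative rate at PC(+) = 1, which is what lets performance be read off for any operating condition.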
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Holte, R.C., Drummond, C. (2008). Cost-Sensitive Classifier Evaluation Using Cost Curves. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_4
DOI: https://doi.org/10.1007/978-3-540-68125-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science