Abstract
We derive pointwise exact bootstrap distributions of ROC curves and of the difference between ROC curves, under both threshold and vertical averaging. From these distributions, we derive pointwise confidence intervals and measure their performance in terms of coverage accuracy. The intervals improve on techniques currently in use, particularly at the extremes of ROC curves, where we show that the drastic falls in coverage accuracy that typically occur there can be avoided.
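To make the setting concrete, the following is a minimal sketch of a pointwise bootstrap confidence interval for an ROC curve under vertical averaging, that is, an interval for the true positive rate at a fixed false positive rate. It approximates the bootstrap distribution by Monte Carlo resampling with a percentile interval; it is not the exact distribution derived in the article. The function names `tpr_at_fpr` and `bootstrap_ci`, the stratified resampling of positives and negatives, and the synthetic Gaussian scores are all assumptions made for illustration.

```python
import random


def tpr_at_fpr(neg, pos, fpr):
    """Empirical true positive rate at a fixed false positive rate.

    Hypothetical helper: the threshold is set so that at most
    floor(fpr * len(neg)) negative scores lie strictly above it.
    """
    neg_desc = sorted(neg, reverse=True)
    # Small epsilon guards against floating-point round-down (e.g. 0.29 * 100).
    k = int(fpr * len(neg_desc) + 1e-9)
    if k >= len(neg_desc):
        return 1.0  # every example is classified positive
    threshold = neg_desc[k]
    return sum(s > threshold for s in pos) / len(pos)


def bootstrap_ci(neg, pos, fpr, n_boot=2000, alpha=0.05, seed=0):
    """Monte Carlo percentile interval for the TPR at a fixed FPR.

    Assumption: positives and negatives are resampled separately
    (with replacement), keeping the class sizes fixed.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        neg_b = rng.choices(neg, k=len(neg))
        pos_b = rng.choices(pos, k=len(pos))
        stats.append(tpr_at_fpr(neg_b, pos_b, fpr))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Usage on synthetic scores (illustrative data, not from the article).
data_rng = random.Random(1)
neg_scores = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
pos_scores = [data_rng.gauss(1.5, 1.0) for _ in range(50)]
for f in (0.02, 0.1, 0.5):
    print(f, tpr_at_fpr(neg_scores, pos_scores, f),
          bootstrap_ci(neg_scores, pos_scores, f))
```

At very small or very large false positive rates, the resampled statistic takes only a few distinct values, so a Monte Carlo percentile interval of this kind can be coarse. That is the regime the abstract singles out as the one where coverage accuracy tends to fall.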
Additional information
Editor: Johannes Fürnkranz.
About this article
Cite this article
Dugas, C., Gadoury, D. Pointwise exact bootstrap distributions of ROC curves. Mach Learn 78, 103–136 (2010). https://doi.org/10.1007/s10994-009-5134-6
DOI: https://doi.org/10.1007/s10994-009-5134-6