Abstract
Diagnostic tests, medical tests, screening tests, biomarkers, and prediction rules are all types of classifiers. This chapter introduces methods for classifier development and evaluation. We first introduce measures of classification performance including sensitivity, specificity, and receiver operating characteristic (ROC) curves. We then review some issues in the design of studies to assess and compare the performance of classifiers. Approaches for using the data to estimate and compare classifier accuracy are then introduced. Next, methods for combining multiple classifiers into a single classifier are presented. Lastly, we discuss other important aspects of classifier development and evaluation. The methods presented are illustrated with real data.
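As a rough illustration of the performance measures named above, the sketch below computes sensitivity and specificity at a single threshold, plus an empirical area under the ROC curve. The function names, the threshold of 0.5, and the six data points are hypothetical and are not taken from the chapter. The area is computed as the proportion of diseased/non-diseased pairs in which the diseased subject has the higher score, with ties counted as one half; this is a standard nonparametric estimate, though not necessarily the one the chapter uses.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a subject positive when score >= threshold.

    labels: 1 = diseased, 0 = non-diseased.
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores, labels):
    """Proportion of (diseased, non-diseased) pairs ranked correctly.

    Ties contribute one half, as in the Mann-Whitney statistic.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Made-up illustrative data, not from the chapter.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = empirical_auc(scores, labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 − specificity would trace the empirical ROC curve whose area `empirical_auc` estimates.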
Copyright information
© 2007 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Alonzo, T.A., Pepe, M.S. (2007). Development and Evaluation of Classifiers. In: Ambrosius, W.T. (eds) Topics in Biostatistics. Methods in Molecular Biology™, vol 404. Humana Press. https://doi.org/10.1007/978-1-59745-530-5_6
DOI: https://doi.org/10.1007/978-1-59745-530-5_6
Publisher Name: Humana Press
Print ISBN: 978-1-58829-531-6
Online ISBN: 978-1-59745-530-5
eBook Packages: Springer Protocols