Development and Evaluation of Classifiers

Alonzo, Todd A.; Pepe, Margaret Sullivan

doi:10.1007/978-1-59745-530-5_6


Part of the book series: Methods in Molecular Biology™ (MIMB, volume 404)


Abstract

Diagnostic tests, medical tests, screening tests, biomarkers, and prediction rules are all types of classifiers. This chapter introduces methods for classifier development and evaluation. We first introduce measures of classification performance including sensitivity, specificity, and receiver operating characteristic (ROC) curves. We then review some issues in the design of studies to assess and compare the performance of classifiers. Approaches for using the data to estimate and compare classifier accuracy are then introduced. Next, methods for combining multiple classifiers into a single classifier are presented. Lastly, we discuss other important aspects of classifier development and evaluation. The methods presented are illustrated with real data.
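As a minimal sketch of the performance measures named in the abstract, the Python below computes sensitivity and specificity at a single threshold and the area under the empirical ROC curve. It is an illustration under stated assumptions, not the chapter's own code: the function names, the convention that higher scores indicate disease, and the toy data are all hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for the rule 'positive if score >= threshold'.

    Assumes labels are 1 (diseased) or 0 (non-diseased) and that both groups are non-empty.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)  # true positives
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)   # false negatives
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)   # true negatives
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the empirical ROC curve via the Mann-Whitney statistic.

    This is the proportion of (diseased, non-diseased) pairs in which the
    diseased subject has the higher score, counting ties as one half.
    """
    diseased = [s for s, y in zip(scores, labels) if y == 1]
    healthy = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                total += 1.0
            elif d == h:
                total += 0.5
    return total / (len(diseased) * len(healthy))


# Hypothetical toy data: three diseased and three non-diseased subjects.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = empirical_auc(scores, labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 minus specificity would trace the empirical ROC curve; the pairwise comparison above gives its area directly without constructing the curve.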



Author information

Authors and Affiliations

Children’s Oncology Group, University of Southern California, Arcadia, CA
Todd A. Alonzo PhD
Department of Statistics, Fred Hutchinson Cancer Research Center, University of Washington, Seattle, WA
Margaret Sullivan Pepe PhD


Editor information

Editors and Affiliations

Department of Biostatistical Sciences, Wake Forest University Health Sciences, Winston-Salem, NC
Walter T. Ambrosius


Cite this protocol

Alonzo, T.A., Pepe, M.S. (2007). Development and Evaluation of Classifiers. In: Ambrosius, W.T. (ed.) Topics in Biostatistics. Methods in Molecular Biology™, vol 404. Humana Press. https://doi.org/10.1007/978-1-59745-530-5_6


DOI: https://doi.org/10.1007/978-1-59745-530-5_6
Publisher Name: Humana Press
Print ISBN: 978-1-58829-531-6
Online ISBN: 978-1-59745-530-5
eBook Packages: Springer Protocols
