Abstract
The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as “statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable.” Logistic regression models are used to study the effects of predictor variables on categorical outcomes. Usually the outcome is binary, such as the presence or absence of disease (e.g., non-Hodgkin’s lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments), the model is referred to as a multiple or multivariable logistic regression model; it is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
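As a rough illustration of the simple binary logistic model described above, the following sketch fits log-odds(disease) = b0 + b1 * exposed by Newton-Raphson and compares exp(b1) with the odds ratio computed directly from a 2x2 table. The counts, the variable names, and the helper `fit_logistic` are hypothetical and chosen only for illustration; they are not taken from the chapter, and a statistical package would normally be used instead of a hand-written fitter.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit a binary logistic model by Newton-Raphson.

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns the estimated coefficients on the log-odds scale.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # predicted probabilities
        w = p * (1.0 - p)                    # variance weights
        grad = X.T @ (y - p)                 # score vector
        hess = X.T @ (X * w[:, None])        # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical 2x2 table (made-up counts, for illustration only):
#               disease   no disease
# exposed          30         70
# unexposed        15         85
counts = [30, 70, 15, 85]
exposed = np.repeat([1, 1, 0, 0], counts)
disease = np.repeat([1, 0, 1, 0], counts)

# Design matrix: intercept column plus the binary predictor.
X = np.column_stack([np.ones(exposed.size), exposed]).astype(float)
beta = fit_logistic(X, disease.astype(float))

odds_ratio_model = np.exp(beta[1])            # exp(b1) from the fitted model
odds_ratio_table = (30 * 85) / (70 * 15)      # cross-product ratio from the table

print(odds_ratio_model, odds_ratio_table)
```

With a single binary predictor the model is saturated, so the exponentiated slope agrees with the cross-product odds ratio from the table. With several predictors, each exponentiated coefficient is instead an odds ratio adjusted for the other variables in the model.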
Copyright information
© 2007 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Nick, T.G., Campbell, K.M. (2007). Logistic Regression. In: Ambrosius, W.T. (ed.) Topics in Biostatistics. Methods in Molecular Biology™, vol 404. Humana Press. https://doi.org/10.1007/978-1-59745-530-5_14
DOI: https://doi.org/10.1007/978-1-59745-530-5_14
Publisher Name: Humana Press
Print ISBN: 978-1-58829-531-6
Online ISBN: 978-1-59745-530-5
eBook Packages: Springer Protocols