Predicting Risk of Sport-Related Concussion in Collegiate Athletes and Military Cadets: A Machine Learning Approach Using Baseline Data from the CARE Consortium Study

Castellanos, Joel; Phoo, Cheng Perng; Eckner, James T.; Franco, Lea; Broglio, Steven P.; McCrea, Mike; McAllister, Thomas; Wiens, Jenna

doi:10.1007/s40279-020-01390-w

Original Research Article
Published: 24 December 2020

Volume 51, pages 567–579 (2021)

Sports Medicine

Joel Castellanos, Cheng Perng Phoo, James T. Eckner (ORCID: orcid.org/0000-0001-9630-0048), Lea Franco, Steven P. Broglio, Mike McCrea, Thomas McAllister, Jenna Wiens & The CARE Consortium Investigators

Abstract

Objective

To develop a predictive model for sport-related concussion in collegiate athletes and military service academy cadets using baseline data collected during the pre-participation examination.

Methods

Baseline assessments were performed in 15,682 participants from 21 US academic institutions and military service academies participating in the CARE Consortium Study during the 2015–2016 academic year. Participants were monitored for sport-related concussion during the subsequent season. 176 baseline covariates mapped to 957 binary features were used as input into a support vector machine model with the goal of learning to stratify participants according to their risk for sport-related concussion. Performance was evaluated in terms of area under the receiver operating characteristic curve (AUROC) on a held-out test set. Model inputs significantly associated with either increased or decreased risk were identified.

Results

595 participants (3.79%) sustained a concussion during the study period. The predictive model achieved an AUROC of 0.73 (95% confidence interval 0.70–0.76), with variable performance across sports. Features with significant positive and negative associations with subsequent sport-related concussion were identified.

Conclusion(s)

Using only baseline data, this predictive model identified athletes and cadets who would go on to sustain sport-related concussion with accuracy comparable to that of many existing concussion assessment tools in distinguishing concussed from non-concussed athletes. Furthermore, this study provides insight into potential concussion risk and protective factors.

Key Points

Using a data-driven approach, we built a predictive model for sport-related concussion in collegiate athletes and military cadets from baseline data representing 18 variable categories; the model achieved an AUROC of 0.73.
Significant features in the predictive model can provide insight into risk and protective factors for sport-related concussion and be used to generate hypotheses for future research.
This is clinically important because a predictive model capable of identifying athletes at elevated risk for sustaining sport-related concussion can facilitate more targeted injury prevention, education, and surveillance strategies.

1 Introduction

Concussion, or mild traumatic brain injury (mTBI), is a common and serious injury faced by athletes and military personnel alike. The US Centers for Disease Control and Prevention estimates as many as 1.6–3.8 million sport and recreation-related concussions occur annually in the US across all age groups and levels of play [1]. Of these, approximately 10,500 concussions occur annually in the approximately 495,000 collegiate student-athletes competing in NCAA championship sports [2]. TBI is also a serious concern for US military personnel, and has been identified by the US Department of Defense as “one of the signature injuries of troops wounded in Afghanistan and Iraq” [3]. Approximately 82.3% of traumatic brain injuries sustained by Active-Duty US Service Members are mTBI/concussions, often occurring outside of combat with a mechanism similar to that in athletes [4]. In the US military service academies, non-varsity-athlete cadets frequently sustain concussions during club/intramural sport participation as well as military training.

In the short term, concussion causes a constellation of physical, cognitive, affective, and sleep-related signs and symptoms that may limit an individual’s ability to participate and perform in school, at work, on the field of play, or on the battlefield. Additionally, there is growing concern that concussions and repetitive sport-associated head trauma may increase an individual’s long-term risk of developing depression [5], dementia [6], and neurodegenerative diseases such as chronic traumatic encephalopathy later in life [7,8,9,10]. While concussions in the general population typically result from unpredictable traumatic events, athletes and military cadets are placed at risk for concussion during participation in sport activities. Given the many potential short- and long-term consequences of concussion, efforts should be made to identify those at greatest risk for sport-related concussion so that targeted injury surveillance, prevention, and education strategies can be optimized.

We are aware of only one small study that attempted to use a limited number of baseline factors to identify athletes’ future concussion risk, without success [11]. Broadly speaking, injury prediction models can use factors that are either positively or negatively associated with an outcome of interest. In the case of sport-related concussion, other prior work has identified numerous intrinsic (i.e., individual athlete characteristics) and extrinsic (i.e., sport-associated conditions and mechanisms) features as potential risk or protective factors for injury [12,13,14,15,16]. However, many existing studies have limited their study population to a single sex [17, 18] or sport [17,18,19,20,21,22,23,24,25], and those studies assessing large, diverse populations have still considered only a relatively small set of specific risk factors [26,27,28,29]. As such, many potential concussion risk and protective factors remain unexplored.

We, therefore, sought to leverage the Concussion Assessment Research and Education (CARE) Consortium database [30] to develop a predictive model for sport-related concussion using baseline characteristics in NCAA collegiate athletes and US military cadets. We hypothesized the information commonly collected in athletes and cadets during their baseline pre-participation examinations could be used to predict subsequent concussions in this cohort. Using machine learning techniques, we developed an interpretable risk stratification model based on a large number of baseline covariates. By inspecting the model, we also sought insight into potential risk and protective factors for sport-related concussion which can be utilized to generate novel hypotheses for future study.

2 Methods

2.1 General Study Design

This observational study utilized CARE Consortium data collected at the 21 US academic institutions and military academies participating in the study during the 2015–2016 academic year. Participating student-athletes and military service academy cadets at each institution completed a preseason baseline assessment as part of the pre-participation examination process and were then prospectively monitored for concussion. Baseline assessments included a combination of “Level A” assessment measures common across all CARE Consortium Sites (demographics; medical, sport, academic, and family history; Brief Sensation Seeking Scale [BSSS]; Sport Concussion Assessment Tool [SCAT] Symptom Checklist; Brief Symptom Inventory [BSI]; Standardized Assessment of Concussion [SAC]; Balance Error Scoring System [BESS]; and computerized neurocognitive assessment) as well as optional “Level B” assessment measures collected at each site’s discretion (clinical reaction time, advanced measures of postural stability, oculomotor/oculovestibular assessments, and/or quality of life). One of four computerized concussion tests (ImPACT, Axon/CogState, CNS Vital Signs, or Automated Neuropsychological Assessment Metrics [ANAM]) was administered at each CARE Consortium site. Additional CARE Consortium Study details, including a description of Level A and B measures, are available from Broglio et al. [30]. Institutional Review Board (IRB) approval was obtained at the lead study site, with US Department of Defense Human Research Protection Office approval as well as local IRB approval at each participating site. This study was performed in accordance with the standards of ethics outlined in the Declaration of Helsinki.

2.2 Participants

All CARE Consortium Study participants enrolled during the 2015–2016 academic year were eligible for inclusion in this analysis. All cadets are eligible to consent for CARE participation at the US military service academies, regardless of varsity athlete status, and all varsity student-athletes are eligible to consent at the traditional colleges and universities. The only criterion for exclusion from this analysis was failure to complete the 2015–2016 baseline CARE assessment prior to sustaining a concussion during the study period. A total of 15,682 participants were included in the final dataset. All participants provided informed written consent.

2.3 Primary Outcome

The primary study outcome was a clinician-diagnosed concussion sustained as a consequence of sport participation between August 1st, 2015 and July 31st, 2016, with concussion defined by the consensus definition derived from evidence-based guidelines and adopted by the CARE Consortium Study group [30, 31]. Participants who sustained a concussion during any form of sport participation (i.e., game or competition at varsity or non-varsity levels of play) were considered positive for the primary study outcome; all other participants, including those who sustained concussions unrelated to sport, were considered negative.

2.4 Covariates

We considered 176 baseline covariates for each subject, grouped into the following categories: academic or military institution; demographic variables; anthropometric variables; academic variables; primary and secondary sports; primary sport position; primary sport equipment details; concussion history; personal medical history; medications; family medical history; social history; self-reported concussion symptoms (SCAT Symptom Checklist); psychological and quality of life assessment results; neurocognitive assessment results; computerized concussion test results; balance assessment results; and vision/vestibular–ocular test results (Table 1).

Prior to analysis, the data were cleaned to remove obviously incorrect responses. We then mapped each variable to a set of binary features. Categorical variables were mapped to one binary feature per category. Continuous variables were either discretized into quintile ranges or, when clinically relevant cutoffs were available (e.g., normal vs. abnormal), classified into discrete bins using those cutoffs. Clinically relevant cutoffs were applied for the Brief Symptom Inventory, Hospital Anxiety and Depression Scale, and Vestibular Ocular Motor Screening examination. Each discretized continuous variable was then mapped to one binary feature per range/bin. This discretization procedure allowed the incorporation of domain knowledge (i.e., clinically meaningful thresholds) and extended the linear model’s capacity to capture nonlinear relationships that it could not represent in the original continuous feature space. All baseline variables and their associated binary features are listed in Supplementary Appendices A and B.
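The mapping from raw variables to binary features described above can be sketched as follows. This is a minimal illustration assuming NumPy, not the study's code: the function names, example records, and the cutoff of 8 are hypothetical, and quintile edges are computed here on whatever array is passed in (the text does not say whether edges came from the full cohort or from each training split). Treating each non-value code (e.g., "skipped", "unknown") as its own category mirrors how the Discussion describes retaining non-value options as features.

```python
import numpy as np


def quintile_features(values, name):
    """One-hot encode a continuous variable into 5 quintile-range features."""
    values = np.asarray(values, dtype=float)
    # Interior cut points at the 20th, 40th, 60th and 80th percentiles.
    edges = np.quantile(values, [0.2, 0.4, 0.6, 0.8])
    # side="left" puts a value equal to an edge in the lower bin, so the
    # bins are (-inf, e1], (e1, e2], ..., (e4, inf) with indices 0..4.
    bins = np.searchsorted(edges, values, side="left")
    names = [f"{name}_q{q + 1}" for q in range(5)]
    return names, np.eye(5, dtype=np.int8)[bins]


def cutoff_features(values, name, cutoff):
    """Encode a score as normal/abnormal using a single (hypothetical) cutoff."""
    abnormal = np.asarray(values, dtype=float) >= cutoff
    names = [f"{name}_normal", f"{name}_abnormal"]
    return names, np.column_stack([~abnormal, abnormal]).astype(np.int8)


def categorical_features(values, name):
    """One binary feature per observed category, non-value codes included."""
    levels = sorted(set(values))
    names = [f"{name}={level}" for level in levels]
    onehot = np.array([[v == level for level in levels] for v in values], dtype=np.int8)
    return names, onehot


# Hypothetical example records (not study data).
age = [18, 19, 19, 20, 21, 22, 18, 20, 23, 19]
anxiety_score = [3, 9, 12, 5, 7, 8, 2, 15, 6, 4]
handedness = ["right", "left", "right", "skipped", "right",
              "right", "unknown", "right", "left", "right"]

blocks = [
    quintile_features(age, "age"),
    cutoff_features(anxiety_score, "anxiety", cutoff=8),
    categorical_features(handedness, "handedness"),
]
feature_names = [n for names, _ in blocks for n in names]
X = np.hstack([onehot for _, onehot in blocks])
print(X.shape)  # (10, 11)
```

Each row ends up with exactly one active feature per original variable, which is what lets a linear model give each range or category its own weight.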

Table 1 Baseline covariates

2.5 Data Analysis

2.5.1 Learning Algorithm

We used a linear support vector machine (SVM) model [32] to stratify subjects’ risk for sport-related concussion based on their baseline data. Nonlinear models were also considered but performed similarly to the SVM model (see Supplementary Appendix C for details of nonlinear model performance). We therefore report only the SVM results, given the greater interpretability of the linear model: its learned coefficients may provide insight into potential risk or protective factors [33]. We assumed that risk and protective factors were shared across sports and therefore framed the problem as a single task, using a single-task learning (STL) model. Sport-specific models were also considered but performed similarly to the STL model, likely in part because very few concussions occurred in some sports (see Supplementary Appendix D for details of the sport-specific models). As such, only the STL model results are reported. We implemented our linear model using the Python machine learning package scikit-learn (https://scikit-learn.org/stable/about.html).
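A minimal sketch of fitting a linear SVM to binary features with scikit-learn and reading its coefficients as candidate risk (positive) or protective (negative) weights. It assumes `LinearSVC` (the text does not say which scikit-learn SVM class was used) and synthetic data in which one hypothetical feature raises risk and another lowers it. C = 1.0 is chosen only for this toy data; the study selected C = 10⁻⁴.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic binary feature matrix: 1000 participants x 6 hypothetical features.
n, d = 1000, 6
X = rng.integers(0, 2, size=(n, d))

# Hypothetical ground truth: feature 0 raises risk, feature 1 lowers it,
# features 2-5 are noise. Labels: 1 = concussion, 0 = no concussion.
logit = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Linear SVM: C trades off the (squared) hinge loss against L2 regularization.
model = LinearSVC(C=1.0, max_iter=10_000)
model.fit(X, y)

weights = model.coef_.ravel()              # one weight per binary feature
risk_scores = model.decision_function(X)   # higher score = higher estimated risk

for j, w in enumerate(weights):
    role = "risk" if w > 0 else "protective" if w < 0 else "neutral"
    print(f"feature_{j}: {w:+.3f} ({role})")
```

Because every input is a 0/1 indicator, each weight can be read directly as the shift in the risk score when that feature is present, which is the interpretability argument made above.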

2.5.2 Model Selection

We trained the model by repeating the following process 20,000 times (note: 20,000 repetitions were performed so we could assess the statistical significance of the predictive factors, as described in Sect. 2.5.3):

1. Split. Data were split into a training set and a held-out test set such that approximately 80% of participants contributed to the training set and approximately 20% to the test set. For sports with fewer than 10 participants, all participants were included in the training set. For sports with more than 10 participants, we used stratified splits to ensure equal proportions of positive and negative examples (i.e., participants with and without sport-related concussion) in the training and test sets, except when fewer than 5 positive examples were present (i.e., fewer than 5 participants sustained a concussion), in which case we used random splits.
2. Train. A model was trained (i.e., its parameters were learned) on the training set using a pre-selected hyperparameter that controls the tradeoff between regularization and loss (i.e., to optimize model performance without overfitting). Regularization is essential in high-dimensional settings such as this one to keep the model from simply memorizing the training data.

   We pre-selected the hyperparameter C by repeated nested cross-validation: for each of 100 splits, we ran a grid search over C ∈ {10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹} with five-fold cross-validation [34] on the training set. For computational efficiency, we used the mode of the resulting 100 hyperparameters (C = 10⁻⁴) as the pre-selected hyperparameter for all 20,000 models (see Supplementary Appendix E for details of hyperparameter selection).
3. Test. The trained model was applied to the test set to evaluate its performance, quantified as the area under the receiver operating characteristic curve (AUROC). AUROC is a common measure of a model’s ability to discriminate positive from negative examples (i.e., participants with vs. without sport-related concussion) [35] and intuitively represents the probability that the model ranks a randomly chosen positive example above a randomly chosen negative example. A model with an AUROC of 1 would be perfect, while a model with an AUROC of 0.5 would perform no better than chance.
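The split, hyperparameter selection, train, and test steps can be sketched with scikit-learn as follows. This is a hypothetical reconstruction, not the authors' code. Assumptions: `LinearSVC` as the SVM, `scoring="roc_auc"` as the cross-validation criterion (the text does not name one), a sport with exactly 10 participants treated like the larger sports, per-group test sizes rounded to the nearest integer, and `decision_function` as the risk score. Function names and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

C_GRID = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]


def split_by_sport(sport, y, rng, test_frac=0.2):
    """Per-sport ~80/20 split following the rules of the Split step."""
    train_idx, test_idx = [], []
    for s in np.unique(sport):
        idx = np.flatnonzero(sport == s)
        if len(idx) < 10:
            train_idx.extend(idx)  # very small sports: everyone goes to training
            continue
        pos, neg = idx[y[idx] == 1], idx[y[idx] == 0]
        # Stratify when there are at least 5 concussions; otherwise split at random.
        groups = [pos, neg] if len(pos) >= 5 else [idx]
        for g in groups:
            g = rng.permutation(g)
            n_test = int(round(test_frac * len(g)))
            test_idx.extend(g[:n_test])
            train_idx.extend(g[n_test:])
    return np.array(train_idx, dtype=int), np.array(test_idx, dtype=int)


def select_C(X, y, sport, rng, n_splits=100):
    """Grid search with 5-fold CV on each training split; return the modal best C."""
    chosen = []
    for _split in range(n_splits):
        tr, _test = split_by_sport(sport, y, rng)
        search = GridSearchCV(LinearSVC(max_iter=10_000), {"C": C_GRID},
                              cv=5, scoring="roc_auc")
        search.fit(X[tr], y[tr])
        chosen.append(search.best_params_["C"])
    values, counts = np.unique(chosen, return_counts=True)
    return values[np.argmax(counts)]  # ties resolve to the smallest C


def run_once(X, y, sport, C, rng):
    """One split/train/test repetition; returns test AUROC and learned weights."""
    tr, te = split_by_sport(sport, y, rng)
    model = LinearSVC(C=C, max_iter=10_000).fit(X[tr], y[tr])
    auroc = roc_auc_score(y[te], model.decision_function(X[te]))
    return auroc, model.coef_.ravel()


# Usage on synthetic data (hypothetical sports and outcomes).
rng = np.random.default_rng(1)
n = 600
X = rng.integers(0, 2, size=(n, 10))
y = (rng.random(n) < 1 / (1 + np.exp(-(1.5 * X[:, 0] - 1.5 * X[:, 1])))).astype(int)
sport = rng.choice(["football", "soccer", "rowing"], size=n)
sport[:6] = "golf"  # a sport with fewer than 10 participants
C = select_C(X, y, sport, rng, n_splits=2)  # the study used 100 splits
auroc, weights = run_once(X, y, sport, C, rng)
```

Fixing C once (the modal value) rather than re-tuning inside every one of the 20,000 repetitions is the computational shortcut the text describes.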

2.5.3 Model Performance and Predictive Factor Analysis

By repeating the above process 20,000 times, we produced an empirical distribution of model performance, allowing calculation of the mean AUROC with an associated 95% confidence interval. The confidence interval was calculated using the 2.5th and 97.5th percentile values of the empirical distribution of AUROCs. Using a linear model allowed us to investigate the learned model parameters and thus identify potential risk/protective factors. We considered features whose coefficients had the same sign over all 20,000 repeated runs to be statistically significant, since a feature with the same sign over k runs would have a p value of at most 1/k when testing the null hypothesis of zero effect size for that feature. Therefore, to claim statistical significance at p < 5.22 × 10⁻⁵ (a Bonferroni correction for comparisons over 957 features, 0.05/957 = 5.22 × 10⁻⁵), k must be at least 19,140 (equivalent to 1/k ≤ 0.05/957), which we elected to round up to a total of 20,000 repetitions of the experiment.
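The summary statistics and the Bonferroni arithmetic above can be sketched as follows, assuming the per-run AUROCs and coefficient vectors have been collected into arrays. The sign-consistency rule and the 1/k bound are implemented as stated in the text; the function names and toy inputs are hypothetical. `Fraction` keeps the run-count calculation exact, avoiding floating-point rounding in 957/0.05.

```python
import math
from fractions import Fraction

import numpy as np


def summarize_runs(aucs, coefs):
    """Mean AUROC, percentile 95% CI, and sign-consistent features over k runs."""
    aucs = np.asarray(aucs, dtype=float)
    coefs = np.asarray(coefs, dtype=float)  # shape (k runs, n features)
    mean_auc = aucs.mean()
    ci_low, ci_high = np.percentile(aucs, [2.5, 97.5])
    # A feature is flagged when its weight has the same nonzero sign in every run.
    same_sign = np.all(coefs > 0, axis=0) | np.all(coefs < 0, axis=0)
    mean_effect = coefs.mean(axis=0)  # mean effect size per feature
    return mean_auc, (ci_low, ci_high), same_sign, mean_effect


def min_runs(n_features, alpha="0.05"):
    """Smallest k with 1/k <= alpha / n_features (Bonferroni), in exact arithmetic."""
    return math.ceil(Fraction(n_features) / Fraction(alpha))


print(min_runs(957))  # 19140
print(0.05 / 957)     # Bonferroni threshold, about 5.22e-05
```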

3 Results

Of the 15,682 study participants included in this analysis, 595 (3.79%) sustained a subsequent sport-related concussion during the study period. Characteristics of the study population are presented in Table 2.

Table 2 Study population characteristics

After preprocessing, our model included 957 binary features (Supplementary Appendices A and B). After splitting the data, our training sets contained 12,539 baseline records (476 positive for sport-related concussion) and our test sets contained 3143 baseline records (119 positive for concussion), on average. Applied to the test sets, our model achieved a mean AUROC of 0.73 [95% CI 0.70–0.76]. Figure 1 presents the mean ROC curve. Model performance varied across sports, as did the number of participants available for the training and test sets (Fig. 2). Although American football had the greatest number of concussions, model performance was strongest for swimming (AUROC = 0.86 [95% CI 0.61–1.00]) and weakest for cheerleading (AUROC = 0.41 [95% CI 0–1.00]), where the model performed no better than chance.
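Per-sport performance can be obtained by grouping test-set predictions by sport and computing AUROC within each group. The pure-Python sketch below uses the ranking interpretation of AUROC from Sect. 2.5.2 (the fraction of positive-negative pairs ranked correctly, ties counting one half), which equals the area under the ROC curve. Sport names and scores are hypothetical. The sketch also shows why sports with few concussions give very wide intervals: their AUROC rests on very few positive-negative pairs, and it is undefined when a test set contains no concussions.

```python
from collections import defaultdict


def pairwise_auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        return None  # undefined without both concussed and non-concussed participants
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def auroc_by_sport(scores, labels, sports):
    """Group test-set predictions by sport and compute AUROC within each group."""
    groups = defaultdict(lambda: ([], []))
    for score, label, sport in zip(scores, labels, sports):
        groups[sport][0].append(score)
        groups[sport][1].append(label)
    return {sport: pairwise_auroc(s, lab) for sport, (s, lab) in groups.items()}


# Hypothetical test-set risk scores, outcomes, and sports.
scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7]
labels = [1, 0, 1, 0, 0, 0]
sports = ["swimming", "swimming", "swimming", "swimming", "cheerleading", "cheerleading"]
print(pairwise_auroc(scores, labels))          # 1.0
print(auroc_by_sport(scores, labels, sports))  # {'swimming': 1.0, 'cheerleading': None}
```

In practice `sklearn.metrics.roc_auc_score` returns the same value and scales better; the double loop only makes the ranking interpretation explicit.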

The mean effect sizes of the features reaching statistical significance are illustrated in Fig. 3. Features with a negative effect size can be interpreted as protective (i.e., they reduce the estimated risk of concussion), while features with a positive effect size are risk factors (i.e., they increase the estimated risk of concussion). Of the 259 statistically significant features in the model, 179 are protective and 80 are risk factors. The most heavily weighted risk and protective factors reaching statistical significance in each variable category are listed in Table 3.
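The bookkeeping behind the protective/risk counts and the per-category ranking in Table 3 can be sketched as follows, assuming each feature's mean effect size, significance flag, and variable category are available as dictionaries. The feature names, categories, and values are invented placeholders, not study results.

```python
def summarize_significant(mean_effect, significant, category):
    """Split significant features into risk (>0) and protective (<0) lists and
    keep the largest-|effect| feature of each kind within each category."""
    risk, protective, top = [], [], {}
    for feat, es in mean_effect.items():
        if not significant[feat] or es == 0:
            continue
        kind = "risk" if es > 0 else "protective"
        (risk if es > 0 else protective).append(feat)
        key = (category[feat], kind)
        if key not in top or abs(es) > abs(mean_effect[top[key]]):
            top[key] = feat
    return risk, protective, top


# Hypothetical placeholder inputs.
mean_effect = {"f1": 0.40, "f2": -0.30, "f3": 0.10, "f4": -0.55, "f5": -0.20, "f6": 0.05}
significant = {"f1": True, "f2": True, "f3": True, "f4": True, "f5": True, "f6": False}
category = {"f1": "demographic", "f2": "demographic",
            "f3": "sport", "f4": "sport", "f5": "sport", "f6": "sport"}

risk, protective, top = summarize_significant(mean_effect, significant, category)
print(len(protective), len(risk))  # 3 2
```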

Table 3 Statistically significant risk and protective factors of greatest mean effect size (mES) by variable category

4 Discussion

Concussion is a serious injury affecting athletes and military service members that can be associated with significant short- and long-term morbidity [36, 37]. Since many concussions occur during sport participation, the ability to identify athletes at elevated risk for sustaining sport-related concussion would be of significant clinical value. Early identification of at-risk athletes could allow sports medicine providers to apply more targeted preventive measures, educational interventions, and injury surveillance strategies, and could influence injury management to prevent re-injury. To this end, we sought to develop a risk-stratification tool capable of identifying athletes and cadets at elevated risk for sport-related concussion. To our knowledge, this is the first successful attempt to develop a model for predicting concussions from prospectively collected baseline data. Using only data elements collected during athletes’ and cadets’ baseline pre-participation examinations, we developed a risk-stratification model that was able to predict those who would go on to sustain a sport-related concussion during the same academic year with an overall mean AUROC of 0.73. This is remarkable considering that the model’s ability to predict future concussions falls within the range of sensitivities and specificities demonstrated by existing concussion assessment tools for distinguishing already-concussed from non-concussed athletes [38,39,40,41]. The variability in model performance across sports is not surprising given the large differences across sports in the number of participating athletes and in the number of observed concussions.

While the primary goal of this study was to develop a predictive algorithm for sport-related concussion, evaluation of heavily weighted factors in the model can provide insight and generate hypotheses regarding potential concussion risk and protective factors. In a recent evidence-based systematic review of risk factors for sport concussion, Abrahams et al. identified only two high-certainty concussion risk factors: having sustained two or more previous concussions and match vs. practice play [12]. Of these, only concussion history is a baseline characteristic. Not surprisingly, the current investigation also identified concussion history as a significant predictor of subsequent sport-related concussion, and not having a concussion history as protective. With respect to concussion history, recent work has demonstrated that sustaining a first concussion at an earlier age is associated with a higher number of subsequent concussions [42]. Similarly, while more specific to repetitive sport-associated head impact exposure than diagnosed concussions, other evidence in retired professional American football athletes has implicated earlier age of first exposure to tackle football as a potential risk factor for adverse long-term neuroanatomical, neurocognitive, and neuropsychiatric outcomes [43,44,45,46]. It is, therefore, interesting to note in the present investigation that earlier age of first concussion did not receive a significant weight in the model. Other results from the CARE Consortium and elsewhere have similarly failed to identify an association between age of first exposure and outcomes during adolescence/young adulthood [47,48,49].

In addition to those associated with concussion history, some other heavily weighted factors in this study came as little surprise. Concussion rates are consistently found to differ across sports, with contact and collision sports associated with greater concussion risk than sports with limited or no contact [50,51,52]. In keeping with previous research [26, 53], this investigation also found that a number of contact/collision sports were associated with increased concussion risk (e.g., soccer, basketball, American football, water polo, rugby, diving, lacrosse), while a number of non-contact sports were associated with decreased risk (e.g., cross-country/track, rowing, swimming, baseball, golf, field events, tennis). Additionally, this study identified female sex as a concussion risk factor, with male sex being protective. This sex-based result is consistent with previous research comparing concussion rates between males and females in sex-comparable sports with similar rules and physicality [13, 53, 54], and the present study extends this finding by analyzing concussion risk across all sports.

It is interesting to note that reporting higher baseline symptoms (3–5 positive symptoms on the SCAT Symptom Checklist; SCAT Symptom Severity Score 8–86) tended to receive positive weights in the model, while lower baseline concussion symptoms (SCAT Symptom Severity Score of 2–3) tended to receive negative weights. In addition, several significant predictive factors identified in this study are associated with neurological comorbidities generally considered to represent concussion-modifying factors. For example, the model identified the presence of migraine headache at baseline, either by self-report of a prior medical diagnosis or by a positive response to the ID Migraine Questionnaire [55], as a heavily weighted concussion risk factor, while the absence of migraine headaches was protective. These findings offer prospective evidence consistent with previous speculation that migraine headache is a potential concussion risk factor [56, 57]. Similarly, the model identified a history of ADHD [58] or learning disability at baseline as associated with higher risk, while the absence of both conditions at baseline was protective. While a comparable trend was present for history of depression, only the absence of depression at baseline reached statistical significance as a protective factor after Bonferroni correction. It is also interesting to note that while a normal Hospital Anxiety and Depression Scale (HADS)-Anxiety score at baseline was a significant protective factor, HADS-Anxiety scores in the abnormal or borderline-abnormal ranges failed to reach statistical significance as risk factors.

It is important to recognize that a feature’s significance as a risk or protective factor in this predictive statistical model does not imply causality. In fact, it is likely that a number of significant predictive factors in the model are correlated with concussion risk in the absence of any clinically interpretable causal relationship. For example, an abnormal Anxiety score on the BSI-18 at baseline was identified as a significant protective factor, in contrast to the decreased risk associated with normal HADS-Anxiety scores described above. Furthermore, the model identified being right-handed as a significant protective factor, not using marijuana or alcohol in the previous month as significant risk factors, and missing answers to the marijuana and alcohol questions as significantly protective. In addition, the model identified many academic features as significant predictors of sport-related concussion. For example, several factors corresponding to higher academic performance (e.g., having a high-school GPA in the 4th quintile or a collegiate GPA in the 4th or 5th quintile; having an ACT-Math score in the 4th or 5th quintile, an ACT-Science score in the 3rd or 5th quintile, or an ACT-Reading score in the 4th quintile) were associated with lower risk of concussion, while having a total ACT score in the 1st quintile, corresponding to lower academic performance, was associated with higher risk. The model also identified several significant factors associated with age. Specifically, freshman class status and being in the lowest age quintile (age ≤ 18 years) were significant concussion risk factors, while senior and “fifth-year-senior” class status as well as being in the fourth or fifth age quintiles (20 years < age ≤ 21 years; 21 years < age ≤ 30 years) were identified as being protective.
One might hypothesize the age/class-status trend to be associated either with physical maturation over a collegiate career or a “weed-out” effect with some athletes who sustained early concussions not continuing to compete through their entire academic careers. However, in many cases significant model observations fail to suggest clinically relevant hypotheses.

It is also noteworthy that a high degree of collinearity was present among many of the baseline variables assessed. As such, the significance of some heavily weighted factors in the model is likely attributable to their association with other, more obvious risk or protective factors. For example, the observation that wearing protective gear and wearing a mouth guard are both concussion risk factors is likely attributable to protective equipment and mouth guards being worn more frequently in higher-risk contact sports. Similarly, the observation that African-American race and weight in the 5th quintile are concussion risk factors may be attributable to disproportionate participation in American football among African-American participants (about one-third of African-Americans in the study population were football players) and among athletes in the highest weight quintile (about half of the athletes in this weight quintile were football players).

Ultimately, individual features, regardless of their weight, must be interpreted in the context of the entire model and not as independent risk or predictive factors. As such, it cannot be assumed based on the present results that intervening on a potentially modifiable risk or protective factor would necessarily influence an athlete’s future concussion risk. Such a conclusion would require additional support from a prospective intervention trial. Identification of heavily weighted factors, especially when unanticipated, should prompt novel hypothesis generation and future concussion research. For example, based on the greater risk of sport-related concussion observed in younger freshman athletes and cadets, it would be reasonable for future research to investigate the more gradual incorporation of incoming freshman into varsity collegiate sport participation as a concussion risk reduction strategy.

This study was not without limitations. Despite likely differences in the mechanisms of injury causing concussions in different sports, as well as differences between the varsity and non-varsity levels of play, we were not able to develop sport-specific models. This was largely due to variability in the number of athletes participating in different sports and a small number of positive examples (i.e., concussions) in some sports, leading to overfitted models that did not generalize well to test data. Other factors were also likely at play, given that American football, which included more participants than any other single varsity sport as well as the greatest number of concussions, had an AUROC below that of the full model. Given these challenges in developing sport-specific models, we employed a model trained using data from all sports at both the varsity and non-varsity levels of play. With additional data, future research could seek to develop sport- and level-of-play-specific models, as well as models specific to military cadets, which might have greater predictive ability. Another study limitation was the presence of “missing data” and, for many variables, the potential for non-values to be coded in more than one way. For example, a non-value could be coded as “skipped” (i.e., a “skip this question” response was selected/provided), “missing” (i.e., no response was selected/provided), “unknown” (i.e., a “don’t know” response was selected/provided), or “N/A” (i.e., the question did not apply to the athlete/cadet; e.g., collegiate GPA for an incoming freshman, helmet type for an athlete participating in a non-helmeted sport, or results of a “Level B” measure or computerized concussion test not used at the athlete’s/cadet’s institution).
Given that missing data rates varied greatly across features (from a minimum of 0.6% to a maximum of 99.9%), we elected to analyze the dataset retaining all potential non-value options to account for the possibility that they might contain predictive information. However, it is challenging to develop clinically relevant hypotheses for most significant non-value features identified in the model. It is also possible that some inaccurate self-reported information remained in the dataset despite our careful review and attempt to remove obviously incorrect data; because it would be impractical to independently verify all self-reported information collected during CARE baseline assessments, self-reporting errors are possible. Furthermore, while we attempted to discretize continuous variables using clinically relevant cutoffs when available, our procedure for mapping other continuous variables to discrete sets of binary features using quintile ranges may not have captured clinically important or statistically significant cutoffs in some cases. We elected to use quintiles because of their standardization and reproducibility, but other discretization strategies may have yielded different results. Next, while the effect size values reported in Supplementary Appendices A and B are meaningful for comparing the relative model weights between features in the present study, they cannot be interpreted outside the context of this study in the same way as standardized effect sizes such as Cohen’s d. In addition, these results should not be extrapolated beyond a collegiate athlete/military cadet population. Even within a population of collegiate athletes and military cadets, it would be challenging to apply this model outside of the CARE Consortium study given the number of baseline variables used as model inputs, many of which are not routinely collected outside of the CARE study protocol.
Future work should build on the present results to develop more streamlined models utilizing a subset of the most predictive features so that the models can more easily be applied in routine clinical settings. The results should also not be extrapolated to non-sport concussions. In fact, because this study focused only on sport-related concussion, some participants considered negative for the primary study outcome may have sustained concussions outside of sport participation during the study period. Lastly, given potential athlete under-reporting of concussion symptoms and the absence of an objective confirmatory test for concussion, the concussion diagnosis relies on both accurate injury identification and the clinical impression of the evaluating medical provider. While missed injuries and diagnostic uncertainty are common issues across all concussion research, the use of a standard concussion definition at all CARE Consortium sites should help mitigate diagnostic uncertainty. Nonetheless, any diagnostic inaccuracy is undesirable when developing a data-driven risk prediction model, because the analysis relies heavily on accurate classification of “case-ness”; misclassification makes both injury prediction and the evaluation of those predictions more challenging.
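The dependence of evaluation on accurate case labels can be made concrete with the rank-based (Mann-Whitney) form of AUROC, which equals the probability that a randomly chosen concussed participant receives a higher risk score than a randomly chosen non-concussed participant, with ties counted as one half. The function below is a minimal sketch under that definition; the scores and labels are made up, and it is not a reproduction of the authors' own AUROC computation.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC: P(score of a case > score of a control), ties = 0.5.

    Pairwise O(n_cases * n_controls) version, written for clarity, not speed.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical model risk scores for six athletes and recorded concussion labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
recorded = [1, 0, 1, 0, 0, 0]
print(auroc(scores, recorded))

# Suppose the athlete scored 0.8 had an unreported concussion: relabeling that
# single participant changes the estimate.
corrected = [1, 1, 1, 0, 0, 0]
print(auroc(scores, corrected))
```

In this toy example, relabeling one missed injury moves the estimate from 0.875 to 1.0 without any change to the model, which illustrates why diagnostic inaccuracy complicates evaluation as well as training.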

5 Conclusion

This collaborative data-driven study leverages powerful analytical techniques and a robust clinical dataset to develop a novel model for predicting collegiate athletes’ and military cadets’ risk of sustaining sport-related concussion. This is clinically important because, to our knowledge, it represents the first successful attempt to predict sport-related concussion using only baseline data. As such tools are developed and refined, clinicians will increasingly need to determine how to apply them in clinical practice. Might a future model perform so accurately that one day certain athletes would be restricted from participating in certain sports due to the amount of model-predicted risk? For now, this study suggests it is feasible to identify athletes at elevated risk of sport-related concussion in whom targeted prevention, education, and injury surveillance strategies may be employed. Furthermore, this study offers insight into potential risk and protective factors for sport-related concussion, generating novel hypotheses for future concussion research. Future work is needed to develop a predictive model that can be easily applied in routine clinical practice, as well as to identify modifiable concussion risk factors that can be intervened upon to modify an athlete’s injury risk.

References

Langlois JA, Rutland-Brown W, Wald MM. The epidemiology and impact of traumatic brain injury: a brief overview. J Head Trauma Rehabil. 2006;21(5):375–8.
National Collegiate Athletic Association. Number of NCAA college athletes reaches all-time high. 2018.
US Department of Defense. Special report—traumatic brain injury. 2015.
BrainLine. How many service members have sustained a TBI? Defense Medical Surveillance System; 2018.
Guskiewicz KM, et al. Recurrent concussion and risk of depression in retired professional football players. Med Sci Sports Exerc. 2007;39(6):903–9.
Guskiewicz KM, et al. Association between recurrent concussion and late-life cognitive impairment in retired professional football players. Neurosurgery. 2005;57(4):719–26 (discussion 719–2).
McKee AC, et al. Chronic traumatic encephalopathy in athletes: progressive tauopathy after repetitive head injury. J Neuropathol Exp Neurol. 2009;68(7):709–35.
McKee AC, et al. The spectrum of disease in chronic traumatic encephalopathy. Brain. 2013;136(Pt 1):43–64.
McKee AC, et al. The neuropathology of chronic traumatic encephalopathy. Brain Pathol. 2015;25(3):350–64.
Mez J, et al. Clinicopathological evaluation of chronic traumatic encephalopathy in players of American Football. JAMA. 2017;318(4):360–70.
Caccese JB, et al. Does baseline concussion testing aid in identifying future concussion risk? Res Sports Med. 2019:1–6.
Abrahams S, et al. Risk factors for sports concussion: an evidence-based systematic review. Br J Sports Med. 2014;48(2):91–7.
Harmon KG, et al. American Medical Society for Sports Medicine position statement: concussion in sport. Br J Sports Med. 2013;47(1):15–26.
Kerr HA. Concussion risk factors and strategies for prevention. Pediatr Ann. 2014;43(12):e309–15.
Noble JM, Hesdorffer DC. Sport-related concussions: a review of epidemiology, challenges in diagnosis, and potential risk factors. Neuropsychol Rev. 2013;23(4):273–84.
Finnoff JT, Jelsing EJ, Smith J. Biomarkers, genetics, and risk factors for concussion. PM R. 2011;3(10 Suppl 2):S452–9.
Hollis SJ, et al. Incidence, risk, and protective factors of mild traumatic brain injury in a cohort of Australian nonprofessional male rugby players. Am J Sports Med. 2009;37(12):2328–33.
Schneider KJ, et al. Preseason reports of neck pain, dizziness, and headache as risk factors for concussion in male youth ice hockey players. Clin J Sport Med. 2013;23(4):267–72.
Curran-Sills G, Abedin T. Risk factors associated with injury and concussion in sanctioned amateur and professional mixed martial arts bouts in Calgary, Alberta. BMJ Open Sport Exerc Med. 2018;4(1):e000348.
Yeung A, Munjal V, Virji-Babul N. Development of the sports organization concussion risk assessment tool (SOCRAT). Brain Inj. 2017;31(4):542–9.
Baugh CM, et al. Frequency of head-impact-related outcomes by position in NCAA division I collegiate football players. J Neurotrauma. 2015;32(5):314–26.
Black AM, et al. Policy change eliminating body checking in non-elite ice hockey leads to a threefold reduction in injury and concussion risk in 11- and 12-year-old players. Br J Sports Med. 2016;50(1):55–61.
Blumenfeld RS, et al. The epidemiology of sports-related head injury and concussion in water polo. Front Neurol. 2016;7:98.
Hollis SJ, et al. Mild traumatic brain injury among a cohort of rugby union players: predictors of time to injury. Br J Sports Med. 2011;45(12):997–9.
Teramoto M, et al. Style of play and rate of concussions in the National Football League. Orthop J Sports Med. 2015;3(12):2325967115620365.
Collins CL, et al. Neck strength: a protective factor reducing risk for concussion in high school sports. J Prim Prev. 2014;35(5):309–19.
Dretsch MN, et al. Evaluating the clinical utility of the Validity-10 for detecting amplified symptom reporting for patients with mild traumatic brain injury and comorbid psychological health conditions. Appl Neuropsychol Adult. 2017;24(4):376–80.
Van Pelt KL, et al. A cohort study to identify and evaluate concussion risk factors across multiple injury settings: findings from the CARE Consortium. Inj Epidemiol. 2019;6(1):1.
Schulz MR, et al. Incidence and risk factors for concussion in high school athletes, North Carolina, 1996–1999. Am J Epidemiol. 2004;160(10):937–44.
Broglio SP, et al. A national study on the effects of concussion in collegiate athletes and US military service academy members: the NCAA-DoD concussion assessment, research and education (CARE) consortium structure and methods. Sports Med. 2017;47(7):1437–51.
Carney N, et al. Concussion guidelines step 1: systematic review of prevalent indicators. Neurosurgery. 2014;75(Suppl 1):S3–15.
Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.
Ustun B, Traca S, Rudin C. Supersparse linear integer models for interpretable classification. 2014. arXiv:1306.6677.
Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2. Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc.; 1995. p. 1137–43.
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
Wasserman EB, et al. Academic dysfunction after a concussion among US High School and College Students. Am J Public Health. 2016;106(7):1247–53.
Manley G, et al. A systematic review of potential long-term effects of sport-related concussion. Br J Sports Med. 2017;51(12):969–77.
Barlow M, et al. Differences in change scores and the predictive validity of three commonly used measures following concussion in the middle school and high school aged population. Int J Sports Phys Ther. 2011;6(3):150–7.
Randolph C, et al. Concussion symptom inventory: an empirically derived scale for monitoring resolution of symptoms following sport-related concussion. Arch Clin Neuropsychol. 2009;24(3):219–29.
Barr WB, McCrea M. Sensitivity and specificity of standardized neurocognitive testing immediately following sports concussion. J Int Neuropsychol Soc. 2001;7(6):693–702.
Mucha A, et al. A Brief Vestibular/Ocular Motor Screening (VOMS) assessment to evaluate concussions: preliminary findings. Am J Sports Med. 2014;42(10):2479–86.
Schmidt JD, et al. Age at first concussion influences the number of subsequent concussions. Pediatr Neurol. 2018;81:19–24.
Stamm JM, et al. Age at first exposure to football is associated with altered corpus callosum white matter microstructure in former professional football players. J Neurotrauma. 2015;32(22):1768–76.
Stamm JM, et al. Age of first exposure to football and later-life cognitive impairment in former NFL players. Neurology. 2015;84(11):1114–20.
Alosco ML, et al. Age of first exposure to American football and long-term neuropsychiatric and cognitive outcomes. Transl Psychiatry. 2017;7(9):e1236.
Schultz V, et al. Age at first exposure to repetitive head impacts is associated with smaller thalamic volumes in former professional American football players. J Neurotrauma. 2018;35(2):278–85.
Caccese JB, et al. Estimated age of first exposure to American football and neurocognitive performance amongst NCAA male student-athletes: a cohort study. Sports Med. 2019;49(3):477–87.
Caccese JB, et al. Estimated age of first exposure to contact sports is not associated with greater symptoms or worse cognitive functioning in male U.S. Service Academy Athletes. J Neurotrauma. 2020;37(2):334–9.
Brett BL, et al. Age of first exposure to American football and behavioral, cognitive, psychological, and physical outcomes in high school and collegiate football players. Sports Health. 2019;11(4):332–42.
Koh JO, Cassidy JD, Watkinson EJ. Incidence of concussion in contact sports: a systematic review of the evidence. Brain Inj. 2003;17(10):901–17.
Hootman JM, Dick R, Agel J. Epidemiology of collegiate injuries for 15 sports: summary and recommendations for injury prevention initiatives. J Athl Train. 2007;42(2):311–9.
Marar M, et al. Epidemiology of concussions among United States high school athletes in 20 sports. Am J Sports Med. 2012;40(4):747–55.
Giza CC, et al. Summary of evidence-based guideline update: evaluation and management of concussion in sports: Report of the Guideline Development Subcommittee of the American Academy of Neurology. Neurology. 2013;80(24):2250–7.
Dick RW. Is there a gender difference in concussion incidence and outcomes? Br J Sports Med. 2009;43(Suppl 1):i46–50.
Lipton RB, et al. A self-administered screener for migraine in primary care: the ID Migraine validation study. Neurology. 2003;61(3):375–82.
Eckner JT, et al. Is migraine headache associated with concussion in athletes? A case-control study. Clin J Sport Med. 2017;27(3):266–70.
Gordon KE, Dooley JM, Wood EP. Is migraine a risk factor for the development of concussion? Br J Sports Med. 2006;40(2):184–5.
Alosco ML, Fedor AF, Gunstad J. Attention deficit hyperactivity disorder as a risk factor for concussions in NCAA division-I athletes. Brain Inj. 2014;28(4):472–4.

Acknowledgements

This project was supported, in part, by the Grand Alliance Concussion Assessment, Research, and Education (CARE) Consortium, funded, in part, by the National Collegiate Athletic Association (NCAA) and the Department of Defense (DOD). The U.S. Army Medical Research Acquisition Activity, 820 Chandler Street, Fort Detrick MD 21702-5014, is the awarding and administering acquisition office. This work was supported by the Office of the Assistant Secretary of Defense for Health Affairs through the Combat Casualty Care Program, endorsed by the Department of Defense under Award No. W81XWH-BA170608. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the Office of the Assistant Secretary of Defense for Health Affairs.

The authors thank Kaitlyn Carter (Azusa Pacific University); Jennifer Brewington Dickerson, Jody Harland, Nicole Johnson, Janetta Matesan, Nicholas Port, Larry Riggen (Indiana University); Margot Putukian (Princeton University); Gerald McGinty (United States Air Force Academy); Patrick G. O’Donnell, Carlos Esteves (United States Coast Guard Academy); Ken Cameron (United States Military Academy); Tom Kaminski (University of Delaware); Julianne Schmidt (University of Georgia); Josh Goldman (University of California Los Angeles); Ashley Rettmann (University of Michigan); Kevin Guskiewicz (University of North Carolina at Chapel Hill); Scott Anderson (University of Oklahoma); Jeffery J Bazarian (University of Rochester); Sara Chrisman (University of Washington); Alison Brooks (University of Wisconsin); Stefan Duma (Virginia Polytechnic Institute and State University); and research and medical staff at each of the CARE participation sites.

CARE Consortium Investigators are listed alphabetically by institution: April (Reed) Hoy, Azusa Pacific University; Louise Kelly, California Lutheran University; Jonathan Jackson, United States Air Force Academy; Tim Kelly, United States Military Academy; Thomas Buckley, University of Delaware; James (Jay) R. Clugston, University of Florida; Justus Ortega, Humboldt State University; Anthony Kontos, University of Pittsburgh; Christopher C. Giza, University of California Los Angeles; Jason Mihalik, University of North Carolina at Chapel Hill; Steve Rowson, Virginia Polytechnic Institute and State University.

Author information

Joel Castellanos
Present address: Anesthesiology, School of Medicine, University of California San Diego, San Diego, CA, USA
Cheng Perng Phoo
Present address: Computer Science, Cornell University, New York, USA
Joel Castellanos and Cheng Perng Phoo are co-first authors.

Authors and Affiliations

Department of Physical Medicine and Rehabilitation, Michigan Medicine, University of Michigan, 325 E. Eisenhower Parkway, Ann Arbor, MI, 48108, USA
Joel Castellanos, James T. Eckner & Lea Franco
Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
Cheng Perng Phoo & Jenna Wiens
Kinesiology, University of Michigan, Ann Arbor, MI, USA
Steven P. Broglio
Neurosurgery, Medical College of Wisconsin, Milwaukee, WI, USA
Mike McCrea
Psychiatry, Indiana University, Indianapolis, IN, USA
Thomas McAllister

Consortia

The CARE Consortium Investigators

April Hoy, Louise Kelly, Jonathan Jackson, Tim Kelly, Thomas Buckley, James R. Clugston, Justus Ortega, Anthony Kontos, Christopher C. Giza, Jason Mihalik & Steve Rowson

Corresponding author

Correspondence to James T. Eckner.

Ethics declarations

Funding

This project was supported, in part, by the Grand Alliance Concussion Assessment, Research, and Education (CARE) Consortium, funded, in part, by the National Collegiate Athletic Association (NCAA) and the Department of Defense (DOD). The U.S. Army Medical Research Acquisition Activity, 820 Chandler Street, Fort Detrick MD 21702-5014, is the awarding and administering acquisition office. This work was supported by the Office of the Assistant Secretary of Defense for Health Affairs through the Combat Casualty Care Program, endorsed by the Department of Defense under Award No. W81XWH-BA170608. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the Office of the Assistant Secretary of Defense for Health Affairs.

Conflict of interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript: Castellanos and Wiens. The authors whose names are listed immediately below certify that grant support for this project was received from the Grand Alliance Concussion Assessment, Research, and Education (CARE) Consortium, funded, in part, by the NCAA and the DoD: Phoo, Eckner, Franco, Broglio, McCrea, and McAllister. The authors whose names are listed immediately below also certify that travel support was provided by the Grand Alliance Concussion Assessment, Research, and Education (CARE) Consortium, funded, in part, by the NCAA and the DoD: Eckner, Franco, Broglio, and McCrea. The authors whose names are listed immediately below certify additional disclosures that do not have financial interest in the subject matter discussed in this manuscript, as detailed in their author declaration forms: Eckner (patent, grant funding), Broglio (consultation, expert testimony, grant funding, advisory and editorial boards), and McCrea (grant funding, consultation).

Ethics approval

Institutional Review Board (IRB) approval was obtained at the University of Michigan (lead study site), with US Department of Defense Human Research Protection Office approval as well as local IRB approval at each participating site. This study was performed in accordance with the standards of ethics outlined in the Declaration of Helsinki.

Consent to participate

All participants provided informed written consent.

Consent for publication

Not applicable. No identifiable information or images are included in this publication.

Availability of data

CARE Consortium data are publicly available upon request from the Federal Interagency Traumatic Brain Injury Research (FITBIR) Informatics System.

Code availability

The authors are willing to provide the data analysis code upon written request.

Author contributions

Dr. Castellanos contributed to the conception and design of the work; data interpretation; drafting and revision of the manuscript. He approved the final published version and agreed to be accountable for all aspects of the work. Mr. Phoo contributed to the design of the work; data analysis and interpretation; drafting and revision of the manuscript. He approved the final published version and agreed to be accountable for all aspects of the work. Dr. Eckner contributed to the conception and design of the work; data acquisition, analysis, and interpretation; drafting and revision of the manuscript. He approved the final published version and agreed to be accountable for all aspects of the work. Ms. Franco contributed to data acquisition and interpretation; critical revision of the manuscript for intellectual content. She approved the final published version and agreed to be accountable for all aspects of the work. Drs. Broglio, McCrea, and McAllister contributed to the design of the work; data interpretation; critical revision of the manuscript for intellectual content. They approved the final published version and agreed to be accountable for all aspects of the work. Dr. Wiens contributed to the conception and design of the work; data analysis and interpretation; drafting and revision of the manuscript. She approved the final published version and agreed to be accountable for all aspects of the work. The CARE Consortium Investigators (Ms. Hoy, Dr. Kelly, Dr. Jackson, Mr. Kelly, Dr. Buckley, Dr. Clugston, Dr. Ortega, Dr. Kontos, Dr. Giza, Dr. Mihalik, Dr. Rowson) contributed to data acquisition and interpretation and critical revision of the manuscript for intellectual content. They approved the final published version and agreed to be accountable for all aspects of the work.

Additional information

The members of The CARE Consortium Investigators are listed in the “Acknowledgements” section.

This article is part of a collection on The NCAA-DoD Concussion Assessment, Research and Education (CARE) Consortium.

Electronic supplementary material

Supplementary Appendix A (XLSX 73 kb)

Supplementary Appendix B (XLSX 74 kb)

Supplementary Appendix C (DOCX 45 kb)

Supplementary Appendix D (DOCX 18 kb)

Supplementary Appendix E (DOCX 176 kb)

About this article

Cite this article

Castellanos, J., Phoo, C.P., Eckner, J.T. et al. Predicting Risk of Sport-Related Concussion in Collegiate Athletes and Military Cadets: A Machine Learning Approach Using Baseline Data from the CARE Consortium Study. Sports Med 51, 567–579 (2021). https://doi.org/10.1007/s40279-020-01390-w

Published: 24 December 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s40279-020-01390-w