Introduction

The Joan C. Edwards School of Medicine is a relatively young medical school, created under the Teague-Cranston Act and graduating its first class in 1982. The mission of this school is to train a workforce for West Virginia and central Appalachia. Our core mission, which includes training students from this region who are likely to practice here, means that candidates are selected from a small population, making it more difficult to determine which applicants can succeed scholastically. Because of this, we are extremely interested in identifying students who may need additional academic coaching and other forms of help in order to pass their courses and, ultimately, their licensure examinations.

Several recent publications challenge the heavy reliance on prematriculation scores such as MCAT and science GPAs as indicative or predictive of successful performance on in-house medical school exams, national licensure exams, and successful academic medical careers [1–4]. However, some studies have suggested that preadmissions data may indeed be valid positive predictors of future clinical performance [2, 5–8]. While prematriculation data may certainly be useful for admissions committees when deciding upon their entering class, their utility as predictors of negative performance on national licensing exams is unclear and requires further large-scale analysis [9].

Proposed factors that may strongly influence future academic performance for medical students range from prematriculation benchmarks, undergraduate GPAs, and performance on internal exams to study habits and use of social networking [10–13]. There is no shortage of variables that are potential predictors of future success, and medical school admissions officers are keenly aware of the limitations of heavy reliance on prematriculation data for their requirements [14]. It is clear, however, that for these predictors to be useful, they must be available early enough in a student's educational development to benefit at-risk students. How late is too late is difficult to determine, but a continuous assessment process using predictive algorithms may be more useful than a "one-off" first- or second-year determination. To better identify such students, we undertook a study to objectively determine the strongest set of predictors, from a large set of preadmission and medical school performance variables, of the outcome of the high-stakes national exam, step 1. The main focus of this work is determining predictors for step 1 and providing useful interventions for students at risk of poor performance on this exam. However, predictions for step 2 were also calculated and are discussed briefly. We introduce the utility of a continuous, student-specific, data-driven process that allows administrators to track student performance at any time during the first two preclinical years.

Methods

Students who matriculated at the Joan C. Edwards School of Medicine between 2008 and 2012 were de-identified and studied. Preadmission data were extracted from the American Medical College Application Service (AMCAS) database for students in the five matriculation years who had subsequently taken United States Medical Licensing Examination (USMLE) step 1 (n = 344). Scores on institutionally developed multiple-choice exams were reported from the school's in-house learning management system. Results on NBME basic sciences subject examinations and the Comprehensive Basic Science Self-Assessment (CBSSA) were retrieved from the NBME secured website. Medical College Admission Test (MCAT) scores were reported in the categories of verbal reasoning (VR), Physical Science (PS), Biological Science (BS), and total (T). The analysis of MCAT reports used either the best, the first, or the lowest MCAT scores. Undergraduate grade-point averages were reported as total (UGGPAT) and limited to math and science courses (UGMS-GPA). Results from the subject-specific shelf clinical sciences examinations were retrieved from the NBME secured website.
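
Because the analysis used either the best, the first, or the lowest MCAT scores, the following is a minimal sketch, for illustration only, of how such per-student summaries could be derived from a table of test attempts; the column names ("student_id", "test_date", "mcat_total") are assumptions and not fields from the actual AMCAS extract.

```python
# Illustrative only: derive best / first / lowest total MCAT per student from
# a table of attempts. Column names are hypothetical.
import pandas as pd

def summarize_mcat(attempts: pd.DataFrame) -> pd.DataFrame:
    attempts = attempts.sort_values(["student_id", "test_date"])
    grouped = attempts.groupby("student_id")["mcat_total"]
    return pd.DataFrame({
        "mcat_first": grouped.first(),    # earliest attempt on record
        "mcat_best": grouped.max(),       # highest total across attempts
        "mcat_lowest": grouped.min(),     # lowest total (used in several models below)
    })
```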

A total of 22 preadmissions and 15 medical school variables were considered in our analyses (see Table 1). Medical school data were further divided into MS1 and MS2 years: MS1 exam data were calculated from 198 students, and MS2 data (e.g., exams, NBME subject exams, and CBSSA exams) were calculated from 344 students. The difference in numbers is attributed to a change in curriculum that took place during this period and was implemented initially in the MS2 year, resulting in two class years for which we have no MS1 exam data (the inaugural MS2 class and the class who were MS1s during that same year and were promoted to MS2s the following year). Thus, our student numbers are smaller when our predictive calculations include MS1 exam scores as a variable. Students who were exposed to our new integrated curriculum are not included in these analyses, as their internal exams were dramatically different. It is also important to note that national exam scores were taken from first attempts only; scores from repeat attempts were not included in the analysis. The focus of our analysis is exclusively on predicting poor performance on the step 1 and step 2 national exams.
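
As a concrete illustration of the first-attempt restriction described above, a filter of the following form could be applied to a table of USMLE sittings; the column names are hypothetical and not taken from the actual dataset.

```python
# Illustrative only: keep each student's first sitting of each USMLE step.
import pandas as pd

def first_attempts_only(usmle: pd.DataFrame) -> pd.DataFrame:
    usmle = usmle.sort_values(["student_id", "exam", "test_date"])
    # After sorting by date, the first row per (student, exam) is the first attempt.
    return usmle.groupby(["student_id", "exam"], as_index=False).first()
```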

Table 1 Total number of preadmissions variables considered in prediction analysis from students admitted to the JCESOM from 2008 to 2012

Biomedical Science Students (BMS) were those who strengthened their undergraduate studies with 1 or 2 years of graduate studies before entering the medical program. A total of 20 students in the BMS program from 2009 to 2012 were included in some of the analyses. Analysis of data from BMS students included the use of a Student's t test to compare the means between BMS students and non-BMS students for the following variables: Math/Science GPA, lowest total MCAT, step 1, and step 2. No linear regression analysis was performed with biomedical science students, as their numbers were too low to yield statistically meaningful results. The comparison between the BMS and non-BMS cohorts was statistically significant at p < 0.05 for both Math/Science GPA and total MCAT scores.
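
A minimal sketch of how the BMS versus non-BMS comparison described above could be run for one variable is shown below, assuming simple arrays of scores per group; the variable names are placeholders, and the default equal-variance Student's t test is assumed.

```python
# Illustrative only: independent-samples Student's t test for one variable
# (e.g., Math/Science GPA) between BMS and non-BMS students.
from scipy import stats

def compare_groups(bms_values, non_bms_values):
    t_stat, p_value = stats.ttest_ind(bms_values, non_bms_values)  # equal-variance t test
    return t_stat, p_value

# Example with hypothetical arrays:
# t, p = compare_groups(bms_math_sci_gpa, non_bms_math_sci_gpa)
```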

MS1 and MS2 student data were subjected to multivariate linear regression using Matlab® (The MathWorks, Natick, Massachusetts, v2014a). Models were varied to include different amounts of data corresponding to times before and after matriculation. The fitting function, "stepwise," was used to develop predictive models with the additional constraints that only positive coefficients were included and that a coefficient was added only if it significantly improved the predictive capability of the model. When models are described, the intercept is a scalar added (or subtracted) to the sum of the products of beta coefficients and variable values. Unless otherwise specified, p values are reported at the p < 0.05 and p < 0.01 levels. Visualization of the data was performed using GraphPad® Prism v6.05.
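
The original models were fit with Matlab's "stepwise" tool; the sketch below is a rough Python approximation of the forward-selection logic described above (add a predictor only if its coefficient is positive, it is statistically significant, and it improves the adjusted R2). It is illustrative only, and the column names in the usage comment are hypothetical.

```python
# Illustrative forward stepwise linear regression, approximating the selection
# constraints described in the text. Not the original MATLAB implementation.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(X: pd.DataFrame, y: pd.Series, p_enter: float = 0.05):
    """Greedily add predictors that carry a positive coefficient, are
    significant at p_enter, and raise the adjusted R2 of the model."""
    selected, best_adj_r2 = [], -np.inf
    improved = True
    while improved:
        improved, best_candidate = False, None
        for col in X.columns.difference(selected):
            fit = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            if fit.params[col] > 0 and fit.pvalues[col] < p_enter and fit.rsquared_adj > best_adj_r2:
                best_adj_r2, best_candidate = fit.rsquared_adj, col
        if best_candidate is not None:
            selected.append(best_candidate)
            improved = True
    return sm.OLS(y, sm.add_constant(X[selected])).fit(), selected

# Example usage with hypothetical columns:
# model, kept = forward_stepwise(df[["UGMS_GPA", "MCAT_low_total", "MS1_exam_pct"]],
#                                df["step1_score"])
# print(model.params, model.rsquared_adj)
```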

This study (IRB Study #78931-1) was approved by the Marshall University Institutional Review Board with exempt status in September 2015.

Results

To assess how well the preadmissions data available for our medical students predicted future negative performance on the USMLE step 1 exam, we examined a total of 22 variables (Table 1). Using the preadmission variables collected, we found that the best linear predictive model was a combination of the lowest MCAT total score and the undergraduate math-science GPA (UGMS-GPA), with an intercept of 158 and beta coefficients of 9.68 for the UGMS-GPA and 1.58 for the lowest total MCAT (both p < 0.01), giving an overall adjusted R2 (AR2) of 0.12. When we include the first medical school exam score (percent correct) in the model, the AR2 increases to 0.28, where the intercept is 81.44 and the beta coefficients are 1.29 for the lowest total MCAT and 1.26 for the exam 1 score (all p < 0.01). GPA, when included in the model, had a high p value and was therefore dropped from the prediction analysis. When we include all grades in year 1, the predictive model has an AR2 of 0.38 and includes the lowest total MCAT as well as performance on all MS1 exams. Thus, our best predictive model for first-year medical students includes two variables from a total of 24, and these variables account for about 38 % of the variance in predicting how well a student will do on the USMLE step 1 exam (see Tables 1 and 2 for the total numbers and types of variables considered). The step 1 prediction data using preadmissions and/or first-year performance results are summarized in Table 4.
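
For readers who prefer the models written out, the two fitted preadmission-stage models reported above correspond to the following equations (coefficients as reported; the adjusted R2 is taken here to be the conventional definition, which is an assumption on the reader's part rather than stated in the text):

predicted step 1 = 158 + 9.68 × (UGMS-GPA) + 1.58 × (lowest total MCAT)   [AR2 = 0.12]

predicted step 1 = 81.44 + 1.29 × (lowest total MCAT) + 1.26 × (exam 1 % correct)   [AR2 = 0.28]

AR2 = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of students and k the number of predictors in the model.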

Table 2 Total number of MS1 variables considered in prediction analysis from students admitted to the JCESOM from 2008 to 2012
Table 3 Total number of MS2 variables considered in prediction analysis from students admitted to the JCESOM from 2008 to 2012
Table 4 USMLE step 1 predictions

In addition, we were also very interested in determining which medical school variables were highly predictive of future negative board performance for students in their second year, recognizing a need for possible remediation of at-risk students at this point as well (see Table 3 for the additional variables considered for MS2 students). If we look at the performance on the first MS2 exam in conjunction with the lowest total MCAT score as well as the scores on all the MS1 exams, the step 1 model predicts with an adjusted R2 of 0.46. This improves to an AR2 of 0.53 when we include the scores of all the MS2 exams in our stepwise linear regression model. However, when we exclude preadmissions values and include the clinical sciences subject (Miniboard) exams given in the second year along with all MS1 and MS2 exams, the prediction improves significantly, to an adjusted R2 of 0.65. Surprisingly, when looking at all the possible exams in the first and second year, the best prediction was derived from just three variables: the Microbiology basic science subject exam, the Pathology basic science subject exam, and the CBSSA exam given at the end of the year. These three alone were able to predict step 1 performance with an adjusted R2 of 0.77. These data suggest that, as a student moves through and completes the second year, preadmission data and many of the exams he/she encounters along the way are not as strong at predicting future step 1 results as the three predictors mentioned. The data also underscore the irrelevance of preadmission values for predicting future step 1 performance once students are in their second year of medical school. These step 1 prediction data for students in their second year are summarized in Table 4. This approach also suggests the utility of providing assistance or information to administrators or to students themselves at various times, instead of focusing on one specific endpoint (e.g., the end of the first or second year), and indicates that the most robust data come from exams taken during the second year.

Table 5 USMLE step 2 CK predictions

Regarding our ability to predict USMLE step 2 CK performance, we found that the lowest total MCAT, the percent score on all MS1 exams, and the percent score on the first MS2 exam had a predictive capacity of 0.32 (AR2). However, the statistical reliability of this comparison was weaker (the p value for the lowest MCAT score was 0.226). Not surprisingly, the prediction improved when we waited until the end of year 2 and used the same variables as above but replaced the first MS2 exam with all MS2 exams. Using results from all MS2 exams, we were able to predict 39 % (AR2) of the variance. Again, this comparison was not statistically reliable (the p value for the lowest MCAT was 0.117 and for all MS1 exam scores was 0.601). However, when we drop the use of any preadmissions values and use two variables only, the percent score on all MS2 exams and the step 1 score, our prediction gives us an adjusted R2 of 0.49 (with highly significant p values). Most interestingly, our predictive capacity goes up significantly when we use a selection of NBME clinical sciences shelf-examination results. Thus, using step 1 scores in addition to four clinical sciences exam results (Family Medicine, Obstetrics and Gynecology, Pediatrics, and Internal Medicine), our adjusted R2 is now 0.62 with a highly significant p value close to zero. It is important to note that this predictive capacity excludes the two additional clinical sciences exams that students take in their second year (Surgery and Psychiatry). Finally, step 2 CK prediction using step 1 alone gives us an adjusted R2 of 0.49 (n = 267 and p value < 0.05). In total, these data are consistent with previous reports showing that prematriculation performance characteristics add very little to the predictive power for step 2 CK. Taken together, these data are summarized in Table 5. Because step 2 is taken toward the end of the third year, the use of data obtained in the second year to inform students at risk is warranted.

Finally, we also looked at students who entered our biomedical sciences (BMS) program. These students had lower mean MCAT (lowest total) and math/science GPA scores (23.5 ± 0.94 and 3.17 ± 0.08) than their non-BMS peers (25.8 ± 0.2 and 3.40 ± 0.02) who entered our program over the same period (p = 0.0093 for lowest total MCAT and p = 0.0088 for Math/Science GPA). Despite being weaker students in these categories, the BMS students did just as well as their non-BMS peers, with average scores of 226.1 (±2.7) and 229.9 (±4.9) on USMLE step 1 and step 2 CK, respectively. The corresponding scores for the non-BMS students were 218.8 (±1.2) and 233.23 (±1.04); neither comparison was statistically significant (p = 0.1453 for step 1 and p = 0.4357 for step 2 CK). These data, although based on a small sample, further suggest the inherent limitations of relying solely on undergraduate GPAs and MCAT scores to predict future medical school performance. BMS student data were not used as a distinct cohort in the multivariate linear regression models due to the small numbers.

Discussion

We are very interested in identifying students at risk of failing their licensure examinations, namely step 1 and step 2 CK. Unfortunately, the preadmission variables we analyzed are not very good at making such identifications, consistent with other publications [3, 15–17]. In contrast, adding a number of medical school performance variables to the model dramatically improves our ability to predict our students' licensure exam performance, and the predictions get stronger as students take more internal exams. To no small degree, this justifies our policy of taking some of our class from the pool of students participating in a master's program, during which they take some medical school courses, despite having preadmission credentials which, on their own, were not competitive for selection (e.g., Biomedical Science Students). Notably, from a total of 37 variables, three have the strongest prediction for the USMLE step 1 exam at the end of year 2, and five have the strongest prediction for the USMLE step 2 CK exam for all students in their first and second years. For a summary of the stepwise significant predictive power of the various variables, see Fig. 1. In brief, preadmissions data add very little to the prediction of failure of USMLE step 1 or step 2 (data not shown for step 2). The best predictions for step 1 were achieved with data from the second year (basic sciences miniboards plus the CBSSA). The best predictions for step 2 were achieved with the step 1 result and some of the clinical miniboards (again at the end of the second year). As different schools administer different tests (many use shelf or custom exams provided by the National Board of Medical Examiners (NBME) and some use in-house exams), there is unlikely ever to be consensus on which specific determinants a school should use to identify at-risk students. Rather, our recommendation is that schools perform this kind of analysis with their own internal data and that they do so in an ongoing fashion as the data become available.

Fig. 1

Summary data of all predictions used in this study that reached statistical significance. Summary data include medical school milestones on the X-axis and their corresponding AR2 values on the Y-axis. The first group of values on the X-axis refers to the first year, while the second grouping refers to the second year. Step 1 and step 2 predictions are given where comparisons were significant. Total Y1 refers to the total exams for year 1, while Total Y2 refers to the total exams for the second year. The Miniboards on the step 1 curve refer to the basic science subject exams, while those on the step 2 curve refer to the relevant clinical sciences exams

By performing this type of analysis, we are able to start looking at our at-risk students empirically as they pass various milestones and to intervene with much more confidence as students progress through their second year. In fact, we have built an in-house database, based on the data presented here, that allows appropriate administrators to analyze student performance and make predictions about future performance. In an attempt to increase our confidence in predicting future failure on step 1 earlier, we first divided our students into quartiles using prediction data at matriculation and prediction data at the end of the MS1 year. None of the quartile analyses improved our confidence in our predictions compared to the student cohort as a whole. A limitation of this type of analysis, however, was the low statistical power once the cohort was separated into quartiles. We will certainly revisit this issue as larger datasets become available.
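
A minimal sketch of the quartile stratification described above is shown below, assuming a table with one predicted step 1 score per student; the column names are hypothetical.

```python
# Illustrative only: bin students into quartiles of predicted step 1 score.
import pandas as pd

def add_prediction_quartiles(students: pd.DataFrame, prediction_col: str) -> pd.DataFrame:
    out = students.copy()
    # Quartile 1 = lowest predicted scores (highest risk), quartile 4 = highest.
    out[prediction_col + "_quartile"] = pd.qcut(out[prediction_col], q=4, labels=[1, 2, 3, 4])
    return out

# e.g., add_prediction_quartiles(students, "predicted_step1_at_matriculation")
```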

Identification of the variables that predict strongly for both of these high-stakes examinations in this training set allows us to move forward by (1) validating these findings with current students and (2) starting to implement individualized remediation programs for students predicted to fail their USMLE exams (see below). It is obvious that early intervention is desirable for better student outcomes, but our initial data suggest we can be more confident in our predictions after the end of the first year or even during the second year. Our experience with biomedical sciences (BMS) students also suggests that early determinants of success are not always very predictive. Quite a few of our students in this program who were, on average, weaker than non-BMS entrants graduated at the top of their class and/or hold leadership positions in the medical school class. Although anecdotal, this is consistent with the data presented in this manuscript, which certainly cast doubt on the use of early preadmissions data to predict future national exam performance. In our stepwise regression model, many of our early medical school exams also failed to be very predictive.

It is perhaps not surprising that we found that preadmission performance does not strongly predict future medical school national exam performance, or medical school performance in general. This is supported by a publication presenting the "academic backbone" model, which elegantly showed that measures obtained prior to medical school were weaker indicators of future medical school performance than measures obtained during medical school [16]. It is also consistent with a study from a single school with a large number of medical students (n = 782), which reported that preadmissions academic backgrounds (e.g., humanities, biology, physics, etc.) had no bearing on the outcome of medical school graduation [17]. Although those findings were used to discuss limitations in medical school admissions requirements and policies, these reports and others indicate that preadmission metrics have limited value for admissions decisions and/or for predicting future medical school performance. We do include preadmissions data in the pivot tables and databases that we have available for tracking student performance. However, we now understand that these data are less reliable predictors than internal and external exams taken during medical school.

The data from this analysis are now being used in the following manner: students are stratified into risk categories. The top risk category comprises students who are at risk of failing the step 1 exam by a wide margin. The second-highest risk category comprises students who are at risk of scoring slightly below or at the passing threshold. The less significant risk groups (denoted as yellow in our database) are students who would be counseled to delay taking step 1 by at least one clinical rotation. These students would be able to take advantage of additional study time that may include participation in practice exams and guided tutorials. The most significant risk groups (denoted as red in our database) are second-year students who would be strongly encouraged to delay taking step 1 by at least two rotations and would be offered more structured remediation. The aim of this risk stratification, and of the delay in starting third-year rotations, is to help students pass the exam on their first attempt. It also aims to reduce the chance of losing an entire year to clinical rotation delays caused by failing step 1 on the first sitting. Furthermore, it is our policy that students can fall behind by only two clinical rotations before they are required to sit out the full year. The other important internal policy is that students must pass step 1 before they can be officially accepted as third-year students. Risk stratification using prediction analysis is a new process for us, and it is certainly not foolproof. Students can only be encouraged to delay taking the exam and/or take advantage of remediation, and not all will take the recommendations from our administration. Students who are in the "red" risk category are certainly not guaranteed to pass step 1. However, taken together, we feel that this is a value-added academic advising tool that we can now start using more actively. We have not implemented a risk stratification process for step 2 CK but are currently discussing plans to do so.
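
As a rough illustration of how the risk categories above could be encoded, the sketch below maps a predicted step 1 score to a category; the numeric cutoffs are placeholders, since the text defines the categories qualitatively rather than by fixed thresholds.

```python
# Illustrative only: placeholder thresholds, not the school's actual cutoffs.
def risk_category(predicted_step1: float, passing_score: float, red_margin: float = 10.0) -> str:
    """Map a predicted step 1 score to an advising risk category."""
    if predicted_step1 < passing_score - red_margin:
        return "red"     # strongly encouraged to delay step 1 by at least two rotations
    if predicted_step1 <= passing_score:
        return "yellow"  # counseled to delay step 1 by at least one rotation
    return "green"       # no delay recommended

# e.g., risk_category(predicted_step1=188.0, passing_score=192.0) returns "yellow"
```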

There are a number of confounding factors that are likely to have an impact on our data. Our curriculum has undergone extensive revisions, and there have been dramatic changes to it during the period of study [18]. In particular, we moved to a system-based spiral curriculum during this time, and we altered our pedagogy to deemphasize lectures and emphasize self-directed and collaborative learning strategies [19]. We strove to control for these curricular changes by not using all of the students who attended medical school from 2008 to 2012. Because the changes in curriculum had large effects on the exams students took, we controlled for this by using data only from students who took exams under the traditional topic-based curriculum. Thus, we have more students in our MS2 cohort in this study than in our MS1 cohort. We look forward to comparing predictions from students exposed to the two different curricula to see if there is a significant change. The data presented here represent a single institution, and some of the data may not be as continuous and normally distributed as assumed. That said, the predictive value of performance on these schoolwork-based tests was still quite superior to that of the preadmission data we collected. We feel that it is important for all schools to consider performing this type of analysis and not rely on values from published studies, as specific internal exams are likely to play a unique and important role in their own predictions.

Of course, now that we have a tentative way to identify "at-risk" students, we need to prospectively validate our findings and ultimately develop comprehensive intervention programs that change the academic trajectory of such students. The work presented here thus allows us to make much more informed decisions when identifying students at risk.