INTRODUCTION

Since 2011, there has been a dramatic increase in the number of applications per applicant submitted to categorical internal medicine (IM) programs through the Electronic Residency Application Service (ERAS). US MD graduates (USGs) now apply to an average of 34 programs per applicant, a rise of 67% from 2011.1, 2 Similarly, international medical graduates (IMGs) now apply to an average of 98 programs per applicant, a rise of 77% from 2011.1, 2 Additionally, IM programs have experienced a 199% increase in applicants from DO schools over the past ten years.1, 2

The increase in applications has created challenges for programs that must select a subset of applicants to invite for residency interviews. The sheer volume of applications drives programs to screen them using ERAS filters. Program directors routinely use United States Medical Licensing Examination (USMLE) scores, citing them as a reliable way to compare USGs and IMGs within and across schools, despite recognition that Step 1 scores do not reliably predict clinical performance.3 As USMLE Step 1 moves to pass/fail, however, program leaders’ ability to compare applicants, already limited with this imperfect measure, will be challenged further.

Looking beyond standardized examinations, two of the most commonly used sources of information with the potential to inform interview invitation decisions by allowing programs to compare students within and across schools are the Medical School Performance Evaluation (MSPE), or “Dean’s letter,” and the letters of recommendation (LOR). In 2014, the Association of American Medical Colleges (AAMC) pushed for a more holistic evaluation of applicants by the programs, recommending that each MSPE “provide a summative assessment, based upon the school’s evaluation system, of the student’s comparative performance in medical school, relative to his/her peers, including information about any school-specific categories used in differentiating among levels of student performance”.4 Despite these AAMC guidelines, the limited utility of MSPEs has been a recurring concern in recent years.5,6,7 Grading variability across schools has been cited as an additional barrier program directors face when trying to assess and compare applicants.5, 6, 8, 9

Around the same time, the organizations with influence over the content of the Department of Medicine (DOM) LOR (formerly known as the Chair’s letter), the Association of Program Directors in Internal Medicine and the Clerkship Directors in Internal Medicine (APDIM-CDIM), attempted to improve standardization by publishing guidelines for its preparation (Table 1).10 The intent was to enhance the information available for the review and ranking process by providing, in a single document, a narrative description of the applicant’s performance in the medicine clerkship (and ideally also the medicine sub-internship), descriptive information about the structure of the rotation, the applicant’s grade, and the grade distribution.10 These guidelines further recommend that departmental letters include a numerically based statement of a student’s standing relative to their peers.

Table 1 Key Components of DOM LOR as Recommended by APDIM-CDIM 2013 Guidelines10

In the 2018 NRMP Program Director Survey, when asked what factors they used to select applicants to interview, 85% of internal medicine program directors cited the MSPE and 74% cited LORs in internal medicine, compared to 95% who reported using USMLE Step 1 and 90% who reported using USMLE Step 2. In addition, when asked to rate the importance of each factor on a scale from 1 (not at all important) to 5 (very important), IM program directors rated the MSPE and internal medicine LORs at 4.3 and 3.9, respectively (n=164). The only factors rated more important than the MSPE were “red flags,” such as professionalism concerns and/or failed standardized examinations.11 The DOM LOR has the potential to provide information that can help program directors compare US seniors within and across medical schools, much like the standardized letter of evaluation, or SLOE, in emergency medicine, first proposed by a Council of Emergency Medicine Residency Directors Task Force in 1995.12, 13 On the 2018 NRMP Program Director Survey, 97% of emergency medicine program directors cited departmental letters of recommendation when asked what factors they used to select applicants for interviews, compared to 97% who cited the USMLE Step 1 examination and 83% who cited the MSPE.11 When asked to rate the importance of the departmental letter, they rated it at 4.8, higher than any other factor and substantially higher than the USMLE Step 1 examination (3.8) or the MSPE (3.3) (n=87).

In IM, the potential usefulness of a standardized departmental letter akin to the SLOE has not been realized, potentially due to a lack of standardization across US medical schools. Anecdotally, many program directors report that, like MSPEs, DOM LORs are difficult to decipher and that it is increasingly difficult to determine how any individual compares with their peers at a given medical school and across schools. However, to our knowledge, there has been no systematic review of DOM LORs since the APDIM-CDIM recommendations were published. To better understand the current state of DOM LORs and the degree to which they comply with APDIM-CDIM guidelines, we reviewed DOM LORs from US allopathic medical schools.

METHODS

Two reviewers from two large university-based internal medicine residency programs (who collectively have nearly 60 years of experience reviewing applications and over 25 years of writing DOM LORs) analyzed three to four DOM LORs from 146 of the 155 LCME-accredited schools in the USA and Canada that participated in the 2019 NRMP Match. They reviewed the documents for length, inclusion of suggested components such as clerkship and sub-internship information, comparative ranking, and adherence to the published APDIM-CDIM guidelines. An administrator collated the reviews from the two reviewers and flagged any potential discrepancies (22/146, 15% of initial reviews). The reviewers discussed flagged letters and came to a consensus. Descriptive statistics were used to summarize the findings. This project was deemed not to be research with human subjects by the University of Connecticut IRB.

RESULTS

Overall compliance with the APDIM-CDIM guidelines varied considerably across schools (Fig. 1). Most (119/146, 82%) DOM LORs fell within the recommended length of 1–2 pages, with an overall range of 1 to 5 pages. Adherence to the recommendation to provide a final characterization of performance relative to peers (specific number, quartiles, percentage grouping) was lower, with only 68/146 (47%) providing such information. DOM LORs provided more information about the clerkship experience than the sub-internship experience in each of the recommended areas: description of the experience (90/146, 62% for the clerkship vs 40/146, 27% for the sub-internship), grade distribution (74/146, 51% vs 28/146, 19%), and individual student grade (116/146, 79% vs 66/146, 45%). Shelf exam scores for the medicine clerkship were included in only 29% of letters (43/146). Slightly more than half of the letters included comments from fourth-year rotations (82/146, 56%). Of the 68 DOM LORs that provided a final characterization of performance, 19 (28%) provided a quantitative measure and 49 (72%) provided a qualitative descriptor. Of those that used qualitative terms, only 17/49 (35%) defined what those terms meant, and 13 distinct qualitative scales were identified among those 17 schools (Table 2). For most letters using qualitative terms (32/49, 65%), the factors determining how students were grouped into categories were not defined, and very few (6/49, 12%) specified how many students were in each qualitative category. The overwhelming majority of LORs with a ranking (55/68, 81%) did not specify whether the ranking included the whole class or only those going into internal medicine.

Figure 1 Results.

Table 2 List of Different Ranking Tiers Used in the 17 of 49 DOM LORs with Clearly Defined Ranks

DISCUSSION

Despite clear guidelines from APDIM-CDIM regarding DOM LORs, a review of letters from 146 of the 155 LCME-accredited schools demonstrated a low rate of compliance with those recommendations. When these guidelines were developed in 2012, an explicit goal was to provide a more credible and higher quality narrative about applicants to improve reviewers’ ability to predict future performance.10 With such low rates of compliance, however, DOM LORs have not realized that goal. Unfortunately, homogenizing students into indistinguishable groupings benefits neither students nor residency programs. The DOM LOR is a chance to highlight the individual characteristics and abilities that may make a student better suited to one program than another.

A practical first step could be to implement a standardized, common descriptive language for the final characterization of student performance, making intra- and inter-school comparisons more meaningful. For example, “superior” is used to designate the top group of students at some schools, yet is the designation used for the middle group at other schools. Such variability in the language describing group stratification makes meaningful comparison nearly impossible; this has also been illustrated in prior studies of MSPE language, prompting those authors to suggest a “unified, systematic, and transparent method for ranking students in the MSPE”.14 We recommend a characterization such as superior > outstanding > excellent > very good > good. Regardless of the terms used, a clear statement of how many students are placed in each category is essential to allow readers to truly compare applicants from a school, as there is a difference between a top group comprising 10% of the class and one comprising 40%. It may also be beneficial to state whether the comparison groups comprise all students or only those applying to internal medicine. Several conditions make this a challenge to achieve, including schools that use only third-year clerkship data, schools that are pass/fail, and schools that do not rank their students.

Other steps that could improve the utility of the DOM LOR include more consistent information about the clerkship and sub-internship experience, grade distributions, and shelf exam scores. This information could be provided in a table format in which the department chair (or their designee) checks boxes describing the structure, timing, and content of each rotation. The description should include the number of patients cared for, the number of admissions per call shift, and responsibilities such as writing orders and discharge summaries, as these are skills interns need to perform on day one. As has been adopted in the MSPE, a grade distribution table with an asterisk denoting the individual student’s level of performance would make comparison with peers much easier for letter readers. Given the dearth of objective data, including the loss of USMLE Step 1 scores, it is reasonable to consider including shelf exam scores as an objective measure. These details would allow program directors to better judge the abilities of each student, taking into account variables such as the timing of a rotation and the performance of peers during that same period. It is imperative, though, that letter writers provide transparency in the grading process, as too often applicants look alike on paper, with uniformly positive remarks that make it difficult to compare students.13, 14 Furthermore, including fourth-year rotations in the letter would give readers more information with which to review a candidate, particularly about the student’s growth, and would potentially give schools more usable data with which to rank students.

A standard template across all types of schools, with clear guidelines about the roles and responsibilities of the letter writer, is another potential solution. The template could include prespecified content to complete, such as the relationship of the letter writer to the applicant, the source of information (i.e., direct experience with the applicant, indirect experience with the applicant, file review, or some combination), and the abovementioned information about the clerkship and sub-internship experiences, focusing on performance in the core competencies. By accurately portraying a student’s competency across multiple domains rather than functioning solely as a letter of recommendation, this standard template would allow residency programs to easily compare students from all schools, not just US allopathic medical schools.

The whole purpose of the match process is to maximize the chances that each student ends up in a program most appropriate for their skill set and personal characteristics and that each program ends up with interns best equipped for its mission and goals. The more nebulous and less transparent the process, the greater the disservice to both our students and our programs. There is a potential untoward consequence at play. If a program ends up with a new intern who struggles (in direct contrast to the outstanding LOR written about the student), it is human nature that the program will view future applications from that school differently, and this bias may do a disservice to future students from that school. Given the changes that are occurring, including the elimination of grades in clinical rotations at many schools and the ultimate loss of Step 1 scores, it is crucial to have as much transparency as possible. It is unlikely that the volume of applications will decrease in the near future, so it is imperative to make the process work as well as possible. We owe it to our students.

A notable limitation of this study is that it reviewed only DOM LORs from US allopathic medical schools. Although internal medicine residency programs fill over half of their matched positions with US medical students, US allopathic students constitute just 33.3% of the applicant pool, with the remainder comprising osteopathic applicants (12.7%) and IMGs (54%).1 These two large groups presently do not have a document equivalent to the DOM LOR, and although some schools do attempt to provide a similarly designated “Chair’s letter,” it is not clear whether any standard template or format exists. Future consideration needs to be given to the potential barriers these schools face in trying to provide a document similar to the DOM LOR for their applicants.

It is also possible that DOM LORs vary within a school, and reviewing 3–4 letters from each school may not have accurately captured a given school’s standard DOM LOR. However, if there is indeed variability driven by individual letter writers within a school, it would only exacerbate the challenge we have described here.

Ultimately, a standard LOR for internal medicine can be more useful if it is well structured. While emergency medicine has the most established SLOE, multiple other fields have developed similar standardized letters, including otolaryngology, orthopedic surgery, and dermatology. These letters are specifically designed to provide comparative, objective data, including ratings of the likelihood of success in residency and rankings of students against their peers, that programs can use to better assess a candidate’s abilities. While the idea of such a standard letter of evaluation does have promise, literature from otolaryngology, orthopedic surgery, and emergency medicine suggests that applicants are unequally distributed among the rating groups, skewing toward the top tiers.15,16,17 This further indicates that a standardized rating process has potential only if applicants are distributed proportionately across the whole rating scale.

CONCLUSION

The DOM LOR is widely used by US allopathic medical schools; however, the degree to which schools adhere to the 2013 APDIM-CDIM guidelines varies. For internal medicine program directors who use DOM LORs as a key factor in the residency application review process, clearly defined data on student performance, stratification among peers, and a common language across schools for the stratification groups would improve the utility of the DOM LOR in matching the right applicants to the right positions. Further investigation is necessary to understand why medical school administrators and department of medicine leaders do not adhere to the national guidelines.