INTRODUCTION

Recruitment of excellent medical residents is a priority for all residency programs, but there is no consensus regarding values predictive of resident success. While the Accreditation Council for Graduate Medical Education (ACGME) has developed six core competencies of resident education and development, methods for identifying and ensuring achievement of the core competencies are poorly defined.1 Furthermore, these competencies do not clearly influence the resident selection process. While residency programs have used metrics such as United States Medical Licensing Exam (USMLE) score, honors society membership, and clinical grades to select residents, evidence suggests these metrics do not adequately capture qualities necessary for success in residency.2,3 For example, a study from a pediatric residency revealed that initial rank lists of applicants, medical school grades, and USMLE step score performance correlated poorly with faculty assessments of the same residents at the end of their PGY-3 year.4 This finding has been replicated in internal medicine and emergency medicine programs.5,6 Furthermore, these traditional metrics have been found to be socioeconomically and racially biased.7,8,9

Several organizations, including the Coalition for Physician Accountability and the Association of American Medical Colleges (AAMC), have urged radical change to the residency selection process.10 In response, the AAMC has released guidelines for holistic review, a method in which residency programs are encouraged to use a “mission-aligned admissions” process to consider the whole applicant based on Experiences, Attributes, and Competencies in addition to traditional Metrics (EACM model).11 A central principle of holistic review is “ensuring that admission policies and processes are derived from and reinforce institutional mission and goals,” and thus are tailored to a specific residency’s goals and values.11 While holistic review has been used successfully in some residency programs to decrease bias in the selection process, it is limited by its time-consuming nature and has yet to be widely adopted.12,13 Although the EACM model seeks to capture the whole applicant, it focuses on discrete facets of the application, with limited exploration of the values that would motivate applicants to pursue these experiences or develop these competencies. While values such as leadership, teamwork, and intellectual curiosity are mentioned, they are not explicitly defined, and there is no guidance on how to assess them in an application.14

Using a modified Delphi method, we sought to establish a set of values that correlates with resident success in our program, with the goal of selecting residents better prepared to excel using less biased selection methods. We then incorporated these values into our resident selection process and evaluated their impact on applicant ranking.

METHODS

We used a modified Delphi method to derive a set of eleven values important for resident success in our Internal Medicine-Pediatrics program at the University of Utah (Supplemental Fig. 1). The modified Delphi method is a qualitative validation strategy that systematically uses literature review, opinion of stakeholders, and the judgment of experts within a field to reach agreement.15,16 The Delphi method relies on collective intelligence of group members to jointly produce better results than any individual entity, resulting in increased content validity, and is useful when evidence is lacking or limited.15 It has been inconsistently implemented in medical education, and best practices for implementation have been described.17

Demographics

Our residency program has a total of twelve residents with training taking place primarily in Salt Lake City at the VA Medical Center, University of Utah Hospital, and Primary Children’s Medical Center. The last two institutions are large academic health centers in the Intermountain West with a catchment area spanning seven states.

Literature Review

We performed an initial literature review as a means of idea generation for resident values to inform the iterative process of the modified Delphi method. Searches were made in PubMed with the following search terms: our first search, using the terms “medical resident,” “success,” and “characteristic,” generated 750 manuscripts, 43 of which were manually reviewed based on title and keywords; our second search, using the terms “values,” “holistic review,” and “residency,” generated 203 results, 17 of which were manually reviewed based on title and keywords. Of these 60 manuscripts, 15 were identified as relevant.18,19,20,21,22,23,24,25,26,27,28,29,30,31,32 These studies spanned 9 specialties: plastic surgery (2 articles), family medicine (2 articles), pediatrics (2 articles), general surgery (2 articles), internal medicine, otolaryngology, neurosurgery, orthopedics, and anesthesia. Two articles applied more broadly to medical school education and graduate medical education. The style and methods of these articles varied widely. Some articles explored personality tests to determine correlation with high-achieving residents24,25,30; others were expert opinions from program directors or co-residents.28,31,32 One article also used a Delphi framework.26 In addition, some of the articles focused on resident selection while others focused on current residents. Overall, 94 values were identified from this search.

Stakeholder Interview

To capture those values not reflected in the literature review, we surveyed local stakeholders, including eleven residents and four members of program leadership of the University of Utah Internal Medicine-Pediatrics program. We used a semi-structured group interview format with prompts to elicit specific values important for resident success in our program. Resident and program leadership input were valued equally. Each person was given the opportunity to answer the questions “What, in your eyes, makes a resident successful at this program?” and “What defines residents in our program?” to generate values. Values were written on a common viewing space in real time by one of the researchers and corrected by the interviewee if inaccurately represented. The group was then asked, “Which of these values are most important?” and “Which of these values are unique to the field of Internal Medicine-Pediatrics?” to identify the most important values. After values were reviewed, each person was asked if they had any other comments to add to the discussion. Thirteen additional values were identified during the stakeholder interview.

Initial Consolidation

The 94 values from the literature review were combined with the 13 values identified in the stakeholder interview. The Internal Medicine-Pediatrics program director and chief resident performed an initial consolidation based on redundancy (e.g., “Professional” was combined with “Professionalism,” “Professional Behavior,” etc.) to distill the list to 41 values. Values were sorted into recurrent themes and weighted according to the number of initial values combined into each distilled value (Supplemental Table 1).
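For illustration only, the weighting step might be implemented as in the minimal Python sketch below; the synonym groupings and raw value names are hypothetical stand-ins for the actual consolidation recorded in Supplemental Table 1.

```python
from collections import Counter

# Hypothetical map from each raw value (literature review + stakeholder interview)
# to the consolidated value it was merged into.
synonym_map = {
    "Professional": "Professionalism",
    "Professionalism": "Professionalism",
    "Professional behavior": "Professionalism",
    "Team player": "Teamwork",
    "Teamwork": "Teamwork",
}

raw_values = ["Professional", "Professionalism", "Professional behavior",
              "Team player", "Teamwork"]

# Weight each consolidated value by how many raw values were combined into it.
weights = Counter(synonym_map[value] for value in raw_values)
print(weights)  # Counter({'Professionalism': 3, 'Teamwork': 2})
```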

Expert Selection

The local expert panel comprised the current internal medicine-pediatrics program director, two current internal medicine-pediatrics associate program directors, one current internal medicine associate program director, the current pediatrics program director, and one current pediatrics associate program director. Two internal medicine-pediatrics clinic preceptors were also included, as they follow our residents longitudinally throughout all 4 years, as well as one internal medicine faculty member with expertise in the resident milestones.

Expert Questionnaire

We then developed a questionnaire asking the local expert panel to rate these values on a Likert scale from 1 to 5, with 1 defined as “minimally important for resident success,” 3 defined as “moderately important for resident success,” and 5 defined as “essential for resident success” (Fig. 1). This survey was anonymous and sent through REDCap®. Experts were provided background information regarding our goal to develop values characteristic of successful residents in our program, as well as the results of the literature review and stakeholder interview. The questionnaire also included two free text boxes, one to suggest values that the experts felt could be combined and another to suggest values that were possibly missing. Values were eliminated if they did not achieve an average rating greater than 3.5. Based on scoring and recommendations of the expert panel, 11 values were eliminated and the remaining values were further combined into a list of 16 values. Three additional values were considered based on expert suggestions in the free text box: empathy, self-discipline/endurance, and “embraces discomfort or difficult conversations,” which were combined with compassion, work ethic, and emotional intelligence, respectively. We subsequently mapped the values to the ACGME Core Competencies (Fig. 2). One value (desire to be in a medical career) was eliminated, as it could not be directly tied to a Core Competency.
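As a minimal illustration of the cutoff step, the Python sketch below (with entirely hypothetical value names and ratings) averages each value’s Likert ratings across the nine experts and drops any value that does not exceed the 3.5 threshold described above.

```python
from statistics import mean

# Hypothetical Likert ratings (1-5) from the nine-member expert panel.
ratings = {
    "Teamwork":        [5, 5, 4, 5, 4, 5, 5, 4, 5],
    "Work ethic":      [4, 5, 4, 4, 5, 4, 5, 4, 4],
    "Hobby interests": [3, 2, 3, 4, 2, 3, 3, 2, 3],  # illustrative value falling below the cutoff
}

CUTOFF = 3.5  # a value must average above 3.5 to be retained

retained = {value: mean(scores) for value, scores in ratings.items() if mean(scores) > CUTOFF}
eliminated = [value for value in ratings if value not in retained]

print("Retained:", {value: round(score, 2) for value, score in retained.items()})
print("Eliminated:", eliminated)
```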

Fig. 1

First cycle of expert review. Experts were asked to rate each value on a Likert scale from 1 to 5, with 1 meaning minimally important for resident success, 3 meaning moderately important for resident success, and 5 meaning essential for resident success. Each value was reviewed by 9 experts. Black bar represents average rating, with tails indicating standard deviation. Red dashed line indicates an average rating of 3.5, which was used as the value cut-off point.

Fig. 2

Value mapping with ACGME Competencies.

Iterative Process

Expert response rate was 100%. We collected survey results and drafted formalized definitions for the remaining values, including examples of how each value might be demonstrated on a residency application. Results were then submitted to the expert committee for a second cycle of input, along with a free text option for experts to provide feedback. Experts were instructed to disagree if a value was misrepresented or did not adequately contribute to the qualities of a successful resident. Group responses and comments were shared after each cycle. The iterative process produced additional consolidations, yielding a final list of 11 values. The agreement threshold for each value was set at 80%, determined a priori, and was reached after two iterative cycles.
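The per-cycle agreement check could be expressed as in the following sketch; the value names, vote counts, and the agreement helper are hypothetical, with only the a priori 80% threshold taken from our process.

```python
# Minimal sketch of checking the 80% agreement threshold in each Delphi cycle.
# Votes are hypothetical: True means the expert agreed with the drafted definition.
def agreement(votes):
    """Fraction of experts agreeing with a value's drafted definition."""
    return sum(votes) / len(votes)

cycle_votes = {
    "Adaptability":   [True] * 8 + [False],       # 8/9 ≈ 89%, meets threshold
    "Self-awareness": [True] * 6 + [False] * 3,   # 6/9 ≈ 67%, needs revision
}

THRESHOLD = 0.80
needs_revision = [value for value, votes in cycle_votes.items()
                  if agreement(votes) < THRESHOLD]
print("Values needing another cycle:", needs_revision)
```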

Implementation

Based on the final value definitions, a rubric was developed to quantify values in resident applications (Supplementary Table 2). One value, intellectual curiosity, was not included in this initial implementation, as it had been consolidated with academic strength in a preliminary iteration of the values prior to official validation. Because the interview process occurs annually and our program was eager to implement the new system, this preliminary iteration was used for the initial trial data. Due to time constraints of our faculty, we implemented the values score at the level of the interview rather than the initial applicant screen. The initial screen was performed without change from prior years and consisted of the program director briefly reviewing all applications and selecting interview candidates at his discretion. The rubric was distributed to interviewers, who were encouraged to use it to score each applicant they interviewed from 1 to 3 for each value to generate a total value-based score for the applicant. Each applicant was scored twice to minimize interviewer bias. Interviewers were found to agree within one point of each other on every value.
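The scoring scheme described above can be illustrated with a short sketch; the interviewer scores are invented, the helper functions are ours for illustration rather than part of the distributed rubric, and the final eleven values are used here (the preliminary iteration applied in 2021 differed slightly, as noted above).

```python
# Minimal sketch of the interview scoring described above, using hypothetical data.
VALUES = ["academic strength", "intellectual curiosity", "compassion", "communication",
          "work ethic", "teamwork", "leadership", "self-awareness", "DEI",
          "professionalism", "adaptability"]

def total_score(rubric_scores):
    """Sum of per-value scores (1-3 each); range 11-33 for eleven values."""
    assert all(1 <= s <= 3 for s in rubric_scores.values())
    return sum(rubric_scores.values())

def within_one_point(scores_a, scores_b):
    """Check that two interviewers agree within one point on every value."""
    return all(abs(scores_a[v] - scores_b[v]) <= 1 for v in VALUES)

# Hypothetical pair of interviewer scores for one applicant.
interviewer_1 = {v: 3 for v in VALUES}
interviewer_2 = {**{v: 3 for v in VALUES}, "leadership": 2}

if within_one_point(interviewer_1, interviewer_2):
    applicant_score = (total_score(interviewer_1) + total_score(interviewer_2)) / 2
    print(f"Values-based score: {applicant_score}")
```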

Analysis

The final rank list for 2021 was analyzed to determine the correlation of applicant ranking with the value-based score of each applicant and with the Step 2 score. Step 2 score was chosen for the primary analysis because Step 1 has been changed to pass/fail and Step 2 will be the only quantitative standardized variable on medical school applications going forward. The values-based score was also tested for correlation with Step 2 score. A rank list from 2017, prior to implementation of a formal holistic review, was also analyzed to determine the correlation of prior ranking with Step 2 score. A linear regression with applicant ranking as the dependent variable and applicant year, Step 2 score, and the year-score interaction as independent variables was performed to test whether the contribution of Step 2 score to ranking differed significantly between 2017 and 2021. Because of the sensitive nature of Step 2 data, the log-value was used in some graphs to obfuscate the actual scores of the applicants.
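For readers who wish to reproduce a similar analysis, the following sketch fits the year-by-score interaction model described above using the statsmodels formula interface. The data are synthetic and the column names (rank, step2, year) are assumptions for illustration, not the study dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic example data: 31 ranked applicants in 2017 and 55 in 2021.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rank":  np.concatenate([np.arange(1, 32), np.arange(1, 56)]),
    "step2": rng.integers(220, 270, size=31 + 55),
    "year":  [2017] * 31 + [2021] * 55,
})

# Ranking as dependent variable; year, Step 2 score, and their interaction as predictors.
model = smf.ols("rank ~ step2 * C(year)", data=df).fit()
print(model.summary().tables[1])  # the interaction term tests whether the
                                  # contribution of Step 2 score differs by year

# Log transform of Step 2 score, as used in some figures to obfuscate raw scores.
df["log_step2"] = np.log(df["step2"])
```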

IRB

We obtained approval from the local IRB at the University of Utah, which considered this study exempt (application number 00124799).

RESULTS

After initial consolidation based on redundancy and overlapping terms, “knowledge and skills” was the value most frequently described in the literature, appearing in 11 of the 15 articles reviewed. Oral communication (9), professionalism (7), learning (7), critical thinking and judgment (6), and teamwork (6) were also mentioned in over one-third of the articles. Twelve of the 13 values identified in the semi-structured interview could be mapped to a value identified in the literature review, with the sole exception of “Teachable” (Supplementary Table 1). Using the modified Delphi process (described above), 11 values were defined as important for resident success in the Internal Medicine-Pediatrics program at the University of Utah: academic strength, intellectual curiosity, compassion, communication, work ethic, teamwork, leadership, self-awareness, DEI (diversity, equity, and inclusion), professionalism, and adaptability (Table 1).

Table 1 Final Set of Validated Values. The First Column Displays the Final Name for the Value Identified, the Second Column Displays the Formal Definition with Parenthetical Reference to Relevant ACGME Milestones, and the Last Column Displays Possible Examples Which Could Be Quantified from a Residency Application. DEI Stands for Diversity, Equity, and Inclusion

Using the agreed-upon definitions and examples, a rubric was developed to quantify values in resident applications (Supplementary Table 2). This rubric was distributed to interviewers, who used it to score each applicant they interviewed from 1 to 3 for each value to generate a total value-based score (minimum 11, maximum 33), which was then used by the program director to inform the final ranking of applicants for 2021. The mean values score was 27.8 with a standard deviation of 3.9, a minimum score of 17.5, and a maximum score of 33. In 2021 there were 194 applications and 55 were selected for interview. In 2017 there were 143 applications and 31 were selected for interview. The rank list from 2021 was then compared to the rank list of 2017 (prior to implementation of the values-based scoring) (Fig. 3). In evaluation of our primary outcome, while the rank list in 2017 was positively associated with Step 2 score (meaning that a higher Step 2 score correlated with a higher applicant rank) with R2 = 0.32, the rank list from 2021 had a negligible association (R2 = −0.05). Linear regression of the interaction between application year and Step 2 score with respect to ranking revealed that this difference was statistically significant (p = 0.001) (Fig. 3A; log-value of Step 2 score shown given the sensitive nature of the data). Further analysis revealed that the values-based score of an applicant was not associated with their Step 2 score (R2 = 0.00, Fig. 3B). Lastly, the 2021 rank list was analyzed with respect to the values-based score of each applicant and was found to be positively associated (R2 = 0.40) (Fig. 3C).

Fig. 3

A Correlation of applicant ranking (x-axis) and log-value of Step 2 score (y-axis) in 2021 (closed circles, solid trend line) and 2017 (open circles, dashed trend line), R2 values listed accordingly. B Correlation of Step 2 score (x-axis) and values-based assessment (y-axis) for each applicant in 2021 cycle, R2 value listed. C Correlation of applicant ranking (x-axis) and values-based assessment (y-axis) for each applicant in 2021, R2 value listed.

DISCUSSION

We systematically derived a set of values important for a resident to thrive in our combined residency program and incorporated these values into our interview process to create a rank list that was not correlated with standardized test scores. By defining these values, we hope to identify applicants who will thrive and innovate in our program and to provide applicants with transparency in our selection process.

As these values came from a systematic review spanning other specialties, they are well situated in the current body of literature and align well with the literature from categorical Internal Medicine and Pediatrics programs.18,21,32 As the first Internal Medicine-Pediatrics-specific study of its kind, our work identified key differences such as adaptability (as the resident must interface seamlessly with two categorical programs). The systematic nature of our study makes it more thorough and comprehensive than prior work in the field.

Our study provides new expert opinion regarding the need for explicit values related to DEI and antiracism work. While many holistic review efforts emphasize creating a more equitable screening process, most of these efforts aim to remove biases rather than explicitly naming social justice work or diversity as values, with only one study aligning justice with the attributes of an EACM model.33 Our approach explicitly identifies DEI as a value. Using our values rubric uncouples Step 2, which is known to be racially biased, from our rank list.9 With the change of Step 1 to pass/fail, Step 2 is expected to only increase in importance in residency application review, and so finding other methods of evaluating applicants will be increasingly important.34

Our study has several limitations. To tailor our values to the institution-specific vision of residency selection outlined by the AAMC, our study is specific to our local institutional culture. While aspects of our work, including our literature review and comparison to the ACGME Core Competencies, are quite broad, and we anticipate that several of these values are also paramount for success in other internal medicine-pediatrics programs and in categorical internal medicine and pediatrics programs, this has not been studied. Our study is also limited in that, while we can say that our values-informed rank list had different applicant characteristics with respect to standardized testing, we cannot definitively say that these applicants differed on other metrics. In addition, for this initial implementation, the reviewers were able to see the applicant’s Step 2 score. Importantly, we acknowledge that the residency application is fraught with bias, as letters of recommendation and medical student performance evaluations are subjective and can introduce bias into the application process.35 As these sections form the backbone of the application, it is possible we have simply replaced one biased metric with another. We also acknowledge that while Step 2 is biased, there is evidence that it has some predictive value for resident success.36 Because the values score uses the entire application rather than simply grades or standardized scores and is not correlated with Step 2 score, we are optimistic that values scoring may mitigate the bias from traditional metric data. We have not been able to follow these applicants over time to evaluate their productivity, innovation, or satisfaction in our residency program. Lastly, although this tool may translate to applicant screening, this has not yet been validated.

We publish our process here to serve as a roadmap for other programs to derive their own specific values as an aid in their resident selection and training processes. While many values may overlap, we expect that other programs would identify additional values important for success in their own settings. Given the local nature of this study, next steps could include expanding it to a multisite validation with other residency programs and potentially a more inclusive, multidisciplinary set of stakeholders; a multisite validation of the rubric would strengthen the validity of our values-based system. Given the time-intensive nature of this work, we limited it to the interview stage. Next steps may also include creating a machine-learning tool using natural language processing to automate screening for these values in residency applications and aid in values-based interview selection. Our objective would be to further mitigate bias by using an automated, objective tool rather than human evaluation, and to implement these values at the applicant screening stage. Further work will also validate these values against ACGME milestone data and other markers of residency success as we track residents over time in our program.