BACKGROUND

In 2014, the Association of American Medical Colleges (AAMC) introduced 13 Core Entrustable Professional Activities (EPAs) for Entering Residency (CEPAER) to improve graduates’ preparedness for residency.1 Over seven years, a 10-school consortium (the AAMC Core EPA Pilot) developed curricula and assessment strategies and rendered entrustment decisions across diverse institutions.2 Likewise, schools outside the pilot have implemented EPAs as part of student assessment, even making entrustment decisions on readiness for indirect supervision for specific tasks.3 Despite these shifts toward workplace-based assessment, validity evidence is conflicting.4,5 The CEPAER pilot found that EPA assessments showed promise, but none of the participating schools were ready to make high-stakes decisions by the end of the 7-year study, suggesting that further development was needed to ensure medical school graduates were prepared for residency.6

Several studies have investigated the use of the CEPAER in undergraduate medical education (UME), showing variable uptake. A 2020 survey of osteopathic schools found only 31% of respondents had implemented EPAs at least “moderately,” with approximately two-thirds of those schools reporting EPA outcomes.7 Similarly, a 2020 study of pediatric clerkship directors showed that 24% of respondents were using an EPA assessment framework in the core pediatric clerkship.8 Both studies called for amplified support for educators using systems-based strategies to develop valid workplace-based assessments. In parallel, there has been increasing interest in competency-based outcomes in UME. In 2021, the Undergraduate Medical Education to Graduate Medical Education Review Committee called for “[t]he UME community, working in conjunction with partners across the continuum” to “commit to using robust assessment tools and strategies, improving upon existing tools, developing new tools where needed, and gathering and reviewing additional evidence of validity.”9

Prior to this study, the most recent Clerkship Directors in Internal Medicine (CDIM) Annual Survey to explore EPA use in the core internal medicine (IM) clerkship was conducted in 2015; it assessed anticipated barriers to EPAs and identified six “key” EPAs for assessment but did not examine the extent of EPA use, and no study to date has reported the prevalence of EPA use in IM clerkships.10,11 Our study’s aim was to understand the prevalence of EPA use in IM clerkships and the degree to which IM clerkship directors (CDs) evaluate the validity of their assessments. We also explored CD attitudes towards assessment and the role of validity in evaluating candidates for residency training.

METHODS

CDIM is a charter organization of the Alliance for Academic Internal Medicine (AAIM), a professional association that includes academic faculty and leaders responsible for third- and fourth-year UME. CDIM has conducted annual surveys of IM CDs on topics essential to UME since 1999 (Appendix A).

Survey Development

In February 2022, CDIM opened a call for thematic survey section proposals to its members. In April, the CDIM Survey and Scholarship Committee blind-reviewed all submissions and selected three for inclusion based on relevance to the core IM training experience. The authors who proposed the EPA section (KG, AF, DD) were not involved in this selection process. From May through mid-June, the committee (AW, ND, MJ) and section authors (KG, AF, DD) revised the questions through an iterative process. From July through August, the elected CDIM Council and five CDIM experts (blinded to the committee) pilot-tested the web survey. The section authors then addressed comments from the pilot test and incorporated final revisions. To screen for problematic question content, all pilot data were reviewed for anomalies (e.g., illogical response combinations or out-of-scope values), which were resolved by the committee and section leads. The section authors, committee members, Council members, and pilot testers included subject matter experts with extensive experience in the clerkship setting.

In addition to a recurring 13-question section on “Clerkship Director and Medical School Characteristics,” the section upon which this manuscript is based contained 20 questions, including multiple-choice (single-selection and select-all-that-apply), three- and five-point Likert scale, numeric-entry, and open-text items, with logical skip and display patterns. Because of conditional logic and item non-response, denominators for some questions do not sum to the total number of survey respondents. See Appendix B for the section-specific survey instrument.

Early in August 2022, MK and SM exported from the AAIM/CDIM database all individuals designated as “clerkship director” at 149 of the 154 U.S. medical schools with full or provisional Liaison Committee on Medical Education (LCME) accreditation; one designated clerkship director per school was included in the survey. Generic identification numbers were appended to respondent contact files to match them back to the survey database, which included prepopulated characteristics such as medical school classification. Unique participant URLs were disseminated via email invitation through the Qualtrics Surveys platform. The original population of 149 was adjusted to 140 after removing nine medical schools confirmed not to have renewed their AAIM/CDIM membership by survey closure. The final population represented 91% (140/154) of all fully and provisionally accredited LCME schools as of the study period.

The study (22-AAIM-118) was declared exempt by Pearl IRB (U.S. DHHS OHRP #IRB00007772). Only MK and SM had access to the survey population and survey software during fielding.

The survey launched on September 7, 2022, closed on December 6, 2022, and included four email reminders to non-respondents. All email communications included voluntary opt-out links, and the survey landing page displayed an informed consent statement. Contacts who voluntarily communicated that they were no longer their medical school’s IM CD were replaced with the new CD of record or the most appropriate next contact.

The survey dataset was stored in an encrypted drive accessible only to project personnel. Respondents’ records were merged with the complete population file to include demographic and medical school characteristics, and the study dataset was de-identified. Descriptive statistics were used to report summary results. Fisher’s exact test or Pearson’s chi-square test with one degree of freedom (with Sidak-adjusted p-values for multiple comparisons) was used to test for associations between categorical variables. Because continuous variables were non-normally distributed, they were compared across categorical groups using the Wilcoxon rank-sum (Mann–Whitney) test and reported as median and interquartile range (IQR). Statistical significance was set at an alpha level of 0.05. Data analysis was conducted in Stata 16.1 SE.
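To illustrate the comparative approach described above, the sketch below approximates the same tests in Python. The data frame, column names, and values are hypothetical stand-ins rather than the study dataset, and the study’s analysis was performed in Stata 16.1 SE, not with this code.

# Minimal sketch of the comparisons described above, using hypothetical data.
import pandas as pd
from scipy import stats

# One row per responding clerkship director; 'uses_epa' is a yes/no flag,
# 'pass_fail' a categorical school characteristic, 'years_as_cd' a continuous variable.
df = pd.DataFrame({
    "uses_epa":    ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
    "pass_fail":   ["yes", "no",  "no", "yes", "yes", "no", "no",  "yes"],
    "years_as_cd": [4, 7, 2, 10, 5, 3, 8, 6],
})

# Categorical vs. categorical: 2x2 table tested with chi-square (1 df),
# or Fisher's exact test when expected cell counts are small.
table = pd.crosstab(df["uses_epa"], df["pass_fail"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)
odds_ratio, p_fisher = stats.fisher_exact(table)

# Sidak adjustment for m comparisons: p_adj = 1 - (1 - p)^m (m is hypothetical here).
m = 3
p_sidak = 1 - (1 - p_chi2) ** m

# Continuous vs. categorical: Wilcoxon rank-sum (Mann-Whitney U) test,
# with groups summarized as median and IQR.
epa_years = df.loc[df["uses_epa"] == "yes", "years_as_cd"]
non_epa_years = df.loc[df["uses_epa"] == "no", "years_as_cd"]
u_stat, p_mwu = stats.mannwhitneyu(epa_years, non_epa_years, alternative="two-sided")

for name, grp in [("EPA", epa_years), ("non-EPA", non_epa_years)]:
    q1, med, q3 = grp.quantile([0.25, 0.5, 0.75])
    print(f"{name}: median {med}, IQR {q3 - q1}")
print(f"chi-square p={p_chi2:.3f}, Fisher p={p_fisher:.3f}, Wilcoxon p={p_mwu:.3f}")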

Thematic Analysis

Open-ended questions were analyzed using thematic analysis. Two study authors (KG and DD) reviewed comments to familiarize themselves with the dataset and generate initial codes. The authors discussed these codes together and grouped them into related themes, then coded independently to ensure themes/subthemes were consistently applied; discrepancies in coding were discussed among the study group until consensus was reached.

RESULTS

A summary of the overall 2022 CDIM Annual Survey results can be found in Appendix C. The survey response rate was 80.0% (112/140). Two additional individuals responded to the EPA portion of the survey and are included in this analysis (n = 114). Of the total survey respondents, 54 (47.4%) reported that their clerkship was currently using EPAs to assess medical students. Table 1 shows there were no statistically significant differences in EPA use by type of school, region, size, accreditation features, number of students rotating on the IM clerkship at a given time, median number of years the CD had been in the position, median full-time equivalent for the CD position, or use of pass/fail grading. However, CDs at medical schools accredited after 1977 were more likely to report using EPAs in the IM clerkship. Among CDs who reported no EPA use (60/114), 40.0% reported no current plan to add EPAs, 8.3% reported a plan to start using EPAs, and 33.3% reported they were unsure. The remaining 18.3% gave “other” responses and described barriers to EPA implementation congruent with the thematic analysis below.

Table 1 Characteristics and Validity Efforts of CDs Who Use EPAs and Do Not Use EPAs as Part of Their Core Internal Medicine Clerkship

Of CDs who reported EPA use (referred to in this manuscript as “EPA CDs”), most (55.1%) used the CEPAER guides as the sole resource for developing EPAs, followed by a combination of CEPAER and EPAs developed at their institution (36.7%) (Table 2).

Table 2 Responses to Key Questions on EPA Development and Use by Internal Medicine Clerkship Directors

Most EPA CDs used EPAs for both formative and summative assessment (Table 2). Twenty-six (48.1%) reported using EPAs as part of the final clerkship grade, and a majority of EPA CDs used tiered grading (e.g., honors, letter grades). Among schools using EPAs in the final grade, the median percentage of the grade attributable to EPAs was 65% (IQR 25); 38.5% of these CDs used EPAs for ≤50% of the final grade and 61.5% for >50% of the final grade (15.4% used EPAs for 100% of the final grade). Faculty and residents were the individuals most frequently reported to complete EPA assessments of students and were also those most commonly reported to receive guidance on EPA assessment (Table 2). In comparative analyses, CDs who used EPAs as a component of the final grade were no more likely than CDs who did not to report providing guidance to assessors on how to perform assessments (p = 0.23); however, use of EPAs in the final grade was associated with having derived EPA instruments from the literature (p < 0.01). Pass/fail schools that used EPAs were no more likely than non-pass/fail schools to have specific individuals perform EPA assessments (e.g., attending physicians, residents; p > 0.05 for all comparisons) or to provide training to the individuals who performed EPA assessments (p > 0.05 for all comparisons).

There were no differences between EPA CDs and non-EPA CDs with respect to efforts to ensure the validity of assessment, including internal review of instruments, external review of instruments, use of validated instruments from the literature, or use of faculty development (Table 1). Similarly, EPA CDs were no more likely than non-EPA CDs to report monitoring assessments for validity. EPA CDs were no more likely than non-EPA CDs to report that written assessments were a valid measure of students’ performance in the IM clerkship, nor were they more likely to report that the Medical Student Performance Evaluation (MSPE) accurately represented a student's level of entrustability (Table 1).

Tasks Assessed Across Respondents

All respondents were asked which CEPAER-based clinical tasks they assessed in their clerkship, regardless of whether they reported EPA use (Table 3). There were no differences in the types of tasks assessed between clerkships that formally used EPAs and those that did not, with three exceptions: compared to non-EPA CDs, a higher percentage of EPA CDs reported assessment of (1) entering and discussing orders (p = 0.01); (2) handoff communication (p < 0.01); and (3) informed consent (p = 0.04).

Table 3 Clinical Tasks Currently Assessed as Reported by Respondent Clerkship Directors, by Categories of Those Who Were Formally Using EPAs and Those Who Were Not Formally Using EPAs as Part of Their Assessment Schema

Barriers to implementing EPAs experienced by respondents to the 2022 survey were compared to barriers anticipated in the 2015 CDIM Annual Survey results (Table 4). These results were not tested for statistical associations because the language and methodology of the two surveys differed. Commonly reported barriers in the 2022 survey included lack of evaluator understanding of EPAs, insufficient faculty and resident/fellow time, and absent or inadequate faculty development.

Table 4 Anticipated Compared to Experienced Barriers to Implementing EPAs as Part of Internal Medicine Clerkships. Respondents to the 2015 CDIM Annual Survey Included CDs Who Had Not Implemented EPAs, Whereas the 2022 Annual Survey Respondents Included Only CDs Who Had Implemented EPAs as Part of Their IM Clerkship

THEMATIC ANALYSIS

Reasons for Not Implementing EPAs

Reasons CDs did not implement EPAs included leadership decisions, perceived barriers, and alternative methods of assessment. Leadership decisions reflected centralized assessment systems; the school had either decided against using EPAs or had not yet decided, and control over these decisions sat above the clerkship director level. Barriers included technical issues (no tracking system or platform), time constraints (bandwidth), and implementation issues (faculty development, questions about reliability). Alternative methods often overlapped with central control (i.e., a mandated standardized assessment form used across clerkships) but also included individual preference for other assessment systems (e.g., RIME). Some CDs reported using assessments that were essentially EPAs but were not labeled with that terminology or were not assessed using a supervisory scale.

Perceived Benefits of Implementing EPAs

The most frequently cited benefit related to the concrete, realistic, observable behaviors outlined in the EPAs. CDs found EPAs offered granularity that improved the effectiveness of direct observation, enhanced feedback specificity, and increased the frequency of formative feedback compared with the period before implementation. Process-wise, CDs appreciated the ability to gauge the progress of entrustment over time. Additionally, CDs perceived that EPAs gave students clarity of expectations. CDs also expressed that the standardization offered by EPAs helped increase the inter-observer reliability of assessors.

Disadvantages of Using EPAs

Many respondents reported concerns over the administrative burden on already busy faculty and residents and the power dynamics that can arise when students drive the requests for assessment. CDs also expressed concern about the number of observations needed for each EPA to be reliable and about how reliability may be affected in settings with limited exposure, where obtaining multiple observations from a single observer may be challenging. CDs reported that some faculty did not fully understand EPAs or entrustment scores, necessitating faculty development, and that students saw EPAs as a “checkbox” activity, impeding organic feedback.

DISCUSSION

To our knowledge, this is currently the only nationally representative survey of EPA use in IM clerkships. Our study found nearly half of responding CDs use EPAs in some form, indicating substantial adoption since the CEPAER were introduced in 2014. A greater percentage of CDs reported using EPAs than in a 2020 snapshot of 135 participating LCME-accredited medical schools, which found that 34% of respondents used the CEPAER as a framework, as part of their assessments, or both.12,13 Interestingly, nearly all CDs in our survey reported using at least some elements of the CEPAER, although they did not uniformly refer to them as EPAs or use the entrustment framework for assessments. Together, these findings suggest a notable uptake of EPAs in general and the CEPAER in particular in IM clerkships.

We found no substantial differences in demographic characteristics between schools that used EPAs and those that did not, with one exception: schools with more recent accreditation were more likely to have adopted EPAs in their IM clerkship. Although this may suggest that more recent accreditation is associated with a school’s incorporation of “newer” forms of competency-based assessment such as EPAs, the relationship is unclear given the limitations of the available data and should be studied further. Regardless, our findings show the priorities of medical school leadership have a strong effect on assessment in general and on the ability to implement EPAs in particular, with many CDs reporting the presence or absence of such support as a facilitator or barrier, respectively, to their implementation plans.

Although Table 4 should be interpreted with caution (perceived barriers are not the same construct as experienced barriers), our results demonstrate substantial differences with respect to administrative support, financial support, and coordination across clerkships and campuses. Together, these results indicate that schools that successfully added EPAs either experienced fewer barriers or, more likely, were supported in their efforts by their institutions. This interpretation aligns with a recent study demonstrating that the pace of EPA implementation depends on system factors such as leadership involvement, investment in data management, and school-specific resources.14

EPA CDs reported assessing additional tasks compared to non-EPA CDs. Given that most EPA CDs used the CEPAER as a guide for implementation, it is possible the list of 13 “Core” activities helped these CDs comprehensively plan assessment opportunities for their learners, or, as implied by our thematic findings, that EPA assessment was part of a larger school-wide effort to enhance assessment. The three EPAs assessed more often by EPA CDs (handoffs, placing orders, informed consent) have been reported by IM program directors as areas with gaps between expected and reported performance in new interns15; however, there is some debate about when it is most appropriate to assess these specific CEPAER. In a recent summary report, the CEPAER Pilot suggested these three EPAs, among others, might be most appropriate for more advanced trainees such as sub-interns or early interns.6 It would be helpful to study how and why these CDs incorporated more advanced EPAs into their curricula.

This study explored the relationship between IM clerkship assessment and the collection of validity evidence. Although there have been numerous studies on the validity and reliability of EPAs in UME, the existing evidence is limited,4,5,16,17,18,19,20,21 and a recent study investigating the reliability of assessment across multiple institutions found EPAs produced poor reliability even when accounting for faculty development, assessment coaches, and entrustment scale type.22 Although it has long been theorized that “formative reports” of workplace-based assessments (such as EPAs) could be used to inform summative entrustment decisions,23 it remains uncertain whether this holds consistently for the CEPAER.22 Given these findings, it is notable that EPA CDs in our study were no more likely than non-EPA CDs to monitor the validity of their assessments, despite a majority reporting EPA use as part of summative assessment and nearly half using EPAs as part of the final clerkship grade.

While most of our participants used the CEPAER to develop their instruments, validity is highly context-specific and should be established and monitored in individual learning environments.24 Ongoing reporting of competency-based outcomes and of the validity evidence underlying high-stakes decisions is particularly important given the shift towards pass/fail clerkship grading in recent years.12 It has been argued that competency-based assessment might prompt institutions to improve the quality of assessment data reported25; however, our study shows EPA use was not associated with increased efforts to establish validity, nor did it add nuance to the MSPE with regard to student entrustability. These findings are disappointing given known issues with MSPE variability between schools, which lead to concerns over grade inflation and difficulty interpreting medical student performance.26,27,28

It should be noted that this study did not investigate when individual schools started using EPAs; therefore, it is possible some respondents had not had sufficient time to develop validity evidence or collect outcomes data on their learners. Given the increased attention on providing high-quality information to residency training programs to best support learners in their transition to advanced training,9 we believe our study represents an important call to action: there is a pressing need to collect and, just as vitally, share validity evidence on assessments in IM clerkships and to provide this information to stakeholders.

LIMITATIONS

Despite our 80% survey response rate, some degree of measurement error and/or bias may have been introduced by survey non-response or item non-response. Further, only IM CDs were surveyed, not all members of IM clerkship leadership. The final survey population comprised 140 of the 154 LCME fully or provisionally accredited medical schools that held AAIM/CDIM membership, and five of the 112 responding schools were provisionally accredited. It is possible that responses from provisionally accredited schools biased the results.

CONCLUSIONS

A substantial number of U.S. medical schools are using EPAs for formative and/or summative assessment in the IM clerkship, and the adoption of EPAs has grown. Lack of support and infrastructure at the institutional level were barriers to uptake. Data demonstrating that these assessments reliably and validly capture the desired outcome of increased student readiness for residency are lacking. CDs should perform these psychometric analyses to ensure that high-quality data are generated when EPAs are used for summative purposes.