Introduction

Coaching is an evidence-based approach to supporting physician well-being that is increasing in prevalence for trainee populations. In contrast to the more familiar “mentor,” “advisor” or “sponsor”, coaches can serve to help trainees design and achieve their personal and professional goals while also supporting their well-being and addressing burnout. There are also technical and clinical skills coaches that are prevalent in the surgical literature; their approach differs in style, process, and intended outcome. Trainee coaching programs have been developed and implemented, including the Professional Development Coaching Program (PDCP) [1]. The PDCP was founded to guide residents and fellows to focus on their strengths and support their resilience through the development of skills in coping, positive psychology, and reflection. Evaluations of these programs have focused on the impact on the coachees and have been shown to reduce burnout, improve well-being, and increase coping skills [1, 2].

Despite the evidence of benefits for coachees, the effect of coaching on the experience of physician coaches is less clear. Business literature has described higher workplace satisfaction and reduced work-related mental strain in coaches compared to the general population [3]. In contrast, a study of 106 business coaches found that the average coach experienced nearly six negative effects from coaching, the most common of which were being negatively affected by topics discussed with coachees and insecurities about fulfilling their role as a coach. These effects were also correlated with increased emotional exhaustion, a major component of burnout [4].

Physicians—in particular, those in academic faculty positions—are known to be at high risk for burnout, psychological stress, and professional attrition [5]. The research on coaching in medicine has focused on coachees and only recently have the experiences of coaches been explored. A 2020 qualitative study by Brooks et al. of medical student coaches identified both professional benefits (helping struggling students, providing holistic guidance) and challenges (unclear expectations of the role, balancing between coaching and supervising) [6]. A mixed methods study from 2021 by Elster et al. found that a higher proportion of medical student coaches was burned out compared to physician faculty with administrative roles. The coaches in this study also felt challenged by the high emotional output required to support students [7]. Neither of these studies investigated the change from before-to-after coaching. To our knowledge, no work has investigated whether coaches gain skills or benefits like those endorsed by the coachees. Of note, the PDCP engages volunteer faculty who undergo training in positive psychology and coaching to serve as novice coaches. These are not professional certified coaches, as have been studied in the business literature; nor are they academic learning coaches that are utilized in medical schools to support students.

Given that the coaches in the PDCP are academic physician volunteers serving as novice coaches and not certified physician coaches, the objective of this study was to investigate the impact of a longitudinal coaching intervention on the experience of physician faculty who underwent training to become novice coaches. The study focused on burnout, well-being, and professional fulfillment before and after coaching. Given that non-medical coaches suffer emotional exhaustion from their work and that physician coaches have higher rates of burnout than non-coach faculty in other studies, we hypothesized that burnout and well-being might worsen after coaching. We hypothesized professional fulfillment might be higher despite that decline, given the professional benefits of coaching mentioned in the prior studies. We also sought to investigate whether the coaches benefited from the coaching program in terms of the development and use of skills with others.

Methods

Settings and participants

This was a prospective longitudinal study in a single large academic medical center spanning two year-long iterations of the coaching program from 2017 to 2019. Coaches were unpaid volunteers from faculty in the departments of medicine, pediatrics, and various surgical subspecialties including general surgery, obstetrics and gynecology, urology, oral and maxillofacial surgery, vascular surgery, pediatric surgery, and thoracic surgery. Coachees were residents and fellows from the departments listed above who, as part of a randomized controlled trial (manuscript currently in progress), were assigned to the intervention coaching arm vs. a comparison non-coaching arm. In the intervention arm, they were assigned a coach, but their participation in coaching was voluntary.

Program description

The goals of the PDCP are to support coachees by reducing their burnout, improving their resilience, and maximizing their strengths and potential. The PDCP was founded on the principles of positive psychology to provide trainees a safe environment to reflect on their experiences and build the skills to optimize their strengths and cope with the stressors and challenges. The PDCP was developed at the senior author’s home institution in 2011 through collaboration with the Institute of Coaching, and first implemented with internal medicine interns in 2012 [8]. It has since been expanded to several institutions and organizations with several publications describing the model, approach, and impact [9,10,11]. The research protocol and all research materials and methods were approved by this institution’s Institutional Review Board (Protocol #2017P00056).

Intervention

In the spring of 2017, the first cohort of coaches was recruited via emails from program directors. They then received two hours of interactive training in the principles and practices of coaching and positive psychology led by a subject-matter expert in professional development coaching. Training focused on core coaching techniques through exercises in reflective listening, asking future-oriented questions, facilitating goal-setting conversations, and maximizing strengths and positives. After practicing these skills, coaches were led through the same exercises they would be using with their coachees. Through a dyadic approach, coaches were able to practice and receive coaching.

Coaches also received curricular guides for their meetings with the coachees. At the beginning of the 2017 academic year, coaches were paired with at least one coachee from outside of their subspecialty. This was done to ensure a safe space for the coachees, and to assist the coaches in not defaulting to traditional mentoring conversations. Pairs were expected to meet quarterly for approximately 60 minutes using the curricular guides. These sessions focused on the coachees’ development of skills for processing feedback, reflecting on their experiences, and coping with stressors. The contents of these meetings were kept confidential.

Coaches were asked to complete baseline surveys after completing training but before the start of coaching, and end-of-year surveys at the end of each academic year they participated as coaches. Coaches participated for either one academic year (2017–2018 or 2018–2019) or two academic years (2017–2019). This allowed for survey data at baseline, after one year of coaching, and—for some coaches—after two years of coaching. Survey measures were the same at all time points, except that end-of-year surveys asked about length and quality of coaching meetings. Surveys were administered through and stored in REDCap. This study was reviewed and considered exempt by this institution’s Institutional Review Board with exemption #45 CFR 46.101(b)(1) research conducted in established and commonly accepted educational settings, involving normal educational practices.

Outcomes

There were 3 main categories of survey metrics, in addition to demographics and department. Program measures included characteristics of the coaching relationship (number of coachees; length and quality of meetings) and perceptions of the coaching program. Primary outcomes included burnout, well-being, and professional fulfillment. The Stanford Professional Fulfillment Index (PFI) scale measures an overall burnout score (5-point Likert scale [0–4], 10 items, total range 0–40) through subscales for Workplace Exhaustion (WE) and Interpersonal Disengagement (IPD) (5-point Likert scale [0–4], 6 items for WE and 4 items for IPD) [13]. The PFI also separately measures Professional Fulfillment (5-point Likert scale [0–4], 6 items for Professional Fulfillment, total range 0–24). Well-being was measured using the PERMA (Positive Emotions, Engagement, Meaning, Relationships, Accomplishments) scale (5-point Likert scale [1–5], 15 items, total range 15–75) [12]. Secondary outcomes included program-related skills and benefits including specific coping skills, experiences with reflection and receiving feedback, and use of coaching skills in other relationships.

Statistical analyses

Coaches who had submitted both a baseline survey and at least one end-of-year survey were included. This allowed for paired pre-post analysis. To maximize sample size, surveys were included even if not all questions were answered. Baseline, end-of-year-1 (EOY-1) and end-of-year-2 (EOY-2) surveys were paired for each coach.

Demographic data were summarized with descriptive statistics. The scores from burnout (and its WE and IPD subscales) and Professional Fulfillment were calculated as averages on a 0–10 point scale and PERMA was summarized into a continuous measure. All primary outcomes, secondary outcomes, and characteristics and perceptions of the coaching program were dichotomized into categorical measures. The dichotomization was done based on existing thresholds published in the literature (averages of burnout, WE, IPD: high ≥ 3.325, low < 3.325; sum of Professional Fulfillment: high ≥ 8; low < 8) [13]. If thresholds were not available, the variables were dichotomized into the top 2 quartiles and bottom 2 quartiles based on the median value at baseline. This latter approach was also used to convert all other survey metrics into categorical variables.
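The scoring and dichotomization steps above can be illustrated with a minimal sketch. It is not the study's analysis code (the analyses were run in Stata). It assumes that 0–4 item responses are averaged and multiplied by 2.5 to reach the 0–10 scale, that unanswered items are skipped when averaging, and that values tied with the baseline median fall in the top half. None of these choices is stated in the text, and the function names are hypothetical.

```python
from statistics import mean, median

# Cutoff quoted in the text for the burnout, WE, and IPD averages.
BURNOUT_CUTOFF = 3.325
# Assumption: 0-4 item means are rescaled to 0-10 by multiplying by 2.5.
RESCALE = 10 / 4


def scale_score(items):
    """Mean of answered 0-4 items, rescaled to 0-10; None if nothing was answered."""
    answered = [x for x in items if x is not None]  # assumption: skip missing items
    if not answered:
        return None
    return mean(answered) * RESCALE


def is_high(score, cutoff=BURNOUT_CUTOFF):
    """True when the score meets or exceeds the cutoff (high >= cutoff)."""
    return score >= cutoff


def median_split(baseline_values, values):
    """Dichotomize values at the baseline median (assumption: ties go to the top half)."""
    cut = median(baseline_values)
    return [v >= cut for v in values]
```

Under these assumptions, a coach answering 1, 1, 2, 2 on four IPD items would have a rescaled average of 3.75 and be classified as high on that subscale.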

Initially, bivariate analyses were performed for all 3 categories of metrics to compare change from baseline to EOY-1, baseline to EOY-2, and EOY-1 to EOY-2. The coach population at these 3 time points was not balanced, and therefore, repeated measures analysis was not performed. For the primary outcomes, the continuous measures were compared between each of the three time points (baseline to EOY-1, baseline to EOY-2, and EOY-1 to EOY-2) with paired t-test. The categorical measures were compared through paired analyses using McNemar’s test.
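As an illustration of the paired categorical comparison, the sketch below computes a two-sided exact McNemar p value from the discordant pairs, i.e., coaches whose category changed between two time points. The text does not state whether the exact or the chi-squared form of McNemar's test was used, so the exact binomial form here is an assumption, and the function names are hypothetical.

```python
from math import comb


def discordant_counts(before, after):
    """Count pairs that switched category: (True -> False, False -> True)."""
    b = sum(1 for x, y in zip(before, after) if x and not y)
    c = sum(1 for x, y in zip(before, after) if not x and y)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no coach changed category, so there is no evidence of change
    # Binomial(n, 0.5) probability of a split at least as uneven as the one observed.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only coaches who changed category contribute to the test, which is one reason small paired samples often yield p values at or near 1.0.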

Then, adjusted analyses were performed through multivariate linear regression. The dependent variable was the continuous measure of the primary outcomes (burnout, well-being, and professional fulfillment). Covariates included the value of that primary outcome at baseline, time spent coaching, gender, race, and department. Lastly, bivariate analyses were performed to understand if there were elements of the coaching relationship or skills emphasized in coaching that had an association with the primary outcomes. Chi-squared tests were performed with unpaired categorical measures of both the primary outcomes vs. coaching characteristics and secondary outcomes. A two-sided p value < 0.05 was considered statistically significant for all tests. All statistical analyses were performed on Stata 17.
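For the unpaired comparisons, a chi-squared test on a 2 × 2 table (for example, high vs. low on an outcome by endorsing vs. not endorsing a skill) can be sketched as follows. This is a hedged illustration, not the study's Stata code: it uses the Pearson statistic without continuity correction, which is an assumption, and the function name is hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))
```

With small expected cell counts, an exact test may be preferable to this large-sample approximation.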

Results

Demographics and coaching program experience

From 2017 to 2019, 136 total coaches participated in the PDCP, and 44% (n = 60) of coaches submitted a baseline and at least one end-of-year survey. Table 1 describes the demographics of the respondents, 63% (n = 35) of whom identified as female, 85% (n = 46) of whom identified as white, and 25% (n = 15) of whom were from a surgical specialty. Forty-eight percent (n = 29) coached for 1 year and 52% (n = 31) coached for 2 years. Table 2 describes characteristics and perceptions of the coaching program. In the first year of coaching, 68% (n = 34) of coaches had more than 1 coachee, while in the second year, 53% (n = 16) of coaches had more than 1 coachee. There was no significant difference in these measures between the first vs. second years of coaching. Of coaches who started in 2017–2018, 95% (104/109) coached again in 2018–2019.

Table 1 Demographic characteristics of the participants and survey respondents
Table 2 Characteristics of the coaching relationship and perceptions of the coaching program during the first and second years of coaching

Bivariate analyses of primary outcomes vs. time spent coaching

The bivariate analyses of our primary outcomes vs. time spent coaching are reported in Table 3. There was a non-significant increase in the proportion of coaches with a low burnout score (68% at baseline [n = 38] to 69% at EOY-1 [n = 36, p = 0.69 vs. baseline] to 84% at EOY-2 [n = 26, p = 0.45 vs. baseline]). The proportion of coaches with low WE also increased, but not significantly (54% at baseline [n = 31] to 61% at EOY-1 [n = 33, p = 0.61 vs. baseline] to 74% at EOY-2 [n = 23, p = 0.39 vs. baseline]). There was no significant change in the percentage of coaches scoring low on IPD (75% at baseline [n = 43] to 81% at EOY-1 [n = 42, p = 0.77 vs. baseline] to 72% at EOY-2 [n = 23, p = 1.0]). The proportion of coaches with high Professional Fulfillment scores did not significantly change (46% at baseline [n = 26] to 50% at EOY-1 [n = 27, p = 1.0] to 42% at EOY-2 [n = 13, p = 1.0]). Lastly, the proportion of coaches who scored in the top 2 quartiles for PERMA did not significantly change across time points (46% at baseline [n = 27] to 52.8% at EOY-1 [n = 28, p = 0.75 vs. baseline] to 41% at EOY-2 [n = 13, p = 0.51 vs. baseline]).

Table 3 Comparison of means (with standard deviation) and proportions of burnout, WE, IPD, Professional Fulfillment, and PERMA at baseline to end of first year to end of second year

Benefits and skills from coaching

Table 4 demonstrates potential benefits and skills gained from coaching. A significantly higher proportion of coaches felt confident about staying emotionally balanced after coaching compared to baseline (55% at baseline [n = 32] to 71% at EOY-1 [n = 37, p = 0.04 vs. baseline] to 66% at EOY-2 [n = 21, p = 0.03 vs. baseline]). There was no significant change in the proportion of coaches who reported “Good/Excellent” experiences with skills emphasized in the PDCP, including practicing reflection (58% [n = 35] at baseline vs. 68% at EOY-1 [n = 36, p = 0.18 vs. baseline] and 71% at EOY-2 [n = 24, p = 1.0]). Finally, compared to baseline, a significantly higher proportion of coaches reported using the skills gained through the PDCP in other non-coaching relationships at EOY-1 and EOY-2 (e.g., with faculty colleagues: 25% at baseline [n = 15] vs. 69% at EOY-1 [n = 36, p < 0.001 vs. baseline] and 84.4% at EOY-2 [n = 27, p < 0.001 vs. baseline]).

Table 4 Comparison of proportions of coaches endorsing program-related skills and benefits at baseline to end of first year to end of second year (* indicates p < 0.05)

Adjusted analyses of primary outcomes

We conducted multivariate linear regression to analyze the relationship between our primary outcomes and time spent coaching after adjusting for demographics, department, and the value of that outcome at baseline. These outcomes are listed in Tables 6, 7, 8, 9, 10 in the appendix.

Burnout was significantly lower if coaches had low burnout at baseline (β = − 1.30, p < 0.001) and significantly higher at EOY-1 and EOY-2 compared to baseline (β = 0.780, p = 0.039 and β = 0.871, p = 0.039, respectively). There was no significant change at EOY-2 compared to EOY-1 (β = 0.091, p = 0.781). On subscale analyses, WE was significantly lower for coaches with low WE at baseline (β = − 1.35, p = 0.001). IPD was significantly lower for coaches who had low IPD at baseline (β = − 1.29, p = 0.001). IPD was significantly higher at EOY-1 and EOY-2 compared to baseline (β = 1.01, p = 0.017 and β = 1.13, p = 0.011, respectively). There was no significant change in IPD at EOY-2 compared to EOY-1 (β = 0.127, p = 0.711).

PERMA was significantly higher for coaches who scored highly on PERMA at baseline (β = 10, p < 0.001) and significantly lower at EOY-1 and EOY-2 compared to baseline (β = − 4.47, p = 0.011 and β = − 7.01, p < 0.001 respectively). There was no significant change in PERMA at EOY-2 compared to EOY-1 (β = − 2.54, p = 0.146). Professional Fulfillment was significantly higher for coaches who had high Professional Fulfillment at baseline (β = 2.05, p < 0.001) and for non-surgical coaches compared to surgical coaches (β = 0.361, p = 0.0430).

Bivariate analyses of primary outcomes vs. coaching program characteristics and skills

The categorical measures of primary outcomes were also compared with categorical measures of coaching program characteristics and skills through bivariate analyses. The results are reported in Table 5. Of coaches who scored in the bottom 2 quartiles of PERMA, 76% had more than 1 coachee while 24% had only 1 coachee (p = 0.013). The length of the quarterly meetings and quality of communications in the meetings did not show any significant relationships with the primary outcomes.

Table 5 Comparison of proportions of coaches scoring high vs. low on primary outcomes with characteristics of the coaching relationship and program-related skills (*indicates p < 0.05)

Being able to come up with emotionally balanced thoughts during negative times (vs. unable to do so) was associated with a significantly higher proportion of coaches with low burnout (70 vs. 30%, p = 0.005), low WE (71 vs. 29%, p = 0.022), low IPD (71 vs. 29%, p = 0.001), high Professional Fulfillment (77 vs. 23%, p = 0.004) and the top 2 quartiles for PERMA (74 vs. 27%, p = 0.016). Being able to ask people for support (vs. unable to do so) was associated with a significantly higher proportion of coaches who had low burnout (57 vs. 43%, p = 0.014), had high Professional Fulfillment (69 vs. 31%, p < 0.001), and were in the top 2 quartiles of PERMA (72 vs. 28%, p < 0.001). Good or excellent experiences with reflection (vs. poor or fair experiences) were associated with a significantly higher proportion of coaches who had low burnout (73 vs. 27%, p < 0.001), low WE (74 vs. 26%, p = 0.004), low IPD (71 vs. 29%, p = 0.003), and high Professional Fulfillment (77 vs. 23%, p = 0.008).

Discussion

We evaluated the experience of academic physician coaches participating in a longitudinal coaching program. Coaches reported several benefits related to their experience, such as being able to stay emotionally balanced and using their coaching skills in non-coaching relationships. In our adjusted analyses, burnout, interpersonal disengagement, and well-being decreased from baseline. However, these changes did not correlate with time spent coaching and appeared to be more a reflection of the coaches’ baseline state than an effect of program participation itself. On further analyses, better experiences and comfort with the skills emphasized in the PDCP were significantly associated with better outcomes on burnout, professional fulfillment, and well-being.

Coaches reported benefits from coaching in the PDCP. More coaches felt confident about staying emotionally balanced during difficult times after coaching compared to baseline. There was also an upward trend in the proportion of coaches who had good/excellent experiences with reflection and receiving feedback, though these were not significant. A significantly higher proportion of coaches reported using skills gained through coaching in other relationships after coaching compared to baseline. Given that baseline was measured in between completion of training and onset of coaching, it is likely that use of these skills with non-coachees was not necessarily motivated by the training, but by using them with the coachees.

Based on the available literature, we hypothesized that burnout might increase over time in our coach population. In our bivariate analyses, both the mean scores and proportions of coaches with high burnout and WE trended downward over time, but this change was not significant. At the 3 time points, only 16.1 to 32.1% of our coaches had high burnout, with mean scores ranging from 2.48 to 2.74. This is similar to reports of non-coach physicians in other studies measuring burnout using the PFI. A 2020 study of 7364 physicians across 9 institutions had a mean burnout of 2.8 [14]. The PERMA scores in our study ranged across the three time points from 54.7 to 57.3, which is lower than the mean PERMA of 154 medicine faculty (60.9) who participated in PDCPs across multiple institutions. This suggests that the faculty in this study had slightly lower PERMA than prior cohorts [11]. We had also hypothesized that professional fulfillment might increase due to participation as a coach. We did not see an improvement in professional fulfillment in our coaches, and noted that the professional fulfillment means of our coaches were similar to the mean of the population in the aforementioned 2020 study (6.5–6.96 compared to 6.61, respectively) [14]. Given the stressors that academic faculty physicians experience, a decline in professional fulfillment over time might be expected and the stability in that outcome that we see in our volunteer faculty coaches could be a “positive” finding. We are unable to explore this further without a control group of non-coach physicians; however, this is a notable result to explore in future studies.

It is challenging to compare the baseline values of our physician coaches to the non-coaching physicians given the lack of a control arm. However, published and internal data on burnout can provide some context. Rao et al. 2019 surveyed physicians at our institution from all specialties and found a burnout rate in 2017 of 45% on the Maslach Burnout Inventory (MBI) [15]. While this is a different burnout scale, Trockel et al. 2018 showed that MBI and Stanford PFI correlate strongly (r = 0.71). Furthermore, internal survey data from this institution in 2019 found similar rates of burnout in physicians across all specialties on the Stanford PFI and the MBI (~ 43%). Given that these scales are comparable, it is worth noting that the physician coaches in our study who started in 2017 (from select specialties) had a burnout rate of 35% at their baseline, slightly lower than the overall physician population at our institution.

Comparison of these findings with studies of other coaches is challenging given the heterogeneity of coaching programs and the few studies that have been performed. Elster et al.’s previously mentioned mixed methods analyses found that nearly 2/3 of academic physicians who were trained and paid to coach medical students met criteria for burnout on the MBI. It is unclear why rates of burnout in Elster’s study are different from ours, though there are likely factors specific to the institution or setting [7].

Both Elster et al.’s and Brooks et al.’s studies of medical student coaches revealed similar themes that could contribute to a coach’s professional satisfaction or fulfillment, such as guiding struggling trainees, building a longitudinal relationship, and observing their successes [6, 7]. Coaches in both studies were challenged by the fragmentation of one’s professional identity due to the coaching role and the expectations and burden this carried. Our measure of Professional Fulfillment asks about a global view of the work experience, of which coaching is simply one part. Given that the professional benefits reported by coaches in prior studies may not be captured in this Stanford Professional Fulfillment metric, this could help explain why we do not observe our hypothesized improvement in Professional Fulfillment. Future studies should investigate effects of coaching on both global- and role-specific professional fulfillment.

We adjusted our analyses of primary outcomes vs. time spent coaching with covariates representing who the coaches were at baseline including the value of that primary outcome at baseline, gender, race, and department. The covariate with the largest effect size was how the coach scored on that outcome at baseline. After adjusting for baseline value, gender, race, and department, we found that burnout, IPD, and PERMA did decline from baseline. However, for those who coached for 2 years, none of the outcomes significantly changed between EOY-1 and EOY-2. Thus, this was likely not a cumulative, dose-dependent effect from coaching. These cohorts also had a very high retention rate, so this lack of change between EOY-1 and EOY-2 was likely not due to self-selecting attrition. Without a comparison arm, it is hard to interpret how much of these changes are due to time spent being a physician. Ultimately, the level of burnout, well-being, and professional fulfillment of the physicians at baseline influenced outcomes more significantly than the time spent coaching. Lastly, while the trends in burnout and IPD (improving) and well-being on PERMA (worsening) may seem contradictory, it is worth emphasizing that burnout and well-being are not mirror opposite constructs. They capture different elements of a subject’s emotional health and recent work by this group has even shown that PERMA and burnout (on the MBI) are not correlated [11].

To further investigate if elements of the coaching experience were associated with our primary outcomes, we performed bivariate analyses of the outcomes vs. characteristics of the coaching relationships and skills emphasized in the program. Having more than one coachee was significantly associated with scoring in the bottom 2 quartiles on PERMA, suggesting a potential impact of increased burden on the coach’s time. Good or excellent experiences with reflection, as well as coping skills like staying emotionally balanced and asking for support, were associated with a higher proportion of coaches who had low burnout, high professional fulfillment, and scoring in the top 2 quartiles for PERMA. This suggests that these skills may be protective in these outcomes and could be emphasized further in training.

This study has several limitations to consider. There was a relatively low response rate when using paired data for analyses, which could indicate a non-response bias. Because we did not gather data on the non-respondents, we cannot draw inferences on how that population might differ from our respondents. The relatively low response rate also led to challenges in our analyses. Constructs such as burnout and well-being are complex and multi-factorial. While we attempted to control for some of these factors in our multivariate analyses, we were limited by our relatively low sample size. Furthermore, we did not have a comparison arm of non-coach physicians, which makes it harder to distinguish how much of the effects are due to being a coach vs. being a physician. Without a comparison arm, it also makes it harder to understand what “no change” may mean in a particular metric, as other coaching studies with comparison groups have shown a decline in many metrics within the comparison arm. This suggests a potentially supportive and stabilizing role that coaching may play in physician resilience and well-being. Finally, this study took place in a single institution with heterogeneity in coachee population, prior experience, and many other factors.

While this variability does make it harder to generalize this study to other academic physician coaches, our findings create an initial, albeit incomplete, profile of burnout, professional fulfillment, and well-being of the physician faculty who volunteer to be novice coaches. These findings demonstrate changes in these outcomes that warrant further investigation and provide potential elements and skills in coaching that can be emphasized to improve these outcomes. Future studies with larger populations and comparison groups will allow for further exploration of the role of coaching on the coach.