Introduction

Empathy is central to the physician–patient relationship. It enables the bidirectional sharing of information and has therapeutic benefit for the patient. Physician empathy is linked with patient satisfaction, patient adherence [1,2,3,4], diagnostic accuracy [5], and disease outcomes in blood pressure and diabetes. In 2013, the Association of American Medical Colleges incorporated empathy into its list of core competencies for health professionals [6].

The effect of medical education on empathy is of great interest to many medical educators. Several studies [7,8,9,10,11,12,13,14,15] have investigated how empathy changes across medical school, although the clinical significance of empathy declines in medical school has been contested [16]. Many of these studies, as Sulzer and colleagues point out, have inconsistent and unclear definitions of empathy, and fail to differentiate the separate components of empathy [17].

Empathy has several components, including cognitive and emotive components [18, 19]. The cognitive component involves intellectually entering into an understanding of the patient’s perspective, while the emotive component involves a subjective sharing in another person’s feelings or experiences [18, 20]. Though some have argued that clinical empathy should be defined as predominantly cognitive, others have pointed out that this definition fails to capture the full breadth of empathy [17, 19] and may introduce possibilities for unethical behavior by clinicians [21].

Studies in medical students have consistently found a gender difference in empathy, with those who identify as women scoring higher on empathy scales than those who identify as men [7, 10, 11, 22, 23]. In this context, “gender” needs to be clarified because it is such a central variable in empathy response. We use gender here as the American Psychological Association (APA) defines it: “the condition of being male, female, or neuter,” referring “especially to social or cultural traits,” in contrast to sex, which the APA defines as “refer[ring] especially to physical and biological traits” [24].

The differences in empathy scores between men and women generally arise from the affective component. In studies using the Interpersonal Reactivity Index (IRI), empathy differences come from empathic concern (EC) [7], while in studies using the Jefferson Scale of Empathy (JSE), women scored higher on the “compassionate care” and “standing in the patient’s shoes” subscales; there was not a significant difference between men and women on the “perspective taking” subscale [25].

Cognitive empathy involves “theory of mind,” and putting oneself in the place of another person while putting aside one’s own perspective [26]. Affective empathy involves feelings evoked by another person’s affective state, although these feelings do not need to be identical to the other person’s [26]. Emphasizing the distinctness of these two aspects of empathy, different brain regions are involved in the different aspects of empathy [27]. Some researchers suggest that cognitive and affective empathy are employed differently in justice and care theories of moral reasoning [28]. Moral justice reasoning involves applying universal principles across similar cases and uses perspective taking, while understanding the nuances of specific cases uses affective empathy [28].

Some have made the case that empathy as it relates to patient care should be defined as solely cognitive, rather than a mix of cognitive and affective components [29]. However, divorcing empathy from any emotional, affective component would leave little distinction between an empathic and caring physician and an empathic and manipulative one. Halpern describes how a focus on purely cognitive empathy, termed “detached concern,” led to tolerance for both paternalistic attitudes towards patients and failures of compassion and medical ethics [21]. Though cognitive empathy is vital, affective, emotional empathy is likely critical to avoid dehumanizing patients [21]. From this perspective, differences in affective empathy are relevant to patient care.

Several studies have reported differences in empathy trends between men and women during medical school. Austin et al., for example, found men increasing in empathy between years 1 and 2 and women decreasing [9]. Hojat et al. found opposing results in a study focusing on empathy decline during the third year of medical school, with all groups of students declining in empathy over the third year, but men having steeper empathy declines than women [13]. Baez et al. found mixed results between men and women in general on empathy [30]. For medical students, Colliver et al. in a re-evaluation of the research in this area found that there may be very little decline or change at all during medical school or between men and women [16].

Given this continued uncertainty about gender differences and about the stability or change of empathy in medical students, further research is required. Accordingly, in the present study, we examined the stability of both cognitive and affective components of empathy and the differences between self-identified men and women.

Materials and Methods

Participants

A total of 672 students (379 women (56%), 293 men) of 970 medical students (~ 70% response rate) across years 1–4 of the 2018–2019 academic year voluntarily participated in either the baseline survey or the sample survey or both. Of these total participants, 631 participated in the baseline survey and 513 participated in the sample surveys. The mean age of participants was 24.3 years (SD = 3.21) at matriculation into medical school. There were 175 year 1 (26%), 165 year 2 (25%), 151 year 3 (23%), and 181 year 4 (26%) total respondents, with 164 MS1, 155 MS2, 141 MS3, and 171 MS4 participants in the baseline survey, and 156 MS1, 134 MS2, 116 MS3, and 130 MS4 participants in the sample surveys.

Study Design

A panel design study — longitudinal over one academic year and cross-sectional across 4 academic years — was used to follow MS1, MS2, MS3, and MS4 students throughout the 2018–2019 school year. This study was approved by the University of Minnesota Institutional Review Board.

Empathy Measures

The Brief Interpersonal Reactivity Index (B-IRI), a self-assessment measure of empathy with evidence of reliability and validity [14], was employed. The B-IRI comprises cognitive and affective empathy components [25].

The B-IRI consists of sixteen mixed positive and reverse-coded negative statements rated on a 0–4 Likert scale from “Describes me not well at all” to “Describes me extremely well,” with four subscales consisting of four statements each [31]. Two B-IRI subscales are perspective taking (PT) and empathic concern (EC), which measure the adoption of others’ viewpoints and “other-oriented” feelings of concern, respectively [20]. We used these two subscales, of four items each, giving a maximum score of 16 for each of the PT and EC subscales and a maximum total B-IRI score of 32 (Table 1). We omitted the two other B-IRI subscales, Fantasy and Personal Distress, as they measure the ability to transpose oneself into fictional characters and self-oriented unease in tense situations, respectively [20]. These are both considered “self-oriented,” and we were interested in the “other-oriented” dimensions of empathy, as addressed by the PT and EC scales [25].
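The scoring described above can be illustrated with a minimal sketch. It assumes eight items rated 0–4, four per subscale, with reverse-coded items flipped before summing. The item identifiers and the choice of which items are reverse coded are hypothetical placeholders, not the actual assignments in Table 1.

```python
from typing import Dict

MAX_ITEM = 4  # each statement is rated on a 0-4 scale

# Hypothetical layout: item ids and reverse-coding flags are placeholders,
# NOT the actual Table 1 assignments.
SUBSCALES: Dict[str, Dict[str, bool]] = {
    "PT": {"pt1": False, "pt2": True, "pt3": False, "pt4": False},
    "EC": {"ec1": False, "ec2": True, "ec3": False, "ec4": True},
}


def score_item(raw: int, reverse: bool) -> int:
    """Return the scored value of one item, flipping reverse-coded items."""
    if not 0 <= raw <= MAX_ITEM:
        raise ValueError(f"rating {raw} is outside 0-{MAX_ITEM}")
    return MAX_ITEM - raw if reverse else raw


def score_responses(responses: Dict[str, int]) -> Dict[str, int]:
    """Sum scored items per subscale (max 16 each) and overall (max 32)."""
    scores = {
        name: sum(score_item(responses[item], rev) for item, rev in items.items())
        for name, items in SUBSCALES.items()
    }
    scores["total"] = scores["PT"] + scores["EC"]
    return scores
```

Under this layout, a respondent who gives the most empathic answer to every item (4 on regular items, 0 on reverse-coded items) reaches the maximum of 16 on each subscale and 32 in total.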

Table 1 IRI scale questions used in the pulse survey, their coding status (reverse coded or not), and the IRI subscale to which they belong

Data Collection

University of Minnesota medical students from all four class years were invited by e-mail to voluntarily take part in a pulse survey, which measured empathy using the B-IRI scale, as well as various other measures, including satisfaction, tolerance for ambiguity, burnout, behaviors experienced during medical school, awareness of mistreatment procedures, a depression screen, and moral distress. Data on gender, ethnicity, and campus of origin were also collected. To decrease overall survey fatigue, each student was invited to participate in the pulse survey twice over the course of the year, once in a baseline survey and once in a sample survey. Every student was invited to fill out the baseline survey during September and October of 2018. For the sample surveys, random samples without replacement were selected so that students could not be selected more than once during the year, thus avoiding confounded repeated measures, but still allowing inferences to the population parameters from each sample. Thus, each student was invited to fill out the baseline survey and one sample-group survey (Table 2).
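One way to draw samples without replacement, so that no student appears in more than one sample group, is to shuffle the roster once and deal it into groups. The sketch below illustrates that idea only. The function name, the seed, and the use of six equal-sized groups are assumptions for illustration, and any stratification by class year that the actual procedure may have used is not represented.

```python
import random
from typing import List, Sequence


def assign_sample_groups(
    student_ids: Sequence[str], n_groups: int, seed: int = 0
) -> List[List[str]]:
    """Randomly partition students so each lands in exactly one group."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled roster round-robin; group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]
```

Because every student is placed in exactly one group, each follow-up sample is independent of the others, and each respondent contributes at most one follow-up response to pair with the baseline.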

Table 2 Groups and timing of pulse surveys, consisting of an initial baseline survey for every student in September and October, and follow-up surveys administered to random samples without replacement throughout the year

Statistical Analyses

We used analysis of variance (ANOVA) to test for differences in empathy among multiple groups and Student’s t-test to test for differences in empathy between two groups. When using matched data, we used two-tailed paired-sample t-tests, and when not using matched data, we used two-tailed two-sample t-tests. To compare the magnitude of empathy changes between groups, we used matched data, generating the difference between the baseline empathy score and the empathy score from the sample survey later in the year, then conducted Student’s t-tests on these values. For all statistical tests, we used an α value of 0.05 and considered p < 0.05 to be significant.
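The test statistics behind these comparisons can be sketched with the Python standard library. This is an illustrative sketch, not the analysis code used in the study. It assumes the classic pooled-variance form of the two-sample Student's t-test, and it stops at the t statistic and degrees of freedom; a two-tailed p-value would then be read from the t distribution, for example with a statistics package. All function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance
from typing import List, Sequence, Tuple


def score_changes(baseline: Sequence[float], followup: Sequence[float]) -> List[float]:
    """Per-student change: follow-up score minus matched baseline score."""
    if len(baseline) != len(followup):
        raise ValueError("matched data must have equal length")
    return [f - b for b, f in zip(baseline, followup)]


def paired_t(baseline: Sequence[float], followup: Sequence[float]) -> Tuple[float, int]:
    """Paired-sample t statistic and degrees of freedom (n - 1)."""
    diffs = score_changes(baseline, followup)
    n = len(diffs)
    # Note: undefined (division by zero) if all differences are identical.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def two_sample_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

To compare the magnitude of change between two groups, `score_changes` would be applied to each group's matched scores and the two resulting lists passed to `two_sample_t`.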

Results

Women self-rated empathy significantly higher than did men (p < 0.0001) on the B-IRI, with a mean B-IRI score of 23.3 compared to 20.9 for men. This difference in B-IRI score was due to the EC component, with women scoring significantly higher than men (p < 0.0001), but not from the PT component, which had no significant difference between men and women (p > 0.05; ns) (Fig. 1). Gender differences were consistent across all four medical student year cohorts, with women scoring significantly higher than men (p < 0.05) on the B-IRI. This difference was again due to the EC component, where women scored significantly higher than men (p < 0.05) in all four MS years, but not from the PT component, which had no significant difference between men and women (p > 0.05) for any MS year (Fig. 2).

Fig. 1

Empathy differences between men and women. Data from the baseline pulse survey. The x-axis shows the IRI subscale or combined IRI and the y-axis shows the score on the IRI scale. Asterisks denote significance (p < 0.05)

Fig. 2

Empathy differences between men and women across MS years. Data from the baseline pulse survey. The x-axis shows the year in medical school and the y-axis shows the score on the IRI scale

Self-identified women scored significantly higher than men (p < 0.05) on the B-IRI in three of the six random samples taken throughout the year. In four of the six samples, women scored significantly higher (p < 0.05) than men on the EC component. None of the samples showed a significant difference in PT between men and women (p > 0.05). Although two samples did not show significant differences on the EC component between men and women, women trended toward higher scores (Fig. 3). There were no significant differences between self-identified men and women in the magnitude of empathy changes over the course of a year for the EC or the PT scale (p > 0.05). The trend lines by MS year over time are shown in Fig. 4. The trend lines fluctuate over time, but women tend to maintain higher EC scores than men in all four years.

Fig. 3

Empathy differences between men and women. Data from the sample pulse survey. The x-axis shows the sample and the y-axis shows the score on the IRI scale. Asterisks denote significance (p < 0.05)

Fig. 4

Empathy differences between men and women over the course of 1 year. Each panel shows a different MS year, with the PT and EC empathy trajectories of men and women over the course of 1 year

Discussion

We found that women had higher empathy scores than men in medical school. Many empirical studies of empathy in medical education, as Sulzer et al. point out, treat empathy as a monolithic “black box” and are therefore unhelpful in guiding medical educators [17]. Using the B-IRI, we were able to analyze both affective and cognitive components, and were thus able to shed light on the mechanisms that may underlie empathy.

In an analysis of studies concerning empathy decline in medical education, Colliver and colleagues point out the difference between statistical and clinical significance [16]. They argue that empathy score differences of 0.1–0.5 points are not clinically significant. In our research, the difference in empathy between men and women was 2.4 points, or 7.5% of the total possible score of 32, which potentially has more clinical significance, in addition to statistical significance.

Our results show women scoring higher than men on the EC component of empathy, with no difference between men and women on the PT component (Figs. 1, 2, 3, and 4). Using a panel design study over the course of a year, we found that this trend was persistent across time. We did not find differences between men and women in the trajectories of empathy over the course of 1 year of medical school: there were no significant differences in the magnitude of empathy change between men and women at any timepoint.

Our findings on women having higher empathy scores are in keeping with other studies [10, 13, 15, 22, 23], and our finding that this difference came from the affective component agrees with previous studies using the B-IRI to measure empathy [7]. Contrary to our findings, other studies have found differences in empathy trajectories in medical school years for men and women, although the differences are not consistent between studies [9, 13]. Austin et al. found opposing trends in empathy between years 1 and 2, with empathy in men starting low in year 1 and rising in year 2 and empathy in women starting high in year 1 and falling in year 2, such that there were significant differences in empathy for men and women in year 1, but that these differences had disappeared by year 2 [9].

Studies using self-reported empathy scales have generally found women to have higher empathy scores, perhaps reflecting self-perceived gender stereotypes of empathy [30, 32, 33]. Behavioral analyses of specific tasks involving empathy, however, have shown no consistent difference between men and women [32,33,34,35]. This leads some researchers to propose that the higher scores of women on self-rated empathy may be influenced by social expectations, where women feel more comfortable reporting empathy than men [30, 32, 33]. Applying this theory to our study suggests that the socially gendered empathy expectations apply to affective empathy, but not to cognitive empathy. Specifically, there is no social expectation for women or men to have higher cognitive empathy; and therefore, both men and women are equally comfortable rating their cognitive empathy, whereas social expectations of women to have more affective empathy and men to have less lead to women being more comfortable than men reporting affective empathy.

Others have proposed a biological basis for empathy differences between men and women. We use the term gender in this paper as the APA defines it: the societally defined masculinity or femininity. However, since sex and gender are related, though not identical, evolutionary theories of empathy differences which hinge on a male–female sex binary are relevant. In a review of empathy and gender, Christov-Moore and colleagues argued that based on studies on primates, human infants, children, and adolescents as well as human adults, empathic differences between men and women are partially biological and can be explained via an evolutionary framework [36]. Interestingly, Christov-Moore et al. reported that in general, women do not have a clear advantage over men in cognitive empathy the way they do in affective empathy [36], in concordance with the results of the present study.

Our study has limitations. Self-report instruments as employed in the present study may be influenced by social desirability. Additionally, some meta-analyses suggest self-reported empathy may be disconnected from clinical outcomes [16, 17]. Nonetheless, self-reported empathy data have been correlated with measurable patient outcomes [2, 5, 37,38,39]. Our findings provide overall empathy baseline data as well as some cross-sectional, longitudinal, and gender-difference results. One implication is the question of whether empathy should be assessed and selected for as an admission criterion to medical school. Future work might also investigate whether empathy can be taught and, if so, which pedagogical approaches and curricula are effective.

In addition to the use of self-report data, limitations of the present study include non-optimal response rates and the measurement of self-identified gender as a binary of male or female. Further work should examine the relationship between empathy and gender measured on a continuous spectrum (Table 3). Finally, while the curriculum remained largely the same for all classes, it is possible, though unlikely, that subtle curricular changes contributed to differences in the experiences of each class.

Table 3 Survey participant data

Conclusions

In conclusion, we found that among first to fourth year medical students, women scored higher than men in overall empathy. This difference came specifically from the EC component of empathy with no differences in the PT empathy component, possibly due to either societal or biological differences. This trend was stable longitudinally and across years of medical school, indicating that these differences cannot be explained by differing reactions to a single event but are indicative of persistent, baseline differences. Thus, the present results suggest that the differences in empathy between men and women are stable, and the effect of medical school on these scores is the same for men and women.