Almost all social interactions involve participants’ attempts to identify and understand each other’s thoughts and emotions. These efforts draw on an ability often termed empathic accuracy (Hodges, Lewis, & Ickes, 2015). Empathic accuracy is associated with a variety of beneficial outcomes such as better interpersonal interactions (Schmid Mast & Hall, 2018) and more satisfying romantic relationships (Sened et al., 2017a). On the other hand, in specific circumstances, such as when a relationship is under threat, greater empathic accuracy may be detrimental (Simpson, Ickes, & Blackstone, 1995).

Three Viewpoints of Empathic Accuracy

Empathic accuracy has been studied in various areas of research (sometimes under different terms). Three of the prominent ones, detailed below, are neuroscientific research, dual process models, and close relationship research. While integrating these lines of research has great potential, few efforts have been made to do so to date (Zaki & Ochsner, 2011). As we will detail below, there is some evidence connecting dual process and neuroscientific models, and some evidence suggesting connections between neuroscientific models and ones drawn from relationship research. The current study aims to strengthen this triple link by linking close relationship and dual process research, showing that the assumed similarity vs. direct accuracy distinction, common in close relationship studies, fits a dual process model. Below, we briefly summarize main models from all three lines of research and existing studies connecting them.

Close Relationship Research

Close relationship studies, usually looking at empathic accuracy between members of romantic couples, suggest that empathic accuracy inferences can be partitioned into two main components (Kenny & Acitelli, 2001; Wilhelm & Perrez, 2004). The first, referred to as assumed similarity, taps the extent to which perceivers’ inferences are associated with their own emotions. It may reflect perceivers’ tendency to assume that targets’ emotions are similar to their own; importantly, if the assumption is correct (i.e., target and perceiver emotions are indeed similar), assumed similarity will indirectly lead to accuracy. The second, referred to as direct accuracy, taps the extent to which perceivers’ inferences are “correct” (i.e., associated with the targets’ actual emotions after adjusting for any effects of assumed similarity). It reflects all sources of information, including both verbal and non-verbal cues, other than assumed similarity (Hall & Schmid Mast, 2007). Both components of empathic accuracy have been researched extensively (e.g., Kouros & Papp, 2019; Powers, Rauh, Henning, Buck, & West, 2011).

Dual Process Research

Dual process models (Chaiken & Trope, 1999) distinguish between type-1 mental processes, which are autonomous and do not require attention or working memory resources, and type-2 processes, which are slower and sequential, and do utilize working memory resources. Dual process models have been proposed for a wide variety of psychological phenomena; examples from areas pertinent to empathic accuracy include emotion regulation (Gyurak et al., 2011) and impression formation (Wyer & Srull, 2014). Similarly, within empathic accuracy research, Ma-Kellams and Lerner (2016) demonstrated that participants who were encouraged to think intuitively (i.e., use type-1 processes) had less accurate empathic inferences than those who were encouraged to think reflectively (i.e., use type-2 processes).

Identifying dual processes in psychological phenomena can help generate a variety of hypotheses, as the two types of processing have been associated with various individual differences (e.g., in working memory; Barrett et al., 2004) and situational variables (e.g., intuitive vs. rational modes of thought; Shenhav et al., 2012). Such situational variables can then be experimentally manipulated, allowing for tighter causal inferences.

Neuroscientific Research

Social-neuroscience models of empathy distinguish between two brain systems involved in empathic inferences (e.g., Shamay-Tsoory, 2011; Singer, 2006; Zaki & Ochsner, 2011)Footnote 1: Emotional empathy, which involves sharing another person’s emotions and assuming correctly that the shared emotion reflects the other person’s emotion, tends to be fast, intuitive, and automatic. Cognitive empathy, which involves explicitly processing all available information, tends to be slow and deliberate.

The distinction between these two systems helps clarify the ways in which various biological mechanisms (e.g., hormones; Bethlehem, van Honk, Auyeung, & Baron-Cohen, 2013) might play a role in empathic accuracy and how it might operate in individuals characterized by various conditions known to have neural correlates (e.g., borderline personality disorder; Dziobek et al., 2011).

Integrating Lines of Research

Some links between these lines of research have already been demonstrated in previous studies. For example, Bohl and van den Bos (2012) and Spunt and Lieberman (2013) linked neuroscientific and dual process models by showing that emotional empathy involves type-1 processes and cognitive empathy involves type-2 processes. Other studies have linked neuroscientific models with the accuracy components model in close relationship research. For example, Sened, Yovel, et al. (2017b) provided evidence supporting the identification of emotional empathy with assumed similarity and of cognitive empathy with direct accuracy.

In an attempt to integrate these findings, we outline a conceptual framework for understanding empathic accuracy according to these three lines of research (see Fig. 1). The framework suggests that direct accuracy results from type-2 processes carried out by the cognitive empathy system, whereas assumed similarity results from type-1 processes carried out by the emotional empathy system. It can help cross-pollinate these lines of research, leading to new hypotheses, research methods, and possible applications. However, the framework suggests that dual process models should also be linked directly to the accuracy components model, a link for which we could not find evidence in previous studies.

Fig. 1 Conceptual framework of empathic accuracy

The Current Study

Thus, the current study aims to provide more evidence to support the suggested conceptual framework by completing the “missing link” between these three lines of empathic accuracy research, the link between the accuracy components model from close relationship research and dual process models. In line with the proposed framework, we hypothesized that assumed similarity reflects type-1 processes, as both have been linked to emotional empathy; in contrast, direct accuracy may require slower and more involved type-2 processes, as both have been linked to cognitive empathy. Thus, we expected slower inferences to reflect greater direct accuracy but not greater assumed similarity, as in the time scale typical of our methods (tens of seconds), fast type-1 processes would have already had ample time to achieve their final results.

To test our hypothesis, we examined empathic inferences in three daily diary samples of romantic couples and measured the time taken to reach each inference. The main reason we looked at this population is that recruiting couples is a simple way to recruit dyads of people who can be reasonably expected to be motivated, and to have the opportunity, to note each other’s emotions on a daily basis. Importantly, we are looking at the way spending more time on empathic inferences changes the involvement of these two components in arriving at the final inference; our hypothesis is indifferent as to why participants spend more time on specific days (e.g., higher motivation to be accurate on a specific day, or having more free time to spend on the task).

Method

De-identified data used in the study, analysis code, and a list of measures participants completed which were not used in this study are openly available at https://osf.io/wsuta/

All data collection was approved by the institutions’ IRBs.

Participants

Sample 1

Eighty-nine cohabiting mixed-gender couples were recruited to take part in a study on dyadic relationships in which they completed a background questionnaire and a 4-week daily diary. To be included, participants were required to be over 18 years old. Nine couples were excluded because at least one partner had completed fewer than 6 of the required 28 diary entries.Footnote 2 Demographics are reported in Table 1. T-test analysesFootnote 3 revealed that participants who remained in the study were significantly older (mean age difference in years 4.087, 95% CI (1.723, 6.452), t(80.712) = 3.44, p < .001) and had longer relationships (mean relationship length difference in years 4.013, 95% CI (1.862, 6.165), t(77.314) = 3.714, p < .001) than participants who were excluded. Wilcoxon tests revealed no difference in income or education level between participants who remained in the study and participants who were excluded (p > .1). Participants did not receive any compensation for this study.

Table 1 Demographics

Sample 2

After results from sample 1 confirmed our hypothesis (see the “Results” section, below), we ran similar analyses on data from a large sample of new parents which was already available at the time (e.g., Sened et al., 2019). This was done to replicate our results and to ensure that they remain consistent with couples undergoing a specific major life transition, in this case the transition to parenthood. One hundred eight mixed-gender couples, expecting their first child, were recruited for a larger study on the transition to parenthood. To be included, participants were required to be over 18 years old, and expecting a single child (i.e., not twins). Five couples left the study before beginning the daily diary component, and 3 more were excluded because at least one partner had completed fewer than 6 of the required 21 diary entries. Demographics are reported in Table 1. T-test analyses revealed no difference in age or relationship length between participants who remained in the study and participants who dropped out (p > .1). Wilcoxon tests revealed no difference in income or education level between participants who remained in the study and participants who were excluded (p > .1). Participants received the equivalent of 150 USD for completing the daily diary alongside other procedures, including a lab meeting.

Sample 3

To ensure that our findings do not rely (unintentionally) on “researcher degrees of freedom” (Wicherts et al., 2016), we pre-registered our hypotheses and analytic code, and recruited a new sample. Ninety-five cohabiting couples were recruited in a manner similar to sample 1, but couples completed diaries for 3 weeks only. Eight couples were excluded because at least one partner had completed fewer than 6 of the required 21 diary entries. Demographics are reported in Table 1. T-test analyses revealed no difference in age or relationship length between participants who remained in the study and participants who dropped out (p > .1). Wilcoxon tests revealed no difference in income or education level between participants who remained in the study and participants who were excluded (p > .1). Participants did not receive any compensation for this study. Three couples (3.1%) were same-gender women couples; all other couples were mixed-gender.

Procedure

Sample 1

Participants completed a daily diary questionnaire every day for 28 days using the Qualtrics platform. They were instructed to complete the questionnaire an hour before they go to sleep and were asked not to share their answers with their partners. Participants who missed a questionnaire could complete it during the following day; they were asked to reply as if they were completing it at the appropriate time (i.e., the previous evening). Over 83% of the entries (3760/4480) were completed. The daily diary included a self-reported mood questionnaire, several items unrelated to the current study, and then the partner-report (i.e., empathic inference) mood questionnaire.

Data cleaning included removing entries in which the whole diary (including various questionnaires not reported in this study) was completed in less than 2 min, which was assumed to indicate careless completion, leaving 3678 entries. We then removed the 5% of entries with the longest response times for self-reported moods and for empathic inferences, leaving 3354 entries.
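The two cleaning steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`total_seconds`, `self_rt`, `inference_rt`) and the helper functions are hypothetical, and the exact trimming rule (here, dropping an entry if it falls in the slowest 5% on either response time) is our reading of the text rather than the code used in the study.

```python
def trim_cutoff(values, fraction):
    """Largest value retained after dropping the slowest `fraction` of values."""
    ordered = sorted(values)
    n_drop = round(len(ordered) * fraction)
    return ordered[len(ordered) - n_drop - 1]


def clean(entries, min_total=120.0, fraction=0.05):
    """entries: list of dicts with hypothetical keys
    'total_seconds', 'self_rt', and 'inference_rt' (all in seconds)."""
    # Step 1: drop diaries completed in under 2 minutes (assumed careless).
    kept = [e for e in entries if e["total_seconds"] >= min_total]
    if not kept:
        return []
    # Step 2: drop entries among the slowest 5% on either response time
    # (assumption: the two trims are applied jointly).
    self_cut = trim_cutoff([e["self_rt"] for e in kept], fraction)
    inf_cut = trim_cutoff([e["inference_rt"] for e in kept], fraction)
    return [e for e in kept
            if e["self_rt"] <= self_cut and e["inference_rt"] <= inf_cut]
```

With tied response times this rule can retain slightly more than 95% of entries; a stricter rule would need an explicit tie-breaking choice.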

Sample 2

Three months after their first child was born, participants completed a daily diary questionnaire every day for 21 days in the same manner used in sample 1. Close to 97% of the entries (4067/4200) were completed.

Data cleaning included removing entries in which the whole diary (including various questionnaires not reported in this study) was completed in less than 2 min, which was assumed to indicate careless completion. No entries were removed. We then removed the 5% of entries with the longest response times for self-reported moods and for empathic inferences, leaving 3686 entries.

Sample 3

Procedure for sample 3 was identical to that for sample 1, on the same site. Items not analyzed in this study were different, and the diary lasted only 21 days. Data collection and full analysis code were pre-registered at https://osf.io/cfhsn.Footnote 4 Over 85% of the entries (3128/3654) were completed.

Data cleaning included removing entries in which the whole diary (including various questionnaires not reported in this study) was completed in less than 2 min, which was assumed to indicate careless completion, leaving 3123 entries. We then removed the 5% of entries with the longest response times for self-reported moods and for empathic inferences, leaving 2835 entries. Unfortunately, though these cleaning steps were pre-planned and pre-coded (and thus performed in a manner identical to the previous two samples), we failed to mention them in the pre-registration document.

Measures

Negative Affect

To be able to calculate empathic accuracy, we first measured participants’ self-reported moods and their reports of their partners’ moods using a version of Lorr and McNair’s (1971) Profile of Mood States (POMS) questionnaire, adapted for daily diary use by Cranford et al. (2006). For self-reported moods, the questionnaire prompt was “Please indicate the extent to which you are experiencing these feelings at the moment, in the evening”. For reports on partner moods the prompt was “Please indicate the extent to which you think that your partner is experiencing these feelings at the moment, in the evening”. To avoid confusion, from now on we will refer to reports on partner moods as empathic inferences.

In sample 1 and sample 3, participants rated six positive and negative moods (e.g., anger, calmness) on a 0–100 sliding scale, with 0 marked “not at all” and 100 marked “to a very large extent”. In sample 2, participants rated the same moods using 3 items each, on a 5-point Likert-type scale, with 0 marked “not at all” and 5 marked “extremely”. The current study focused only on negative moods, as existing evidence from psychology research in general (cf. Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001) and from empathic accuracy studies in couples specifically (e.g., Sened et al., 2017b) shows that across a wide range of phenomena, effects for negative emotions are more pronounced than for positive ones.Footnote 5 Thus, in each sample, scores for the three negative mood items (anxiety, anger, and sadness) were averaged to create a general negative mood measure.

We used the techniques suggested by Cranford et al. (2006) to calculate reliability for detecting within-person changes, which are roughly equivalent to calculating Cronbach’s alpha after subtracting each person’s mean over the diary period from the ratings. Shrout (1998) suggests that a reliability of .00 to .10 can be seen as “virtually no reliability”, .11 to .40 as “slight”, .41 to .60 as “fair”, .61 to .80 as “moderate”, and .81 to 1.00 as “substantial”. However, Nezlek (2017) notes that in within-person studies, standards should be somewhat relaxed, for two reasons: First, we expect participants to complete these measures daily, which means they must be kept shorter. Second, multilevel analysis methods deal better with low reliability than traditional analyses (e.g., multiple regression).
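To make the "roughly equivalent" description concrete, the following sketch computes Cronbach's alpha on person-mean-centered item scores. It is an approximation for illustration only, not the generalizability-theory procedure of Cranford et al. (2006); the function names and the input layout (a dictionary mapping each person to a list of daily item-score lists) are assumptions.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """rows: list of equal-length lists, one list of item scores per observation."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def within_person_alpha(diary):
    """diary: dict person_id -> list of daily item-score lists (hypothetical layout)."""
    centered = []
    for days in diary.values():
        k = len(days[0])
        # Subtract each person's own mean on each item over the diary period.
        item_means = [mean(d[j] for d in days) for j in range(k)]
        centered.extend([d[j] - item_means[j] for j in range(k)] for d in days)
    return cronbach_alpha(centered)
```

Centering removes stable between-person differences, so the resulting coefficient reflects only how consistently the items move together from day to day within a person.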

Reliability for self-reported negative moods was .55 for men and .53 for women in sample 1, .83 for men and .86 for women in sample 2, and .58 for men and .60 for women in sample 3. Reliability for empathic inferences was .59 for men and .41 for women in sample 1, .87 for men and .85 for women in sample 2, and .59 for men and .63 for women in sample 3.

Assumed Similarity and Direct Accuracy

Both indices were obtained using multilevel models. Assumed similarity was calculated as the slope of perceivers’ self-reported moods when predicting their empathic inferences. Direct accuracy was calculated as the slope of targets’ self-reported moods when predicting the perceivers’ empathic inferences, in the same model (see statistical analysis below for more details).

Response Time

In all three samples, the response time for each report was assessed using the Qualtrics platform. Response times were measured to a 1-ms precision and are reported in seconds.

The main response time variable, which we refer to as empathic inference response time, is the time it took each participant to make empathic inferences, measured as the time between entering the questionnaire page inquiring about the partner’s moods (which included no other questions) and clicking the button to exit this page and progress to the next one. Values that were more than 3 standard deviations above or below the mean were considered outliers and removed from analysis, leaving 3205 entries in sample 1, 3556 entries in sample 2, and 2769 entries in sample 3.
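The outlier rule can be illustrated with a short sketch. The function name is hypothetical, and computing the mean and standard deviation over the whole sample (rather than per participant) is an assumption.

```python
from statistics import mean, stdev


def drop_outliers(values, n_sd=3.0):
    """Keep values within n_sd sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]
```

Because a single extreme value inflates the standard deviation, rules of this kind are conservative in small samples; in samples of several thousand entries, as here, this matters less.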

Importantly, response time included both positive and negative mood questions, even though we focused on accuracy regarding negative emotions. With our procedures, we cannot be sure whether (faced with mood inference items) participants infer their partners’ overall affective states and then answer all items accordingly, or instead make separate inferences for each item. If the former is the case (which we strongly suspect), the effects of inference processes on response times would be reflected only in the first few items; measuring separate response times by item type (positive vs. negative) would render the results meaningless. More importantly, even if the latter is the case, the only adverse consequence of measuring total response time to all items would be adding some random noise.

To control for general processing speed as well as for individual differences in motor operation (e.g., mouse use) speed, we regressed the response time for empathic inferences on the self-reported mood response time, and then used the residual scores.Footnote 6 The self-reported mood response time is the time between entering the questionnaire page for reporting one’s own moods (again, a page which included no other questions) and clicking on the button to exit the page and progress to the next one. The self-reported mood questionnaire was presented and rated earlier than the empathic inference questionnaire (inquiring about the partner’s moods), with other questionnaires unrelated to the current study in between the two.
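The residualization step amounts to a simple regression followed by subtraction of the fitted values. The sketch below shows this for one pooled regression with an intercept; whether the original analysis pooled all entries or fit the regression separately per participant is not stated here, so the pooling is an assumption, as is the function name.

```python
def residualize(y, x):
    """Residuals from an ordinary least squares regression of y on x (with intercept)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # What remains is the part of y not linearly predictable from x.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

Here `y` would hold the empathic inference response times and `x` the self-reported mood response times. By construction the residuals average zero and are uncorrelated with `x`, which is what allows them to serve as a speed-adjusted response time.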

Analyses

Data were analyzed in a mixed model analysis, using SAS PROC MIXED, according to recommendations by Bolger and Laurenceau (2013) for analyzing longitudinal dyadic data. According to these recommendations, we treated our data as containing two levels: a daily level (level 1, a specific day for a particular participant) and a couple level (level 2, the couple within which this participant was a partner). Such mixed models take into account the statistical non-independence of partners in each couple. This permits us to simultaneously test the prediction with regard to both partners. Our dependent variable was the perceiver’s empathic inference, which was unique for each partner and day. The independent variables were the empathic inference response time (similarly unique for each partner and day), alongside both the perceiver’s and the target’s self-reported mood (which were unique for each day, and served dual roles—a specific partner’s self-reported mood served as a perceiver variable when predicting their own empathic inference and as a target variable when predicting their partner’s empathic inference).
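The dual role of each self-report can be made explicit with a small restructuring sketch. The input layout and all key names are hypothetical; the point is only that every couple-day yields two rows, and that one partner's self-reported mood appears as the bias variable in their own row and as the truth variable in their partner's row.

```python
def build_rows(reports):
    """reports: dict (couple_id, day) -> {partner_id: {'self_mood', 'inference', 'inference_rt'}}.
    Returns one analysis row per perceiver per day."""
    rows = []
    for (couple, day), partners in reports.items():
        if len(partners) != 2:
            continue  # both partners' reports are needed for that day
        (a, rep_a), (b, rep_b) = partners.items()
        for perceiver, own, other in ((a, rep_a, rep_b), (b, rep_b, rep_a)):
            rows.append({
                "couple": couple,
                "day": day,
                "perceiver": perceiver,
                "judgment": own["inference"],   # perceiver's empathic inference
                "bias": own["self_mood"],       # perceiver's own mood
                "truth": other["self_mood"],    # target's self-reported mood
                "rt": own["inference_rt"],      # empathic inference response time
            })
    return rows
```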

Our statistical model was designed as a truth and bias (T&B; West & Kenny, 2011) model. West and Kenny provide useful terminology and some guidelines for situations in which some kind of judgment or assessment is performed; in our case, this is the perceiver’s attempt to assess the target’s mood. The authors suggest predicting the judgment (the empathic inference) using two independent variables. The first is the actual target of inference (in our case, the target’s actual mood). This is termed the truth variable. The effect of the truth variable on the judgment, termed the truth force, is our operationalization of direct accuracy. The second is some alternative source of influence on the judgment (in our case, the perceiver’s own mood). This is termed the bias variable. The effect of the bias variable on the judgment, termed the bias force, is our operationalization of assumed similarity. These associations are moderated by the daily empathic inference response time (Mik) and by a gender difference dummy variable for each participant (Gi), coded −0.5 for men and 0.5 for women. The full model is as follows:

$$ J_{ik} = (b_0 + u_{0i}) + (b_1 + u_{1i}) T_{ik} + (b_2 + u_{2i}) B_{ik} + b_3 T_{ik} M_{ik} + b_4 B_{ik} M_{ik} + (b_5 + u_{5i}) G_i + (b_6 + u_{6i}) G_i T_{ik} + (b_7 + u_{7i}) G_i B_{ik} + b_8 G_i T_{ik} M_{ik} + b_9 G_i B_{ik} M_{ik} + e_{ik} $$

with Jik, Tik, Bik, and Mik being the judgment, truth, bias, and moderator variables respectively for participant i on day k. The model estimates a coefficient for each variable and for interactions between the truth and bias variables on the one hand and the moderator on the other. Each coefficient includes a fixed component (e.g., b0). All main effect coefficients included a random component varying by participant (e.g., u0i); we could not estimate random components for interaction effects as the models did not converge when such components were included.
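One way to read the two interaction coefficients of interest is through the simple slopes implied by the fixed part of the model. Averaging over the two gender codes (−0.5 and 0.5), so that the gender terms cancel, and ignoring the random components, the direct accuracy and assumed similarity slopes at a given centered response time are approximately:

$$ \text{direct accuracy}(M_{ik}) = b_1 + b_3 M_{ik}, \qquad \text{assumed similarity}(M_{ik}) = b_2 + b_4 M_{ik} $$

Comparing a slow inference (M = +s) with a fast one (M = −s), where s is an illustrative value such as one standard deviation of the residual response time, gives

$$ \left(b_1 + b_3 s\right) - \left(b_1 - b_3 s\right) = 2 b_3 s $$

so a positive b3 means that direct accuracy is higher for slower inferences, and the analogous difference for assumed similarity is 2 b4 s. The symbol s is introduced here for illustration only and does not appear in the model.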

According to West and Kenny’s (2011) suggestion, the judgment, bias, and truth variables were all centered on the mean truth value for the target. Response times were centered on each participant’s mean. To calculate effect sizes for specific predictors, we estimated Cohen’s f² using the procedures outlined by Selya, Rose, Dierker, Hedeker, and Mermelstein (2012). We evaluated moderation effect sizes by reference to a widely cited review of moderation effects (Aguinis, Beaty, Boik, & Pierce, 2005), which notes that moderation effect sizes tend to be small and reports a median f² across studies of .002, with a 4th quartile at .0053. Thus, we considered moderation effects above .002 as meaningful.
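For a single predictor, the local effect size described by Selya et al. (2012) compares the variance explained by the full model with that explained by a model omitting the predictor of interest. A minimal sketch follows; the function and argument names are hypothetical, and how the two R² values are obtained from a mixed model is left outside the example.

```python
def cohens_f2(r2_full, r2_reduced):
    """Local Cohen's f-squared for the predictor(s) dropped from the reduced model."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def is_meaningful(f2, threshold=0.002):
    """Apply the benchmark used in the text (median moderation effect, Aguinis et al., 2005)."""
    return f2 > threshold
```

For example, `cohens_f2(0.50, 0.40)` is approximately 0.2, whereas the moderation effects evaluated in this study are judged against the much smaller benchmark of .002.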

Our operational hypotheses were as follows:

  1. The perceiver’s empathic inference response time will positively moderate the direct accuracy effect. This moderation will be reflected by a positive interaction between the target’s self-report of negative mood and the perceiver’s empathic inference response time; b3 > 0.

  2. The perceiver’s empathic inference response time will not positively moderate the assumed similarity effect. This lack of moderation will be reflected by a non-positive interaction between the perceiver’s self-report of negative mood and the perceiver’s empathic inference response time; b4 ≤ 0.

  3. The moderation by the perceiver’s empathic inference response time of direct accuracy will differ significantly from (and be more positive than) its moderation of assumed similarity. This difference will be reflected by a positive contrast between the two effects; b3 − b4 > 0.
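The third hypothesis is a contrast between two coefficients from the same model, so its standard error depends on their sampling variances and covariance. The sketch below shows the standard Wald-type computation; the function name and all numeric inputs in the usage note are hypothetical, and the study itself obtained this contrast from the mixed model software.

```python
from math import sqrt


def contrast(b3, b4, var_b3, var_b4, cov_b3_b4):
    """Estimate, standard error, and test statistic for the contrast b3 - b4."""
    estimate = b3 - b4
    # Var(b3 - b4) = Var(b3) + Var(b4) - 2 Cov(b3, b4)
    se = sqrt(var_b3 + var_b4 - 2.0 * cov_b3_b4)
    return estimate, se, estimate / se
```

Called with illustrative values such as `contrast(0.004, -0.001, 4e-6, 4e-6, 0.0)`, the function returns the estimated difference together with its standard error and the ratio of the two, which would then be compared against a t distribution with the model's degrees of freedom.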

Power Analysis

For all samples, data were collected as part of larger studies and sample size was not determined by the considerations of the current study. To ensure the adequacy of these samples for testing our hypothesis, we ran a post hoc power analysis on the results of sample 1, using the R simulation package simr (Green & MacLeod, 2016). Since our hypothesis was directional, we calculated power for a one-tailed test. Post hoc power for sample 1 was .706 (95% CI .677, .734). Using the data from sample 1 to extrapolate the simulation to higher sample sizes, we determined that 100 couples were needed to achieve adequate power (power = .804, 95% CI .778, .828), a sample size which was achieved in sample 2. Unfortunately, due to budget considerations, the sample size for sample 3 was slightly smaller; extrapolating the data from sample 1 to the achieved sample size of 87, power for sample 3 was .769 (95% CI .742, .795).

Post hoc power analyses revealed a power of .797 (95% CI .771, .862) for sample 2 and .764 (95% CI .736, .790) for sample 3. Aggregating the post hoc results, power for finding the effect in at least two out of three studies was .851.
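The aggregated figure can be reproduced by treating the three samples as independent tests and summing the probabilities of all outcomes in which at least two of them detect the effect. Independence is an assumption of this sketch, and the function name is hypothetical.

```python
from itertools import product


def prob_at_least(powers, k):
    """Probability that at least k of several independent tests are significant."""
    total = 0.0
    for outcome in product([True, False], repeat=len(powers)):
        if sum(outcome) >= k:
            p = 1.0
            for hit, power in zip(outcome, powers):
                p *= power if hit else 1.0 - power
            total += p
    return total


# Post hoc power for samples 1, 2, and 3 as reported in the text.
print(round(prob_at_least([0.706, 0.797, 0.764], 2), 3))  # prints 0.851
```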

Results

Descriptives

Means, standard deviations, ranges, and quartile values are provided for self-reported moods, self-reported mood response times, empathic inferences, and empathic inference response times in Table 2. We performed mixed model analyses to compare self-reported mood response time and empathic inference response time; self-reported mood response time was higher in sample 1 (b(SD) = 4.053(.358), t(6984) = 11.33, p < .0001, f² effect size = .022), sample 2 (b(SD) = 41.155(.354), t(7724) = 5.56, p < .0001, f² effect size = .004), and sample 3 (b(SD) = 4.478(.336), t(5925) = 13.34, p < .0001, f² effect size = .032).

Table 2 Descriptive statistics

Correlations between the three negative mood measures (self, partner, and empathic inference), accounting for repeated measures (Bakdash & Marusich, 2017), are provided in Table 3.
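A repeated measures correlation of this kind removes stable between-person differences before correlating two measures. The sketch below uses the fact that the coefficient from an analysis with a separate intercept per participant equals the Pearson correlation of person-mean-centered scores; it illustrates the idea only and is not the rmcorr implementation. The function name and input layout are hypothetical.

```python
from math import sqrt
from statistics import mean


def within_person_corr(data):
    """data: dict person_id -> list of (x, y) pairs, one pair per diary day."""
    xs, ys = [], []
    for pairs in data.values():
        mean_x = mean(p[0] for p in pairs)
        mean_y = mean(p[1] for p in pairs)
        # Center each person's scores on that person's own means.
        xs.extend(p[0] - mean_x for p in pairs)
        ys.extend(p[1] - mean_y for p in pairs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return sxy / sqrt(sxx * syy)
```

Two participants whose scores rise together day by day but who differ widely in their average levels would show a strong correlation here, even if a correlation pooled over all raw observations were weak or negative.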

Table 3 Negative mood correlations accounting for repeated measures

Response Time Interactions

To test whether response time moderated assumed similarity and/or direct accuracy, we conducted a mixed model analysis. Full results are presented in Table 4. The association between perceiver self-reported mood and perceiver empathic inference (i.e., assumed similarity) was not moderated by response time in any sample. The association between target self-reported mood and perceiver empathic inference (i.e., direct accuracy) was moderated by response time in all samples (sample 1 f² effect size = .006; sample 2 f² effect size = .005; sample 3 f² effect size = .004). In all samples, slower inferences involved more direct accuracy. To test whether moderation of assumed similarity and moderation of direct accuracy differed significantly, we contrasted these effects; moderation of direct accuracy was significantly larger in sample 1 (b(SD) = .005(.003), t(694) = 1.99, p = .047), sample 2 (b(SD) = .005(.002), t(810) = 2.78, p = .006), and sample 3 (b(SD) = .007(.004), t(596) = 2.04, p = .042). To illustrate the moderation effect, Fig. 2 shows standardized r coefficients for assumed similarity (i.e., the association between perceiver self-reported mood and perceiver empathic inference of target mood) and direct accuracy (i.e., the association between target self-reported negative moods and perceiver empathic inference of target mood). Slopes are presented for fast (i.e., < −1 SD residual response time) and slow (i.e., > +1 SD residual response time) inferences in each sample.

Table 4 Effects of perceiver and target negative affect and residual response time on perceiver’s inference of target negative affect

Fig. 2 Direct accuracy and assumed similarity standardized coefficients in fast and slow responses

Discussion

We set out to test the associations between direct accuracy, assumed similarity, and response times. As expected, direct accuracy was associated (in all three samples) with longer response times, suggesting that it might indeed reflect a slow, deliberate thought process. In contrast, assumed similarity showed no such association. These findings are in line with a dual process account of empathic inferences. Specifically, they support a default interventionist dual process model (Evans & Stanovich, 2013), or an anchoring and adjustment sequence (Tversky & Kahneman, 1974). Accordingly, type-1 processes (in this case, assumed similarity) are likely to provide an initial, quick judgment, and type-2 processes (in this case, direct accuracy), when implemented, are likely to adjust the initial judgment, but not replace it.

Our findings provide support for the conceptual framework stated above (see Fig. 1), tying findings from close relationship, dual process, and neuroscientific research. However, while we focused on one link in the framework for which we could find no prior research, more studies are required to provide evidence for all three links. In our opinion, the ties between the accuracy components model and the other two lines of research warrant specific attention, whereas the tie between dual process and neuroscientific accounts of empathy is already relatively well established. Studies linking assumed similarity and direct accuracy to other characteristics of type-1 or type-2 processes (e.g., the use of working memory) and studies using neuro-imaging or neuro-stimulation methods while recording participants’ empathic inferences alongside their own mental states could provide crucial evidence supporting this framework.

Beyond supporting this conceptual framework, our findings have direct implications for the understanding of empathic processes. Specifically, the fact that quicker thinking involves more assumed similarity and less direct accuracy can help predict the effects of different time investments in accuracy within different situations. For example, when people share experiences and spend time together, their emotions may grow more similar (Golland, Arzouan, & Levit-Binnun, 2015; Hatfield, Cacioppo, & Rapson, 1993). In such situations, quick and immediate empathic inferences which rely mainly on assumed similarity may be very accurate and useful. Consequently, any investment of deliberate and effortful social perception may be a waste of time and energy better spent on simply experiencing the moment; indeed, it may even lead to less accurate inferences. Conversely, when trying to assess the states of mind of strangers or even of close others who have spent some time apart, more effort might be helpful or even required to achieve accurate inferences through direct accuracy.

The current study has some limitations, which can be addressed in future studies. First, reliability of the negative affect measure in sample 1 and, to a slightly lesser extent, sample 3, was quite low—in all likelihood, because of the brevity of the scales used therein. Given the complete replication with longer (and thus, higher reliability) scales in sample 2, the pre-registration of sample 3 hypotheses and analytic code, and the fact that the negative affect construct itself (unlike the construct of empathic accuracy) is not central to our findings, we are not overly concerned about this issue; nevertheless, future studies would benefit from measures with higher reliability.

Second, our study is correlational, raising the possibility that some other variables would lead participants both to be more accurate and to spend more time on their inferences with no real association between the two. We believe such an explanation to be unlikely, as it would mean that participants are consistently spending time working on the inference task without that time investment influencing their performance. Nevertheless, ruling it out completely would require an experimental study.Footnote 7

Third, our measurement of response times was not as precise as it could be; we do not have a record of what device the participant used to complete each questionnaire (e.g., cell phone or computer), a factor which might have influenced completion time. Additionally, the measured response times reflect the time it took to complete all mood items (positive and negative). Future studies could retain information on the device used and separate timing data for the rating of each individual item, although this method has its own caveats (see our Method section for a detailed explanation).

Finally, all samples involved romantic couples. While we have no reason to suspect the results are not generalizable to other dyads, future studies should test this question with other types of dyads (e.g., parents and children, friends, patients and therapists). Additionally, though we had no hypotheses as to the effects of relational variables such as relationship length or quality on these findings in this initial examination of response time effects on empathic accuracy, such variables could be examined in future studies.