Introduction

Human social interaction is enriched with patterns of synchronized movements. This becomes evident both on large scales involving whole organisms (e.g., rhythmic dancing, fans’ chanting at sports events) and on smaller scales involving only single modalities of the organism (e.g., facial mimicry, synchronization of tone of voice, synchronization of hand gestures). Humans tend to seek coordination in movement and evaluate interpersonal encounters with higher degrees of nonverbal synchrony more favorably (Bernieri 1988; Lakin and Chartrand 2003; Tickle-Degnen and Rosenthal 1990; van der Schalk et al. 2011). Thus, the quantification of synchronized nonverbal movements has become an important field of research in recent approaches to understanding the flow of human interaction in social psychology (Laroche et al. 2014). Moreover, translating this understanding to clinical psychology is increasingly relevant, as some mental disorders, such as schizophrenia, have been linked to deficits in movement synchronization (Kupper et al. 2015; Varlet et al. 2012).

One commonly investigated phenomenon in this regard is emotional facial mimicry, the tendency of humans to mirror emotional facial expressions (e.g., to respond with a smile upon seeing a smile).¹ Regardless of whether emotional facial mimicry is a merely perceptual (Chartrand and Bargh 1999; Dimberg et al. 2000) or a more valence-shaped (Hess and Fischer 2013) behavioral response to facial expressions, it is considered an interactional behavior. Surprisingly, almost the entire body of research on facial mimicry relies on participants’ reactions to standardized facial stimuli presented on a computer screen. Researchers have repeatedly called for face-to-face interactions to be included in research on social interaction more frequently (Fischer and van Kleef 2010; Heerey 2015). However, this call has remained largely unanswered so far. One reason is that methodological difficulties in quantifying facial expression synchrony in face-to-face interactions impede the field (Heerey 2015). Available approaches often use frame-by-frame observer ratings to decide whether a given expression or action unit is present at a given time (e.g., Heerey and Crossley 2013; Stel and Vonk 2010). This procedure is both cumbersome and prone to measurement error, because human coders evaluate facial expressions subjectively. Researchers have therefore recently tried to objectify and simplify the assessment of facial expression synchrony, for instance by using real-time avatars and face-tracking (Boker et al. 2009). However, face-tracking software is still advancing and is prone to measurement error due to position shifts, head turns, and non-detection of minimal movements (Cohen et al. 2013; Messinger et al. 2009).

Electromyography (EMG) allows for a more objective and economical assessment of facial movements, produces time series data at high sampling rates, and can even detect covert or subthreshold movements (Dimberg 1982). EMG is therefore the gold-standard method for assessing facial movements in laboratory studies of facial mimicry. However, no viable approach has yet been presented to analyze the synchrony of EMG data between two people interacting face-to-face. To our knowledge, the only study to assess facial mimicry in a dyadic setting using EMG was published by Hess and Bourgeois (2010). The authors showed that smiling but not frowning was mimicked in affiliative dyadic interactions between strangers. To do so, they averaged the EMG activity of each muscle site within participants over 15-s epochs and then correlated these averages across the two interaction partners. Though innovative, this approach may not have been ideal, because research on the timing of facial mimicry has shown that the peak mimicry response is usually observable within 1 s after stimulus onset (Dimberg et al. 2000; Heerey and Crossley 2013). Moreover, synchronizations² of genuine smiles may occur within as little as 200 ms after stimulus onset, indicating an anticipatory synchronization of smiling (Heerey and Crossley 2013). Because the anticipation of smiles might result from (social) cognitive processes different from those involved in the reflexive mimicry response, assessing the temporal dynamics of mimicry responses as precisely as possible is crucial.

Therefore, maintaining the high temporal resolution of the EMG data is important when quantifying the temporal dynamics of facial expression synchrony in face-to-face interactions. Windowed cross-lagged correlation (WCLC; Boker et al. 2002) is a statistical technique that quantifies the synchrony of two behavioral time series while largely maintaining the temporal resolution of the input data. Moreover, it accounts for the possibility that synchrony between the time series may occur with a time lag as well as simultaneously. Interpersonal synchrony assessed with WCLC has been shown to be reliably quantifiable for whole-body movements (Boker et al. 2002; Ramseyer and Tschacher 2011) and to be a significant predictor of more positive interpersonal outcomes (Ramseyer and Tschacher 2011; Tschacher et al. 2014). However, WCLC has not yet been used to analyze EMG data or to quantify facial expression synchrony in face-to-face conversations (one exception is a case study by Messinger et al. 2009, who used WCLC to investigate mother–infant interactions). Adapting WCLC to facial EMG data would close these research gaps and help to objectify the investigation of facial expression synchrony in face-to-face interactions by bypassing human coding.

In this study, we therefore tested the suitability of WCLC for quantifying the temporal dynamics of facial expression synchrony in dyadic interactions. We analyzed activations of the smiling muscle (Zygomaticus Major) and the frowning muscle (Corrugator Supercilii) within dyads of participants who talked about positive and negative life events. We validated WCLC against a control condition of chance-level synchrony and tested its reliability over the course of an interaction via the consistency of the synchrony estimates it produced. As an additional indicator of construct validity, we expected to replicate the key findings on facial expression synchrony in face-to-face interactions mentioned above: the finding by Hess and Bourgeois (2010) that smiles but not frowns were mimicked by interaction partners in an affiliative interaction, and the finding by Heerey and Crossley (2013) that smiling synchronizations occur within 200 ms.

We hypothesized that (1) WCLC would detect synchrony of facial muscle activity better than would be expected by chance, (2) frowning would be mimicked less than smiling, (3) the synchrony measure would be reliable over the course of an interaction, and (4) we would be able to measure smiling synchrony occurring within time lags of ±200 ms.

Method

Sample

We collected data from 33 dyadic interactions of mentally healthy participants (N = 66), recruited from the community of the city of Hamburg, Germany. The participants took part either as control subjects in an ongoing research project investigating the effects of reduced facial expressiveness in schizophrenia or in a parallel project using the same protocol to investigate expressiveness in mentally healthy people. Inclusion criteria for all participants were age 18–65, no lifetime diagnosis of any mental disorder (exception: diagnosis of alcohol/drug abuse within the past year), no self-reported history of neurological disorders, and verbal IQ >80. Dyads were matched for sex and approximate years of education. After excluding three of the 33 dyads because of missing EMG data (technical recording failure, 1 dyad) or excessively noisy EMG data due to electrodes not adhering properly to the skin (2 dyads), we included 30 dyads (N = 60) in our analyses. Nineteen of these dyads were female/female; the remaining eleven were male/male. Mean age of participants was 35.1 years (SD = 12.2; range 19–62), and mean years of education were 16.1 (SD = 3.1; range 10–22). All participants were unaware of the clinical focus of the project, were not told that the EMG electrodes measured facial movement during the interactions, were unaware of the hypotheses of this study, and indicated at the end of the experiment that they had never met their interaction partner before.

Social Interaction Task

For the dyadic interaction, the participants prepared a positive and a negative life event according to standardized instructions. For the positive event, we told participants to recall an enjoyable event that had happened to them within the last 12 months. For the negative event, we told participants to recall an event in which they had felt annoyed or angry (cf. Hess and Bourgeois 2010). Participants rated how enjoyable/annoying this event was for them on a scale from 1 (not at all enjoyable/annoying) to 9 (maximally enjoyable/annoying). The questions read “To which extent was the event enjoyable for you?” and “To which extent was the event annoying for you?”. Participants were also asked to give each event a “title” and to indicate which emotion they had primarily felt during the event (choosing among six basic emotions and neutral/other). Participants were instructed to try to explain to the interaction partner why the event was enjoyable/annoying. In the instructions, we used the terms ‘conversation’ and ‘conversational topics’ to refer to the interaction and to the events, respectively, to keep participants from remaining passive in the listener role. Accordingly, the instructions also included a passage specifically encouraging participants to respond actively to their partner’s stories when in the listener role.

The interaction started with a 2-min warm-up phase in which participants were asked to introduce themselves to each other and to decide whether they wanted to use the formal or informal form of address in German. All participants chose the informal form (equivalent to using first names in English). Participants then discussed their positive and negative life events. Each event was the conversational topic for approximately 3 min, with the experimenters announcing topic shifts via an intercom. Event valences were blocked, so that the two positive events and the two negative events were each discussed consecutively. The interaction ended with a 2-min cool-down period, announced by the experimenters as a waiting period. We had intended to counterbalance the order of valence blocks across all dyads. However, because one experimenter team did not follow this single aspect of the study protocol and started each interaction with the positive events, only eight of the 30 dyads actually started with the discussion of negative events.

EMG Measurement

We continuously recorded facial EMG data throughout the interactions from the Zygomaticus Major (the muscle that pulls the corners of the mouth up in a smile) and the Corrugator Supercilii (the muscle that draws the eyebrows together). For simplicity, in the following we refer to the movement elicited by the Zygomaticus as smiling and to the movement elicited by the Corrugator as frowning. We restricted the assessment to these two muscle regions for two reasons. First, the study was designed to also be conducted in a clinical sample of participants with a diagnosis of schizophrenia. We thus kept the number of electrodes on the face to a minimum, both to reduce the aversiveness of the situation (talking to a stranger, being filmed, having “psychophysiological data” recorded from the face) and to reduce the invasiveness of the EMG method, in order to increase the ecological validity of our study. Second, we adopted the study design of Hess and Bourgeois (2010), who did not find significant effects or incremental validity, respectively, for two further potentially relevant candidates, the Levator Labii Alaeque Nasi (associated with disgust displays) and the Orbicularis Oculi (associated with smiling).

The electrodes were placed directly before the beginning of the interaction and after cleansing the skin sites using Nuprep™ abrasive gel. We used 6 mm Ag/AgCl surface electrodes with an outer diameter of 13 mm that were placed onto the skin of participants using 13 mm adhesive discs with an inner diameter of 5 mm. The electrodes were filled with TEN20™ conductive paste. Electrodes were placed in pairs (bipolar measurement) at the two muscle sites (Corrugator Supercilii and Zygomaticus Major) 13 mm apart from one another according to Fridlund and Cacioppo (1986). Ground electrodes were 7 mm Ag/AgCl electrodes that were placed via an ear-clip on one earlobe of each participant (cf. Tassinary et al. 2007).

We measured the EMG signal in both participants simultaneously using a single NeXus-10 Mark-II device (Mind Media BV, The Netherlands). We connected both participants’ ground electrodes to the device’s ground input channel via a ground-splitter (Mind Media BV, The Netherlands). We used BioTrace software (Mind Media BV, The Netherlands) for monitoring, initial preprocessing, and data storage on a computer in another room, which was connected to the NeXus via USB cable. The EMG data were initially sampled at 2048 Hz. The signal was down-sampled to 32 Hz using the root mean square (RMS) method, which was applied to the rectified signal with a time constant of 1/16 s. EMG data were then exported for further analysis in the MATLAB environment (MathWorks, MA, USA).
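The RMS down-sampling step can be illustrated with a short Python/NumPy sketch. It is not BioTrace's implementation: the function name `rms_downsample` is hypothetical, and we assume that each output sample is the RMS of a 1/16-s (128-sample) window of the rectified 2048-Hz signal, with a new window starting every 1/32 s (64 raw samples), so that consecutive windows overlap by half.

```python
import numpy as np


def rms_downsample(raw, fs_in=2048, fs_out=32, time_constant=1 / 16):
    """Hypothetical RMS envelope: one output value every 1/fs_out s.

    Each value is the root mean square of `time_constant` seconds of the
    rectified raw signal; the window placement is an assumption.
    """
    rectified = np.abs(np.asarray(raw, dtype=float))  # full-wave rectification
    hop = fs_in // fs_out                             # 64 raw samples per output sample
    win = int(round(time_constant * fs_in))           # 128 raw samples per RMS window
    if len(rectified) < win:
        raise ValueError("signal shorter than one RMS window")
    n_out = (len(rectified) - win) // hop + 1
    out = np.empty(n_out)
    for k in range(n_out):
        segment = rectified[k * hop:k * hop + win]
        out[k] = np.sqrt(np.mean(segment ** 2))       # RMS of this window
    return out
```

With 1 s of raw data (2048 samples), this sketch returns 31 values, because the last complete 128-sample window starts at sample 1920.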

Procedures

Presence of mental disorders was ruled out based on an interview session using the Structured Clinical Interview for DSM-IV Axis I (Wittchen et al. 1997). Verbal IQ was assessed with the German Multiple Choice Vocabulary Test (Lehrl 1999). This assessment of exclusion criteria (psychopathology) and other questionnaires relevant to the clinically focused study lasted for ca. 1.5 h. Suitable participants were then invited to take part in a dyadic interaction on another day. As a cover story, we told the participants that we were interested in differences between interactions of close friends and strangers and that they had been assigned to the stranger-group. They were also informed that, throughout the conversations, ‘psychophysiological data’ would be collected from their faces by four ‘sensors’.

On the day of the interaction, the two interaction partners arrived in separate rooms, where they gave written consent. When the participants had prepared their positive and negative events, they were taken to the room in which the interaction took place. There, two experimenters applied the electrodes to both participants of a dyad at the same time. After visually checking signal quality, the experimenters instructed participants to knit their eyebrows as much as they could and to smile as widely as they could. We used the resulting activation levels as benchmarks for the maximum contraction of each muscle site. To avoid priming participants to pay extra attention to facial movements, we told them that these movements were necessary to check whether the electrodes would stay attached when they talked. The experimenters then left the room, and the interaction task was carried out as described above. After the interaction, participants filled in several questionnaires, were compensated with 10 € per hour of attendance, debriefed, thanked, and dismissed.

Preprocessing of EMG Time Series Data

To create a proper control condition of chance-level synchrony, we generated pseudo-interactions (cf. Bernieri 1988; Ramseyer and Tschacher 2011) by randomly pairing the time series of participants from different dyads (i.e., participants who had not communicated with each other), cutting the longer series to the length of the shorter one. In these pseudo-interactions, any synchrony can only reflect chance.
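A minimal sketch of how such pseudo-interactions could be generated, in Python with NumPy. The pairing scheme is our assumption: the hypothetical function `make_pseudo_pairs` matches person 1 of each dyad with person 2 of a randomly chosen different dyad, using a random permutation without fixed points, and truncates both series to the shorter length.

```python
import numpy as np


def make_pseudo_pairs(series_p1, series_p2, seed=None):
    """Pair person 1 of dyad i with person 2 of a different dyad j.

    series_p1, series_p2: lists of 1-D arrays, where index i is dyad i.
    Returns a list of (i, j, x, y) with x and y cut to equal length.
    """
    n = len(series_p1)
    if n < 2 or len(series_p2) != n:
        raise ValueError("need at least two dyads with both partners")
    rng = np.random.default_rng(seed)
    while True:  # redraw until no dyad is paired with itself (a derangement)
        perm = rng.permutation(n)
        if not np.any(perm == np.arange(n)):
            break
    pairs = []
    for i, j in enumerate(perm):
        x, y = series_p1[i], series_p2[j]
        m = min(len(x), len(y))  # cut the longer series to the shorter length
        pairs.append((i, int(j), x[:m], y[:m]))
    return pairs
```

Running the function separately for each interactional phase (again an assumption) would keep pseudo-partners aligned on the same conversational phase.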

Before running the WCLC analysis, we preprocessed the time series of both genuine and pseudo-interactions in several steps using custom MATLAB scripts. First, we cleaned the data of artifacts by deleting all data points that exceeded the benchmark level obtained from the participant before the interaction. This removed excessive spikes in the EMG signal caused by participants touching the electrodes or substantially moving the cables, and resulted in a mean loss of .01% of data points per dyad. Second, we normalized the time series to a range of 0 to 1 using min–max normalization. Third, we smoothed the time series using a moving average with a time constant of .5 s.³
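The three preprocessing steps can be sketched in Python with NumPy; the MATLAB scripts themselves are not reproduced here, and the function `preprocess` is hypothetical. We assume that "deleting" an artifact means marking the sample as missing (NaN), so that the series stays time-aligned with the partner's series, and that the moving average is centred and ignores missing samples.

```python
import numpy as np


def preprocess(x, benchmark, fs=32, smooth_s=0.5):
    """Artifact removal, min-max normalization and moving-average smoothing.

    x: RMS EMG series (µV) at `fs` Hz; benchmark: maximal-contraction level.
    """
    x = np.asarray(x, dtype=float).copy()
    x[x > benchmark] = np.nan                 # 1) mark spikes above the benchmark as missing
    lo, hi = np.nanmin(x), np.nanmax(x)
    if hi == lo:
        raise ValueError("series is constant; cannot min-max normalize")
    x = (x - lo) / (hi - lo)                  # 2) rescale to the range 0..1
    width = max(1, min(len(x), int(round(smooth_s * fs))))  # 3) 0.5 s = 16 samples at 32 Hz
    valid = ~np.isnan(x)
    kernel = np.ones(width)
    sums = np.convolve(np.where(valid, x, 0.0), kernel, mode="same")
    counts = np.convolve(valid.astype(float), kernel, mode="same")
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts                  # NaN where a window holds no valid data
```

Note that the artifact criterion is applied to the raw µV values, before normalization, because the benchmark is expressed in µV.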

For a manipulation check, we compared mean muscle activation levels across the interactional phases (warm up, positive/negative events, cool down). For these analyses only, we applied a baseline correction to the EMG data. An algorithm searched each muscle’s time series for the 10-s interval with the smallest standard deviation. We then used the mean of this interval plus three standard deviations as the baseline threshold for activation. We chose this conservatively high cut-off (3 SDs) so that activity would represent overt facial movements (cf. Hess and Bourgeois 2010). The baseline correction was performed individually, accounting for inter-individual measurement differences. The median cut-off level was 14.58 μV RMS for the Zygomaticus and 10.49 μV RMS for the Corrugator, both below the 15 μV cut-off used by Hess and Bourgeois (2010).
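The baseline threshold search can be sketched as follows in Python/NumPy (the function name `baseline_threshold` is hypothetical). Rolling means and variances over all 10-s windows are computed from cumulative sums, the window with the smallest standard deviation is selected, and its mean plus three standard deviations is returned. We assume population standard deviations (ddof = 0) and a window step of one sample; neither detail is specified above.

```python
import numpy as np


def baseline_threshold(x, fs=32, win_s=10.0, n_sd=3.0):
    """Mean + n_sd * SD of the quietest win_s-second interval of x."""
    x = np.asarray(x, dtype=float)
    win = int(round(win_s * fs))                 # 10 s = 320 samples at 32 Hz
    if len(x) < win:
        raise ValueError("series shorter than the baseline window")
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))
    means = (c1[win:] - c1[:-win]) / win         # rolling mean of every window
    variances = (c2[win:] - c2[:-win]) / win - means ** 2
    k = int(np.argmin(variances))                # window with the smallest SD
    sd = np.sqrt(max(variances[k], 0.0))         # guard against tiny negative round-off
    return means[k] + n_sd * sd
```

How sub-threshold samples are subsequently handled (for example, set to zero) is left open in this sketch.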

Synchrony Quantification

The WCLC analysis yields a parameter of the strength of association between two time series (= synchrony) for each assessed time point of the interaction and additionally takes time lags between the two series into account. Essentially, WCLC calculates the correlation of two time series for a series of overlapping time windows (= “a vector of sequential measurements sampled from a time series”; Boker et al. 2002, p. 341) of constant size. The time window is “moved” along the time line in pre-specified increments⁴ to obtain the correlation of the two time series at each time point. Moving the window thus resembles the procedure of a moving average, except that WCLC produces correlations between the two time series’ windows instead of means. Therefore, a WCLC analysis with a window size of 1 s maintains the same temporal resolution of the input data as a moving average with a time constant of 1 s. Additionally, before the window is moved one increment further, pre-specified time lags are successively applied to each time window, and a correlation is calculated for each resulting window combination (cf. Boker et al. 2002, pp. 343–344). By shifting the time series back and forth in this manner, the WCLC analysis can identify the amount of synchrony at each time point and the time lag with which one time series was leading the other at this time point.
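The procedure can be illustrated with a deliberately simple, unoptimized Python/NumPy sketch. It is not Boker et al.'s (2002) reference code, and the function `wclc` is hypothetical. Row i holds the window starting at sample `starts[i]`; column j holds the correlation between the person-1 window and the person-2 window shifted by `lags[j]` samples, so a peak at a positive lag means that person 2's signal follows person 1's. As an assumption about edge handling, only window positions for which every lag fits inside both series are kept.

```python
import numpy as np


def _pearson(a, b):
    """Pearson r of two equal-length vectors; NaN if either is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else np.nan


def wclc(x, y, window, max_lag, win_inc=1, lag_inc=1):
    """Windowed cross-lagged correlation matrix (rows: windows, cols: lags).

    x: person 1, y: person 2, both sampled at the same rate.
    Positive lag = y window taken later than x window (person 1 leads).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(len(x), len(y))
    lags = np.arange(-max_lag, max_lag + 1, lag_inc)
    starts = np.arange(max_lag, n - window - max_lag + 1, win_inc)
    r = np.full((len(starts), len(lags)), np.nan)
    for i, t in enumerate(starts):
        x_win = x[t:t + window]
        for j, lag in enumerate(lags):
            r[i, j] = _pearson(x_win, y[t + lag:t + lag + window])
    return r, lags, starts
```

With the settings reported below for 32-Hz data (window = 224 observations, max_lag = 64, both increments = 1), each row would contain 129 lag columns.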

We ran the WCLC analysis for both genuine and pseudo-interactions using a time window size of 7 s (= 224 observations), which was moved over the time series in increments of 1/32 s (= 1 observation). The maximum time lag was 2 s (= 64 observations), applied in increments of 1/32 s (= 1 observation). Window size and maximum time lag have to be chosen on theoretical grounds.⁵ For example, the choice of a window size may depend on how long the behavioral units to be analyzed for synchrony are assumed to be. The maximum time lag should mark the maximum delay with which a response may occur and still be considered a synchronization response rather than a distinct behavioral action. Here, we expected that facial expression synchronizations would generally last less than 5 s and would generally occur within time lags of 1 s. To also capture slightly longer or more delayed responses, we set both values somewhat higher than these expectations implied.
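Converting these settings from seconds to observations, and the resulting size of one WCLC matrix, is simple arithmetic; the short Python sketch below makes it explicit. The helper names are hypothetical, and it assumes 32-Hz data, increments of one observation, and that only window positions for which every lag from −64 to +64 fits inside the series are retained. Under these assumptions, a 3-min phase (5,760 observations) yields 5,409 windows × 129 lags = 697,761 coefficients.

```python
FS = 32  # observations per second after down-sampling


def to_obs(seconds, fs=FS):
    """Convert a duration in seconds to a whole number of observations."""
    return int(round(seconds * fs))


def wclc_shape(n_obs, window_s=7.0, max_lag_s=2.0, fs=FS, inc=1):
    """(rows, columns) of a WCLC matrix, keeping only windows where all lags fit."""
    window, max_lag = to_obs(window_s, fs), to_obs(max_lag_s, fs)
    n_rows = (n_obs - window - 2 * max_lag) // inc + 1
    n_cols = 2 * max_lag // inc + 1
    return n_rows, n_cols


rows, cols = wclc_shape(to_obs(180))  # one 3-min phase
print(to_obs(7), to_obs(2), rows, cols, rows * cols)  # 224 64 5409 129 697761
```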

The correlation coefficients in the resulting WCLC matrices were Fisher z-transformed, as recommended by Ramseyer and Tschacher (2011). The matrices were then cleared of all correlation coefficients not exceeding a dyad-specific significance threshold. This threshold was derived by Bonferroni-correcting an initial α-level of p = .010 for the number of correlation coefficients stored in a dyad’s WCLC result matrix. Because the time series varied in length across interactional phases, the WCLC matrices also varied in the number of stored correlation coefficients. We therefore calculated the Bonferroni-corrected significance thresholds individually for each interactional phase. In contrast to Ramseyer and Tschacher (2011), who used absolute values of correlation coefficients, we cleared the WCLC matrices of all negative correlation values. We opted for this solution because negative correlations mark asynchrony of the two time series. Such asynchrony could reflect what is known as counter-mimicry (Lanzetta and Englis 1989), which we did not expect in the affiliative interactions, since counter-mimicry is known to occur in competitive situations. Furthermore, asynchrony may represent an artifact of the analysis. For example, a negative correlation value could represent a scenario in which person 1 started smiling in response to person 2 stopping smiling. However, if the smiling onsets of both participants occurred within 2 s of one another, we would describe this behavior as a lagged smiling synchronization; if not, we would consider the two smiles distinct actions. Both cases are already accounted for by analyzing the positive correlation coefficients, which makes the negative coefficients redundant in such cases (for more details see online supplementary material). Nevertheless, we additionally repeated all analyses using only the significant negative correlation coefficients.
Moreover, because responding to a frown with a smile, or vice versa, is a possible interactional behavior, we also performed all analyses across muscle sites, although we did not expect such behaviors in the affiliative interactions.
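The thresholding step can be sketched in Python with NumPy and SciPy. The function names are hypothetical, and several details are our assumptions rather than documented choices: each coefficient is tested with the usual t test for a Pearson correlation using the window length (224 observations) as sample size, the test is two-tailed, the Bonferroni divisor is the number of finite coefficients in the matrix, and cleared cells are set to NaN. Because smoothed EMG samples are autocorrelated, the nominal p-values from this test are only approximate.

```python
import numpy as np
from scipy import stats


def critical_r(n_obs, alpha, n_tests, two_tailed=True):
    """Smallest Pearson r that is significant after Bonferroni correction."""
    a = alpha / n_tests
    q = 1 - a / 2 if two_tailed else 1 - a
    t_crit = stats.t.ppf(q, df=n_obs - 2)
    return t_crit / np.sqrt(t_crit ** 2 + n_obs - 2)  # invert t = r*sqrt(df/(1-r^2))


def clear_wclc(r, n_obs=224, alpha=0.010, two_tailed=True):
    """Fisher z-transform r and keep only significant positive coefficients."""
    r = np.asarray(r, dtype=float)
    n_tests = np.count_nonzero(np.isfinite(r))           # Bonferroni divisor
    if n_tests == 0:
        return np.full(r.shape, np.nan)
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))      # Fisher z (clip avoids inf)
    z_crit = np.arctanh(critical_r(n_obs, alpha, n_tests, two_tailed))
    return np.where(z > z_crit, z, np.nan)               # negatives never pass
```

For the supplementary analysis of negative coefficients, the same function could be applied to `-r`, with the signs restored afterwards.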

In all dyads, we defined the person who was first asked to begin the conversation as person 1 and the other person as person 2. Thus, in all WCLC result matrices, coefficients at positive time lags referred to person 1 leading the interaction and coefficients at negative time lags referred to person 2 leading the interaction. We calculated the WCLC analyses separately for each muscle site and for each interactional phase [warm up, positive event of person 1 (positive 1), positive event of person 2 (positive 2), negative event of person 1 (negative 1), negative event of person 2 (negative 2), and cool down].

To obtain a single synchrony parameter for each of the six interactional phases, we calculated the grand mean of all coefficients in the respective WCLC matrix.⁶ We averaged the six resulting mean synchrony levels to obtain a parameter of overall synchrony across the interaction. We additionally determined, for each row (= each time window) of each WCLC matrix, the time lag corresponding to the peak synchrony (WCLC peaks).⁷ This yielded the number of WCLC peaks that occurred at each time lag. To make values comparable between dyads, this number was expressed as the number of WCLC peaks per 3-min period (= the anticipated length of an interactional phase). WCLC peaks were determined individually for each interactional phase and then summed for each time lag over the six interactional phases. In this way, we could explore the temporal dynamics of synchronization more closely, because the number of WCLC peaks estimates the frequency of synchronizations at a given time lag better than the mean WCLC amplitude at that lag does: within a single time window, WCLC coefficients are usually substantially autocorrelated across time lags. We averaged the WCLC peaks over several segments of adjacent time lags. For the inner segment, we chose a width of approximately ±200 ms around time lag 0 to be able to test hypothesis 4, which stated that significant smiling synchrony would be present within time lags of ±200 ms. All other segments were designed to have the same size. We thus averaged the WCLC peaks for a segment of ±6 observations around time lag 0 (overall segment width = 13 observations ≈ 406 ms, or ±203 ms) and for four more segments of 13 observations (≈ 406 ms) in each of the positive and negative time-lag directions (nine segments in total, covering ±1828 ms around time lag 0).
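The summary measures can be sketched as follows in Python/NumPy (`phase_summary` is a hypothetical name). The sketch assumes that cleared coefficients are stored as NaN and excluded from the grand mean, that rows without any retained coefficient contribute no peak, and that peak counts are rescaled to 3 min by the number of windows, which at an increment of one observation corresponds to the phase duration in 1/32-s steps. Within each 13-lag segment the sketch sums the peak counts (the test for hypothesis 4 uses this sum; dividing by 13 gives a per-lag average). Lags of ±59 to ±64 fall outside the nine segments.

```python
import numpy as np


def phase_summary(z, lags, fs=32, ref_s=180, seg_width=13, n_segs=9):
    """Grand-mean synchrony, peak counts per lag, and peak sums per lag segment.

    z: cleared WCLC matrix of one phase (rows = windows, columns = lags),
       with non-significant cells stored as NaN; lags: lag of each column.
    """
    z = np.asarray(z, dtype=float)
    lags = np.asarray(lags)
    finite = np.isfinite(z)
    grand_mean = float(np.nanmean(z)) if finite.any() else np.nan
    has_value = finite.any(axis=1)                   # rows with a retained coefficient
    peak_lags = lags[np.nanargmax(z[has_value], axis=1)]
    counts = np.array([np.count_nonzero(peak_lags == lag) for lag in lags], dtype=float)
    counts *= (ref_s * fs) / z.shape[0]              # peaks per 3-min period
    half = seg_width // 2                            # 6 observations on each side
    segments = {}
    for k in range(-(n_segs // 2), n_segs // 2 + 1):  # k = -4..4; segment 0 is centred on lag 0
        mask = (lags >= k * seg_width - half) & (lags <= k * seg_width + half)
        segments[k] = float(counts[mask].sum())
    return grand_mean, counts, segments
```

Per dyad, the `counts` vectors of the six phases would then be summed lag by lag, as described above, before the segment comparison.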

Data Analysis

We ran three manipulation checks. First, we explored the event intensity ratings provided by the participants to check whether participants generally chose events of high personal relevance. Second, we checked the non-normalized benchmark amplitude levels obtained for each muscle before the interaction to ensure that recording two participants with only one device yielded valid EMG amplitudes. We expected benchmark contractions across the whole sample to result in μV RMS amplitudes comparable to those known from previous research. For instance, Epstein (1990) reported mean benchmark Zygomaticus amplitudes of approximately 140 μV RMS and Corrugator amplitudes of approximately 80 μV RMS. Furthermore, we correlated the benchmark amplitudes of the two members of each dyad to check whether connecting both participants to the same device systematically distorted the EMG amplitudes. We expected non-significant correlations in this regard. Third, we analyzed the mean activation levels of Zygomaticus and Corrugator in the different valence conditions of the interaction to check whether valence induction affected muscle activity. For this, we used the baseline-corrected time series. We tested this in a three-way repeated measures ANOVA performed on the 30 dyads, with muscle site (Zygomaticus vs. Corrugator) and valence type (positive vs. negative) as two within-subject factors. For this analysis, we averaged the EMG activations for the two positive and for the two negative events each dyad member had participated in. We further included a third within-subject factor, dyad-member (person 1 vs. person 2), to account for both participants within a dyad. We expected more Zygomaticus activation during the discussion of positive events and more Corrugator activation during the discussion of negative events, and thus a significant muscle site by valence type interaction.
In this case we planned to follow up the analysis with respective repeated measures analyses within each muscle site with the within-subject factors valence type and dyad-member.

To test hypotheses 1 and 2, we calculated a three-way repeated measures ANOVA [muscle site (Zygomaticus vs. Corrugator) by interaction type (genuine vs. pseudo-interaction) by (interactional) phase (warm up vs. positive 1 vs. positive 2 vs. negative 1 vs. negative 2 vs. cool down)] on the WCLC synchrony. Following hypotheses 1 and 2, we expected a muscle site by interaction type interaction (with smiling but not frowning being superior to chance level synchrony). We thus planned to follow up significant interactions with repeated measures ANOVAs or paired-samples t tests, depending on the exact type of comparison. We did not have a specific hypothesis about varying amounts of WCLC synchrony in the varying experimental phases and thus looked at emerging differences exploratively.

To test the reliability of the synchrony analyzed with WCLC over the course of the interaction (hypothesis 3), we calculated internal consistencies (Cronbach’s α) across the six experimental phases for smiling and frowning synchrony, respectively, and for both genuine and pseudo-interactions. We considered values higher than .75 to represent sufficient reliability.
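For reference, Cronbach's α for k phases is α = k/(k − 1) · (1 − Σ var(phase) / var(sum over phases)). A minimal Python/NumPy version (hypothetical function name; sample variances with ddof = 1) operating on a dyads × phases matrix, e.g., 30 × 6 values of mean WCLC synchrony:

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (n_subjects, k_items) array, here dyads x phases."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    if k < 2:
        raise ValueError("need at least two phases")
    item_var = x.var(axis=0, ddof=1).sum()   # sum of per-phase variances
    total_var = x.sum(axis=1).var(ddof=1)    # variance of each dyad's summed score
    return k / (k - 1) * (1 - item_var / total_var)
```

The resulting value would then be compared against the .75 criterion stated above.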

Our hypothesis 4 stated that smiling synchrony would be present within time lags of ±200 ms. To test this, we used a paired-samples t test comparing genuine and pseudo-interactions in the sum of WCLC peaks that fell within the inner segment of ±203 ms around time lag 0. To confirm hypothesis 4, this segment should contain more WCLC peaks than expected by chance. We also explored the differences in the numbers of WCLC peaks between genuine and pseudo-interactions within the other segments. For all analyses, we tested at an α-level of p = .050 and corrected for family-wise error rates using Bonferroni correction where needed.
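The comparison for hypothesis 4 can be sketched with SciPy's `scipy.stats.ttest_rel`; the wrapper `paired_comparison` is hypothetical. It returns t, the two-sided p-value, Cohen's dz (the mean difference divided by the standard deviation of the differences, which equals t/√n), and whether p falls below a Bonferroni-corrected α. The inputs would be one value per dyad, e.g., the number of WCLC peaks in the inner ±203 ms segment for genuine and for pseudo-interactions.

```python
import numpy as np
from scipy import stats


def paired_comparison(genuine, pseudo, n_comparisons=1, alpha=0.050):
    """Paired-samples t test of genuine vs. pseudo-interaction values per dyad."""
    g = np.asarray(genuine, dtype=float)
    p = np.asarray(pseudo, dtype=float)
    result = stats.ttest_rel(g, p)                       # two-sided by default
    diff = g - p
    d_z = diff.mean() / diff.std(ddof=1)                 # Cohen's d_z = t / sqrt(n)
    significant = result.pvalue < alpha / n_comparisons  # Bonferroni
    return float(result.statistic), float(result.pvalue), float(d_z), bool(significant)
```

For the exploratory comparisons across all nine segments, `n_comparisons=9` would apply the Bonferroni correction described above.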

Results

Manipulation Checks

The intensity ratings on the 1-to-9 scale provided by participants before the interactions indicated that the events chosen by the participants were generally high in intensity for both happiness (M = 8.0, SD = 1.1) and anger (M = 7.2, SD = 1.7) (for more information see online supplementary material).

The mean benchmark amplitudes for Zygomaticus and Corrugator were M = 147.0 μV RMS (SD = 73.0) and M = 153.2 μV RMS (SD = 72.5), respectively. Thus, the sensitivity of the Zygomaticus measurement was comparable to the levels reported by Epstein (1990), and that of the Corrugator measurement was higher. The within-dyad correlations of the benchmark amplitudes for Zygomaticus and Corrugator were r(30) = .21, p = .268 and r(30) = −.12, p = .529, respectively. These results indicate that both muscle sites were measured properly, which confirmed the impression gained from visual inspection of parallel video and EMG recordings.

The repeated measures ANOVA on the mean activation levels showed significant main effects of muscle site, F(1, 29) = 106.45, p < .001, ηp² = .79, and of valence type, F(1, 29) = 17.95, p < .001, ηp² = .38. The muscle site by valence type interaction was also significant, F(1, 29) = 21.05, p < .001, ηp² = .42. The planned follow-up repeated measures ANOVA on Zygomaticus activation yielded a significant main effect of valence type, F(1, 29) = 20.98, p < .001, ηp² = .42. The corresponding analysis on the Corrugator showed a trend-level main effect of valence type, F(1, 29) = 3.82, p = .060, ηp² = .12. As can be seen in Fig. 1, participants activated the Zygomaticus more during the discussion of positive than of negative events and tended to activate the Corrugator more during the discussion of negative than of positive events. Overall, participants activated the Zygomaticus more than the Corrugator.
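Partial eta squared can be recovered from a reported F statistic and its degrees of freedom as ηp² = F·df_effect / (F·df_effect + df_error). The small Python helper below (hypothetical name) reproduces values reported above; for example, F(1, 29) = 106.45 gives ηp² ≈ .79.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


print(round(partial_eta_squared(106.45, 1, 29), 2))  # muscle site: 0.79
print(round(partial_eta_squared(21.05, 1, 29), 2))   # muscle site x valence type: 0.42
```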

Fig. 1

Mean normalized EMG activation levels for Zygomaticus and Corrugator during the discussion of positive and negative events. Each bar represents the cumulated means for persons labeled as person 1 and 2, respectively. The p values correspond to the respective main effects in the repeated measures ANOVAs on positive versus negative events within each muscle site

Facial Expression Synchrony

The distribution of mean WCLC synchrony across the applied time lags for genuine and pseudo-interactions is illustrated in Fig. 2 for smiling and in Fig. 3 for frowning. In the three-way repeated measures ANOVA on WCLC synchrony, significant main effects emerged for muscle site, F(1, 29) = 88.26, p < .001, ηp² = .75, interaction type, F(1, 29) = 45.35, p < .001, ηp² = .61, and phase, F(5, 145) = 10.57, p < .001, ηp² = .27. Significant two-way interactions were found for muscle site by interaction type, F(1, 29) = 33.17, p < .001, ηp² = .53, and muscle site by phase, F(5, 145) = 13.22, p < .001, ηp² = .31. The three-way interaction of muscle site by interaction type by phase was also significant, F(5, 145) = 3.80, p = .003, ηp² = .12. As can be seen in Figs. 2 and 3, smiling but not frowning synchrony in genuine interactions could be statistically distinguished from chance-level synchrony (pseudo-interactions). Supporting hypotheses 1 and 2, this differential effect held for each of the six interactional phases when tested individually using paired-samples t tests [smiling: ts(29) = 4.66–6.05, all ps < .001, dz = .85–1.11; frowning: ts(29) = .03–1.61, ps = .117–.980, dz = .01–.30]. It is worth noting, though, that all effect sizes for frowning synchrony pointed in the direction of genuine synchrony exceeding chance-level synchrony. The effects of frowning synchrony were thus likely too small to detect with the power of our analyses, and our results do not imply that frowns were never reciprocated, only that this probably happened rarely.

Fig. 2

Distribution of mean (Zygomaticus) smiling synchrony estimates across the different time lags for each interactional phase, for both genuine (solid line) and pseudo-interactions (dotted line). The shaded gray areas represent ±1 standard error of the mean

Fig. 3

Distribution of mean (Corrugator) frowning synchrony estimates across the different time lags for each interactional phase, for both genuine (solid line) and pseudo-interactions (dotted line). The shaded gray areas represent ±1 standard error of the mean

In support of hypothesis 3, smiling synchrony showed high internal consistency over the course of the interaction (Cronbach’s α = .93). Frowning synchrony did not show sufficient internal consistency (Cronbach’s α = .69); its consistency was comparable to the low internal consistencies of the two pseudo-interaction conditions (Cronbach’s α = .66 and .69 for smiling and frowning, respectively). These results indicate that smiling synchrony, but not frowning synchrony, was reliable over the course of the interaction and across changing conditions (i.e., changes of conversational topic throughout the interaction).
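Internal consistency here can be read as treating each interactional phase as an "item" and each dyad as a measured unit. The Python sketch below computes Cronbach's α with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The phase-as-item layout is our assumption about how the reported values were obtained, and the function name is hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores over the same units.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Here an "item" would be one interactional phase and a "unit" one dyad (assumption).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must cover the same units")
    item_var_sum = sum(statistics.variance(item) for item in items)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))


# Two perfectly concordant "phases" give alpha = 1
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
```

Sample variances are used throughout; since the ratio of variances is what enters the formula, using population variances instead would give the same α.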

Timing of Facial Expression Synchrony

Figure 4 illustrates the differences between genuine and pseudo-interactions in the number of WCLC peaks throughout the interaction for the nine analyzed time lag segments. Supporting hypothesis 4, the inner segment of ±203 ms around time lag 0 contained significantly more WCLC peaks in genuine than in pseudo-interactions, t(29) = 5.72, p < .001, dz = 1.21. This indicates that the WCLC analysis detected significant smiling synchrony at time lags below 203 ms. The four segments adjacent to the inner segment (two in each time lag direction) also contained significantly more WCLC peaks in genuine than in pseudo-interactions, ts(29) = 3.77–6.55, ps ≤ .001, dzs = .78–1.33. These findings, which remained significant after correction for multiple testing, indicate that smiles were predominantly synchronized with maximum delays of approximately 1000 ms.
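The peak-counting step can be sketched as binning the lag of each detected WCLC peak into fixed segments. The Python sketch below assumes that all nine segments share the 406 ms width of the inner ±203 ms segment; the widths of the outer segments are not stated in this section, so this is an assumption, as are the constant and function names. Under that assumption, the inner five segments together span roughly ±1015 ms, which is consistent with the approximately 1000 ms stated above.

```python
import math

SEGMENT_WIDTH_MS = 406  # assumption: every segment as wide as the inner +/-203 ms one
N_SEGMENTS = 9          # index 4 is the inner segment around lag 0


def segment_index(lag_ms: float) -> int:
    """Map a peak lag in ms to a segment index 0..8 (4 = inner segment [-203, 203))."""
    half = SEGMENT_WIDTH_MS / 2
    idx = math.floor((lag_ms + half) / SEGMENT_WIDTH_MS) + N_SEGMENTS // 2
    if not 0 <= idx < N_SEGMENTS:
        raise ValueError(f"lag {lag_ms} ms lies outside the analysed lag range")
    return idx


def count_peaks(peak_lags_ms):
    """Count how many peak lags fall into each of the nine segments."""
    counts = [0] * N_SEGMENTS
    for lag in peak_lags_ms:
        counts[segment_index(lag)] += 1
    return counts


# Hypothetical peak lags (ms) from one dyad's WCLC matrix
print(count_peaks([0, 100, -150, 500, -500, 1000]))
```

Per-dyad counts obtained this way for genuine and pseudo-interactions could then be compared segment by segment with paired t tests, as in the analysis above.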

Fig. 4

Number of (Zygomaticus) smiling synchrony peaks occurring in varying time lag segments for genuine (light grey bars) and pseudo-interactions (superimposed dark grey bars). Error bars represent standard error of the mean. **p < .010; ***p < .001

Further Analyses: Counter-Mimicry

When analyzing negative correlations (asynchrony) within muscle sites, significant differences in asynchrony emerged between genuine and pseudo-interactions for the Zygomaticus, F(1, 29) = 44.88, p < .001, ηp² = .61. However, this effect was in the opposite direction to that for synchrony: in all interactional phases, pseudo-asynchrony exceeded genuine asynchrony, ts(29) = −8.23 to −4.14, all ps < .001, dzs = −1.5 to −.94. For the Corrugator, no significant differences occurred, F(1, 29) = .12, p = .912, ηp² < .001. When analyzing synchrony across muscle sites, there was a trend towards genuine synchrony exceeding pseudo-synchrony, F(1, 29) = 3.98, p = .056, ηp² = .12. Post-hoc tests showed a significant effect only during the discussion of the first positive event, t(29) = 2.80, p = .009, dz = .59; however, this effect did not survive correction for multiple comparisons. For all other phases, there were no significant differences between genuine and pseudo-interactions, ts(29) = .92–1.84, ps = .076–.927, dzs = .02–.37. These results can be interpreted in the same way as those for frowning synchrony: instances of synchrony across muscle sites were likely too rare to be detected by the analyses. There was no evidence of significant asynchrony across muscle sites, F(1, 29) = .32, p = .576, ηp² = .01.
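Synchrony and asynchrony can both be read off the same WCLC coefficient matrix: synchrony from coefficients above a positive threshold, asynchrony from coefficients below its negative counterpart. The Python sketch below shows one way to derive both scores; the threshold value, the choice to count sub-threshold cells as zero, and the function name are assumptions for illustration. The same function applies unchanged to a matrix computed across muscle sites.

```python
import numpy as np


def sync_async_scores(r_matrix, r_crit):
    """Split a WCLC coefficient matrix into a synchrony and an asynchrony score.

    Synchrony: mean over valid cells, keeping r where r >= r_crit (others count as 0).
    Asynchrony: mean over valid cells, keeping |r| where r <= -r_crit (others count as 0).
    NaN cells (e.g. windows with zero variance) are excluded from both means.
    """
    r = np.asarray(r_matrix, dtype=float)
    valid = ~np.isnan(r)
    positive = np.where(r >= r_crit, r, 0.0)   # NaN compares False, so becomes 0
    negative = np.where(r <= -r_crit, -r, 0.0)
    return float(positive[valid].mean()), float(negative[valid].mean())


# Hypothetical 2 x 2 matrix (windows x lags) with an assumed threshold of .44
sync, asyn = sync_async_scores([[0.9, -0.8], [0.1, 0.5]], r_crit=0.44)
print(sync, asyn)
```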

Discussion

In this study, we examined whether a method for quantifying the synchrony of time series data (windowed cross-lagged correlation; WCLC) is suited to quantifying the temporal dynamics of facial expression synchrony when facial expressions are continuously measured with EMG in face-to-face dyadic interactions. We analyzed the synchrony of smiling (Zygomaticus Major) and frowning (Corrugator Supercilii) muscle contractions during the discussion of positive and negative life events of the interaction partners. As hypothesized, the WCLC analysis detected significant smiling synchrony compared with a control condition of pseudo-interactions. Smiles were predominantly synchronized within 1000 ms, and a significant portion of them within as little as 200 ms. The construct validity and parameter stability of the analysis appeared to be high.
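To make the core computation concrete, the Python sketch below follows the general windowed cross-correlation scheme: a window slides along one partner's time series, and for each window position the Pearson correlation with the other partner's series is computed at a range of lags. All parameters (window size, window increment, maximum lag, lag increment, in samples) are hypothetical; the values used in the study are not given in this section, and this is a sketch rather than the authors' implementation.

```python
import numpy as np


def wclc(x, y, window_size, window_increment, max_lag, lag_increment):
    """Windowed cross-lagged correlation (sketch).

    Returns (lags, r) where r has shape (n_windows, n_lags) and r[i, j] is the
    Pearson correlation between a window of x starting at sample s_i and a window
    of y starting at s_i + lags[j]. A positive lag means y follows x.
    All parameters are expressed in samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1, lag_increment)
    # Start positions leave room for the largest negative and positive lag
    starts = np.arange(max_lag, len(x) - window_size - max_lag + 1, window_increment)
    r = np.full((len(starts), len(lags)), np.nan)
    for i, s in enumerate(starts):
        wx = x[s:s + window_size]
        for j, lag in enumerate(lags):
            wy = y[s + lag:s + lag + window_size]
            if wx.std() > 0 and wy.std() > 0:  # correlation undefined for flat windows
                r[i, j] = np.corrcoef(wx, wy)[0, 1]
    return lags, r


# Hypothetical example: partner B's signal repeats partner A's with a 3-sample delay
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = np.roll(a, 3)
lags, r = wclc(a, b, window_size=20, window_increment=10, max_lag=5, lag_increment=1)
```

In this toy example every window shows r = 1 at lag 3, which is the pattern the peak-picking step would then locate.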

In accordance with other studies investigating nonverbal synchrony with WCLC in face-to-face interactions, we used pseudo-interactions to simulate chance-level synchrony. Like Bernieri (1988) and Ramseyer and Tschacher (2011), we found genuine synchrony to exceed pseudo-synchrony. The effect sizes for overall smiling synchrony in the different interactional phases of our study (dzs = .85–1.11) were considerably larger than those reported by Ramseyer and Tschacher (2011) (ds = .50–.59), indicating that our analysis provided a higher signal-to-noise ratio. This difference may be explained by the fact that, unlike Ramseyer and Tschacher, we applied a significance threshold to the WCLC coefficients in order to suppress non-significant synchrony. Moreover, we suppressed all negative correlation coefficients, which mark asynchrony between the time series. Accordingly, when we analyzed our smiling muscle data without suppressing negative and non-significant correlations in the WCLC matrices (see online supplementary material), the effect sizes (dzs = .42–.61) were reduced to values comparable to those observed by Ramseyer and Tschacher.
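The suppression step described above can be sketched as follows. The critical correlation for a window of n samples is approximated here with Fisher's z transformation, r_crit = tanh(z_crit/√(n − 3)), which is a normal approximation swapped in for illustration; the threshold level actually used in the study is not stated in this section. Setting suppressed cells to zero before averaging is likewise an assumption, and the function names are hypothetical.

```python
import math

import numpy as np


def critical_r(n: int, z_crit: float = 1.96) -> float:
    """Approximate critical Pearson r for a window of n samples via Fisher's z.

    z_crit = 1.96 corresponds to a two-sided alpha of .05. This is a normal
    approximation, not the exact t-distribution-based critical value.
    """
    return math.tanh(z_crit / math.sqrt(n - 3))


def aggregate_synchrony(r_matrix, n: int, suppress: bool = True) -> float:
    """Mean WCLC coefficient; optionally zero out non-significant and negative cells."""
    r = np.asarray(r_matrix, dtype=float)
    if suppress:
        keep = np.isnan(r) | (r >= critical_r(n))  # NaN cells stay NaN and are skipped
        r = np.where(keep, r, 0.0)
    return float(np.nanmean(r))


matrix = [[0.9, -0.8], [0.1, 0.5]]  # hypothetical coefficients, 20-sample windows
print(aggregate_synchrony(matrix, n=20, suppress=True))
print(aggregate_synchrony(matrix, n=20, suppress=False))
```

In this toy matrix, negative and small coefficients pull the unsuppressed mean down, which illustrates mechanically how suppression can change the aggregate synchrony score; it does not by itself reproduce the effect-size difference reported above.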

We did not find significant frowning synchrony in our dyads. Given findings that the Corrugator is reliably deactivated in response to positive stimuli in general (Larsen et al. 2003) and to positive facial expressions in particular (Dimberg et al. 2000), one might have expected frowning synchrony at least in the positive interaction phases, because the Corrugators of both participants should relax during active smiling. In facial mimicry studies based on stimulus presentation via computer, this deactivation becomes evident when the signal change relative to baseline is calculated. This baseline is usually the within-trial pre-stimulus interval (e.g., Dimberg et al. 2000; Likowski et al. 2008) or even a neutral expression (e.g., Hess and Blairy 2001), which serves to control for possible unrelated muscle activations. The mimicry responses (both activations and deactivations) are then usually in the range of several μV (e.g., ±3 μV in Dimberg et al. 2000). In our continuous EMG assessment, we accepted a “calm” baseline of at most 5 μV RMS when checking signal quality before the interactions. This baseline level comprised general alertness and muscle tone as well as noise. When muscles were activated, as in active frowning or smiling, the RMS EMG amplitude in most participants exceeded 100 μV RMS (cf. also Epstein 1990), because the facial movements we recorded were much less subtle than those usually assessed in facial mimicry research. Thus, we specifically captured overt facial expressions, but in turn we could not detect muscle deactivation in the way the aforementioned studies did. Our method thus appears to be particularly useful for detecting overt, as opposed to covert, facial mimicry.
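The signal-quality criterion mentioned above (a resting level of at most 5 μV RMS) amounts to computing the root-mean-square amplitude over short windows and checking it against a limit. The Python sketch below assumes an already filtered EMG signal expressed in μV and non-overlapping windows; the window length and the function names are hypothetical.

```python
import numpy as np


def windowed_rms(signal_uv, window: int):
    """RMS amplitude (same units as the input, here microvolts) over non-overlapping windows."""
    x = np.asarray(signal_uv, dtype=float)
    n_windows = len(x) // window
    if n_windows == 0:
        raise ValueError("signal shorter than one window")
    x = x[: n_windows * window].reshape(n_windows, window)  # drop the incomplete tail
    return np.sqrt(np.mean(x ** 2, axis=1))


def baseline_is_calm(signal_uv, window: int, limit_uv: float = 5.0) -> bool:
    """True if every RMS window of a resting recording stays at or below limit_uv."""
    return bool(np.all(windowed_rms(signal_uv, window) <= limit_uv))


# A +/-3 uV square wave has an RMS of 3 uV and passes; a +/-100 uV one does not
print(baseline_is_calm([3.0, -3.0] * 50, window=10))
print(baseline_is_calm([100.0, -100.0] * 50, window=10))
```

The contrast between the two toy signals mirrors the scale difference described above: resting levels of a few μV versus active expressions above 100 μV RMS.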

Our finding that smiling was synchronized between interaction partners to a larger extent than frowning confirms findings by Hess and Bourgeois (2010). A possible explanation for this differential effect is that both Hess and Bourgeois (2010) and our study investigated affiliative situations. In such situations, frowning, which may be considered a non-affiliative signal, is less likely to be synchronized than the affiliative signal of smiling (e.g., Hess et al. 2000). The affiliative nature of the interactions in our study is also reflected in our finding that counter-mimicry reactions (e.g., out of contempt) did not occur to a significant extent. Counter-mimicry is known to occur particularly in competitive, as opposed to affiliative, situations (e.g., Lanzetta and Englis 1989). Moreover, asynchrony in the smiling muscle was significantly lower in genuine than in pseudo-interactions, that is, it was suppressed below chance level in these affiliative interactions. This further underlines the importance of synchronized smiles in affiliative interactions.

When further disentangling the temporal dynamics of smiling synchrony, we found smiling synchrony at time lags up to 200 ms, corroborating findings by Heerey and Crossley (2013). Heerey and Crossley assumed that synchronization within 200 ms of smile onset reflects an anticipatory response, because perceptual processing plus the subsequent motor output would take longer than 200 ms. They found that genuine, but not polite, smiles were synchronized by interaction partners within 200 ms. It is thus likely that the high proportion of short-lag synchronizations in our study reflects a high proportion of reciprocated genuine smiles in the interactions. This interpretation also fits well with the findings of Hess and Bourgeois (2010), who found that, compared with genuine smiles, polite smiles were only rarely expressed in first encounters. Given this convergence with existing findings in the literature, we consider the construct validity of the outcome parameter of our WCLC analysis to be high.

Some limitations of our study design need mentioning. First, due to convenience sampling, the age range was larger than in most other experiments on facial EMG. Although previous studies suggest that neither facial EMG nor facial mimicry is affected by age (Bailey and Henry 2009; Hühnel et al. 2014), we found a significant negative correlation between age and Zygomaticus activation levels. However, controlling for age did not change the results of any of the analyses of the EMG data, either for the manipulation checks or for the synchrony calculations. Second, we restricted the EMG assessment to two muscle sites and the emotional event types to happiness and anger situations, so conclusions about the synchronization of other facial expressions, such as fear or disgust, are limited. Third, we conducted our study in a Western European country and culture, so our findings need replication in other cultural areas. Finally, because our design was not completely counterbalanced, we cannot rule out that sequencing effects produced the differential results for positive and negative event types. Nevertheless, we believe that the core research question of this study (whether facial expression synchrony can be assessed using EMG and WCLC) can still be answered with our data. For example, the findings in the “warm-up” phase, in which participants did not yet know whether they would start with positive or negative stories, mirrored our general findings: compared with pseudo-interactions, we found Zygomaticus synchrony (p < .001; d = 1.15) but not Corrugator synchrony (p = .60; d = .10). Moreover, the warm-up period appears to be a good proxy for Zygomaticus synchrony over the interaction as a whole: across the various interactional phases, Zygomaticus synchrony showed a Cronbach’s α of .93, and the corrected item-to-scale correlation of the warm-up period with overall interactional synchrony was r = .87. Thus, the order of valence blocks does not seem to have influenced the feasibility and validity of the analytic approach presented in this study. However, the analyses of differences between positive and negative narratives were exploratory and, because of the lack of counterbalancing, need to be interpreted with caution.
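The corrected item-to-scale correlation correlates one item (here, the warm-up phase) with the sum of all remaining items, so that the item is not correlated with itself. The Python sketch below assumes the same layout as for Cronbach's α (one list of dyad-level synchrony scores per phase); this layout and the function names are assumptions for illustration.

```python
import math
import statistics


def pearson_r(a, b) -> float:
    """Pearson correlation of two equally long sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def corrected_item_total(items, index: int) -> float:
    """Correlate items[index] with the per-unit sum of all *other* items."""
    others = (item for k, item in enumerate(items) if k != index)
    rest_totals = [sum(values) for values in zip(*others)]
    return pearson_r(items[index], rest_totals)


# Hypothetical scores: item 0 plays the role of the warm-up phase
phases = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 5, 7]]
print(corrected_item_total(phases, 0))
```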

The WCLC approach presented here could be used to address research questions on how the temporal dynamics of facial mimicry in dyadic interactions are affected by experimental manipulations (such as valence) or by inter-individual differences related to interpersonal behavior, such as social cognition (Heerey 2015). Another important open question that could be addressed is how these temporal dynamics affect the quality of social interactions. This may be particularly relevant for investigations aiming to understand deficits in social interaction skills in clinical populations in which facial mimicry deficits might play a role, such as depression (Wexler et al. 1994), autism (McIntosh et al. 2006), or schizophrenia (Varcin et al. 2010). Moreover, WCLC is not restricted to EMG data and is likely to provide valid results for other methods of measuring facial expressions as well, such as automated tracking of facial expressions in video recordings from conventional two-dimensional cameras (Messinger et al. 2009) or, more recently, from three-dimensional depth cameras (e.g., Zhang 2012).

In sum, our findings provide good evidence supporting the use of behavioral time series analysis, and WCLC in particular, for quantifying the temporal dynamics of facial expression synchrony. WCLC largely preserves the temporal resolution of the input data, relies on an objective assessment of movements, offers a high signal-to-noise ratio, and shows high construct validity. Using this analysis, we were able, for the first time, to provide EMG data from face-to-face interactions that support a rapid, presumably anticipatory, synchronization of smiles within 200 ms. The method could be used to investigate this phenomenon in more depth in both healthy and clinical populations.