Introduction

Human motion is sensitive to periodic auditory information, such as footsteps or hands clapping. We unintentionally and transiently (i.e., for short and intermittent periods) synchronize to the regular pulse of external auditory stimuli. This effect, called spontaneous synchronization, is known to influence the pace of our periodic movements and their stability. Recent research has questioned the conditions of emergence of this unintentional phenomenon. Although a large repertoire of human movements (tapping, running, standing, swaying in a rocking chair) and other physiological rhythms (e.g., cardiovascular and respiratory systems) (Bernardi et al., 2009; Coste, Salesse, Gueugnon, Marin, & Bardy, 2018; Demos, Chaffin, Begosh, Daniels, & Marsh, 2012; Hattori, Tomonaga, & Matsuzawa, 2015; Peckel, Pozzo, & Bigand, 2014; Repp & Su, 2013; Van Dyck et al., 2015), as well as a range of auditory stimuli (simple metronome, music and even subliminal rhythmic stimuli) (Coste et al., 2018; Schurger, Faivre, Cammoun, Trovó, & Blanke, 2017; Van Dyck et al., 2015; Varlet et al., 2018), have been tested, the dynamics of this transient phenomenon are still largely unknown.

Spontaneous synchronization is not constrained to a one-to-one frequency relationship between movement and stimulus cycle, but also emerges for more complex frequency ratios (Kelso & DeGuzman, 1988; Peckel et al., 2014; Varlet, Williams, Bouvet, & Keller, 2018). Even if ratios other than 1:1 can emerge, they are less likely to be observed, as predicted by the mathematical concept of the Farey tree. The Farey tree orders the modes of coordination between two oscillators—mainly observed so far in bimanual synchronization studies (Peper, Beek, & van Wieringen 1995a, b; Treffner & Turvey, 1993)—based on the shape and complexity of the relationship (Bare Winding Number), and the strength of the coupling (K) between the two oscillators, as seen in Fig. 1. These two parameters determine the width of the Arnold tongue, corresponding to the region of attraction where oscillators become coupled to each other and synchronize at a given ratio with varying degrees of stability (e.g., higher level ratios have wider, more stable tongues, Arnold, 1965). The stability of the frequency ratio depends on K and the model predicts that low stability encourages a switch from higher-order modes (e.g., 2:5, 4:1 in the lower levels on the right side of the tree, see Fig. 1) to lower-order ones (1:2, 1:1 in the upper levels).

Fig. 1

Modified with permission from the authors from Fig. 1c of Hoffmann, Torregrosa, & Bardy (2012).

Joint representation of Arnold tongues and the first four levels of the Farey tree. The width of the Arnold tongues, which is related to coordination stability, is defined by the strength of the coupling between the two oscillators (K) and the complexity of their relationship (Bare Winding Number). The routes described in the Farey tree show how coupled oscillators transition from less stable frequency ratios at higher levels to more stable ratios at lower levels.

Although simple movements may benefit from spontaneous synchronization with the auditory stimulus at a 1:1 frequency ratio (Hattori et al., 2015; Repp, 2006; Schurger et al., 2017; Van Dyck et al., 2015), certain more complex types of limb movements (i.e., multiple limbs and/or movements in multiple directions) might profit from more complex frequency ratios. For example, Toiviainen and colleagues observed that in a dance setting (spontaneous movement mostly intentionally synchronized to music), limbs and directions of movements are synchronized to different metrical levels of the music (Toiviainen, Luck, & Thompson, 2010). Specifically, the lowest order ratio 1:1 (and hence the faster component) was associated with vertical hand and torso movements, while the most complex ratio identified, 1:4, was associated with lateral flexion of the torso and rotation of the upper torso. If participants exhibit various modes of coordination when performing an intentional synchronization task with music, spontaneous synchronization might benefit from the adaptation of the stimulus to these modes. Moreover, tapping at a higher order ratio seems to push the limits of human ability. Research has shown that the rate limit of intentional beat tapping (i.e., period between two taps) at a 1:4 ratio (Repp, 2003) is lower than when tapping at a 1:1 ratio (Peters 1989; Repp 2005a). People benefit from the grouping of the tones at faster tempi. However, it is not known whether this grouping mechanism is also observed in unintentional synchronization, and in general, the limitations on, and the parameters that facilitate the stability of multiple-ratio dynamics in spontaneous auditory–motor synchronization remain to be identified.

To address this issue, we investigated whether finger tapping movements spontaneously entrain to three different modes of coordination. As tapping movements have been suggested to be more attracted by a periodic external stimulus than other movements (Peckel et al., 2014), our study focused on multiple-ratio dynamics between the auditory stimulus and the finger pace. After an evaluation of their preferred finger tapping frequency (PF), participants were asked to produce tapping movements at their preferred tempo while an auditory metronome was presented at a frequency ratio of 1:1, 1:2 or 1:3 (i.e., one movement cycle at PF for one, two or three beats of the auditory stimulus). Manipulating the frequency ratio of the auditory stimulus allowed us to measure the dynamics of the three corresponding modes of coordination between movement and metronome (i.e., 1:1, 1:2 and 1:3 modes belonging to the first, second and third levels of the Farey tree). We hypothesized that participants’ tapping movement would be spontaneously influenced and exhibit different degrees of synchronization in the three different frequency ratio conditions. We expected that, according to their position in the Farey tree (Cvitanovic, Shraiman, & Söderberg, 1985; Hardy & Wright, 1979), the spontaneous synchronization exhibited would be stronger for the 1:1 ratio compared with the 1:2 and 1:3 ratios, and for the 1:2 ratio compared with the 1:3 ratio.

A parameter that might enhance the stability of spontaneous synchronization to 1:2 and 1:3 ratios, and more complex frequency ratios in general, is the metrical structure of the stimulus. An important cue to metrical structure in music is accentuation, which occurs when an acoustic event is perceived to be more salient than its surrounding context (Bouwer, Burgoyne, Odijk, Honing, & Grahn, 2018; Dawe, Platt & Racine, 1993, 1995; Ellis & Jones, 2009; London, 2012; Palmer & Krumhansl, 1990). Accentuation may arise from the physical structure of the stimulus (i.e., phenomenal accents, Lerdahl & Jackendoff, 1983) and/or from internal processes involved in a listener’s perception of the stimulus (i.e., subjective rhythmization, Bolton, 1894; Bååth, 2015). Phenomenal accents such as modulations of pitch, timbre, duration or intensity are mainly used for expressive purposes but can also play a role in marking the metrical structure of the stimulus (Large & Kolen, 1994). Research testing the influence of the nature of accentuation on beat perception has highlighted a strong benefit of intensity accents for musicians and non-musicians (Bouwer et al., 2018; Drake, 1993; Grahn & Rowe, 2009). These intensity variations were also associated with a facilitation of intentional synchronization (Etani, Miura, Okano, Shinya, & Kudo, 2019; Repp, 2005b). Repp (2005b) reported beneficial effects on participants’ on-beat tapping performance when accentuation was coherent with the target tap timing position. Similarly, the combination of an explicit metrical structure supported by a simple accented pattern that is coherent with a multiple frequency ratio may push the boundaries of spontaneous synchronization dynamics and lead to the stabilization of naturally unstable frequency ratios. In addition, accentuation is assumed to reinforce the unintentional grouping of the auditory stimulus and could simplify the process of matching each tapping cycle to a group of auditory beats (i.e., the frequency ratio relationship).

To test the influence of accentuation on spontaneous synchronization at different frequency ratios, we presented each isochronous auditory stimulus with three different patterns of accentuation in the current experiment: unaccented, binary accented (i.e., one accent every two beats, as in a march), and ternary accented (i.e., one accent every three beats, as in a waltz). This accentuation manipulation can affect two specific components: the frequency ratio between movement and the accented stimulus, as proposed above, and the frequency closeness of the perceived stimulus to PF. Indeed, PF is hypothesized to be a powerful attractor of synchronization because of its relationship to attentional processes. The dynamic attending theory (Drake, Jones, & Baruch, 2000; Jones, 1976) assumes that attention can be modeled by an intrinsic rhythm (oscillation) influenced by other periodic information, such as the PF of our movements. In this context, tapping at a certain PF would tend to favor the perception of the stimulus at the corresponding pace. If spontaneous synchronization is observed in response to PF closeness only, the benefit in terms of occurrence and stability should be equivalent in the three conditions: binary accented 1:2 ratio, ternary accented 1:3 ratio, and unaccented 1:1 ratio, the accentuation pattern reinforcing an equivalent relationship (i.e., 1:1) between the movement oscillator and the auditory stimulus oscillator. If spontaneous synchronization is not (or not only) sensitive to PF closeness but also to the frequency ratio, we should observe different responses to the accentuation patterns across matching conditions. The unaccented 1:1 ratio would favor ratio emergence and stability in comparison to the 1:2 binary accented ratio, and to the 1:3 ternary accented ratio. Conversely, the mismatch between accentuation pattern and frequency ratio (i.e., ternary accented 1:2 frequency ratio, or binary accented 1:3 frequency ratio) should result in the perception of a more complex frequency ratio, and should decrease ratio emergence and/or stability compared to unaccented or coherently accented stimuli.

Material and methods

Participants

Twenty-two right-handed participants (11 males and 11 females) with a mean age of 26.39 (SD = 5.39) years were tested in this experiment. Based on an a priori power analysis using G*Power software (Faul, Erdfelder, Lang, & Buchner, 2007), to capture within-subject main effects of medium size (f = 0.25) with a minimum 80% power, 9 measurements, nonsphericity correction ε = 1, and correlation coefficient ρ = 0.5, a minimum sample of 15 participants was required. Participants had no history of hearing impairment. They provided written informed consent prior to the experiment. The experiment protocol was approved by the Institutional Review Board of the EuroMov Research & Innovation Centre at Montpellier University (N°IRB-EM: 1702B).

Apparatus

Participants were seated in a chair with their forearm resting on a platform configured to perform an air-tapping task, i.e., flexion–extension of the right index finger. As illustrated in Fig. 2a, a platform with a cut-out (inspired by Studenka, Zelaznik, & Balasubramaniam, 2012), a box covering the platform and the participant’s forearm, and the sensor attached to the index finger with its wire fixed to a small mast were used to avoid any auditory, visual or haptic feedback of the tapping movement. Finger oscillations were recorded at a 240-Hz sampling rate using a Polhemus Liberty motion tracker (Polhemus Ltd., VT, USA).

Fig. 2

a Position of participants for the air-tapping task, with the forearm resting on a platform. Finger movements were occluded from sight and did not generate auditory or haptic sensory feedback. b Structure of the nine conditions, each of 1 min duration, with a frequency equivalent to a 1:1, 1:2 or 1:3 ratio relative to the participant’s preferred movement frequency (PF), with the three accentuation patterns: unaccented, binary accented and ternary accented.

The auditory metronome used in the experiment was generated using MATLAB software (The MathWorks, Inc., Natick, Massachusetts, United States), a relaxation oscillator (NE555), an analog multiplexer (MAX4619), a digital analog card (National Instruments USB-6501), and delivered via headphones. The auditory stimuli were 210-Hz pure tones of 50 ms duration presented at a comfortable intensity kept constant for all participants. Intensity accentuation was created by simultaneous presentation of two 210-Hz pure tones.

Procedure

In two sessions 1 day apart, participants were asked to produce air-tapping with their finger at the most comfortable frequency, following the instruction: “try to adopt a tempo that you could maintain for a long period of time without fatigue”. Each session began with three trials of 60 s air-tapping without any stimulus, to measure the participant’s preferred frequency (PF). The PF was then used to generate nine stimulus sequences involving three different frequencies. These frequencies were multiples of the PF, corresponding to 1:1, 1:2 and 1:3 frequency ratios between the participant’s average inter-tap interval and the inter-onset interval of the auditory stimulus (i.e., PF × 1, PF × 2 and PF × 3). Each frequency condition was associated with three specific patterns of accentuation: unaccented, binary accented (one beat accented every two beats) and ternary accented (one beat accented every three beats). Each trial began with a period of silence, randomly fixed between 5 and 10 s to avoid expectation of the auditory stimulus, followed by the 60 s stimulus period. Participants were instructed to stay relaxed and to maintain their preferred movement tempo even when auditory stimuli were presented; however, they were not asked to ignore the stimuli. The nine conditions were presented in a random order, twice per session, resulting in a total of 4 trials per condition. The duration of each session was 45 min.
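The construction of the nine stimulus sequences can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function name `make_stimulus`, the example PF of 1.5 Hz, and the choice to place the accent on the first beat of each group are assumptions made for illustration only.

```python
def make_stimulus(pf_hz, ratio, accent_every, duration_s=60.0):
    """Return (onset_times, accent_flags) for an isochronous metronome.

    pf_hz        : preferred tapping frequency in Hz
    ratio        : stimulus beats per movement cycle (1, 2 or 3)
    accent_every : 0 = unaccented, 2 = binary, 3 = ternary
    """
    beat_hz = pf_hz * ratio                     # PF x 1, PF x 2 or PF x 3
    n_beats = int(round(duration_s * beat_hz))  # beats in the stimulus period
    onsets = [i / beat_hz for i in range(n_beats)]
    # Assumption: the accent falls on the first beat of each group.
    accents = [accent_every > 0 and i % accent_every == 0
               for i in range(n_beats)]
    return onsets, accents


# Nine conditions for a hypothetical participant with PF = 1.5 Hz.
conditions = {
    (ratio, accent): make_stimulus(1.5, ratio, accent)
    for ratio in (1, 2, 3)
    for accent in (0, 2, 3)
}
```

In this sketch the key `(2, 2)` is the coherent binary accented 1:2 condition, while `(2, 3)` is a mismatched condition (ternary accents on a 1:2 stimulus).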

Data analysis

Position time-series of the vertical axis of movement were extracted for each participant, centered on zero and band-pass filtered with 0.1 and 10 Hz cut-off frequencies. The first 5–10 s of tapping in silence were discarded to eliminate transient behavior. Movement peaks (flexion and extension) were determined using the peakdet function that identifies lower and higher points in local areas defined by a peak threshold (Billauer, 2012). The timing positions associated with maximal finger flexion were then used for the computation of movement frequency, amplitude, and the Index of Stability (IS) (Zelic, Varoqui, Kim, & Davis, 2017, see below for details), which allowed the occurrence and stability of the synchronization to be assessed. We also calculated the asynchronies of taps relative to the events (i.e., all events vs. accented events) in the stimulus sequences, using both maximal flexion and extension timing positions.
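The threshold-based peak detection described above can be sketched as follows. This is a plain-Python approximation of the general peakdet idea (a peak is accepted once the signal has moved away from it by more than a threshold `delta`), not the exact function used in the study. The test signal, the sampling rate variable `fs` and the threshold value are illustrative assumptions.

```python
import math


def peakdet(values, delta):
    """Return (maxima, minima) as lists of (index, value) pairs.

    A local maximum is accepted once the signal drops more than `delta`
    below it; a local minimum once the signal rises more than `delta`
    above it.
    """
    maxima, minima = [], []
    cur_max, cur_min = float("-inf"), float("inf")
    max_pos = min_pos = 0
    look_for_max = True
    for i, v in enumerate(values):
        if v > cur_max:
            cur_max, max_pos = v, i
        if v < cur_min:
            cur_min, min_pos = v, i
        if look_for_max:
            if v < cur_max - delta:
                maxima.append((max_pos, cur_max))
                cur_min, min_pos = v, i
                look_for_max = False
        elif v > cur_min + delta:
            minima.append((min_pos, cur_min))
            cur_max, max_pos = v, i
            look_for_max = True
    return maxima, minima


# Synthetic 1.5 Hz oscillation sampled at 240 Hz for 4 s (illustrative only).
fs = 240
signal = [math.sin(2 * math.pi * 1.5 * i / fs) for i in range(4 * fs)]
maxima, minima = peakdet(signal, delta=0.5)
peak_times = [idx / fs for idx, _ in maxima]  # peak timing positions in seconds
```

Which of the two extrema corresponds to maximal flexion depends on the sensor orientation, which the sketch does not assume.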

Frequency and amplitude

Stability of the participants’ PFs was assessed by analyzing the mean frequency of the three test trials performed at the start of each session. The variability of each participant’s movement frequency within trials was estimated by computing the coefficient of variation (COV = SD/Mean  × 100), which allowed the assessment of the frequency variability independently of between-subject differences in mean values. Mean frequency and COV of each test trial were submitted to a 3 by 2 repeated-measures ANOVA with Trial number (1–3) and Session number (1 and 2) as within-subject factors.
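As a minimal sketch of these two measures, the cycle-by-cycle movement frequency and its coefficient of variation could be computed from the flexion timing positions as below. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def cycle_frequencies(peak_times):
    """Instantaneous frequency (Hz) of each cycle between successive peaks."""
    return [1.0 / (b - a) for a, b in zip(peak_times, peak_times[1:])]


def cov_percent(values):
    """Coefficient of variation: COV = SD / Mean x 100."""
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return stdev(values) / mean(values) * 100.0


freqs = cycle_frequencies([0.0, 0.5, 1.0, 1.5])  # three cycles at 2 Hz
```

Because the COV is a ratio of SD to mean, it allows variability to be compared between participants whose preferred frequencies differ.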

Effects of the stimulus on tapping stability were determined by analyzing the mean and COV of participants’ movement frequency and amplitude. Averaged values per condition per participant were submitted to a repeated-measures ANOVA with two within-subject factors (3  × 3): Frequency ratio (1:1 vs. 1:2 vs. 1:3) and Accentuation pattern (Unaccented vs. Binary accented vs. Ternary accented).

Average asynchronies

General modulation of the tapping timing-position relative to the stimulus was quantified by computing the absolute mean time between the maximum flexion and extension of each tapping cycle and the nearest stimulus event. The distance between two auditory stimulus cycles differed across the three conditions (faster stimuli have a shorter time interval between two successive stimuli) and between participants (differing participant PFs lead to different stimulus frequencies), which could affect the asynchronies between the movement peak and nearest stimulus (i.e., the faster the frequency, the smaller the asynchrony). To compare these intrapersonal and interpersonal differences we normalized the data by the period of the auditory stimulus (i.e., 1/frequency).
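The normalization step can be sketched as follows, assuming an isochronous stimulus given as a sorted list of onset times. The function name and the example values are illustrative, not taken from the study.

```python
from bisect import bisect_left


def normalized_abs_asynchrony(peak_times, stim_onsets):
    """Mean absolute asynchrony to the nearest stimulus event,
    expressed as a fraction of the stimulus period (1 / frequency)."""
    period = stim_onsets[1] - stim_onsets[0]  # isochronous stimulus assumed
    total = 0.0
    for t in peak_times:
        i = bisect_left(stim_onsets, t)
        # Only the stimulus events just before and just after the peak matter.
        candidates = stim_onsets[max(i - 1, 0): i + 1]
        total += min(abs(t - s) for s in candidates)
    return total / len(peak_times) / period


example = normalized_abs_asynchrony([0.1, 0.9], [0.0, 0.5, 1.0, 1.5])
```

With this normalization the largest possible value is 0.5 (a peak exactly halfway between two stimulus events), whatever the stimulus frequency.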

To ensure that the observed synchronization periods were due to the presence of the stimulus and not to chance, asynchronies were also computed on the PF trials in which no auditory stimulus was presented. The set of flexion and extension timing positions of the PF trials was compared three times: (1) to the timing positions of the (artificially superimposed) 1:1 frequency ratio stimulus; (2) to the timing positions of the (artificially superimposed) 1:2 frequency ratio stimulus; and (3) to the timing positions of the (artificially superimposed) 1:3 frequency ratio stimulus. This procedure allowed the assessment of the degree of synchrony that could occur by chance, and controlled for effects related to differences in intrinsic movement variability between participants.

Mean absolute asynchronies and their COV were submitted to a 2-way (3 × 4) repeated-measures ANOVA with Frequency ratio (1:1 vs. 1:2 vs. 1:3) and Type of stimulus (No stimulus vs. Unaccented vs. Binary accented vs. Ternary accented) as within-subject factors. As the analyses of absolute asynchronies do not take into account the position of the accents but only the timing position of every auditory cycle (which is the same in the No stimulus, Unaccented, Binary accented and Ternary accented conditions), we combined the Presence of the stimulus and the kind of Accentuation into the Type of stimulus factor.

Accent-related analyses

The effect of attraction of the accents was estimated by quantifying the degree to which finger flexion and extension peaks occurred closer to accented stimuli compared to unaccented stimuli. For each flexion and extension peak we identified the position of the previous and following beat in the auditory stimulus temporal structure. We selected the closest of the two and extracted whether it was accented or unaccented. We then computed the number of peaks close to an accent divided by the total number of peaks per trial. We also distinguished flexion vs. extension peaks by estimating the percentage of flexions among the peaks close to accents. The percentages of peaks closer to the accent, and flexions closer to the accents were averaged for each condition.
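A minimal sketch of the first measure, under the assumption that the stimulus is given as a list of onset times with a parallel list of accent flags (the function name and example values are hypothetical):

```python
def percent_peaks_near_accent(peak_times, stim_onsets, accented):
    """Percentage of movement peaks whose nearest beat is an accented one."""
    near = 0
    for t in peak_times:
        # Index of the beat closest in time to this peak.
        nearest = min(range(len(stim_onsets)),
                      key=lambda k: abs(stim_onsets[k] - t))
        near += bool(accented[nearest])
    return 100.0 * near / len(peak_times)


# Binary pattern: every second beat is accented.
onsets = [0.0, 1.0, 2.0, 3.0]
accents = [True, False, True, False]
pct = percent_peaks_near_accent([0.1, 0.9, 2.2, 2.4], onsets, accents)
```

Note that if peaks were spread evenly in time, about half of them would fall nearest an accent for a binary pattern and about a third for a ternary pattern, so the two patterns have different chance levels for this measure.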

To provide a baseline for these two measures, conditions without accentuation (No stimulus and Unaccented conditions) were also analyzed. For each flexion and extension peak in these two conditions we identified the closest beat of (1) the Binary accented stimulus structure and (2) the Ternary accented stimulus structure, for each Frequency ratio. These control measures allowed us to determine whether an effect was related to the presence of the accentuation (Accented vs. Unaccented) or to the presence of the stimulus (Stimulus vs. No stimulus).

The percentage of peaks closer to accents and percentage of flexions among them were analyzed in a 3-way (3 × 3 × 2) repeated-measures ANOVA with Frequency ratio (1:1 vs. 1:2 vs. 1:3), Type of stimulus (No stimulus vs. Unaccented vs. Accented) and Pattern of accentuation (Binary vs. Ternary) as within-subject factors.

Index of stability analysis

To examine the occurrence and stability of movement synchronization for different frequency ratios, we used a recent method developed by Zelic et al. (2017), the Index of Stability analysis. This method measures the occurrence and stability of the different possible modes of coordination between movements and metronomic time-series with three different outputs. The first one identifies the local modes of relationship (i.e., ratio of integers) belonging to the Farey tree between the two time-series that promotes an overall maximum stability through the trial. For each time onset of both time-series a “real quotient” is computed, contrasting the timing relation between the participant’s movement and the auditory metronome. The time-series of this “real quotient” is then transformed into a series of closest “rational ratios” belonging to the first n levels of the Farey tree (n defined by leveldef as explained below). We will refer to this measured relationship as the mode of coordination. The other two outputs, the percentage of occurrence (i.e., proportion of the different modes) and their Index of Stability (IS), depict the stability of each mode. More precisely, IS quantifies the fluctuation of the mode-related relative phase angles. Based on the “rational ratios”, it estimates the relative phase time-series of the two oscillators and their relation, and then quantifies the circular dispersion using the mean resultant vector length (for more details, please refer to the S3 Appendix of Zelic et al., 2017). The result is a value between 0 and 1, with 0 indicating an absence of stable relationship between the two oscillatory cycles (here the movement and the stimulus), and 1 indicating perfect synchrony. This analysis considered the first 10 levels of the Farey tree (leveldef = 10), as it is the recommended standard for higher resolution (i.e., to avoid an increased amount of noise being assigned to levels of interest). By using the optional variable KR = x, the stability of the synchrony between the two oscillators can be tested for only x dominant modes. However, because spontaneous synchronization is transient, synchronization to a stable ratio cannot be expected in 100% of the trials. To increase the resolution of the analysis and ensure that noise is not assigned to a given mode of coordination, we did not constrain the Index of Stability analysis to the consideration of one mode of coordination per trial. The KR parameter was set at KR = 0: no hypothesis was therefore made relative to the number of modes that can emerge between the participants’ movement and the metronome.
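The circular statistic underlying the IS, the mean resultant vector length, can be illustrated with a short sketch. This is the generic textbook quantity only. It does not reproduce the full Index of Stability pipeline of Zelic et al. (2017), and the function name is an assumption.

```python
import cmath
import math


def mean_resultant_length(phases):
    """Length of the mean of unit vectors at the given angles (radians).

    Returns a value between 0 (phases spread around the circle) and
    1 (all phases identical).
    """
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


locked = mean_resultant_length([0.3, 0.3, 0.3, 0.3])
spread = mean_resultant_length([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
```

Here `locked` is close to 1 because the relative phase never changes, whereas `spread` is close to 0 because the relative phase drifts evenly around the circle.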

IS per trial

The stability of the interaction between each participant’s movement and metronome time-series was first considered at the whole trial level. The mean IS per trial is assessed by calculating the mean resultant vector length of the dispersion of the relative phase time-series, considering the modes of coordination produced locally (for more details, please refer to the S3 Appendix of Zelic et al., 2017). For each condition, the mean IS of the four trials per condition were submitted to a 2-way (3 × 3) repeated-measures ANOVA with the factors Frequency ratio (1:1 vs. 1:2 vs. 1:3) and Accentuation pattern (Unaccented vs. Binary accented vs. Ternary accented).

IS per level of the Farey tree

To detect the emergence of stable ratio relations in the presence of the stimulus and differences in the distribution of ratios between the different stimulus conditions, we focused on the occurrence of ratios belonging to the first 5 levels of the Farey tree (lower levels represented less than 10% of occurrence). For each level and trial, the sum of the percentage of occurrence of the modes (n cycle of movement/n cycle of stimulus) belonging to the given level was calculated (e.g., 1:2 and 2:1 for level 2).
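To make the level structure concrete, the following sketch enumerates ratios level by level by repeatedly inserting mediants between neighbouring ratios. The construction is an assumption chosen because it reproduces the examples given in the text (1:1 at level 1; 1:2 and 2:1 at level 2). The function names are hypothetical, and the code is not the implementation of Zelic et al. (2017).

```python
def farey_levels(n_levels):
    """Map each level (1..n_levels) to its ratios as (p, q) pairs."""
    row = [(0, 1), (1, 0)]  # boundary ratios 0/1 and 1/0
    levels = {}
    for level in range(1, n_levels + 1):
        new_row, added = [row[0]], []
        for left, right in zip(row, row[1:]):
            # The mediant of a/b and c/d is (a + c)/(b + d).
            med = (left[0] + right[0], left[1] + right[1])
            added.append(med)
            new_row += [med, right]
        levels[level] = added
        row = new_row
    return levels


def closest_mode(quotient, n_levels):
    """Ratio from the first n_levels levels that is closest to a real quotient."""
    candidates = [r for ratios in farey_levels(n_levels).values() for r in ratios]
    return min(candidates, key=lambda r: abs(r[0] / r[1] - quotient))


levels = farey_levels(4)
```

Under this construction level 3 contains 1:3, 2:3, 3:2 and 3:1, and level 4 contains eight ratios including 2:5 and 4:1. Summing the percentage of occurrence over the ratios of one level then gives the per-level measure described above.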

The percentage of occurrence per level of the Farey tree was submitted to a 3-way (3 × 3 × 5) repeated-measures ANOVA testing for effects of Frequency ratio (1:1 vs. 1:2 vs. 1:3), Accentuation pattern (Unaccented vs. Binary accented vs. Ternary accented) and Levels of the Farey tree (Level 1 vs. Level 2 vs. Level 3 vs. Level 4 vs. Level 5).

IS per mode of coordination

Based on the results of the IS per level of Farey tree analysis, we extracted the IS and the percentage of occurrence of the main modes of coordination for each condition: IS of the 1:1 mode for the 1:1 frequency ratio, IS of the 1:2 mode for the 1:2 frequency ratio and IS of the 1:3 mode for the 1:3 frequency ratio. When considering IS per mode, the score was calculated based on the return-map technique to take into account any phase transitions inside the same mode (Zelic et al., 2017).

The IS and the percentage of occurrence of the main modes of coordination were submitted to a 2-way (3 × 3) repeated-measures ANOVA considering Mode of coordination (1:1 vs. 1:2 vs. 1:3) and Accentuation pattern (Unaccented vs. Binary accented vs. Ternary accented) as factors.

In the presence of a significant departure from sphericity (evaluated by Mauchly tests) in repeated-measures ANOVAs, the Greenhouse–Geisser correction was applied to the p values and degrees of freedom. When necessary, pairwise contrasts were used to examine the significant effects with Bonferroni adjustment for multiple comparisons.
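The two corrections can be sketched in a few lines. The functions below implement the standard textbook definitions (Greenhouse–Geisser scaling of both degrees of freedom by ε, and Bonferroni multiplication of each p value by the number of comparisons). The function names and example values are illustrative assumptions, not outputs of the statistical software used in the study.

```python
def gg_degrees_of_freedom(epsilon, effect_df, n_participants):
    """Greenhouse-Geisser corrected (df1, df2) for a within-subject effect.

    effect_df is (k - 1) for a factor with k levels, or the product of the
    (k - 1) terms for an interaction.
    """
    df1 = epsilon * effect_df
    df2 = epsilon * effect_df * (n_participants - 1)
    return df1, df2


def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


df1, df2 = gg_degrees_of_freedom(0.5, 2, 22)   # three-level factor, 22 participants
adjusted = bonferroni([0.01, 0.04, 0.5])       # three pairwise comparisons
```

One consequence of the first definition is that, with 22 participants, the corrected error degrees of freedom of any fully within-subject effect equal 21 times the corrected effect degrees of freedom.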

Results

Movement frequency and amplitude

Participants’ mean preferred frequency, displayed in Fig. 3a, was 1.52 Hz (SD = 0.54, min = 0.59 Hz, max = 2.72 Hz), in line with data reported in the tapping literature (e.g., Large & Grondin, 2008). The ANOVA on the mean frequency of the test trials did not reveal any effect of Trial number, F(1.08,22.77) = 2.51, MSE = 0.008, p = 0.13, \(\eta_{\text{G}}^{2}\) =0.0005, Session, F(1,21) = 0.93, MSE = 0.16, p = 0.35, \(\eta_{\text{G}}^{2}\) =0.004, nor interaction between the two, F(1.51,31.61) = 0.96, MSE = 0.002, p = 0.37, \(\eta_{\text{G}}^{2}\) <0.0001. The COV of the frequencies (mean = 6.28, SD = 2.07, min = 3.03, max = 17.01) was not influenced by Trial number, F(1.63,34.18) = 1.39, MSE = 2.65, p = 0.26, \(\eta_{\text{G}}^{2}\) = 0.01, Session, F(1,21) = 0.96, MSE = 6.17, p = 0.34, \(\eta_{\text{G}}^{2}\)=0.01, or interaction of the two factors, F(1.53,32.16) = 0.28, MSE = 2.86, p = 0.70, \(\eta_{\text{G}}^{2}\)=0.002. These results show that the self-paced movement produced by participants was stable across trials and sessions.

Fig. 3

a Density distribution of participants’ preferred frequencies produced during the first three test trials of each session (n = 132, bandwidth = 0.19). b Participants’ mean frequencies as a function of the frequency ratio and the pattern of accentuation of the auditory stimulus. c Participants’ absolute normalized mean asynchronies as a function of the pattern of accentuation of the auditory stimulus. Error bars represent Cousineau–Morey within-participant 95% confidence intervals. Asterisks represent *p < 0.05, **p < 0.01.

As illustrated in Fig. 3b, the movement frequency of participants tended to increase across the nine conditions with presence of the auditory stimulus, with this increase being greatest in magnitude in the 1:1 condition. The ANOVA conducted on the mean frequency data highlighted a significant main effect of Frequency ratio, F(1.35,28.36) = 6.41, MSE = 0.016, p = 0.01, \(\eta_{\text{G}}^{2}\)=0.11. There was no significant effect of Accentuation pattern, F(1.93,40.43) = 2.21, MSE = 0.007, p = 0.12, \(\eta_{\text{G}}^{2}\)=0.028, or interaction between the two factors, F(2.21,46.39) = 1.57, MSE = 0.008, p = 0.22, \(\eta_{\text{G}}^{2}\)=0.023. Pairwise comparisons with Bonferroni correction (3 comparisons in total) revealed a significant difference in mean frequency between the 1:1 ratio and both 1:2 (t(42) = 2.81, p = 0.02) and 1:3 (t(42) = 3.32, p = 0.006) ratios.

The ANOVA on the COV of movement frequency revealed no significant effects of Frequency ratio F(1.55,32.52) = 0.44, MSE = 3.40, p = 0.60, \(\eta_{\text{G}}^{2}\)=0.003, Accentuation F(1.81,37.93) = 1.09, MSE = 1.96, p = 0.35, \(\eta_{\text{G}}^{2}\) = 0.005, or interaction between the two factors F(2.85,59.84) = 0.29, MSE = 2.23, p = 0.82, \(\eta_{\text{G}}^{2}\)=0.002. Moreover, the ANOVAs on the mean amplitude and its COV did not reveal any effects of the stimulus conditions. For mean amplitude: effect of Frequency ratio, F(1.92,40.32) = 0.20, MSE = 0.045, p = 0.82, \(\eta_{\text{G}}^{2}\)< 0.001, effect of Accentuation pattern, F(1.82,38.2) = 0.33, MSE = 0.029, p = 0.72, \(\eta_{\text{G}}^{2}\)< 0.001, and their interaction, F(2.78,58.42) = 1.00, MSE = 0.056, p = 0.40, \(\eta_{\text{G}}^{2}\) < 0.001. For COV of the amplitude: effect of Frequency ratio, F(1.89,39.66) = 0.04, MSE = 4.14, p = 0.96, \(\eta_{\text{G}}^{2}\) < 0.001, effect of Accentuation pattern, F(1.53,32.11) = 1.24, MSE = 2.91, p = 0.29, \(\eta_{\text{G}}^{2}\) = 0.002, and their interaction, F(3.15,66.24) = 1.72, MSE = 2.61, p = 0.15, \(\eta_{\text{G}}^{2}\)= 0.005.

Average asynchronies

The ANOVA on the overall mean asynchronies indicated a significant main effect of Type of stimulus, F(2.61,54.8) = 3.26, MSE < 0.001, p = 0.03, \(\eta_{\text{G}}^{2}\)= 0.023 (No stimulus vs. Unaccented vs. Binary accented vs. Ternary accented). As observed in Fig. 3c and with pairwise comparisons with Bonferroni correction (6 comparisons in total), the presence of the stimulus decreased the asynchronies compared to the No stimulus condition, particularly in the case of ternary accented t(63) = 2.79, p = 0.04. The ANOVA indicated no main effect of Frequency ratio, F(1.27,26.64) = 3.87, MSE = 0.001, p = 0.05, \(\eta_{\text{G}}^{2}\)= 0.066, nor interaction between Frequency ratio and Type of stimulus, F(2.93,61.61) = 1.83, MSE < 0.001, p = 0.15, \(\eta_{\text{G}}^{2}\)= 0.027.

The ANOVA on the COV of the overall asynchronies per condition revealed no difference related to the stimulus properties: effect of Frequency ratio, F(1.71,35.98) = 2.57, MSE = 17.98, p = 0.09, \(\eta_{\text{G}}^{2}\) = 0.025, effect of Type of stimulus, F(2.35,49.36) = 1.15, MSE = 15.51, p = 0.33, \(\eta_{\text{G}}^{2}\) = 0.013, and their interaction, F(4.22,88.62) = 1.34, MSE = 13.68, p = 0.26, \(\eta_{\text{G}}^{2}\) = 0.024.

Accent-related analyses

Movement synchrony with accents

When considering how participants’ movements were distributed around the accents by examining the percentage of extension and flexion peaks close to an accented stimulus, a modulation was observed in only two conditions, as seen in Fig. 4a: 1:2 frequency ratio with binary accentuation and 1:3 frequency ratio with ternary accentuation. The ANOVA on these data indicated main effects of Frequency ratio, F(1.82,38.2) = 3.86, MSE = 13.35, p = 0.03, \(\eta_{\text{G}}^{2}\) = 0.019, Type of stimulus (i.e., No stimulus vs. Unaccented vs. Accented), F(1.06,22.18) = 6.25, MSE = 42.71, p = 0.02, \(\eta_{\text{G}}^{2}\) = 0.054, and Pattern of accentuation (Binary accented vs. Ternary accented), F(1,21) = 6648.33, MSE = 4.15, p < 0.0001, \(\eta_{\text{G}}^{2}\) = 0.85. The interaction between these three factors was also significant, F(1.21,25.4) = 4.88, MSE = 54.13, p = 0.03, \(\eta_{\text{G}}^{2}\) = 0.061. Pairwise comparisons with Bonferroni correction (153 comparisons in total) indicated a significant difference with the 1:3 ratio for the ternary pattern compared to no accentuation, t(84) = 4.82, p < 0.001, and compared to the no stimulus condition, t(84) = 5.21, p < 0.001, showing that the presence of the stimulus and the accent pattern spontaneously induced a new distribution in participants’ movements by bringing them closer to the accent timing position.

Fig. 4

a Percentage of movement peaks around the accents and b the corresponding proportion of flexion, as a function of the Frequency ratio and the Pattern of accentuation in the stimulus. Error bars represent Cousineau–Morey within-participant 95% confidence intervals. Asterisks represent * p < 0.05, **p < 0.01, ***p < 0.001

Nature of the movement synchronized to the accents

Figure 4b shows that the shift of distribution of movements around the accent positions was associated to a greater degree with flexion than with extension of the finger. The ANOVA on the percentage of flexion movements close to accents revealed an effect of Type of stimulus, F(1.47,30.91) = 4.79, MSE = 29.98, p = 0.02, \(\eta_{\text{G}}^{2}\) = 0.025. Pairwise comparisons with Bonferroni correction (3 comparisons in total) yielded a significant difference between the accented condition and both the unaccented condition, t(42) = 2.70, p = 0.03, and the no stimulus condition, t(42) = 2.66, p = 0.03. In the presence of accented stimuli (binary or ternary), there were significantly more flexions close to the accents than in the two other conditions. The ANOVA did not reveal effects of the factors Frequency ratio, F(1.81,38.09) = 2.28, MSE = 22.82, p = 0.11, \(\eta_{\text{G}}^{2}\) = 0.002, or Pattern of accentuation, F(1,21) = 0.95, MSE = 29.98, p = 0.34, \(\eta_{\text{G}}^{2}\) = 0.002, nor any interaction between the three factors (all p values > 0.05).

Index of stability analysis

Mean IS per condition

The ANOVA on mean IS per condition did not reveal any significant effects (Frequency ratio, F(1.66,34.81) = 1.07, MSE = 0.011, p = 0.34, \(\eta_{\text{G}}^{2}\) = 0.012, Accentuation pattern, F(1.71,35.97) = 0.21, MSE = 0.002, p = 0.81, \(\eta_{\text{G}}^{2}\) < 0.001, and their interaction, F(3.38,70.93) = 2.08, MSE = 0.002, p = 0.09, \(\eta_{\text{G}}^{2}\) = 0.009), indicating that the overall stability of the relationship between participants’ movements and the auditory stimuli did not differ reliably when taking into account the Accentuation pattern and the Frequency ratio of the stimuli.

Percentage of occurrence per level of the Farey tree

Figure 5a illustrates how binary and ternary patterns of accentuation affected the occurrence of the modes of coordination belonging to the second and third levels of the Farey tree. The ANOVA indicated significant main effects of Frequency ratio, F(1.88,39.43) = 10.76, MSE = 0.005, p < 0.001, \(\eta_{\text{G}}^{2}\) = 0.006, and Levels of the Farey tree, F(1.66,34.76) = 14.05, MSE = 0.096, p < 0.0001, \(\eta_{\text{G}}^{2}\) = 0.11. It also indicated a significant interaction between Frequency ratio, Accentuation and Levels of the Farey tree, F(5.04,105.91) = 3.54, MSE = 0.021, p = 0.005, \(\eta_{\text{G}}^{2}\) = 0.021.

Fig. 5

a Percentage of occurrence of the modes of coordination belonging to the first five levels of the Farey tree, averaged across Frequency ratios and Pattern of accentuation. b Percentage of occurrence and c Index of Stability of the main mode of coordination for each Frequency ratio and Pattern of accentuation. Error bars represent Cousineau–Morey within-participant 95% confidence intervals. Asterisks represent *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001

Pairwise comparisons with Bonferroni correction (990 comparisons in total) revealed a significant difference at the 2nd level of the Farey tree (i.e., 1:2 and 2:1 modes of coordination) for the 1:2 frequency ratio stimulus, between the percentage of occurrence of the binary and the ternary accented pattern, t(336) = 4.53, p = 0.008. The binary accented pattern led to greater occurrence of modes of coordination at the second level of the Farey tree compared to the ternary accented one.
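The numbers of comparisons reported in this section (3, 36, 153 and 990) are all of the form k(k - 1)/2, that is, the number of pairs among k design cells. A minimal sketch of this counting and of the Bonferroni adjustment follows. The cell counts used below (3, 9, 18 and 45) are an inference that happens to reproduce the reported totals, not a description taken from the Methods, and the function names are hypothetical.

```python
from math import comb

def n_pairwise(n_cells):
    """Number of distinct pairwise comparisons among n_cells cells."""
    return comb(n_cells, 2)

def bonferroni(p_raw, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_raw * n_comparisons)

# Assumed cell counts that reproduce the reported totals
for cells in (3, 9, 18, 45):
    print(cells, n_pairwise(cells))
# 3 3
# 9 36
# 18 153
# 45 990
```

With 990 comparisons, an uncorrected p value must fall below 0.05 / 990 (about 0.00005) to remain significant after adjustment, which is why only large t values survive in this analysis.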

For the 1:3 Frequency ratio at the 3rd level of the Farey tree, the ternary accented pattern was associated with a significant increase in occurrence compared to the unaccented condition, t(336) = 6.28, p < 0.0001, and the binary accented condition, t(336) = 7.16, p < 0.0001. The ternary accented pattern thus led to greater occurrence of the third-level modes (i.e., 1:3, 2:3, 3:2 or 3:1) compared to the two other conditions.
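The level memberships quoted above (1:2 and 2:1 at the second level; 1:3, 2:3, 3:2 and 3:1 at the third) can be reproduced by repeatedly inserting mediants between neighboring ratios. The sketch below assumes that the levels used here correspond to this mediant construction starting from the boundaries 0:1 and 1:0; the function name is hypothetical.

```python
def farey_tree_levels(n_levels):
    """Return the ratios that first appear at each level of the tree.

    Ratios are (p, q) pairs; the mediant of p1:q1 and p2:q2 is
    (p1 + p2):(q1 + q2). The boundaries 0:1 and 1:0 seed the tree.
    """
    sequence = [(0, 1), (1, 0)]
    levels = []
    for _ in range(n_levels):
        # Mediants of every pair of neighbors form the new level
        new = [(a + c, b + d) for (a, b), (c, d) in zip(sequence, sequence[1:])]
        levels.append([f"{p}:{q}" for p, q in new])
        # Interleave the new ratios with the existing sequence
        merged = []
        for left, mid in zip(sequence, new):
            merged.extend([left, mid])
        merged.append(sequence[-1])
        sequence = merged
    return levels

for depth, modes in enumerate(farey_tree_levels(3), start=1):
    print(depth, modes)
# 1 ['1:1']
# 2 ['1:2', '2:1']
# 3 ['1:3', '2:3', '3:2', '3:1']
```

Each level contains twice as many modes as the previous one, so five levels cover 1 + 2 + 4 + 8 + 16 = 31 candidate modes of coordination under this construction.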

Occurrence and IS of the main modes of coordination

Figure 5b represents the occurrence of the main mode of coordination for each Frequency ratio of the auditory stimuli (i.e., 1:1 mode of coordination for 1:1 Frequency ratio). The ANOVA on these data indicated a main effect of the Frequency ratio, F(1.52,31.87) = 31.03, MSE =0.048, p < 0.0001, \(\eta_{\text{G}}^{2}\) = 0.35, and a significant interaction between Frequency ratio and Accentuation pattern, F(2.35,49.37) = 8.39, MSE =0.030, p < 0.001, \(\eta_{\text{G}}^{2}\) = 0.12. Pairwise comparisons with Bonferroni correction (36 comparisons in total) yielded a significant difference for the 1:1 mode of coordination with any pattern of accentuation compared to most of the 1:2 and 1:3 modes of coordination—all t(84) values > 6.01 and p values < 0.0001 for each 1:1 Accentuation pattern (i.e., Unaccented, Binary and Ternary) vs. 1:2 Ternary accented, vs. 1:3 Unaccented and vs. 1:3 Binary accented; all t(84) values > 3.94 and p values < 0.01 for each 1:1 Accentuation pattern vs. 1:2 Unaccented, for 1:1 Unaccented vs. 1:2 Binary accented, and vs. 1:3 Ternary accented; t(84) = 3.48, p = 0.03 for 1:1 Binary accented pattern vs. the 1:2 Binary accented one.

These results indicated greater occurrence of the main modes of coordination in the presence of a 1:1 Frequency ratio stimulus with or without accentuation. There was also a significant difference between the 1:3 ternary accented mode of coordination (1:3 frequency ratio) and the unaccented, t(84) = 3.90, p = 0.007, and binary, t(84) = 4.47, p < 0.001, accented pattern of the same mode, showing a direct influence of the ternary accented pattern on the occurrence of the 1:3 mode of coordination. There was no effect of the Accentuation pattern, F(1.69,35.58) = 0.42, MSE =0.017, p = 0.66, \(\eta_{\text{G}}^{2}\)= 0.003.

The IS of these modes of coordination, represented in Fig. 5c, shows a specific increase in the stability of the 1:3 mode of coordination performed with the ternary accented stimulus compared to all the other stimuli, and of the 1:2 mode of coordination performed with the binary accented stimulus compared to its unaccented and ternary accented counterparts. The ANOVA on the local IS of the main modes revealed no main effects of Frequency ratio, F(1.35,28.31) = 2.25, MSE = 0.023, p = 0.14, \(\eta_{\text{G}}^{2}\) = 0.053, or Accentuation pattern, F(1.71,35.98) = 1.52, MSE = 0.005, p = 0.23, \(\eta_{\text{G}}^{2}\) = 0.010, but a significant interaction between them, F(2.97,62.3) = 2.72, MSE = 0.005, p = 0.03, \(\eta_{\text{G}}^{2}\) = 0.033. Pairwise comparisons with Bonferroni correction (36 comparisons in total) yielded a significant difference between the IS with the ternary accented stimulus at the 1:3 ratio and most of the other conditions (1:1 ratio unaccented, t(84) = 3.33, p = 0.046, binary accented, t(84) = 3.64, p = 0.02, and ternary accented, t(84) = 3.36, p = 0.04; 1:2 ratio unaccented, t(84) = 4.78, p < 0.001, and ternary accented, t(84) = 3.85, p = 0.008).

Discussion

The present study was designed to investigate the dynamics of spontaneous auditory–motor synchronization at different frequency ratios and to examine the benefit of coherent vs. incoherent accentuation on the emergence and stability of these dynamics. Participants were asked to produce vertical movement oscillations with their index finger at their preferred frequency (PF) while an auditory metronome was presented at matching frequency ratios (PF \(\times\) 1, 2 or 3 for the 1:1, 1:2 and 1:3 ratios) with different patterns of accentuation (unaccented, binary or ternary). Across the different conditions, we observed an increase in the occurrence and stability of three modes of coordination: 1:1, 1:2 and 1:3 (1 movement cycle for 1, 2 and 3 auditory stimulus cycles, respectively). We furthermore found that coherent conditions (a 1:2 frequency ratio with a binary accented pattern, and a 1:3 frequency ratio with a ternary accented pattern) facilitated synchronization compared to incoherent conditions (a 1:2 frequency ratio with a ternary accented pattern, and a 1:3 frequency ratio with a binary accented pattern). The patterns of accentuation applied to the 1:1 frequency ratio were not considered to be coherent or incoherent, as they did not affect synchronization at the 1:1 mode of coordination.

Independently of the pattern of accentuation, participants tended to increase their tapping frequency in the presence of auditory stimuli, more so in the presence of a 1:1 stimulus frequency compared to the two faster frequency ratios: 1:2 and 1:3. This observation could reflect some resistance to spontaneous synchronization by the participants, as is observed when they are explicitly instructed to avoid synchronization with the auditory stimulus (Repp, 2006). The 1:1 ratio relationship, being the strongest attractor and more likely to induce spontaneous synchronization (Bouvet et al., 2019), might have led participants to increase their movement frequency even more to be sufficiently far away from a 1:1 frequency relationship.

As expected, the presentation of the auditory stimulus led to changes in the participants’ tapping movements. One type of impact was a shift in the movement timing position peaks (i.e., flexion and extension), which were attracted toward the auditory stimulus timing position, as indicated by smaller asynchronies. This simple measure highlights a spontaneous effect of the auditory stimulus that was evident for all ratios and patterns of accentuation, and was amplified for the ternary accented 1:1 ratio.

Importantly, our results demonstrate a strong effect of the coherence between the pattern of accentuation and the frequency of the stimulus (i.e., binary accented 1:2 ratio and ternary accented 1:3 ratio). In these two conditions, attraction of movement flexion and extension within the tapping cycles towards the accents was observed. Furthermore, this effect was mostly driven by an increase in flexion towards the accents, which is in line with previous intentional synchronization studies that found that the flexion end-points of the tapping movement on a surface or in the air are preferentially synchronized with the beat (Repp & Su, 2013). This difference of placement of the movement turning points reflects an influence of accentuation on the kinematics of tapping. Moreover, this effect is limited to specific combinations of frequency ratio and pattern of accentuation (i.e., when the two are coherent).

Benefits associated with the two coherent conditions were also observed for the occurrence and the stability of spontaneous synchronization modes, as revealed by the IS analysis. Although the 1:1 ratio was the most frequently observed ratio independently of accentuation, in accordance with the Farey tree dynamics, its stability was rather low. However, the coherent conditions induced a benefit by increasing the occurrence of spontaneous synchronization and the stability of the tapping when compared to the same frequency ratio stimulus without or with incoherent accentuation. These differences might be explained by the closeness between participants’ movement frequency and the periodicity marked by the accentuation pattern in the coherent conditions. According to the Dynamic attending theory, participants’ attention would be directed to the PF carried by the coherent accentuation pattern, creating a stable 1:1 ratio relationship with their spontaneous tapping frequency. However, this explanation is nuanced by the different observed responses to the unaccented 1:1 ratio and to the two coherent patterns. Our results, to a small extent, could suggest a disadvantage of incoherent accentuation (e.g., ternary accented with a 1:2 frequency ratio) as expected. Nonetheless, it can be noted that we did not observe a difference between the unaccented and the non-coherent pattern of accentuation, showing that accents facilitated synchronization when they were compatible but did not degrade it when incompatible. This finding came as a surprise, as we expected the incoherent condition to serve as a distractor that attracts movement towards a more complex and so less stable frequency ratio inducing a different response to the three patterns of accentuation. 
The unexpected finding may be due to the intensity of the accents being sufficient to reinforce the grouping pattern favored by dynamic attending in the coherent conditions, but not distinct enough to induce distraction towards another pattern (e.g., 2:3 for binary accented 1:3 frequency ratio), thus resulting in incoherent patterns being perceived equivalently to the unaccented ones.

Our most interesting finding is that, despite a relatively small probability of emergence, the 1:3 ratio was highly stabilized by the ternary pattern of accentuation and reached a superior stability level compared to the two other ratios. It thus appears that spontaneous synchronization at more complex frequency ratios, even when less frequent in the human rhythmic repertoire, can be stabilized by a coherent (ternary, in this case) pattern of accentuation. The presence of a faster pace auditory stimulus might decrease the stability of natural (1:1) tapping, allowing the emergence of more complex synchronization modes that are stabilized by coherent accentuation.

The 1:2 mode of coordination also benefited from coherent (binary) accentuation, but to a smaller extent. It can be questioned why the 1:3 coherent condition supported the stability of spontaneous auditory–motor synchronization more than the 1:2 coherent condition, as this observation does not match the dynamics of the 1:2 and 1:3 modes described by the Farey tree, nor corroborate with an equivalent 1:1 coordination between participants’ movement and coherent accentuation patterns. Dynamic attending theory and Farey tree dynamics cannot independently account for our findings, but rather seem to interact in explaining the processes underlying the auditory–motor coordination exhibited by our participants in the different conditions. Listeners have a natural tendency to perceive a binary or a ternary metrical structure in auditory sequences, even if they are physically isochronous and unaccented (Abecasis, Brochard, Del Río, Dufour & Ortiz, 2009; Brochard, Abecasis, Potter, Ragot & Drake, 2003; Temperley, 1963). However, listeners usually have production and perception preferences for the binary subdivision over the ternary one (Collier and Wright, 1995; Fraisse, 1956; Povel, 1981). In our case, the binary accented pattern might have strengthened participants’ natural tendency to decompose the stimulus into two-beat groups whereas the ternary pattern might have strengthened participants’ perception of the beat according to a three-beat grouping. To begin with, it is well described that faster tempi increase the grouping of auditory stimulus into larger units (Bolton, 1894; Fraisse, 1956; Repp, Windsor & Desain, 2002). Consequently, the faster tempo of the 1:3 frequency ratio, compared to the 1:2 frequency ratio, might have played a role in encouraging ternary grouping. Furthermore, the structure of the ternary accented pattern compared to the binary one involves more contrast between the units of the stimulus, and hence increases the salience of the accentuation. 
More specifically, binary structure involves a sequence of an accented beat followed by an unaccented beat and so on. In contrast, ternary structure involves one accent followed by two unaccented beats before the repetition of another accent. This repetition creates a stimulus-specific adaptation of the auditory perception system that is interrupted by the ‘surprising’ subsequent accent, which may release the system from adaptation (Antunes & Malmierca, 2014). This phenomenon is known to favor contrast perception, i.e., the perception of differences in terms of intensity or frequency of the sound (Nozaradan, Mouraux, & Cousineau, 2017). These two mechanisms (tempo and accent salience) taken together might have modulated the spontaneous grouping of the auditory stimulus sequences by our participants, facilitating the emergence of spontaneous synchronization with a 1:3 frequency ratio more than the two other accentuation patterns (i.e., unaccented and binary accented) and the 1:2 frequency ratio.

A potential limitation of this study is that the experimental design did not allow a clear-cut distinction between the contribution of frequency closeness to participants’ PFs and the emergence of actual frequency ratios in the spontaneous auditory–motor synchronization. Furthermore, our observations may be limited to conditions that are strongly encouraging spontaneous synchronization, and one can wonder about the generalization of the benefits of accentuation for stimulus frequency ratios that are not perfect multiples of an individual’s preferred movement tempo. Finally, the large number of statistical analyses performed on the same data sample, even with the appropriate statistical corrections for multiple comparisons, invites further studies to confirm and better understand the benefits of accentuation for rhythmic motor performance.

To conclude, the results of the current study indicate that spontaneous auditory–motor synchronization is not restricted to a stimulus frequency close to an individual’s preferred frequency but extends to frequency ratios of the second and third levels of the Farey tree. Our results also demonstrate that simple and coherent patterns of accentuation increase the occurrence and stability of these more complex frequency ratios. These findings encourage future research to focus on how auditory rhythmic stimulation can promote the emergence and stability of a range of spontaneously adopted modes of coordination. For instance, interventions with auditory cueing without instruction might be a promising tool for physical training and rehabilitation, such as running improvement (Bood, Nijssen, Van Der Kamp & Roerdink, 2013; Pfleiderer, Steidl-Müller, Schiltges & Raschner, 2019) or gait improvement for patients with Parkinson’s disease (e.g., Benoit et al., 2014; Cochen de Cock et al., 2018; Dalla Bella, Dotov, Bardy, & Cochen de Cock, 2018). In contrast to finger tapping, gait and running require precise control and coordination of the entire body, involving numerous body segments moving in several directions and at different frequencies. The potential to use auditory sequences with coherent accentuation patterns opens new avenues to stabilize and improve such complex human rhythmic behaviors through spontaneous synchronization, in sport and clinical contexts.