Introduction

Most voluntary actions, such as peeling an orange, engage multiple senses, ranging from touch, vision and proprioception to sound and smell. Although the senses seldom operate as isolated entities, the manner in which they form unitary, coherent percepts is not well understood. In recent years, however, multisensory integration, or multimodal binding, has become a prominent focus of studies of perception (Bushara et al. 2003; Calvert 2001; Calvert et al. 2004; Driver and Spence 2000; Meredith 2002; Meredith and Stein 1983; Stein 1998). The view that the senses are extensively interconnected has gained momentum due to detailed descriptions of multimodal neurons in single cell recording studies (Meredith et al. 1987; Meredith and Stein 1983, 1986; Stein and Meredith 1993; Stein et al. 1988), as well as the discovery of early interactions between primary sensory areas in the brain (Dehner et al. 2004; Foxe et al. 2000, 2002; Fu et al. 2003; Molholm et al. 2002; Murray et al. 2005). Multimodal integration may no longer be considered the mere result of a late fusion of the senses in so-called associative regions of the brain, but rather a dynamic organization emerging from the coupling of otherwise segregated sensory processing areas (Hummel and Gerloff 2005; Calvert et al. 2004). This understanding of multisensory integration echoes current theories of brain function that emphasize the role of the large-scale activity of the brain (Bressler and Kelso 2001; Edelman and Tononi 2000; Haken 1996; Kelso 1995; Nunez et al. 2001; Varela et al. 2001).

To date, most of our knowledge about multimodal integration in humans has been gained by manipulating the temporal and spatial congruency of sensory stimuli. When two stimuli from different sensory modalities are perceived as a single event, reaction time has been shown to be faster and orientation behavior facilitated (Bernstein 1970; Dunlap 1910; Hershenson 1962; Nickerson 1973; Raab 1962; Todd 1912). When the stimuli are perceived as distinct events, however, a simple reaction to one of the stimuli often proves slower. In the present study we investigated the coordination dynamics of perception and action to better understand the factors governing the assembly, maintenance and breakup of multimodal coordination. We performed a parametric study of the binding of movement with tactile and auditory stimuli, the two stimuli being clearly separated in both space and time.

Not much is known about the organization of movement in harmony with multimodal stimuli such as sound and touch, despite prevailing views that multisensory integration is intimately connected to movement (Fogassi and Gallese 2004; Graziano and Gross 1995; Graziano et al. 2000; Jeka and Lackner 1994; Jeka et al. 1997; Lloyd et al. 2003; Stein and Meredith 1993; Shore et al. 2002). As far as audio-tactile interactions are concerned, the available evidence on multimodal binding is restricted to the study of perception (Bresciani et al. 2005; Gobbelé et al. 2003; Guest et al. 2002; Jousmäki and Hari 1998; Lam et al. 1999; Lütkenhöner et al. 2002; Spence et al. 1998).

The theoretical framework of coordination dynamics (e.g., Beek et al. 2002; Bressler and Kelso 2001; Carson and Kelso 2004; Jirsa and Kelso 2004a; Kelso et al. 1990, 1992; Turvey 1990, 2004) has allowed breakthrough contributions to the understanding of perception and action relationships (Byblow et al. 1994; DeGuzman and Kelso 1991; Haken et al. 1996; Kelso et al. 1990; Kelso and DeGuzman 1988; Peper et al. 1995; Schöner and Kelso 1988; Stins and Michaels 1999; Swinnen et al. 1993; Tuller and Kelso 1989; Wimmers et al. 1992; Zanone and Kelso 1992) and appears well-suited to explore and understand multisensory integration. In the paradigmatic case of synchronizing and syncopating finger movement with a rhythmic auditory stimulus (Kelso et al. 1990), the scaling of stimulus frequency allowed the discovery of abrupt, qualitative changes in the phase relations between the movement and the stimulus. At low frequencies, auditory and motor components were stably coordinated either in-phase (synchronization) or anti-phase (syncopation). However, systematic increases in frequency drove the coordination pattern from syncopation to synchronization through an abrupt transition, a change that is accompanied by a dramatic reorganization of brain activity (Daffertshofer et al. 2000; Frank et al. 2000; Fuchs et al. 1992, 2000a, 2000b; Kelso et al. 1991, 1992; Mayville et al. 1999; Meyer-Lindenberg et al. 2002; Wallenstein et al. 1995). Such discontinuous changes are reminiscent of (nonequilibrium) phase transitions, in that the growth of instability causes behavioral and brain patterns to switch from one coordinated state to another. Such switching between pattern generators corresponds to a form of dynamic decision-making (see Yuste et al. 2005; Kelso 1995 for reviews).

A second type of transition was also observed in the Kelso et al. (1990) study, namely from coherent patterns defined by stable phase relations to the loss of frequency locking and phase synchrony. This latter regime revealed that despite being mutually coupled, auditory and motor components display a tendency to become independent, essentially following their own intrinsic properties. This feature was incorporated into the HKB coordination law developed for bimanual coordination by introducing a symmetry breaking term representing intrinsic differences between the spontaneous frequency of movement and the frequency of the stimulus (Kelso et al. 1990). According to coordination dynamics, frequency- and phase-locking reflect the interdependence or integration of individual coordinating elements such as neuronal populations. Loss of frequency- and phase-locking, on the other hand, is indicative of independence or segregation among individual coordinating components (Kelso 1995). Operationally, such tendencies are measured by phase “wrapping”: the subject is no longer able to maintain a one-to-one relation with the stimuli, the phases of the movement and the stimuli drift apart, and the relative phase “wraps” around the interval [−π, π] radians (Kelso et al. 1990). Note also that in between stable phase locking and total independence among the components a more subtle “metastable” regime exists that reflects the coexistence of both integration and segregation processes (DeGuzman and Kelso 1992; Kelso 1991, 2001). Metastable coordination dynamics (Bressler and Kelso 2001; Friston 1997, 2000; Kelso 1991, 2001) is characterized by partially coordinated tendencies (strictly speaking, stable coordination states no longer exist) in which individual coordinating elements are neither completely independent of each other (“locally segregated”) nor fully linked in a fixed mutual relationship (“globally integrated”). Metastability is hypothesized to arise from changes in the dynamic balance between the coupling among neural ensembles (mediated typically by reciprocal pathways in the brain) and the expression of each individual neural ensemble’s intrinsic properties (typically heterogeneous in nature; see Jirsa and Kelso 2000). Such transient, metastable coordination has been embraced in a number of recent syntheses (e.g., Edelman 2004; Freeman and Holmes 2005; Koch 2004) as a new principle of brain organization (Fingelkurts and Fingelkurts 2004; Varela et al. 2001).
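In the symmetry-broken form of the HKB model cited above (Kelso et al. 1990), the relative phase φ between movement and stimulus is commonly written as evolving according to

dφ/dt = δω − a sin(φ) − 2b sin(2φ) + √Q ξ(t)

where δω is the symmetry breaking term (the difference between the spontaneous movement frequency and the stimulus frequency), a and b are coupling strengths whose ratio b/a decreases as movement rate increases, and ξ(t) is Gaussian white noise of strength Q. Phase- and frequency-locking correspond to stable fixed points of this equation; when δω grows or b/a shrinks sufficiently, the fixed points disappear and the phase “wraps”, with the metastable regime arising near this boundary, where attracting tendencies remain although fully stable states do not.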

The behavioral coordination between perception and action exhibits additional features pertinent to the present study. For example, when a metronome is present, the variability of amplitude and relative phase is lowered at points in the movement trajectory related to specific stimuli (Beek 1989; Byblow et al. 1994; Carson 1995; Kelso et al. 1991), an effect referred to as “anchoring”. A double auditory stimulus (i.e., subjects synchronizing both flexion and extension with a sound) has been shown to stabilize coordination under conditions in which it would otherwise have become unstable (Fink et al. 2000; Jirsa et al. 2000). When sound and touch coincide, as when subjects coordinate flexion or extension with both an auditory stimulus and a physical stop (Kelso et al. 2001), the resultant multimodal coordination exhibits higher stability (less variability) than coordinating with sound alone. Moreover, regardless of whether the subject produces flexion or extension on the sound, transitions occur such that sound, movement and active touch are integrated as a coherent unit. These results indicate that such multimodal integration may override the documented preference for synchronizing flexion over extension to sensory stimuli (Byblow et al. 1994; Carson 1996, 2004; Carson and Riek 1998; Carson and Kelso 2004).

The goal of the present study is to employ a parametric manipulation in order to illuminate the factors determining the stability and breakdown of multimodal integration. We explore the hypothesis that the stability of multimodal coordination is influenced by preferential relationships between specific features of movement (flexion and extension) and specific sensory modalities (sound and touch). To test this hypothesis, we replaced the active contact used in Kelso et al. (2001) with a vibrotactile mechanical stimulus. This change in the experimental setup enabled us to investigate new combinations of movement and multimodal stimuli. The main experimental task was to flex on touch and extend on sound (and vice versa). According to our hypothesis, as the stimulus rate increases, instability growth and phase transitions should select out the most stable multimodal coordination pattern, thereby identifying which combination of movement and modality is favored by the central nervous system. A second purpose of the study was to investigate the extent to which ordered phase relations between movement and multimodal stimuli were maintained relative to the tendency of the components to separate according to their intrinsic dynamics. Adopting the parametric approach of coordination dynamics enables us to identify the conditions under which failures of multimodal ‘binding’ occur and the factors that cause them.

Methods

Participants

All experimental protocols received full approval from the IRB of Florida Atlantic University. Seven self-declared right-handed volunteers (one female and six males, aged between 23 and 35 years) from the university population took part, each giving informed consent before participating in the study.

Apparatus

Participants were seated in front of the apparatus, with the height of the chair adjusted to permit the forearms to rest horizontally. An adjustable support around the subject’s forearm restrained movements of the wrist and digits. The right index finger was inserted in a sleeve that pivoted around an axis in a way that restricted movement to the plane defined by the metacarpophalangeal joint. Motion of the index finger was picked up by a potentiometer and sampled at 256 Hz by an ODAU analog-digital converter connected to an Optotrak 3010 system. Auditory stimuli (trains of 80-ms sine wave pulses, carrier frequency 500 Hz) were sent via a digital-to-analog card to large headphones worn by the participants. Vibrotactile stimuli (trains of 80-ms sine wave pulses, carrier frequency 300 Hz) were delivered to the tip of the right thumb using a custom-built electromagnetic device. The carrier frequency of the tactile stimuli was chosen to match the eigenfrequency of the electromagnetic device. In a pilot study, the response of a photosensitive chip to the deviation of a laser beam projected onto the vibrating metallic part showed that the delay between the electrical signal and the onset of motion of the vibrotactile stimulator was smaller than 2 ms. Using a similar protocol, we verified that the vibration frequency of the electromagnetic device, operated under the same conditions as in the experiment, accurately reproduced the frequency of the electrical signal sent to it. Pilot experiments also allowed us to equate the subjective intensities of the auditory and tactile stimuli. To isolate participants from external noise, the headphones were tightly attached and adjusted on each participant’s head, thereby eliminating any sound emitted by the vibrotactile stimulator.

Procedure

All conditions were run with the participants’ eyes closed. In the key multimodal conditions, participants were instructed to synchronize peak flexion (extremum of position) of the index finger with the vibrotactile stimulus, and peak extension with the sound (or vice versa). Tactile and auditory stimuli were delivered anti-phase to each other (see Fig. 1). In the control conditions, participants were asked to synchronize either peak flexion or peak extension of the index finger in three different conditions: touch alone, sound alone, and sound and touch delivered simultaneously. Experimental conditions are summarized in Table 1. On each trial, the frequency of the stimuli was increased from 1.0 to 3.5 Hz in steps of 0.25 Hz every 12 cycles. After the different conditions were explained, participants were instructed to do their best to synchronize exactly with the stimuli, and to make sure to produce one movement on every stimulus. They were also told that if they felt the initial pattern change, they should stay synchronized 1:1 with the stimuli in whatever pattern was most comfortable. These instructions were repeated three times during the experiment in order to encourage participants to sustain attention to the task. Three trials were recorded for each condition, yielding n=21 trials per condition across the seven participants. Between 30 s and 1 min of rest was provided between trials.
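For concreteness, the stimulus schedule just described can be sketched as follows. This is an illustrative reconstruction in Python, not the original stimulus-generation code; the plateau parameters come from the procedure above, and the anti-phase offset of half a period between tactile and auditory onsets follows Fig. 1.

import numpy as np

def stimulus_onsets(f_start=1.0, f_end=3.5, f_step=0.25, cycles=12):
    """Return anti-phase tactile and auditory onset times (s) for one trial."""
    touch, sound = [], []
    t = 0.0
    for f in np.arange(f_start, f_end + 1e-9, f_step):  # 11 frequency plateaus
        period = 1.0 / f
        for _ in range(cycles):  # 12 cycles per plateau
            touch.append(t)                 # tactile onset at the start of each cycle
            sound.append(t + period / 2.0)  # auditory onset half a period later (anti-phase)
            t += period
    return np.array(touch), np.array(sound)

touch_onsets, sound_onsets = stimulus_onsets()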

Fig. 1 Illustration of the experimental paradigm. A time series of finger position is shown along with the onsets of touch (black dots and solid line) and sound stimuli (white dots and dashed line) in the Flex on Touch and Extend on Sound pattern. The sinusoidal solid line represents the time evolution of the position; the square waves represent the stimuli

Table 1. Percentages of transitions for the eight experimental conditions

Data processing and analysis

The first two movement cycles of each frequency plateau were removed in order to discard transient effects due to the frequency change. After detection of the local minima and maxima in the time series of finger position and the stimuli, a point estimate of the relative phase (φ) was calculated (Kelso 1984). Analogous to a Poincaré section, the point estimate is probed periodically at the time of onset of the stimulus and expresses the latency between matching events of the two time series (Δt) relative to the current cycle duration of the stimulus events (T): φ=2π×Δt/T. In the multimodal conditions, two relative phases were calculated, one for each synchronization point (flexion and extension). The time difference, Δt, between synchronization points of finger motion and stimulus onsets was also calculated from the local extrema. By convention, a negative Δt indicates that the finger leads the stimulus; conversely, a positive Δt indicates that the finger lags the stimulus. As in the case of relative phase, two values of Δt were calculated for the multimodal conditions, one for each stimulus. The standard deviation of the relative phase, calculated for each subject from the time series of relative phase for a given frequency plateau, was used as a metric for the stability of multimodal coordination (Kelso et al. 1986, 1987). Analysis of relative phase was performed using circular statistics (Batschelet 1981), transformed to suit the use of inferential tests based on standard normal theory (Mardia 1972). Analysis of variance (ANOVA with Huynh-Feldt corrected degrees of freedom) was applied only to data that preceded a transition, i.e., to the first eight plateaus, ranging from 1.0 to 2.75 Hz. In this parameter range the relative phase is distributed around a single peak. One subject displayed early transitions and was discarded from this ANOVA. Despite the presence of finite frequency and phase shifts, most of the excursions of the relative phase away from 0° were confined within a ±60° limit. Accordingly, the pattern of coordination was identified as flexion or extension on the respective sound and touch stimuli, or as wrapping, as follows: (1) a relative phase of flexion closer to zero than a relative phase of extension on four consecutive cycles was identified as flex on the stimulus; (2) a relative phase of extension closer to zero than a relative phase of flexion on four consecutive cycles was identified as extend on the stimulus; (3) all other patterns were classified as phase wrapping. This allowed us to classify the multimodal patterns either as Flex on Touch and Extend on Sound or as Flex on Sound and Extend on Touch. The number of trials that exhibited switches from the initial pattern to a different synchronization pattern corresponded to the number of switch transitions (see Fig. 2a, b). Transitions from the initial pattern to wrapping corresponded to wrapping transitions (see Fig. 2c). We also recorded the number of trials displaying a wrapping transition that followed a switch transition (Fig. 2d), and the number of switches back to the initial pattern that followed a switch transition. In order to decide whether a given coordination epoch belonged to a phase-locked pattern or to an epoch of phase wrapping, two indices of stationarity of the relative phase were calculated. First, the circular standard deviation (angular deviation) of the relative phase was calculated in a sliding window of three consecutive points (for illustration, see Fig. 3b). Second, the first time derivative of the relative phase was averaged in a sliding window of three consecutive points (see Fig. 3c). The same analyses were performed on all the control conditions. Differences in the number of transitions in a given condition were tested for significance using χ² tests.
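A minimal sketch of these computations, assuming movement extrema and stimulus onsets have already been extracted as event times in seconds, might look as follows in Python. The angular deviation follows Batschelet’s (1981) definition; all function names are illustrative, not the original analysis code.

import numpy as np

def point_estimate_phase(event_times, stim_onsets):
    """Point estimate of relative phase, phi = 2*pi*dt/T (Kelso 1984)."""
    event_times = np.asarray(event_times)
    phi = []
    for i in range(len(stim_onsets) - 1):
        T = stim_onsets[i + 1] - stim_onsets[i]           # current stimulus period
        j = np.argmin(np.abs(event_times - stim_onsets[i]))
        dt = event_times[j] - stim_onsets[i]              # negative: finger leads
        phi.append(2 * np.pi * dt / T)
    return np.angle(np.exp(1j * np.asarray(phi)))         # wrap to (-pi, pi]

def angular_deviation(phi):
    """Angular deviation s = sqrt(2*(1 - R)) (Batschelet 1981), in radians."""
    R = np.abs(np.mean(np.exp(1j * phi)))
    return np.sqrt(2 * (1 - R))

def stationarity_indices(phi, window=3):
    """Sliding-window angular deviation and mean phase derivative (cf. Fig. 3)."""
    dphi = np.diff(np.unwrap(phi))                        # first difference of unwrapped phase
    sd = [angular_deviation(phi[i:i + window]) for i in range(len(phi) - window + 1)]
    drift = [np.mean(dphi[i:i + window]) for i in range(len(dphi) - window + 1)]
    return np.asarray(sd), np.asarray(drift)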

Fig. 2 Types of changes in multimodal coordination, shown from sample time series in four different subjects. The relative phase between Flexion onset and onset of Touch stimuli, and between Extension onset and onset of Sound stimuli, in the Flex on Touch and Extend on Sound condition is presented as frequency increases in time. a An initially stable pattern loses stability at 2.5 Hz, falls into the alternate multimodal pattern and finally slowly “wraps” at 3.5 Hz. b An early switch to the alternate multimodal pattern occurs at 1.75 Hz; the new pattern is temporarily destabilized at 2.75 Hz, recovers its stability with a shifted relative phase and starts to drift at 3.5 Hz. c An initially stable pattern loses its stability at 2.0 Hz and wraps at a stimulation rate of 2.5 Hz. d An initially stable pattern loses its stability at 1.5 Hz and is again phase and frequency locked 10 s later, but in the alternate pattern. The new pattern loses its stability and wraps rapidly at a stimulation rate of 3.0 Hz

Fig. 3 Indices of local stationarity used to classify changes in multimodal coordination. a The relative phase at flexion and extension corresponding to the trial presented in Fig. 2d. b The SD of relative phase computed in a sliding window of three points. c The average of the first time derivative of the relative phase computed in a sliding window

Results

Phase locking in multimodal coordination

Participants were able to maintain a multimodal pattern consistently across a range of frequencies. As shown in the distributions presented in Fig. 4, phase locking centered close to a relative phase of 0° was successfully established in both multimodal conditions, for both flexion and extension. However, as frequency increased, the shape of the distributions in both multimodal conditions departed more and more from a single-peaked distribution. In particular, a change in the relative phase distribution for the Flex on Touch and Extend on Sound condition was noticeable between 2.75 and 3.0 Hz (Fig. 4a), a first hint that transitions from the initial multimodal pattern occur (see next section). Notice also that no such qualitative changes in the distribution of the relative phase were observed in the Flex on Sound and Extend on Touch coordination pattern (compare panels a and b in Fig. 4). For better visualization, differences between the distributions of the relative phase of the two multimodal patterns are emphasized in panel c of Fig. 4. A Kuiper test, the circular version of the classical Kolmogorov-Smirnov test (Batschelet 1981; Kuiper 1960), confirmed that the phase distribution in the Flex on Touch and Extend on Sound condition still differed significantly from a uniform distribution at 3.0 Hz (P<0.001). At a frequency of 3.25 Hz the significance level of the Kuiper test for the same condition dropped to P<0.05. No significant difference from randomness was found at a stimulus frequency of 3.5 Hz.
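Since the Kuiper statistic is less commonly packaged than the Kolmogorov-Smirnov test, a self-contained sketch of the test against circular uniformity is given below. The tail probability uses the standard asymptotic series (e.g., as given in Numerical Recipes); this is an illustration, not the original analysis code.

import numpy as np

def kuiper_uniform(angles):
    """Kuiper's V for angles (radians) tested against the uniform circular distribution."""
    u = np.sort((np.asarray(angles) % (2 * np.pi)) / (2 * np.pi))  # map to [0, 1)
    n = len(u)
    i = np.arange(1, n + 1)
    v = np.max(i / n - u) + np.max(u - (i - 1) / n)   # V = D+ + D-
    # Asymptotic tail probability for V.
    lam = (np.sqrt(n) + 0.155 + 0.24 / np.sqrt(n)) * v
    k = np.arange(1, 101)
    p = 2 * np.sum((4 * k**2 * lam**2 - 1) * np.exp(-2 * k**2 * lam**2))
    return v, min(max(p, 0.0), 1.0)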

Fig. 4 Distributions of relative phases for the two anti-phase multimodal conditions across all frequency plateaus. The distributions for the Flex on touch and Extend on sound pattern (a) and the Flex on sound and Extend on touch pattern (b) are presented. For each multimodal pattern the relative phases are shown at both flexion (first row) and extension (second row). 180 data points were included in each histogram; bin size was 20°. For better visualization of supplementary peaks in the least stable condition, namely Flex on touch and Extend on sound, distributions of the two multimodal conditions are displayed for the 2.75, 3.0, and 3.25 Hz plateaus (c)

Inspection of the phase portraits in the position-velocity plane showed that in both multimodal anti-phase conditions the trajectories spent more time at the extrema (Fig. 5c), whereas in the unimodal (Fig. 5a) and simultaneous conditions (Fig. 5b) such anchoring was observed only at the side opposite the synchronization point. The phase portraits for the multimodal conditions (Fig. 5c) nicely illustrate the presence of an effective coupling between finger motion and both sensory modalities, a result that resembles the effect obtained for bimanual coordination when movement is driven by two auditory stimuli presented in anti-phase (Fink et al. 2000).

Fig. 5 Representative epochs of phase plane trajectories for unimodal (control) and anti-phase multimodal conditions. The onsets of the stimuli, delivered at a frequency of 1.25 Hz, are represented by dots superimposed on the trajectories, white for auditory and black for tactile. For touch plus sound presented simultaneously, onsets are shown by white dots. The data displayed were filtered by a low-pass fourth-order Butterworth filter with a cutoff frequency of 8 Hz

Transitions in multimodal coordination

A summary of the percentages of trials (n=21 for each cell) that exhibited transitions is presented in Table 1. The eight conditions comprise the two multimodal anti-phase conditions and the six control conditions. No systematic differences were observed between the three repetitions of a trial within a given condition. For the control conditions (numbered 3 to 8 in Table 1), more switching occurred for extension (49%) than for flexion (19%), a result confirmed by a χ² test (χ²(1)=8.39, P<0.05). This finding may reflect the known preference for flexion over extension (Carson 1996; Carson and Riek 1998). Here, however, we show that this preference generalizes across different sensory modalities (auditory, tactile, and simultaneous).
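As an aside, comparisons of this kind reduce to a χ² test on a 2×2 contingency table of switch versus no-switch counts. A sketch with placeholder counts (hypothetical numbers, not the actual data) is:

import numpy as np
from scipy.stats import chi2_contingency

# Rows: extension trials, flexion trials; columns: switched, did not switch.
# The counts below are hypothetical placeholders for illustration only.
table = np.array([[31, 32],
                  [12, 51]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, P = {p:.3f}")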

Which coordination patterns were most stable? Viewed with respect to the percentage of switches, a χ² test confirmed that modality influenced the stability of coordination (χ²(3)=9.25, P<0.05). The multimodal and tactile conditions switched on 50 and 54% of trials, compared to the auditory and simultaneous conditions, which switched only 21 and 26% of the time. It is worth noting that simultaneous touch and sound did not seem to enhance coordinative stability any more than sound alone. This result was also apparent in the distribution of the relative phase across frequency plateaus for the control conditions (not shown).

Figure 6 illustrates a transition from the Flex on Touch and Extend on Sound pattern to the Flex on Sound and Extend on Touch pattern. Phase plane trajectories and associated movement time series for one trial are displayed as a function of frequency (in Hz). The onsets of tactile and auditory stimuli (filled and unfilled dots, respectively) indicate that the multimodal pattern is successfully established at the beginning of the trial and is maintained across a range of frequencies. During the plateau at 2.25 Hz the pattern appears to be perturbed, but stabilizes again on the next plateau, the shifted relative phases suggesting a tendency to couple peak velocity with the stimuli. Transient behavior is observed at 2.75 Hz, and on the next plateau (3.0 Hz) actions and modalities have switched places, flexion now synchronizing with sound and extension with touch. At the two highest frequencies, multimodal coordination appears to lose coherence altogether, the phases wandering around the circle. Statistical analysis bears this picture out: Flex on touch and extend on sound switched to flex on sound and extend on touch more often (71%) than vice versa (29%). This difference between the multimodal conditions was confirmed by a χ² test (χ²(1)=3.86, P<0.05).

Fig. 6 Phase plane trajectories and associated time series illustrating a transition from one multimodal pattern (Flex on touch and Extend on sound) to another (Flex on sound and Extend on touch). One full trial is displayed across frequency plateaus ranging from 1.25 Hz (top row, left) to 3.5 Hz (bottom row, right). The onsets of tactile (black dots) and auditory (white dots) stimuli indicate that the multimodal pattern is successfully established at the beginning of the trial and maintained across a range of frequencies. During the plateau at 2.25 Hz the pattern appears perturbed but stabilizes again on the next plateau, the shifted relative phases possibly indicating that peak velocity is coordinated with the stimuli. At 2.75 Hz the initial multimodal pattern becomes unstable and on the next plateau (3.0 Hz) actions and modalities have switched places. This newly ‘bound’ multimodal pattern loses coherence on the last two frequency plateaus

Loss of binding in multimodal coordination

Loss of binding takes the form of a second kind of transition, from multimodal patterns directly to phase wrapping (see Figs. 2c, 7a, which illustrate this for the Flex on Touch and Extend on Sound pattern). In the two key experimental conditions, more direct transitions to wrapping were associated with Flex on sound and Extend on touch (29%) than with the alternative pattern (14%). However, this difference did not reach significance (χ²(1)=0.2, P>0.05). A clue to what is going on may be gleaned from the mean critical frequency, which was usually higher for wrapping than for pure switches (see the fourth and sixth columns in Table 1). Patterns that switch later, i.e., at higher frequencies, tend to transit directly to wrapping. Those that switch earlier, on the other hand, tend to transit to another pattern before wrapping. Thus, more stable patterns such as flex on sound and extend on touch tend to lose coordination completely at high frequencies, whereas less stable patterns such as extension on touch switch to an alternative pattern before losing coherence.

Fig. 7 a Transition from the initially prepared multimodal pattern, Flex on touch and Extend on sound, to a loss of coherence. b Transition from the same initial pattern as in (a) to the alternate multimodal pattern, followed by a switch back to the initial pattern, and finally a loss of binding. The left panels show time series of finger position and stimulus onsets (touch in black and sound in white). The right panels display the corresponding relative phase for both flex on touch (black) and extend on sound (white). The convention used for this plot is a relative phase of 0° between flexion and touch and 180° between extension and sound. Solid lines superimposed on the data indicate epochs of local stationarity as illustrated in Fig. 3

The relatively small number of direct wrapping transitions is complemented by a final interesting feature, namely loss of phase locking after a first switch. Considered as percentages, the number of trials first exhibiting a switch followed by a second instability into wrapping was 60 and 65% in the two multimodal conditions (see Table 1). Inspection of epochs of the time series of position and relative phase (Fig. 7b) reveals that the switch from the initially prepared pattern (Flex on touch and Extend on sound) is occasionally followed by switching back and forth between the two patterns prior to a transition to phase wrapping. Interestingly, such switching back and forth occurred more often in the least stable multimodal condition, namely Flex on touch and Extend on sound (60% of the trials exhibiting a switch), than in the alternative pattern (33%). Although further research is needed to examine this feature of multimodal coordination in more detail, such switching back and forth may be regarded as the expression of a broken symmetry, by analogy with the switches back and forth between the two stable states derived from the HKB model (Haken et al. 1985; Fuchs and Jirsa 2000).

Stability of multimodal coordination

An index of stability, the circular SD of the relative phase, was calculated for each subject from the time series of relative phase for a given frequency plateau, and then averaged across trials (Fig. 8a). Analysis of variance was then conducted to test differences in stability between combinations of modality and action. In order to analyze all combinations of modality and action, we distinguished five “modalities” (three unimodal conditions: sound, touch, and simultaneous sound and touch; plus the two multimodal anti-phase conditions) and two actions (flexion vs. extension). The resulting 5×2×8 (modality × action × frequency) ANOVA of the standard deviation of relative phase showed main effects of modality (F(4,20)=5.33, P<0.005), action (F(1,5)=9.57, P<0.05) and frequency (F(7,35)=12.2, P<0.0005). The modality × action interaction was also significant (F(4,20)=3.68, P<0.05). Pairwise contrasts revealing significant differences are shown in Table 2 and strengthen the hypothesized role of action and modality in multimodal coordination dynamics. First, stability at the flexion point for flex on Sound and extend on touch (fSet) was greater (less variable) than for the alternative pattern, flex on Touch and extend on sound (fTes). Second, the multimodal pattern flex on Touch and extend on sound (fTes) was less stable (more variable) than flexion in the simultaneous condition (fTS). Synchronizing on extension in the former pattern (fteS) proved less stable than unimodal extension to sound (eS).

Fig. 8 a Mean standard deviation (deg) of relative phase and b mean Δt (ms) at synchronization points for the multimodal and control conditions. f denotes flexion and e extension; ftes denotes the multimodal condition flex on touch and extend on sound, fset the multimodal condition flex on sound and extend on touch. Capital letters specify the modality used for computing the relative phase. For the control conditions, t denotes touch, s sound, and ts simultaneous touch and sound. The error bars represent the between-subjects standard deviation

Table 2. Pairwise comparisons of mean SD of relative phase

Timing errors in multimodal coordination

Figure 8b shows the time difference (Δt) between action and stimuli for all experimental conditions. A negative (positive) Δt indicates that the finger leads (lags) the stimulus. For unimodal stimuli, finger motion systematically lagged tactile stimuli and led auditory stimuli. In contrast, for the multimodal conditions mean timing error was centered around zero. Differences between modalities were confirmed by a 5×2×8 (modality × action × frequency) ANOVA, which revealed significant effects of modality (F(4,20)=10.1, P<0.0005) and action (F(1,5)=8.44, P<0.05). Although timing errors are difficult to interpret, the shift in timing error from unimodal to multimodal conditions suggests that the most stable multimodal pattern, flex on sound and extend on touch, reflects a balance or compromise between its action and modality components, namely an overshoot for flex on sound and an undershoot for extend on touch. These shifts in flexion and extension are of opposite direction and of approximately equal magnitude, whereas this is not the case for the least stable pattern.

Discussion

Multistability and phase transitions in multimodal coordination

Over the last couple of decades the theoretical framework of coordination dynamics has aimed to establish laws, principles and mechanisms of biological coordination at neural, behavioral and social levels (e.g., see recent contributions in Jirsa and Kelso 2004a). Experimentally, parametric scaling of control parameters has been systematically employed as a means to identify key coordination variables or order parameters. Somewhat ironically, instabilities or bifurcations—places where transitions occur—have been shown to play a crucial role in identifying relevant coordination variables and their dynamics (stability, change, etc.; Haken 1996; Kelso 1995).

In the present research, the problem of multimodal integration and segregation—how the senses and movement work together or not—is treated fundamentally as a coordination problem. By focusing on the relative roles of action and sensory modality, we aimed to better understand the factors governing the stability of multimodal coordination. We show that the binding of movement, touch and sound is preferentially affected by both the type of action and the sensory modality to which the action is coupled. When participants are instructed to coordinate finger movement with touch and sound presented anti-phase to each other, clear-cut phase-locking between movement and stimuli is observed. This result demonstrates that despite the separation in space and time of the two sensory stimuli, stable coordination of movement, touch and sound may be successfully established within a couple of cycles of movement. Such coherent organization between movement and the two sensory modalities is maintained over a range of rates or frequencies. More importantly, we have shown that phase transitions from one multimodal pattern to another may occur: at a critical value of the pacing rate, the action synchronized with touch and the action synchronized with sound abruptly switch places. These transitions appear to be less rigidly determined than the phase transitions found in bimanual coordination (Kelso 1984). In particular, the most stable pattern preferentially chosen via a switch (flex on sound and extend on touch) persisted briefly, only to be followed by the loss of multimodal binding (see next section). In addition, stable phase-locked patterns were occasionally interspersed with “wrapping” epochs consisting of slow and fast phase drift, analogous to the relative phase dynamics reported and modeled by Kelso et al. (1990) for the case of synchronization and syncopation in unimodal conditions.

According to the predictions of coordination dynamics (Kelso 1995; Kelso et al. 1987; Schöner et al. 1986), one should observe an increase in variability—indicative of loss of stability—in the vicinity of a transition, regardless of whether the transition takes the form of switching or “wrapping.” Consistent with this prediction, rapid change from the least stable multimodal pattern, flex on touch and extend on sound, to the alternate multimodal pattern, flex on sound and extend on touch, occurred as the variability of the collective variable, the relative phase between movement and stimuli, increased with the scaling of frequency. Coordination dynamics also predicts that when the frequency parameter is varied slowly enough, switching between patterns may occur before loss of stability can actually be observed, due to the presence of noise that “kicks” coordination out of a given pattern (Kelso et al. 1986, 1987; Schöner et al. 1986, 1990). Additionally, if the control parameter is scaled down and up on different trials, hysteresis should be observed: changes between stable multimodal patterns, or between “wrapping” and stable multimodal patterns, should take place at distinct values of the frequency depending on the direction (up vs. down) of parameter change. Such predictions about multimodal coordination are fully operational and testable in further experiments. In the present study all switching scenarios typically occurred abruptly, within a couple of cycles, i.e., in less time than the duration of an individual plateau (12 cycles). These included changes from the initial stable pattern to a new stable pattern; from the initial pattern to wrapping; from an interspersed epoch of “wrapping” to a new pattern; and finally from a new stable pattern following a switch to “wrapping”.
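These noise- and instability-driven scenarios can be made concrete with a minimal numerical sketch of the relative phase equation given in the Introduction, integrated by the Euler-Maruyama method. The parameter values and linear ramps below are illustrative choices, not fits to the present data: as b/a shrinks and δω grows with increasing rate, the anti-phase state destabilizes first (switching), and eventually all fixed points disappear (wrapping).

import numpy as np

def simulate_relative_phase(a=1.0, Q=0.05, dt=0.01, T=300.0, seed=1):
    """Euler-Maruyama integration of dphi = (dw - a sin(phi) - 2b sin(2phi)) dt + sqrt(Q) dW."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    phi = np.empty(n)
    phi[0] = np.pi                      # start in the anti-phase pattern
    for k in range(1, n):
        s = k / n                       # scaled "movement rate" ramp, 0 -> 1
        b = 1.0 - s                     # b/a decreases as rate increases
        dw = 2.0 * s                    # symmetry breaking grows with rate
        drift = dw - a * np.sin(phi[k - 1]) - 2 * b * np.sin(2 * phi[k - 1])
        phi[k] = phi[k - 1] + drift * dt + np.sqrt(Q * dt) * rng.standard_normal()
    return np.angle(np.exp(1j * phi))   # wrapped to (-pi, pi]

phi = simulate_relative_phase()
# Early in the trial phi hovers near pi, then switches toward 0 as the
# anti-phase state loses stability; once dw exceeds the total coupling,
# phi drifts continuously ("wrapping").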

The presence of transitions and the various forms they take constitute evidence for multistability in multimodal coordination. Multistability is an expression of the flexibility of multimodal integration, and could be regarded as the dynamical hallmark of crossmodal matching (Meredith 2002; Murray and Mishkin 1985), intermodal invariance (Gibson 1966) or intersensory equivalence (Lewkowicz 2000). By showing that sensory rearrangement is shaped by the stability of particular relationships between action and modality, our results extend these notions. We found that switches toward Flexion on sound and Extension on touch far outnumbered transitions in the other direction. From the perspective of coordination dynamics this means that the former pattern, having been preferentially ‘selected out’ via the mechanism of instability, proves to be the most stable combination. Additional evidence for the greater stability of Flexion on sound and Extension on touch comes from the relative phase analysis: lower variability for this pattern than for its multimodal counterpart indicates a stronger resistance to inherent stochastic forces and attests to its greater stability (cf. Schöner et al. 1986). These results are a further indication that understanding goal-directed behavior in a ubiquitous multisensory environment rests on considering the stability of the relationship between perception and action (Katsumata et al. 2003; Kelso et al. 1990).

Significance of transitions for understanding breakdowns in multimodal binding

A prominent feature of the present results was the transition from phase synchrony in multimodal coordination to a phase wrapping regime in which the component subsystems become independent. Typically, such transitions occurred more frequently in multimodal than in unimodal conditions. The onset of wrapping between the phases of movement and stimuli constitutes an indicator of the loss of entrainment between the parts and a destabilization of multimodal coordination. That is, by scaling an appropriate control parameter, multimodal integration undergoes a transition that drives the coordination from sustained binding to its breakdown. In the parlance of dynamical systems, this phenomenon corresponds to a saddle-node bifurcation, a generic mechanism for the formation or disappearance of a stable stationary solution as a control parameter is varied. Although not investigated in detail here, much previous work shows that the shift from coherent phase locking to phase drift between movement and stimuli likely originates from a broken symmetry, reflecting differences between the intrinsic properties of the coordinating elements (Kelso et al. 1990; Kelso 1995; Schmidt et al. 1993; Turvey and Schmidt 1994). When two different sensory modalities must be synchronized, the asymmetry may also arise from time delays in neural transmission (Jirsa and Kelso 2004b), typically modeled by time delay couplings.
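The saddle-node scenario can be seen in the simplest symmetry-broken case, keeping only the first coupling term of the relative phase equation given in the Introduction:

dφ/dt = δω − a sin(φ)

Fixed points satisfy sin(φ*) = δω/a and therefore exist only while |δω| ≤ a. As the effective frequency difference δω grows (or the coupling a weakens) with increasing rate, the stable fixed point (node) and the unstable one (saddle) approach each other, collide at |δω| = a, and annihilate, leaving no stationary solution: the relative phase then drifts, or “wraps,” at an average rate of √(δω² − a²). This is a sketch of the generic mechanism, not a fitted model of the present data.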

Time delays and multimodal coordination dynamics

For the unimodal conditions, we found that finger movement precedes both sound and simultaneous sound and touch, but lags touch. The fact that the finger leads an auditory stimulus was observed long ago (Woodrow 1932) and remains the focus of active investigation (Aschersleben 2002; Engström et al. 1996; Ishida and Sawada 2004). However, differences in Δt between auditory and tactile modalities have not been discussed so far. The presence of a finite Δt in the form of so-called phase shifts is characteristic of nonlinearly coupled oscillators (Guckenheimer and Holmes 1990; Haken 1983). A phase shift can be explained by differences between the frequency of the stimulus and the spontaneous frequency of the movement (see Kelso et al. 1990). Theoretically, the shift is related to the stability of the pattern in the fully symmetric case (Haken et al. 1985; Tass 1995), which explains the empirical observation in studies of unimanual and bimanual coordination that the phase shift for an anti-phase pattern is often larger than for an in-phase pattern (see Turvey and Schmidt 1994 for review). At the brain level, synchronizing with a sound is known to engage a large network of distributed brain areas including the superior temporal gyrus (Mayville et al. 2002; Jantzen et al. 2004). One may assume that a similar network is involved when synchronizing with touch rather than sound, save that auditory processing areas are replaced by primary somatosensory areas.

The modality-specific Δt could originate in the intrinsic dynamics specific to auditory and somatosensory areas, possibly reflected in differences in the frequency bands at which the respective active brain areas oscillate (Chen et al. 2003). Moreover, recent developments in the study of large-scale brain networks suggest that modality effects may also depend on the strength of the coupling between motor and sensory areas, and/or the directionality of this coupling (see Brovelli et al. 2004; Hummel and Gerloff 2005 for an illustration). A further, complementary line of reasoning is to take into account the time delays peculiar to each sensory modality (Dhamala et al. 2003). Empirically, time delays may be inferred from reaction times. As an illustration, the optimal intersensory facilitation effect in a reaction time task between vibrotactile stimuli delivered to the foot and auditory stimuli was found when the stimulus onset asynchrony (SOA) placed touch before sound by an interval ranging from 30 to 70 ms, depending on the intensity of the stimuli (Diederich and Colonius 2004). In addition, numerical simulations indicate that Δt can be modulated by varying the time delay in the coupling of the movement to the stimulus (Chen et al. 1997; Ishida and Sawada 2004). In the present work, we found that the action also influenced Δt. Hence, not only must the conduction time to the primary sensory areas be taken into account—about 50 ms for the earliest large somatosensory evoked response (Cheyne et al. 1998; Hamalainen et al. 1990), a value that falls into the middle latency (P1/P1m) range for auditory evoked potentials, the largest evoked response for the latter occurring 100 ms after the stimulus (Picton et al. 1974)—but also the particular interactions between sensory and motor brain areas involved in the timing of movement.

For the multimodal conditions in which touch and sound were anti-phase, we found that timing errors were shifted toward zero. Interestingly, these shifts in flexion and extension relative to the unimodal conditions were of opposite sign only for the most stable multimodal pattern. Accordingly, stability may be increased when the two couplings to the anti-phase stimuli “pull” and “push” the movement with similar strength, thereby introducing a kind of symmetry constraint. Note, however, that our data show that the tendency toward symmetry in multimodal timing errors coexists with differences in the stability of relative phase at the synchronization points. It seems likely, therefore, that direct neural interactions between somatosensory and auditory areas during multimodal coordination, and changes in coupling between motor and sensory areas, could explain differences in stability between multimodal patterns and the loss of coherence.

More research is needed to understand the connection between time delays and multimodal coordination. Several experimental investigations of the effect of feedback time delays have shown that coordination is destabilized, mainly via a transition from stationary to oscillatory relations between stimulus and movement (Beuter et al. 1989; Finney and Warren 2002; Glass et al. 1988; Langenberg et al. 1998; Miall et al. 1986; Tass et al. 1996; Vercher and Gauthier 1992; but see Fujisaki et al. 2004). On the theoretical side, a growing literature demonstrates that stable solutions for synchronization exist despite time delays in the coupling (Yeung and Strogatz 1999). In specific cases time delay may actually increase the span of stable synchrony (Dhamala et al. 2004), a prediction worth testing in future studies of multimodal coordination.

Conclusion

Adopting the strategy of coordination dynamics, we inquired how the senses and movement are bound together and how this ensemble evolves as the non-specific parameter of frequency is varied. This approach, which one might term multimodal coordination dynamics, provides new results that reveal a blend of coherence and flexibility in the cooperation between the senses and movement. We were able to create reproducible experimental conditions for the onset of binding of sound, touch, and movement; for sudden changes in this assembly; and for the passage from well-defined multimodal coordination to a loss of coherence among coordinating elements. Importantly, we also provided evidence for preferred tendencies in multimodal coordination, namely that sound, touch and movement self-assemble into favored combinations, the most dominant being Flexion on sound and Extension on touch. Multisensory research to date rests heavily on the study of simultaneous stimuli, leaving the issue of the brain’s adaptation to non-simultaneous multimodal stimuli largely untouched (Meredith 2002). Nevertheless, the paradigm of coordinating movement with counter-phased stimulus modalities provides clear operational measures of binding and of its complementary aspect, the degree of independence between participating subsystems. This opens the way for further developments, in particular a focus on cortical dynamics (Bressler and Kelso 2001; Kelso et al. 1992) in order to investigate whether adjustments in functional connectivity between brain areas can be related to the stability of coordination when multiple senses and movement are combined.