Introduction

When we synchronize finger taps with regular events, such as isochronously presented sounds or light flashes, the taps tend to precede the pacing stimuli by a few tens of milliseconds [i.e. negative mean asynchrony (NMA)]. NMA is a well-established phenomenon, first reported by Dunlap (1910), and confirmed repeatedly in later studies (Aschersleben, 2002; Aschersleben & Prinz, 1995; Fraisse, 1980; Mates, Radil, & Pöppel, 1992). Still, the mechanism underlying NMA is not fully understood (see Repp, 2005, for a review of several explanations).

In sensorimotor synchronization (SMS) taps and pacing stimuli are perceived as synchronized if their internal codes (i.e. auditory/visual incoming from the stimuli and kinesthetic-tactile incoming from the taps) coincide in time at a central representational level (Aschersleben, 2002). The amount of time needed for code generation depends on the nature of the code. Generating the kinesthetic-tactile code is thought to require more processing time than the auditory or visual stimulus codes. For these codes to coincide at the central level, therefore, the taps have to precede the pacing stimuli by approximately the amount of difference between the processing times in the two afferent systems (Vos, Mates, & van Kruysbergen, 1995).

These assumptions underlie two major hypotheses accounting for NMA: the Paillard–Fraisse hypothesis and the sensory accumulator model (SAM). According to the Paillard–Fraisse hypothesis (Fraisse, 1980; Paillard, 1948), NMA is due to differences in nerve transmission times between the onsets of pacing stimuli and taps. For example, with auditory pacing stimuli, it takes less time for sensory information to travel from the ear to the brain (i.e. auditory stimuli take 10–20 ms to reach the auditory cortex; e.g. Lütkenhöner et al., 2003) than from the fingertip to the brain (approximately 15–30 ms for a distance of 1 m, considering that nerve fibers carrying touch sensation transmit the signal at a speed of 35–75 m/s; e.g. Kandel, Schwartz, & Jessell, 2000). Therefore, in order for kinesthetic-tactile and auditory information to be synchronized at the central level, the tap has to precede the auditory pacing stimuli. Evidence in support of this hypothesis comes from a study by Aschersleben and Prinz (1995), who manipulated the effector (hand vs. foot) and the body side (left vs. right) in SMS tasks. The absolute asynchrony between the pacing stimuli and the taps was larger when the movement was executed with the foot than with the hand. No effect of effectors’ body side was obtained. Larger NMA in foot tapping than in finger tapping was replicated in subsequent studies (Billon, Semjen, Cole, & Gauthier, 1996; Stenneken, Aschersleben, Cole, & Prinz, 2002).
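The transmission-time figures quoted above follow from dividing path length by conduction speed. The following is a minimal sketch of that arithmetic, using only the values given in the text; the function name is illustrative and does not come from the cited sources.

```python
def transmission_time_ms(distance_m: float, speed_m_per_s: float) -> float:
    """Return conduction time in milliseconds over a path of the given length."""
    if speed_m_per_s <= 0:
        raise ValueError("conduction speed must be positive")
    return 1000.0 * distance_m / speed_m_per_s


# Values from the text: a 1 m fingertip-to-brain path and speeds of 35-75 m/s.
slowest_ms = transmission_time_ms(1.0, 35.0)  # roughly 29 ms
fastest_ms = transmission_time_ms(1.0, 75.0)  # roughly 13 ms
print(f"tactile transmission: {fastest_ms:.1f} to {slowest_ms:.1f} ms")
```

The resulting bounds are close to the approximate 15–30 ms range quoted above for the tactile pathway.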

Other evidence in favor of the Paillard–Fraisse hypothesis comes from studies showing smaller NMA when auditory feedback is added to kinesthetic-tactile feedback than when kinesthetic-tactile feedback alone is provided (Aschersleben & Prinz, 1995; see also Mates & Aschersleben, 2000; Mates et al., 1992). It is noteworthy, however, that additional auditory feedback did not completely suppress NMA. To account for this finding, it was assumed that the two feedback sources (i.e. auditory and kinesthetic-tactile) enter into a multisensory code. Yet, kinesthetic-tactile information of the tap requires more time than the auditory feedback to reach the brain centers where the multisensory code resides.

However, there are findings that cannot be accounted for by the Paillard–Fraisse hypothesis. NMA is observed even when the afferent pathways from the pacing stimulus and from the tap have the same length (e.g. with tactile pacing stimuli instead of auditory stimuli; Kolers & Brewster, 1985; see Footnote 1). Moreover, important individual differences (e.g. smaller NMA in musically trained individuals when compared with nonmusicians; see Aschersleben, Stenneken, Cole, & Prinz, 2002; Repp & Doggett, 2007) can hardly be explained in terms of variability in nerve transmission times. Another finding that is inconsistent with the Paillard–Fraisse hypothesis is that NMA increases as the inter-tap interval increases (Repp, 2003). Finally, kinematic factors instead of nerve transmission time may be responsible for the observed differences between foot and finger tapping (for a model of NMA based on movement kinematics, see Vaughan, Mattson, & Rosenbaum, 1998). In sum, there are indications that delays derived from nerve transmission times in afferent pathways may not be the only factor responsible for NMA. Central factors (i.e. the processing time needed to create a central representation of the pacing stimulus and of the tap) are also likely to play a role.

Based on this idea, Aschersleben and collaborators (Aschersleben, 2002; Aschersleben, Gehrke, & Prinz, 2001) proposed the SAM. According to the SAM, the afferent information from incoming events is accumulated over time at a central level until the evoked neural activity reaches a given criterion (i.e. the functional onset threshold). The threshold indicating functional onsets at the central level is assumed to be constant (i.e. having the same level of activation) for all incoming stimuli; however, the slope of the accumulation function varies depending on stimulus characteristics (Aschersleben, 2002; Aschersleben et al., 2001), such as the intensity of pacing stimuli or the pressure force of taps. The steepness of the accumulation function determines the time elapsed between the onset of an external event and its internal representation. The steeper the function, the shorter the processing time for the external stimulus to be represented at the central level. In SMS with auditory stimuli, the slope for auditory pacing stimuli is assumed to be steeper than for kinesthetic-tactile information from the taps. Therefore, for both sources of information to reach the functional onset threshold at the same time (i.e. to be perceived as synchronized), the tap has to precede the pacing stimulus. Note that it is likely that tactile information is more relevant than kinesthetic information in tapping. Indeed, it seems counterintuitive that kinesthetic information reaches the central level so late (and that it accumulates so slowly), considering that such information is available even before the finger’s contact with the surface.

A way to test the SAM is by manipulating features of the pacing stimulus (e.g. modality, or stimulus duration) or of the kinesthetic-tactile feedback (for a review, see Aschersleben et al., 2002). Varying the intensity of pacing stimuli or of kinesthetic-tactile feedback is assumed to modify the slope of their accumulation function, thereby affecting NMA (provided that the accumulation function had not already reached maximum steepness). The intensity of afferent information from the tap was manipulated by Gehrke (1995) by asking participants to produce different finger amplitudes while tapping in synchrony with an auditory pacing signal. Larger finger amplitude is associated with increased pressure force at tapping time, which increases the slope of the accumulation functions for the tactile and the kinesthetic components of sensory stimulation. Greater finger amplitude was associated with smaller NMA, as predicted by the SAM. It is worth noting that reduced NMA in this case cannot result from differences in peripheral transmission times, as participants tapped with the same effector in all conditions (Aschersleben, Gehrke, & Prinz, 2004). Effects of other manipulations of somatosensory information on NMA have also been reported (Aschersleben et al., 2001, 2004; Gehrke, 1995; Mates & Aschersleben, 2000; Vos et al., 1995), thus lending further support to the SAM. Yet, note that the SAM is unable to account for the effect of varying the inter-tap interval on NMA (Repp, 2003).

There is a paucity of studies on the effect of manipulating the intensity of the pacing stimulus (i.e. auditory or visual) on NMA. To our knowledge, pacing stimulus intensity was varied in only one study (Repp & Penel, 2004), which investigated the effect of distractors (i.e. isochronous sequences) on synchronization with isochronous visual or auditory sequences. In this study, only two intensity levels were used for auditory stimuli (i.e. pacing stimuli or distractors) in an interference paradigm. Intensity manipulation did not affect the magnitude of the distractor effect, regardless of whether auditory stimuli were targets or distractors. Yet, these results may not apply to a situation in which participants have to synchronize with an isochronous sequence in the absence of distractors. Moreover, the effect of manipulating intensity of the pacing stimuli on NMA was not systematically investigated.

In contrast, the effect of intensity of target stimuli has been quite extensively documented in reaction-time studies. Varying the intensity of target stimuli affects response latency (for a review, see Jaśkowski, 1996). Indeed, simple reaction time (RT) decreases by about 100 ms when the intensity of a visual stimulus increases from near-threshold to extremely bright (Jaśkowski, 1992). A similar effect, although weaker, is found with auditory stimuli: RT is about 60 ms shorter for extremely loud than for barely audible tones (Jaśkowski, Rybarczyk, & Jaroszyk, 1994). These differences in action preparation depending on stimulus intensity are thought to result from the duration of perceptual processes; the duration of motor implementation is not affected by stimulus manipulation (Miller, Ulrich, & Rinkenauer, 1999; Mordkoff, Miller, & Roch, 1996). Overall, these results indicate that the perceptual mechanisms responsible for RT are sensitive to intensity manipulation. A similar manipulation is likely to affect the perceptual processes engaged in action preparation in SMS tasks. This possibility is compatible with the SAM. Indeed, the SAM predicts larger NMA in SMS tasks with increasing pacing stimulus intensity. High-intensity pacing stimuli, because they are associated with a steeper accumulation function, require less time than low-intensity stimuli to reach the functional onset threshold. Thus, the functional onset of the high-intensity stimuli would be farther from the perceived tap onset than with low-intensity stimuli, leading to larger NMA in the former case, as can be seen in Fig. 1.
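The SAM prediction described above can be made concrete with a small sketch. It assumes a linear accumulation function that starts at zero, so that the time to reach threshold is simply threshold divided by slope; the linear form, the arbitrary activation units, and all numerical values are illustrative assumptions, not parameters reported in the SAM literature.

```python
def time_to_threshold(threshold: float, slope: float) -> float:
    """Time (ms) for a linear accumulator starting at 0 to reach the threshold."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return threshold / slope


def predicted_asynchrony(threshold: float, stim_slope: float, tap_slope: float) -> float:
    """Tap onset minus stimulus onset (ms) when both codes cross threshold together.

    A negative value means that the tap precedes the pacing stimulus.
    """
    return time_to_threshold(threshold, stim_slope) - time_to_threshold(threshold, tap_slope)


# Hypothetical values: activation in arbitrary units, slopes in units per ms.
THRESHOLD = 1.0
TAP_SLOPE = 0.01          # slow kinesthetic-tactile accumulation
LOW_INTENSITY_SLOPE = 0.02
HIGH_INTENSITY_SLOPE = 0.05

nma_low = predicted_asynchrony(THRESHOLD, LOW_INTENSITY_SLOPE, TAP_SLOPE)    # about -50 ms
nma_high = predicted_asynchrony(THRESHOLD, HIGH_INTENSITY_SLOPE, TAP_SLOPE)  # about -80 ms
print(f"low intensity: {nma_low:.1f} ms, high intensity: {nma_high:.1f} ms")
```

Under these assumptions, a steeper stimulus slope shortens the stimulus latency while the tap latency stays fixed, so the predicted asynchrony becomes more negative at higher intensity.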

Fig. 1. Illustration of the predictions of the effect of stimulus intensity on NMA according to the SAM theory

These predictions were tested in the present study in the visual and in the auditory modality. A group of participants was asked to produce short tapping-like pulses using their index finger in synchrony with isochronous visual stimuli presented at different intensities, ranging from near-threshold to very bright. In addition, for comparison, participants performed a simple visual RT task in which target stimulus intensity was similarly manipulated. A second group of participants performed the same tasks with auditory pacing stimuli from barely audible to very loud.

Method

Participants

Two groups of students from the University of Finance and Management in Warsaw participated in the study for class credits. Group 1 consisted of 10 right-handed participants (3 males and 7 females) aged between 19 and 23 years (M = 21.0 years). All had normal or corrected-to-normal visual acuity. Group 2 consisted of 10 participants (1 male and 9 females; 8 right-handed, 2 left-handed) aged between 19 and 23 years (M = 21.8 years). All had normal hearing by self-report.

Material

Visual and auditory target stimuli were used in the experiment. The visual target stimulus was a 10 × 10-mm white square that was presented for 100 ms on a black background in the center of the screen. Stimulus intensity was manipulated to obtain five sequences of target stimuli with the following degrees of luminance: 0.06 (near-threshold), 0.09, 0.22, 12.6, and 120.9 cd/m² (extremely bright). The auditory target stimuli were 100-ms pure tones (frequency = 1,000 Hz). Loudness was manipulated to obtain five sequences of target stimuli with 9, 11, 23, 56, and 82 dB SPL.

Each of the five sequences in the synchronization task (referred to as SMS task hereafter) consisted of 80 target stimuli (i.e. pacing stimuli) isochronously presented [inter-onset interval (IOI) = 800 ms]. Each of the five sequences in the reaction-time task (RT task) was formed by 60 target stimuli presented with a variable IOI. The duration of each IOI was randomly sampled from an exponential distribution with a mean of 700 ms plus a constant period of 700 ms as done in Jaśkowski and Włodarczyk (2006).
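The two timing schemes described above can be sketched as follows. This is an illustration using Python's standard library, not the Presentation script used in the experiment; the function name and the seed are hypothetical. A sequence of n stimuli has n − 1 inter-onset intervals.

```python
import random
from typing import List, Optional


def sample_rt_iois(n_intervals: int,
                   constant_ms: float = 700.0,
                   mean_exp_ms: float = 700.0,
                   seed: Optional[int] = None) -> List[float]:
    """Draw IOIs as a constant period plus an exponentially distributed delay."""
    rng = random.Random(seed)
    # expovariate takes the rate parameter, i.e. 1 / mean.
    return [constant_ms + rng.expovariate(1.0 / mean_exp_ms) for _ in range(n_intervals)]


# SMS task: 80 isochronous stimuli, hence 79 intervals of 800 ms each.
sms_iois = [800.0] * 79

# RT task: 60 stimuli, hence 59 variable intervals (the seed is arbitrary).
rt_iois = sample_rt_iois(59, seed=0)
print(f"shortest RT-task IOI: {min(rt_iois):.0f} ms")
```

With these parameters every RT-task interval is at least 700 ms and the expected interval is 1,400 ms, so stimulus onsets cannot be anticipated from the preceding interval.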

Procedure

The experiment included two conditions, visual and auditory. Group 1 performed the visual condition, and Group 2 the auditory condition. In both conditions, participants were asked to perform two tasks: an SMS task and a simple reaction-time (RT) task. In the SMS task, participants were presented with each of the five sequences of pacing stimuli with different intensities in the visual or the auditory modality. For each sequence, participants were asked to put their index finger on the surface of a low-profile force transducer and to increase the finger’s pressure force in synchrony with the pacing stimulus, while keeping the finger in contact with the surface of the transducer. Note that this task requires an isometric response, which differs from a standard tapping task. We used an isometric response because this measure was found to be sensitive to stimulus intensity in previous RT studies (Jaśkowski & Włodarczyk, 2006). In the RT task, participants were presented with five sequences of target stimuli to which they had to respond as quickly as possible by producing force pulses with their index finger using the same transducer as above.

The experiment was run in a sound-proof dark room on an IBM-compatible computer equipped with Presentation software (Neurobehavioral Systems). Stimuli in the visual condition were presented at the centre of a Mitsubishi Diamond Plus 200 22-inch computer screen. Stimuli in the auditory condition were presented binaurally through headphones (Sennheiser eH2270). Pressure force data were recorded by the force transducers, amplified (QuickAmps, BrainProducts Inc.), and stored on disk with a sampling rate of 250 Hz (BrainRecorder software, BrainProducts Inc.).

Each task in each condition included three identical experimental sessions performed on separate days. In each session, participants were presented with five sequences corresponding to the five levels of intensity of the target stimuli. Sequence order was varied across subjects using a Latin square design. Moreover, task order was counterbalanced across subjects. Each participant performed only one condition consisting of six experimental sessions (i.e. three for each task), on six consecutive days. Each session lasted 15 min. In addition, in the visual condition participants adapted to darkness for 15 min before each experimental session.
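One way to generate sequence orders of the kind described above is a cyclic Latin square, in which each intensity level appears exactly once in each serial position across five orders. The text does not state which type of Latin square was used or how orders were assigned to the 10 participants of each group, so both the cyclic construction and the assignment rule below are assumptions made for illustration.

```python
from typing import List


def cyclic_latin_square(n: int) -> List[List[int]]:
    """Return an n x n Latin square whose row r is the sequence 0..n-1 shifted by r."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


N_LEVELS = 5          # five intensity levels, indexed 0 (weakest) to 4 (strongest)
N_PARTICIPANTS = 10   # per group

square = cyclic_latin_square(N_LEVELS)

# Assumed assignment rule: participants cycle through the five rows.
orders = [square[p % N_LEVELS] for p in range(N_PARTICIPANTS)]
for participant, order in enumerate(orders, start=1):
    print(f"participant {participant}: intensity order {order}")
```

With 10 participants and five rows, each order is used by two participants under this rule.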

Results

The first session in each task was treated as training and not analyzed. Force pulses corresponding to the first 20 stimuli in each sequence were discarded and only the data for the subsequent 60 stimuli were analyzed. Three measures were derived from the force pulses obtained in each sequence. Time to threshold (in ms) is the time interval between the stimulus onset and the moment at which force reached a threshold value of 1.5 N (see Footnote 2). The force peaks produced by 16 out of 20 participants were all greater than 1.5 N. The remaining four participants produced very few force peaks (6% of all taps, on average) with intensity below 1.5 N. These were treated as “missing taps” and force trajectories in these cases could not be used to compute time to threshold. Negative time to threshold indicates that the threshold is reached before the occurrence of the stimulus. Time to peak (in ms) is the time interval between the stimulus onset and the moment when force reaches its peak. Negative time to peak indicates that the force peak is reached prior to the stimulus. Force peak (in μV) indicates the magnitude of the force peak closest to the stimulus. These measures are illustrated in Fig. 2 for a typical force pulse obtained in the SMS task. Force pulses in the RT task were very similar, but the response always occurred well after the stimulus (i.e. time to threshold and time to peak were both positive).
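The three measures defined above can be sketched for a single force pulse as follows. The sketch assumes that the force trace has already been cut into a window containing one pulse, that samples are expressed in the same unit as the threshold, and that the threshold time is taken as the first sample at or above threshold without interpolation; none of these processing details is specified in the text, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

SAMPLE_INTERVAL_MS = 1000.0 / 250.0  # 250 Hz sampling rate, i.e. 4 ms per sample


def pulse_measures(force: List[float],
                   stim_index: int,
                   threshold: float = 1.5) -> Tuple[Optional[float], float, float]:
    """Return (time to threshold, time to peak, force peak) for one force pulse.

    Times are in ms relative to stimulus onset; negative values precede the stimulus.
    Time to threshold is None when the pulse never reaches threshold (a missing tap).
    """
    peak_index = max(range(len(force)), key=force.__getitem__)
    time_to_peak = (peak_index - stim_index) * SAMPLE_INTERVAL_MS
    crossing = next((i for i, f in enumerate(force) if f >= threshold), None)
    time_to_thr = None if crossing is None else (crossing - stim_index) * SAMPLE_INTERVAL_MS
    return time_to_thr, time_to_peak, force[peak_index]


# Made-up trace for illustration; the stimulus onset falls on sample 6.
example = [0.0, 0.5, 1.0, 1.6, 2.5, 3.0, 2.0, 1.0, 0.2]
ttt, ttp, peak = pulse_measures(example, stim_index=6)
print(ttt, ttp, peak)
```

In this made-up trace both times are negative, which corresponds to the anticipatory pattern described for the SMS task.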

Fig. 2. Time to threshold, Time to peak, and Force peak for a typical force time course obtained in the SMS task

Average time to threshold, time to peak, and force peak as a function of the intensity of the pacing stimuli are presented in Fig. 3 (visual condition) and Fig. 4 (auditory condition). The two conditions were analyzed separately. Time to threshold, time to peak, and force peak data in each condition were submitted to separate 2 (task) × 5 (stimulus intensity) repeated-measures analyses of variance. Task (SMS, RT) and stimulus intensity were the within-subject factors; participants were taken as the random variable (see Footnote 3).

Fig. 3. Visual condition: Mean time to threshold (a, b), Mean time to peak (c, d), and Mean peak force (e, f) as a function of stimulus intensity (log scale), obtained in the SMS task (see right column) and in the RT task (left column). Error bars indicate SE of the mean

Fig. 4. Auditory condition: Mean time to threshold (a, b), Mean time to peak (c, d), and Mean peak force (e, f) as a function of stimulus intensity, obtained in the SMS task (see right column) and in the RT task (left column). Error bars indicate SE of the mean

Visual condition

As can be seen in Fig. 3a, b, time to threshold decreased with intensity only in the RT task, as attested by a significant Task × Intensity interaction (F(4,36) = 48.86; ε = 0.55; p < 0.001; see Footnote 4). In the RT task, mean time to threshold was always positive and monotonically decreased with intensity (F(4,36) = 40.37; ε = 0.57; p < 0.001). In contrast, in the SMS task mean time to threshold was always negative and did not significantly vary with intensity (F(4,36) = 3.67; ε = 0.27; p = n.s.). Time to peak (Fig. 3c, d) showed the same Task × Intensity interaction (F(4,36) = 30.84; ε = 0.55; p < 0.001). In the RT task, time to peak decreased with increasing intensity (F(4,36) = 38.18; ε = 0.57; p < 0.001). In the SMS task, time to peak was positive but not significantly affected by intensity (F(4,36) = 3.97; ε = 0.29; p = n.s.). Finally, in both tasks force peak decreased as stimulus intensity increased (Fig. 3e, f), as revealed by a main effect of Intensity (F(4,36) = 10.31; ε = 0.32; p < 0.001). The Intensity × Task interaction and the main effect of Task did not reach significance.

We further examined whether time to threshold and time to peak covaried with force peak. For each subject, we computed average time to threshold, time to peak and force peak across the five different intensities. Average time to threshold and time to peak were correlated with the average force peak, separately for each task. Time to threshold increased with decreasing response force in the SMS task (r = −0.69; p < 0.05), but not in the RT task (r = 0.48; p = n.s.). Moreover, time to peak increased with increasing response force in the RT task (r = 0.82; p < 0.05); in the SMS task, time to peak decreased with increasing response force, but this effect was not significant (r = −0.30; p = n.s.). The other correlations between measures of response time and response force did not reach significance.

We also examined if stimulus intensity affects the difference between time to threshold and time to peak (i.e. the interval between time to threshold and time to peak measured in ms). In the visual condition, this measure decreased when intensity increased, but only in the SMS task (F(4,36) = 32.78; ε = 0.26; p < 0.001).

Auditory condition

As illustrated in Fig. 4a, b, the effect of Intensity on time to threshold in the auditory condition depended on the Task (F(4,36) = 15.04; p < 0.001). In the RT task, time to threshold was always positive, and it decreased as intensity increased (F(4,36) = 31.00; p < 0.001). In contrast, in the SMS task time to threshold was always negative and was not significantly affected by Intensity (F(4,36) = 2.01; p = n.s.) whereas force peak still tended to precede the pacing stimulus. Similarly, the effect of Intensity on time to peak (Fig. 4c, d) depended on the Task (F(4,36) = 11.89; p < 0.001). In the RT task, time to peak was always positive and systematically decreased with increasing intensity (F(4,36) = 22.88; p < 0.001). A similar effect was not found in the SMS task (F(4,36) = 1.12; p = n.s.). Finally, force peak did not change as a function of stimulus intensity in either task (Fig. 4e, f). The Intensity × Task interaction and the main effect of Task did not reach significance. None of the correlations between measures of asynchrony and force peak in the auditory condition reached significance.

In the auditory condition, the difference between time to threshold and time to peak tended to decrease when intensity increased, as observed in the visual condition; however, the effect did not reach significance [in the RT task F(4,36) = 2.33; p = n.s.; in the SMS task F(4,36) = 2.17; p = n.s.].

Discussion

In this study, we sought to examine whether negative asynchrony in SMS is sensitive to manipulations of pacing stimulus intensity. Varying stimulus intensity did not affect asynchrony in SMS tasks. In contrast, consistent with previous evidence (Jaśkowski, 1992), higher intensity of the target stimulus reduced response latency in simple RT tasks. This discrepancy between RT and SMS tasks was observed in both the visual and the auditory modalities.

The finding that accuracy in SMS (i.e. asynchrony) is insensitive to intensity manipulation is not consistent with our predictions based on the SAM (Aschersleben, 2002; Aschersleben et al., 2001). Indeed, according to this model, we expected higher intensity of the pacing stimuli to affect NMA, due to a steeper accumulation function. The lack of confirmation of this hypothesis may lead one to conclude that the SAM is not an adequate account of NMA. Another possibility, however, is that additional processes, such as entrainment, may impinge on the functioning of the components in the SAM, and thereby counter the effect of intensity. This possibility, although quite speculative at this stage, is examined below.

Discrepancies between sensitivity of different tasks to stimulus intensity are not unusual in the timing literature. A notable example is the comparison between simple RT tasks and temporal order judgment (TOJ) tasks (for reviews, see Jaśkowski, 1999; Miller & Schwartz, 2006). Although manipulating intensity affects both tasks, this effect in RT tasks is approximately twice as large as in TOJ (Jaśkowski, 1992; Menendez & Lit, 1983; Roufs, 1974; Sanford, 1971, 1974). This finding was generally taken as evidence supporting the existence of different systems underlying timing in RT tasks and in TOJ tasks (Neumann, Esselmann, & Klotz, 1993).

In this context, it is relevant that the temporal order threshold obtained in the TOJ task may be closely related to SMS. Indeed, in SMS tasks, perceiving the difference between the times of occurrence of the tap and of the pacing stimulus (i.e. asynchrony) and their order are likely to be important for error correction (Mates, 1994a, b; Michon, 1967; Thaut, Miller, & Schauer, 1998; Vorberg & Wing, 1996). However, note that other lines of evidence suggest that error correction (e.g. phase correction) may not rely exclusively on asynchronies. SMS may be the outcome of successive phase resettings in response to pacing stimuli (i.e. time points instead of asynchronies) (Hary & Moore, 1985; see Repp, 2005, for a thorough discussion). Moreover, subliminal timing perturbations can also trigger correction mechanisms (for a discussion, see Repp, 2000).

Given the link between temporal order and SMS tasks, an examination of the theories accounting for different effects of stimulus intensity on RT versus TOJ tasks may shed light on the mechanisms underlying the discrepancies observed here between SMS and RT. Two categories of accounts emerge from the RT/TOJ literature: two-system accounts and one-system accounts (for a review, see Miller & Schwartz, 2006). Two-system accounts postulate that the TOJ and RT tasks engage different mechanisms associated with distinct brain areas (i.e. the ventral and the dorsal pathways, respectively) (Neumann et al., 1993; Neumann & Niepel, 2004). In contrast, one-system accounts more parsimoniously explain RT/TOJ dissociations as the result of one shared mechanism. For example, Miller and Schwartz (2006) recently proposed a one-system diffusion model. They hypothesized lower detection criteria for the TOJ task than for the RT task. According to the model, discrepancies between the two tasks result from performance optimization based on the same system when participants are faced with conflicting task demands. It is noteworthy that this theory assumes that the motor triggering response level is higher than the perceptual decision variable (Cardoso-Leite, Gorea, & Mamassian, 2007; Miller & Schwartz, 2006; Waszak & Gorea, 2004).

As in other standard statistical decision models of the time course of perceptual detection and information accumulation (cf. Luce, 1986), in this model it is assumed that the observer trying to detect the onset of a stimulus has access to a time series of noisy sensory observations. Miller and Schwartz (2006) postulate that, due to the different task demands, observers need to use a higher criterion in the RT task than in a TOJ task, since in the former case responses have to be provided as quickly as possible while avoiding false alarms. In contrast, in the TOJ task, participants have no time pressure to respond, nor is there a severe penalty for making false alarms. In sum, in spite of the fact that perception and action rely on a common evidence-accumulating process in both the RT task and in the TOJ task, they are assumed to be triggered at distinct and independent levels of internal activity.

One-system and two-system accounts can similarly be considered in connection with our present results. A one-system account of the discrepancies between timing in RT and SMS tasks would be particularly appealing, because of its parsimony. Indeed, it is likely that certain processes are shared by the two tasks, such as sensory accumulation postulated in the SAM. However, due to the different task demands, we are inclined to favor an explanation based on at least partially independent mechanisms. In RT tasks, participants have to respond to target stimuli as quickly as possible. Because the interval between target stimuli is not regular, the time of occurrence of the stimulus cannot be predicted, and the response follows the stimulus. In contrast, in SMS tasks participants have to synchronize with pacing stimuli as accurately as possible. Because of the regular temporal properties of pacing stimulus sequences (i.e. constant IOI) participants after listening to a few stimuli can predict when the pacing stimulus will occur, and thereby anticipate the upcoming stimuli. Prediction is likely to be supported by the entrainment of an internal attentional rhythm (e.g. as modeled by oscillator theory, Large & Jones, 1999; for a review, see Large, 2008) to the temporal properties (i.e. period and phase) of the pacing sequence.

Here, we speculate that the reliance of SMS on entrainment mechanisms, as opposed to RT, may be the reason why intensity manipulation differentially affects the performance in SMS and RT tasks. This possibility allows the SAM to be maintained as a valid account for SMS by postulating that SAM mechanisms (e.g. accumulation thresholds) are dependent on additional processes (e.g. attention) related to entrainment. Indeed, it seems reasonable to assume that greater pacing stimulus intensity may be conducive to a higher degree of entrainment (i.e. underpinned by larger amplitude of internal oscillations); this should entail stronger expectancy for future pacing stimuli. Such an increase in expectancy can be modeled within the framework of the SAM by a proportional reduction of the functional onset threshold. When participants synchronize with pacing stimuli having higher intensity, sensory accumulation for these pacing stimuli will be faster, in keeping with the SAM. However, at the same time, stimuli having higher intensity are likely to trigger stronger expectations (due to attentional entrainment), thereby lowering the accumulation threshold. This reduction of the threshold would compensate for the steeper accumulation function evoked by the pacing stimulus at greater intensities, thus keeping NMA constant (see Fig. 5 for an example of pacing stimuli with different intensities that still lead to the same NMA).
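The compensation idea outlined above can be illustrated with a simple calculation. The sketch assumes linear accumulation from zero and a single functional onset threshold shared by the stimulus code and the tap code, and asks how far that threshold would have to drop for a steeper stimulus slope to leave the asynchrony unchanged. The linear form, the shared threshold, and all numerical values are illustrative assumptions rather than a fitted model.

```python
def asynchrony(threshold: float, stim_slope: float, tap_slope: float) -> float:
    """Tap onset minus stimulus onset (ms) for a threshold shared by both codes."""
    return threshold / stim_slope - threshold / tap_slope


def compensating_threshold(target_asynchrony: float,
                           stim_slope: float,
                           tap_slope: float) -> float:
    """Shared threshold that yields the target asynchrony for the given slopes."""
    if stim_slope == tap_slope:
        raise ValueError("equal slopes give zero asynchrony for any threshold")
    return target_asynchrony / (1.0 / stim_slope - 1.0 / tap_slope)


# Hypothetical values: activation in arbitrary units, slopes in units per ms.
TAP_SLOPE = 0.01
baseline = asynchrony(1.0, stim_slope=0.02, tap_slope=TAP_SLOPE)          # low intensity
lowered = compensating_threshold(baseline, stim_slope=0.05, tap_slope=TAP_SLOPE)
compensated = asynchrony(lowered, stim_slope=0.05, tap_slope=TAP_SLOPE)   # high intensity
print(f"threshold {lowered:.3f} keeps asynchrony at {compensated:.1f} ms")
```

With these numbers the threshold has to fall from 1.0 to about 0.625 for the high-intensity stimulus to produce the same asynchrony as the low-intensity one.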

Fig. 5. Illustration of the effect of stimulus intensity on the functional onset threshold in the SAM

These suggestions seem to be at least partially supported by the peak force behavior in our experiment. In both tasks, participants tended to reduce peak force when stimulus intensity was increased. This effect reached significance only in the visual modality. A similar relationship was found in a previously reported RT study: when the imperative stimulus is less expected participants tend to respond more forcefully than when the stimulus is expected (Jaśkowski & Verleger, 1993). This finding is accounted for by a hypothesis that postulates that participants estimate the time needed for response initiation and the time course of motor preparation. The outcome of such estimation processes modulates arousal/activation and can directly affect response force (i.e. increase it when the stimulus is unexpected; Jaśkowski & Verleger, 1993; Jaśkowski et al., 1994; Jaśkowski, van der Lubbe, Wauschkuhn, Wascher, & Verleger, 2000). In sum, increasing response force at lower intensities (i.e. when stimuli are less expected) may result from a tendency to boost arousal/activation, as a compensatory strategy for poor motor preparation (Jaśkowski & Verleger, 1993; Jaśkowski et al., 1994). A similar mechanism can provide a viable explanation of the relation between intensity and force found in our study. Indeed, as low-intensity target stimuli may have been less expected than high-intensity stimuli, because of lower entrainment, participants were likely less prepared to respond to the former than to the latter. Moreover, we found that greater peak force was associated with smaller asynchrony in the SMS task (as shown by time to threshold). In sum, the decrease of force peak with stimulus intensity is compatible with the hypothesis of lower onset thresholds for higher intensity stimuli due to higher predictability of stimulus timing. At low intensities, even regular stimulus onsets may appear to be subjectively more diffused in time than at higher intensities (see Footnote 5). Nevertheless, this account, although appealing, predicts generally higher force in the RT task than in the SMS task, since in the latter case the stimuli were presented more regularly than in the former. Moreover, the effect of intensity on force peak should have been task-independent.

Alternative explanations can be considered to account for the lack of the effect of intensity on NMA and for the observation that greater peak force is associated with smaller asynchrony. First, there exists an intriguing agency-related phenomenon whereby people sometimes feel that they produce the pacing signal events in SMS tasks (Repp & Knoblich, 2007). In light of this phenomenon, it is possible that the observed increase in force when the intensity of pacing stimuli is low may represent an unconscious attempt to increase the intensity of pacing signals that are perceived as effects of one’s actions. Another possibility is that by increasing force at lower stimulus intensities participants attempted to increase tactile feedback for enhancing synchronization accuracy when information about the pacing signal is limited, assuming that the multisensory central representations need a critical amount of information from at least one of the modalities involved. This possibility is consistent with motion capture data showing the role of force in modulating tactile feedback as a way to increase timing accuracy (Goebl & Palmer, 2008). Finally, it is worth mentioning that the SAM is a model accounting for timing in a discrete finger tapping task (i.e. where the finger is not always in contact with the surface upon which taps are made). However, in the current study we employed a force production task where the finger remained in contact with a pressure transducer, which is more similar to a continuous tapping task (Spencer, Zelaznik, Diedrichsen, & Ivry, 2003). Such differences in type of action could have implications for interpreting the results. For example, the task adopted in the present study and classical finger tapping (i.e. discrete) may tap partly into different timing systems dealing with continuous versus discrete timing (Spencer et al., 2003). These possibilities deserve to be tested in further studies.

In sum, sensory accumulation and threshold mechanisms like those postulated in the SAM may account for timing in RT tasks and in SMS tasks, with some additional mechanisms. In the case of SMS, stimulus predictability may impinge on the functioning of some components of the SAM, such as the accumulation threshold. We are aware that this possibility is highly speculative at the present stage. Further research is in order to examine the effect of the degree of entrainment evoked by the pacing stimuli (e.g. by using stimuli leading to various degrees of entrainment, such as weakly metrical vs. highly metrical stimuli) on NMA.