Introduction

One of the best known and most venerable effects in time perception is the result that stimuli in different modalities give rise to different duration estimates, even when the stimuli actually last for the same length of time. Particularly relevant in the present context is the result that “sounds are judged longer than lights” (Goldstone & Lhamon, 1974): that is, auditory stimuli appear to last longer than visual ones of the same real duration. Differences between duration estimates of auditory and visual stimuli were remarked on by Vierordt in 1868 (Lejeune & Wearden, 2009) and also noted by Guyau (1890), and have been the subject of many experiments since (see Wearden, Edwards, Fakhri, & Percival, 1998, for some examples). The auditory/visual effect is shown by adults (Wearden et al., 1998) and children (Droit-Volet, Tourret, & Wearden, 2004). It is almost always manifested when auditory and visual stimuli are directly or indirectly compared by the same participants, but can occur in some cases when different participants estimate the duration of stimuli in the different modalities (Wearden, Todd, & Jones, 2006). Although auditory/visual effects are the focus of the present work, they are only one of a number of effects of stimulus type on duration judgements (see Jones, Poliakoff, & Wells, 2009; Wearden, Norton, Martin, & Montford-Bebb, 2007, for others).

One question is whether stimuli in the different modalities (specifically, auditory and visual) are timed by a common timing mechanism, or by different ones, and this question gives rise to what might be called the modality paradox. On the one hand, if stimuli in different modalities are timed by a common mechanism, why are there any marked differences in duration judgements at all? To give a real-life example, if a person times different events using a physical stopwatch, there is a “common code” of stopwatch ticks for everything timed and differences between the timing of different types of events should be zero or minimal (e.g. arising from small differences in starting and stopping times of the stopwatch) if their real durations are the same. On the other hand, if stimuli in different modalities are timed by completely different mechanisms, why are time judgements from stimuli of different sorts so similar in general form? And similar they are: when standard techniques like temporal generalization (Wearden, Denovan, Fakhri, & Haworth, 1997) and bisection (Wearden et al., 2006) are used, the timing of auditory or visual stimuli gives rise to very similar performance, although the timing of visual stimuli is often less sensitive to their duration than timing of auditory ones (see Wearden et al., 1998, Figure 1, p. 102, for example). To illustrate, in a typical bisection task, people are initially presented with examples of Short and Long standard durations, identified as such (e.g. stimuli 200- and 800-ms long). Then they receive comparison stimuli (e.g. between 200 and 800 ms in 100-ms steps) and have to decide whether each comparison has a duration more similar to the short or long standard, making a “short” or “long” response. 
If the proportion of “long” responses is plotted against comparison duration, a psychophysical function of an ogival shape running from near-zero “long” responses at the shortest comparison, to near 100 % at the longest comparison, is obtained with both auditory and visual stimuli (e.g. see Wearden et al., 2006, upper panel of their Figure 1, p. 1711).

When verbal estimation of duration is used, mean estimates of both auditory and visual stimuli increase more or less linearly with real duration (albeit with a smaller slope in the visual case), and some evidence suggests that estimates of auditory durations are closer to the real stimulus durations than estimates of visual stimuli are (e.g. Wearden et al., 1998, Figure 4, p. 110, see also Wearden et al., 2006, Figure 4, p. 1717), although this has not been systematically explored. Duration estimates of both sorts of stimuli are increased when the stimuli are preceded by click trains which are supposed to “speed up” timing processes (Penton-Voak, Edwards, Percival, & Wearden, 1996; Wearden et al., 1998). Obviously, timing differences suggest different mechanisms, and timing similarities common ones.

A popular idea (e.g. Wearden et al., 1998) is that both auditory and visual stimuli are timed by a common clock-like internal timer, which “ticks” faster, for some reason, for auditory than for visual stimuli. An alternative is that there are separate timing mechanisms for auditory and visual stimuli, based on the neural systems employed for perception of stimuli in the different modalities, but such separate systems must necessarily produce similar behaviour in many cases, as well as generating the well-known modality difference in average subjective duration, by some mechanism yet to be specified.

One way out of the modality paradox is to suppose that there are both differences in timing processes that depend on modality, and some common processes somewhere within the timing system. However, most experiments on auditory and visual duration judgements are variants on the theme of demonstrating that subjective duration differences depend on modality, and do not provide any evidence of common timing processes, over and above the finding that timing of stimuli in different modalities is usually similar. A notable exception is a recent article by Stauffer, Haldemann, Troche, & Rammsayer (2012). These authors tested the same participants on a range of tasks involving judgements of the durations of auditory and visual stimuli, as well as a rhythm perception task involving both sorts of stimuli. They then used structural equation modelling to account for the pattern of correlations found between performance on the tasks. Their conclusion was that the best-fitting model involved both modality-specific timing processes and another process common to both modalities.

The present article takes a different approach and seeks to provide direct experimental evidence that there might be, somewhere in the human timing system, something in common between auditory and visual duration representations using an interference technique.

The idea of interference indicating competition for processing resources has been used in a number of experiments where non-timing tasks had to be performed concurrently with timing (e.g. Fortin & Breton, 1995), but here we are more concerned with the influence of judgements about one set of time intervals on judgements of other ones, when the intervals are not presented concurrently, something more like the situation obtained in conventional studies of memory. As is well known, when items are stored in memory, a commonplace effect is that earlier items may interfere with the memory of items presented later, the phenomenon of proactive interference (see Underwood, 1957, for a classic example and Hartshorne, 2008, for a more recent one). Interference is usually taken to imply that the items which interfere with one another have something in common in terms of their representation (for example, being words for animals).

Demonstrations that timing of one event or set of events interferes with timing of another one without concurrent presentation have been, perhaps surprisingly, rather rare. Jones and Wearden (2004) found situations in which memorizing two temporal standards increased timing variability compared with memorizing only one, although the effect depended on the temporal standards being used to make judgements about comparison stimuli, rather than on their mere presentation: see also Grondin (2005). Ogden, Wearden, and Jones (2008) found evidence of dramatic interference between time representations in some cases, but this depended on the use of a complex method, involving both interfering conditions and retention delays.

The present article reports interference effects derived from what seems a simple paradigm resembling that used when lists of words are remembered. The basic procedure involved presenting participants with a “changing standard” variant of a temporal generalization task (Jones & Wearden, 2003; Wearden, 1992) in the form of alternating blocks, following an ABABABAB design. Here, the B blocks (test blocks from which data were taken) involved judgements of durations which were on average the same, whereas A blocks (interference blocks) involved durations which were either systematically shorter than those in B (short interference) or systematically longer (long interference).

At the start of each block, three examples of a “standard” duration were presented, then participants were asked whether each of a series of comparison stimuli (some longer than the standard, some equal to it, some shorter) had the same duration as the standard. Then a new standard, and its associated comparisons, was presented for the next block, and so on.

Our main focus of interest was whether performance on the (averagely constant) test (B) blocks was affected by the duration of standards and comparisons used in the interference (A) blocks: in particular, were comparisons judged as shorter after short interference and longer after long interference, suggesting that memories of standard durations in the interference blocks may have influenced those in test blocks and thus altered performance? For example, suppose a person tries to remember the standard on a particular test block. One possibility is that they sometimes confuse the current standard with one previously presented (e.g. in an interference block). This may cause a shift in the memory of the standard, and consequently a shift in responding to comparison stimuli. However, the possibility of such effects depends logically on the durations of the stimuli in one block having something in common with those in other blocks, so the particularly interesting cases are those where interference and test blocks involve stimuli in different modalities. If, somewhere in the timing system, a “common code” for auditory and visual durations exists (as Stauffer et al. 2012 suggest), then interference between auditory and visual standards might be expected, whereas if completely separate “codes” for auditory and visual stimuli are used, then no interference should occur. Note that, in our experiment, all judgements of comparison durations are supposed to be performed relative to the standard presented at the start of each block, so the task never requires people to compare the durations of stimuli from different modalities.

We report data from conditions where the stimuli in the interference and test blocks were both auditory, both visual, or when the interference block used one modality and the test block another one. The group where only auditory stimuli were used (auditory/auditory, below) was initially run as a pilot to test our putative interference paradigm, so the procedure for this group differed from that used for the other groups in various minor ways.

Methods

Participants

Sixty people participated in the four groups. Twelve acquaintances of the first author served in the auditory/auditory group. Participants in the other groups were Keele University undergraduate students: there were 16 in the visual/visual group, 15 in the visual/auditory group, and 17 in the auditory/visual group.

Apparatus

Standard desktop PCs with LED screens were used for all but the auditory/auditory group, for which a Sony Vaio laptop was employed. Responses were registered on the computer keyboard. The visual stimuli were 10 × 10 cm blue squares displayed in the centre of the computer screen; the auditory stimuli were 500 Hz tones either played through the computer speaker (auditory/auditory group) or delivered through headphones (other groups). The procedure for the auditory/auditory group was programmed in MEL-1, the others in E-Prime 2 (both programs were products of Psychology Software Tools, Inc.).

Procedure

All experiments took place in a single experimental session consisting of two conditions, which all participants completed. In all groups, approximately half the participants started with the long interference condition and the rest began with the short one. Groups are identified by the type of stimuli in the interference blocks (first) and the test blocks (second) so, for example, in the auditory/visual group the stimuli in the interference blocks were tones and those in the test blocks were squares. Consider the procedure for the auditory/auditory group. The short interference condition started with an interference block, which began with three presentations of a standard (participants were aware that the standard was different in each block) randomly chosen from a uniform distribution running from 190 to 210 ms. Standard presentations were separated by gaps randomly selected from 1,500 to 2,000 ms. Then, seven comparison durations were presented in a random order; these were the standard on that block (whatever it was) multiplied by 0.4, 0.6, 0.8, 1, 1.2, 1.4, and 1.6. The participant initiated each comparison in response to a “press spacebar for next trial” prompt, and this was followed by a random delay from 500 to 1,000 ms before comparison presentation. After each comparison stimulus was presented, participants judged whether it had the same duration as the standard of the block and responded by pressing the “Y” (YES) or “N” (NO) key. Then a test block was presented where the standard was drawn from a uniform distribution running from 380 to 420 ms. After the three presentations of the standard, seven comparison durations were presented in a random order; these were the standard, whatever it was on the block, multiplied by 0.4, 0.6, 0.8, 1, 1.2, 1.4, and 1.6. Participants then completed another interference block, and so on. The entire experiment consisted of the interference block-test block sequence repeated four times. The procedure for the long interference condition was identical, except that standards in the interference blocks were randomly chosen from a 570 to 630 ms range.
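The block structure described above can be sketched in a few lines of Python. This is only an illustrative reconstruction and not the MEL-1 or E-Prime code used in the study: the names `make_block` and `make_condition` are hypothetical, standards are assumed to be drawn as continuous values in milliseconds, and the timing of gaps, prompts, and responses is omitted.

```python
import random

# Comparison/standard ratios and standard ranges (ms) as given in the Procedure.
RATIOS = (0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6)
STANDARD_RANGES_MS = {
    "short_interference": (190, 210),
    "long_interference": (570, 630),
    "test": (380, 420),
}


def make_block(kind, rng):
    """Draw one standard and its seven comparisons in random order."""
    low, high = STANDARD_RANGES_MS[kind]
    standard = rng.uniform(low, high)  # assumption: continuous uniform draw
    comparisons = [standard * ratio for ratio in RATIOS]
    rng.shuffle(comparisons)
    return {"kind": kind, "standard_ms": standard, "comparisons_ms": comparisons}


def make_condition(interference, rng=None, n_pairs=4):
    """Build the ABABABAB sequence: interference block (A), then test block (B)."""
    if rng is None:
        rng = random.Random()
    blocks = []
    for _ in range(n_pairs):
        blocks.append(make_block(f"{interference}_interference", rng))
        blocks.append(make_block("test", rng))
    return blocks


if __name__ == "__main__":
    for block in make_condition("short", random.Random(0)):
        print(block["kind"], round(block["standard_ms"]))
```

Calling `make_condition("long")` gives the corresponding long interference sequence; the test blocks are drawn from the same 380 to 420 ms range in both conditions, which is what makes them on average physically identical.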

The procedure was identical for the other groups, except for the type of stimuli used. For the visual/visual group, stimuli in both interference and test blocks were 10 × 10 cm blue squares centred on the computer screen. For the auditory/visual group the stimuli in the interference blocks were tones, and those in the test blocks were squares, and for the visual/auditory group the stimulus types were reversed.

Results

Temporal generalization gradients from the test blocks in the form of proportions of YES responses (judgements that the comparison duration was the standard), plotted against comparison/standard ratio, from the four experimental groups, are shown in Fig. 1. Note that within each panel the data come from judgements of the duration of stimuli that are on average physically identical. Inspection of the data in the four panels of Fig. 1 suggests that, in all conditions, (a) participants were sensitive to comparison stimulus duration and (b) the temporal generalization gradients appeared to be displaced relative to one another with gradients from the short interference condition being displaced to the left of those from the long interference condition.

Fig. 1. Each panel shows temporal generalization gradients (mean proportion of YES responses, i.e. judgements that the comparison duration was the standard, plotted against comparison/standard ratio) from the test blocks. Vertical bars indicate standard error of the mean. Within each panel data are shown separately from test blocks from the short and long interference conditions. The different panels show data from the different groups (e.g. top left auditory/auditory), as indicated by the legend in each panel. The first-described modality is the interference block modality, and the second the test block modality.

There are various ways of analysing the data shown in Fig. 1. We will first use repeated-measures ANOVA performed on all the groups (auditory/auditory, etc.) considered as a single sample. In this case there may be an effect of group (i.e. one group may produce more YES responses than another one), an effect of comparison duration (i.e. the different durations give rise to different proportions of YES responses, something to be expected if participants are sensitive to duration), and an effect of interference condition (i.e. the different interference conditions may produce different proportions of YES responses). In addition, there may be a number of interactions between these factors. As noted above, inspection of the generalization gradients in each panel suggests that they are displaced leftwards in the short interference condition, and rightwards in the long interference condition. Such an effect should manifest itself as a comparison duration × condition interaction, although this may not be the best test of putative gradient displacement, as discussed further later.

The overall ANOVA found no significant effect of interference condition, F(1, 56) = 0.82, but there were significant effects of comparison duration, F(6, 336) = 71.99, p < 0.001, η² = 0.56, and group, F(3, 56) = 10.77, p < 0.001, η² = 0.36. The interference × comparison duration interaction, which might indicate gradient displacement, was significant, F(6, 336) = 2.54, p < 0.05, η² = 0.043, but neither the interference condition × group, F(3, 56) = 0.24, nor the comparison duration × group, F(18, 336) = 1.31, interactions were significant. There was, however, a significant three-way interaction between interference condition, comparison duration, and group, F(18, 336) = 3.04, p < 0.01, η² = 0.15.

Inspection of the data in Fig. 1 suggests that, as is normally the case, generalization gradients from visual comparison stimuli are flatter than those from auditory comparisons (e.g. see Wearden et al., 1998, Figure 1, p. 102), so interactions of other factors with group are likely, as well as an overall effect of group, as found above. However, differences between judgements of auditory and visual durations are not the focus of interest here: in our study, auditory and visual stimuli never had their durations directly compared. Our focus was on effects of interference conditions, so to clarify some of the effects obtained in the overall ANOVA we analysed data from each group separately, and tested two main effects (condition: whether short or long interference changed the overall proportion of YES responses; comparison duration: whether different stimulus durations gave rise to different proportions of YES responses) and the condition × comparison duration interaction, which in this case may indicate significant left/right displacement of the gradients.

Auditory/auditory

There was no significant effect of condition, F(1, 11) = 0.16, NS, η² = 0.01, but there was a significant effect of comparison duration, F(6, 66) = 27.11, p < 0.001, η² = 0.71, and a significant condition × comparison duration interaction, F(6, 66) = 2.82, p < 0.05, η² = 0.20.

Visual/visual

There was no significant effect of condition, F(1, 15) = 0.01, NS, η² = 0.001, but there was a significant effect of comparison duration, F(6, 90) = 10.92, p < 0.001, η² = 0.42, and a significant condition × comparison duration interaction, F(6, 90) = 2.71, p < 0.05, η² = 0.15.

Auditory/visual

There was no significant effect of condition, F(1, 16) = 0.29, NS, η² = 0.02, but there was a significant effect of comparison duration, F(6, 96) = 22.78, p < 0.001, η² = 0.59, and a significant condition × comparison duration interaction, F(6, 96) = 2.94, p < 0.05, η² = 0.15.

Visual/auditory

There was no significant effect of condition, F(1, 14) = 1.14, NS, η² = 0.07, but there was a significant effect of comparison duration, F(6, 84) = 19.54, p < 0.001, η² = 0.58, and a significant condition × comparison duration interaction, F(6, 84) = 4.08, p < 0.05, η² = 0.22.

The significant condition × duration interaction found in all cases, coupled with visual inspection of the data in each of the panels of Fig. 1, supports the view that the generalization gradients from each group, in spite of being derived from stimuli which were on average physically identical within the group, were displaced in the direction of the putative interference: to the left in the short interference case and to the right with long interference. However, the significant comparison duration × interference condition interaction may have other causes.

A more direct measure of the effect of the interference conditions might be derived from an analysis of temporal generalization gradient skews. In many cases, although not all, temporal generalization gradients obtained from humans are right-skewed, that is, more YES responses occur to stimuli longer than the standard than to stimuli shorter by the same amount (Wearden, 1992). For the present data, we constructed a skew statistic, by averaging together the proportion of YES responses to comparisons shorter than the standard, and those longer than the standard, and subtracting the longer ones from the shorter. If gradients are right-skewed, then this should produce a negative value (as there are more YES responses to longer comparisons), whereas a leftward skew should produce a positive value (as there are more YES responses to shorter comparisons). The mean values of this measure for the different groups and conditions are shown in Fig. 2.
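The skew statistic just described can be written out as a short Python function. This is a minimal sketch based only on the verbal description in the text; the function name `skew_measure` and the example gradient are hypothetical illustrations, not data from the study.

```python
from statistics import mean

# Comparison/standard ratios used in the temporal generalization task.
RATIOS = (0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6)


def skew_measure(p_yes, ratios=RATIOS):
    """Mean P(YES) for comparisons shorter than the standard minus
    mean P(YES) for comparisons longer than the standard.

    Negative values indicate a right-skewed gradient (more YES responses
    to longer comparisons); positive values indicate a left-skewed one.
    The response to the standard itself (ratio 1) is not used.
    """
    shorter = [p for r, p in zip(ratios, p_yes) if r < 1]
    longer = [p for r, p in zip(ratios, p_yes) if r > 1]
    return mean(shorter) - mean(longer)


if __name__ == "__main__":
    # Hypothetical right-skewed gradient for one participant.
    example = [0.0, 0.1, 0.4, 0.8, 0.6, 0.3, 0.1]
    print(skew_measure(example))  # negative for this right-skewed example
```

Computing the measure once per participant and interference condition yields the values that are averaged for each group and condition.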

Fig. 2. Mean values of the skew measure described in the text (standard error shown as vertical lines) for the different conditions. AAS, AAL auditory/auditory short and long interference; VVS, VVL visual/visual short and long interference. Similarly for the other groups.

As can be seen from inspection of Fig. 2, long interference conditions produced systematically negative values (i.e. more YES responses to comparisons longer than the standard), whereas short interference conditions produced positive values (or zero in the case of the visual/auditory interference condition). Recall, once again, that the comparisons between interference conditions within each group come from judgements of stimuli which were on average physically identical within that group. The skew measures were entered into a repeated-measures ANOVA with interference type (short or long) as the within-subject factor, and group (auditory/auditory, etc.) as the between-group factor. The interference type produced a highly significant effect, F(1, 56) = 16.35, p < 0.001, η² = 0.23, but there was no effect of group, F(3, 56) = 0.15, nor a significant interference type × group interaction, F(3, 56) = 0.11. Thus the values obtained from the skew measure, and its statistical analysis, strongly suggest that the interference conditions were systematically displacing the generalization gradients in the direction of the putative interference: to the left in short interference conditions, and to the right in long interference ones.

Discussion

Obviously, the general pattern of results from all four groups was strikingly similar. The type of interference never produced an overall change in the proportion of YES responses, people were highly sensitive to comparison duration (of course, this is only to be expected in a timing experiment), and the temporal generalization gradients in each panel were always displaced from one another in the direction of the duration of the standards and comparisons in the interference blocks. The gradient shifts were small, as were effect sizes for the interaction, but this is unsurprising: within each group the comparison durations in the test blocks were on average physically identical, and the instructions required participants to judge comparison durations with respect to the standard presented at the start of each block. There was thus no obvious reason for participants to be influenced by the values of stimuli in other blocks, although they clearly were, even when the stimuli were in a different modality from that used in the test blocks. The fact that effect sizes for the interference type × duration interactions were similar for unimodal and cross-modal conditions, and the fact that there was no significant effect of group when the skew analysis was performed, might suggest that unimodal and cross-modal interference effects were of similar size, but this conclusion may be premature. The types of stimuli we use here have been employed in previous work (e.g. Wearden et al., 1998, 2006) and robust effects of mean subjective duration differences were obtained, with the auditory stimuli having estimated durations around 20 % longer than the visual ones. In the present case, therefore, a 200 ms visual stimulus is further away from a 400 ms auditory one on the subjective time scale than a 200 ms auditory stimulus is. In contrast, a 600 ms visual stimulus will be expected to have a subjective duration which is closer to that of a 400 ms auditory stimulus than the subjective duration of a 600 ms auditory stimulus is, and so on for the other cross-modal conditions. This sort of consideration obviously complicates comparisons of unimodal and cross-modal interference effects, and suggests that concluding that they are of identical size is risky.
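The arithmetic behind this argument can be made concrete with a small worked example. The sketch below rests on strong simplifying assumptions that go beyond the text: subjective duration is taken to be a linear function of real duration, visual durations are given a gain of 1, and auditory durations a gain of 1.2 (the "around 20 % longer" figure). The names `AUDITORY_GAIN`, `subjective`, and `distance_to_test` are hypothetical.

```python
# Assumption: auditory stimuli are judged about 20 % longer than visual ones,
# modelled here as a simple multiplicative gain on a linear subjective scale.
AUDITORY_GAIN = 1.2
VISUAL_GAIN = 1.0


def subjective(duration_ms, modality):
    """Subjective duration (arbitrary units) under the linear-gain assumption."""
    gain = AUDITORY_GAIN if modality == "auditory" else VISUAL_GAIN
    return duration_ms * gain


def distance_to_test(interference_ms, interference_modality,
                     test_ms=400, test_modality="auditory"):
    """Absolute subjective distance between an interference standard
    and a test standard."""
    return abs(subjective(interference_ms, interference_modality)
               - subjective(test_ms, test_modality))


if __name__ == "__main__":
    for ms in (200, 600):
        for modality in ("visual", "auditory"):
            d = distance_to_test(ms, modality)
            print(f"{ms} ms {modality} vs 400 ms auditory: {d:.0f} units")
```

Under these assumptions a 400 ms auditory test standard sits at about 480 units, so a 200 ms visual interference standard is about 280 units away and a 200 ms auditory one about 240 units away, whereas a 600 ms visual standard is about 120 units away and a 600 ms auditory one about 240 units away. The interfering standards are therefore not matched in subjective distance across unimodal and cross-modal conditions.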

In general, it is hard to imagine how the interference between blocks which used stimuli in different modalities could have occurred if there were no “common code” for auditory and visual durations at some level in the human timing system. Our results show that time representations in this common code exerted an influence on performance, even though our experiment, unusually for this research area, never required people to compare the duration of auditory and visual stimuli. Although our study establishes by experiment that some common code for durations in different modalities is probably formed in our experiment (as opposed to supporting this suggestion by inspection of results from different studies, as mentioned earlier), it cannot conclusively tell us where in the timing system this common code resides. One possibility is that “standards” which are used as the basis for judgements of a number of comparison stimuli exist in an amodal form, even though the initial timing of auditory and visual stimuli is accomplished by different mechanisms. This might suggest different modality-specific “clocks” which produce the initial duration representations, followed by some transformation into an amodal form if these representations are identified as standards. This position is similar to that of Stauffer et al. (2012), who argued that both modality-specific and modality-independent timing components exist. Their article, however, found correlations between performance indices on some auditory and visual tasks where it is less likely that common standards might be generated than in our procedure (e.g. rhythm perception tasks with auditory and visual markers, see their Table 2, p. 26), so it may be that processes common to auditory and visual timing occur even when no standards are used. Our results also suggest that if an amodal common standard is the basis of the cross-modal interference effects we obtain here, its production is “automatic” at least in the sense that it does not depend on the task requiring participants to directly compare the duration of stimuli in different modalities, something which our procedure never involves.

However, another explanation of the effects obtained here which may not imply this kind of amodal representation of duration comes from the idea of “memory mixing” derived from work of Penney, Gibbon, and Meck (2000). Their study investigated timing performance with both auditory and visual stimuli. To simplify the procedure of their Experiment 1 slightly, 3 and 6 s durations in the auditory and visual modality were presented as short and long standards in a bisection task. Then, stimulus durations ranging between 3 and 6 s were presented as either single auditory or visual stimuli, or given in a simultaneous presentation. The task required the participants to classify the duration of each comparison stimulus as short or long relative to the standards presented previously. The normal result that auditory stimuli were judged as longer than visual ones was obtained. Penney et al. explained their results by supposing that (a) more temporal accumulation occurred with auditory stimuli than visual ones, and (b) the auditory and visual standard durations gave rise to a “mixed” duration representation used to classify the comparison stimuli. When the different modalities were presented in different sessions, no auditory/visual difference was found, a result interpreted as resulting from the absence of “memory mixing”. In the present case, the interference between the different conditions could be accounted for by a similar mixing process, so standards in short interference blocks could mix with those in test blocks, thus shortening the standard representation in those blocks, with the standards in the long interference blocks producing the opposite effect.

However, this explanation is subject to two qualifications. For one thing, the existence of memory mixing between stimuli in different modalities surely presupposes some sort of “common code” for time between the modalities, as it is hard to see how any mixing could occur if this were not the case. For a mixture to occur, the ingredients involved must be “mixable”. However, it may be that no general amodal time representation in the sense of Stauffer et al. (2012) is formed. For another thing, Wearden et al. (2006) investigated the conditions under which auditory/visual differences in duration judgements occurred. Their Experiment 1 used a bisection task, with standards being presented at the start of each block, followed by comparison stimuli which had to be compared with those standards, a procedure rather similar to that used in the present study. When standards and comparisons were in the same modality (a condition that always held in the present study), no auditory/visual differences were found (see the upper panel of their Figure 1, p. 1712), suggesting that the “segregation” of standards and comparisons within blocks eliminated memory mixing, although Wearden et al.’s Experiment 2, designed to encourage potential memory mixing, found the usual auditory/visual difference, as Penney et al. (2000) would predict.

In general, progress in understanding modality effects in timing probably requires the use of methods that are novel for the research area, one of which is represented here, rather than what are in effect increasingly sophisticated demonstrations of the fact, known for more than 100 years, that auditory stimuli seem to last longer than visual ones. Areas of interest might include investigation of potential auditory–visual interference in timing procedures which do not employ standards which are relevant for a number of trials (e.g. Wearden & Bray, 2001), studies of interactions of timing and non-timing tasks in the same or different modalities, and task-switching. The latter procedure has recently been employed in a timing study by Wearden, O’Rourke, Matchwick, Zhang, and Maeers (2010), where it was shown that switching from a rapidly executed arithmetic task to timing of a tone changed perceived tone duration. There are many potential variants of this procedure which might be useful in investigating modality effects on timing, and thus making progress with the puzzling question of why the type of stimulus timed should have such strong effects on the perception of duration, which nineteenth century time psychology has bequeathed to us.