Humans and many other species direct their attention at locations in space (without making eye movements) not only because they expect something to occur over there (endogenous orienting) but also because of sudden changes at that location (exogenous orienting). Several studies (Theeuwes 1991; Yantis and Jonides 1990) have been performed on the relation between exogenous and endogenous orienting, and these suggest that exogenous orienting effects of irrelevant visual onsets vanish when attention is focused elsewhere. In the current study, we examined whether this also applies to sudden irrelevant auditory onsets, and whether it holds when irrelevant visual onsets occur far in the periphery.

Both exogenous and endogenous orienting have been studied by employing variants of the Posner paradigm (Posner and Cohen 1984). Typically, visual targets that have to be detected or discriminated occur at two possible positions, and either an abrupt visual onset that is unpredictive of the forthcoming target position precedes the target, or an arrow points to the likely locus of the target. Responses are mostly faster and more accurate when the target occurs at the cued position than when it appears elsewhere. The time courses of exogenous and endogenous orienting effects are different. The orienting effect evoked by abrupt onsets reaches a maximal benefit with a stimulus onset asynchrony (SOA) between cue and target of about 200 ms, but it disappears or even reverses to become detrimental for SOAs longer than 500 ms. Endogenous orienting effects develop more gradually, reaching an asymptote with an SOA of approximately 300 ms (Cheal and Lyon 1991), and remain stable with SOAs of up to 1200 ms (Müller and Rabbitt 1989). In addition, some studies suggest that attention may be less concentrated in the case of endogenous orienting than in the case of exogenous orienting, as no expectancy effects were observed on target detection times when expected and unexpected target locations occurred within the same hemifield (Hughes and Zimba 1985; but also see Cepeda et al 1998; Scharlau 2004), whereas varying the size of the exogenous cues within the relevant hemifield had profound effects on target discrimination times in other studies (see Van der Lubbe and Woestenburg 2000; Van der Lubbe and Keuss 2001).

Recent studies (McDonald and Ward 2000; Schmitt et al 2000; Spence and Driver 1997) have demonstrated that not only the locus of visual onsets, but also the locus of auditory onsets affects responses to visual targets. The latter effect is denoted as a crossmodal orienting effect to indicate the transfer of the orienting effect from one modality to another (here from auditory to visual). To obtain clear exogenous orienting effects of auditory stimuli, however, rather wide separations between cued and uncued stimuli have to be employed, as spatial resolution is poorer in audition than in vision (Julesz and Hirsh 1972; Spence and Driver 1996). Interestingly, with regard to the origin of this crossmodal orienting effect, the study by McDonald et al (2003) indicated that it may result from feedback from spatial representations in multimodal superior temporal cortex to ventral visual areas.

Initially, exogenous orienting was thought to take place automatically (Jonides 1981; Müller and Rabbitt 1989) whereas endogenous orienting by definition was considered to be under strategic or top-down control. For example, Müller and Rabbitt (1989) presented advance symbolic cues that indicated the probable locations of to-be-discriminated targets and observed that endogenous orienting effects of symbolic cues were interrupted by exogenous orienting effects induced by random onsets, which accords with the view that exogenous orienting is fully automatic. Later studies, however, reported that onsets do not capture attention when participants are in a highly focused attentional state (Theeuwes 1991; Yantis and Jonides 1990). Specifically, in Theeuwes’ study, exogenous orienting effects of irrelevant onsets (at about 4°) were no longer present when a central arrow, presented in advance, reliably indicated the location of a target letter in a four-letter display. This finding led Theeuwes to the suggestion that “ ... outside the focus of attention, abrupt transients are not capable of attracting attention”. This view became generally accepted, and was extended to the account of contingent capture by Folk et al (1992), which assumes that attentional capture is completely under top-down control. Although this may be true with regard to unimodal visual settings (but see our results), the question may be raised as to whether this also applies to crossmodal settings with irrelevant auditory onsets and to-be-attended visual stimuli.

Regarding the relation between endogenous and exogenous orienting within the auditory domain, Treisman (1960) showed that when attention is directed at one ear, there may still be breakthrough of information from the ear to be ignored, indicating that endogenous orienting does not prevent advanced processing from to-be-ignored auditory onsets. With regard to crossmodal settings, Spence et al (2000; experiment 1) revealed that when visual attention was focused at a specific location to read lip movements, there was a drop in shadowing performance (repetition of an auditory message) when an irrelevant auditory stream suddenly changed its position. As a consequence, it appears that we may become distracted by the locus of an irrelevant auditory event when our visual attention is focused elsewhere, which also seems to imply that endogenous orienting does not suppress auditory exogenous orienting effects. Apart from spatial orienting effects, auditory stimuli are known to increase our alertness (e.g. see Fernandez-Duque and Posner 1997), and these effects may also be modulated by endogenous orienting, or, the other way round, it may be more difficult to keep attention focused in the case of highly alerting stimuli.

Two experiments were performed in which we further examined the relation between endogenous and exogenous orienting in both unimodal visual settings and crossmodal settings with to-be-discriminated visual targets and irrelevant visual or auditory onsets. We employed a discrimination task because cueing effects in this task are more easily interpretable than in detection tasks, in which such effects may be due to changes in decision criteria (see Shaw 1984). We additionally examined whether alerting effects of auditory and possibly visual events are modulated by endogenous orienting. Visual targets and irrelevant auditory or visual onsets appeared to the left or the right of fixation. Arrows (always valid) or warning cues preceded visual targets. On most trials, targets were preceded by either irrelevant auditory or irrelevant visual abrupt onsets, but on other trials no onsets were presented, thereby enabling the establishment of alerting effects. The time intervals from arrow onset until target onset and from abrupt onset until target onset were set at 1000 and 200 ms, respectively, which seem appropriate for studying endogenous and exogenous orienting effects. In our first experiment, targets and onsets were presented far to the left and the right to ensure that we would obtain crossmodal orienting effects in the conditions without top-down control (in the case of warning cues).

Experiment 1

Methods

Participants

Informed consent was obtained from 17 students, who were paid €14 for their participation. The data from one of them were excluded, as performance was near to chance in conditions with visual cues, probably due to poor vision, which left 16 participants (mean age 20.4 years, five males, one left-handed). The study was approved by a local ethics committee.

Stimuli

Stimuli (Fig. 1) occurred on three units (21 × 12 cm), each consisting of a sound-passing 8×8 green LED display (10×10 cm) in front of a loudspeaker. Units were placed at a distance of 160 cm from the participant, one in front of the participant, and two lateral units at 86 cm to the left and the right of the middle unit, implying a visual angle of 28.3°. Each trial started with a fixation dot (0.7×0.7°) on the middle unit, which was replaced for 200 ms by a symbolic cue, either an arrow (2.1×2.9°) pointing to the left or right, or a warning cue (with the same number of active LEDs as the arrow). Eight hundred milliseconds after onset of the symbolic cue, either a visual (0.2×3.1°) or an auditory onset (a burst of white noise) was presented for 50 ms, with equal probability on the left or the right unit, or no onset occurred. A thousand milliseconds after onset of the symbolic cue the visual target (a triangle, 2.6×1.4°, pointing upwards or downwards) appeared for 100 ms, either on the side indicated by the arrow or, in the case of the warning cue, with equal probability on the left or right unit. The next trial started 1500 ms after a response or 2300 ms after target onset.
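The stated eccentricity can be checked from the reported geometry. The following is a minimal sketch, assuming the three units stood in a flat row perpendicular to the line of sight, so that the angle is the arctangent of lateral offset over viewing distance; the function name is introduced here for illustration only.

```python
import math

def eccentricity_deg(lateral_offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle in degrees between fixation and a point displaced sideways.

    Assumption: the displaced point lies in the same frontal plane as the
    fixated middle unit (flat arrangement of the three units).
    """
    return math.degrees(math.atan2(lateral_offset_cm, viewing_distance_cm))

# Lateral units at 86 cm from the middle unit, viewed from 160 cm.
print(round(eccentricity_deg(86, 160), 1))
```

Under this assumption the result rounds to the 28.3° given in the text.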

Fig. 1

An example of the stimuli in the task with visual targets and visual abrupt onsets, which were presented on three units, here presented in a row (for details see “Methods”). The trial starts from the bottom, and the moment at which each event occurred is indicated along the time axis

Task and procedure

Six blocks of 192 trials were presented. Forty different trial types were constructed. Eight trial types (target type × side × symbolic cue) without onsets were presented eight times each per block, which served to obtain baselines. The same eight trial types could occur with visual or auditory onsets, presented on the target or the non-target side, being displayed each four times per block. The type of trial varied randomly within a block of trials.
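The trial counts above can be verified by enumerating the design. This is a sketch only: the labels are illustrative placeholders, and the shuffling stands in for whatever randomization the presentation software actually used.

```python
import random
from itertools import product

TARGET_TYPES = ("up", "down")
TARGET_SIDES = ("left", "right")
SYMBOLIC_CUES = ("arrow", "warning")
ONSET_MODALITIES = ("visual", "auditory")
ONSET_SIDES = ("target side", "non-target side")

# Eight basic trial types: target type x side x symbolic cue.
basic = list(product(TARGET_TYPES, TARGET_SIDES, SYMBOLIC_CUES))

# Baseline trials carry no onset; onset trials add modality and side of onset.
no_onset = [(t, s, c, None, None) for t, s, c in basic]
with_onset = [
    (t, s, c, modality, onset_side)
    for (t, s, c), modality, onset_side in product(basic, ONSET_MODALITIES, ONSET_SIDES)
]

trial_types = no_onset + with_onset

# Each no-onset type occurs eight times per block, each onset type four times.
block = no_onset * 8 + with_onset * 4
random.shuffle(block)  # trial type varied randomly within a block

print(len(trial_types), len(block))
```

The enumeration gives 8 + 32 = 40 trial types and 64 + 128 = 192 trials per block, in agreement with the numbers reported.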

Participants had to press a left or right button for a triangle pointing upwards or downwards, and were told to ignore the irrelevant onsets. Responses were to be as fast and accurate as possible and eye movements were to be avoided. Before the experimental part, participants had to indicate the side (left/right) of the auditory target, and a score of at least 95% correct was required to participate in the experiment.

Apparatus and recording

Participants were seated in an armchair in a silent and darkened chamber. Response buttons were placed on a hand-rest in front of the participant. Presentation of stimuli and the emission of triggers signaling the moment and the type of stimulus were controlled by a CMO-module (Version 3.7f, developed in cooperation with IGF, Physics department). The triggers were received by Vision Recorder (Version 1.0b, BrainProducts GmbH), which additionally measured the horizontal and vertical electrooculogram (EOG) and button presses. EOG was recorded at a rate of 250 Hz (TC=5.0 s, low-pass 100 Hz) from Ag/AgCl ring electrodes placed above and below the left eye and at the outer canthi of both eyes. The baseline was determined from −100 to 0 ms before presenting the symbolic cue. Trials with amplitudes exceeding 60 μV on the EOG channels from onset of the symbolic cue until target offset were excluded to rule out the possible contribution of eye movements.
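The rejection rule can be illustrated with a short sketch. It assumes that each single-channel epoch is a list of samples in μV starting 100 ms before the symbolic cue, and that the checked window runs for 1100 ms (target onset at 1000 ms plus 100 ms target duration); the function and parameter names are hypothetical and do not describe the software that was actually used.

```python
def has_eye_movement(epoch_uv, sfreq_hz=250, baseline_ms=100,
                     window_ms=1100, threshold_uv=60.0):
    """Return True if the baseline-corrected EOG exceeds the threshold.

    Assumption: epoch_uv[0] is the sample 100 ms before cue onset, so the
    first baseline_ms of the epoch form the baseline interval.
    """
    n_base = int(round(baseline_ms * sfreq_hz / 1000))  # 25 samples at 250 Hz
    n_win = int(round(window_ms * sfreq_hz / 1000))     # 275 samples at 250 Hz
    baseline = sum(epoch_uv[:n_base]) / n_base
    window = epoch_uv[n_base:n_base + n_win]
    return any(abs(v - baseline) > threshold_uv for v in window)

# Hypothetical epochs: a flat trace, and one with an 80 uV deflection.
flat = [0.0] * 350
saccade = flat.copy()
saccade[100] = 80.0
print(has_eye_movement(flat), has_eye_movement(saccade))
```

A trial would be dropped if this check returned True on either the horizontal or the vertical EOG channel.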

Data analysis

Premature (<100 ms) and slow responses (>1500 ms) and errors were excluded from the RT analyses. The data were collapsed across response and stimulus side.
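As a sketch of this exclusion step, assuming each trial is stored as a dictionary with hypothetical keys `rt_ms` (None for a miss) and `correct`:

```python
def valid_rts(trials, fastest_ms=100, slowest_ms=1500):
    """Keep RTs of correct responses that are neither premature nor slow."""
    return [
        t["rt_ms"]
        for t in trials
        if t["rt_ms"] is not None
        and t["correct"]
        and fastest_ms <= t["rt_ms"] <= slowest_ms
    ]

# Hypothetical trials: premature, regular, error, slow, and missed response.
trials = [
    {"rt_ms": 80, "correct": True},
    {"rt_ms": 640, "correct": True},
    {"rt_ms": 655, "correct": False},
    {"rt_ms": 1600, "correct": True},
    {"rt_ms": None, "correct": False},
]
print(valid_rts(trials))
```

Only the 640 ms response survives the three criteria in this example.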

In a first analysis, RTs and the proportion of correct responses were evaluated with a repeated-measures ANOVA for the conditions containing onsets, with the factors symbolic cue (arrow or warning cue), modality of onset (visual or auditory), and side of onset (target or non-target), to examine whether exogenous orienting effects were modulated by top-down control and by the modality of the irrelevant onset. In a second analysis, we examined whether there were effects or interactions with the factors symbolic cue and type of onset (no onset, visual or auditory onset), independent of the side of onset, to determine whether alerting effects of irrelevant visual or auditory onsets were affected by top-down control. Huynh-Feldt epsilon correction was applied to adjust the degrees of freedom whenever appropriate.
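Because every factor in the first analysis has two levels, each main effect or interaction in such a fully within-subjects design is equivalent to a one-sample t-test on a per-participant contrast score, with F(1, n−1) = t². The sketch below illustrates this for the symbolic cue × side of onset interaction; the function names, dictionary keys, and data are hypothetical, and the sketch is not a description of the statistical software that was used.

```python
from statistics import mean, variance

def contrast_f(scores):
    """F(1, n-1) for a within-subjects contrast: the squared one-sample t."""
    n = len(scores)
    return n * mean(scores) ** 2 / variance(scores)

def cue_by_side_contrast(cell_means):
    """Per-participant interaction score.

    Each element of cell_means maps (symbolic cue, side of onset) to a mean RT.
    The score is the side-of-onset effect after warning cues minus the
    side-of-onset effect after arrows.
    """
    return [
        (p[("warning", "non-target")] - p[("warning", "target")])
        - (p[("arrow", "non-target")] - p[("arrow", "target")])
        for p in cell_means
    ]

# Hypothetical cell means (ms) for three participants.
data = [
    {("warning", "target"): 670, ("warning", "non-target"): 700,
     ("arrow", "target"): 640, ("arrow", "non-target"): 669},
    {("warning", "target"): 690, ("warning", "non-target"): 722,
     ("arrow", "target"): 660, ("arrow", "non-target"): 690},
    {("warning", "target"): 650, ("warning", "non-target"): 683,
     ("arrow", "target"): 630, ("arrow", "non-target"): 660},
]
print(contrast_f(cue_by_side_contrast(data)))
```

An F value near zero for this contrast corresponds to a side-of-onset effect that does not differ between arrows and warning cues. The second analysis includes a three-level factor (type of onset), for which this shortcut does not apply to the omnibus test.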

Results

Mean RTs are displayed in Fig. 2 and the proportions of correct responses are listed in Table 1. Trials with eye movements (13.9%) were removed from further analyses. No premature responses occurred, whereas misses were present on 1.35% of the trials.

Fig. 2

Mean RTs for experiment 1 in which arrows (always valid) or warning cues preceded visual targets by 1000 ms. Two hundred milliseconds before the target, either a visual or an auditory onset occurred at the target or the non-target side, or no onset occurred. Targets and irrelevant onsets occurred at 28.3° from fixation. Error bars were determined per type of onset, by employing the method for multifactor within-subjects designs advocated by Loftus and Masson (1994)

Table 1 Proportion of correct responses (in percent) as a function of modality of onset, side of onset, and symbolic cue in experiments 1 and 2

RT

The first analysis with the factors symbolic cue, modality of onset, and side of onset showed faster discrimination responses after arrows (657 ms) than after warning cues (684 ms), F(1,15)=33.6, p<0.001, faster responses when the onset was auditory (645 ms) than when it was visual (695 ms), F(1,15)=59.8, p<0.001, and faster responses when the onset occurred on the target side (655 ms) than when it occurred on the non-target side (686 ms), F(1,15)=20.4, p<0.001. Thus, top-down control was effective and the abrupt onsets induced exogenous orienting effects. However, no interaction was observed between symbolic cue and side of onset, F(1,15)=1.2, and no interaction between symbolic cue, side of onset, and modality of onset was found, F(1,15)=0.1, signifying that top-down control had no impact on exogenous orienting effects of abrupt onsets. An interaction was found between symbolic cue and modality of onset, F(1,15)=6.3, p=0.024, which indicated that the difference between trials with visual and auditory onsets was larger in the case of warning cues (60 ms) than in the case of arrows (40 ms).

The second analysis with the factors symbolic cue and type of onset (no, visual or auditory) again revealed that responses were faster in the case of arrows (663 ms) than in the case of warning cues (690 ms), F(1,15)=41.2, p<0.001, and faster in the case of auditory onsets (645 ms) than in the case of visual (695 ms) or no onsets (688 ms), F(2,30)=37.9, p<0.001. Contrast analyses revealed no difference between no onsets and visual onsets, F(1,15)=1.2, but faster responses after auditory onsets than after visual onsets or no onsets, F(1,15)>59.8, p<0.001. A trend toward an interaction was found between type of onset and symbolic cue, F(2,30)=2.9, p=0.072, which is due to a difference between trials with visual and auditory onsets that depends on the type of endogenous cue (see analysis 1).

Proportion of correct responses

The first analysis revealed that responses were less accurate in the case of visual (93.3%) than in the case of auditory onsets (95.0%), F(1,15)=11.4, p=0.004. An interaction was observed between symbolic cue and side of onset, F(1,15)=9.1, p=0.009. For warning cues an advantage of 0.95% was found when the onset occurred on the target side, but a reversed effect of 1.12% was found in the case of arrows, which could indicate that effects of side of onset on RT are partially due to a speed-accuracy trade-off. To control for this, correlations were determined between the effects of side of onset on RT and on the proportion of correct responses for the arrows and warning cues per modality of abrupt onset (for a comparable procedure, see Yantis and Jonides 1990). Correlations were far from significant (p values > 0.49); thus, effects of side of onset on RT were not related to effects on the proportion of correct responses.
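The trade-off check amounts to correlating, across participants, the side-of-onset effect on RT with the side-of-onset effect on accuracy. A minimal sketch follows, with a hand-written Pearson coefficient and hypothetical effect scores; under the sign convention assumed here (RT effect = non-target minus target, accuracy effect = target minus non-target), a reliably negative correlation would point to a trade-off.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-participant effects of side of onset.
rt_effect_ms = [25.0, 40.0, 31.0, 18.0, 36.0]
accuracy_effect_pct = [0.5, -1.0, 1.2, 0.3, -0.4]
print(round(pearson_r(rt_effect_ms, accuracy_effect_pct), 2))
```

In the reported analysis, one such correlation was computed for each combination of symbolic cue and onset modality.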

The second analysis with the factors symbolic cue and type of onset revealed a main effect of onset, F(2,30)=9.3, p=0.001. Contrast analyses showed that responses were less accurate in the case of visual onsets (93.3%) than in the case of auditory (95.0%) or no onsets (94.9%), F(1,15)>11.4, p<0.005, whereas no difference was observed between no onsets and auditory onsets, F(1,15)=0.1.

Discussion

Several conclusions can be drawn on the basis of the RT and accuracy data. First, top-down control (or endogenous orienting) as manipulated by the employment of arrows or warning cues was effective, which can be inferred from the speeding-up of responses when the target side was reliably indicated by the arrow. Second, exogenous orienting effects of to-be-ignored visual and auditory onsets were evidently present, as indicated by slower responses when onsets occurred on the non-target side. Third, alerting effects of auditory onsets were found, indicated by faster responses after auditory onsets, than after visual onsets or no onsets. Regarding the questions of interest, endogenous orienting did not eliminate exogenous orienting effects, and alerting effects of auditory onsets were not reduced due to foreknowledge of the target position. Thus, on the basis of our findings it may be suggested that irrelevant visual and auditory stimuli outside the focus of attention are still capable of attracting attention. A likely reason is that our stimuli were presented far in the periphery. For example, the potency of irrelevant stimuli to attract attention may be high at eccentric locations, as ignoring such stimuli in more natural conditions might be disastrous. Consequently, our observation that irrelevant stimuli occurring at a side to be ignored attract attention may be an exception rather than a rule. This issue was further explored in our second experiment, by presenting stimuli at a shorter distance from fixation.

Experiment 2

Methods

Methods were the same as in experiment 1, except for the distance between the two peripheral units and the center unit, which was set at 56 cm, which implies a visual angle of 19.3°.

Participants

Informed consent was obtained from 16 students, who were paid €14 for their participation (mean age 22.1 years, three males, one left-handed).

Results

Mean RTs are displayed in Fig. 3, and the proportions of correct responses are given in Table 1. Trials with eye movements (22.5%) were removed from further analyses. No premature responses occurred, whereas misses were present on 1.72% of the trials.

Fig. 3

Mean RTs and error bars for experiment 2 in which arrows (always valid) or warning cues preceded visual targets by 1000 ms. Two hundred milliseconds before the target either a visual or an auditory onset occurred at the target or the non-target side, or no onset occurred. Targets and irrelevant onsets occurred at 19.3° from fixation. Error bars were determined per type of onset, by employing the method for multifactor within-subjects designs advocated by Loftus and Masson (1994)

RT

The first analysis for the trials with onsets, with the factors symbolic cue, modality of onset, and side of onset, confirmed that responses were faster in the case of arrows (643 ms) than in the case of warning cues (659 ms), F(1,15)=7.3, p=0.017, and faster when the onset was auditory (633 ms) than when it was visual (669 ms), F(1,15)=21.5, p<0.001. Responses were faster when the onset occurred on the target side (632 ms) than when it occurred on the non-target side (670 ms), F(1,15)=68.9, p<0.001, and this effect was unaffected by symbolic cue, F(1,15)=0.6. The effect of side of onset tended to be larger with visual onsets (47 ms) than with auditory onsets (28 ms), F(1,15)=3.6, p=0.076, but no other effects were found, F values < 0.7.

The second analysis, with the factors symbolic cue and type of onset (no, visual or auditory), again revealed that responses were faster in the case of arrows (651 ms) than in the case of warning cues (668 ms), F(1,15)=11.6, p=0.004, and suggested that responses were faster in case of auditory onsets (633 ms) than in case of visual (669 ms) or no onsets (678 ms), F(2,30)=24.4, p<0.001. Contrast analyses confirmed that responses were faster after auditory onsets than after no onsets or visual onsets, F(1,15)>21.5, p<0.001, whereas no difference was found between visual onsets and no onsets, F(1,15)=1.4. No interaction was found between type of onset and endogenous cue, F(2,30)=0.4.

Proportion of correct responses

The first analysis for the trials with onsets showed that responses were more accurate in the case of arrows (95.6%) than in the case of warning cues (94.2%), F(1,15)=7.5, p=0.015, but no other effects were observed, F values < 1.4. The second analysis with the factors symbolic cue and type of onset revealed a trend toward an effect of symbolic cue, F(1,15)=4.0, p=0.064, which reflects the effect found in the first analysis. No significant effect of type of onset was found, but a slight trend toward an interaction between type of onset and symbolic cue was present, F(2,30)=2.7, p=0.087. However, additional contrast analyses indicated that the effect of symbolic cue for visual onsets (1.6%) and auditory onsets (1.2%) did not differ from that for trials without onsets (0.0%), F(1,15)<2.6, p>0.13.

Discussion

The results replicate the findings from experiment 1, and extend them to settings with stimuli at a shorter distance from fixation. Top-down control was effective, and irrelevant visual and auditory onsets induced exogenous orienting effects, although the crossmodal exogenous orienting effect tended to be smaller than the unimodal exogenous orienting effect. In addition, an alerting effect of auditory stimuli was found. Importantly, again no influence of top-down control on exogenous orienting and alerting effects was found.

General discussion

The common view on the interplay between endogenous and exogenous spatial attention holds that abrupt onsets occurring outside the current focus of attention are not capable of attracting attention (Yantis and Jonides 1990; Theeuwes 1991). The question was raised as to whether this also applies to crossmodal settings with irrelevant auditory onsets, and whether it holds for unimodal settings with stimuli occurring in the peripheral visual field.

In two experiments, to-be-ignored auditory onsets clearly induced a crossmodal exogenous orienting effect, demonstrated by an effect of their locus on the discrimination speed of visual targets. This was not only the case when the auditory onsets occurred at 28.3° from fixation (experiment 1), but also when they appeared less peripherally, at 19.3° (experiment 2). In both experiments, top-down control of spatial attention (endogenous orienting), as manipulated by presenting either reliable arrow cues or warning cues, was effective, but the crossmodal orienting effect was unaffected by this manipulation. So, a first conclusion that can be drawn is that top-down control does not eliminate crossmodal exogenous orienting effects arising from to-be-ignored auditory stimuli.

Unexpectedly, the same pattern of results was obtained in the case of to-be-ignored visual onsets. In both experiments, irrelevant visual onsets induced strong exogenous orienting effects, which were unaffected by top-down control. This was not only the case for stimuli far in the periphery but also when they occurred nearer to fixation, which indicates that this capture effect of to-be-ignored stimuli is not exceptional. This raises the question of why our findings are different from the findings reported by Theeuwes (1991) and Yantis and Jonides (1990). A crucial aspect seems to be that our stimuli were displayed at more peripheral locations than usual, which we chose because of the poorer spatial resolution in audition. It may be argued that potentially threatening stimuli in the periphery require a very rapid adaptation, and because of that, exogenous orienting effects of peripheral stimuli may be much stronger than of centrally presented stimuli. Alternatively, the effectiveness of top-down control may be much stronger near fixation (for example, up to 6°) than far from fixation (for example, >19°), as attention may be spread over a much larger region (but see Hughes and Zimba 1985). Nevertheless, our findings indicate that the view that abrupt onsets outside of the focus of attention are no longer capable of attracting attention is not generally true. Another possibly relevant aspect is that we did not employ a search task, which implies that abrupt onsets in our experiments may be more salient than in the studies by Theeuwes (1991) and Yantis and Jonides (1990). Clearly, further experiments seem to be required to determine the limits of top-down control of spatial attention.

A relevant aspect for the study of top-down control concerns the possibility that the arrows, as employed in the current study but also in the studies of Theeuwes (1991) and Yantis and Jonides (1990), not only produce endogenous orienting effects, but also exogenous orienting effects. Namely, some recent studies observed that unpredictive arrows induce automatic orienting effects, which are likely due to overlearning of the meaning of these symbolic cues (Hommel et al 2001; Tipples 2002). It appears that this potential effect played only a minor role in the current study, as our exogenous orienting effects of irrelevant onsets were the same for arrows and for warning cues. Nevertheless, future studies may exclude this potential problem by employing other cues. For example, use of a diamond cue consisting of a green and a red triangle, each pointing to one side, with the instruction to attend to the side indicated by the red or the green triangle in different conditions (see Nobre et al 2000) avoids this problem.

With regard to alerting effects of auditory stimuli, both experiments revealed that responses were faster in the case of irrelevant auditory onsets than in the case of irrelevant visual onsets or no onsets, which was not at the cost of accuracy. This improvement in performance with auditory onsets does not seem to be due to the use of foreknowledge of the moment of target onset (endogenous temporal orienting; see Coull et al 2000), as no improvement was found with visual onsets. Nevertheless, we cannot exclude the possibility that the effect obtained is partially dependent on the predictive value of the auditory onsets. Most importantly, the data from both experiments indicate that the speeding up of responses after auditory onsets was independent of top-down control, or, put differently, there was no support for the notion that top-down control was distorted due to effects of auditory stimuli.

In conclusion, neither crossmodal nor unimodal exogenous orienting effects of peripheral auditory and visual onsets were modulated by endogenous orienting. These findings indicate that irrelevant auditory and visual events outside the focus of our attention are still capable of attracting our attention.