Introduction

It is common for humans engaged in demanding tasks to deploy their attentional resources to a single stimulation channel while putatively withdrawing resources from potentially competing channels (e.g. stimuli from irrelevant parts of space or other sensory modalities). This deliberate deployment of attention is known as endogenous (voluntary) attention. Implicit in the concept of endogenous attention is the exercising of an intention to selectively attend, based on some internal representation of what will be attentionally relevant in the near future. This intention interacts with attention deployment systems to reorganize the attentional set of the brain in preparation for incoming stimuli—a preparatory attentional state. Understanding the nature of such preparatory states and how top-down mechanisms both establish and maintain these states will be critical to our understanding of attention mechanisms in the human brain. Recent evidence strongly supports the contention that the brain can be placed in biased attentional states in anticipation of attentionally relevant stimuli (e.g. Luck et al. 1997; Foxe et al. 1998; Chawla et al. 1999; Kastner et al. 1999; Shulman et al. 1999; Corbetta et al. 2000; Hopf and Mangun 2000; Hopfinger et al. 2000; Nobre et al. 2000; Worden et al. 2000; Fu et al. 2001) and that a network of frontal and parietal areas may give rise to such biases (e.g. Posner and Petersen 1990; Corbetta 1998; Kastner et al. 1999).

A central question is the neural level at which such biased preparatory states occur. For instance, early sensory areas could be placed in a more receptive (or less receptive) state in preparation for subsequent selective processing, which would likely manifest as baseline shifts in activity levels in early sensory areas during the preparatory phase. Alternatively, preparatory biasing might occur in higher-level “control” areas, which would in turn interact with earlier sensory areas when the subsequent imperative stimulus arrived; this is a more dynamic, interactive process than the simple sensory gain mechanism proposed above. Such a biased state would manifest as increased activity during the preparatory phase in “higher” level areas, such as regions of the frontal and parietal cortices. Of course, these two mechanisms are not mutually exclusive, and both may occur.

The current study was designed to address this issue of preparatory attentional biasing in the context of an inter-sensory selective attention paradigm. Our primary goal was to investigate cue-specific deployment of inter-sensory selective attentional processes that were specifically invoked when attention was deployed to either the visual or auditory sensory modality. We used symbolic cue-words to direct attention to either the visual or auditory portions of a subsequent compound audio-visual imperative stimulus. High-density electrical mapping of the brain activity in the period following the cue but preceding the imperative stimulus allowed us to investigate the brain mechanisms and neural regions involved in the process of attentional deployment and the resultant biased brain sets that lead to selective stimulus processing.

Behaviorally, inter-sensory selective attention results in enhanced performance (faster reaction times) in the selectively attended sensory modality compared with conditions in which attention is distributed between two sensory modalities (e.g. Spence and Driver 1997). This performance gain suggests that stimuli in a selectively attended sensory modality have access to increased neural processing. Evidence from both electrophysiological (e.g. Hillyard et al. 1984; Hackley et al. 1990; Alho et al. 1992, 1994; Woods et al. 1992, 1993; de Ruiter et al. 1998; Eimer and Schroeger 1998; Teder-Salejarvi et al. 1999) and hemodynamic imaging (e.g. Woodruff et al. 1996; Alho et al. 1999) studies suggests that this is indeed the case and, further, that stimuli in the unattended sense may receive less processing (e.g. Roland 1982; Alho et al. 1994; Haxby et al. 1994; Fiez et al. 1995; Kawashima et al. 1995). The nature of such stimulus processing effects allows us to make predictions about the nature of the preceding anticipatory effects. If biasing attention manifests as a baseline shift in activity levels within early sensory areas, one might predict that subsequent selective attention effects would appear as simple amplitude modulations of early ERP components. Most currently available data argue against this possibility and instead point to the more dynamic interactive processes proposed above. That is, the earliest component of the visual evoked potential (VEP), the C1, does not appear to be modulated by attention; rather, complex differential effects on subsequent ERP components are seen, depending on the particular attention paradigm engaged (e.g. Mangun and Hillyard 1988; Martinez et al. 1999). Effects of cued shifts of inter-sensory attention on subsequent selective processing of the imperative stimulus are reported in the second article in this series (Foxe and Simpson 2005), where they are related to the preparatory processes described here.

In the present report, we concentrate on the deployment phase. We envisage attentional deployment as a three-stage process. First, the subject must encode the cue stimulus; then, having extracted the instruction, the system must be selectively deployed. Lastly, once deployment has occurred and a biased state has been achieved, the system must be sustained in the biased state until the imperative stimulus arrives and is selectively processed. The general timecourse of such processes has been shown in intra-sensory studies, providing a temporal framework for the current investigation (e.g. Worden et al. 2000).

Materials and methods

Subjects

Twelve volunteers (four female and eight male; all right-handed except for one male) participated. All subjects reported normal or corrected-to-normal vision and no current psychiatric diagnosis or history of head injury; all provided written informed consent and received an hourly fee for their participation. Ages ranged from 20 to 34 years (mean=26, SD=4.9).

Task

During the experiment, each trial consisted of a sequential pair of stimuli (S1 and S2). The first stimulus (S1) was a central, visually presented cue word, which instructed the subject which modality (auditory or visual) to attend when the second stimulus (S2) arrived (Fig. 1). The S2 was a compound auditory-visual stimulus, and subjects performed a go/no-go discrimination task on the S2 in the cued modality. Subjects made a speeded button press when a target occurred in the cued modality and withheld responses when no such target occurred. The cue words were BEEP, FLASH and PAUSE (red, 57 ms duration, 4.7×1.6° of visual angle) and were displayed in random order across trials on a computer screen, 0.25° above a yellow fixation cross on a black background. These cue words informed the subject which task to perform when the next stimulus (S2) occurred: FLASH=visual, BEEP=auditory, PAUSE=no S2. Because subjects received simultaneous auditory and visual stimulation in the compound S2, the S1 information necessarily had to be used for successful task completion. The S2 consisted of bilateral red disks (3.75° in diameter, centered 4.2° left and right of fixation) and binaural pairs of 2000 Hz or 2050 Hz tones (80 dB SPL; TDH39 headphones). On 80% of trials the two disks were identical to each other and did not require a response (same, no-go). Similarly, on 80% of trials the auditory tones were identical in both ears and did not require a response. Visual targets (20% of FLASH-cue trials) involved a 0.25° upward and outward shift in the position of either one of the disks. Auditory targets (20% of BEEP-cue trials) involved pairing a 2000 Hz tone in one ear with a 2050 Hz tone in the other ear (Footnote 1). Note that because target distributions were randomized within a run, double targets (i.e. targets in both the auditory and visual modalities) were presented on a very small proportion of trials. Such bisensory targets were not analyzed.

Fig. 1

a Stimulus configuration. Cue word (S1) onsets at 0 ms, followed 1085 ms later by the compound S2, consisting of bilateral visual (V) disks and tones (A) presented to left (L) and right (R) ears. The BEEP cue indicates an attend-auditory condition and the FLASH cue an attend-visual condition. The inter-trial interval between S2 and the next S1 was 1915 ms. b Electrode montage. The 128-channel electrode montage (green disks), co-registered with a three-dimensional reconstruction of the head surface from one subject’s structural MRI data

The timing of the compound S2 was as follows. The visual disks appeared for 280 ms; 140 ms after their onset, a 60 ms tone was presented in one ear, followed 20 ms later by a tone in the opposite ear. The stimulus onset asynchrony (S1–S2) was 1085 ms and the inter-trial interval (S2–S1) was 1915 ms. Subjects were instructed to maintain fixation on the central cross at all times, regardless of the attention condition.
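The S2 timing can be laid out as an event schedule relative to S1 onset. The sketch below is illustrative only and rests on assumptions the text does not state: that the second tone also lasts 60 ms, that the 20 ms interaural delay is measured onset to onset, and that the 1915 ms inter-trial interval runs from S2 onset to the next S1 onset. The function name `s2_event_schedule` is hypothetical.

```python
# Hypothetical event schedule for one trial; all times in ms relative to S1 onset.
S1_S2_SOA_MS = 1085       # S1 onset -> S2 onset
ITI_MS = 1915             # assumed: S2 onset -> next S1 onset
DISK_DURATION_MS = 280
TONE_DELAY_MS = 140       # first tone onset after disk onset
TONE_DURATION_MS = 60     # assumed to apply to both tones
INTERAURAL_DELAY_MS = 20  # assumed onset-to-onset


def s2_event_schedule(s2_onset_ms=S1_S2_SOA_MS):
    """Return (onset, offset) pairs for each S2 element and the next S1 onset."""
    tone1_on = s2_onset_ms + TONE_DELAY_MS
    tone2_on = tone1_on + INTERAURAL_DELAY_MS
    return {
        "disks": (s2_onset_ms, s2_onset_ms + DISK_DURATION_MS),
        "tone_first_ear": (tone1_on, tone1_on + TONE_DURATION_MS),
        "tone_second_ear": (tone2_on, tone2_on + TONE_DURATION_MS),
        "next_s1": s2_onset_ms + ITI_MS,
    }


for name, value in s2_event_schedule().items():
    print(name, value)
# disks (1085, 1365)
# tone_first_ear (1225, 1285)
# tone_second_ear (1245, 1305)
# next_s1 3000
```

Under these assumptions, both tones have ended (by 1305 ms) before the disks disappear (1365 ms), so the auditory and visual parts of the S2 overlap fully in time.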

A passive control condition was run before all active cueing conditions. The three cue words (BEEP, FLASH and PAUSE) were presented in random order with a 1085 ms SOA (the same as in the active condition). Subjects were not told the future significance of these words and were simply instructed to maintain fixation. This allowed us to assess the physiological response to each cue word when it did not predict an upcoming attention condition. Although differences in the subject's arousal state are an important caveat, this comparison provided a means of isolating the activity in the attend condition that was due to added information processing. The passive condition was excluded for one of the 12 subjects because of excessive blinking and movement artifact.

Particular attention was paid to fatigue and general state changes; subjects were explicitly briefed about this possibility before the experiment began. Each run consisted of 69 S1/S2 pairs, giving a deliberately short individual run length of only 2.3 min. Subjects were required to take a short break between runs and were encouraged to take longer breaks, and to leave the testing room, whenever they felt their concentration might be lapsing. To keep concentration lapses to a minimum, the experimenters debriefed subjects regularly. Between 10 and 24 blocks were run on each subject.

Measurements

High-density EEG recordings (pass-band DC to 100 Hz; digitally sampled at 1000 Hz; impedances <5 kΩ) were acquired from 128 scalp electrodes referenced to an electrode on the nose (average inter-electrode spacing approximately 2.4 cm; see Fig. 1). Trials with eye movements and blinks were rejected offline on the basis of the horizontal and vertical DC electro-oculogram (EOG). An artifact criterion of ±60 μV was applied at all other electrode sites to reject trials with excessive EMG or other noise transients. Accepted trials were epoched separately for the S1 cues (−100 ms pre-stimulus to 1100 ms post-stimulus) and the S2 stimuli (−100 to 600 ms). The baseline was defined as the mean voltage from 50 ms before to 20 ms after S1 onset. Separate averages were computed for two of the S1 stimuli (cue-BEEP and cue-FLASH) in both the passive and attention conditions. The “PAUSE” cue was not analyzed for the current report. Responses to S2 stimuli are reported in the accompanying paper (Foxe and Simpson 2005).
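The epoching, baseline and amplitude-criterion steps can be sketched as follows, assuming a continuous array in μV sampled at 1000 Hz (one sample per millisecond) with S1 onsets given as sample indices. The EOG-based rejection is not reproduced, and applying the ±60 μV criterion after baseline correction is an assumption; `epoch_and_baseline` is a hypothetical helper, not the software used in the study.

```python
import numpy as np

# Assumes 1000 Hz sampling, so one sample corresponds to one millisecond.


def epoch_and_baseline(continuous, onsets, tmin=-100, tmax=1100,
                       baseline=(-50, 20), reject_uv=60.0):
    """Cut epochs around S1 onsets, baseline-correct, and apply an amplitude criterion.

    continuous: (n_channels, n_samples) array in microvolts.
    onsets: S1 onset sample indices.
    Returns the epoch time axis (ms) and an (n_kept, n_channels, n_times) array.
    """
    times = np.arange(tmin, tmax)  # ms, half-open interval [tmin, tmax)
    base_mask = (times >= baseline[0]) & (times < baseline[1])
    kept = []
    for onset in onsets:
        seg = continuous[:, onset + tmin:onset + tmax]
        # Subtract each channel's mean over the -50 to +20 ms baseline window.
        seg = seg - seg[:, base_mask].mean(axis=1, keepdims=True)
        # Keep the trial only if no channel exceeds +/-reject_uv after correction.
        if np.abs(seg).max() <= reject_uv:
            kept.append(seg)
    if not kept:
        return times, np.empty((0, continuous.shape[0], times.size))
    return times, np.stack(kept)
```

A single transient above threshold on any channel removes the whole trial, which matches the trial-wise rejection described above.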

Topographic maps

Scalp current densities (SCDs; the second spatial derivative of the potential) were computed from the spherical spline interpolation of the surface voltage recordings (Perrin et al. 1987), with the spline smoothing coefficient λ set at 1×10⁻⁶. This emphasizes local contributions to the surface map, providing better visualization of the approximate locations of intracranial generators. From the spline-interpolated SCD data, a set of SCD waveforms was derived at standard “International extended 10–20 system” scalp sites (American EEG Society 1991), and all analyses were conducted on these data. All waveforms plotted in this paper are SCD waveforms rather than voltage waveforms.
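The spherical-spline SCD computation can be sketched as below. It assumes electrodes projected onto a unit sphere, spline order m = 4, a Legendre series truncated at 50 terms, and SCD taken as the negative surface Laplacian of the interpolated potential with conductivity scaling omitted. It follows the general spherical-spline approach rather than reproducing the software used in the study; `spherical_spline_scd` and its parameters are hypothetical names.

```python
import numpy as np
from numpy.polynomial import legendre


def _legendre_series(cos_angle, m, n_terms=50):
    """(1/4pi) * sum_{n=1..N} (2n+1) / (n(n+1))^m * P_n(cos_angle)."""
    n = np.arange(1, n_terms + 1)
    coef = np.zeros(n_terms + 1)  # coef[0] = 0: the series starts at n = 1
    coef[1:] = (2 * n + 1) / (n * (n + 1)) ** m / (4 * np.pi)
    return legendre.legval(cos_angle, coef)


def spherical_spline_scd(positions, potentials, targets=None, m=4, lam=1e-6, radius=1.0):
    """Spherical-spline interpolation followed by the negative surface Laplacian.

    positions: (n_el, 3) electrode coordinates; potentials: (n_el,) values at one time point.
    targets: (n_t, 3) sites at which to evaluate SCD (defaults to the electrodes).
    """
    pos = positions / np.linalg.norm(positions, axis=1, keepdims=True)
    tgt = pos if targets is None else targets / np.linalg.norm(targets, axis=1, keepdims=True)
    n_el = pos.shape[0]
    # Bordered system [G + lam*I, 1; 1^T, 0] [c; c0] = [V; 0] for spline weights c and constant c0.
    G = _legendre_series(np.clip(pos @ pos.T, -1.0, 1.0), m)
    A = np.zeros((n_el + 1, n_el + 1))
    A[:n_el, :n_el] = G + lam * np.eye(n_el)
    A[:n_el, n_el] = 1.0
    A[n_el, :n_el] = 1.0
    sol = np.linalg.solve(A, np.append(potentials, 0.0))
    c = sol[:n_el]
    # The surface Laplacian multiplies each P_n term by -n(n+1)/R^2, which lowers the
    # exponent m by one; SCD is returned here as minus that Laplacian.
    H = _legendre_series(np.clip(tgt @ pos.T, -1.0, 1.0), m - 1)
    return H @ c / radius ** 2
```

Passing standard 10–20 coordinates as `targets` corresponds to deriving SCD waveforms at those sites from the electrode data. A spatially constant potential yields zero SCD, because the constant is absorbed entirely by `c0`.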

Analysis strategy

There are two stages to the present analysis. First, we compare the responses to the cue-word stimuli when they were passively viewed versus when they served as informative cues. This allows us to assess cue-related activity that is general to both cue types (both the auditory and visual instruction) and to assess general attentional deployment mechanisms. The second stage in our analysis, and the one of primary interest, involves assessing cue-specific activity; that is, the differences in cue-related activity associated with deploying attention to one or the other sensory modality.

The first stage of analysis compared the S1-cue responses during passive versus cueing conditions. While any passive viewing condition is cognitively non-specific, in this study it provides a general referent for the active cue conditions. The differences between passive viewing and active cueing conditions reflect the attentive processing of the stimuli in general, which can include differences in arousal and in stimulus- and task-related processes. Primarily, this comparison allows us to dissociate cue-related deployment processes from simple sensory processing of the physical stimuli themselves, and to assess the onset timing of such cue-related activity. In turn, this comparison allows us to examine the temporal relations between cue-related and cue-specific processes. For example, general and sensory-specific attentional processes might begin at similar times and proceed in parallel. Alternatively, general attentional deployment might precede more specific sensory biasing mechanisms.

Further, as there has been no previous high-density ERP study of the deployment of inter-sensory attention, the comparison of passive and cueing conditions allows us to broadly define the general ERP pattern of cue-related activity across the scalp. We wished to first outline the timecourse and general scalp topography of the various stages of attentional deployment without regard for the main dependent measure, i.e. the difference between visual and auditory deployment. We had clear hypotheses about the involvement of frontal and parietal areas in this attentional deployment. We also had clear a priori reasons for expecting sustained differences over fronto-central scalp and parieto-occipital scalp during the late sustained biasing period based on previous within-modality investigations of the deployment of selective attention. It should be noted, however, that all previous studies of attentional deployment have assessed voltage data with its inherent spatial smearing (i.e. volume conduction effects). Here, our statistical analyses are conducted on Laplacian data and as such, the pattern of activity could not easily be predicted based on prior data.

Our second goal and the main emphasis of this report was to define the cue-specific effects during deployment; that is, we wanted to interrogate the cue-related processes for sensory-specific modulations. Our strategy for determining the scalp regions and latency bins within which to conduct our statistical tests was as follows. For previously well-defined components such as the C1 and P1, clear windows and scalp distributions were predicted from the plethora of previous studies conducted on these components. For the later phases of cue-related activity, such previous studies do not exist. Component structure within these later processing phases was assessed by visual inspection of the group-averaged data where all conditions were collapsed together (i.e. without regard to the dependent measure) during the comparison of cueing and passive conditions as outlined above. This allowed us to identify peak latencies and durations as well as maximal amplitude scalp regions for given processing stages without reference to our dependent measures (see Wylie et al. 2003). The latency window within which we made our dependent measures for a given component/processing phase was determined based on the observed onset/offset duration of that component. The latency windows and scalp sites that were used emphasized the peaks of activity, and thus were dependent upon the duration and spatial extent of the component processes. This strategy resulted in five major processes that could be dissociated through topographic mapping.
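The key property of this window-selection logic is that windows are defined from data collapsed across conditions, so they are blind to the auditory-versus-visual difference later tested within them. The sketch below is an automated stand-in for the visual inspection actually used; the half-maximum rule, the search range and the name `collapsed_window` are all hypothetical.

```python
import numpy as np


def collapsed_window(cond_a, cond_b, times, search=(600, 1085), frac=0.5):
    """Hypothetical: window around the largest deflection of the condition-collapsed waveform.

    The window is the contiguous run around the peak where |collapsed| >= frac * |peak|,
    restricted to the search range. Returns (start_ms, end_ms, peak_ms).
    """
    collapsed = (cond_a + cond_b) / 2.0  # blind to the A-versus-B difference
    idx = np.flatnonzero((times >= search[0]) & (times < search[1]))
    peak = idx[np.argmax(np.abs(collapsed[idx]))]
    above = np.abs(collapsed) >= frac * np.abs(collapsed[peak])
    start = end = peak
    while start > idx[0] and above[start - 1]:
        start -= 1
    while end < idx[-1] and above[end + 1]:
        end += 1
    return times[start], times[end], times[peak]
```

Because the window depends only on the average of both conditions, scaling one condition up and the other down by the same amount leaves it unchanged.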

Repeated-measures analyses of variance (ANOVAs) were used to test for effects within predetermined latency bins. The factors were Attention condition (auditory versus visual), Hemisphere (right versus left) and Electrode (three electrodes at homologous locations on each side of the scalp, chosen to best represent the maximal topography of the component of interest in a given analysis). The data at each electrode site were averaged across the appropriate epoch, and the area under the waveform (relative to the 0 μV baseline) during this epoch was calculated for each of the six electrodes in each attention condition. These area measures served as the dependent variable.
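The area measure, and the reason two-level effects are reported as F(1, n−1), can be sketched as follows. For a within-subject factor with two levels, the repeated-measures F for its main effect equals the squared paired t computed on each subject's condition means averaged over the remaining within-subject factors (here Hemisphere and Electrode). The rectangle-rule area (sample values × 1 ms), the half-open window convention and the helper names are assumptions for illustration.

```python
import numpy as np


def window_area(waveform, times, t0, t1):
    """Area relative to 0 over the half-open window [t0, t1) ms, assuming 1 sample per ms."""
    mask = (times >= t0) & (times < t1)
    return waveform[..., mask].sum(axis=-1) * 1.0  # sample values x 1 ms


def two_level_main_effect(areas):
    """Repeated-measures F for a two-level factor on axis 1 of `areas`.

    areas: (n_subjects, 2, ...), e.g. (subjects, condition, hemisphere, electrode).
    Other within-subject factors are averaged; F(1, n-1) equals the paired t squared.
    """
    n = areas.shape[0]
    cond_means = areas.reshape(n, 2, -1).mean(axis=2)
    d = cond_means[:, 0] - cond_means[:, 1]
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t ** 2, (1, n - 1)
```

With 12 subjects this gives the F(1,11) tests reported below; with the 11 subjects who contributed passive data it gives F(1,10).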

A more directed strategy, one that accounted for individual differences in topography, was used to test the earliest sensory evoked potentials (the C1 and the P1). We have shown previously (Foxe and Simpson 2002; also Clark et al. 1995) that these components have quite variable topographies across subjects, in large part due to the high inter-individual variability in the cortical geometry of early retinotopically mapped visual cortices. Thus, to maximize power for tests of these components, the latency of peak amplitude was determined at each component’s maximal amplitude site for each individual subject. A 9 ms window was then derived (the peak sample plus the four samples before and the four after it; at 1000 Hz, one sample per millisecond). The area beneath the curve within this window (relative to the 0 μV baseline) served as the dependent measure in ANOVAs of attention effects, as above.
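The individually tailored C1/P1 measure can be sketched as below, assuming a single-subject SCD waveform sampled at 1000 Hz. The search ranges and the helper name `peak_window_area` are hypothetical; `polarity` is −1 for the negative-going C1 and +1 for the positive-going P1.

```python
import numpy as np


def peak_window_area(waveform, times, search, polarity, half_width=4):
    """Find the individual peak in [search[0], search[1]) ms and sum the 9 samples around it.

    polarity: -1 for a negative-going component (C1), +1 for a positive one (P1).
    Assumes 1000 Hz sampling, so 2 * half_width + 1 = 9 samples span 9 ms.
    """
    idx = np.flatnonzero((times >= search[0]) & (times < search[1]))
    peak = idx[np.argmax(polarity * waveform[idx])]
    window = waveform[peak - half_width:peak + half_width + 1]
    return times[peak], window.sum()
```

Centering the window on each subject's own peak, rather than on a fixed group latency, keeps latency jitter across subjects from diluting the amplitude measure.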

Lastly, we were also interested in estimating the onset of differential cue-specific activity and the contribution to this onset phase from frontal and parietal cortices. Therefore, we also conducted a third exploratory stage of analysis where we used the running t-test method of Guthrie and Buchwald (1991) to explore the general timeframe over which such differential processing occurred (see Murray et al. 2002 for further explication of this strategy).
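A minimal sketch of a running paired t-test with a consecutive-points criterion, in the spirit of Guthrie and Buchwald (1991). The critical t of 2.201 (two-tailed P<0.05 with 11 degrees of freedom, i.e. 12 subjects) and the minimum run of 11 consecutive samples are placeholder assumptions. In that method the required run length depends on the number of subjects, the number of time points and the autocorrelation of the data, and should be taken from their tables; `running_paired_t` is a hypothetical name.

```python
import numpy as np


def running_paired_t(a, b, t_crit=2.201, min_run=11):
    """Point-by-point paired t-test across subjects with a consecutive-points criterion.

    a, b: (n_subjects, n_times) waveforms for the two cue conditions.
    Returns the t-values and a mask marking only runs of >= min_run significant samples.
    """
    d = a - b
    n = d.shape[0]
    t = d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))
    sig = np.abs(t) > t_crit
    keep = np.zeros(sig.shape, dtype=bool)
    start = None
    for i, s in enumerate(np.append(sig, False)):  # trailing False closes a final run
        if s and start is None:
            start = i
        elif not s and start is not None:
            if i - start >= min_run:
                keep[start:i] = True
            start = None
    return t, keep
```

Isolated significant samples, or short runs below `min_run`, are discarded; this is how the consecutive-points criterion limits false positives from testing many correlated time points.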

Results

Behavioral results

A percent correct score was computed for 10 of our 12 subjects (response data were corrupted for the other two subjects). Average hit rate was 92.9% (±8.4) for the visual S2 targets and 89.5% (±6.7) for the auditory targets. A paired t-test showed no significant difference in accuracy between tasks.

General waveform description

Figure 2 shows representative SCD waveforms from two scalp sites, Cz (vertex) and Oz (mid-occipital), that illustrate the overall temporal structure of events in these data. Between the onset of the cue at 0 ms and the arrival of the S2 at 1085 ms, the three phases of the cue response are evident over the occipital scalp (at Oz). The early phase of this response, representing sensory encoding of the cue, is marked by the rapid transient components typical of the VEP: an initial negative deflection (C1; ~85 ms), followed by a positive deflection (P1; ~130 ms), a negative deflection (N1; ~190 ms) and then another positivity (P2; ~240 ms). The second phase of the cue response is characterized by slower transient processes and begins during the P2 component. This gives way to a third phase of sustained activity, which persists until the arrival of the S2 stimulus (dashed yellow line).

Fig. 2

Representative scalp current density (SCD) waveforms (subject G.F.) from the vertex (Cz; top panel) and mid-occipital (Oz; bottom panel) scalp sites. S1 onsets at 0 ms and S2 at 1085 ms (yellow dashed line). Darkened spots in the inset cartoon heads (left-side view and occipital view, respectively) indicate the locations of the Cz and Oz scalp sites. The green waveform depicts the response when the subject was cued to the visual modality (FLASH) and the red waveform the response when he was cued to the auditory modality (BEEP)

Following the onset of S2, another negative deflection at Oz indicates the arrival of the visual portion of the S2 in visual cortices. The auditory portion of the S2 is reflected by a prominent auditory N1 response at Cz, consistent with the projection of auditory cortices to fronto-central scalp. The attention effects that can be clearly seen upon these visual and auditory responses to S2 stimuli will be discussed in the second of these two articles (Foxe and Simpson 2005).

Passive versus active cue conditions: cue-related processes common to the attend-auditory and attend-visual instructions

In this first stage of our analyses, we compare the responses evoked by the cue-word stimuli when they were passively viewed (before subjects were informed of the nature of the subsequent attention task) to the responses evoked when these stimuli were subsequently viewed as informative cues during the inter-sensory attention task. As expected, the responses to the cue-word stimuli (BEEP and FLASH) differ dramatically depending on whether they were viewed as symbolic cues or observed during the passive viewing condition when the subject was naive to the coming import of the words. After the initial transient componentry associated with sensory processing of the stimuli, the responses to passively viewed cue-words are devoid of activity during both the middle and late phases of the response window, whereas the actively viewed cues show robust activity in these two phases. These striking cue-related processing effects are shown in Fig. 3, which shows representative responses from one subject over occipital and occipito-parietal scalp sites. Figure 6 shows the group-averaged responses over pre-frontal scalp sites, where again, large-scale differences between passive and active cue processing are evident.

Fig. 3

Representative SCD responses to the S1 stimuli are shown for the passive condition (black and white) and active cueing condition (red and green) from a parieto-occipital selection of scalp sites (subject J.F.). Large differences in the extent of processing that the word stimuli receive when they serve as cues are evident, particularly as a large sustained positivity at scalp sites PO1 and PO2

Statistical testing in this stage of the analyses was limited to establishing when the cue-related processing differences began over posterior and frontal scalp regions. We therefore identified the early components of the VEP (C1 and P1) to test for early sensory processing differences over visual processing regions. We also identified the first observable frontal component for statistical testing by visually inspecting the responses over the frontal scalp. These two tests are detailed below.

Early occipital effects

The earliest visual sensory response (the C1 component) was seen as a negativity over midline parieto-occipital scalp sites (mean peak latency=85 ms, SD ±3.7). It was similar in amplitude in the passive and cueing conditions and between cue types. The following positivity (P1: 139 ms, ±9.6), however, diverged markedly when the words served as cues as opposed to when they were passively viewed (Fig. 4). We tested the significance of this effect with a 2 (attention condition: Attend, Passive)×2 (cue type: Beep, Flash)×2 (ERP component: C1, P1) repeated-measures analysis of variance (ANOVA) (Footnote 2). A significant Component×Attention-condition interaction was found (F(1,10)=32.90, P<0.001; see Fig. 4a). Follow-up 2 (condition)×2 (cue type) ANOVAs showed no significant differences in the C1 component either between viewing conditions (P=0.53) or between cue types (P=0.53), whereas the P1 component was significantly larger in the attend condition than in the passive condition [F(1,10)=42.34, P<0.0001] but did not differ between cue types (P=0.85). The central C1 topography is consistent with generators in posterior and mid-occipital early visual areas, and the subsequent P1 topography is consistent with additional activation of higher-level dorsal and lateral occipital visual areas (Fig. 4b).

Fig. 4

Early cue-related enhancement of visual processing. a Group-averaged SCD waveforms (scalp site PO2) show no difference in C1 amplitude but significant enhancement of the P1 component between active cueing and passive viewing conditions. Mean areas for both the C1 (left four bars; plotted positively for illustration purposes) and P1 (right four bars) components are shown (standard errors indicated). b SCD topographic maps (occipital view) of the passive (top maps) and active (bottom maps) conditions at the peak of the C1 (left) and P1 (right) components

Early frontal effects

During both cue conditions (BEEP and FLASH), a left fronto-central positive component emerged in the latency range of 120–250 ms, peaking at approximately 180 ms. This component was virtually absent in the passive condition (Fig. 5a). An area measure was derived for a 30 ms window spanning the 180 ms peak (165–195 ms) of this component at the two scalp sites that best represented the center of this focus (AF5 and F7). A 2 (attention versus passive condition)×2 (cue type)×2 (scalp site) repeated-measures ANOVA revealed a main effect of condition (F(1,10)=8.5957, P=0.015), with the active cueing condition displaying more positivity in this latency range than the passive viewing condition (Fig. 5b). No effect of cue type was found for this component [F(1,10)=1.0762, P=0.324].

Fig. 5

Early cue-related enhancement of frontal processing. a Group-averaged SCD waveforms from two left frontal scalp sites (AF5, yellow disk; F7, green) reveal an enhanced positive component during the cueing condition (green and red traces) relative to the passive condition (black and white; n=11). Mean areas for a 30 ms window (165–195 ms) at both the AF5 (left four bars) and F7 (right four bars) scalp sites show significantly greater positivity for both the BEEP cue (red bars) and the FLASH cue (green bars) when the words served as cues than when they were passively viewed (black and white bars), but no difference between cue types. b SCD mapping reveals the topography of this component at the peak of its activity (180 ms; dashed yellow line in waveforms). Similar topographies are seen for the BEEP cue (left pair of maps) and the FLASH cue (right pair of maps). Frontal and left-side views are plotted for both cues; no maps are displayed for the passive condition (blue isocontours, negativity; red, positivity)

Cue-related processes that differentiate the attend-auditory and attend-visual instruction

While the above cue-related processes speak to the general deployment of attention, in that they did not occur during the passive viewing task, the primary goal of the current study was to investigate the cue-specific deployment of inter-sensory selective attentional processes that were specifically invoked when attention was deployed to either the visual or the auditory modality. Three main types of cue-specific processing were examined by analyzing the differentiation between responses to the cue stimuli in the late sustained phase of attentional deployment: (1) sustained frontal differentiation, (2) sustained fronto-central differentiation and (3) sustained parieto-occipital differentiation. We postulated that processing would be enhanced over fronto-central scalp when auditory attention was selectively deployed and, conversely, over parieto-occipital scalp when anticipatory visual attention was engaged. We also examined frontal scalp in this period because, in addition to general sustained attention processes in frontal regions, we expected that cue-specific (sensory-specific) effects might appear there in this late period, reflecting modality-specific frontal control processes.

Finally, a fourth analysis was undertaken to explore the timing of the earliest differentiation in activity between cue conditions, a cue-specific process that was expected to occur during the middle deployment phase of the cue-related response. We postulated that frontal areas would show the earliest cue-related divergences, related to the encoding of the cue instruction, and predicted that this frontal differentiation would give rise to sensory-specific priming of the modality singled out for upcoming selective attention. These analyses are discussed in the following four sections.

Sustained late frontal differences

Sustained late frontal activity was seen in the responses to both BEEP and FLASH when they served as cues rather than when they were passively viewed and these late sustained responses differed as a function of cue-instruction (attend-auditory or attend-visual; Fig. 6).

Fig. 6

Sustained cue-related enhancement of frontal involvement. Group averaged (n=11) SCD waveforms from six frontal/prefrontal scalp sites are shown. There is relatively little activity over the frontal and prefrontal areas during passive viewing (black and white traces) but during the active cueing condition (red and green traces) larger amplitude activity is evident in both the early processing and late sustained periods

During both cue conditions, bilateral orbito-frontal positivities were seen (Fig. 7). These positivities were of greater amplitude to the BEEP cue (attend-auditory instruction). We tested the significance of these differences by deriving an area measure for a 350 ms window (730–1080 ms) immediately preceding the onset of the S2 at the two scalp sites in each hemisphere that best represented the focus of activity (left hemisphere=AF5 and FP1; right=AF6 and FP2). A separate 2 (cue condition)×2 (scalp site) repeated-measures ANOVA was performed for each of the two positivities, and a main effect of condition was found for both the left [F(1,11)=5.47, P=0.039] and the right [F(1,11)=10.0, P=0.009] foci.

Fig. 7

Frontal dissociation between attention conditions. Group averaged (n=12) SCD waveforms from two anterior-frontal (AF1 and AF2) and two fronto-polar (FP1 and FP2) scalp sites with SCD maps plotted at latency 1070 ms (yellow dashed lines) for each attention condition. Sustained late positive enhancements are seen bilaterally at fronto-polar sites when subjects were cued to attend-auditory (BEEP, red traces); seen as bilateral frontal positivities in the difference map (bottom map; cue-BEEP minus cue-FLASH). When subjects were cued to attend-visual (FLASH, green traces) an enhanced anterior frontal positivity was evident; seen as a central negative focus in the difference map (blue isocontours, negativity; red, positivity)

In contrast, a midline prefrontal positive focus was seen in response to the FLASH-cue, which was markedly decreased in amplitude in response to the BEEP-cue (Fig. 7). Area measures (730–1080 ms) from three scalp sites within this central focus (Fz, AF1 and AF2) were subjected to a 2 (cue-condition)×3 (scalp site) repeated measures ANOVA. This revealed a significant main effect of cue-condition [F(1,11)=13.709, P=0.003] such that the response to FLASH was significantly more positive at these sites than the response to BEEP.

Sustained late fronto-central negativity

During both cue conditions, a sustained late fronto-central negativity, predominantly over the left hemisphere, was seen. This negativity continued up to and beyond the onset of the S2 at 1085 ms (Fig. 8a). The two scalp sites representing the left fronto-central negativity (FC3 and FC5) and the two representing the right (FC2 and FC4; note that the right focus extended more medially and less laterally than the left) were analyzed. Area measures (730–1080 ms) from these scalp sites were subjected to a 2 (cue condition)×2 (hemisphere)×2 (scalp site) repeated-measures ANOVA, which yielded a main effect of cue condition [F(1,11)=18.662, P=0.001]. A trend towards a condition×hemisphere interaction was noted (P=0.094). Since the right hemisphere focus appeared less prominent than the left, we performed a pair of 2 (condition)×2 (scalp site) follow-up ANOVAs within each hemisphere. These revealed that the left hemisphere cue-condition effect was significant [F(1,11)=19.317, P=0.001], whereas the right hemisphere difference only approached significance [F(1,11)=3.685, P=0.081], suggesting a hemispheric asymmetry in the late fronto-central preparatory activity.

Fig. 8

Fronto-central dissociation between attention conditions. a Group-averaged (n=12) SCD waveforms from two left-hemisphere (FC3, red disk; FC5, green disk) and one right-hemisphere (FC4, yellow disk) fronto-central scalp sites. Cueing subjects to attend-auditory (BEEP, red traces) elicits a greater fronto-central negativity over both hemispheres than cueing them to attend-visual (FLASH, green traces). b SCD maps (left-side and top views) during the late phase of this sustained negative response (1070 ms; yellow dashed lines in a) are shown for both attention conditions. The difference maps (cue-BEEP minus cue-FLASH) show the topographic distribution of this negative difference (blue isocontours, negativity; red, positivity)

These fronto-central negative foci are more anterior than would typically be expected for generators in primary and secondary auditory cortices on the superior temporal plane, particularly since they are not accompanied by strong positive current density beneath the Sylvian fissure, the polarity inversion that such generators would be expected to produce.
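The argument above relies on reading SCD maps as patterns of local current sources (positive) and sinks (negative). The method used to derive the SCD maps is not described in this section, so the sketch below swaps in the simplest nearest-neighbour (Hjorth) approximation: each electrode's potential minus the mean of its neighbours, which on a regular grid is proportional to the negative surface Laplacian. The function name `hjorth_scd`, the electrode labels and the adjacency lists are hypothetical; a real montage needs its own neighbour definitions and, ideally, distance weighting.

```python
from statistics import mean


def hjorth_scd(potentials, neighbours):
    """Nearest-neighbour (Hjorth) estimate, proportional to scalp current density.

    potentials: mapping electrode -> potential at one time point.
    neighbours: mapping electrode -> list of adjacent electrodes
                (a hypothetical adjacency; a real montage needs its own).
    Positive output marks a local current source, negative a local sink.
    """
    return {
        electrode: potentials[electrode] - mean(potentials[n] for n in adjacent)
        for electrode, adjacent in neighbours.items()
    }


# Toy three-electrode example (hypothetical labels A, B, C).
adjacency = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(hjorth_scd({"A": 3.0, "B": 1.0, "C": 1.0}, adjacency))
# -> {'A': 2.0, 'B': -1.0, 'C': -1.0}
```

Adding the same constant to every potential leaves the output unchanged, which is why SCD topographies do not depend on the choice of recording reference.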

Due to the proximity of this negative focus to the orbito-frontal positivity described in the preceding section, it is possible that these two foci are opposite ends of the same current generator and would therefore be better treated as a unitary phenomenon. We have largely discounted this possibility for the following reasons. The time course and topography of the fronto-central negativity showed that its onset (~400 ms) was earlier than that of the fronto-polar positivity (>600 ms). We tested this onset difference by deriving areas from two consecutive 100 ms latency windows (400–499 and 500–599 ms) and performing 2 (attention-condition)×2 (activation-focus)×2 (scalp site) repeated measures ANOVAs. In the 500–599 ms window, a significant condition × activation-focus interaction was observed [F(1,11)=7.19, P=0.021]. Follow-up planned comparisons showed that this interaction was due to an attention-condition effect at the fronto-central negative focus [F(1,11)=19.669, P=0.001] in the absence of a condition effect in this latency window over the scalp region where the orbito-frontal positivity later appeared (P=0.232). For the earlier window (400–499 ms), the condition × activation-focus interaction only tended towards significance [F(1,11)=3.627, P=0.08]. Focused contrasts nonetheless showed a main effect of condition for the fronto-central negativity [F(1,11)=15.979, P=0.002] but none over the orbito-frontal scalp region (P=0.467). In addition, the attention effect seen in the difference maps (Fig. 8b) is not coextensive with the foci of ongoing activity over frontal and fronto-central areas, which indicates that multiple generators are simultaneously active in these regions of the brain. However, the possibility remains that some portion of the fronto-central negativity represents the negative pole of the fronto-polar positivity.

It was also important to dissociate the fronto-central effects we found from possible motor-preparatory potentials. Preparation for a button-press response to target S2 stimuli will evoke a motor-preparatory potential over motor areas preceding the arrival of the imperative S2. We found two distinct negative foci, over the left fronto-central and left mid-central regions (see Fig. 8). The more frontal of these, described above, had an earlier onset (~250 ms) than the central one, was sustained until the arrival of the S2 and differentiated between cue instructions. The more central negativity began later (~400 ms) and did not distinguish between cue instructions, as can be seen in the difference maps of Fig. 8b, where the left central negativity cancels out. This is to be expected: the probability of S2 targets was equivalent in the attend-auditory and attend-visual conditions and the tasks were equated for difficulty, so motor preparation should have been equivalent for both tasks.

Sustained late parieto-occipital positivity

During both cue conditions, a sustained late parieto-occipital positivity was seen that continued up to and beyond the onset of the S2 (Fig. 9a). Topographic mapping of the 12-subject group-averaged data shows this positivity to have a predominantly right hemispheric distribution (Fig. 9b) and inspection of individual topographies revealed a right hemisphere distribution for 10 of the 12 subjects.

Fig. 9

Parietal/parieto-occipital dissociations between attention conditions. a Group-averaged (n=12) SCD waveforms from a right-hemisphere parieto-occipital scalp site (PO2, yellow disk) and a left superior parietal site (P1, green disk). Cueing subjects to attend-visual (FLASH, green traces) elicits a greater right parieto-occipital positivity than cueing them to attend-auditory (BEEP, red traces), whereas cueing them to attend-auditory results in greater positivity over the left superior parietal scalp. b SCD maps (occipital view) during the late phase of the sustained parietal/parieto-occipital response (1070 ms; yellow lines in a) are shown for both attention conditions. The difference map (cue-BEEP minus cue-FLASH) shows the topographic distribution of these differences between conditions (blue isocontours, negativity; red, positivity)

This positivity appeared to be of greater amplitude to the FLASH cue (attend-visual) than to the BEEP cue (attend-auditory). Two scalp sites each over the left (P3 and PO1) and right (P4 and PO2) parieto-occipital areas best represented this sustained late activity, and area measures (730–1080 ms) from these sites were subjected to a 2 (cue-type)×2 (hemisphere)×2 (scalp site) repeated measures ANOVA. A significant Cue×Hemisphere interaction was found [F(1,11)=4.828, P=0.050]. Follow-up focused-contrast tests revealed that this interaction was due to a cue-condition effect over the right parieto-occipital region [F(1,11)=6.116, P=0.031], whereas no effect of cue-condition was seen over left parieto-occipital scalp (P=0.808).
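With every factor at two levels, the Cue×Hemisphere interaction also has a single-degree-of-freedom shortcut: its F(1, n-1) equals the square of a one-sample t on each subject's difference-of-differences, that is, (FLASH minus BEEP over the right hemisphere) minus (FLASH minus BEEP over the left), after averaging over scalp site. The follow-up contrasts then correspond to paired tests within each hemisphere. A minimal sketch; the function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values):
    """t statistic for the null hypothesis mean(values) == 0."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


def interaction_2x2(a_right, b_right, a_left, b_left):
    """F(1, n-1) for a 2 (cue) x 2 (hemisphere) within-subject interaction.

    Each argument has one value per subject, already averaged over scalp
    sites, e.g. a_right = FLASH-cue area measure at right-hemisphere sites.
    """
    contrasts = [(ar - br) - (al - bl)
                 for ar, br, al, bl in zip(a_right, b_right, a_left, b_left)]
    t = one_sample_t(contrasts)
    return t * t, (1, len(contrasts) - 1)


def simple_effect(a, b):
    """F(1, n-1) for the cue effect within one hemisphere (paired t squared)."""
    t = one_sample_t([x - y for x, y in zip(a, b)])
    return t * t, (1, len(a) - 1)


# Toy data for 3 subjects (hypothetical): a cue effect on the right only.
print(interaction_2x2([1.0, 2.0, 4.0], [0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```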

During the investigation of late parieto-occipital sustained activity, a large difference between cue-conditions was also noted over the left superior parietal scalp, with the BEEP-cue eliciting a substantially larger sustained positivity in the late phase preceding S2 onset. A post hoc two-tailed paired t-test at scalp site P1 (see Fig. 9), at the centre of the focus of this difference, tested whether this difference was significant. This revealed a significant effect of cue-condition [t(11)=5.65, P=0.0001].

Two possibilities obtain in terms of the likely neural generators of this left superior parietal positivity. On the one hand, it may represent a differential effect in left superior parietal cortices. Alternatively, it may represent the opposite pole of the effect seen over right parieto-occipital scalp sites, with this unitary phenomenon having a dipole generator in the mid parieto-occipital region. The topography (see Fig. 9b) appears to favor the former possibility, as the shape of the right parieto-occipital focus suggests more than one contributing generator, whereas the left superior parietal effect is quite focal, suggesting a single generator. However, both effects have a very similar time course (Fig. 9a), so we have been unable to dissociate them temporally. Importantly, the SCD maps of the unsubtracted sustained activity show that the effects seen in the difference maps are not coextensive with ongoing activation foci. This suggests that there are parietal and parieto-occipital generators common to the maintenance of attention for both modalities and that only a subset of generators is actually sensitive to which modality is to be attended.

Earliest divergences

A series of exploratory analyses was undertaken to establish the onset of the earliest differentiation between cue types across the different cortical regions and to compare the timing of these earliest differentiations. Points of earliest divergence were determined using the following strategy. Group-averaged SCD waveforms from 56 scalp sites, sampled with equal spacing from the scalp region covered by our 128-channel montage, were visually inspected for divergence between the two cue conditions. The search for these divergences was aided by inspecting SCD “difference” maps, which allowed us to determine whether a difference seen in the waveforms had a physiologically plausible distribution and stability over time. When a site showed a suggestion of early divergence in the waveform comparison, the point of highest putative divergence (usually found at the peak of a component) was chosen for testing. Amplitudes at that point for both conditions were derived for each of the 12 subjects and subjected to two-tailed paired t-tests. The component peak immediately preceding the point being tested was also tested, to establish a non-significant baseline. If the point chosen for investigation proved significant at the 0.05 α level, a series of t-tests was then conducted in the latency region preceding that point to determine when significant divergence had begun and whether it was consistent across a physiologically reasonable period of time (>10 ms). The earliest point preceding the original peak that still met the 0.05 criterion was found by this means, and this point was accepted as a significant divergence only if at least 11 subsequent consecutive time-points also met the 0.05 criterion (for a similar strategy, see, e.g. Rugg et al. 1995; Thorpe et al. 1996; Doniger et al. 2000).
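The consecutive-sample criterion above can be sketched in code. This is a simplified, fully automated variant, not the procedure used here: rather than starting from a visually selected peak and stepping backwards, it scans forward and returns the first sample at which a two-tailed paired t-test across subjects is significant and is followed by at least 11 further consecutive significant samples. To stay within the Python standard library it compares |t| with the two-tailed 0.05 critical value for 11 degrees of freedom (n = 12 subjects), 2.201, instead of computing p-values; that constant must change for a different number of subjects. Function names are hypothetical, and converting the returned sample index to milliseconds requires the recording's sampling interval, which this section does not state.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical t for alpha = 0.05 with 11 degrees of freedom (n = 12).
# Change this if the number of subjects differs.
T_CRIT_DF11 = 2.201


def paired_t(a, b):
    """Paired t statistic for per-subject values a[i] versus b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:  # identical differences: treat any non-zero shift as significant
        return 0.0 if mean(diffs) == 0 else float("inf")
    return mean(diffs) / (sd / sqrt(len(diffs)))


def earliest_divergence(cond_a, cond_b, t_crit=T_CRIT_DF11, min_following=11):
    """First sample index that is significant and followed by at least
    `min_following` consecutive significant samples, or None.

    cond_a[s][i], cond_b[s][i]: amplitude of subject s at sample i.
    """
    n_samples = len(cond_a[0])
    significant = [
        abs(paired_t([subj[i] for subj in cond_a],
                     [subj[i] for subj in cond_b])) > t_crit
        for i in range(n_samples)
    ]
    needed = min_following + 1  # the onset sample plus its followers
    run = 0
    for i, sig in enumerate(significant):
        run = run + 1 if sig else 0
        if run == needed:
            return i - needed + 1
    return None


# Toy data: 12 subjects, 40 samples; condition A departs from B at sample 20.
n_subj, n_samp = 12, 40
cond_b = [[0.0] * n_samp for _ in range(n_subj)]
cond_a = [[(1.0 + 0.1 * s) if i >= 20 else 0.0 for i in range(n_samp)]
          for s in range(n_subj)]
print(earliest_divergence(cond_a, cond_b))  # 20
```

A brief difference lasting fewer than 12 consecutive samples returns None, which is the point of the run-length criterion: it rejects isolated, momentary crossings of the significance threshold.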

We first investigated possible differences in the classical visual ERP components N1 and P2 over the occipital and parietal scalp region. Recall that C1 and P1 do not differ between cue types (see above; P=0.53 for C1 and P=0.85 for P1). The N1 and P2 components can be seen at occipital sites (e.g. ~165 and ~225 ms, respectively, at scalp site Oz). Two-tailed paired t-tests revealed no significant cue-related differences in amplitude for these two components at Oz (P=0.337 and P=0.398, respectively). However, at right lateral occipital scalp site O2 and the neighbouring right parieto-occipital site PO6, significant divergence was found with onsets just after the peak of the N1 (180 ms at O2 and 177 ms at PO6), the FLASH-cue eliciting an enhanced negativity over this right scalp region. At left occipital scalp sites O1 and PO5, no N1 difference was found. This right parieto-occipital N1 difference constituted the earliest differentiation in the processing of the word cues.

Next, significant divergence was found over dorso-central parietal cortex (scalp sites PZ, P1 and P2). This divergence took the form of a phasic P2-like positivity peaking at ~250 ms (later than the P2 component recorded over central occipital sites), which was significantly larger in amplitude to the FLASH cue (P=0.002 at PZ). The onset of significant divergence was established at 240 ms (scalp site P2).

The earliest frontal divergence was found at scalp site Fz, spanning a peak at 245 ms (P=0.015) but not a peak seen at 150 ms (P=0.865). The onset of significant divergence was established as 233 ms, largely contemporaneous with the onset of differences over central parietal cortex and consistent with the activation of fronto-parietal attention networks (e.g. Kastner et al. 1999). It is noteworthy that this difference was sustained rather than phasic, consistent with the onset of sustained frontal involvement in attentional maintenance.

Following this, significant divergence was seen at central sites (C2: 261 ms and C1: 292 ms) with the response to FLASH showing an increased negativity in this latency range relative to the response to BEEP.

The onset of late sustained differences, which will be discussed in the following three sections, was the next series of divergences evident in the comparison waveforms. Sustained differences over visual areas (parieto-occipital scalp) preceded the sustained differences seen over auditory areas (fronto-central scalp). A sustained late parieto-occipital positivity had a significant onset at 331 ms (site PO2, see Fig. 9). Over fronto-central cortex, significant divergence of a late sustained difference was found at 468 ms (site FC3, see Fig. 8). It is noteworthy that these late sustained differences over modality-specific cortices appear to follow the earlier onset of sustained differentiation over mid-frontal cortex.

Discussion

The results of this study reveal a network of neural areas that are involved in the top-down mechanisms of deploying and sustaining inter-sensory selective attention in response to visually presented symbolic cue stimuli. This network includes sensory areas of the visual system, the posterior parietal cortex and fronto-central areas, as well as lateral and central prefrontal areas. In addition to cue-related enhancement of early information processing areas, we revealed later sustained activity (from ~330 ms to the onset of S2 at 1085 ms) that dissociated between attend-auditory and attend-visual conditions. That is, in the period preceding the arrival of an imperative S2, differential sustained brain activity was seen depending on the modality that was being singled out for subsequent selective attentive processing: the brain had been placed in a biased attentional state. In addition, although the anticipatory period is marked by these sensory biasing effects, another striking feature of the cue-related activity is the activity common to both cue-types. That is, both auditory and visual cortices are substantially activated by each cue-type, suggesting that the attentional deployment system begins to prepare both sensory systems together rather than utilizing a simple binary gating mechanism.

Processing and encoding the cue-word

We will first discuss the early processing enhancements that are common to both the attend-auditory and attend-visual cues. These enhancements are likely to represent the intentional extraction of the relevant information from the cue-words that subsequently allows the brain to establish a biased attentional state. We addressed these early processing changes by comparing the responses to cue-words when they were actively used to inform subjects about the modality to be attended with the responses in a naive viewing condition, in which subjects were unaware of the impending import of the words for directing attention. A legitimate criticism can be leveled against this approach in terms of general arousal differences between a putatively uninteresting viewing condition and an attentionally demanding cue condition. However, we found no differences in amplitude for the earliest recordable visual ERP component (C1, ~85 ms), suggesting that the early sensory representation of passively and actively viewed words was equivalent and that general arousal was unlikely to explain the later enhanced processing effects we saw. In contrast to the lack of differences for the initial inputs to visual cortex, as represented by the C1 component, a marked enhancement of the following P1 component (~140 ms) was observed for all subjects. We take this to represent attentionally mediated supplementary processing resulting from the ongoing “attentional state” of our subjects. That is, because subjects were engaged in the cueing portion of the paradigm, this enhancement reflects a more detailed extraction of information relevant to the impending direction of intersensory attention.

Over frontal scalp, we observed early activity consistent with a frontal eye-field generator that was equal in the passive and active conditions (~130 ms), consistent with a stimulus-driven response (see also Saron et al. 2001). Slightly later, however, a left dorsolateral prefrontal positivity was observed in response to both cue-words, which was virtually absent from the responses during passive viewing. This positivity began at approximately 120 ms and peaked at 180–190 ms. One possibility is that this left dorsolateral positivity reflects the electrophysiological signature of attentive verbal encoding. A number of neuroimaging studies have reported involvement of left dorsal frontal regions in these processes (e.g. Kapur et al. 1994a, b, 1996; Demb et al. 1995; Dolan and Fletcher 1997; Kelley et al. 1998). Of course, it may also represent more general attentional deployment mechanisms for goal-directed behavior, which have been associated with prefrontal regions (see Corbetta and Shulman 2002). In this vein, a similar frontal positivity was also found in a recent task-switching experiment in which we also used orthographic stimuli (Wylie et al. 2003). We show that this process is a relatively early one and that it precedes more extensive prefrontal activations in the later delay period (discussed below). That is, in the later period between the cue and imperative S2 stimuli, multiple bilateral prefrontal foci develop, suggesting that additional generators in prefrontal cortex become active.

It is noteworthy that attentive processing appears to enhance the sensory encoding of the cue-words, as reflected by the P1 effect, and that this enhancement is followed by the appearance of frontal activation that may represent enhanced verbal encoding of these words. This sequence suggests that the extra frontal encoding demanded for the cue-words in the current attention task may be predicated upon attentionally mediated enhancement of extrastriate processing.

Decoding the cue and deploying attention

In order to achieve the task demands of this paradigm, subjects must differentiate between the cue-words, and the earliest divergence in the responses to the cues presumably reflects the earliest point at which the brain is capable of differentiating them. In this vein, Thorpe et al. (1996) showed such a divergence at frontal sites at just 150 ms when subjects were required to discriminate between pictures depicting animals and pictures that contained no animals, reasoning that the visual processing needed to achieve the task must have been completed before this point. Given the semantic nature of our cue-words, it might be expected that the earliest divergence or dissociation would be found over language processing areas or the left lateral frontal verbal encoding area discussed above. However, we found the earliest dissociation of the cue-words to occur over right occipital scalp in the timeframe of the visual N1 component (~180 ms). It is noteworthy that this effect is coincident with the peak of the left frontal activity discussed above and might reflect the interaction of verbal encoding with pattern-recognition functions of inferior occipital cortex (e.g. Allison et al. 1994a, b, 1999; Doniger et al. 2000, 2001). Further, both intracranial grid-electrode recordings (Allison et al. 1994b, 1999; Nobre et al. 1994, 1998) and functional imaging (e.g. Puce et al. 1996; Uchida et al. 1999; Cohen et al. 2000) in humans have revealed a region of the inferior occipito-temporal surface to be specifically sensitive to letter-string stimuli, and this area is active with a time course highly similar to that of our surface-recorded N180 component.
Most recently, an occipito-temporal negativity (N170) was shown to differentiate between word and non-word stimuli during both lexical decision and semantic tasks, and the authors suggested that tasks requiring “deeper” analysis of printed words might result in top-down modulations of the N170 (Bentin et al. 1999). These findings appear consistent with the current result whereby left frontal activity may modulate the posterior word-sensitive visual region.

Another possibility is that the posterior dissociation might also reflect the automation of the cue-encoding phase since there were just three possible cue words and these were seen repeatedly over a very large number of trials. That is, dissociation of the cue-words might have been achieved at the level of occipito-temporal pattern-recognition areas rather than having to be fully decoded at the semantic level by higher-tier language areas. The use of a more extensive cue set whereby subjects would be less likely to automate their responses might better address this issue.

It is worth pointing out that the interval between presentation of the cue-words and the imperative S2 is always ~1.1 s in the current paradigm, and it is possible that if a shorter interval were used, an earlier discrimination and consequently an earlier divergence might occur. That is, it may not be necessary for the brain to dissociate the cue-words particularly rapidly, given the pace of the task.

Shortly after the posterior occipital dissociation (roughly 50–60 ms later), dissociations occurred at both mid-frontal and dorso-central parietal sites (230–240 ms). This sequence is consistent with differential activation of fronto-parietal attentional deployment systems following the extraction of cue information. Frontal divergence was found to precede parietal divergence by about 7 ms (233 vs. 240 ms) and then to continue in a sustained fashion. We are nonetheless reluctant to interpret this small timing difference as evidence for frontal control of parietal attentional mechanisms, even though this interpretation would be consistent with currently held models of attentional deployment (see Mesulam 1998). These earliest differentiations likely represent the beginnings of the attentional deployment phase, which will establish the subsequent attentional preparatory state.

We suggest that the late sustained differentiations that follow these early divergences (all of which began after 300 ms) represent this preparatory state; these will be discussed in the following sections. It is of note that the sustained effects over visual areas began more than 100 ms before those over auditory areas, perhaps consistent with more direct access for visual cues to visual attentional biasing mechanisms than to equivalent auditory mechanisms.
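The latency comparisons in the two preceding paragraphs can be checked against the onsets reported in the Results. The short script below only tabulates those reported onsets, keyed by the scalp site at which each was established, and takes differences; the dictionary name `ONSETS_MS` and the helper `lag` are illustrative choices, not part of the original analysis.

```python
# Onsets (ms after cue onset) of significant cue-related divergence,
# as reported in the Results, keyed by the scalp site where each was established.
ONSETS_MS = {
    "O2": 180,   # right occipital, N1 timeframe
    "PO6": 177,  # right parieto-occipital, N1 timeframe
    "Fz": 233,   # mid-frontal, sustained
    "P2": 240,   # dorso-central parietal, phasic P2-like positivity
    "PO2": 331,  # sustained parieto-occipital positivity
    "FC3": 468,  # sustained fronto-central negativity
}


def lag(later, earlier):
    """Difference in onset latency (ms) between two sites."""
    return ONSETS_MS[later] - ONSETS_MS[earlier]


early_lags = [lag(s, o) for s in ("Fz", "P2") for o in ("O2", "PO6")]
print("occipital N1 -> frontal/parietal:", min(early_lags), "to", max(early_lags), "ms")
print("frontal precedes parietal by", lag("P2", "Fz"), "ms")
print("visual sustained precedes auditory sustained by", lag("FC3", "PO2"), "ms")
```

The frontal and parietal onsets follow the occipital N1 divergence by 53 to 63 ms, frontal leads parietal by 7 ms, and the sustained parieto-occipital effect leads the sustained fronto-central effect by 137 ms.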

Late sustained “biasing” activity

Three main groupings of late processes are discussed in the current treatment of these data: the late processes observed over prefrontal, fronto-central and parietal/parieto-occipital areas. All of these late processes manifest as sustained activations that are maintained in a relatively stable configuration over hundreds of milliseconds preceding the onset of the S2 at 1085 ms. Such timing is highly suggestive of a sustained priming function: a putative biasing of the attentional set of the brain towards stimuli of one modality at the expense of a simultaneously presented distracter modality. Also, the pattern of effects that we observe over modality-specific cortices is consistent with the modality being singled out for attention. That is, we see a sustained fronto-central negativity that is enhanced when auditory attention has been preferentially engaged and a sustained right parieto-occipital positivity that is enhanced when visual attention has been selectively engaged. These sensory-specific effects are accompanied by differential prefrontal activations dependent on the modality being attended, and are consistent with top-down frontal intentional driving of sensory-specific biasing. We will detail the late effects in three sections: frontal/prefrontal effects, fronto-central effects and parietal/parieto-occipital effects.

Biasing signals over frontal/prefrontal areas

Engagement of auditory attention is associated with enhancement of a pair of late bilateral prefrontal positivities when compared with engagement of visual attention. Conversely, a midline prefrontal positivity is enhanced when visual attention is engaged. Hence, the prefrontal areas, having activated equivalently in the early time period (the putative encoding phase), dissociate between the two attention conditions during the later sustained activations. Thus, there appears to be differential activation of a system of frontal areas underlying the deployment and sustaining of anticipatory attention to different modalities. In addition, the frontal regions giving rise to these differential effects may comprise modality-specific sub-regions (e.g. Azuma and Suzuki 1984; Bushara et al. 1999). It remains possible that these differential effects reflect differences between the auditory and visual tasks rather than the sensory modality to be attended, although subjects’ accuracy was equivalent for both tasks. However, both the relative timing of these frontal activations and the presence of correlated priming effects over modality-specific areas accompanying them strongly suggest their involvement in the top-down mediation of modality-specific processing.

The current paradigm is likely to invoke a number of different cognitive functions that have been associated with frontal/prefrontal areas of the brain. That is, once the cue-information has been extracted, attention must then be switched between modalities, biasing mechanisms must then be deployed (possibly incorporating a planning phase) and these mechanisms must then be sustained until the arrival of the imperative stimulus, requiring some working memory component. Frontal areas have been variously implicated in all of these functions.

For example, Rogers et al. (1998) have shown that patients with left frontal damage display a specific deficit in making switches between a letter-naming and digit-naming task when compared to patients with right frontal lesions. Since our task incorporates cue-driven shifts of attention between two different anticipated modalities, the late sustained activity over left frontal cortex may at least partially reflect this switching function. Further, lesions to prefrontal areas appear to cause a loss of the ability to inhibit task-irrelevant sensory information (e.g. Knight et al. 1999), a necessary component of the current task, where subjects must attend one modality while trying to ignore the irrelevant distracter modality. The involvement of frontal areas in planning tasks has also been shown (e.g. Shallice 1982; Morris et al. 1993; Rezai et al. 1993; Baker et al. 1996; Owen et al. 1996; Elliott et al. 1997). Lesions of lateral frontal cortex in humans have been shown to particularly affect so-called “conditional associative learning tasks”, in which specific responses are required to particular cues (e.g. Petrides 1990). Petrides (1987) has shown that lesions specific to area 8 have a severe and specific effect on performance in conditional tasks in which a monkey must choose from a set of visual stimuli based on visual cueing. An equivalent human area (posterior part of the superior frontal sulcus, left hemisphere) is activated when humans perform a similar task with color-cues (Petrides et al. 1993a). Petrides suggests that while mid-dorsolateral frontal cortex is integrally involved in working memory, this dorsolateral area, just posterior to the working memory region, is specific to the “higher-order control of behavioral responses involving the selection of appropriate stimuli depending on particular environmental contingencies”.
Our finding of a central prefrontal focus that is enhanced in the attend-visual condition could represent the electrophysiological signature of this planning area. Other candidate structures for this activation, which is seen only during attend-visual trials, are the supplementary eye-field, which has been implicated in the anticipatory attention phase in visual tasks (SEF: e.g. Bon and Lucchetti 1997; Kastner et al. 1999), the anterior cingulate cortex, which plays a role in guiding visual search (e.g. Posner and Dehaene 1994), and bilateral superior frontal sulci, which have been implicated in working memory for spatial localization of visual stimuli (e.g. Courtney et al. 1998; see also Bushara et al. 1999). Certainly one possibility is that there are modality-specific divisions in anterior cingulate. We cannot say which of these mid-frontal areas might be producing this effect. However, our results suggest that it is more involved, or even exclusively involved, in visual rather than auditory attentional processes, as this focus is not in evidence during attend-auditory trials.

It is possible that planning and sustaining auditory attention are carried out by other prefrontal regions, such as those represented by the bilateral positive foci that are more prominent in the attend-auditory condition. In support, a recent PET investigation of intersensory attention found a pair of bilateral prefrontal activations during the attend-auditory condition (modalities were attended in blocks), which the authors suggested were involved in the control of attentional tuning of auditory cortices (Alho et al. 1999). The brain areas involved in the anticipatory phase of a cueing paradigm will probably overlap highly with the areas involved in sustaining attention in continuous attention tasks.

Imaging studies have shown right prefrontal involvement during sustained attention paradigms involving multiple stimulation modalities (Cohen et al. 1988; Pardo et al. 1991; Coull et al. 1996) suggesting a supramodal role for right prefrontal cortex in sustaining attention (see Coull et al. 1996). Also, patients with right frontal lesions show marked impairment in sustained attention tasks (Wilkins et al. 1987; see also Knight et al. 1999). It might be premature to assign a supramodal role to a single area in right prefrontal cortex. The right prefrontal positivity that we find is likely to represent this attention sustaining function, but the late modality specific dissociation suggests differential involvement in sustaining auditory versus visual attention.

Sustained prefrontal activations are also consistent with working memory processes (e.g. Levy and Goldman-Rakic 1999), and the current task has such a component in that subjects must remember which modality is to be attended during the approximately 1 s in which no stimulus is on the screen. This memory could take a number of forms. For example, subjects might maintain a representation of the word stimulus, a verbal working memory that could be accessed repeatedly during the delay period. Indeed, bilateral prefrontal activations have been shown in verbal working memory tasks (e.g. Petrides 1993a; Coull et al. 1996; Smith and Jonides 1998). However, the late sustained bilateral prefrontal activations that we see differ depending on the impending modality, suggesting a more complex role in the task than working memory simply for the cue-word. One explanation is that it is not the cue-words themselves that are held in working memory, but rather that the information they impart results in a representation of the impending, attentionally relevant imperative stimulus being conjured and subsequently held in memory: a prospective code. Compellingly, a recent intracranial investigation in monkeys showed just such a handoff: in the early period a memory for the cue was maintained, which then gave way to prospective activity related to an anticipated target (Rainer et al. 1999).

The exact relationship of working memory and the mechanisms for sustaining attention is not well defined and the extent to which the same neural areas might underlie these two functions has not been well established (see Owen 1997). The original conception of working memory was as an active store of information that could be readily manipulated (see Baddeley 1992) and both the study of Rainer et al. (1999) and the current results are consistent with an active manipulation of the information in working memory towards completion of task demands. If selective intersensory attention is a function of parallel distributed processing in an extended multi-focal neural network, it is reasonable to expect that it will have a large degree of overlap with the equivalent distributed network responsible for working memory (see Mesulam 1998; Levy and Goldman-Rakic 2000). The top-down attentional biasing mechanism may be dependent on this working memory network for its “driving” information and for its maintenance over time.

Biasing signals over fronto-central auditory projections

Our data reveal a pair of sustained bilateral fronto-central negativities that display modality specificity for the impending S2 stimulus, with significantly greater negativity for attend-auditory than for attend-visual trials. This effect is considerably stronger over the left fronto-central region than over the right, where it only approaches significance. The topography of this negativity is more anterior than would be expected for a generator in primary auditory cortex, suggesting that auditory attention in the preparatory period is sustained by higher-tier auditory areas. However, since auditory cortex is a deep generator relative to its projection to the fronto-central scalp, we cannot rule out small baseline shifts in preparatory auditory cortex activity that may be beyond the resolution of these data. Indeed, biasing of early auditory areas would be more consistent with the very early auditory attention effects that have been seen during subsequent stimulus processing (e.g. McCallum et al. 1983; Woldorff et al. 1987; Hackley et al. 1990). Also, the hemispheric asymmetry of this effect, with the left hemisphere showing much stronger modulation, suggests that it represents activation of a higher order attentional control area (e.g. Alho et al. 1999; Bushara et al. 1999). That is, since our task involved presentation of bilateral tones, priming of auditory cortex itself would be expected to affect left and right auditory cortices equally and to produce hemispherically symmetric effects, unlike the asymmetry reported here. This finding suggests that control of auditory attention may be more strongly associated with left fronto-central generators, in much the same way that control of visuospatial attention has been more associated with right parietal areas (but see Zatorre et al. 1999).
The topography of this effect is consistent with a generator in the more anterior and mesial bank of the superior temporal plane, in auditory association cortices, but could also be interpreted as consistent with a generator in superior frontal cortex. The current data set cannot dissociate these possibilities.

It is important to emphasize that this negativity is also present, albeit attenuated, following the attend-visual instruction. Thus, selective deployment of intersensory attention does not appear to be an “all or nothing” neural strategy but rather a biasing of preparatory state. It is plausible that exclusively priming only one sensory system would be an unfavorable tactic for optimal performance in the current paradigm, since it would hinder error correction if the cue-instruction were initially misinterpreted. Another relevant finding is that restrictions on attentional capacity are not as severe between audition and vision as they are within a single modality (Duncan et al. 1997). This may allow the intersensory attention system to be less stringent in its preparatory allocation of attentional capacity than it would be if the two attention channels were within a single modality. Another possible mechanism is that the arrival of the initial sensory representation of the cue-word initiates deployment of the priming function for both modalities before the cue-instruction has been fully decoded, and that deployment becomes biased only once the cue has been decoded in parallel. These possibilities will need to be investigated in further work.

The fronto-central negativity revealed by high-density mapping in this study is also consistent with experiments reporting on the CNV (contingent negative variation) component seen in S1–S2 designs (Walter et al. 1964), and some studies have shown modality specific anticipatory components of the CNV (Simson et al. 1977; Jarvilehto et al. 1978; Naatanen and Michie 1979; Ritter et al. 1980; Sanquist et al. 1981). In these earlier studies, which could only record from low-density montages, it proved very difficult to dissociate the CNV from putative motor-preparatory activity associated with impending button-pushes (see Sanquist et al. 1981; Tecce and Cattanach 1993). However, the high spatial resolution provided by 128-channel recordings and SCD (scalp current density) mapping allows us to dissociate a left-central motor-preparatory potential from the fronto-central negativity that displays modality specific modulation.
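The dissociation described above rests on scalp current density (SCD) mapping, which sharpens scalp topographies by emphasizing local rather than broadly volume-conducted potentials. This section does not specify which SCD algorithm was used, so the following is only a minimal sketch of the general idea, using the simple nearest-neighbour Hjorth approximation as a stand-in: at each electrode, the mean potential of its neighbours is subtracted from its own potential, which on a regular grid is proportional to the negative surface Laplacian. The function name `hjorth_scd`, the neighbour dictionary, and the toy 3 × 3 electrode grid are hypothetical illustrations, not the montage or method of this study.

```python
import numpy as np


def hjorth_scd(potentials, neighbours):
    """Nearest-neighbour (Hjorth-style) approximation of scalp current density.

    Hypothetical helper for illustration only (not this study's method).
    potentials: sequence of voltages, one per electrode.
    neighbours: dict mapping an electrode index to a list of neighbour indices.
    Returns V_i minus the mean of its neighbours for every listed electrode;
    electrodes without listed neighbours are left as NaN.
    """
    v = np.asarray(potentials, dtype=float)
    scd = np.full(v.shape, np.nan)
    for i, nbrs in neighbours.items():
        if nbrs:  # skip electrodes with no neighbours rather than divide by zero
            scd[i] = v[i] - v[list(nbrs)].mean()
    return scd


# Toy 3 x 3 grid (indices 0-8, row-major); only the centre electrode (4)
# has a full set of four neighbours in this assumed layout.
grid_neighbours = {4: [1, 3, 5, 7]}

# A broad, linear voltage gradient (as might arise from a distant or deep
# source) yields no local SCD at the centre ...
broad = [r + c for r in range(3) for c in range(3)]
print(hjorth_scd(broad, grid_neighbours)[4])  # prints 0.0

# ... whereas a focal peak at the centre stands out clearly.
focal = [0.0] * 9
focal[4] = 5.0
print(hjorth_scd(focal, grid_neighbours)[4])  # prints 5.0
```

Published SCD maps more commonly rely on spherical-spline estimates, which cope better with irregular montages and edge electrodes; the Hjorth form is used here only because it is short and transparent.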

Biasing signals over parietal/parieto-occipital areas

We observed a multifocal complex of sustained activations over parietal and parieto-occipital cortices in the period preceding the onset of the imperative stimulus, with a time course similar to that of the sustained activity observed over frontal areas. This sustained activity was seen during both attend-auditory and attend-visual conditions, implicating parietal areas in the maintenance of attention to both modalities. The equivalent timing of frontal and parietal sustained activity strongly suggests an interactive system for top-down control of attentional allocation (e.g. Coull et al. 1996; Corbetta and Shulman 1998).

In the late sustained activity period, we found two posterior foci that differed according to the attentionally relevant modality of the impending imperative stimulus. A right parieto-occipital positivity was larger for the attend-visual condition than for the attend-auditory condition, and a left central parietal positivity was larger for attend-auditory than for attend-visual. Importantly, neither of these differential activations is consistent with preparatory biasing of early visual areas, as their topographies are not co-extensive with the topography of the earliest visual response to the subsequent imperative stimulus. Thus, biasing of visual cortices also appears to occur in higher-level parieto-occipital and parietal areas, consistent with known attention centers of the dorsal visual stream. However, small baseline shifts in V1 or V2 activity levels may be below the resolution of these data and cannot be ruled out, although the absence of very early selective processing effects on the subsequent S2 argues against this possibility (see Foxe and Simpson 2005).

We will first discuss the right parieto-occipital differentiation. This differentiation is coupled with a modality specific anticipatory modulation of parieto-occipital alpha-band oscillatory activity that has been reported separately (Foxe et al. 1998). We proposed that increased right parieto-occipital α during the attend-auditory condition represented a top-down suppression of the anticipated irrelevant visual portion of the S2. Conversely, during the attend-visual condition, decreased α presumably represented the engagement of parieto-occipital cortex in anticipatory visual selective attention mechanisms. The finding reported here, of concurrent low-frequency sustained activity that is of greater amplitude during the attend-visual condition, appears to lend a measure of support to this contention. That is, we find increased slow-wave anticipatory parietal activity during the same period in which alpha suppression is greatest. Of course, this speculation will need to be tested directly in a future study. Nonetheless, many studies have implicated parietal cortex in deploying and sustaining attention. Neuroimaging studies have shown its involvement in attention shifting (e.g. Corbetta et al. 1993, 2000; Corbetta and Shulman 1998; Culham et al. 1998; Sathian et al. 1999) and in sustaining attention (e.g. Pardo et al. 1991; Culham et al. 1998). Further, intracranial recordings in the lateral intraparietal area of monkeys show neurons that respond in anticipation of salient visual target stimuli (e.g. Colby et al. 1996; Stricanne et al. 1996; see Colby and Goldberg 1999 for review). We propose that the right parieto-occipital positivity seen in our data represents engagement of visual attention mechanisms of the posterior parietal cortex in anticipation of attentionally relevant visual stimuli during the attend-visual trials.
However, as was the case for the auditory fronto-central negativity, the parieto-occipital positivity was seen during both attention conditions, suggesting that visual cortices are also primed but to a lesser degree during attend-auditory trials. Thus, the sustained period appears to represent a biased rather than an “all-or-nothing” attentional state.

Another potential explanation for the fact that parietal activation is seen during anticipation of both sensory modalities is that this activation might represent an attention mechanism common to both modalities, i.e. activity from supramodal attentional control areas. Certainly, the parietal cortices have been implicated in multisensory processing (e.g. Lewald et al. 2002; Schroeder and Foxe 2002; Foxe and Schroeder 2005), so it is possible that multisensory parietal control regions are involved in directing attention to stimuli from the various sensory modalities. Further research is clearly necessary to tease this issue apart. A note of caution regarding paradigm design is warranted, though. Recent research using task-switching designs (and the present task is clearly a task-switching design, since subjects switch between tasks in two modalities) has shown that even if one of the tasks becomes completely irrelevant, activation patterns for that task continue to be seen for a considerable time afterwards (see Wylie et al. 2003, 2004).

We also found a very robust left central parietal positivity that was larger for the attend-auditory condition than for the attend-visual condition. One possible role for this activity is in allocating temporal attention. Coull and Nobre (1998) have recently shown that a region of cortex around the left intraparietal sulcus is significantly more involved in allocating attention to the temporal details of a stimulus than to its spatial details. In our paradigm, both the auditory and the visual cue should evoke temporal orienting of attention (“S2 will arrive in about a second”), but the additional temporal discrimination required by our attend-auditory S2 may have contributed to the increased left parietal activation that we recorded. That is, unlike the visual task, the attend-auditory task required subjects to compare two sequentially presented tones. Selective preparation of a temporal attention system for this discrimination could underlie this left superior parietal effect. Alternatively, just as right parietal areas have been implicated as specialized for visual, and particularly visuospatial, attention, left superior parietal areas may have some specialization for auditory attention.

Additionally, electrophysiological studies have reported a sustained posterior positivity when a task required working memory for written verbal information (e.g. Ruchkin et al. 1997), and human lesion data also point to a role for posterior parietal cortex in verbal working memory (e.g. Warrington et al. 1971; Shallice and Vallar 1990). Neuroimaging studies support these findings (e.g. Petrides et al. 1993b; Awh et al. 1996; Schumacher et al. 1996; Jonides et al. 1997, 1998). As mentioned above, working memory for the cue-word may be an important factor in the present study. We favor an interpretation whereby, after the initial decoding of cue-word meaning, the sustained differential activity over parietal cortex likely represents the selective preparation of intersensory attention, including interactions with working memory for the stimulus features to be attended. This is consistent with findings that show parietal involvement in sensory working memory (e.g. Paulesu et al. 1993; Klingberg et al. 1996, 1997; Chafee and Goldman-Rakic 1998). Further, sustained parietal activity related to working memory would not be expected to differentiate strongly between cue words per se. In this vein, Gevins et al. (1996) recorded a positive slow-wave over parietal cortex in both spatial and verbal working memory conditions. Since they found no differences in the distribution of this positivity between conditions, they raised the possibility that it reflected attentional mechanisms that are a necessary component of sustaining working memory. Further studies will be necessary to dissociate working memory related activation of parietal cortex from activity related to its involvement in sustaining attention, although the exact manner in which these two functions can be distinguished remains an open theoretical question.

Finally, an important issue that arises in studies such as this one is whether the sensory modality in which the instructional cue is presented (i.e. the fact that our cues are always visual) is a factor in how attention is differentially deployed. That is, it is quite plausible that using a visual cue to predict a visual S2 is quite different from using a visual cue to predict an auditory S2. Studies by Harvey (1980) and subsequently by Turatto et al. (2002, 2004) have convincingly shown that reaction times can differ substantially depending on whether within-modality or cross-modality S1–S2 pairs are used. For example, Turatto et al. (2004) presented both within-modality and cross-modality stimulus pairs, using variable temporal lags between the two stimuli. The first stimulus of the pair was always irrelevant to the task, whereas subjects were required to make a speeded discrimination response to the second stimulus. Even though the first stimulus was completely irrelevant, attention was automatically captured to its modality, such that at very short temporal lags, reaction times to a second stimulus in another sensory modality were significantly lengthened relative to a within-modality S2. Compellingly, this effect was seen even in a second experiment in which the modality of the S2 was kept constant, so that subjects always knew which modality they would be responding to. Even in this case, the totally irrelevant S1 caused automatic capture to its modality and lengthened RTs to the cross-modal S2s. These results have obvious implications for the present study. However, all of these studies find that this automatic sensory capture effect persists for a relatively short epoch. In Turatto et al. (2004), effects were very strong at lags of 150 ms but were highly attenuated at 600 ms and totally absent at 1000 ms.
The reader will recall that in the present experiment, the S1–S2 gap is 1085 ms. As such, while the visual S1 in our experiment undoubtedly captured visual attention, the cue-target interval we used is considerably longer than the lags at which Turatto et al. found this effect, and exceeds even the 1000 ms lag at which it was totally absent. We would also note that in a very similar experiment, in which we used auditory rather than visual symbolic cues, activation patterns similar to those detailed here were seen (Fu et al. 2001). Clearly, a direct comparison of cues in different modalities and their effects on attentional deployment to the various other sensory modalities is warranted in future work.

Summary

High-density electrical mapping of the brain activity associated with directing attention to one of two possible modalities, based on a visually presented cue-word, allowed us to examine the temporal activation pattern of the brain regions involved in setting up and maintaining intersensory selective attention prior to the selective processing of subsequent stimuli.

1) We revealed an early enhancement of sensory processing for words when they were seen during the cueing condition relative to the naïve condition. Because the earlier sensory activations were equivalent for these conditions, this enhancement is unlikely to be an effect of arousal.

2) Subsequently (120–220 ms), and perhaps as a result of this extra visual sensory processing, a robust left lateral frontal positivity was seen, which we interpreted as representing the encoding of the cue-word by known verbal encoding areas in left lateral prefrontal cortex.

3) The first differentiation of cue-word processing was found over right occipital scalp sites during the visual N1 component (180 ms), consistent with generators in inferior occipital areas that are sensitive to visually presented word stimuli.

4) By 240 ms, differentiation of activity over mid-frontal and posterior parietal areas emerged, likely representing engagement of fronto-parietal attention networks for attention to the respective modalities of the upcoming compound auditory-visual S2 stimulus.

5) Sustained activity settled into a relatively stable pattern in the later phase of the S1–S2 period, with multiple activation foci evident over frontal, central and parietal scalp regions.

6) In addition to sustained frontal activity that did not differentiate between attention conditions and likely represented general attentional maintenance and working memory processes, a pair of bilateral positive foci was enhanced during attend-auditory trials, and a more superior mid-frontal positive focus was enhanced during attend-visual trials. We take these effects to represent differential activation of a system of frontal areas underlying the deployment and sustaining of anticipatory attention to different modalities.

7) A robust, left-hemisphere-dominant, sustained fronto-central negativity was enhanced in the attend-auditory condition and may represent the anticipatory activation of higher order auditory attention areas, mediated by modality specific top-down inputs from frontal cortices.

8) A right parieto-occipital positivity was of greater amplitude during attend-visual trials, and we propose that this represents anticipatory activation of higher order parieto-occipital visual attention areas of the dorsal visual stream, likely mediated by modality specific top-down inputs from frontal cortices.

9) Importantly, neither the fronto-central negativity nor the parieto-occipital positivity was engaged exclusively during either attention condition. Rather, both activations were seen during both attend-visual and attend-auditory trials but were differentially modulated. We take this to suggest that deployment of intersensory attention mechanisms in this task involves a biasing of brain attentional mechanisms rather than “all-or-none” gating.