Introduction

Perception of our environment is based on integrative processing of input from various sensory modalities. It is still unclear, however, to what extent this integration occurs for all stimuli, or whether it occurs preferentially for relevant stimuli. Early selection theories predict that our brains are able to amplify relevant information and suppress irrelevant information at early stages of information processing. This amplification and suppression of early sensory processing is a mechanism by which, for example, we can direct attention to a specific stimulus in one sensory modality while ignoring stimuli from other sensory modalities (Eimer et al. 2002; Hackley et al. 1990; Macaluso et al. 2002; Talsma and Kok 2001, 2002).

One of the key functions of selective attention is to enhance perceptual clarity and reduce stimulus ambiguity. Interestingly, many studies have shown that the process of multisensory integration serves a similar set of functions (e.g. Calvert et al. 2000). Single cell recordings in cats and rhesus monkeys have shown that multisensory integration can also occur very early in the processing stream, even at the subcortical level. For example, it has been reported that neurons in the superior colliculus integrate sensory inputs of different modalities (Wallace and Stein 1997, 2001). Further animal studies have shown that integration effects can occur very early in time (Schroeder and Foxe 2002; Schroeder et al. 2003). In humans, some recent studies using electroencephalographic (EEG), magnetoencephalographic (MEG), and functional magnetic resonance imaging (fMRI) recordings, focusing on the cortical integration of audio-visual information, have discovered enhanced activation in different cortical areas, for example the superior temporal gyrus (Callan et al. 2001), temporo-frontal and occipito-parietal regions (Giard and Peronnet 1999; Fort et al. 2002; Molholm et al. 2002), and the temporo-parietal junction (Raij et al. 2000). The EEG multisensory interaction effects have been reported to start as early as 40 ms after stimulus onset, and to occur at various points in time thereafter, indicating that multisensory integration is probably a process that occurs at multiple stages in the processing stream. The similarity in purpose and latency range between initial effects of attention and multisensory integration led Talsma and Woldorff (2005b) to hypothesize that these two processes might interact, and thus further increase the efficiency of perception for multisensory stimuli. Drawing upon and extending previous ERP studies investigating the effects of multisensory integration (Giard and Peronnet 1999; Fort et al. 2002; Molholm et al. 2002), Talsma and Woldorff showed that these integration effects were stronger in the attended channel than in the unattended channel.

Interestingly, although Talsma and Woldorff could demonstrate that attention and multisensory integration interacted at different points in time beginning at approximately 90 ms post-stimulus, these effects did not include the occipital 40 ms integration effects that had been reported previously (Giard and Peronnet 1999; Fort et al. 2002; Molholm et al. 2002). One possible explanation of the absence of these early effects could be that the design of the spatial attention paradigm they used required lateral presentation of the two streams of stimuli, and that the resulting, more peripheral, presentation may have resulted in a slight delay in the onset of the multisensory integration processes. An alternative explanation, however, is that these multisensory integration effects did occur early in time but that these activations were too weak to be picked up in the early (i.e. 40 ms post-stimulus) occipital ERP waves, because of some ramification of the peripheral presentation of the visual stimuli on the triggered ERP activity recordable at the scalp. One way to further explore this issue would be the use of a different method of analysis for examination of these effects and interactions, for example investigation of fast oscillatory gamma responses (Herrmann et al. 2004; Tallon-Baudry and Bertrand 1999).

The aim of the present study was to investigate these early multisensory integration effects on scalp-recorded oscillatory gamma-band response (GBR) activity and their relationship to spatial selective attention by re-analyzing the GBR activity in the EEG data from Talsma and Woldorff (2005b). Previous studies have shown that analysis of gamma-band responses is a useful tool to study multisensory integration processes. For example, Sakowitz et al. (2001) have shown that multisensory stimuli evoke stronger GBRs than unisensory stimuli over central, parietal, and occipital brain areas. It has also been found that oscillatory GBRs over visual areas were enhanced in a sound-induced visual flash phenomenon when a sound induced a visual illusion compared with when it did not (Bhattacharya et al. 2002). This suggests a link between oscillatory GBRs and multisensory processing. Furthermore, oscillatory GBRs have been shown to be sensitive to spatial attention: attended auditory (Tiitinen et al. 1993) and attended visual (Gruber et al. 1999) stimuli have been reported to produce stronger GBRs than unattended stimuli. Accordingly, the above-mentioned studies suggest two key relationships. First, oscillatory GBR activity can reflect audio-visual processing interactions. Second, the GBR activations seem to be modulated by spatial attention.

In the present study oscillatory GBR activity was examined in EEG data that were collected while subjects were engaged in a spatial selective attention task in which two continuous streams of unisensory auditory, unisensory visual, and multisensory audio-visual stimuli were being presented to the left and right hemispaces. These stimuli, which were always unilateral, were presented in random order to the left and right hemispaces while subjects focused their attention on a designated side to detect occasional target stimuli on that side (which could occur in either one or both of the modalities). Using this design enabled the analysis of brain responses to the same set of unisensory and multisensory stimuli when they were attended compared with when they were unattended, with the only difference being the covert focusing of attention toward the stream in one hemispace or the other.

Since this paper extends previous work, a brief summary of the main findings of the Talsma and Woldorff (2005b) study is in order. As expected, spatial attention effects on the processing of the unisensory stimuli consisted of modulation of visual P1 and N1 components (at 90–130 and 160–200 ms, respectively) and of the auditory N1 and processing negativity (100–200 ms). Several phases of multisensory interaction effects, which were defined by the difference between the ERP responses to the multisensory stimuli (AV) and the sum of the ERP responses to the unisensory stimuli (A+V) (Giard and Peronnet 1999), were observed. These interaction effects differed significantly between the attended and unattended channels of stimuli. The earliest of these interaction effects was present only for attended stimuli and consisted of an initial fronto-medial positivity beginning around 90 ms post-stimulus. This effect was followed by two later phases of integration, which were larger and/or earlier in the attended channel. These phases consisted of a centro-medial positivity beginning at approximately 160 ms post-stimulus and peaking at around 190 ms, followed by a positive wave peaking at about 370 ms after stimulus onset. Both the early frontal effect and the later centro-medial positive effects had scalp distributions rather dissimilar to those normally associated with visual or auditory perceptual processing in modality-specific cortical regions. We therefore hypothesized that the multisensory integration waveform is unlikely to reflect modulations in sensory processing alone but is more likely to reflect integrative processes that originate from association areas in the brain.

In this report we take a different approach, focusing on analyses of high-frequency oscillatory GBR activity extracted using a Morlet wavelet transformation. As noted above, the motivation for this re-analysis is the suggestion that GBRs might be more sensitive to early latency effects of specific cognitive processes than are ERPs (Herrmann et al. 2004; Senkowski and Herrmann 2002). Assuming this is correct, such differential sensitivity could derive from the possibility that some ERP activations may be composed of a superposition of a phase resetting of multiple EEG processes (Makeig et al. 2002; Karakaş et al. 2000). These processes can be separated into specific frequency ranges that, it has been proposed, are associated with different cognitive processes (Başar et al. 2001). Separation of the different frequency ranges would therefore enable separate analysis of the cognitive processes related to each frequency range. Thus, if GBRs can serve as an alternative sensitive marker of early attentional processing, as previous findings have suggested (Herrmann and Knight 2001; Tiitinen et al. 1993), we would expect that these responses might also be more sensitive than ERPs to certain early selective spatial attention effects, such as in multisensory processing.

As described above, the earliest spatial attention effects on ERP multisensory interactions have been reported to begin at approximately 90–100 ms post-stimulus (Talsma and Woldorff 2005b). Evoked GBRs (oscillations that are both phase-locked and time-locked to stimulus onset) usually peak after approximately 40–50 ms in the auditory modality (Tiitinen et al. 1993) and after approximately 90–100 ms in the visual modality (Herrmann et al. 1999). We therefore expected the earliest GBR attention effects for auditory stimuli to occur after approximately 40–50 ms and the earliest ones for visual stimuli after approximately 90–100 ms. Of particular interest, however, was the investigation of attention effects on multisensory stimuli. In accordance with integration effects of multisensory stimuli previously reported by Sakowitz et al. (2001), we hypothesized that we might find attention effects on multisensory stimuli with a latency of approximately 60–70 ms (GBRs to multisensory stimuli in the Sakowitz et al. study started around then). After the initial evoked GBR responses, a later increase in induced GBRs (i.e. all stimulus-related GBR activity, both non-phase-locked and phase-locked) starting approximately 200 ms post-stimulus has also been described in the literature for GBRs to visual (Tallon-Baudry et al. 1998) and auditory (Haig et al. 1999) stimuli, and thus we expected activity in that latency range also. On the basis of previously reported attention effects on induced GBRs to visual and auditory stimuli, we also expected to find modality-specific effects for unisensory stimuli (Gruber et al. 1999; Sokolov et al. 2004). Lastly, multisensory integration areas that are not modality-specific might also show attention effects for multisensory stimuli (Kaiser et al. 2004).
For both evoked and induced gamma responses, the high temporal resolution of electrophysiological recordings was expected to provide insight into the stage of processing at which such interactions of attention and multisensory processing take place, and their scalp topography was expected to provide some insight into the brain regions in which these effects and interactions occur.

Methods

Participants

Sixteen subjects (21.1±2.9 years, nine females) participated in the experiment. Of these sixteen subjects, two were rejected from the analysis because their EEG data contained too much high-frequency noise in the frequency ranges of interest, mostly due to muscle activity. The remaining fourteen subjects (21.9±3 years, eight females) were included in the full data analysis. All subjects were right-handed and had normal or corrected-to-normal vision. None had a history of neurological or psychiatric disorder, and all gave written informed consent to participate in the study.

Procedure

Streams of unisensory auditory, unisensory visual, and multisensory (audio-visual) stimuli were presented in random order to the left and right hemispaces, while subjects attended on different runs to one or the other of these sides. The subject’s task was to detect occasional target stimuli (that could occur in either or both of the modalities) on the attended side and report these by making a manual response with the right or left index finger (counterbalanced across runs), while ignoring all stimuli from the other side. Target stimuli were highly similar to standards, but differed from them by containing an intensity decrement halfway through the stimulus duration, which caused the subjective impression that the stimulus either appeared to flicker (for visual targets) or to stutter (for auditory targets). Multisensory target stimuli contained both the visual and the auditory intensity decrements. The duration of all the stimuli was 105 ms, and trials were presented with a randomly varying inter-stimulus-interval (ISI) between 350 and 650 ms (mean 500 ms). Subjects were presented with ten experimental blocks of trials (five attend left and five attend right blocks, counterbalanced across runs). For each condition (attend left/right), a total number of 700 visual, 700 auditory, and 700 multisensory stimuli were presented; 350 to the left hemispace, and 350 to the right. For each stimulus type, 70 out of these 350 stimuli were targets. To facilitate the elimination of distortion of ERP waves from overlapping responses of adjacent trials (Woldorff 1993), a total of 350 “no-stim” trials were included in each condition (Burock et al. 1998, see Talsma and Woldorff 2005a for details). “No-stim” events, which have been used previously in faster-rate fMRI studies (Burock et al. 1998; Buckner et al. 1998; Woldorff et al. 2004), are points in time in the trial sequence that are randomized as if they were real stimulus events, but during which no stimulus is presented. 
Because these no-stim events do not elicit a physiological response by themselves, the selective averages to these events thus represent the average overlap of adjacent trials (Busse and Woldorff 2003). As demonstrated by Talsma and Woldorff (2005b), subtracting the time-locked averages from these no-stim trials effectively eliminated this overlapping activity.

The amount of mid-stimulus intensity reduction for the targets was determined for each subject individually during a training session before the experiment. Target difficulty was adjusted using a unimodal target discrimination task, by reducing or increasing the amount of the mid-stimulus intensity decrement. The required accuracy was set to 90% correct in these unimodal practice sessions, so that subjects would still be able to achieve reasonable levels of accuracy during the somewhat more demanding main experimental session, in which subjects were required to divide their attention between visual and auditory stimuli and detect targets in both modalities.

Data acquisition

Recordings took place in a sound-attenuated, electrically shielded chamber. Stimulus presentation was controlled using “Presentation” software (Neurobehavioral Systems, CA, USA). EEG was recorded from 64 channels, mounted at equally spaced positions in a custom-configured electrocap (Electro-Cap International, Eaton, OH, USA) and referenced to the right mastoid. The electrode positions in Talairach coordinates are given in Table 1, as determined previously (Woldorff et al. 1997, 2002).

Table 1 Approximate coordinates of the 64 EEG channels in Talairach space

Electrode impedances were kept below 2 kΩ for the mastoid and ground electrodes, 10 kΩ for the eye electrodes, and 5 kΩ for the remaining electrodes. Horizontal eye movements were monitored by two electrodes at the outer canthi of the eyes referenced to each other. Vertical eye movements and eye blinks were detected by electrodes placed below the orbital ridge of both eyes, which were referenced to two electrodes located directly above the eyes. The EEG was recorded using a NeuroScan (SynAmps, TX, USA) acquisition system with a 0.01–100 Hz band-pass filter at a sampling rate of 500 Hz. To suppress high-frequency background noise, data were also off-line filtered in the post-experimental analysis with a narrow 60 Hz notch filter (frequency band 57.74–62.25 Hz (−3 dB)). All channels were re-referenced to the algebraic average of the two mastoid electrodes. Data were epoched from 200 ms before to 500 ms after stimulus onset. Baselines were computed for each trial between −150 and −50 ms in the pre-stimulus interval and subtracted from raw data before averaging. Trials containing artifacts were automatically excluded from averaging when the standard deviation within a moving 200 ms time interval in any channel exceeded 30 μV for the scalp electrodes and 40 μV for the eye electrodes.
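The baseline correction and the moving-window rejection criterion described above can be sketched as follows. This is a minimal illustration for a single channel of one epoch, not the original analysis code: the function names, the use of the population standard deviation, and the one-sample step of the moving window are assumptions that the text does not specify.

```python
from statistics import mean, pstdev

FS_HZ = 500             # sampling rate reported in the text
EPOCH_START_MS = -200   # epochs begin 200 ms before stimulus onset


def ms_to_index(t_ms):
    """Convert a latency in ms (relative to stimulus onset) to a sample index."""
    return round((t_ms - EPOCH_START_MS) * FS_HZ / 1000)


def baseline_correct(epoch, start_ms=-150, end_ms=-50):
    """Subtract the mean of the pre-stimulus baseline interval from one epoch."""
    base = mean(epoch[ms_to_index(start_ms):ms_to_index(end_ms)])
    return [x - base for x in epoch]


def has_artifact(epoch, threshold_uv, window_ms=200, step=1):
    """Return True if the SD within any moving window exceeds the threshold.

    Assumptions (not stated in the text): population SD, one-sample step.
    """
    n = round(window_ms * FS_HZ / 1000)
    return any(pstdev(epoch[i:i + n]) > threshold_uv
               for i in range(0, len(epoch) - n + 1, step))
```

Under these assumptions, a trial would be dropped if `has_artifact` returned True for any scalp channel at 30 μV or for any eye channel at 40 μV.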

Stimuli

Unimodal visual stimuli consisted of white horizontal square wave gratings (5.8 cm×5.8 cm, subtending a visual angle of approx. 6°) presented against a black background. These visual stimuli were presented laterally to the left or right of the display at an angle of about 15° from a centrally presented fixation point, in the lower visual fields (approx. 6° below the horizontal meridian), with a duration of 105 ms. Unimodal auditory stimuli consisted of a 1600 Hz sinusoidal tone presented with an intensity of 65 dB (A) and a total duration of 105 ms (10 ms linear rise and fall times). These stimuli were presented through two speakers that were placed behind the monitor, such that the perceived location of the sound matched the location of the lateral visual stimuli. Multisensory stimuli consisted of simultaneous presentation of both the auditory and visual stimuli.

Data analysis

Two subjects who had been included in the corresponding ERP study reported by Talsma and Woldorff (2005b) were rejected from the current analysis because of unacceptable high-frequency noise levels. For this reason, we re-analyzed the behavioral and early-latency ERP data of the 14 subjects included in this article.

Behavioral data

Reaction times (RTs) for correct detections of targets, hit rates (HR), and false alarm (FA) rates were computed separately for the different conditions. These measures were subjected to analysis of variance (ANOVA) using the within-subject factors “stimulus modality” (visual, auditory, audio-visual) and “stimulus location” (left hemispace, right hemispace).

Event-related potentials

For the ERP analysis we subtracted the time-locked average of the no-stim trials from that of the other trials before the analysis, to eliminate the overlap from preceding trials. Because activity in high-frequency bands usually contributes little overlap at the stimulus rate jitter used here (Woldorff 1993), this subtraction was not necessary for the GBR analyses. For the ERP, only the very early time interval (40–60 ms) was analyzed. Later attention effects on the ERP of this data set have previously been described by Talsma and Woldorff (2005b).

Wavelet transformation and analysis of phase-locking

For the analysis of oscillatory GBRs a wavelet transform based on Morlet wavelets was used (details of this transformation have been described previously; e.g. Tallon-Baudry et al. 1998; Herrmann et al. 1999; Quian Quiroga et al. 2002). On the basis of time-frequency planes (Fig. 3) we conducted a wavelet transform with a central frequency of 45 Hz for all three stimulus types (visual, auditory, audio-visual) and all subjects. The duration (2σt) of this wavelet was 44 ms, resulting in a spectral bandwidth (2σf) of 14.3 Hz. The spectral bandwidth is twice the standard deviation of the Gaussian frequency kernel, σf = 1/(2πσt).
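As a worked illustration of the relationship between wavelet duration and spectral bandwidth, the following sketch computes σf from σt and defines a complex Morlet wavelet at 45 Hz. The normalization factor is a common convention in the Morlet wavelet literature and is an assumption here, since the text does not state it. With σt taken as exactly 22 ms, the computed bandwidth comes out near 14.5 Hz, close to the reported 14.3 Hz; the small difference presumably reflects rounding of the reported duration.

```python
import cmath
import math

F0_HZ = 45.0            # central frequency reported in the text
SIGMA_T_S = 0.044 / 2   # reported duration 2*sigma_t = 44 ms

# Standard deviation of the Gaussian frequency kernel and the bandwidth 2*sigma_f
sigma_f_hz = 1.0 / (2.0 * math.pi * SIGMA_T_S)
bandwidth_hz = 2.0 * sigma_f_hz


def morlet(t_s, f0=F0_HZ, sigma_t=SIGMA_T_S):
    """Complex Morlet wavelet at time t_s (in seconds).

    A Gaussian envelope multiplies a complex exponential at f0. The factor
    (sigma_t * sqrt(pi)) ** -0.5 is an assumed normalization.
    """
    norm = (sigma_t * math.sqrt(math.pi)) ** -0.5
    envelope = math.exp(-t_s ** 2 / (2.0 * sigma_t ** 2))
    return norm * envelope * cmath.exp(2j * math.pi * f0 * t_s)
```

The envelope falls to exp(−0.5) of its peak at t = σt, which is why 2σt is a natural measure of the wavelet duration.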

For interpretation of gamma activity we distinguished two types of activity: evoked and induced. Both evoked and induced activity are elicited by experimentally controlled cognitive processes. Evoked gamma activity is strictly phase-locked and time-locked to the onset of events leading to brain processes, that is, it starts both at the same latency and with the same phase in each single trial. Accordingly, evoked gamma activity remains present in the averaged ERP. Thus, to estimate evoked GBRs, we computed the Morlet wavelet transformation of the average ERP waveform (i.e. after the average had been computed). Induced gamma activity, on the other hand, contains gamma activity that is stimulus-related but not necessarily phase-locked to the onset of an event. Thus, the induced GBRs consist of the summed activity of phase-locked and non-phase-locked GBRs (Bertrand and Tallon-Baudry 2000), although they tend to be dominated by non-phase-locked GBR activity (Kaiser and Lutzenberger 2003; Müller et al. 2000). Because of this phase variability, most of the induced gamma activity will tend to cancel out during the time-locked selective averaging and is therefore not generally present in the ERP waveform averages. To estimate induced GBRs we calculated the time-locked average of the absolute values of the wavelet transforms of individual trials. In other words, each single trial was first transformed, using a 45 Hz wavelet, and the absolute values of these transformed epochs were subsequently averaged.
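The difference between the two estimates lies only in the order of averaging and transforming, which the following sketch makes explicit. It is an illustration under stated assumptions rather than the original implementation: the kernel support of ±3σt, the normalization, the zero-padded convolution, and the synthetic `toy_trial` signals are all hypothetical choices made for the example.

```python
import cmath
import math
import random

FS_HZ = 500          # sampling rate reported in the text
F0_HZ = 45.0         # central wavelet frequency reported in the text
SIGMA_T_S = 0.022    # half of the reported 44 ms wavelet duration


def morlet_kernel(n_sigma=3):
    """Sampled complex Morlet wavelet; the +/- n_sigma support is an assumption."""
    half = round(n_sigma * SIGMA_T_S * FS_HZ)
    norm = (SIGMA_T_S * math.sqrt(math.pi)) ** -0.5  # assumed normalization
    return [norm * math.exp(-(k / FS_HZ) ** 2 / (2 * SIGMA_T_S ** 2))
            * cmath.exp(2j * math.pi * F0_HZ * k / FS_HZ)
            for k in range(-half, half + 1)]


def wavelet_transform(signal, kernel):
    """Convolve a real signal with the complex kernel (zero-padded edges)."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0j
        for j, w in enumerate(kernel):
            m = n - (j - half)
            if 0 <= m < len(signal):
                acc += signal[m] * w
        out.append(acc / FS_HZ)
    return out


def evoked_gbr(trials, kernel):
    """Average the trials first, then transform: keeps phase-locked activity."""
    erp = [sum(samples) / len(trials) for samples in zip(*trials)]
    return [abs(c) for c in wavelet_transform(erp, kernel)]


def induced_gbr(trials, kernel):
    """Transform each trial, take absolute values, then average across trials."""
    magnitudes = [[abs(c) for c in wavelet_transform(trial, kernel)]
                  for trial in trials]
    return [sum(samples) / len(trials) for samples in zip(*magnitudes)]


def toy_trial(phase, n_samples=200):
    """Hypothetical test signal: a unit-amplitude 45 Hz sinusoid."""
    return [math.sin(2 * math.pi * F0_HZ * i / FS_HZ + phase)
            for i in range(n_samples)]


rng = random.Random(0)
kernel = morlet_kernel()
locked = [toy_trial(0.0) for _ in range(20)]                          # same phase
jittered = [toy_trial(rng.uniform(0, 2 * math.pi)) for _ in range(20)]  # random phase
```

For the phase-locked toy trials the two estimates coincide, whereas for the phase-jittered trials the evoked estimate is reduced by cancellation in the average while the induced estimate is not.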

In addition to the induced and evoked GBRs, we analyzed the phase of oscillatory GBR activity. In this analysis, we first plotted the resulting complex phase-angle of each single trial as a point on a unit circle (Mardia and Jupp 2000). This was done for the whole time period of the GBR analysis, separately for each frequency in the gamma band (30–80 Hz). Next, we tested whether the phases of single trials were uniformly distributed by subjecting these phases to the Rayleigh test of uniformity. If a stimulus onset does affect the phase of the GBRs, either by a phase resetting of ongoing oscillatory activity or by phase-locked activity of additional generators, one would expect to find a non-uniform distribution of phase-angles (Tallon-Baudry et al. 1996). In other words, the phase-angles would not be randomly distributed and the test would be significant (p<0.05). The results of the Rayleigh test were finally plotted into a time-frequency plane.
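The Rayleigh test applied to single-trial phase angles can be sketched as follows. This is a minimal version, assuming the phases (in radians) have already been extracted from the complex wavelet coefficients at one time point and frequency, for example with `cmath.phase`. The p-value uses a standard closed-form approximation; the text does not say which approximation was used in the original analysis.

```python
import cmath
import math


def rayleigh_test(phases):
    """Rayleigh test of uniformity for a list of phase angles in radians.

    Returns (z, p_value), where z = R**2 / n and R is the length of the
    resultant vector of the unit phasors. The p-value approximation is an
    assumption about the method, not taken from the text.
    """
    n = len(phases)
    resultant = abs(sum(cmath.exp(1j * p) for p in phases))
    z = resultant ** 2 / n
    p_value = math.exp(
        math.sqrt(1 + 4 * n + 4 * (n ** 2 - resultant ** 2)) - (1 + 2 * n)
    )
    return z, p_value
```

Phases that cluster around a common angle give a long resultant vector and a small p-value, whereas uniformly scattered phases give a resultant near zero and a p-value near 1. Repeating the test for each time point and frequency yields the time-frequency plane of test results described above.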

Regions of interest (ROI)

To enhance statistical power, analyses of the GBRs and early ERPs were performed within ROI on the scalp. The focus of the analysis of triggered oscillatory activations and ERPs was on the “standard” (i.e. non-target) stimuli. Since early visual GBRs and early multisensory integration effects in ERPs have been described over occipito-parietal areas (Fort et al. 2002; Giard and Peronnet 1999; Senkowski and Herrmann 2002), we defined a topographical ROI that included these areas for the statistical analysis (Fig. 1). This posterior ROI consisted of a symmetrically arranged cluster of scalp-sites inside which the 10–20 electrodes Oz and Pz would be located. Seven electrodes were included in the posterior ROI: 35 (~Oz), 36 (~Pz), 37, 43, 44, 45, and 46. Early auditory GBRs, on the other hand, usually show a maximum of activation over medial-frontal areas (Tiitinen et al. 1993), and thus we defined a second anterior ROI including a symmetrically arranged cluster of channels inside which the 10–20 electrodes Fz to Cz would be located. The anterior ROI included seven electrodes: 2 (~Fz), 3, 4 (Cz), 11, 12, 13, and 14.

Fig. 1
Definition of the regions of interest (ROI) for the statistical analyses. ERPs and oscillatory GBRs were analyzed for one posterior and one anterior ROI

Statistical analysis

On the basis of latencies of previously reported early-triggered oscillation effects in the visual and auditory modalities (Tiitinen et al. 1993; Herrmann et al. 1999), and latencies of previously reported early attention effects (Woldorff et al. 1993), we investigated two different early time intervals (40–60 and 80–120 ms) in this work. Because induced GBRs usually also increase after approximately 200 ms (Posada et al. 2003), we further investigated a later time window between 200 and 350 ms. Before analyzing the GBRs in relation to attention, we evaluated whether the mean amplitudes in the three specified time intervals differ significantly from baseline activity. These analyses were performed separately for each of the three time windows. Mean GBR amplitudes were computed for the anterior and posterior ROI in the three specified time intervals and in the baseline interval separately for the three stimulus types (unisensory auditory, unisensory visual, and multisensory audio-visual). These amplitudes were subjected to ANOVAs using the factors “time interval” (baseline versus one of the specified time intervals), “attention” (attended or unattended), and “presentation side” (left or right hemispace). Only when a significant effect of, or interaction with, the factor “time interval” was found did we subsequently perform ANOVAs for the specified post-stimulus time interval using the within-subject factors “attention” and “presentation side”.
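The mean amplitudes entering these ANOVAs can be illustrated with a short sketch that averages a GBR time course over the channels of an ROI and the samples of a time window. The data layout (a dictionary mapping channel number to a list of samples), the window labels, and the inclusive window end are assumptions made for the example; the channel numbers and latencies are those given in the text.

```python
from statistics import mean

FS_HZ = 500             # sampling rate reported in the text
EPOCH_START_MS = -200   # epochs begin 200 ms before stimulus onset

# Time windows from the text; the dictionary keys are hypothetical labels
WINDOWS_MS = {
    "baseline": (-150, -50),
    "early": (40, 60),
    "middle": (80, 120),
    "late": (200, 350),
}
POSTERIOR_ROI = [35, 36, 37, 43, 44, 45, 46]
ANTERIOR_ROI = [2, 3, 4, 11, 12, 13, 14]


def to_index(t_ms):
    """Convert a latency in ms (relative to stimulus onset) to a sample index."""
    return round((t_ms - EPOCH_START_MS) * FS_HZ / 1000)


def window_mean(gbr_by_channel, roi, window_ms):
    """Mean GBR amplitude across ROI channels and window samples (end inclusive)."""
    start, end = (to_index(t) for t in window_ms)
    return mean(mean(gbr_by_channel[ch][start:end + 1]) for ch in roi)
```

One such value per subject, condition, ROI, and window (including the baseline window) would then form the input to the repeated-measures ANOVAs.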

Results

Behavioral data

Reaction times

The ANOVA for the reaction time data revealed significant differences between unisensory auditory, unisensory visual, and multisensory audio-visual stimuli, as indicated by a significant main effect of the factor “stimulus modality” (F(2,12)=13.09, p<0.001). Fastest responses were found for the audio-visual targets (564±30 ms), followed by auditory targets (616±30 ms) and visual targets (690±37 ms). Post-hoc comparisons revealed that audio-visual targets were processed significantly faster than both auditory (F(1,13)=7.29, p<0.018) and visual targets (F(1,13)=20.15, p<0.001). In addition, shorter RTs were obtained for auditory targets than for visual targets (F(1,13)=16.12, p<0.001). No other significant reaction time effects were found.

Hit rates

As with RT, the ANOVA for HR yielded significant differences between the three stimulus modalities (F(2,12)=28.14, p<0.001). The highest HR were observed for audio-visual targets (89±2.2%), followed by auditory targets (86.6±2.7%) and visual targets (72.9±4.0%). Post-hoc analyses showed that HR for both multisensory audio-visual and unisensory auditory target stimuli were significantly larger than HR for unisensory visual targets (F(1,13)=51.09, p<0.001 and F(1,13)=21.48, p<0.001, respectively). The HR for audio-visual and auditory targets, however, did not differ significantly. No other significant accuracy effects were observed.

Event-related potentials

For a direct contrast between oscillatory GBRs and ERPs, analysis of the early ERP responses (40–60 ms) was done for the same scalp ROI as for the oscillatory GBRs. Figure 2 shows early event-related potentials for all three stimulus types (collapsed across left and right hemispaces). ANOVAs for multisensory audio-visual, unisensory auditory, and unisensory visual stimuli over frontal and posterior scalp regions did not, however, reveal any statistically significant effects in this early latency range (for longer-latency (>90 ms) effects of attention and multisensory interactions, see Talsma and Woldorff 2005b).

Fig. 2
Event-related potentials for midline electrodes of audio-visual stimuli (left), auditory stimuli (center), and visual stimuli (right) for both attended stimuli (solid line) and unattended stimuli (dotted line). No significant effects of attention were found for the early time range between 40 and 60 ms (vertical dashed lines). Later attention and integration effects (>90 ms) of these ERP data have been described in Talsma and Woldorff 2005b. Electrodes marked with an asterisk are located close to the named 10–20 electrode site

Gamma-band responses

Evoked gamma activity

We analyzed evoked (strictly phase-locked) and induced (mainly non-phase-locked) gamma activity using Morlet wavelet transformations for three time intervals (40–60, 80–120, and 200–350 ms). First, we calculated time-frequency planes that gave an estimate of both the frequency and the latency of the evoked GBR activity for the audio-visual, auditory, and visual stimuli, separately for when they were attended and when they were unattended (Fig. 3). As can be seen in Fig. 3, there was evoked GBR activity for the audio-visual and auditory stimuli, especially in the frequency range centered around 45 Hz, with little seen for the visual stimuli. Figure 3 also shows the corresponding attended minus unattended differential responses for the audio-visual and auditory stimuli, which show a robust attentional enhancement of the 45-Hz GBR activity for the audio-visual stimuli. Accordingly, we computed wavelet transforms of this activity with a center frequency of 45 Hz (Fig. 4).

Fig. 3
The upper panel shows time-frequency planes of evoked GBRs for channel 2, which is located close to the 10–20 electrode Fz, for attended (upper row), unattended (middle row), and attended minus unattended (lower row) multisensory and auditory stimuli (n=14). Multisensory audio-visual stimuli are shown in the left column, unisensory auditory stimuli in the right column. An increase of evoked GBRs was found in a frequency range of approximately 45 Hz, after 40–60 ms, for both stimulus types. These planes show an early attention effect at approximately 45 Hz for audio-visual stimuli (see attended minus unattended time-frequency planes). Note that the attended minus unattended planes are shown with a different amplitude scale. Attended and unattended visual stimuli are plotted in the lower panel; neither showed a significant increase of GBRs after stimulus onset

Fig. 4
a Grand-averaged evoked GBRs for midline electrodes for audio-visual stimuli (left), auditory stimuli (middle) and visual stimuli (right). b Scalp topographies of GBRs for the attended and unattended condition (40–60 ms). Notice the medial-frontal topography of GBRs for audio-visual and auditory stimuli. c Bandpass-filtered (top) and wavelet-transformed data (bottom) of one subject for attended (dotted line) and unattended (solid line) stimuli. The 45 Hz wavelet corresponds with the convolution of bandpass filtered ERPs in a frequency range of approximately 45 Hz. Electrodes marked with an asterisk are located approximately at the named 10–20 electrode site

To statistically assess the GBR activity patterns, ANOVAs were performed separately for the three time intervals, to investigate whether the evoked GBR activity in each interval differed significantly from baseline activity (−150 to −50 ms). The ANOVAs for the early time interval (40–60 ms) revealed significant main effects of the factor “time interval” (i.e. early time interval versus baseline) for multisensory stimuli at both the anterior and posterior ROI, indicating a significant increase of GBR after stimulus presentation (F(1,13)=30.38, p<0.0001 and F(1,13)=9.17, p<0.01, respectively, for the two ROI). Furthermore, for auditory stimuli a significant main effect of the factor “time interval” was found for the early time interval at the anterior ROI (F(1,13)=17.92, p<0.001). No other significant effects of the factor “time interval” were found for the three time intervals and for the three stimulus types.

Evoked GBRs in relation to the factor “attention” were calculated for the early time interval (40–60 ms) for the multisensory audio-visual stimuli (for the anterior and posterior ROI) and for the unisensory auditory stimuli (anterior ROI). We submitted the mean GBR amplitudes obtained for multisensory audio-visual and unisensory auditory stimuli to ANOVAs using the within-subject factors “attention” (attended or unattended) and “presentation side” (left or right hemispace), separately for the anterior and posterior ROI. For multisensory stimuli we found a significant main effect of the factor “attention” for the anterior ROI (F(1,13)=6.99, p<0.02), indicating that attended multisensory stimuli evoked larger GBRs (0.16 μV) than unattended multisensory stimuli (0.11 μV). No other effects (e.g. presentation side) were found over the anterior region for multisensory stimuli. A similar analysis was conducted for the posterior region data, but no significant effects were found there. The analysis of the GBR to unisensory auditory stimuli from the anterior ROI also did not yield any significant effects of the factors “attention” or “presentation side”. Although visual GBR activity did not differ significantly from baseline, we compared the GBR activity for attended versus unattended visual stimuli in the same types of analysis as calculated for multisensory and auditory stimuli. For the anterior and posterior ROI, however, we did not find any significant effects of the factors “attention” or “presentation side”. Figure 3 shows evoked GBRs of the anterior ROI for the three stimulus types. To investigate how attention modulates early GBRs of multisensory stimuli in more detail, we performed an ANOVA using the factors “stimulus type” (multisensory or unisensory auditory) and “attention” (attended or unattended) for the anterior ROI, collapsed across stimuli from the left and right hemispaces.
We did not compare GBRs of AV versus A+V (a method that has been used in previous ERP studies) for two reasons. First, we did not find any significant triggered GBR activity for the visual stimuli (Fig. 4). Second, because of the nonlinearity of Morlet wavelet transformations when absolute values are computed, the visual and auditory GBRs do not simply combine in a linear fashion when added together. Thus, it seemed most appropriate to compare the GBRs of AV versus A. An ANOVA with the within-subject factors “stimulus type” and “attention” revealed a significant interaction of “stimulus type”×“attention” (F(1,13)=5.09, p<0.05, Fig. 5). No other significant effects were found. Because of the significant interaction we performed further ANOVAs separately for attended and unattended stimuli using the factor “stimulus type” (multisensory or unisensory auditory). For attended stimuli we found a significant effect, indicating that attended multisensory stimuli evoke higher GBRs than attended auditory stimuli (F(1,13)=4.78, p<0.05). No significant differences were found between unattended multisensory and unattended auditory stimuli.
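
The second reason given above can be illustrated numerically. The sketch below assumes a complex Morlet wavelet with a Gaussian envelope; the number of cycles, the 500 Hz sampling rate, the toy sinusoids, and the function name `morlet_coefficient` are illustrative assumptions rather than parameters of this study. The complex coefficients add linearly, but their absolute values do not, so the amplitude of a summed signal is not the sum of the two separate amplitudes unless the phases happen to align.

```python
import cmath
import math


def morlet_coefficient(signal, times, freq, t_center, n_cycles=7.0):
    """Complex Morlet coefficient of `signal` at one frequency and one time point."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)  # temporal width of the Gaussian envelope
    coeff = 0j
    for x, t in zip(signal, times):
        dt = t - t_center
        envelope = math.exp(-dt * dt / (2.0 * sigma_t ** 2))
        coeff += x * envelope * cmath.exp(-2j * math.pi * freq * dt)
    return coeff


srate, freq = 500.0, 45.0  # assumed sampling rate (Hz) and gamma frequency (Hz)
times = [i / srate for i in range(250)]
auditory = [math.cos(2.0 * math.pi * freq * t) for t in times]
# Toy "visual" oscillation with a different phase and half the amplitude.
visual = [0.5 * math.cos(2.0 * math.pi * freq * t + 2.5) for t in times]
combined = [a + v for a, v in zip(auditory, visual)]

w_a = morlet_coefficient(auditory, times, freq, 0.25)
w_v = morlet_coefficient(visual, times, freq, 0.25)
w_av = morlet_coefficient(combined, times, freq, 0.25)

print("complex coefficients add:", abs(w_av - (w_a + w_v)) < 1e-9)
print("|W(A+V)| =", abs(w_av), " |W(A)| + |W(V)| =", abs(w_a) + abs(w_v))
```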

Fig. 5

Mean values and standard error of evoked GBRs (40–60 ms after stimulus presentation) for audio-visual (AV), unisensory auditory (A), and unisensory visual (V) stimuli at the medial-frontal ROI

Induced gamma activity

Time-frequency planes of induced GBR activity (not plotted) showed only a slight increase of activity for all three stimulus types after stimulus onset. To compare the patterns of any effects for the evoked and induced GBRs we also calculated induced GBRs in the 45-Hz frequency range. As with the evoked GBR activity, we performed ANOVAs for multisensory audio-visual, unisensory auditory, and unisensory visual stimuli to investigate whether the induced GBR activity significantly increased after stimulus onset, separately for the three time windows (40–60, 80–120, and 200–350 ms), using the same within-subject factors as in the evoked GBR analyses described above. These ANOVAs revealed a significant increase of induced GBRs (relative to the pre-stimulus period) only for the early time interval (40–60 ms) and at the anterior ROI for the multisensory stimuli (F(1,13)=5.76, p<0.03). No other significant effects in the other time intervals were found for the three stimulus types. Further ANOVAs were performed to investigate whether there were any attention effects on these early anterior induced GBRs for the multisensory stimuli. Statistical analysis did not, however, reveal any significant effects of attention on this early induced multisensory GBR activity (F(1,13)=1.25, p<0.28). Thus, these data indicate that spatial attention affected early multisensory processing in the evoked but not in the induced gamma responses.

A potential problem in relation to induced GBRs in this experiment might be the use of short inter-stimulus intervals (ISI varying between 350 and 650 ms). It is possible that induced GBRs, which occurred at longer latencies in response to the previous stimulus in the sequence, might have overlapped with the baseline activity of the current stimulus. If so, long-latency induced GBRs could have been picked up in the baseline of the current stimulus, which could then have reduced the difference between the post-stimulus interval of the current stimulus and that baseline. This might be argued to be a possible reason why we did not find stronger induced GBRs in this study (particularly for the later time interval). To test whether long-latency induced GBRs from previous stimuli might have been overlapping the baselines we analyzed induced GBR activity surrounding the “no-stim” events (Methods section). Because it is very unlikely that the “no-stim” events in this study evoked any response of their own (Busse and Woldorff 2003), induced baseline GBR can be measured by calculating the differences between the pre- and post-stimulus activity of these null events. If the late induced GBRs had been present, we would have expected them also to be present in the pre-stimulus baselines of the no-stim trials (−150 to −50 ms), but not at longer latencies after the no-stim trial (200–400 ms). Thus, the presence of such longer-latency induced GBRs would be revealed by comparing the pre- and post-stimulus GBR activity of the no-stim trials. A repeated measures ANOVA was performed for these trials using the factors “time interval” (−150 to −50 ms pre-stimulus and 200–400 ms post-stimulus), “attention side” (attend right or attend left) and “ROI” (anterior or posterior). This ANOVA, however, did not reveal any significant results, indicating that induced GBRs in the pre-stimulus time window did not differ from induced GBRs in the post-stimulus time interval. This argues that induced activity in the baseline (−150 to −50 ms) was not enhanced. We therefore conclude that overlapping induced GBRs of preceding trials in the baseline did not contribute to the absence of later induced responses in the post-stimulus interval of the actual stimulus events.

Phase-characteristics of the gamma activity

The increase of evoked GBRs (collapsed across attended and unattended stimuli) relative to baseline was approximately seven times higher (0.14 μV) than the increase in induced GBR activity (0.02 μV, Fig. 6). An increase of evoked activity may result from either of two possible mechanisms or from their combination. On the one hand the amplitude of the gamma oscillations might remain unchanged compared with the baseline but the phases of a substantial subset of them could have been reset to some fixed value by the stimulation, as has been reported previously for alpha oscillations (Brandt 1997). On the other hand, additional event-triggered gamma power might also enhance the amplitude. If, however, the latter explained the evoked GBRs in this study, one would expect to find a much larger increase of induced gamma activity than we observed. Therefore, our data seem to support the former explanation, and would suggest that a phase resetting of ongoing high-frequency EEG activity is likely to have contributed to the observed increase in evoked GBRs.
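
The logic of this argument can be made concrete with a toy simulation of the pure phase-resetting case. The sketch assumes that evoked activity is the wavelet amplitude of the trial average and that induced activity is the average of the single-trial wavelet amplitudes (which therefore contains both phase-locked and non-phase-locked activity). The noise-free 45 Hz oscillation, the number of trials, the sampling rate, the wavelet width, and all function names are illustrative assumptions, not the analysis pipeline of this study.

```python
import cmath
import math
import random


def morlet_amplitude(signal, times, freq, t_center, n_cycles=7.0):
    """Absolute value of the complex Morlet coefficient at one frequency and time point."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)
    coeff = 0j
    for x, t in zip(signal, times):
        dt = t - t_center
        coeff += x * math.exp(-dt * dt / (2.0 * sigma_t ** 2)) * cmath.exp(-2j * math.pi * freq * dt)
    return abs(coeff)


rng = random.Random(1)
srate, freq, n_trials = 500.0, 45.0, 100  # assumed values
times = [(i - 150) / srate for i in range(300)]  # -0.3 s to about +0.3 s

# Ongoing oscillation with a random phase before onset (t < 0); at onset the
# phase is reset to a fixed value while the amplitude stays unchanged.
trials = []
for _ in range(n_trials):
    pre_phase = rng.uniform(0.0, 2.0 * math.pi)
    trials.append([math.cos(2.0 * math.pi * freq * t + (pre_phase if t < 0 else 0.0))
                   for t in times])

average = [sum(samples) / n_trials for samples in zip(*trials)]


def evoked(t_center):
    """Wavelet amplitude of the averaged signal (phase-locked activity only)."""
    return morlet_amplitude(average, times, freq, t_center)


def induced(t_center):
    """Mean single-trial wavelet amplitude (phase-locked plus non-phase-locked)."""
    return sum(morlet_amplitude(tr, times, freq, t_center) for tr in trials) / n_trials


evoked_change = evoked(0.15) - evoked(-0.15)
induced_change = induced(0.15) - induced(-0.15)
print("evoked change:", evoked_change, "induced change:", induced_change)
```

Under pure phase resetting the evoked measure rises sharply while the induced measure stays essentially flat. Added stimulus-triggered power would instead raise both measures.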

Fig. 6

Gamma-band activity in response to attended (dotted line) and unattended (solid line) audio-visual stimuli. Induced activity only slightly increases (left graph) after stimulus onset whereas evoked activity (right graph) increases more strongly. Significant attention effects were found for the evoked but not for the induced GBRs. Data are plotted without baseline correction

To further explore phase aspects of the data, we examined the uniformity of the phase distribution in an additional analysis. Figure 7 shows the probability of uniformly distributed phases in time-frequency planes, averaged across 14 subjects, for multisensory audio-visual and unisensory auditory stimuli. If there were increased GBR activity at a particular phase, we would expect to find a non-uniform distribution of the phases of single trials and, therefore, a significant (p<0.05) result in the test of uniformity (Rayleigh test of uniformity). Consistent with this, Fig. 7 shows a significant deviation from uniform phase distribution at the same frequency (at approx. 45 Hz) and at the same latency (approx. 40–60 ms) as the evoked GBRs. Figure 7 also suggests a more pronounced deviation from uniform phase distribution for attended than for unattended audio-visual stimuli. This pattern of results would be consistent with the hypothesis that enhanced phase-locking for attended versus unattended multisensory stimuli may have contributed to the attention effects seen in the evoked GBRs, as suggested above. The result would, however, also be consistent with the triggering of additional phase-locked GBR activity for attended compared with unattended multisensory stimuli. Indeed, the increase of induced gamma activity shows that the presentation of a multisensory stimulus is associated with a small early increase in gamma power. Because the induced GBRs to multisensory stimuli were not significantly modulated by spatial attention, however, we suggest that the attention effect on evoked GBRs is caused in part by phase-locking of some of the ongoing oscillatory gamma activity. In addition, in that the phase-distribution modulation pattern paralleled the effects seen in the evoked GBRs, this additional analysis of phase characteristics adds support for the early-latency attention effects on multisensory stimulus processing.
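
As an illustration of the uniformity test used here, the following minimal sketch computes the Rayleigh statistic for a set of single-trial phases at one time-frequency point. It uses the large-sample approximation p ≈ exp(−Z), with Z = n·R² and R the mean resultant length, which is an assumption of this sketch; the exact variant of the test applied in this study is not specified in this section. The simulated phase samples and the function name are hypothetical.

```python
import cmath
import math
import random


def rayleigh_test(phases):
    """Return (mean resultant length, Rayleigh Z, approximate p) for phases in radians."""
    n = len(phases)
    resultant = abs(sum(cmath.exp(1j * p) for p in phases)) / n
    z = n * resultant ** 2
    p_value = math.exp(-z)  # large-sample approximation
    return resultant, z, p_value


rng = random.Random(2)
# Hypothetical single-trial phases at one time-frequency point.
uniform_phases = [rng.uniform(-math.pi, math.pi) for _ in range(200)]  # no phase-locking
locked_phases = [rng.gauss(0.0, 0.8) for _ in range(200)]              # clustered near 0 rad

r_uniform, z_uniform, p_uniform = rayleigh_test(uniform_phases)
r_locked, z_locked, p_locked = rayleigh_test(locked_phases)
print("uniform:", r_uniform, p_uniform)
print("locked: ", r_locked, p_locked)
```

A small p value indicates that the phases cluster around a preferred angle, which is the signature of phase-locking to stimulus onset.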

Fig. 7

Time-frequency planes at a frontal electrode (Electrode 2, which is located approximately at the 10–20 electrode Fz) for the probability of uniform distribution of the single trial phases, averaged over 14 subjects for audio-visual (left column) and auditory (right column) stimuli. A significant deviation from a uniform distribution after approximately 40–60 ms in a frequency range of approximately 40–50 Hz indicates phase-locking of oscillatory gamma activity after stimulus onset. Notice the differences between the responses to the attended and unattended audio-visual stimuli

Summary of results

In line with the ERP analyses by Talsma and Woldorff (2005b), we did not find early (<90 ms) spatial attention effects on ERPs of visual, auditory, and multisensory stimuli (Fig. 2). However, early attention effects (in the 50 ms latency range) on multisensory stimuli were observed for the evoked oscillatory GBRs, suggesting that these responses may be more sensitive to these early attentional processing effects for multisensory stimuli than are ERPs (Figs. 3, 4, and 5). The finding of attentional enhancement of the evoked multisensory GBRs, along with the subsequent analysis of the phase distribution of gamma oscillations, suggests that a resetting of ongoing high-frequency EEG activity may have contributed to these early latency effects (Fig. 7). No attention effects on either the evoked or induced GBRs for unisensory auditory and unisensory visual stimuli were found.

Discussion

The objective of this study was to investigate possible early interactions between attention and multisensory integration, by analysis of gamma-band activity during a multisensory spatial attention study. These data were derived from ongoing EEG data from an earlier reported ERP study (Talsma and Woldorff 2005b) in which we had established that interactions of attention and multisensory processing occur at several latency phases beginning at approximately 90 ms post-stimulus, but not at the very early latencies at which the first ERP signs of multisensory integration have previously been reported in the literature (Giard and Peronnet 1999; Fort et al. 2002; Molholm et al. 2002). Thus, we investigated the possibility that early attention/multisensory interactions might still be present, but may be picked up better in the gamma-band oscillatory responses. This did indeed seem to be the case, in that we did find a clear enhancement of evoked GBR activity for multisensory stimuli when they were attended compared with when they were unattended. Moreover, these effects showed up mainly as an interaction between attention and multisensory processing, in that no such differences could be found when the auditory (or visual) stimuli were presented alone. An additional novel finding of our study was that although the early evoked GBRs elicited by unisensory visual stimuli were still too weak to produce significant GBR activity, including over occipital areas, they nevertheless exerted their influence when presented in concert with auditory stimuli, resulting in clear attentional enhancement of evoked GBR activity for the multisensory stimuli. Below we will discuss the various main findings of this study.

Event-related potentials

In this experiment, our ERP results did not reveal effects of selective attention on auditory ERPs in the very early (i.e. 50 ms) latency range. Such early latency attention effects have sometimes been reported in the literature (Hackley et al. 1990; Woldorff and Hillyard 1991; Woldorff et al. 1987, 1993). To find attention effects so early in the processing stream, however, it generally seems that one needs to use very rapid dichotic sequences of auditory-only material, which force subjects to stay highly focused on only one auditory channel (Woldorff and Hillyard 1991). Here, we presented intermixed stimuli of different modalities at a somewhat slower rate, which required subjects to divide their attention between visual and auditory modalities. In accordance with earlier studies using this approach (Talsma and Kok 2001, 2002), the earliest effects of selective attention on the unisensory auditory ERPs occurred at around the peak of the auditory N1 component, as described in more detail by Talsma and Woldorff (2005b). Similarly, for visual-only stimuli, the earliest ERP attention effect also started around the onset of the visual P1 (i.e. approx. 90 ms after stimulus onset), which is consistent with reports in the literature (Hillyard et al. 1999; Mangun and Hillyard 1991).

Multisensory integration effects, as obtained by comparing the multisensory ERP to the sum of the unisensory visual and auditory ERP waves, have previously been reported to occur as early as 50 ms after stimulus onset in some ERP studies (Giard and Peronnet 1999; Molholm et al. 2002) but not in others (Teder-Sälejärvi et al. 2002; Talsma and Woldorff 2005b). Following up an initial suggestion made by Teder-Sälejärvi et al. that this summation approach for studying multisensory integration processes might be sensitive to artifactual distortion of early ERP activity, Talsma and Woldorff (2005b; also described in Talsma and Woldorff 2005a) have demonstrated that such potential artifacts can be corrected using a novel technique based on the subtraction of activity from “no-stim” events. Importantly, after applying this correction technique, Talsma and Woldorff (2005b) found no ERP effects of multisensory integration at any early latency, whereas without the correction, because of these artifactual sources, effects that would otherwise have been attributed to attention and/or multisensory integration were significant beginning as early as stimulus onset. Given, however, that we did find effects of selective attention on the early GBRs for multisensory stimuli, we suggest that GBRs may be more sensitive to certain early experimental manipulations than ERPs are.

Attention and multisensory integration effects on oscillatory gamma activity

We observed a clear effect of spatial attention at approximately 50 ms post-stimulus in the evoked GBRs to the multisensory stimuli. This finding is interesting not only because no corresponding effect of attention was found on the unisensory auditory stimuli, but also because no significant GBR activity could be recorded in response to the visual stimuli alone. The enhanced GBR activity over medial-frontal areas for attended multisensory (AV) stimuli is consistent with findings from Sakowitz et al. (2001), who reported larger GBR amplitudes for audio-visual stimuli (AV) than for unisensory stimuli. In the study by Sakowitz et al., however, all stimuli were perceived passively without requiring the subject to perform any task. In addition, each stimulus type was presented separately in different blocks. Thus, while the study of Sakowitz et al. did not manipulate attention, it does support the view that multisensory integration processes can enhance GBR activity. Assuming subjects in the study of Sakowitz et al. were attending the stimuli simply by default, their results would be consistent with our findings.

Interestingly, attended multisensory stimuli resulted in increased gamma activity relative to attended unisensory auditory stimuli (and relative to the sum of the unisensory auditory and visual stimuli), suggesting that our early GBR effect may reflect an amplification mechanism that enhances the integration of attended multisensory inputs. Differential amplification of the neural response to multisensory stimuli by attention could be a mechanism for increasing the signal-to-noise ratio of these stimuli and hence their discriminability from the unattended environment (Hawkins et al. 1990). The finding that GBRs of attended and unattended stimuli only differ significantly for multisensory and not for unisensory stimuli suggests that these GBR modulations might be related to the effect of attention on the earliest integration processes of multisensory stimuli.

Whereas GBRs evoked by unisensory auditory stimuli seemed not to be affected by attention, a clear burst of gamma activity for both the attended and unattended auditory stimuli could be observed approximately 50 ms after stimulation. This result is consistent with observations made by Karakaş and Başar (1998), who ran a series of five different auditory experiments and who also found no attention effects on early auditory GBRs. The authors concluded that such early GBR activity simply reflects automatic sensory processing activity. The current data would support this view by showing that attention did not affect the amplitude of early auditory GBRs to unisensory auditory stimuli. However, the topographies of early GBRs to auditory and multisensory stimuli were very similar in this study (Fig. 4b), suggesting that the auditory and multisensory GBRs were caused by the same or similar neuronal generators. As shown for multisensory stimuli, these generators are sensitive to attention, suggesting that early GBRs do indeed reflect a process that can be affected by attention. Such a conclusion would be consistent with attention effects on GBRs reported by Tiitinen et al. (1993), but contradict the stronger conclusion drawn by Karakaş et al. (2000) that early GBRs only reflect automatic sensory processing. ERP and MEG studies in the past, however, have shown that the effect of attention at such early latencies depends strongly on the attentional demands placed on the subject (Woldorff et al. 1987, 1993; Woldorff and Hillyard 1991). Further studies are necessary to investigate whether or not this is also true for the effect of attention on auditory GBRs.

In contrast with the unisensory auditory and multisensory GBRs, little or no GBR activity was elicited by the unisensory visual stimuli. This result differs from those in some other studies that have reported GBR activity elicited by visual stimuli (Herrmann and Mecklinger 2001; Keil et al. 2001; Strüber et al. 2000). Those other studies, however, presented visual stimuli at the center of the visual field, rather than the peripheral presentation we used here. Busch et al. (2004) recently showed that the eccentricity of a visual stimulus significantly affects GBR amplitude. The authors investigated the effects of eccentricity on visual GBRs and showed that GBR amplitudes significantly decrease with increasing eccentricity. At their maximum eccentricity (8°) only a small increase of evoked visual GBR activity was observed (Fig. 8b in Busch et al. 2004). In our study the stimuli were presented at the even higher eccentricity of 15°. Thus, the absence of visual GBRs in this study seems to be in accordance with the finding of Busch et al. (2004).

Most interestingly, although the occurrence of the visual stimuli was not reflected in significant gamma activity when they were presented alone, these stimuli nevertheless affected brain activity, in that their presence resulted in a robust modulation of the amplitudes of attended and unattended multisensory GBRs. Given that the topography and timing of the multisensory and unisensory auditory gamma activations were highly similar, the differential effect of attention on the responses to these two stimulus types presumably had to originate from the visual component of the multisensory stimuli.

Phase-locking of gamma activity

Finally, we investigated the nature of the stimulus-related GBR activity in more detail. Interestingly, the induced GBR (which consists mainly of non-phase-locked activity but includes the phase-locked activity also) increased only slightly after stimulus onset, the increase being only about 1/7 of the amplitude change of the evoked activity. This finding suggests that the measured effects have, in part, been caused by a phase resetting of ongoing oscillatory gamma activity, with a relatively smaller contribution from an amplitude increase of the triggered GBRs (Kolev et al. 1998). The reasoning here is that if the strong increase of evoked GBR were mainly because of an increase in amplitude of stimulus-triggered gamma activity, we would have expected a larger increase of the induced GBRs as well (because this activity consists of both the phase-locked and non-phase-locked activity). Thus, this reasoning would suggest that selective spatial attention during multisensory processing modulates a phase resetting of GBRs after stimulus onset at this very early latency. However, the fact that there was also a significant increase in the induced GBRs for the multisensory stimuli, although smaller than the evoked GBR change, suggests that additional generation of GBR activity phase-locked to the stimulus onset may also have contributed to these effects.

In an additional analysis we investigated the phase distribution of the oscillatory GBR activity. This showed the pattern of modulation was similar to that for the evoked GBR amplitudes. Most importantly, this alternative analysis supported the existence of a significant early effect of attention for the multisensory stimuli, with no such effects for the unisensory stimuli. In addition, although this change in phase distribution could have been produced by the triggering of additional phase-locked GBR activity, it is also consistent with a possible effect on the phase resetting of the GBRs, as suggested by the differential magnitude of the amplitude increase in induced and evoked GBR activity, as described above.

Task difficulty

Although we controlled the difficulty of visual and auditory target discrimination for the visual and auditory modalities separately before the experimental runs, and expected a slight performance decrease in the somewhat more demanding multimodal tasks, it seemed that this drop in performance was somewhat larger for visual targets than for auditory targets. It is unlikely, however, that this difference could have profoundly affected our results. As discussed in Talsma and Woldorff (2005b), one might argue that subjects were strategically focusing on the easier stimulus (i.e. the auditory stimulus) and trying to filter out a potentially harder-to-discriminate stimulus (i.e. the visual stimulus). This explanation, however, would also predict that behavioral performance would be fastest and most accurate in the auditory condition, because filtering of the visual part of the multisensory stimulus should occur at an observable cost or, at the very least, not be faster than detecting the auditory targets alone. This was clearly not found in this study, in that the multisensory targets were processed faster and better than either of the unisensory target types. Furthermore, in a previous study we investigated the effects of task difficulty on ERPs and GBRs of centrally presented visual stimuli (Senkowski and Herrmann 2002) and found the earliest modulations at a later latency of approximately 260 ms in the ERP and 100 ms in the evoked GBRs. Thus, the absence of early (<90 ms) attention effects on visual and auditory stimuli in this study is probably not related to differences in task difficulty between these two modalities. Accordingly, even though the visual stimuli might have been somewhat harder to detect, the most important issue still holds, namely that the visual and auditory targets were sufficiently difficult to force the subjects to stay highly focused on both visual and auditory modalities at the same time.

Conclusion

The main purpose of this work was to investigate the relationship between multisensory integration and oscillatory gamma responses and how this relationship is affected by spatial attention. We found that attention can amplify evoked oscillatory GBRs for attended audio-visual stimuli over medial-frontal scalp areas at very early latencies (approx. 50 ms). Because no such effect could be observed for the GBRs elicited by unisensory stimuli, we conclude that this amplification effect probably reflects a multisensory integration process that is stronger for attended stimuli. Furthermore, the multisensory integration effect seems to arise at least in part from stronger phase resetting of GBRs in response to attended compared with unattended audio-visual stimuli, although it could also have included some increased triggering of phase-locked GBR activity for the attended multisensory stimuli. In contrast, we did not find such early attentional modulation in the ERPs in this paradigm; the effects and interactions of attention and multisensory integration in the ERP did not begin until approximately 100 ms after stimulus presentation (Talsma and Woldorff 2005b).

Thus, these results indicate that oscillatory GBRs may be more sensitive than ERPs to some very early multisensory integration and attention effects and, accordingly, may provide a useful alternative tool for investigating the mechanisms underlying these phenomena. Based on our findings here using oscillatory GBRs, we conclude that selective spatial attention can modulate integrative multisensory processes at very early levels of cortical signal processing.