Introduction

At any given moment our mind focusses on a small number of tasks, thoughts, or sensory impressions. This does not seem to be a deliberate choice; rather, it reflects fundamental limits in the ability of healthy brain circuitry to process all available information in parallel. Fortunately, a number of mechanisms guide the efficient allocation of limited processing resources to behaviourally relevant tasks and sensory input. These mechanisms can be subsumed under the term “attention”. In this chapter we introduce the most prominent mechanisms of attention and discuss recent findings about how these relate to oscillatory brain activity.

Mechanisms of Attention

In the 1800s researchers observed that human conscious perception has a limited capacity; participants of an early psychophysical experiment were incapable of reporting a full array of objects briefly flashed to them. However, they could improve performance, i.e. consistently report a subset of the array, when they deliberately focussed on specific positions in anticipation of the upcoming array [1]. Similar results were later obtained in experiments using auditory stimuli: focussing on a particular voice among others improved performance in reporting spoken words or elements of a narrative [2].

The above-described experiments established that attention can be allocated voluntarily to a portion of space (e.g. parts of a letter array) or a stimulus feature (e.g. voice pitch) of the collective sensory input to facilitate task performance when trying to achieve a specific goal. Nevertheless, high-contrast sensory events such as a loud bang or a bright flash of light will attract attention automatically. In fact, even hearing your name in a background conversation at the proverbial cocktail party can have a strong stimulus-driven pull effect on your focus of attention. Both ways to (re)allocate attention have led to the influential “top-down” (goal-directed) vs. “bottom-up” (stimulus-driven) dichotomy in attention research [3,4,5]. As we will see in the following sections, this distinction continues to inspire research into oscillatory correlates of attention.

In sum, early experimental findings have led to the conceptualization of attention as a selective filter mechanism (or a set of hierarchical filters) that can be adjusted dynamically to meet the demands of current behavioural tasks or allow facilitated responses to rapidly changing situational circumstances. The filter concept thus already implements three characteristic properties that have become subjects of intense research into underlying neur(on)al correlates:

  1. The internal representation of an attended stimulus experiences a selective gain as compared with concurrent unattended sensory input.

  2. Stimuli outside the focus of attention do not receive in-depth processing; they are effectively filtered out. Note that while this effect can be seen as a consequence of a selective gain mechanism, current research supports the notion of an active suppression of irrelevant input (see the section “Oscillations and Suppression of Irrelevant Information” later in this chapter).

  3. Conceiving of the focus of attention as a dynamic mechanism implies that it can move through position and feature spaces to allow for flexible selection (and filtering-out).

In the following sections of this chapter, we review models of how these properties can be formulated in terms of rhythmic brain activity in characteristic frequency bands. Beforehand, we briefly recapitulate prominent models of attention to point out where these fall short and may benefit from integrating concepts learned from the study of brain oscillations.

Models of Attention

Psychological models of attention have evolved during the second half of the last century mostly based on results of behavioural studies. A number of metaphors have been coined in the process to illustrate the selective filter aspect of attention. Whereas a “bottleneck” was used to describe selective listening [6, 7], visual—more specifically visuo-spatial—attention has been famously likened to a “spotlight” [8] or “zoom lens” [9]. Especially in the visual modality, more complex models have been developed that sought to describe and, ultimately, to predict the characteristic properties of the attentional spotlight. Efforts culminated in the Feature Integration Theory [10], Guided search models [11], and the Theory of Visual Attention [12], among others.

These models were mainly based on abstract psychological constructs such as the spotlight and schematic internal representations of external physical stimulus situations, so-called “feature maps”, devoid of any specific neurophysiological substrate. They were nevertheless successful in predicting behavioural performance in visual search tasks. At the same time, advances in neurophysiological techniques increasingly allowed the investigation of the neural substrates of attention. Early electrophysiological experiments found that the neural activity associated with stimulus processing increased when a stimulus was attended. This led to the notion of attention as a response gain (“sensory gain”) mechanism [13].

Soon after, recordings of single neuron firing patterns allowed groundbreaking insights into the influence of attention on neuronal stimulus processing. Based on their studies, Moran and Desimone [14] put forward the influential Biased Competition model of attention. When they placed two stimuli in the receptive field (RF) of a neuron that represented an unattended location, its response (spike rate) was a weighted average of the responses to the singly presented stimuli. A stimulus that usually elicited a strong response (“preferred” stimulus) and one that usually elicited a weak response (“poor” stimulus) placed within the same RF gave an intermediate response. Crucially, allocating attention to one of the two stimuli shifted the neuronal response towards the response given when the attended stimulus was presented alone.

These results led them to hypothesize that multiple simultaneously presented stimuli enter a competition for neuronal representation, thereby suppressing each other’s processing. They further proposed that attention biases the competition by releasing a selected stimulus from mutual suppression [15]. To date, numerous single-cell studies have supported this assumption [16, 17]. Neuroimaging studies have revealed Biased Competition-like mechanisms in large-scale population responses in the human brain [18,19,20], although more recent findings question whether these act on all stages of the visual processing hierarchy [21, 22].

Notably, Biased Competition posits a contrast gain rather than a response gain mechanism to enhance the processing of an attended stimulus. The stimulus profits maximally from the attentional bias when it competes with concurrent equally salient stimulation. The bias has little effect when the stimulus is highly salient itself (ceiling effect) or presented among more salient stimuli (floor effect).

Recent progress in single-cell research led to the development of powerful computational models of attention that supersede the original Biased Competition idea in many aspects. To date, so-called normalization-type models represent the state of the art [23]. The term “normalization” refers to the fact that this class of models includes a computational stage at which response magnitudes of individual neurons are divided by the population-level activity. The Normalization Model of Attention by Reynolds and Heeger [23] has attracted particular interest because it implements entities that seem closely related to constructs used in psychological theories of attention, but are directly derived from properties of single-neuron and population-level activity (Fig. 3.1). The Attention Field, for example, resembles aspects of both the spatial “spotlight” and the “feature maps”. One important contribution of this model is that it reconciles seemingly contradictory response gain and contrast gain effects of attention by predicting a simple relationship between the (flexible) size of the attentional focus and the stimulus size.

Fig. 3.1

Schematic representation of the Normalization Model of Attention. From left to right: The presentation of the stimulus display leads to the activation of neuronal populations that prefer the orientation of the black bar stimuli and whose receptive fields (RF) encode their locations. This Stimulus Drive can be represented as a two-dimensional position space by feature space maps. Attending to one position is equivalent to multiplying the Stimulus Drive with an Attention Field, which leads to a relative gain effect depicted as the Excitatory Drive. In a second stage the excitatory drive is effectively normalized through a division with the Suppressive Drive (a convolution of the Excitatory Drive and a Suppressive Field that represents lateral inhibition between neurons) to yield the final biased Population Response. (Used with permission from Montijn JS, Klink PC, van Wezel RJ. Divisive normalization and neuronal oscillations in a single hierarchical framework of selective visual attention. Front Neural Circuits. 2012;6:22. Modified after Reynolds JH, Heeger DJ. The normalization model of attention. Neuron. 2009;61(2):168–85)
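The two stages described in the caption can be sketched in a few lines of code. This is a minimal one-dimensional illustration (position only, no feature dimension), not the published implementation: the Gaussian shapes, all widths, the attentional gain, and the constant `sigma` in the denominator are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def gaussian(x, centre, width):
    """Unnormalized Gaussian profile (illustrative choice of shape)."""
    return math.exp(-0.5 * ((x - centre) / width) ** 2)


def population_response(stimulus_drive, attention_field, suppressive_field, sigma=0.1):
    """Toy 1-D normalization: multiply by the attention field, then divide."""
    # Stage 1: Excitatory Drive = Stimulus Drive x Attention Field (point-wise).
    excitatory = [a * s for a, s in zip(attention_field, stimulus_drive)]

    # Stage 2: Suppressive Drive = Excitatory Drive convolved with the
    # Suppressive Field (values beyond the edges are treated as zero).
    n = len(excitatory)
    half = len(suppressive_field) // 2
    suppressive = []
    for i in range(n):
        pooled = 0.0
        for k, weight in enumerate(suppressive_field):
            j = i + k - half
            if 0 <= j < n:
                pooled += weight * excitatory[j]
        suppressive.append(pooled)

    # Divisive normalization; sigma (assumed value) keeps the ratio finite.
    return [e / (s + sigma) for e, s in zip(excitatory, suppressive)]


positions = range(41)
# Two equally strong stimuli at positions 10 and 30.
stimulus = [gaussian(x, 10, 2.0) + gaussian(x, 30, 2.0) for x in positions]
raw_kernel = [gaussian(k, 0, 5.0) for k in range(-10, 11)]
kernel = [w / sum(raw_kernel) for w in raw_kernel]

neutral = [1.0 for _ in positions]                           # no attentional bias
attend_left = [1.0 + gaussian(x, 10, 3.0) for x in positions]  # attend position 10

r_neutral = population_response(stimulus, neutral, kernel)
r_attend = population_response(stimulus, attend_left, kernel)
```

With a flat attention field the two stimuli yield equal responses; centring the attention field on position 10 raises the response there relative to position 30, which is the relative gain effect the caption describes.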

In conclusion, there is an abundance of theories and models that describe influences of attention on perception and behavioural performance. While some are based on abstract psychological constructs, others are derived from studying single-cell or population-level activity. Importantly, models increasingly converge: psychological constructs can be expressed in terms of neuronal interactions, as in the Normalization Model of Attention. Nevertheless, most models can be considered incomplete with regard to two important aspects. First, attending to a stimulus requires the orchestrated activity of widely separated neuronal populations in different brain areas. Current models, however, largely disregard or simplify the underlying brain circuitry and the anatomical connections between these populations. Second, we rarely attend to a particular stimulus over an extended period of time. The allocation of attention is a highly dynamic process. Imagine, for example, a typical traffic situation during rush hour, in which attention has to shift rapidly between vehicles, pedestrians, and signals. These dynamics require transient and on-demand connections between remote neuronal populations. Models of attention are based on the assumption that these functional connections exist, but currently do not specify how they are established.

Attention and Brain Rhythms

Long-range functional connectivity requires anatomical connections such as fibre tracts that link distant areas of the brain. Several anatomically defined networks have been identified whose nodes contribute to various aspects of attention and its influence on perception [24]. Among them, a dorsal fronto-parietal network is the most comprehensively investigated cortical network implicated in the control of attention (Fig. 3.2) [25]. It encompasses the intra-parietal sulcus (IPS) in posterior parietal cortex, the so-called frontal eye fields (FEF), located in a portion of the precentral supplemental motor area, and early sensory areas such as visual cortex.

Fig. 3.2

Schematic cortical surface. Areas coloured in shades of blue correspond to the sensory cortices. Yellow and orange areas denote locations of nodes of the fronto-parietal attention network. Asterisks (∗) in the legend signify that the indicated areas are not identical to the nodes, but likely contain them. The yellow areas cover parts of the posterior parietal lobes that enclose the intra-parietal sulcus (IPS) bilaterally. The orange areas give approximate locations of the frontal eye fields (FEF) that can be found in precentral supplemental motor areas

Recent years have seen an increasing number of studies reporting that the nodes of the fronto-parietal “attention network” communicate by means of brain rhythms in characteristic frequency bands [26,27,28]. Crucially, the idea is that these rhythms establish functional connections that convey modulatory influences of attention [29]. The shortcomings of current models of attention described above can thus be addressed by considering the intrinsic rhythms of the human brain as a key player in intra- and inter-areal brain communication.

In the following, we will relate the selective gain aspect of attention to the selective routing of information between neuronal populations that synchronize their activity locally, within cortical regions, or globally, across cortices, through slower delta (1–4 Hz), theta (4–8 Hz) and alpha rhythms (8–13 Hz), as well as faster beta (15–30 Hz) and gamma rhythms (>30 Hz). The alpha rhythm and its relationship with the second aspect of attention, the filtering-out or suppression of irrelevant, possibly distracting, sensory input will be discussed in more detail in a later section. Furthermore, we will outline that (low-frequency) oscillatory phase may play its part in understanding how the dynamics of attentional gain and suppression unfold in time. Ultimately, we review attempts to integrate these aspects into a coherent oscillatory framework of attention and introduce an approach that links brain oscillations and the normalization model of attention. Neural mechanisms of attention have also been investigated by means of stimulus-driven brain oscillations [30]. The nature of stimulus-driven brain oscillations and their relationship to intrinsic rhythms is currently under debate [31]. An extensive review of findings on (visual) attention by means of frequency-tagging can be found in [32].
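For readers who want to apply the band definitions given above when labelling spectral peaks, the following small lookup may help. It is a sketch under stated assumptions: the band limits are those quoted in this section, while the treatment of the edges (lower bound inclusive, upper bound exclusive, 30 Hz counted as beta) and the function name `classify_band` are illustrative choices. Frequencies that fall outside the quoted ranges, such as 13–15 Hz, are deliberately left unlabelled.

```python
# Band limits in Hz as quoted in the text; gamma is open-ended (>30 Hz).
BANDS = [
    ("delta", 1.0, 4.0),
    ("theta", 4.0, 8.0),
    ("alpha", 8.0, 13.0),
    ("beta", 15.0, 30.0),
]


def classify_band(freq_hz):
    """Return the band name for a frequency, or None if no band is defined."""
    if freq_hz > 30.0:
        return "gamma"
    if freq_hz == 30.0:
        # Assumption: the upper edge of beta is inclusive, since gamma is ">30 Hz".
        return "beta"
    for name, low, high in BANDS:
        # Assumption: shared edges (4 Hz, 8 Hz) belong to the higher band.
        if low <= freq_hz < high:
            return name
    return None
```

For example, `classify_band(10)` returns `"alpha"`, whereas `classify_band(14)` returns `None` because the text assigns no band to that frequency.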

Oscillations and Selection of Relevant Information

This section reviews two hypothetical accounts, communication-through-coherence [33] and the phase reset of low-frequency oscillations [34], that model how selective attention influences stimulus processing via brain oscillations.

Communication Through Coherence (CTC)

The CTC framework starts with the observation that any neuronal assembly can synchronize otherwise random firing patterns of individual neurons when activated by a common input [35, 36]. Such coherent behaviour of neurons in sensory cortices is regarded as the signature of neural stimulus representation in the animal model [37, 38] and, more recently, in human electrophysiology [39, 40]. More importantly, rhythmic activity of a neuronal assembly entails both periods of high excitability to external input coupled with peaks in spiking activity, as well as periods of low excitability during which neurons cease firing [41, 42]. It is this periodicity in excitability that allows for selective communication with other groups of neurons.

CTC posits that two groups of neurons establish a communication link by synchronizing their rhythmic bursting behaviour and, thus, their excitability cycles [33, 43, 44]. Conversely, communication ceases when two groups desynchronize. A sending and a receiving neuronal assembly that seek to transmit information between them will do so during their joint phases of maximum spiking activity that coincide with excitability peaks. This coherent, strictly periodic opening and closing of communication “channels” subserves a number of purposes: (1) It ensures that the receiving group of neurons picks up spike bursts emitted by the sending group during its periods of highest excitability while capitalizing on the fact that neurons are particularly sensitive to synchronous input. This maximizes information transfer [45] and renders CTC effective. (2) Communication can be maintained over at least a number of coherent cycles because the sender can easily predict the upcoming excitability peaks of the receiver due to the inherent temporal regularity. This establishes the stability of CTC. (3) A given group of neurons can temporarily synchronize and desynchronize with different other groups of neurons to form transient coherent networks for specific processing tasks. CTC thus allows a selective and dynamic arrangement of functional connections within a network of anatomical links.
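The core intuition, that transmission is most effective when the sender's output coincides with the receiver's excitability peaks, can be illustrated with a toy calculation. This is not a biophysical model: both rhythms are idealized as raised cosines, the receiver's excitability is treated as a multiplicative gain on incoming activity, and `phase_lag` stands in for any misalignment between the two cycles. All names and parameter values are assumptions made for illustration.

```python
import math


def effective_transfer(freq_sender, freq_receiver, phase_lag, duration=1.0, dt=0.001):
    """Fraction of the sender's output that meets a receptive receiver."""
    n_steps = int(round(duration / dt))
    sent = 0.0
    received = 0.0
    for i in range(n_steps):
        t = i * dt
        # Sender output and receiver excitability, each scaled to 0..1.
        sender_rate = 0.5 * (1.0 + math.cos(2.0 * math.pi * freq_sender * t))
        receiver_gain = 0.5 * (1.0 + math.cos(2.0 * math.pi * freq_receiver * t - phase_lag))
        sent += sender_rate
        received += sender_rate * receiver_gain
    return received / sent


in_phase = effective_transfer(40, 40, 0.0)        # coherent, aligned: about 0.75
anti_phase = effective_transfer(40, 40, math.pi)  # coherent, misaligned: about 0.25
incoherent = effective_transfer(40, 47, 0.0)      # different rhythms: about 0.5
```

In this idealization, coherent and aligned rhythms pass the largest share of the sender's output, unrelated rhythms pass an intermediate share, and a consistently misaligned receiver passes the least.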

CTC does not strictly limit the bandwidth of the frequencies at which groups of neurons communicate with other populations. Frequencies rather depend on the time it takes to transmit signals between sender and receiver [33]. Specific lags are thus mostly determined by synaptic delays and axonal conduction speed [46, 47]. For relatively short anatomical connections, as within brain areas, signals travel quickly and allow for information transmission within one cycle of gamma oscillations (>30 Hz). For long-range inter-area connections, signal conduction times increase. Groups of distant neurons thus typically synchronize at lower, beta-band frequencies [48,49,50]. The role of gamma and beta rhythms in cognitive processes in general and attention in particular has recently been reviewed extensively in [51, 52].

As a framework for selective and flexible communication, CTC is ideally suited to model the neuronal mechanisms that underlie selective processing of attended over ignored sensory input. As laid out above, an attended stimulus dominates the competition for neural representation. Within CTC, in-depth neural representation of a given stimulus can be expressed as communication between neuronal assemblies that code that stimulus across hierarchical stages of sensory processing. Selective gain can thus be conceived of as selective communication within a cortical network of neuronal assemblies coding the attended stimulus (while excluding concurrent ignored sensory input).

Note that the strict phase-locking of the receiving neural population to one subordinate group of neurons but not the other resembles a winner-take-all mechanism [53, 54] consistent with the Biased Competition account of attention that has been formulated on the level of single neuron spiking behaviour [14, 15]. There, neurons in cortices representing late sensory processing stages have been found to show a characteristic response to an attended stimulus in the presence of irrelevant stimuli as if it were presented alone. Therefore, selective attention described in terms of CTC extends Biased Competition to the level of neuronal populations and links it to intrinsic neural rhythms with a prominent role for gamma band oscillations.

To date, various predictions of a CTC account of selective attention have been tested and confirmed [28, 55,56,57]. Only recently, Bosman et al. [58] investigated its core assumption in early visual cortices of the macaque brain: they recorded electrocorticograms from two sites in primary visual cortex (V1) that were responsive to two spatially distinct stimuli as well as from one site in higher-order visual cortex (V4) that received converging input from both V1 sites. They found compelling evidence that the downstream group of neurons selectively coupled to the V1 site that represented the currently behaviourally relevant stimulus, thus corroborating that selective gamma-band synchronization allows for dynamic and exclusive routing of attended sensory input.

Considering the wealth of research on CTC, it can be regarded as an exceptional model for how selective stimulus processing makes its way up the (visual) processing hierarchy. It is less clear, however, how goal-directed (top-down) biases can be implemented by CTC. Put differently, how does CTC model the proverbial spotlight? The top-down direction requires higher-order brain areas to exert processing biases. Indeed, several studies have shown long-range gamma coherence (i.e. “coupling”) between early sensory cortices and FEF [28], between homologous areas in different cerebral hemispheres [59], and between motor cortex and peripheral muscle innervation [60]. Although modulations of gamma coupling were substantial, the overall coherence was found to be relatively low. Lisman and Jensen [61] argued that low coherence might render communication ineffective. In their opinion, long-range gamma coupling might be a consequence rather than a means of neuronal communication over such distances, which makes it an unlikely candidate for conveying direct top-down influences on sensory processing. Below, we discuss how low-frequency brain oscillations (<15 Hz) may enable long-range high-frequency coherence in top-down processing.

Low-Frequency Phase Reset

Rare, high-contrast salient sensory events—ambulance sirens or a camera flash—capture attention automatically, “bottom-up”. In most cases we will immediately turn towards the sources of these events involuntarily. With regard to neural activity in corresponding sensory cortices, these salient sensory events have (at least) two effects: First, they elicit evoked responses, an increase in overall activity that occurs strictly time-locked to stimulus presentation [62]. And, more importantly, they reorganize the phase of ongoing oscillations in such a way that a preferred phase occurs at a certain latency after the event, irrespective of the phase prior to the event [63, 64]. This phase “reset” (Fig. 3.3) leads to strong phase synchronization that tunes the cortex to the processing of the properties of the driving stimulus [65]. In detail, phase resets provide organized temporally structured windows of high cortical excitability that can lead to optimal stimulus processing equivalent to a sensory gain mechanism. In contrast, stimuli that occur outside this optimal window arrive at phases of lower excitability and have a processing disadvantage. As a consequence, stimulus-driven phase resets implement a potent mechanism for sensory selection [34].

Fig. 3.3

Schematic phase reset. Each coloured waveform represents oscillatory activity in a small neuronal population of a given sensory cortex. (The heavy black line depicts a cosine signal that can be used as a reference.) Prior to stimulation, these populations may oscillate with a random phase relationship (see corresponding phase plots in unit circles next to the waveforms). A salient sensory stimulus can reset oscillatory phase across populations. This leads to a non-random phase distribution, i.e. phase alignment, shortly after stimulus presentation
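Phase alignment across populations, as sketched in the figure, is commonly quantified as the length of the mean resultant vector of the phases on the unit circle: values near 0 indicate a random phase distribution, values near 1 indicate tight alignment. The following simulation is illustrative only. The number of populations, the preferred phase, and the residual jitter after the reset are assumed values, and `apply_phase_reset` is a hypothetical helper rather than a model of the underlying physiology.

```python
import cmath
import math
import random


def phase_alignment(phases):
    """Length of the mean resultant vector of a list of phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def apply_phase_reset(phases, preferred_phase=0.0, jitter=0.3, rng=random):
    """Replace each phase by the preferred phase plus Gaussian jitter.

    The pre-stimulus phase is ignored on purpose: a reset sets the phase
    irrespective of its value prior to the event.
    """
    return [preferred_phase + rng.gauss(0.0, jitter) for _ in phases]


rng = random.Random(0)  # fixed seed so the example is reproducible
pre_stimulus = [rng.uniform(-math.pi, math.pi) for _ in range(200)]
post_stimulus = apply_phase_reset(pre_stimulus, rng=rng)

alignment_before = phase_alignment(pre_stimulus)   # low: random phases
alignment_after = phase_alignment(post_stimulus)   # high: phases aligned
```

The same measure, computed across trials rather than across populations, underlies inter-trial phase coherence analyses.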

Extending the phase-reset mechanism to multisensory scenarios, Lakatos et al. [66] suggested that neural processing can be guided by the sensory modality corresponding to the salient event. In the case of the ambulance siren, for example, the auditory cortex takes the lead in sensory processing. Ultimately, the “leading sense” exerts modulatory influences on early cortical visual and somatosensory processing. These influences can be considered as a cross-modal spread of attention: attending to a specific stimulus in one sensory modality has been shown to selectively facilitate the processing of temporally and spatially congruent input to the other senses [67]. The ambulance siren will likely draw your attention towards a fast-approaching vehicle with blinking lights. This selective bias of processing between senses might aid in extracting and integrating concurrent multisensory input [68].

A crucial precursor was the finding of Lakatos et al. [66] that the phase reset (but not the evoked response) “spills over” to other sensory cortices. A phase reset across senses (“cross-modal”) occurred specifically in oscillations within gamma- and theta-frequency ranges, and was most pronounced in the theta band. Especially low-frequency oscillations have proven instrumental in providing temporal reference frames for the encoding of stimuli in sensory cortices [69]. It is thus conceivable that auditory-guided selective processing of visual input is supported by a cross-modal phase reset, where the auditory cortex imposes its temporal reference frame on visual processing.

Lakatos et al. [66] further demonstrated that cross-modal phase resets are initiated only by attended stimuli. More specifically, when attending to auditory input, the presentation of an auditory stimulus will lead to an evoked response and a phase reset in auditory cortex but will only reset phase in visual cortex. The same holds true for visual stimulus presentation while attending to visual input. The presentation of an auditory stimulus during attention to vision, however, still leads to an evoked response and phase reset in auditory cortex (albeit of smaller amplitude) but is ineffective in resetting phase in visual cortex. These findings stress the role of attention as a dynamic selector of the leading sense, which then acts as a pacemaker in sensory processing.

Although powerful in describing how attention may govern flexible sensory selection, some aspects of the “leading sense” framework need further specification. Similar to CTC, as discussed above, it is unclear how attention is initially allocated to a sense. The notion of cross-modal phase resets emphasizes the role of transient salient sensory events in capturing attention automatically, as in our ambulance example. Nevertheless, most experiments investigating oscillatory cross-modal interactions employed paradigms that required sustained focussed attention to one of two concurrently presented, equally salient sensory streams [65, 70]. For that purpose, a higher-order mechanism that operates above sensory modalities and exerts such biases must be assumed, and it remains to be included in the model. As with CTC, a likely candidate is the dorsal fronto-parietal attention network (see “Mechanisms of Attention” at the start of this chapter).

Furthermore, it remains to be seen whether a phase-reset mechanism can be generalized to other stages of stimulus processing. Physically distinct properties of an object, such as “colour” and “motion trajectory” in the visual system have to be selectively processed and integrated within senses as well. It might be an interesting subject of future research whether a phase reset can also account for within-modal but between-feature coupling in visual processing. For instance, will the red ball coded in colour-sensitive visual areas phase reset oscillations in other areas that code its trajectory (or vice versa)? Such a mechanism might prove vital for an efficient assessment of the ball’s approach towards oneself and allow for timely evasive action.

Oscillations and Suppression of Irrelevant Information

Another important mechanism by which attention optimizes stimulus processing in the human brain is the suppression of unattended sensory input. Preventing task-irrelevant information from reaching higher processing stages optimizes the use of limited processing resources and avoids interference or competition between irrelevant and relevant information. Ideally, irrelevant information should be blocked at the earliest possible stage, i.e. in early sensory areas. Evidence for task-specific suppression of sensory information is ubiquitous in the neuroimaging literature. Interestingly, recent studies provided compelling evidence that brain oscillations play an important role in attentional suppression. In particular, oscillations at a frequency of around 10 Hz (alpha-band) show task-specific amplitude modulations that are consistent with a role in attentional suppression. This hypothesis has gained early support from studies demonstrating an inverse relationship between alpha amplitude and behavioural performance of target processing [71, 72]. These studies show that even spontaneous fluctuations in occipito-parietal alpha power modulate the perceptual fate of an incoming near-threshold stimulus. Other studies extend this finding by showing that alpha power is related to cortical excitability [73, 74].

But beyond these general findings, evidence is emerging that specifically suggests that alpha band activity transiently inhibits neural populations that process task-irrelevant sensory information. In the following sections we will review and discuss this evidence.

Suppression of Spatial Location

Most of this evidence originated from electrophysiological studies of the classical Posner paradigm, a cued target detection paradigm [8]. Typically, participants fixate on a central cross throughout the trial. A symbolic cue (e.g. a visually presented small arrow, a word, or a tone) instructs the participant to covertly shift attention to the left or right visual hemifield (Fig. 3.4) while maintaining central fixation. After a delay period (often between 500 and 1500 ms), a target is presented in the left or right hemifield. Behavioural performance is better for targets presented in the attended hemifield [8]. A number of variations of this classical paradigm exist. The validity of the cue stimulus can be changed (i.e. targets are presented in the uncued hemifield with a certain probability), participants can be instructed to respond (or not) to targets presented in the uncued hemifield, and distractors can be presented at the same time as the target stimulus in the attended or unattended hemifield. The task may involve the detection of near-threshold targets or the identification of a specific target stimulus, etc.

Fig. 3.4

Schematic representation of the modulation of brain oscillations during visual spatial attention. In the commonly used “Posner task”, participants fixate on the cross. The < cue instructs them to covertly shift attention (to the left hemifield in this case). The shift of attention leads to a modulation of 10 Hz brain oscillations in occipito-parietal brain areas. The amplitude of 10 Hz oscillations decreases in the hemisphere contralateral to the attended hemifield and increases in the hemisphere contralateral to the suppressed hemifield

Interestingly, the amplitude of alpha oscillations over occipito-parietal brain areas is modulated following the presentation of the cue stimulus and even reflects the locus of spatial attention (see Fig. 3.4). Specifically, the covert shift of spatial attention to one hemifield leads to a reduction of alpha oscillations in contralateral occipito-parietal brain areas [75,76,77]. This reduction is sustained in the absence of sensory stimulation during the cue-target interval. These often-reproduced findings indicate a close link between visuo-spatial attention and alpha oscillations. But what exactly is the evidence that links alpha oscillations more specifically to attentional suppression?

Importantly, several studies report an up-regulation of alpha oscillations contralateral to the unattended hemifield consistent with a suppression of the visual hemisphere that is less likely to receive target information [26, 78,79,80]. This is illustrated in Fig. 3.4 where the parietal areas contralateral to the unattended hemifield show an alpha increase in the cue-target interval (before presentation of the target).

Furthermore, the amount of alpha modulation in this type of paradigm has been found to correlate with behavioural performance, indicating a functional role of alpha oscillations in the gating of target stimuli. It is important to note here that it is the single-trial alpha power in the cue-target interval that correlates with subsequent target processing performance. This is consistent with the notion that alpha power reflects the anticipatory attentional bias of location-specific neural populations. However, it remains unclear to what extent this correlation holds for the inhibitory aspect of alpha oscillations. In fact, Capilla et al. [79] found a correlation between anticipatory alpha power and behaviour only for the alpha power decrease contralateral to the attended hemifield, and not for the alpha power increase (thought to reflect sensory suppression) contralateral to the unattended hemifield. Further studies have reported correlations of behavioural performance with a collapsed measure of hemispheric lateralization of alpha power in occipito-parietal EEG electrodes [75, 76].
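A collapsed lateralization measure of the kind mentioned above is often computed as a normalized difference between alpha power ipsilateral and contralateral to the attended hemifield. The exact definitions used in the cited studies are not given here, so the formula below is an assumed, commonly used form, and the function name and example values are hypothetical.

```python
def alpha_lateralization_index(power_ipsi, power_contra):
    """Normalized difference of alpha power, bounded between -1 and 1.

    power_ipsi:   alpha power over the hemisphere ipsilateral to the
                  attended hemifield (expected to increase).
    power_contra: alpha power over the hemisphere contralateral to the
                  attended hemifield (expected to decrease).
    """
    total = power_ipsi + power_contra
    if total <= 0:
        raise ValueError("alpha power values must sum to a positive number")
    return (power_ipsi - power_contra) / total


# Hypothetical single-trial values in arbitrary units.
index_lateralized = alpha_lateralization_index(3.0, 1.0)  # 0.5
index_balanced = alpha_lateralization_index(2.0, 2.0)     # 0.0
```

Because the index collapses both hemispheres into one number, it cannot distinguish a contralateral decrease from an ipsilateral increase, which is why it does not settle the question raised by Capilla et al. [79].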

The correspondence between alpha modulation and shifts of visual attention has been generalized to more complex (and ecologically valid) scenarios. Recently, Tan et al. [81] showed that during a dynamic action observation task alpha modulation spatially coded for the predicted movement end point of the behaviourally relevant stimulus feature (in this case the moving hand of an actor performing a pointing movement). After movement onset, participants dynamically predicted the end point of the pointing movement. The outcome of this prediction was reflected in hemisphere-specific occipito-parietal alpha modulations several hundred milliseconds before the observed movement was finished.

Similarly, the amount of alpha lateralization has been shown to correlate with cue validity [77]. Together, these studies indicate that alpha modulations reflect the brain’s predictions about upcoming stimulus contingencies—important for efficient deployment of limited processing resources.

Suppression of Object Features

Postulating a role of alpha oscillations in attentional suppression of irrelevant information requires further generalization across different tasks, stimulus features, and modalities. Indeed, neural populations processing task-irrelevant object features seem to show increased alpha activity in the cue-target interval. Snyder and Foxe [82] used coloured moving dots as targets and instructed participants via a cue to attend to one of these two object features (colour or motion). Areas of the dorsal visual stream showed increased alpha activity when participants shifted attention to colour, whereas alpha activity increased in ventral areas when motion was attended. Similarly, Jokisch and Jensen [83] studied alpha modulation in the ventral and dorsal visual stream while participants remembered the orientation or identity of a face in a match-to-sample task. Consistent with the inhibitory role of alpha, they observed an alpha power increase in the dorsal stream during the identity task and in the ventral stream during the orientation task.

Extending these findings, Capilla et al. [79] demonstrated the co-representation of suppression and selection in the alpha band with distinct spatio-temporal signatures. Using a classical Posner paradigm, numbers were presented at near-threshold in the cued or uncued hemifield. Source localization of MEG signals revealed transient alpha power increase following cue presentation in dorsal parietal areas contralateral to the inhibited (unattended) hemifield. In contrast, the occipital ventral area contralateral to the attended hemifield that is associated with processing numbers and letters showed sustained alpha decrease throughout the cue-target interval. The first effect represents an alpha-mediated suppression of irrelevant spatial locations whereas the second effect represents an alpha-mediated priming of neural populations that are expected to receive the target.

Suppression Across Sensory Modalities

Further evidence for a more general role of alpha oscillations in attentional suppression comes from studies investigating other sensory modalities as well as intermodal attention.

The correspondence between visuo-spatial attention and alpha oscillations has been replicated in the somatosensory domain for painful stimuli by May et al. [84]. The authors reported lateralized anticipatory alpha modulation in primary somatosensory cortex. However, it is important to note that while the pattern of alpha lateralization is identical to the visual domain (relatively more alpha suppression contralateral to attended side) there was no evidence of alpha power increasing relative to baseline. This is in agreement with results of a study of tactile attention that also reported lateralized alpha (and beta) modulation in anticipation of a tactile target stimulus, but similarly failed to find alpha power increase as a sign of active inhibition [85].

Similarly, a study of tactile discrimination found significant alpha suppression contralateral to the attended side, but no significant increase in ipsilateral somatosensory cortex [86]. Interestingly, the same group reported a significant alpha increase in ipsilateral somatosensory cortex that contributed significantly to discrimination performance when distractors were introduced opposite to the attended side [87]. The lack of alpha increases in the previously mentioned studies could simply result from the fact that suppression was unnecessary because no distractors were presented. Therefore, these studies further support the notion of alpha oscillations playing a role in suppressing task-irrelevant information.

Another group of studies investigated the role of alpha in intermodal attention tasks based on the Posner paradigm. Targets could be presented in the auditory or visual modality with a preceding cue instructing participants to focus attention on one of these two sensory modalities [88]. Instructing participants to attend to auditory stimuli resulted in increased alpha power over visual brain areas, indicating inhibition of the irrelevant sensory modality. However, no corresponding increase in auditory areas was reported when participants attended to the visual domain. Bauer et al. [89] used an intermodal vision-touch attention paradigm and reported stronger alpha suppression in the attended sensory domain. An MEG study by Frey et al. [90] complemented these earlier results by showing alpha modulation specifically in auditory cortex in an audio-visual spatial attention task.

Finally, an interesting finding relating to the inhibitory role of alpha oscillations was made by Hwang et al. [91]. They studied inhibitory control with the anti-saccade task, in which participants are instructed to make a saccade in the direction opposite to a peripherally presented target stimulus. Here, pre-stimulus alpha power in FEF predicted saccadic inhibition.

Overall, this constitutes considerable evidence for an at least partially inhibitory role of alpha oscillations. Recently, more direct evidence for a causal involvement of alpha oscillations in the suppression of irrelevant stimulus aspects has also been gathered. Rhythmic TMS at alpha frequencies was used to specifically entrain alpha oscillations in IPS, an important node of the dorsal attention network engaged during the shifting of visual spatial attention. Simultaneous EEG recordings revealed that this particular TMS protocol transiently increased alpha power and led to a suppression of the contralateral visual hemifield [92, 93].

Oscillations and the Dynamics of Attention

In the previous sections, we have summarized oscillatory mechanisms that may underlie the selection and filtering of sensory input. It is obvious that these mechanisms must operate in a highly dynamic manner: A visual search, for example, entails successive shifts of the spotlight of attention selecting yet unexplored portions of space until the target stimulus is finally found. In a mechanistic interpretation, shifts of attention have been described as cycles of disengaging and shifting the spotlight from a searched location and engaging it onto a new target [3]. This conception acknowledges a fundamental property of all neural processes that subserve attention—they take time. As an example, one cycle of shifting attention from one location to another does not occur instantaneously, but has a given duration. Furthermore, facilitatory effects on selected and suppression of irrelevant sensory input take time to build up [94]. The allocation of attention itself can thus be considered a function of time. In the following section, we focus on how intrinsic neural rhythms can serve as “clocks” of attention and provide a temporal frame for the cyclic dynamics in allocating attention.

Stimulus Anticipation and Temporal Regularities

Previous research led to the notion that our senses capitalize on rhythmic structures in sensory input to efficiently process and predict upcoming stimulation [95]. Predictions based on such temporal regularities indeed improve behavioural performance. For instance, Rohenkohl et al. [96] reported faster reaction times and greater accuracy for temporally predictable visual target stimuli within a regular, as compared with an irregular, stimulus train. Temporal regularities can be used to precisely time the deployment of anticipatory biases on sensory processing.

Without initially making a connection to intrinsic neural rhythms, Large and Jones [97, 98] introduced their Dynamic Attending Theory (DAT). The DAT provides an account for the waxing and waning of attention in time by assuming an internal oscillatory process that is able to “lock on” or “entrain” to temporal regularities in sensory input. This oscillatory conceptualization of attention is closely related to the idea that low-frequency brain oscillations underlie a selective temporal tuning of sensory cortices [34]. More specifically, periods of high and low excitability of delta-theta rhythms are a potential neural correlate of the DAT oscillator model, as pointed out by Henry and Herrmann [99]. Schroeder and Lakatos [34] further suggested that entraining to rhythmic input is metabolically optimal. In case of arrhythmic input, making temporal predictions impossible, the brain needs to resort to an energy-consumptive “continuous” processing mode instead.

Importantly, relating fluctuations in attention to the entrainment of low-frequency oscillations emphasizes the role of their phase in stimulus processing. Recent experimental work has repeatedly confirmed the role of relative delta, theta, and alpha band phase in stimulus perception [100,101,102,103]. These studies consistently demonstrated that stimuli presented during high excitability phases were detected faster and more accurately. Moreover, low-frequency oscillations have been shown to entrain to temporal regularities in sensory input through phase alignment [66, 104, 105].
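Phase alignment across trials is commonly quantified as the length of the mean resultant vector of single-trial phases, often called inter-trial coherence. The sketch below illustrates this measure on simulated phases; the function name, the trial count, and the phase distributions are assumptions made for illustration and are not taken from the cited studies.

```python
import numpy as np

def inter_trial_coherence(phases):
    """Length of the mean resultant vector of phases (radians) across trials.

    Values near 0 indicate scattered phases; 1 means the same phase on every trial.
    """
    phases = np.asarray(phases, dtype=float)
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

rng = np.random.default_rng(0)
n_trials = 200
# Hypothetical phases at stimulus onset: uniformly random without entrainment,
# clustered around a preferred phase (here 0 rad) with entrainment.
random_phases = rng.uniform(-np.pi, np.pi, n_trials)
aligned_phases = rng.normal(loc=0.0, scale=0.5, size=n_trials)

itc_random = inter_trial_coherence(random_phases)
itc_aligned = inter_trial_coherence(aligned_phases)
```

With these simulated data the aligned condition yields a much higher value than the random condition, which is the signature one would look for when testing whether an oscillation has locked onto a regular stimulus stream.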

Recent research into auditory speech processing has further recognized the role of entrainment in the selection of complex sensory input [106]. A recent study by Zion Golumbic et al. [107] demonstrated compellingly how low-frequency oscillations in auditory cortices selectively entrained to the speech envelope (i.e. the slow amplitude fluctuations of the speech signal) of an attended speaker in a multiple-speaker environment. Entrainment can thus be regarded as a versatile mechanism of sensory selection.

Active Sensing

Assuming oscillatory entrainment as a general mechanism of selective attention is tempting. However, only a subset of stimuli allows straightforward extraction of temporal regularities. When viewing a painting, for example, despite the absence of any periodic changes in its content, we are still able to perceive and even focus on its constituent elements. How do our perceptual systems exploit the benefits of entrainment in such a situation? Schroeder et al. [108] suggested that in the absence of temporal regularities, sensorimotor interactions lead us to produce rhythmic behaviour that imposes a temporal structure on sensory input. These authors argue that active rhythmic sampling is the rule and not the exception in at least some of our senses. Their “Active Sensing” perspective rests on a number of observations. First of all, free exploration of a sensory situation involves moving our sensors: gaze shifts successively cover areas of interest in visual scenes, and our fingers manipulate objects to experience their physical properties. Respective exploratory movements occur in a near-periodic manner. During free viewing of natural images, saccadic gaze shifts occur at a rate of three per second, and fixation dwell time is ~200 ms on average [109]. Both values correspond well to the frequency and period of delta and theta rhythms. Although corresponding findings remain scarce for active human tactile perception [110], research in the rat model shows a similar periodicity of whisking movements during haptic exploration [111]. Second, just like sensory perception, motor output seems to be slave to the rhythm; motor cortices generally exhibit rhythmic activity in the same characteristic frequency bands as sensory cortices. These rhythms are instrumental in coordinating motor activity, such as planning and executing movements [60]. For example, during slow, precise finger movements a small 5–8 Hz rhythm can be observed peripherally that originates from rhythmic activity in a thalamo-cortical loop and likely supports optimal movement control [112]. Interestingly, participants instructed to simulate Parkinsonian tremor settled naturally into the same 5–8 Hz low-frequency rhythm, highlighting the preference of the human motor system for this frequency range [113]. Third and most crucially, low-frequency cortical oscillations tend to align with quasi-periodic gaze shifts [114, 115] and haptic receptors in the rodent model [116].

Using the visual modality as an example, Schroeder et al. [108] argue that each saccade triggers a volley of “fresh” sensory input that is subsequently processed within a period of high cortical excitability. This period starts with the onset of fixation and ends before the initialization of the next saccade [117]. The concept of Active Sensing thus links rhythmic motor behaviour to rhythms in perception. It posits that we actively sample our (visual and tactile) environment using our sensory organs. Rhythmic sampling routines thereby optimally exploit periodic changes in perceptual processing of sensory input.

Note that Schroeder et al. [108] acknowledge that Active Sensing does not provide a straightforward account of selective attention for the auditory modality. This is simply because we are not able to move our ears to rhythmically sample auditory input. Interestingly, this observation ties in well with recent findings that, unlike visual processing, auditory processing might not rely on a low-frequency rhythmic sampling process [118].

Discrete Perception and the “Blinking Spotlight” of Attention

Although the Active Sensing framework possesses high ecological validity—it reflects how we naturally explore our (visual and haptic) environment—it deliberately disregards the fact that we are able to focus our attention on a portion of the visual field that is not in the centre of our gaze. This “covert” form of visuo-spatial attention decouples gaze fixation from selective sensory processing. It allows shifting the spotlight of attention while keeping gaze steady. Attention can thus either be allocated by shifting gaze and fixating a target (termed “overt” attention) or covertly as described before. Importantly though, both mechanisms rely on the same underlying neural circuitry, the fronto-parietal “attention-network” [25, 119].

In a seminal study on the dynamics of FEF control of attention, Buschman and Miller [120] investigated FEF neuronal activity during covert shifts of attention in awake behaving monkeys. These were trained to perform a covert visual search task in a four-item display and respond upon discovery of the target item by making a saccade towards it. While doing so, monkeys obeyed a strictly serial (predominantly clockwise) pattern as reflected in FEF neuronal activity: Neurons exhibited maximal firing when attention was allocated to their preferred location. When a target was presented at their preferred location, firing rates peaked just before the saccade (50 ms). When a target was presented one or two positions further clockwise, firing rates of the same neurons peaked earlier (100 or 200 ms prior to saccade), indicating that the attentional focus moved across successive positions in order to find the target. Importantly, firing rates were modulated by the phase of ongoing beta band oscillations of the LFP. Single-trial variations in frequency of these oscillations were predictive of corresponding saccadic reaction times. Finally, Buschman and Miller [120] were able to conclude that monkeys spent on average 44 ms per item, which corresponded well with the cycle length of the observed 18–34 Hz LFP oscillations (40 ms at 25 Hz). In summary, their results provide compelling evidence for a serial periodic sampling of a search display that can be conceived of as successive shifts of the attentional spotlight, and that is implemented via rhythmic beta-band fluctuations in local neuronal excitability.
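The correspondence between time per item and oscillatory cycle length rests on the simple relation that one cycle lasts 1000/f milliseconds at f Hz. The short sketch below merely spells out this arithmetic for the values quoted above; the function and variable names are illustrative.

```python
def cycle_length_ms(frequency_hz):
    """Duration of one oscillatory cycle in milliseconds."""
    return 1000.0 / frequency_hz

period_at_25_hz = cycle_length_ms(25.0)   # the "40 ms at 25 Hz" quoted in the text
longest_cycle = cycle_length_ms(18.0)     # lower edge of the 18-34 Hz band
shortest_cycle = cycle_length_ms(34.0)    # upper edge of the 18-34 Hz band

time_per_item_ms = 44.0
# Frequency whose cycle length equals the estimated time spent per item.
equivalent_frequency_hz = 1000.0 / time_per_item_ms

within_band = shortest_cycle <= time_per_item_ms <= longest_cycle
```

The estimated 44 ms per item corresponds to a frequency of roughly 23 Hz, and thus falls inside the range of cycle lengths spanned by the reported 18–34 Hz band.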

The findings of Buschman and Miller [120] leave us with the interesting possibility that rhythmic exploratory motor behaviour in terms of Active Sensing [108] might rather be a consequence of an intrinsically periodic sampling of our sensory environment than a cause. In fact, it is a long-standing notion that perception itself is based on taking discrete snapshots in contrast to merely processing continuous sensory inflow [118]. Again, neural oscillations, particularly those in the alpha and theta frequency ranges, have been identified as being instrumental in digitizing continuous input into discrete samples [121]. More specifically, Busch et al. [100] as well as Mathewson et al. [122] found that detection of near-threshold visual stimuli depended on the relative phase of ongoing 7 or 12 Hz oscillations in human EEG recordings, respectively. However, a follow-up study by Busch and VanRullen [123] emphasized the role of attention: Oscillatory phase only influenced the detection of threshold stimuli at attended, but not at unattended, locations. This finding suggests that either attention accentuates perceptual sampling or the sampling process is closely related to sensory input selection by attention.

Accordingly, a number of studies have since reported signatures of attention-based rhythmic sampling in human behavioural performance [124,125,126]. For instance, Landau and Fries [126] presented participants with two visual stimuli, one within each visual hemifield. They found that accuracy in a change detection task fluctuated rhythmically with a frequency of 4 Hz after cueing participants to attend covertly to the left or right stimulus. Moreover, the rhythms for the two stimuli were in counterphase, indicating periodic shifts of attention between them. These findings were replicated by Fiebelkorn et al. [125], who showed a similar 4-Hz rhythmic sampling between stimuli in different hemifields. Moreover, their experiment featured a condition investigating effects of object-based attention: In addition to target events on attended or unattended stimuli, task-relevant events could occur at an unattended location situated on the same “object” (a white bar) as the attended location. Crucially, target detection within objects obeyed an 8-Hz rhythmicity, suggesting attentional sampling at a higher temporal rate.
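Behavioural rhythms of this kind are typically revealed by a spectral analysis of performance as a function of the cue-target interval. The following sketch shows the principle on simulated accuracy time courses; the sampling of intervals, the modulation depth, and the noise level are hypothetical and do not reproduce the analysis pipelines of [125, 126].

```python
import numpy as np

def dominant_rhythm(time_course, sampling_rate_hz):
    """Return (peak frequency in Hz, complex Fourier coefficient at that peak)."""
    centred = np.asarray(time_course, dtype=float)
    centred = centred - centred.mean()          # remove the mean accuracy level
    spectrum = np.fft.rfft(centred)
    freqs = np.fft.rfftfreq(centred.size, d=1.0 / sampling_rate_hz)
    peak = int(np.argmax(np.abs(spectrum)))
    return freqs[peak], spectrum[peak]

# Hypothetical design: accuracy sampled every 20 ms over 1 s of cue-target intervals.
sampling_rate_hz = 50.0
t = np.arange(50) / sampling_rate_hz
rng = np.random.default_rng(2)

# Simulated 4 Hz fluctuations in counterphase for the left and right stimulus.
acc_left = 0.75 + 0.05 * np.cos(2 * np.pi * 4 * t) + 0.01 * rng.standard_normal(t.size)
acc_right = 0.75 + 0.05 * np.cos(2 * np.pi * 4 * t + np.pi) + 0.01 * rng.standard_normal(t.size)

freq_left, coef_left = dominant_rhythm(acc_left, sampling_rate_hz)
freq_right, coef_right = dominant_rhythm(acc_right, sampling_rate_hz)
# Phase difference between the two behavioural rhythms (close to +/- pi here).
phase_difference = np.angle(coef_left * np.conj(coef_right))
```

A phase difference near half a cycle between the two time courses is what would be expected if attention alternated between the two stimuli rather than sampling both simultaneously.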

Overall, these findings accord well with the notion of a “blinking” spotlight of (at least visual) attention as proposed by VanRullen et al. [124]. This notion emphasizes the intrinsic rhythmicity in sampling one object discretely or multiple objects successively, and is well in line with the reported phasic neural processes underlying attentional selection. Furthermore, recent results indicate that the blinking-spotlight framework might further elucidate the neural underpinnings of parallel vs. serial visual search, i.e. that target search times remain constant vs. increase with increased display size [127].

Integrating Models of Oscillations and Attention

Taken together, oscillatory accounts of attention mechanisms are able to describe long-assumed properties of the underlying neural processes (e.g. the dynamics of the “spotlight”) on the level of communication within and between neuronal populations, a level that is likely the locus of neural representations of our sensory environment, intentions, and thoughts. However, it remains to be shown which of these mechanisms work in concert, and how, to produce, for example, typical scans of a visual search display that involve the selection of a stimulus while filtering out distractors, subsequently moving on to the next stimulus and repeating this cycle until the target is found. Likely candidates for an integrated framework are oscillatory interactions between frequency bands that are usually referred to as cross-frequency coupling [128].

Cross-Frequency Coupling

The most prominent cross-frequency coupling mechanism is phase-amplitude coupling (PAC), in which the phase of low-frequency oscillations modulates the amplitude of high-frequency oscillations. PAC is particularly suited as a neural mechanism because it can account both for long-range low-frequency biasing signals (phase) and for their action upon short-range high-frequency stimulus representations (power) in local neuronal networks [129, 130]. Both processes are required to incorporate all described aspects of attention.
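One common way to quantify PAC is the mean vector length: the high-frequency amplitude envelope is weighted by the low-frequency phase and averaged over time. The sketch below is a simplified illustration on synthetic data, assuming ideal FFT-based band-limiting and a normalization by mean amplitude; the function names, frequency bands, and signal parameters are illustrative and not taken from the cited studies.

```python
import numpy as np

def analytic_band(signal, fs, f_lo, f_hi):
    """Band-limit `signal` to [f_lo, f_hi] Hz and return its analytic signal.

    Keeping only positive-frequency FFT bins inside the band (and doubling
    them) combines an ideal band-pass filter with a Hilbert transform.
    """
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(signal.size, d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return np.fft.ifft(2.0 * spectrum * keep)

def pac_strength(signal, fs, phase_band, amp_band):
    """Normalized mean vector length; near 0 without coupling, at most 1."""
    phase = np.angle(analytic_band(signal, fs, *phase_band))
    amplitude = np.abs(analytic_band(signal, fs, *amp_band))
    return np.abs(np.mean(amplitude * np.exp(1j * phase))) / np.mean(amplitude)

# Synthetic 10 s recording: a 10 Hz (alpha) rhythm plus a 60 Hz (gamma) rhythm.
fs = 1000.0
t = np.arange(10000) / fs
rng = np.random.default_rng(1)
noise = 0.1 * rng.standard_normal(t.size)
alpha = np.sin(2 * np.pi * 10 * t)
gamma = np.sin(2 * np.pi * 60 * t)

# Coupled case: gamma amplitude waxes and wanes with the alpha cycle.
coupled = alpha + 0.5 * (1 + 0.8 * np.cos(2 * np.pi * 10 * t)) * gamma + noise
# Uncoupled case: constant gamma amplitude.
uncoupled = alpha + 0.5 * gamma + noise

pac_coupled = pac_strength(coupled, fs, (8, 12), (40, 80))
pac_uncoupled = pac_strength(uncoupled, fs, (8, 12), (40, 80))
```

Note that the amplitude band has to be wide enough to include the side bands created by the modulation (here 50 and 70 Hz); a band that is too narrow would remove the very envelope fluctuations the measure is meant to detect.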

Most vividly captured in the case of visual attention, a phase reset of low-frequency biasing signals can be generated by internal events and exerted by the fronto-parietal attention network [131], or by salient external events in the same or different sensory modalities [66, 132, 133]. These biasing signals determine local excitability cycles and thus regulate the high-frequency activity of neuronal populations that encode sensory stimulation. Evidence for PAC in human cortical activity associated with cognitive functions in general is still sparse but growing [106, 134,135,136]. Only recently, Szczepanski et al. [137] provided experimental evidence for PAC underpinning the control of visuo-spatial attention. In a spatial cueing task they found that the coupling strength predicted reaction times to target stimuli, thus tying PAC to a behavioural outcome that varied with the allocation of attention.

Jensen et al. [130] have proposed a model of coupled alpha and gamma band oscillations that serve in prioritizing visual input. Crucially, the model postulates that a visual scene is decomposed into its constituent objects via a transformation into a temporal code. Different gamma cycles code different objects, and the most salient item is processed first at the onset of increasing local excitability as determined by alpha phase. Importantly, current task demands may modulate the relative saliency of objects. Thus goal-directed attention can modify the order of the temporal code. Moreover, as for example in a visual search, the behavioural relevance of items of the search display can change over time. In that case, the model provides a flexible mechanism of re-prioritizing objects on each new excitability cycle (i.e. alpha phase) according to the strength of their neuronal representation (i.e. gamma power).
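The ordering principle described above can be illustrated with a deliberately simple toy example. It is not the model of Jensen et al. [130]: the linearly decaying inhibition, the 100 ms cycle, the item names, and all strength and gain values are hypothetical choices made only to show how a change in relative strength reorders a temporal code.

```python
ALPHA_PERIOD_MS = 100.0  # one cycle of a 10 Hz rhythm (assumed)

def activation_latencies(strengths):
    """Latency (ms) within one cycle at which each item escapes inhibition.

    Toy assumption: inhibition decays linearly from 1 to 0 over the cycle, so
    an item with strength s becomes active once inhibition drops below s.
    """
    return {item: (1.0 - min(s, 1.0)) * ALPHA_PERIOD_MS
            for item, s in strengths.items()}

def temporal_order(strengths):
    """Items sorted by activation latency: the strongest item fires first."""
    latencies = activation_latencies(strengths)
    return sorted(latencies, key=latencies.get)

# Hypothetical stimulus-driven saliency of three objects in a scene.
saliency = {"red circle": 0.9, "green square": 0.6, "blue triangle": 0.3}
order_stimulus_driven = temporal_order(saliency)

# Hypothetical goal-directed gain that favours the blue triangle.
task_gain = {"red circle": 1.0, "green square": 1.0, "blue triangle": 3.2}
weighted = {item: saliency[item] * task_gain[item] for item in saliency}
order_goal_directed = temporal_order(weighted)
```

In this toy case the blue triangle moves from last to first position once the task-related gain is applied, mirroring the idea that goal-directed attention can modify the order of the temporal code on the next excitability cycle.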

Although these findings and ideas show that different oscillatory phenomena associated with attention can be integrated into a consistent unified framework by assuming cross-frequency interactions such as PAC, explanatory gaps still remain to be closed. In the beginning of this chapter we have introduced current models of attention that are based on observations of single neuron behaviour. These so-called normalization models have been highly successful in explaining a wide range of effects of attention on stimulus processing while, however, disregarding any oscillatory contributions. Given the explanatory power of oscillatory accounts of attention on the one side, and normalization models on the other side, it is clear that a comprehensive account of human attention (and its underlying neural processes) has to incorporate both aspects.

Hierarchical Normalization and Oscillation Model of Attention

Montijn et al. [138] undertook a pioneering foray into combining oscillations and normalization models. They identified a potential weakness of the normalization model by Reynolds and Heeger [23] when modelling the processing of two close-by stimuli along the visual processing hierarchy. They observed that the neuronal activity profiles (given by the “Population Response” diagram in Fig. 3.1) increasingly blur into each other at higher processing stages simply because the receptive field (RF) sizes of respective neurons increase. Because attention can only modulate neuronal responses at the spatial scale provided by the RFs at each stage (the “Attention Field” in Fig. 3.1), it loses its discriminative power and similarly enhances the responses to both stimuli. Put differently, a neuron with an RF that fully encompasses both stimuli would respond maximally.

Montijn et al. [138] introduced a possible solution to this limitation by reinstating the discriminability of two stimuli falling within overlapping RFs. They assumed, in accordance with the CTC framework [33], that neuronal populations coding the stimuli would oscillate at different phases. In fact, their oscillatory extension elegantly maintains unambiguous responses to each of the two stimuli at later processing stages. Now, a neuron with an RF that fully encompasses both stimuli would receive phase-shifted input from neuronal populations coding the stimuli at an earlier processing stage. Modelling the corresponding “Population Response”, Montijn et al. [138] were able to demonstrate that such a neuron would only give an intermediate response due to phase cancellation effects. Maximum responses instead were obtained from neurons whose RFs gave a slight preference to one of the two stimuli and thus received dominating input from (or, in terms of CTC, showed coherent activity with) the corresponding lower-tier populations.
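The phase cancellation argument can be made concrete with the elementary rule for adding two oscillations of equal frequency: the amplitude of the sum depends on the input weights and on their phase difference. The sketch below shows only this rule and is not the model of Montijn et al. [138]; the RF weights and the phase offset are hypothetical.

```python
import numpy as np

def summed_amplitude(weight_a, weight_b, phase_offset_rad):
    """Amplitude of weight_a*cos(wt) + weight_b*cos(wt + phase_offset_rad)."""
    return np.abs(weight_a + weight_b * np.exp(1j * phase_offset_rad))

# Hypothetical phase difference between the populations coding stimulus A and B.
offset = 2 * np.pi / 3

balanced = summed_amplitude(0.5, 0.5, offset)   # RF covers both stimuli equally
biased = summed_amplitude(0.7, 0.3, offset)     # RF prefers stimulus A
in_phase = summed_amplitude(0.5, 0.5, 0.0)      # no phase difference, for comparison
```

With in-phase inputs the downstream amplitude is maximal regardless of how the RF weights are split, so the two stimuli cannot be told apart. Once the inputs are phase-shifted, the balanced RF yields a reduced amplitude and the biased RF a larger one, which is the qualitative pattern described above.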

Taking into account oscillatory phase thus preserves the possibility to selectively modulate the processing of stimuli at stages of the visual hierarchy on which a selection based on space or feature alone is difficult. In a sense, Montijn et al. [138] amended the original normalization model [23] simply by giving it a time dimension that is required for oscillatory processes to take place. Further modelling showed that this “Hierarchical Normalization and Oscillation Model of Attention” is able to accurately reproduce known effects of attention such as response and contrast gain, as well as the backward progression of the onset (and magnitude) of attentional modulation, along the visual hierarchy as first described by Buffalo et al. [94]. Despite its promise, to date, the model awaits experimental validation.

Conclusion

Expressing mechanisms of attention in terms of brain rhythms is an intensively pursued effort in cognitive neuroscience. As we have reviewed in the above sections, three major components of attention that contribute to the preferential processing of behaviourally relevant sensory input can be described from an oscillatory perspective: selective processing of attended stimulation, suppression or filtering-out of ignored stimulation, and the dynamic allocation of processing resources.

We have seen that at least two oscillatory phenomena play their part in boosting neural representations of attended stimuli. Neuronal populations can synchronize their firing patterns in the gamma (or beta) frequency range, enabling effective connections along which information can be transmitted. This communication-through-coherence mechanism [33, 44] readily allows a selective routing of information by increasing the coherence between neuronal populations that encode an attended stimulus. As a second complementary mechanism, low-frequency delta/theta or alpha band oscillations can reset their phase to accommodate incoming stimulation during periods of optimal cortical excitability [34, 133]. One sensory cortex may reset the phase of others, thus tuning them to processing coincidental input in other senses [66, 132]. Such a cross-modal spread of attention may also subserve multisensory integration [68].

The suppression of irrelevant stimulation has classically been linked to oscillatory activity within the alpha band, and has been most extensively studied in the visual domain. Generally, alpha power decreases in cortical regions that process an attended portion of space, and increases in other regions that represent unattended locations [75, 80]. High alpha power thereby indicates decreased cortical excitability and, consequently, reduced stimulus processing [73, 139]. Beyond suppressing unattended spatial locations, alpha power increases have been linked to a selective inhibition of unattended object features [82] as well as unattended sensory modalities [88, 90].

Neural mechanisms of selective gain and suppression are subject to dynamics that follow the phase of intrinsic rhythms. Neural oscillators can entrain to temporally regular sensory input to match phases of optimal cortical excitability with anticipated upcoming stimulus occurrences [96, 97, 99, 133]. In the absence of temporal regularities, some of our senses tend to create periodic behaviour, such as quasi-regular eye movements in vision, to actively produce rhythmic sensory input [108]. Moreover, in the visual domain, rhythmic sampling can even occur in the absence of eye movements, i.e. when gaze remains fixated. Visual search experiments requiring covert shifts of attention still revealed a cyclic sampling of the search display [120]. These and other findings [123, 125, 126] have led to the notion of a “blinking spotlight” of attention [124], i.e. attention itself being a rhythmic sampling process independent of any sensor movement.

In summary, research over the last years has greatly emphasized the importance of brain oscillations for the neurophysiological implementation of cognitive processes of attention. Although significant progress has been made, there is still a considerable gap between psychological theories and behavioural descriptions of attention on one side, and computational models and their neurophysiological implementation on the other side. Narrowing this gap represents a formidable challenge and, at the same time, a highly promising and fruitful endeavour for interdisciplinary scientists.