
The ability to detect, remember, and use the temporal relations among stimuli is critical for anticipating their future occurrence [1–3]. Accurate anticipation facilitates stimulus processing and is reflected in improved perception, response times, and decision quality [4–8]. This chapter provides an introduction to the use of scalp-recorded electroencephalography (EEG) and event-related potentials (ERPs) as tools for investigating the cognitive and neural basis of timing and time perception. To this end, we first provide a brief description of the EEG technique and then review a broad selection of the EEG literature that addresses questions related to interval timing.

Electroencephalography (EEG) & Event Related Potentials (ERPs)

Modern EEG amplifiers have made it relatively straightforward to non-invasively record brain electrical potentials in humans with electrodes placed on the scalp (Fig. 1). These scalp-recorded potentials reflect an instantaneous summation of excitatory and inhibitory post-synaptic potentials (EPSPs, IPSPs) from tens of thousands of neurons, primarily cortical pyramidal cells, spread over several cm² of brain surface [9].

Fig. 1

Illustration of EEG electrode placement on a 3D head model. Electrodes are typically positioned based on percentage distances from various skull landmarks so that they can be placed consistently across participants, at least with respect to those landmarks. Across participants there is significant variation in the brain tissue that lies immediately below a particular electrode site. Moreover, because of volume conduction and summation of electrical potentials the source of the electrical signal at an electrode is not necessarily the tissue immediately beneath it (see text)

Although EEG has excellent temporal resolution, it has relatively poor spatial resolution because the detectability of a brain potential at a particular scalp electrode is determined by the orientation of the neurons with respect to the scalp surface, the organization of simultaneously active neurons with respect to each other (i.e., open versus closed-field arrangement), and the number of simultaneously active neurons (see Fig. 2). Equally important, the signal originating from one neural source can be detected at multiple scalp locations due to volume conduction of the electrical potential. Consequently, the electrical potential recorded at a specific scalp location may be the summation of signals from multiple neural generators spread over a wide region of brain at a substantial distance from the electrode site [9, 10]. There are many EEG source localization techniques, but discussion of the strengths and weaknesses of these localization methods is beyond the scope of the present chapter (for discussion see [11]).

Fig. 2

Neuron orientation determines whether an electrical potential can be detected at the scalp. First, neurons must be aligned with respect to each other (open field arrangement; lower right panel), rather than positioned randomly (closed field arrangement; upper right panel), in order for simultaneous changes in membrane potential to be detectable by a scalp electrode. In other words, the dipoles formed by the individual neurons must sum, rather than cancel. Second, membrane potential changes from large groups of neurons, represented here as dipoles (left panel), are detectable at a scalp surface electrode when a group of neurons forming an open field is oriented perpendicularly with respect to that electrode, i.e., it comprises radial, rather than tangential, dipoles

The ongoing EEG contains voltage fluctuations that are related to the perceptual or cognitive process of interest (i.e., the signal), but it also contains voltage fluctuations (i.e., so-called noise) that are due to task irrelevant perceptual and cognitive processes (e.g., the participant thinking about lunch) and/or physiological artifacts such as heartbeat, whole body movements, or eyeblinks. A typical human EEG experiment includes a relatively large number of trials in each of the experimental conditions because averaging the EEG signal across many trials from the same condition preserves EEG features that are time- and phase-locked to the events of interest while attenuating activity that is not, such as random noise [12, 13]. The output of this averaging procedure is referred to as an ERP, the components of which can be consistently identified by polarity, latency, and scalp topography. As illustrated in Fig. 3, ERP components are either transient, meaning they span a narrow time window and are evoked by rapid changes such as a stimulus onset, or sustained, meaning they span several hundred milliseconds or more and are evoked by both rapid and gradual changes [14]. It is worth emphasizing that a component does not necessarily reflect a single perceptual or cognitive process. Finally, although ERP analysis is the conventional approach to analyzing averaged EEG signals, it can be complemented by single-trial methods that examine the variability of the EEG signal across trials [15, 16].
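The effect of trial averaging can be illustrated with a minimal simulation. The sketch below is not taken from any of the cited studies: the waveform shape, noise level, trial count, and function names (`simulate_epoch`, `average_epochs`) are all illustrative assumptions. It assumes that the noise is independent from trial to trial, in which case the residual noise in the average shrinks roughly with the square root of the number of trials, while the time-locked deflection is left intact.

```python
import math
import random
import statistics


def simulate_epoch(rng, n_samples=200, noise_sd=5.0):
    """Return one simulated epoch: a fixed time-locked deflection plus random noise.

    All values are hypothetical and chosen only for illustration.
    """
    epoch = []
    for t in range(n_samples):
        # Assumed evoked response: a Gaussian bump of 2 units peaking at sample 100.
        signal = 2.0 * math.exp(-((t - 100) ** 2) / (2 * 10.0 ** 2))
        epoch.append(signal + rng.gauss(0.0, noise_sd))
    return epoch


def average_epochs(epochs):
    """Point-by-point mean across epochs of one condition (the ERP)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
epochs = [simulate_epoch(rng) for _ in range(400)]
erp = average_epochs(epochs)

# Compare noise in a window that contains essentially no signal (samples 0-49).
single_trial_noise = statistics.pstdev(epochs[0][:50])
erp_noise = statistics.pstdev(erp[:50])
print(f"single-trial noise SD: {single_trial_noise:.2f}")
print(f"noise SD after averaging 400 trials: {erp_noise:.2f}")
```

Real recordings violate the independence assumption to some degree (for example, ongoing oscillations and slow drifts), so the improvement obtained in practice is usually smaller than this idealized case suggests.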

Fig. 3

Summary of the main steps involved in EEG data collection and analysis. Top Left: Electrodes are attached to the surface of a participant’s scalp before she performs the experiment. The EEG amplifier receives neuroelectrical signals and stimulus timing information (i.e., triggers) so that the onset time of events of interest can be assigned to the correct time point in the EEG recording. Amplified and digitized EEG signals are then stored and ready for preprocessing and analysis. Behavioral data are often collected so that brain-behavior associations can be studied. Top Center: During preprocessing, multi-channel (electrode) ongoing EEG data of the whole experimental session are checked for contamination by noise and irrelevant signals are minimized. The processed EEG data are then epoched, so that only segments of EEG signals closely related to the events of interest are retained. Epochs are grouped according to experimental condition, and averaging is performed across epochs of the same condition. Top Right: Averaging reveals a waveform containing signals that are time- and phase-locked to the onset of the event of interest. Peaks and troughs of this Event Related Potential (ERP) that have functional implications are called components, and are assigned labels according to their polarity and peak latency, e.g., the positive peak at 100 ms that is sensitive to the perceptual features of the event is labeled the P1 or P100. Bottom: ERP parameters that may be sensitive to the experimental manipulations include component amplitude, latency, and distribution across the scalp, and their relationship with behavior or other physiological signals

Detailed introductory guides to using EEG/ERPs to address fundamental questions about perception and cognition are provided in a number of excellent texts [10, 17]. More advanced topics, including source localization, are covered in detail by Nunez and Srinivasan [9] and in edited volumes by Handy [17, 18] as well as Ullsperger and Debener [19].

To summarize, scalp-recorded EEG is a non-invasive recording of the neuroelectric signals generated by the brain. It primarily comprises the summation of post-synaptic potentials of cortical pyramidal neurons that are simultaneously active, in open-field configuration, and positioned radially with respect to the recording site. EEG possesses very good temporal specificity, but relatively poor spatial specificity. EEG measured during a cognitive task includes neuroelectric changes that are relevant and irrelevant to the task. The ERP is a time-locked and phase-locked brain response to the event of interest.

Implicit and Explicit Timing

Perhaps the most common lab-based approach to the study of interval timing in humans is to instruct participants to attend to the durations of stimuli and then make an explicit response based on a judgment about those durations (i.e., explicit timing). For example, the judgment could be a comparison of a standard and a probe duration, a decision about whether a target interval has elapsed, or a verbal estimate of a stimulus duration.

However, there are also situations in which actions or brain responses are clearly time based or time sensitive, but the stimulus duration is judged implicitly or pre-attentively. For example, if a 10 ms tone pip is presented once every 200 ms 20 times in a row, but on the 21st presentation the tone pip is delayed by 100 ms, the brain will respond to the change even if the participant has been instructed to ignore the tone stream [20]. This ERP component, known as the Mismatch Negativity (MMN), is a sensitive marker of pre-attentive stimulus processing (e.g., [21]) and, as described below, has been used to investigate pre-attentive or implicit timing.

The distinction between explicit and implicit timing tasks is important because the different objectives and procedures in these tasks can lead to different behavioral and neural manifestations [5, 22] and this has direct consequences for the interpretation of neuroelectric signals.

Mismatch Negativity (MMN)

The auditory MMN is elicited when a stimulus violates a pattern or rule established by previously presented stimuli [21]. The rule may be defined by physical stimulus characteristics such as pitch, intensity, or duration such that an infrequent 900 Hz tone presented in a sequence of frequent 1,000 Hz tones will elicit a MMN, but also can be defined by the relationship between stimuli rather than physical characteristics [23]. For example, if participants hear a sequence of sounds in which each sound is higher in pitch than the previous one, then a lower pitch sound will elicit a MMN. The MMN component is obtained by subtracting the ERP response elicited by the more frequent (standard) stimuli from the ERP response elicited by the rare (deviant) stimuli. It is easiest to distinguish when participants are not actively attending to the auditory stimulus stream because otherwise it can be concealed due to the partially overlapping and much larger P300 response [21].
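The deviant-minus-standard subtraction can be sketched in a few lines. The amplitudes, sampling times, and the 100–250 ms search window below are hypothetical values chosen for illustration, and the function names are not drawn from any particular EEG toolbox.

```python
def difference_wave(deviant_erp, standard_erp):
    """Deviant-minus-standard difference wave, computed sample by sample."""
    if len(deviant_erp) != len(standard_erp):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def peak_negativity(wave, times_ms, window_ms):
    """Return (amplitude, latency) of the most negative point inside a window."""
    start, end = window_ms
    candidates = [(v, t) for v, t in zip(wave, times_ms) if start <= t <= end]
    return min(candidates)


# Hypothetical averaged ERPs sampled every 50 ms (arbitrary amplitude units).
times_ms = [0, 50, 100, 150, 200, 250, 300, 350, 400]
standard = [0.0, 1.0, -1.0, 0.5, 0.2, 0.1, 0.0, 0.0, 0.0]
deviant = [0.0, 1.0, -1.5, -1.5, -0.8, 0.1, 0.0, 0.0, 0.0]

mmn = difference_wave(deviant, standard)
amplitude, latency = peak_negativity(mmn, times_ms, window_ms=(100, 250))
print(f"peak of difference wave: {amplitude} at {latency} ms")
```

Because the subtraction removes activity common to both stimulus types, what remains is the part of the response that is specific to the deviant.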

Researchers have used duration changes in the context of mismatch negativity experiments to address questions about the auditory change detection system itself, as well as questions about the cognitive and neural substrates of interval timing. Several early studies suggested that the MMN could be elicited only when the standard stimuli were at most a few hundred milliseconds long [24, 25]. However, Näätänen et al. [26] reported a MMN for stimuli of several seconds, which indicates that under at least some circumstances the pre-attentive timing process is not limited to a brief temporal window of integration.

Of greater relevance here is use of the MMN response as a tool to investigate the perceptual and cognitive processes underlying interval timing [20, 27–29]. The pre-attentive nature of the MMN response lends itself to interval timing investigations that would otherwise be difficult to achieve. This includes examining sensitivity to time in the absence of attention allocation to the timing task and in the absence of explicit task instructions. Hence, the MMN allows the timing abilities of preverbal children to be tested and the functions of the adult timing system to be measured in a way that is unbiased by the instructions provided to participants.

Brannon et al. [27] used an auditory oddball task to investigate the interval timing abilities of 10-month-old human infants and adults (Fig. 4). The standard intervals were defined by 50 ms tone pips separated by an inter-stimulus-interval (ISI) of 1,500 ms, whereas the rare deviant intervals had an ISI of 500 ms. The infants and adults showed comparable MMN responses to the deviant stimuli, which suggests that infants have at least some of the basic mechanisms underlying time perception. Subsequent work from the same group [28] demonstrated that although larger standard and deviant ISI ratios (1:4; 1:3; 1:2; 2:3) elicited larger MMN amplitudes, changing the duration values while keeping the standard to deviant ratio constant did not affect the MMN amplitude. Consequently, the data were interpreted as indicating that Weber’s law for time holds in infants, as well as adults. These results are important because they reveal similarities in pre-attentive interval timing between infants and adults that would otherwise be impossible to demonstrate using behavioral measures that rely on explicit instructions.

Fig. 4

Mismatch Negativity (MMN) in infants. Top: A MMN was elicited when infants heard a stream of predominately isochronous auditory tones (1,500 ms ISI) with rare shortened ISIs (375 ms). Middle: ERPs elicited by the two ISI types revealed a strong negative response when the ISI was a deviant. The MMN is typically shown as the difference wave between the ERP of the Standard interval and that of the Deviant interval. Bottom: Topographical distributions of the infant MMN. Although not illustrated in the figure, the MMN showed systematic changes in amplitude as a function of the ratio between standard and deviant (Experiment 1), but not as a function of stimulus duration when the standard to deviant ratio was held constant (Experiment 2). Redrawn from Brannon et al. [28]

Tse and Penney [20] used the MMN to investigate how people time empty intervals (i.e., intervals demarcated by two short stimuli, one at the beginning of the interval and one at the end). Whether such intervals are timed with respect to the onsets or offsets of the demarcating stimuli has been the subject of dispute in the timing literature. However, the rule used could easily be influenced by the task instructions provided to the participant, so Tse and Penney [20] used the instruction-free MMN paradigm. Specifically, they adjusted the durations of the markers so that the pattern of MMN amplitudes elicited across the five deviant conditions would indicate the rule being applied. For example, in one condition the standard duration would be experienced as 130 ms if the participant timed the stimuli from marker onset-to-onset whereas it would be experienced as 110 ms if the participant timed it from marker offset-to-onset. The deviant stimulus in this condition was selected so that the marker onset-to-onset rule would result in a 40 ms duration, whereas the marker offset-to-onset rule would result in a 20 ms duration. Hence, the magnitude of change was 69 % under the onset-to-onset rule, but 81 % under the offset-to-onset rule. Across five deviant conditions, the onset-to-onset rule resulted in a larger deviant change than the offset-to-onset rule in some conditions, but a smaller deviant change in the other conditions. Hence, the pattern of MMN amplitude effects across the conditions would provide support for one rule or the other. The data pattern revealed that pre-attentive timing is from stimulus offset to stimulus onset in the case of empty interval timing. This experiment demonstrates that it is possible to use ERP components to discriminate between competing models of timing behavior without biasing the participant by providing instructions.
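The arithmetic behind the example condition can be checked directly. The sketch below uses only the four durations quoted above; the helper name `relative_change` is introduced here for illustration.

```python
def relative_change(standard_ms, deviant_ms):
    """Size of the deviant change as a proportion of the standard duration."""
    return abs(standard_ms - deviant_ms) / standard_ms


# Durations (ms) of the example condition under each candidate timing rule.
onset_to_onset = relative_change(standard_ms=130, deviant_ms=40)
offset_to_onset = relative_change(standard_ms=110, deviant_ms=20)

print(f"onset-to-onset rule:  {onset_to_onset:.1%}")
print(f"offset-to-onset rule: {offset_to_onset:.1%}")
```

The same absolute change of 90 ms is therefore a proportionally larger change under the offset-to-onset rule, at about 69 % versus just under 82 % (the 81 % quoted above appears to be the truncated value). This is why the two rules predict different MMN amplitudes for this condition.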

In summary, the pre-attentive change detection system in the human brain is sensitive to duration changes on the order of tens of milliseconds to several seconds. In laboratory settings, the MMN is elicited when the regularity established by the presentation of the standard stimuli is violated by rare deviant stimuli. With appropriate experimental design, MMN paradigms allow researchers to study timing in the absence of instructional bias [20] across a wide range of participant populations [27].

Omission Potentials

When participants pay attention to a stimulus train comprising regularly occurring events (i.e., a constant ISI) the omission of a stimulus from the sequence elicits an ERP component referred to as an omission potential (OP) [30–37].

OPs are strongly sensitive to the temporal structure of the stimulus sequence, which suggests that they reflect neural processes related to interval timing, short term memory for time, and/or temporal expectations [32, 38]. For instance, jittering the stimulus sequence abolishes the OP for both visual and auditory stimuli [32, 39], whereas removing the task relevance of the omitted stimulus or the allocation of attention to it reduces OP amplitude and increases both its latency and its latency variability [40]. Furthermore, OPs are not correlated with motor RT and are elicited even when a motor response is not required [41].

Bullock et al. [32] examined the effect of omission placement (end of the stimulus train vs. middle of a continuous train) and stimulus presentation frequency (from 0.3–40 Hz) on the visual OP. In the low presentation frequency range (0.3–2 Hz), sequences as short as two stimuli per trial across repeated trials gave rise to a stable positive OP. Jittering the ISI (e.g., regular ISI of 2 s vs. jittered ISI with mean of 2 s) or reducing attention to the stimulus train (e.g., participants were not required to count omissions) reduced the OP amplitude, demonstrating the importance of temporal regularity and attention for OP generation. Interestingly, changing the modality of the final stimulus before stimulus omission did not eliminate the OP. However, the authors did not examine whether the OP latency varied due to the modality change. In a subsequent study using auditory stimuli, Karamürsel and Bullock [39] observed a change in the OP latency. Systematic examination of OP differences across modalities is of interest because stimulus modality influences interval timing in some circumstances (see [42] for review). In this regard, the OP may serve as a useful tool for probing the origin of these differences and help reveal whether representation/processing of time is modality specific or amodal.

To this end, Penney [37] recorded participants’ EEG while they performed a stop reaction time task [43]. This task requires participants to respond when they believe a sequence of stimuli has ended. Although no explicit instructions to time the stimuli are given, participants must be sensitive to the stimulus onset asynchrony (SOA) between successive stimuli because this allows them to recognize that the delay since the last stimulus occurred is long enough to indicate that the sequence is over. Penney [37] presented visual and auditory sequences in two separate blocks. Within each block, the SOA of a sequence was either 470 or 770 ms. Biphasic omission potentials were elicited in all conditions (Fig. 5), suggesting at least a partially shared timing process across modalities. Specifically, a negative OP elicited between 150 and 200 ms after the scheduled onset of the omitted stimulus was comparable between modalities and was not related to the RT difference observed in the behavioral data. This result is consistent with an amodal regularity detection/decision mechanism.

Fig. 5

Omission Potential (OP). Top Left: An Omission Potential can be elicited in a stop reaction time task, in which participants respond to the unpredictable termination of a stream of isochronous stimuli. The OP is measured from the time-point when the omitted stimulus would have occurred. Bottom Left: Illustration of the topographical distribution of the biphasic (negative-positive) OP reported in Penney [37] using either auditory or visual stimulus trains in the stop RT task. The early negative phase had a right frontal focus, while the late positive component had a strong parietal distribution. Right: The ERPs of the OP were comparable regardless of modality and ISI, suggesting amodal processes during implicit time estimation. The inset shows that a biphasic OP was not elicited when isochronous tones were presented, implying a relation of the OP to the violation of temporal regularity. Redrawn from Penney [37]

In a single modality experiment, Busse and Woldorff [40] asked participants to perform an auditory oddball (pitch change) detection task in which the SOA between successive tones was either 1 or 2 s and which included task irrelevant tone omissions 11, 22, or 33 % of the time. They observed a biphasic OP in all conditions regardless of SOA and percentage of tone omissions, but the OP was smaller when the SOA was 2 s as compared to 1 s and smaller when tone omissions were most frequent (i.e., 33 %). In contrast to Penney [37], they observed that the OP in the long SOA condition had a broader latency than the short SOA condition, which they attributed to increased variability in the OP as SOA increased. However, they did not determine whether the variability increase was scalar [44].

Recently, Motz et al. [36] used the auditory OP to study how humans process violations in metrical patterns. In all blocks, the main beat was produced by periodic (SOA = 1,000 ms) pink-noise bursts. A weaker beat produced by periodic, but less frequent, white noise bursts was embedded in the main beat, generating a polyrhythm either at a simple integer ratio (1/3) or a non-metrical ratio of the pink-noise beat (metrical: 33 % of the between beat distance vs. non-metrical: 43 and 53 % of the between beat distance). Omission occurred at the last expected beat of the white noise bursts. The latency of the positive component of the biphasic OP recorded at the CPz electrode corresponding to omission at 33, 43, and 53 % of the between beat distance indicated a cognitive bias that regularized perception of non-metrical beats to the nearest simple integer ratio (50 %). While the OP latency at 43 % was later than that at 33 %, the OP latency at 53 % was earlier than that at 33 %, showing up-regulation (bias towards later) and down-regulation (bias towards earlier), respectively. However, the regularization was not complete, as shown by smaller than expected changes in the OP latencies, suggesting flexibility in metric perception. In a related vein, Jongsma et al. [45] compared the OP elicited in musically trained (average of 15.6 years) and untrained individuals when they listened to rhythmic percussion sounds (ISI = 800 ms) with an unpredictable omission after three to seven beats. The amplitudes and latencies of single-trial positive OPs at the Pz electrode were identified using wavelet de-noising [46]. OP latency variability was smaller in the group of musically trained participants, suggesting that musical training improves implicit timing (e.g., beat perception) and/or temporal deviant detection.

To summarize, similar to the MMN, the OP reflects detection of a violation of the temporal regularity of a stimulus stream. However, unlike the MMN, elicitation of the OP appears to require that the omitted stimulus be task relevant and attended, suggesting a different underlying mechanism. The morphology of the OP also appears to change according to the temporal variability inherent in the preceding stimuli [40]. Recent timing studies using the OP suggest that certain timing processes are amodal [37] and that the brain imposes regularity in environments of high temporal predictability [36]. Finally, as with the MMN [47], the timing system contributing to the OP is susceptible to effects of training, especially for auditory stimuli [34, 45]. As demonstrated by Busse and Woldorff [40], omission of a stimulus is likely perceived as a change in stimulus probability or stimulus expectancy, thus the OP is often considered a close relative of another prominent late positive component, the P300 [38, 48–50].

P300

The P300 has long been associated with decision-making [51] and is usually triggered after stimulus evaluation, but before response selection and motor execution (see [52] for review). It reflects memory and/or expectancy match [53, 54] or evaluation of the conditional probability of the occurrence of a rare target [55]. There are two types of P300: the novelty-related, frontally distributed P3a that is associated with stimulus-driven attention processes in the frontal cortical regions, and the memory-based, parietally distributed P3b that is associated with attention and memory processes in the temporal and parietal cortices [52, 56].

Posterior positive slow waves (PSW) such as the P300 and anterior negative slow waves (NSW) such as the contingent negative variation (CNV; discussed below) can co-occur in anticipatory and timing paradigms (e.g., [57, 58]), with the NSWs likely providing the context for the functions reflected by the PSWs [48, 59]. Larger NSW-PSW for interval timing tasks relative to non-timing tasks is claimed to reflect a stronger and wider activation of neural populations that is not due to difficulty differences between the two task types. For example, Gibbons et al. [60] asked participants to perform temporal generalization and pitch discrimination tasks on identical auditory stimuli. The participants were less accurate in the pitch task, but the NSW-PSW amplitudes were larger in the temporal generalization task. Moreover, this pattern remained when participants were sorted into better-timing/worse-pitch-discrimination and better-pitch-discrimination/worse-timing groups. The authors interpreted this result as indicating a stronger involvement of working memory in the timing task than in the non-timing task. A similar interaction between the CNV and P300 specific to timing tasks was also reported by Gontier et al. [61] in a contrast of duration and size discrimination.

Miniussi et al. [62] asked participants to perform a simple reaction time task in which a visual cue predicted the cue-target interval (SOA) correctly 80 % of the time (600 or 1,400 ms). The P300 elicited by the valid visual target had a shorter peak latency and was more positive for the 600 ms SOA. The authors suggested that the provision of temporal information ‘synchronizes or prepares motor processes, or sharpens decision processes’ [62, p. 1516]. The P300 in this study had a parietal distribution, resembling the P3b. Synchronization of behavior, cognitive processes, and/or neural activity is the thesis of the Dynamic Attending Theory (DAT) (see [63] for a review). DAT states that different oscillators, whether in the brain or the environment, interact with one another and may result in entrainment (synchronization). Attention to stimuli is maximal at the moments of maximal entrainment, leading to more effective stimulus processing [64].

Schmidt-Kassow et al. [65] recently tested this idea by comparing the P3b amplitude and latency elicited by oddball tones when participants listened to tone sequences with varying degrees of temporal predictability. The P3b amplitude was largest and the latency shortest when tones were isochronous. The authors attributed the stronger and faster response to deviants to an entrainment effect on attention brought about by the regular temporal structure of the task.

However, effective use of the P300 to investigate interval timing requires careful consideration of exogenous factors [49]. Specifically, although a P300 amplitude difference may be observed by comparing durations that are longer and shorter than the target duration, the effect may not reflect timing-specific processes. Instead, it simply may be due to overlap from exogenous, negative ERP components when the durations are long, leading to the commonly reported effect that the P300 elicited by the offset of durations longer than the target is less positive than that elicited by durations shorter than the target (e.g., [60, 66]). Gibbons and Rammsayer [66] specifically controlled for this possibility by including a condition in which participants passively listened to the same stimuli that were used in the temporal generalization condition (ranging from 125 to 275 ms). Two late positive potentials, a parietal P300 and a frontal P500, were elicited only when duration estimation was required. The P300 decreased in amplitude as duration increased, whereas the P500 was larger when the durations were non-targets. Furthermore, these components were not modulated by variation in tone pitch. The authors proposed a two-stage model for processing brief durations. The duration-modulated, parietal P300 was interpreted as a memory-based P3b time-locked to stimulus onset, which indicates an immediate temporal processing of the stimulus that can only be completed when the stimulus is shorter than the target. The duration-insensitive, fronto-central P500 component was interpreted as a novelty P3a time-locked to the expected duration offset at the target duration (200 ms) that indicates a violation of expectation.

The P300 also has been related to performance in temporal tasks. Gibbons and Stahl [67] asked participants to reproduce a 2-s empty auditory target duration as accurately as possible. Timing performance was evaluated by median split of the sample based on either mean reproduction accuracy (absolute error) or variation of reproduction (coefficient of variation, CV). The amplitude of the marker offset P300 at Cz was more positive in the group with less variable reproductions (smaller CV). There was also a negative correlation between the offset P300 amplitude and the CV. A similar, but weaker, relationship obtained between the marker onset P300 and the CV. Consistent with their two-stage model of temporal generalization (cf. [66]), the authors proposed that the offset P300 during the target presentation indicated a comparison between the presented target and the internal representation of the target. Thus, participants did not passively attend to the presented target, but actively revised their internal representation when necessary. Better performers engaged in these processes more efficiently, forming more accurate expectations about the time of offset of the target duration, which resulted in larger offset P300 amplitudes.
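The grouping procedure described above, a median split on the coefficient of variation, can be sketched as follows. The participant labels and reproduction times are invented for illustration. The use of the sample standard deviation and the assignment of ties to the lower group are assumptions of this sketch, not details reported for the original study.

```python
import statistics


def coefficient_of_variation(reproductions_ms):
    """CV = standard deviation / mean (sample SD assumed here)."""
    return statistics.stdev(reproductions_ms) / statistics.mean(reproductions_ms)


def median_split(scores):
    """Split participant labels into low and high groups at the median score.

    Scores equal to the median go to the low group (an arbitrary choice).
    """
    cutoff = statistics.median(scores.values())
    low = sorted(p for p, s in scores.items() if s <= cutoff)
    high = sorted(p for p, s in scores.items() if s > cutoff)
    return low, high


# Hypothetical reproductions (ms) of a 2-s target by four participants.
reproductions = {
    "p1": [1900, 2000, 2100],
    "p2": [1500, 2000, 2500],
    "p3": [1950, 2000, 2050],
    "p4": [1700, 2000, 2300],
}

cv_scores = {p: coefficient_of_variation(r) for p, r in reproductions.items()}
less_variable, more_variable = median_split(cv_scores)
print("less variable group:", less_variable)
print("more variable group:", more_variable)
```

Because the CV normalizes variability by the mean reproduction, it separates the consistency of a participant's timing from any overall tendency to over- or under-reproduce the target.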

Using temporal discrimination with a delayed response (1 s after the offset of the probe duration), Rebaï and colleagues [61, 68–71] observed a prefrontal P300-like component after the offset of the probe duration, which they termed a late positive component of timing (LPCt). Paul et al. [70] asked participants to discriminate the visual durations in one of the three possible pairs (100/200 ms, 300/600 ms, and 1,000/2,000 ms), presented either in the order short-long or long-short. For short-long trials, an increased positive amplitude LPCt coincided with increased S2 duration, higher discrimination accuracy, and shorter RTs. In a subsequent study, Paul et al. [71] manipulated the difficulty of a visual temporal generalization task (600 ms standard) by adjusting the linear spacing between probe durations (difficult: 75 ms; easy: 150 ms). Task difficulty is believed to modulate decision thresholds in temporal generalization [72] and here the difficult version yielded fewer “same duration” responses than the easy version. The LPCt amplitude was significantly more positive for the Difficult condition than the Easy condition. The authors posited that the LPCt reflects temporal decision-making processes. Moreover, they also noted the importance of investigating both negative and positive ERP components together in order to fully reveal the temporal network [48, 59]. For example, the decision threshold and/or response uncertainty, as reflected by P300 and LPCt, may be a function of the efficiency of attentional ‘mobilization’ during the monitoring of the to-be-timed interval, as reflected by the CNV.

To summarize, the P300 has been associated with attention, memory, and the evaluation of stimulus probability and expectancy of occurrence [52, 55], processes that have direct impact on decision making [51, 59, 70]. Changes in amplitude and latency have allowed researchers to infer the brain’s sensitivity to temporal regularity among stimuli [65] and how temporal information is tracked and updated when the time judgment has to be made in a discrete fashion [66]. The latter is consistent with the increased emphasis on the influence of contextual temporal information on temporal judgments through Bayesian principles [73–75].

Contingent Negative Variation (CNV)

Walter et al. [76] first identified the CNV as an electrophysiological marker of expectancy. In this classic study, an initial stimulus (S1) served as a cue for presentation of a second stimulus (S2) that appeared 1 s later. In some conditions the S2 served as an imperative stimulus indicating a response requirement (i.e., a button press) and in others it did not. A slow negative potential with a fronto-central topographical distribution (i.e., the CNV) appeared during the S1–S2 period, but only when the S2 served as an imperative stimulus or participants were asked to estimate a 2 s duration before the button press. Typically, the CNV displays a gradual increase or ramp in negativity until it reaches a plateau and then resolves back to baseline or a positive potential value, as illustrated in Fig. 6. In some cases, the plateau is sustained for several hundred milliseconds (e.g., [77, 78]). Over the years, the CNV has been associated with a variety of physiological and cognitive functions such as arousal, motivation, attention, and anticipatory preparation [78–83].

Fig. 6

Top Right: The CNV is reliably evoked in S1–S2 paradigms. S1 and S2 can be individual stimuli or the onset and offset of a continuous tone (i.e., a filled interval). The ERP is usually time-locked to the onset of S1. Left: The CNV recorded at the FCz electrode when participants completed an auditory duration bisection task in which they had to judge whether the probe duration was more similar to the short (800 ms) or long (3,200 ms) anchor duration. The results imply that participants treated the geometric mean (1,600 ms) as the criterion duration (see text). The CNV amplitude often ramps steadily after the early perceptual ERP components such as N100 and P200. Depending on task details, the CNV may reach maximal negativity and remain sustained at that voltage value for several hundred milliseconds. Bottom: Current source density (CSD) of the CNV shows that the CNV is a long-lasting negativity over fronto-central electrode sites. CSD reduces volume-conducted signals and is thus more sensitive to superficial neural sources in the proximity of the electrode. Consequently, the topographical distribution suggests the medial frontal cortices as potential contributors to the CNV. Redrawn from Ng et al. [77]

The stimuli used to elicit a CNV may consist of a cue and an imperative stimulus [4, 62, 76], onset-offset of a continuous signal [77, 84, 85], onset-offset markers that demarcate an ‘empty’ duration [86], coincidental timing from stimulus onset to time to contact [87], a delay period between an imperative stimulus and performance feedback [88, 89], or an oddball design in which one duration is designated as the standard and one or more other durations as the deviants [90]. The CNV can also be seen in paradigms that employ isochronous stimulus sequences. For instance, Pfeuty et al. [91] analyzed the CNV elicited when participants had to discriminate two auditory sequences of three to six tones based on tempo. Praamstra et al. [92] studied the sensorimotor CNV with an implicit timing task in which participants had to make manual responses to isochronous visual cues.

The CNV has at least two subcomponents. The initial CNV (iCNV) is elicited within about 1 s of S1 onset and sometimes peaks within 1 s. It is modulated by the perceptual properties of the S1 stimulus [57, 85, 93–95], S1–S2 duration probability [96–98], and task-specific anticipation [79, 99]. It may reflect the orientation to S1, which prepares the participant for subsequent reaction (the ‘O’ wave; e.g., [100, 101]). The second subcomponent, the termination CNV (tCNV), overlaps with the iCNV when the S1–S2 interval is short, usually appears 1 or 2 s before S2, and increases in negativity as the S2 onset approaches. It is modulated by stimulus anticipation [102, 103], task load [79], and motor preparation [104–106], but is distinct from the readiness potential (the ‘E’ wave; e.g., [102, 107, 108]). If the S1–S2 duration is long enough (>4 s), the two subcomponents appear as a bimodal, long-lasting CNV [109, 110]. Finally, based on a comparison of the CNVs generated in a simple reaction time task, a 4-s foreperiod task, a 4-s temporal production task, and the encoding phase of a 4-s temporal reproduction task, Macar and colleagues [35, 111] argued for the existence of a third CNV component that reflects the temporal and probabilistic linkage between S1 and S2.

In general, a CNV is consistently observed only when the participant pays attention to a stimulus and/or the stimulus is task-relevant. For example, Campbell et al. [84] asked participants to respond to a 20 ms gap that appeared early (300 ms) or late (1,300 ms) in an otherwise continuous 1,400 ms tone when the tone frequency was 500 Hz, but not when it was 1,500 Hz. A sustained slow negative wave (SNW) related to the auditory stimulation was present in all conditions regardless of response requirements, but the CNV was present and superimposed on the SNW only when a response was required. The relationship of the CNV to anticipation and time perception is bolstered by findings showing a CNV for duration comparisons of auditory stimuli, but not pitch or intensity comparisons [112, 113], and in a temporal discrimination task, but not in a size discrimination task in the same test session [61].

Numerous studies have revealed an association between the CNV and time perception performance (e.g., [114, 115]). For example, Ladanyi and Dubrovsky [116] compared performance and CNVs of participants making verbal estimates of 10 or 20 s. Compared to less accurate estimators, the more accurate estimators showed smaller amplitude CNVs that resolved faster and had a slower ramping to the maximum negativity. More recently, Pfeuty et al. [85] tested temporal discrimination for filled tones and empty intervals demarcated by two brief tones. They found that the CNV amplitude was significantly larger (see also [117]) and performance (accuracy) significantly worse when the intervals were filled (69 % correct) as compared to empty (77 % correct). A recent experiment by Wiener et al. [118] demonstrated a relationship between the processes contributing to the CNV amplitude and time perception using repetitive transcranial magnetic stimulation (rTMS), which perturbs neural activity by non-invasive application of strong external magnetic fields. Participants performed temporal discrimination with and without rTMS applied to the right supramarginal gyrus (SMG). Two difference scores were computed between rTMS and non-rTMS trials: one for the mean CNV amplitude (270–470 ms) and one for an index derived from the proportion of ‘longer than standard’ responses. A positive correlation was found between the two measures.
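The difference-score analysis described above can be illustrated with a minimal sketch. It is not the analysis code of Wiener et al. [118]; the function names (`mean_amplitude`, `difference_scores`, `pearson_r`) are hypothetical, and the only detail taken from the text is the 270–470 ms window. It assumes each participant contributes one averaged waveform per condition.

```python
import math

def mean_amplitude(times_ms, volts_uv, start_ms=270, end_ms=470):
    """Mean voltage (µV) of the samples whose time lies in [start_ms, end_ms]."""
    vals = [v for t, v in zip(times_ms, volts_uv) if start_ms <= t <= end_ms]
    return sum(vals) / len(vals)

def difference_scores(stim, no_stim):
    """Per-participant difference (stimulation minus no stimulation)."""
    return [a - b for a, b in zip(stim, no_stim)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute `mean_amplitude` for each participant and condition, pass the two lists to `difference_scores`, do the same for the behavioral index, and correlate the two difference lists with `pearson_r`.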

Furthermore, the putative neural sources of the CNV are implicated in interval timing, as shown by the agreement between electrophysiological source localization and functional neuroimaging data. Surface Laplacian [119, 120], EEG/MEG (a magnetic counterpart of EEG) source localizations [121–123], and intracranial EEG recordings (e.g., [124, 125]) show that the supplementary motor area (SMA) and the pre-SMA, together with the right dorsolateral prefrontal cortex (DLPFC) and posterior cortices, are among the major neural generators of the sensorimotor CNV. fMRI analyses also consistently identify the involvement of the SMA in sub- and supra-second timing (see [5, 126–131] for reviews).

The CNV frequently has been interpreted within the framework of the pacemaker-accumulator model of Scalar Timing Theory (STT; [44]). According to this model, the number of pulses stored in an accumulator represents the duration of the event of interest. Comparison of this pulse count with representations of relevant durations held in long-term memory forms the basis of the decision process [72]. Although the debate about the existence and putative neural mechanisms of the ‘internal clock’ is ongoing [132–135], the idea that neurons or groups of neurons acting as signal accumulators give rise to cognition is common. For example, it has been used to explain and predict performance in perceptual decision-making (e.g., [136, 137]), response competition and inhibition (e.g., [138]), as well as numerical cognition (e.g., [139–141]).

Assuming there is a linear relationship between real time and perceived time [142, 143], the STT pacemaker-accumulator model asserts that neural activation increases over time: longer intervals are represented by more total clock pulses and thus by higher final neural activation. In line with this rationale, early investigations of the neural mechanisms underlying the CNV suggested that it resulted from the summation of excitatory post-synaptic potentials (EPSPs) at the apical dendrites in deeper cortical layers, an indication of cortical excitability [79, 144]. Furthermore, the ramping negative potential of the CNV resembles an accumulation process resulting from spreading activation or signal integration of neurons in medial frontal brain areas [35, 120, 135, 145–150].
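The accumulation idea can be made concrete with a toy simulation. The sketch below is an illustration under stated assumptions, not the STT model as specified in [44]: the Poisson pacemaker, the 50 Hz rate, the 0.15 threshold, and the function names are all hypothetical choices, and memory noise and scalar variability are not modeled. It shows only that the mean pulse count grows roughly linearly with the timed duration and that a decision can be based on comparing a count with a remembered reference.

```python
import random

def accumulate_pulses(duration_s, rate_hz=50.0, rng=None):
    """Count the pulses a Poisson pacemaker emits within duration_s seconds."""
    rng = rng or random.Random()
    elapsed, count = 0.0, 0
    while True:
        elapsed += rng.expovariate(rate_hz)  # exponential inter-pulse interval
        if elapsed > duration_s:
            return count
        count += 1

def judge_longer(probe_count, reference_count, threshold=0.15):
    """Respond 'long' when the probe exceeds the reference by a relative margin."""
    return (probe_count - reference_count) / reference_count > threshold

# Mean accumulated count for three durations (seeded for reproducibility).
rng = random.Random(0)
mean_counts = {}
for duration in (0.8, 1.6, 3.2):
    counts = [accumulate_pulses(duration, rng=rng) for _ in range(2000)]
    mean_counts[duration] = sum(counts) / len(counts)
```

With these assumed settings the mean count is close to the rate multiplied by the duration, which is the sense in which a longer interval yields a higher final accumulator value.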

CNV Amplitude

The hypothesis that the CNV amplitude reflects neural accumulator function during duration estimation has received some empirical support. Macar et al. [120] showed a relationship between the CNV amplitude, as determined from a surface Laplacian computation, and the subjective/perceived duration of a 2,500 ms target interval in a temporal reproduction task. The authors assigned the reproduction trials to one of three categories based on accuracy (2,600–2,800 ms; 2,400–2,600 ms; 2,200–2,400 ms) and then generated response-locked CNVs for each category by participant. Comparison of the grand average waveforms of the three groups of trials indicated that the CNV amplitude decreased (i.e., became less negative) as the produced intervals decreased, even though the participants were attempting to reproduce the same 2,500 ms target duration in all cases. In a subsequent experiment, Macar and Vidal [119] further showed that the amplitude of the surface Laplacian CNV reflected a consolidated representation of the memory (Experiment 2) rather than learning or updating of the temporal memory of the target duration (Experiment 1). The importance of memory consolidation in determining the CNV was also suggested by Mochizuki et al. [151], who varied the retention period (3,000 or 9,000 ms) between encoding of a 2,700 or 3,000 ms stimulus and its reproduction. The CNV during the reproduction phase was larger for the 9,000 ms retention interval, which the authors attributed to the stronger need to reactivate the decayed memory of the target duration when the retention interval was 9,000 ms. Bendixen et al. [152] replicated and extended the amplitude effect of Macar et al. [120] using a temporal discrimination task with much shorter intervals (500 ms on average). Comparing the grand averaged onset-locked CNV from trials that received a ‘short’ response to the CNV from those classified as ‘long’, they found that N100 and CNV amplitudes were more negative when the response was ‘long’, in line with the pacemaker-accumulator hypothesis.

However, Macar and Vidal [153] failed to replicate the association between CNV amplitude and perceived duration/temporal performance when untrained participants were tested on a temporal discrimination task using intervals of about 2 s. More recently, Kononowicz and van Rijn [81] also failed to find the association in a replication of the paradigm used by Macar et al. [120]. Instead, these authors found evidence for a habituation effect on the CNV amplitude across the experimental session. Ng et al. [77] also failed to find evidence relating CNV amplitude to perceived duration in a duration bisection task with anchor durations of 800 and 3,200 ms. Intermediate probe duration trials were sorted into those that received a ‘short’ response and those that received a ‘long’ response and onset-locked CNVs were determined. There was limited support for a difference in CNV amplitude based on duration classification and when there was a difference, it tended to be opposite to the predicted direction (i.e., larger CNVs for shorter perceived durations).

Several experiments that tested untrained participants on temporal discrimination or implicit timing tasks with sub- and supra-second durations also failed to find a difference in the CNV amplitude as a function of the interval duration [92, 109, 148, 154]. To summarize, although some studies demonstrated a consistent relationship between CNV amplitude and performance in a variety of timing tasks, interpreting these results as evidence for the pacemaker-accumulator model of time perception appears unwarranted given the sum total of available evidence [82, 155].

CNV Peak Latency and Slope

The initial ramping and subsequent resolution of the CNV (i.e., return to baseline from the peak negative potential) have also been claimed to reflect the memory representation of the target duration. For the initial ramp, researchers [149, 150, 156, 157] have drawn attention to the resemblance between the CNV’s gradual increase in negativity and the gradual change in the firing rate of single cells in response to different cue-target contingencies [158]. This climbing neural activity hypothesis has been used to account for the CNV elicited in timing tasks (see [159], seventh chapter of this book, for a discussion of this hypothesis in motor preparation and cued anticipation). Pfeuty et al. [157] proposed that whereas the unchanging CNV amplitude in some studies may reflect a fixed criterion of the accumulator to trigger a decision, duration encoding and differentiation is achieved by adjusting how rapidly this criterion is reached. Moreover, once the criterion is reached, a decision can be made (e.g., ‘longer than the target’) without further accumulation of temporal information, which means the CNV may resolve before stimulus offset. In fact, several authors [86, 160] noted that a critical difference between the CNV evoked by perceptual or motor preparatory experiments and the CNV evoked by time perception experiments is the early resolution of the CNV in the latter case. For example, using relatively long durations (e.g., >5 s) in a temporal discrimination task, Macar and Vitton [86] observed that the CNVs corresponding to the standard and target durations resolved before stimulus offset, while the standard–target delay (3 s) and the delay between target termination and response (3 s) elicited typical expectancy CNVs that did not resolve until the end of the specific interval. Many researchers claim that the CNV resolution marks the moment of decision-making in interval timing [77, 153, 157, 161]. It is purported that a positive decision-making or motor programming component may be superimposed on the CNV [160], consistent with the often cited co-occurrence of the CNV and late positive components such as the P300 and the LPCt [57, 70, 71, 161, 162].

The ramping and resolution of the CNV have also been quantified by calculating its slope [77, 92, 161, 163, 164]. Macar and Vidal [153] used both visual and tactile temporal generalization tasks to show that the CNV peaked at the memorized target duration (2,000 ms) rather than at the end of the probe duration (2,500 or 3,100 ms). Pfeuty et al. [164] obtained similar results with a S1–S2 duration comparison task. During S2, the CNV reached its negative peak at the S1 target duration (700 ms) at left hemisphere and medial frontal electrode locations, while at right hemisphere frontal electrode sites the CNV peaked at the end of S2. The authors suggested that the distinct CNV profiles at the right and left hemisphere electrodes reflected distinct memory representations for the S1 target duration and the elapsing S2 duration. Furthermore, there was a correlation between CNV peak latency and the subjective standard derived from the generalization gradient. In a subsequent S1–S2 experiment [157], the authors showed that given the same S2 probe duration (794 ms), the peak latency of the CNV corresponded to the S1 target duration (600 vs. 794 ms), although they failed to obtain an effect of target duration on CNV amplitude. Finally, in a bisection task Ng et al. [77] found that the CNV did not ramp to its maximum at the assumed criterion, which was the geometric mean of the short and long anchor durations (1,600 ms), but did so closer to the duration of the short anchor (800 ms). The negativity remained at the same level until the geometric mean and then resolved, hinting that more temporal information is available to the participants in the bisection task than in an S1–S2 temporal task. Similar to the results of Pfeuty et al. [164], they also found that the slope of the iCNV was positively correlated with the participant’s bisection point, which is in line with an ‘accumulator-with-fixed-criterion’ hypothesis. Using a temporal discrimination task with durations of 800, 1,000, and 1,200 ms, Tarantino et al. [161] also reported an early resolution of the CNV close to the target interval.
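As an illustration of how slope and peak latency might be extracted from an averaged waveform, the following sketch fits an ordinary least-squares line to a chosen time window and takes the latency of the most negative sample. It is a simplified example, not the procedure of any study cited above: the function names, the fitting window, and the synthetic waveform (a ramp to −8 µV at 800 ms, a plateau until 1,600 ms, then resolution) are assumptions, and real data would typically be filtered and baseline-corrected first.

```python
def linear_slope(times_ms, volts_uv):
    """Ordinary least-squares slope (µV per ms) of voltage regressed on time."""
    n = len(times_ms)
    mean_t = sum(times_ms) / n
    mean_v = sum(volts_uv) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_ms, volts_uv))
    sxx = sum((t - mean_t) ** 2 for t in times_ms)
    return sxy / sxx

def cnv_slope_and_peak(times_ms, volts_uv, win_start_ms, win_end_ms):
    """Return (slope within the window, latency of the most negative sample)."""
    window = [(t, v) for t, v in zip(times_ms, volts_uv)
              if win_start_ms <= t <= win_end_ms]
    slope = linear_slope([t for t, _ in window], [v for _, v in window])
    # Ties in voltage are broken in favour of the earliest sample.
    _, peak_latency = min((v, t) for t, v in zip(times_ms, volts_uv))
    return slope, peak_latency

def synthetic_cnv(t_ms):
    """Hypothetical waveform: ramp, plateau, then return to baseline."""
    if t_ms <= 800:
        return -8.0 * t_ms / 800.0
    if t_ms <= 1600:
        return -8.0
    return -8.0 * (2000 - t_ms) / 400.0

times = list(range(0, 2001, 10))
volts = [synthetic_cnv(t) for t in times]
slope, peak_latency = cnv_slope_and_peak(times, volts, 200, 800)
```

For this synthetic waveform the fitted slope is about −0.01 µV/ms and the peak latency is 800 ms, the point at which the ramp reaches the plateau. Taking the earliest of several equally negative samples is one possible convention, and other definitions of peak latency would give different values on a plateau.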

Praamstra et al. [92] replicated the peak latency and slope effects [153, 157] in an implicit motor timing task. In this task, participants pressed one of two keys depending on whether an arrow pointed to the left or the right. Each trial comprised a short sequence of cues, each presented isochronously (2,000 ms) with the exception of the final cue. A CNV occurred between successive cues, but when the final cue was presented late (2,500 ms), the CNV peaked at the expected inter-stimulus interval (2,000 ms) and then began to resolve. Mento et al. [90] obtained similar results using an oddball task with empty visual durations. Participants were instructed to attend to the stimuli, which lasted 1,500 (70 % of the trials; standard), 2,500, or 3,000 ms (15 % each; deviants), but there was no response requirement. ERPs elicited by the two deviants showed an orderly decrease in the CNV amplitude (i.e., peak) at about the standard interval of 1,500 ms, suggesting that participants established a representation of the temporal structure of the task [165].

In contrast to the CNV amplitude results, those for the CNV peak latency and slope appear to be reasonably consistent. Indeed, studies that failed to show a relationship had a focus or experimental design that did not allow the authors to do similar analyses (e.g., [117]), or the design of the experiment did not allow participants to consolidate a temporal criterion [29, 61, 69]. The latter possibility emphasizes the importance of careful consideration of task requirements when interpreting the data [166–168]. In sum, the available evidence suggests a relatively robust relationship between interval timing and CNV peak latency and slope [90], while the relationship between CNV amplitude and timing stimulus duration is equivocal at best [81, 155].

In summary, the CNV is elicited consistently in timing tasks with intervals spanning hundreds of milliseconds to several seconds. Its putative neural generators are active in both ‘automatic’ and ‘cognitively mediated’ time perception [127]. Similar to the OP, attention to the to-be-timed stimulus is required for the timing-related CNV to occur [84]. Like the MMN and OP, the CNV can be elicited in paradigms without explicit timing instructions [90, 92], and like the P300, it can be elicited during the timing of discrete events [91, 164]. The CNV amplitude and peak latency are influenced by the temporal information in the task [77, 120]. It is possible that the CNV reflects a temporal representation based on neural ramping and integration (pulse accumulation). This would be consistent with the pacemaker-accumulator model of STT and the climbing activity model [35, 149]. On this account, the accumulation stops and the CNV resolves when a temporal decision can be made [153, 157]. However, recent investigations of ERP components that follow the CNV resolution, such as the potentials elicited by the offset marker of an empty interval [155] and the error-related negativity (ERN; [74]), suggest that these components change depending on the magnitude of difference between the target interval and the test interval. This implies that at least some timing processes continue after the CNV has resolved. Hence, the specific relationship between the CNV and timing processes remains to be determined.

Conclusion

In this chapter, we have provided a brief overview of the range of timing and time perception questions to which scalp-recorded EEG methods have been applied. We have seen EEG/ERP measures used as a proxy for behavioral measures in situations where a task requiring behavioral response was not possible (e.g., MMN in infants) or instructions about how to complete a timing task could strongly bias the results obtained (MMN, OPs). We have also seen from the CNV literature the critical importance of seeking corroborating evidence from multiple paradigms and methods when interpreting EEG/ERP features as biomarkers of the specific cognitive processes posited by timing models. In sum, scalp-recorded EEG/ERP has great potential as an investigative tool for the study of interval timing, but much remains to be discovered.