5.1 Introduction

How does a listener perceive the auditory world and make sense of the continuous flow of the myriad concurrent sounds in the noisy and complex soundscape impinging on the ears? A major emerging view in cognitive auditory neuroscience is that dynamic auditory input is modeled as neural traces of regularities that allow the formation of perceptual auditory objects. Indeed, sounds do not occur in isolation but are generally integrated into more complex sound patterns, as in speech, music, animal vocalizations, or common sounds such as a cell-phone ringtone. In such cases, temporal integration of ongoing sensory input plays an important role in organizing the acoustic background and thus guiding perception (Bregman 1990; Winkler 2007).

Modeling the auditory scene in search of regularities is essential not only to organize the acoustic background into meaningful percepts but also to predict future sensory events (Friston 2005; Winkler et al. 2009) and to guide attention involuntarily to potentially relevant events outside the focus of attention (Escera et al. 1998; Escera and Corral 2007). This major view in cognitive neuroscience has emerged from the successful combination of two lines of work: empirical, cognitive psychophysiological research that made use of the mismatch negativity (MMN) (Näätänen et al. 2007), an auditory evoked potential (AEP) derived from the electroencephalogram (EEG), among several other methods; and a theoretical neuroscience approach that has led to the formulation of predictive coding as a general theory of perceptual inference (Friston et al. 2006). While all this research was conducted on auditory cortical responses (Deouell 2007), and all the theoretical formulation refers to the cerebral cortex (Friston 2005), neurophysiological investigations in humans and animal models show that the subcortical auditory pathway contributes to these predictive processes and, by extension, auditory cognition.

This chapter begins with a review of studies on a particular form of auditory plasticity that can be regarded as on-line, which focuses on how the auditory system captures the ongoing stimulation “on the fly,” hence adapting to the moment on a temporal scale of only a few seconds (Sects. 5.2–5.5). Section 5.6 briefly considers how the auditory system’s plasticity incorporates temporal scales that range from minutes to the entire life and covers how these two forms of auditory plasticity interact with each other. The integration of these two areas of research supports the emerging view of regularity encoding as a key property of the whole auditory system.

5.2 Regularity Encoding in Auditory Cortex

A broadly used approach to examine whether the acoustic environment has been internalized into neural traces is by means of oddball auditory sequences. In these sequences, a repetitive (“standard” or “common”) stimulus is presented with a high probability of occurrence, whereas a different stimulus, referred to as “deviant” or “rare,” occurs only occasionally (Fig. 5.1A). These latter stimuli elicit a typical human AEP, the MMN, peaking at 100–150 ms from deviant sound onset. The brain’s neurophysiologic response to such rare stimuli is taken as evidence that the auditory system has built up a neural representation of the preceding sound regularity (Näätänen and Winkler 1999; Winkler et al. 2009).
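The structure of a simple oddball sequence can be sketched in a few lines of code. The following is a minimal illustration only, not the stimulation script of any study cited here: the function name `make_oddball`, the trial count, and the deviant probability of 0.2 are assumptions chosen for the example.

```python
import random

def make_oddball(n_trials, p_deviant, seed=0):
    """Return a shuffled list of 'STD'/'DEV' labels with an exact deviant proportion.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    n_dev = round(n_trials * p_deviant)
    labels = ["DEV"] * n_dev + ["STD"] * (n_trials - n_dev)
    # A seeded generator keeps the pseudo-random order reproducible.
    random.Random(seed).shuffle(labels)
    return labels

# Assumed example values: 500 trials, deviants on 20% of them.
sequence = make_oddball(n_trials=500, p_deviant=0.2)
print(sequence[:10], sequence.count("DEV"))
```

Actual experiments often add further constraints, such as a minimum number of standards between successive deviants, which this sketch deliberately omits.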

Fig. 5.1

Schematic illustration of different approaches to studying regularity encoding in the auditory system. (A) In simple oddball sequences, a “standard” stimulus is repeated with high probability (blue) at regular or random inter-stimulus intervals, whereas a “rare” or “deviant” stimulus differing in any physical feature (frequency in the plot) occurs with low probability and unexpectedly (red). Notice that complex sounds, such as phonemes, syllables, or musical notes or chords can also be used in simple oddball sequences. (B) In complex oddball sequences, the regularity is not defined by mere stimulus repetition but by the contingency between successive discrete sounds. In this example using tone pairs, standard events (blue) are defined by a frequency relationship of the second tone, for example, a semi-tone higher than its pair, whereas the deviant event (red) features a relationship that is double in pitch. Notice that frequency varies across the entire spectrum for the different pairs precluding the encoding of any particular pitch as the regularity. (C) In the roving-standard sequence, a particular stimulus is repeated for a number of times and then it changes, for example, in its frequency. The first stimulus after the feature change is a deviant (red), whereas after it is repeated at least twice it becomes a standard (blue). With this approach it is possible to study not only deviant responses but also how regularity encoding evolves as a function of the number of standard stimulus repetitions

A large body of EEG evidence suggests the MMN has cortical generators in supratemporal and prefrontal cortices (Deouell 2007) in agreement with functional neuroimaging data. Indeed, using oddball sequences in blocked or event-related functional magnetic resonance imaging studies has confirmed the involvement of both the temporal and prefrontal cortices in the neural representation of sound regularities (Opitz et al. 1999; Sabri et al. 2004).

Encoding auditory regularities does not occur only for simple acoustic feature repetitions (e.g., frequency, intensity, or duration) and complex discrete stimuli (e.g., speech sounds) but also for complex contingencies between single auditory events, such as the frequency relationship between two tones within a pair or the combination of two sound features (e.g., pitch and duration, Fig. 5.1B) (Paavilainen 2013). These more complex kinds of regularity, and particularly those defined by the relationship between stimulus features that evolve in time but vary along that feature dimension, have supported the view that the auditory cortex implements pre-attentive cognitive operations to make predictions about the near future, a kind of “primitive intelligence” in audition (Näätänen et al. 2001, 2010). Given the cortical nature of the evidence considered above, the function of regularity encoding and deviance detection has been suggested to pertain to high level cognition. However, recordings in the midbrain and the auditory thalamus of experimental animals have disclosed stimulus repetition effects at these stations of the auditory pathway, challenging the corticocentric view of regularity encoding and deviance detection. As discussed in Sect. 5.6, even the auditory brainstem can show such forms of primitive intelligence.

Another, more direct approach to investigate regularity encoding in audition is by using the so-called roving-standard sequences (Fig. 5.1C), in which trains, each consisting of a different number of repetitive stimuli, are isochronously presented and a particular stimulus feature is changed in every train. This particular type of sequence allows not only studying deviance-related responses (by comparing the brain response to the first tone of each train with the response to the last tones of the trains), but also how regularity encoding evolves as a function of stimulus repetition. The use of this approach reveals repetition suppression (Desimone 1996) as the mechanism of regularity encoding, viewed as the reduction of prediction error (Grill-Spector et al. 2006) in the predictive coding framework introduced in Sect. 5.1. Studies of the human AEP correlate of repetition suppression have revealed that the number of repetitions (Haenschel et al. 2005) and temporal predictability (Costa-Faidella et al. 2011a) are key factors in cortical auditory repetition suppression (Recasens et al. 2015).
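The logic of a roving-standard sequence can likewise be sketched in code. This is a hedged illustration under stated assumptions: the function `make_roving`, the train lengths (3 to 12 tones), and the tone frequencies are hypothetical and not taken from the cited studies.

```python
import random

def make_roving(n_trains, min_len, max_len, freqs, seed=0):
    """Return a list of (frequency_hz, position_in_train) tuples.

    Each train repeats one frequency; the frequency changes at every train
    boundary, so position 1 marks the tone that follows a feature change.
    """
    rng = random.Random(seed)
    trials, prev = [], None
    for _ in range(n_trains):
        # Never repeat the previous train's frequency, so every boundary is a change.
        freq = rng.choice([f for f in freqs if f != prev])
        length = rng.randint(min_len, max_len)
        trials.extend((freq, pos) for pos in range(1, length + 1))
        prev = freq
    return trials

# Assumed example values (not from the source).
roving = make_roving(n_trains=50, min_len=3, max_len=12,
                     freqs=[500, 600, 700, 800, 900])
# Position 1 acts as the deviant (the very first train has no preceding
# standard and would normally be discarded from the analysis).
deviants = [trial for trial in roving if trial[1] == 1]
```

Keeping the position within the train alongside each tone is what allows responses to be grouped by the number of preceding repetitions, which is the property of this design emphasized above.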

5.3 Neurophysiologic Mechanisms in Regularity Encoding

A step forward in understanding regularity encoding in the auditory system was provided by single-unit and multi-unit recordings in animals, which have revealed the existence of stimulus-specific adaptation (SSA) (Ulanovsky et al. 2003) at different levels of the auditory system. SSA neurons rapidly reduce their firing rates after a few repetitions of a sound, but robust responses are restored to a rare or deviant stimulus. Interestingly, SSA shares a number of properties with the MMN cortical potential introduced in Sect. 5.1, such as enhancement by increasing the physical difference between the standard and the deviant tones or by reducing the probability of the rare stimulus. These similarities have led to the suggestion that SSA underlies MMN generation; however, existing differences between these two phenomena, the most relevant of which are in latency and anatomical location, indicate that SSA may well lie upstream of MMN generation (Nelken and Ulanovsky 2007). In other words, other intervening processes may occur between SSA and MMN as discussed in Sect. 5.4.
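The qualitative signature of SSA described above (reduced responses to a repeated sound, with robust responses to a rare one) can be reproduced by a deliberately simple toy model. The sketch below is not a model proposed in the cited literature: it merely assumes that each stimulus drives its own depletable, slowly recovering resource, and the function name and the `depletion` and `recovery` values are arbitrary.

```python
def simulate_ssa(sequence, depletion=0.5, recovery=0.1):
    """Toy stimulus-specific adaptation: one adapting resource per stimulus.

    Returns a list of (stimulus, response) pairs. All parameters are
    illustrative assumptions, not fitted to any data.
    """
    resources = {}
    responses = []
    for stim in sequence:
        available = resources.get(stim, 1.0)   # unadapted channels start at 1.0
        responses.append((stim, available))    # response equals available resource
        resources[stim] = available * (1.0 - depletion)
        # Every channel recovers a little toward 1.0 on each trial.
        for key in resources:
            resources[key] += (1.0 - resources[key]) * recovery
    return responses

def mean_response(responses, label, skip=0):
    values = [r for s, r in responses[skip:] if s == label]
    return sum(values) / len(values)

# Fixed oddball-like pattern: nine standards followed by one deviant, 20 times.
responses = simulate_ssa((["STD"] * 9 + ["DEV"]) * 20)
std_mean = mean_response(responses, "STD", skip=10)
dev_mean = mean_response(responses, "DEV", skip=10)
print(round(std_mean, 3), round(dev_mean, 3))
```

Because adaptation in this toy is tied to the stimulus rather than to the model unit as a whole, the rarely presented sound keeps eliciting a large response while the response to the standard settles at a low level.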

Neurons exhibiting SSA were first described in the primary auditory cortex of the cat (Ulanovsky et al. 2003; Nelken 2014), but they were subsequently discovered in subcortical stations of the auditory pathway, such as the inferior colliculus (IC) of the midbrain (Pérez-González et al. 2005; Malmierca et al. 2009) and medial geniculate body (MGB) of the thalamus (Antunes et al. 2010). Importantly, SSA is present in primary auditory cortex (i.e., the target of the ascending lemniscal pathway), whereas SSA in subcortical stations is stronger in non-lemniscal regions of these nuclei, such as the dorsal and rostral parts of IC (Malmierca et al. 2009) and the dorsal and medial subdivisions of MGB (Antunes et al. 2010), in agreement with seminal intracranial recordings in guinea pigs (Kraus et al. 1994). Thus, while subcortical SSA was originally suggested to originate in auditory cortex and then be transmitted via the corticofugal pathway (Nelken and Ulanovsky 2007) to lower auditory centers, studies of transient deactivation of the auditory cortex suggest that SSA may emerge de novo in subcortical stations (Antunes and Malmierca 2011; Anderson and Malmierca 2013), demonstrating the genuine role of the IC and MGB in regularity encoding. The current view accepts the existence of two relatively independent systems for SSA: one lemniscal that is linked to cortex and another non-lemniscal that is linked to the subcortical auditory pathway (Nelken 2014; Malmierca et al. 2015). The functional relationship between these two systems remains to be established. Importantly, these results pave the way toward more fine-grained research of auditory regularity encoding in humans.

5.4 Regularity Encoding in Thalamocortical Networks

The existence of SSA neurons distributed along the animal auditory pathway and the fact that the latency of the novelty responses in these neurons (i.e., their responses to the rare stimuli) is about 100 ms shorter than the typical MMN latency suggest that earlier correlates of deviant detection could be found in humans. Indeed, a series of recent experiments in humans using oddball sequences that set the stimulation, recording, and analysis parameters to measure earlier auditory evoked potentials, such as the auditory brainstem response (ABR) and the middle latency response (MLR), supported the emerging view that regularity encoding and deviance detection are pervasive properties of the auditory system as a whole.

The MLR is a well-characterized sequence of waveforms in the human auditory evoked potential peaking between 12 and 50 ms from sound onset. They are labeled as N0, P0, Na, Pa, Nb, and Pb, with earliest components (N0 and P0) generated in auditory thalamocortical loops (Picton 2011) and later ones generated in primary auditory cortex (Na and Pa) or beyond (Yvert et al. 2001). Importantly, in oddball experiments aimed at recording MLRs and ABRs, it is necessary to control for stimulus characteristics as these brain responses are sensitive to the specific features of the eliciting stimuli (for recent evidence see Althen et al. 2011); also, controlling for probability factors is important. Indeed, to disclose genuine regularity encoding and disregard adaptation effects yielded by the lower probability of the rare sounds within the oddball sequence, a specific “control” condition needs to be implemented (Fig. 5.2A) (Schröger and Wolff 1996; Ruhnau et al. 2012).
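The logic of the control condition can be made concrete with a short sketch. It assumes a deviant probability of 0.2 and uses the 800 Hz and 1200 Hz tones of the design illustrated in Fig. 5.2A; the helper `make_block`, the block length, and the additional control-block frequencies (600, 1000, and 1400 Hz) are hypothetical choices for illustration.

```python
import random

def make_block(tone_probs, n_trials, seed=0):
    """Build a shuffled block of tones with exact counts from {tone_hz: probability}."""
    tones = []
    for tone_hz, prob in tone_probs.items():
        tones += [tone_hz] * round(n_trials * prob)
    random.Random(seed).shuffle(tones)
    return tones

N_TRIALS = 1000  # assumed block length

# (a) Oddball block: 800 Hz is the rare deviant among 1200 Hz standards.
oddball = make_block({800: 0.2, 1200: 0.8}, N_TRIALS)
# (b) Reversed oddball: the same 800 Hz tone now serves as the standard.
reversed_oddball = make_block({1200: 0.2, 800: 0.8}, N_TRIALS)
# (c) Control block: 800 Hz keeps the deviant's probability but sits among
#     other equiprobable tones, so no single repetition regularity is set up.
control = make_block({600: 0.2, 800: 0.2, 1000: 0.2, 1200: 0.2, 1400: 0.2},
                     N_TRIALS)

print(oddball.count(800), reversed_oddball.count(800), control.count(800))
```

Because the 800 Hz tone is equally rare in blocks (a) and (c), a response difference between deviant and control cannot be attributed to presentation probability alone, which is the reasoning behind the control comparison.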

Fig. 5.2

Middle-latency response (MLR) correlates of deviance detection. (A) Experimental design: A deviant (DEV) tone of 800 Hz was delivered with a p = 0.2 among a series of standard (STD) tones of 1200 Hz as in (a); a reversed-oddball block as in (b) allowed for the comparison of responses elicited to the same physical tones in the role of standard; critically, a controlled block (c) allowed attributing the effects to true regularity encoding. (B) When the brain response was filtered in the MLR range (15–200 Hz), a clear Nb enhancement was elicited to the deviant stimulus in comparison to both the standard and the control tones (same physical stimuli in separate blocks). (C) When filtered in the long-latency response (LLR) range (0.6–35 Hz), the auditory evoked potential disclosed a remarkable amplitude enhancement to the deviant stimulus, which was generated (shown in the difference waveforms, lower right) by both refractoriness (comparison to the standard; DEV-STD) and true deviance detection (comparison to the control stimulus; DEV-CON). MMN, mismatch negativity; Na, Nb, P0, Pa, MLR waveforms. (Modified with permission from Grimm et al. 2011)

Using such a methodological approach, a number of recent studies revealed correlates of genuine regularity encoding and deviance detection in early thalamocortical networks or early cortical regions (Recasens et al. 2014). Indeed, several MLR waveforms were enhanced for changes in tone frequency (Fig. 5.2) (Grimm et al. 2011; Alho et al. 2012), location (Cornella et al. 2012; Grimm et al. 2012), intensity (Althen et al. 2011), and the spectral content of sound (Slabu et al. 2010). Also, the temporal dynamics of stimulus presentation were tracked by these early thalamocortical networks, as a stimulus occurring earlier than expected elicited clear enhancements of MLR waveforms (Leung et al. 2013).

Considering the time frame of these effects, 20–40 ms from change onset (in neural networks anatomically lower and processing stages about 100 ms earlier than those involved in MMN generation), it has been suggested recently that the deviance-related effects seen at the MLR range might be a better correlate of SSA than the MMN (Escera et al. 2014; Grimm et al. 2016). This is also supported by the fact that MMN is N-methyl-D-aspartate (NMDA) dependent (Umbricht et al. 2000), whereas SSA is not (Farley et al. 2010). In fact, a recent animal study was able to identify two stages of SSA in auditory cortex, the latter of them spanning 200–400 ms and being sensitive to NMDA blockade (Chen et al. 2015), thereby supporting the dissociation between regularity encoding and deviance detection at early (MLR) and later (MMN) processing stages. Moreover, the two stages of regularity encoding and deviance detection, early and late, have also been dissociated with regard to their functional implication in these processes (Cornella et al. 2013; Aghamolaei et al. 2016), indicating that the early thalamocortical networks of the auditory pathway are capable of coding for regularities, but that it takes a further processing step in the cerebral cortex to encode the deviant status. Taken together, the existence of SSA neurons along the animal auditory pathway, the evidence for MMN, and the early correlates of deviance detection in humans support the notion that the encoding of acoustic regularities and the detection of related deviance is a pervasive property of the entire auditory system, spanning from lower levels in the auditory pathway up to higher-order levels of the auditory cortex (Grimm and Escera 2012; Escera and Malmierca 2014; Escera et al. 2014).

5.5 Regularity Encoding in Human Auditory Brainstem

In humans, the involvement of subcortical stations in regularity encoding was recently demonstrated in a functional magnetic resonance imaging (fMRI) study that used the appropriate control and oddball sequences (Cacciaglia et al. 2015). In this study, the oddball trains were composed of two sounds, 500–1000 Hz and 1000–1500 Hz as standard and deviant, respectively, the latter occurring with a 20% probability in the second half of the train. This way, activation to the first part of the sequence that did not contain any deviant sound served as the standard condition for comparison with the second half of the sequence, where deviant sounds occurred. The fMRI acquisition parameters were set to capture activations in structures of the ascending auditory pathway; specifically, the orientation angle was set to 45° with respect to the longitudinal axis of the brainstem. Results yielded significant activations in both the left IC and bilateral MGB when contrasting the standard versus deviant conditions as well as in the contrast of deviant versus control (Fig. 5.3C). These results provide the first demonstration of the involvement of subcortical structures in genuine regularity encoding and deviance detection in humans. However, fMRI lacks sufficient temporal resolution to disclose whether these activations occurred early in the processing chain or resulted from top-down modulations of the ascending pathway.

Fig. 5.3

Neuroimaging evidence for the involvement of the subcortical auditory system in regularity encoding and deviance detection in humans. Broadband noise bursts spanning 500 Hz were presented in trains of 20 tokens with a stimulus onset asynchrony of 150 ms. The oddball sequences presented deviant tokens (1000–1500 Hz) from positions 12 onward among standard tokens of 500–1000 Hz. The control sequence presented five different tokens randomly. The figure shows the activations that survived correction for multiple testing using the family-wise error (FWE). The upper row shows the deviant > standard contrast whereas the middle row shows the deviant > control contrast, disclosing activations in the inferior colliculus (IC) and the medial geniculate body (MGB). The lower row plots the percent signal change in the bilateral IC and MGB. CON, control; DEV, deviant; STD, standard. *, P < 0.05. (Reprinted with permission from Cacciaglia et al. 2015)

Additionally, the involvement of the subcortical auditory pathway in genuine early regularity encoding was examined in another study in which the frequency-following response (FFR) was recorded (Slabu et al. 2012). The FFR is a sustained part of the ABR typically elicited to periodic and complex auditory stimuli such as speech sounds or music. It emerges at circa 7–15 ms from sound onset after the transient waves V and A of the phasic ABR, therefore reflecting the tonic brainstem response that is phase locked to the spectral and temporal components of the acoustic signal (Chandrasekaran and Kraus 2010; Skoe and Kraus 2010a; Kraus, Anderson, and White-Schwoch, Chap. 1). The FFR has gained recent interest in cognitive auditory neuroscience because it provides a noninvasive measure of the tracking accuracy of periodic sound characteristics in the auditory brainstem. The FFR also allows investigation of the environmental conditions and the biological mechanisms that modulate the representation of incoming sounds at this level of the auditory hierarchy by experience-dependent plasticity (Chandrasekaran et al. 2014), including language experience (Krizman et al. 2012), musical training (Parbery-Clark et al. 2011), short-term auditory training (Anderson et al. 2013), context-dependent encoding (Chandrasekaran et al. 2009), and even sensitivity to statistical properties of the stimulus in real time (Skoe and Kraus 2010b; Skoe et al. 2014).

In the Slabu and colleagues (2012) study mentioned previously, the FFR was recorded in response to a consonant-vowel stimulus /ba/ presented with a low probability (p = 0.2) amid a repetitive context set by the repetition of a different syllable (/wa/). To control for the stimulus characteristics, a reversed oddball sequence was used where the /ba/ and /wa/ syllables swapped their deviant/standard status. An additional block featuring four different tokens of the /wa/ syllable (differing in the transition duration of their first and second formants) controlled for probability to preclude mere adaptation effects (Fig. 5.4A). A significant amplitude attenuation in the response to the second and fourth harmonics of the F0 (Fig. 5.4C) of the deviant syllable compared to the standard and to the control conditions revealed genuine regularity encoding and deviance detection in the human auditory brainstem (Slabu et al. 2012).
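Reading out the amplitude of a response at specific harmonics of the F0 amounts to evaluating its spectrum at those frequencies. The sketch below illustrates that step on synthetic signals only; it is not the analysis pipeline of Slabu et al. (2012). The F0 of 100 Hz, the sampling rate, the signal duration, and all component amplitudes are assumptions, and `amplitude_at` and `synth` are hypothetical helpers.

```python
import cmath
import math

FS_HZ = 8000      # assumed sampling rate
F0_HZ = 100.0     # assumed fundamental frequency (not stated in the source)
N_SAMPLES = 1600  # 200 ms, i.e., 20 full F0 periods

def amplitude_at(signal, freq_hz, fs_hz):
    """Amplitude of one frequency component via a single-bin discrete Fourier sum."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)

def synth(h2_amp, h4_amp):
    """Synthetic 'response': F0 plus second (H2) and fourth (H4) harmonics."""
    return [math.sin(2 * math.pi * F0_HZ * k / FS_HZ)
            + h2_amp * math.sin(2 * math.pi * 2 * F0_HZ * k / FS_HZ)
            + h4_amp * math.sin(2 * math.pi * 4 * F0_HZ * k / FS_HZ)
            for k in range(N_SAMPLES)]

# Arbitrary amplitudes chosen only to mimic an attenuated deviant response.
standard_like = synth(h2_amp=0.5, h4_amp=0.3)
deviant_like = synth(h2_amp=0.3, h4_amp=0.15)

for name, sig in (("standard-like", standard_like), ("deviant-like", deviant_like)):
    h2 = amplitude_at(sig, 2 * F0_HZ, FS_HZ)
    h4 = amplitude_at(sig, 4 * F0_HZ, FS_HZ)
    print(name, round(h2, 3), round(h4, 3))
```

The signal length is set to a whole number of F0 periods so that each harmonic falls exactly on the evaluated frequency; with real recordings, windowing and averaging across trials would also need to be considered.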

Fig. 5.4

Human auditory brainstem correlates of deviance detection. (A) Experimental design: a consonant-vowel /ba/ was presented randomly with low probability (p = 0.2) amongst a repetitive /wa/1 stimulus (the difference being in the transition duration of F1 and F2: 20 ms for /ba/, 35 ms for /wa/1; longer durations from 50 to 85 ms for /wa/2, /wa/3 and /wa/4 stimuli in the control block). (B) The FFR elicited to the same physical stimulus (/ba/) in the role of standard (STD), deviant (DEV) and control (CON) in different blocks. (C) Amplitude spectrum of the FFR elicited to /ba/ in the different conditions. Notice that compared to the standard and control conditions, the amplitude of the deviant response was attenuated in the second (H2) and fourth (H4) harmonic of the fundamental frequency (F0). The inset shows the individual FFR amplitudes at H2 and H4. *, P < 0.025. (Modified with permission from Slabu et al. 2012)

The results obtained by Slabu et al. (2012) were replicated and expanded by a recent study that investigated the interaction between stimulus probability (deviant status) and auditory learning (Skoe et al. 2014). In this study, instead of using stimuli pertaining to the phonetic inventory of the listeners, the authors presented an identical syllable (/mi/) that was varied on its pitch trajectory to form two different sounds that were minimally contrastive and with no lexical meaning for the English-speaking participants. Pitch tracking accuracy was measured by autocorrelograms. By using this approach, the authors revealed that pitch tracking was more accurate for frequent (standard) than for infrequent (deviant) stimuli (Skoe et al. 2014), thus supporting the role of the auditory brainstem in extracting statistical information from the acoustic background (i.e., regularity encoding). Moreover, the authors found that probability-dependent plasticity—the encoding of the statistical regularity—interacted with behavioral-relevance plasticity. The relationships between the deviant and standard responses varied when the participants learned to discriminate the minute pitch changes differentiating the standard and deviant stimuli during a training program.
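Pitch estimates of this kind rest on autocorrelation: a periodic signal correlates strongly with a copy of itself delayed by one period. The sketch below shows that principle on a synthetic tone in a single analysis window; it is a simplification and not the procedure of Skoe et al. (2014), where autocorrelograms are computed over successive windows and compared with the stimulus pitch contour. The function `estimate_f0`, the search range, and the test signal are assumptions.

```python
import math

def estimate_f0(signal, fs_hz, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    The lag search range is bounded by an assumed plausible pitch range.
    The unnormalized sum has fewer terms at longer lags, which favors the
    first period over its integer multiples.
    """
    min_lag = int(fs_hz / f_max)
    max_lag = int(fs_hz / f_min)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(signal[k] * signal[k + lag] for k in range(len(signal) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs_hz / best_lag

FS_HZ = 8000  # assumed sampling rate
# Synthetic 100 Hz tone, 200 ms long, standing in for a recorded response.
tone = [math.sin(2 * math.pi * 100.0 * k / FS_HZ) for k in range(1600)]
f0_hz = estimate_f0(tone, FS_HZ)
print(f0_hz)
```

Repeating this estimate over short sliding windows yields a pitch track for the response, which can then be compared against the pitch trajectory of the stimulus.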

It is interesting to note that the two studies mentioned above (Slabu et al. 2012; Skoe et al. 2014), which addressed regularity encoding and deviant detection in the human auditory brainstem, found that responses to deviant stimuli were attenuated rather than enhanced. This is in agreement with studies that found that a behaviorally relevant stimulus, such as a consonant-vowel of the linguistic repertoire of the participants, elicits larger FFRs when occurring in a repetitive context than when occurring amongst varying stimuli (Chandrasekaran et al. 2009; Parbery-Clark et al. 2011; Strait et al. 2011). However, when the nature of the eliciting stimulus lacks behavioral relevance, such as in an amplitude modulated tone (AM), the occurrence of a change in the AM frequency elicits an enhancement, rather than an attenuation, of the amplitude of the deviant-related FFR compared to the standard (Shiga et al. 2015). This is compatible with the view that the auditory brainstem prioritizes behaviorally relevant stimuli, which had been proposed to originate from mechanisms different from repetition suppression (Skoe and Kraus 2010b; Parbery-Clark et al. 2011); however, repetition suppression is the keystone of regularity encoding proposed here.

The capability of the auditory brainstem to encode for acoustic regularities has been observed not only for auditory objects (the “what” in the auditory scene), but also for the temporal, dynamic component of the auditory background, that is, to “when” a particular object is expected to occur. This is supported by a preliminary study that showed that FFR amplitude to rare delayed stimuli occurring in an otherwise regular (isochronous) sequence was enhanced compared to expected stimuli (i.e., those occurring at the regular intervals; Zarnowiec et al. 2014). It is also supported by a study showing that temporal predictability interacts with stimulus repetitions in shaping brainstem responses (Gorina-Careta et al. 2016). Interestingly, these results complement those observed for thalamocortical networks generating the MLR (Leung et al. 2013) and seminal observations for cortical responses (i.e., the MMN) (Ford and Hillyard 1981), thus indicating that encoding of the temporal dynamics in the acoustic scene is also carried out along the entire auditory system.

5.6 Relationships to Other Forms of On-Line Plasticity

The studies reviewed so far have addressed a particular form of auditory plasticity that can be regarded as on-line; it has to do with the way in which the auditory system captures the ongoing stimulation “on the fly,” hence adapting to the moment on a temporal scale that spans a few seconds (although this adaptation to the moment may also interact with longer time scales; e.g., Ulanovsky et al. 2004; Costa-Faidella et al. 2011b). From the evidence discussed, this adaptation to the moment appears as a pervasive property of the entire auditory system, from higher-order regions of the auditory cortex down to the IC at least (Ayala et al. 2012). Correlates of this adaptation to the moment, in forms of neural traces for ongoing statistical regularities, have been described for long-latency (i.e., the MMN), middle latency, and even brainstem responses of the human AEP. Moreover, neurophysiological mechanisms for this kind of on-line encoding of regularities have been associated with SSA, for which the basic strategy for testing, as with human AEP studies, consists of challenging the neural representation of the regularity with a stimulus that does not fulfill the expectation, hence measuring deviance-related responses.

These two processes, regularity encoding and deviance detection, have been considered as two faces of the same coin (Parbery-Clark et al. 2011; Escera and Malmierca 2014), yet they can be dissociated (Taaseh et al. 2011; Aghamolaei et al. 2016). However, this form of plasticity, adapting in the moment, is only one case of the multiple forms of plasticity that the auditory system and, in particular, the subcortical ascending pathway can undergo. In fact, a large series of studies conducted with the FFR to measure responses to complex auditory stimuli (with behavioral relevance) have shown that the human auditory brainstem can experience plasticity over temporal spans ranging from minutes to the entire life (Chandrasekaran et al. 2014), and that these different auditory experiences “layer” along the course of one’s own life to shape the individual’s auditory subcortical function (Skoe and Chandrasekaran 2014). In particular, three forms of on-line plasticity relate to the studies considered previously in this chapter.

First, a seminal study by Chandrasekaran et al. (2009) showed that the brainstem encoding of the F0 of speech sounds is context-dependent and, furthermore, that the capacity to benefit from contextual information (stimulus repetition) correlates with hearing speech-in-noise abilities. In their study, a consonant vowel /da/ was presented repetitively or among other syllables that varied in a number of acoustic features, such as formant structure, duration, voice-onset time, or F0. The results showed that the second and fourth harmonics of the F0 of the response were enhanced in the constant context compared to the variable context but only in good compared to poor readers (Chandrasekaran et al. 2009). These results indicate that the human auditory brainstem is sensitive to the ongoing stimulus context. Studies that used the same approach confirmed this effect and went a step beyond to show that the capability of the auditory brainstem to benefit from contextual information underlies enhanced speech-in-noise perception in musicians (Parbery-Clark et al. 2011) and reading and music aptitude in children (Strait et al. 2011).

A further capability of the auditory brainstem is that of performing statistical calculations of the discrete sounds occurring in the soundscape. This was demonstrated by a study that presented a series of musical notes arranged in random or in patterned sequences. In the patterned sequences, the individual sounds were constrained so that a particular note was followed by a fixed one, forming a doublet. With this method, the occurrence of an individual sound predicted a subsequent event with certainty, a prediction that did not occur in the random condition. In other words, the patterned condition set precise local statistics. Recordings revealed attenuated brainstem responses for the patterned condition compared to the random condition (Skoe et al. 2013). More striking was the finding that the capability of the auditory brainstem to extract the local statistics within the sequence predicted the individual capability to learn the implicit syntax of the sequence. These results demonstrate again the capability of the human auditory brainstem to perform computational operations on discrete auditory events and highlight the auditory brainstem’s involvement in driving behavioral outcomes.
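The difference in local statistics between patterned and random sequences can be expressed as transitional probabilities, that is, the probability of each sound given the one before it. The sketch below is a minimal illustration under assumptions: the note names, the number of doublets, and the sequence lengths are hypothetical and do not reproduce the stimuli of Skoe et al. (2013).

```python
import random
from collections import Counter, defaultdict

def transition_probs(seq):
    """Return {current: {next: P(next | current)}} estimated from a sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(seq, seq[1:]):
        counts[current][following] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

rng = random.Random(0)
# Hypothetical doublets: the first note of each pair fixes the second.
doublets = [("C4", "E4"), ("D4", "G4"), ("F4", "A4"), ("B3", "B4")]
notes = [note for pair in doublets for note in pair]

patterned = []
for _ in range(200):
    patterned.extend(rng.choice(doublets))       # doublets in random order
random_seq = [rng.choice(notes) for _ in range(400)]  # notes in random order

patterned_tp = transition_probs(patterned)
random_tp = transition_probs(random_seq)
print(patterned_tp["C4"])                         # single continuation
print(max(random_tp["C4"].values()))              # no dominant continuation
```

In the patterned sequence the first note of a doublet predicts its partner with probability 1, whereas in the random sequence no continuation dominates; this contrast is what "precise local statistics" refers to above.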

Finally, two other studies relate more closely to those reviewed in Sect. 5.5 on regularity encoding. In one of them, a piano melody composed of five notes was presented repeatedly for a long recording session. Critically, the melody featured a note repetition (the first and second notes were identical). The results yielded two sets of effects. First, the amplitude of the brainstem responses increased for each note between the first and the second halves of the recording session (Skoe and Kraus 2010b). Second, the note repetition resulted in a repetition enhancement, that is, the amplitude of the brainstem response was larger for the second than for the first note of the melody across the entire recording session, which was analyzed in four separate quarters (Skoe and Kraus 2010b). These results suggest that the human auditory brainstem can encode for both local and global statistics.

The second of these studies (Skoe et al. 2014), as mentioned in Sect. 5.5, confirmed that a rarely occurring stimulus among a series of repetitive ones (a deviant) can be detected by the human auditory brainstem. Moreover, the results of this study also showed that probability-dependent plasticity interacts with another form of plasticity that is behavior dependent through the processes of learning. Hence, the results of this study indicate that behavioral learning can alter the way in which on-line probabilities are computed in the auditory brainstem, thereby highlighting the role of the ascending auditory pathway as a powerful computational network. Interestingly, the authors concluded that by means of long-term experience (e.g., training, but possibly other forms of exposure, such as bilingualism) (Krizman et al. 2012), learning related top-down feedback can override the local brainstem mechanisms that subserve probability detection (Skoe et al. 2014). Although somewhat tentative, this interpretation paves the way for a number of predictions that should guide future research.

The studies considered so far in this section, together with those regarding the auditory brainstem reviewed in Sect. 5.5, indicate that the ascending auditory pathway, far from being a passive relay of auditory information toward the auditory cortex, possesses complex computational capabilities that ultimately contribute to auditory cognition. In particular, it is tempting to hypothesize that specific key anatomical structures of the auditory pathway, such as the IC and the MGB, may be able to encode auditory regularities in the acoustic background that go beyond simple stimulus repetition, thus encompassing the relationship between successive discrete auditory stimuli and thereby supporting “primitive intelligence” (Näätänen et al. 2001, 2010).

This idea was preliminarily tested with a sequence of four different tones combining two features (duration: short, long; and pitch: high, low) that was arranged so that the duration of a particular tone predicted the pitch of the next (e.g., high-pitch tones followed short tones). After this contingency was repeated a number of times to set the regularity, a stimulus that did not follow this implicit contingency was presented. In agreement with former studies (Bendixen et al. 2008), deviant events elicited clear cortical deviance-related responses (MMN). More interesting, however, was the observation of an enhanced amplitude of the FFR elicited by the deviant event compared to that of the standard (Schaefer et al. 2015), suggesting that the auditory brainstem was able to encode such complex stimulus contingencies. Although preliminary, these results are encouraging and strongly suggestive of the complex and powerful computational capabilities of the human auditory brainstem.
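The logic of such a feature-contingency paradigm can be sketched in a few lines of Python. This is an illustration only, not the procedure of Schaefer et al. (2015): the specific mapping (short predicts high, long predicts low), the deviant probability, and the function names are assumptions. The point is that a deviant is defined by the relationship between two successive tones, so that no individual tone is rare in itself.

```python
import random

# Assumed contingency: the duration of tone n predicts the pitch of tone n + 1.
RULE = {"short": "high", "long": "low"}
DURATIONS = ["short", "long"]
PITCHES = ["high", "low"]


def make_sequence(n_tones, deviant_prob, rng):
    """Return a list of (duration, pitch) tones. Each pitch follows the
    rule set by the previous tone's duration, except on deviant trials,
    where the opposite pitch is presented."""
    tones = [(rng.choice(DURATIONS), rng.choice(PITCHES))]
    for _ in range(n_tones - 1):
        predicted_pitch = RULE[tones[-1][0]]
        if rng.random() < deviant_prob:
            pitch = "low" if predicted_pitch == "high" else "high"
        else:
            pitch = predicted_pitch
        tones.append((rng.choice(DURATIONS), pitch))
    return tones


def label_tones(tones):
    """Label each tone as 'standard' or 'deviant' according to whether its
    pitch matches the prediction made by the preceding tone's duration."""
    labels = ["first"]  # the first tone has no predecessor
    for previous, current in zip(tones, tones[1:]):
        expected = RULE[previous[0]]
        labels.append("standard" if current[1] == expected else "deviant")
    return labels


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed only to make the demo repeatable
    tones = make_sequence(20, deviant_prob=0.1, rng=rng)
    for tone, label in zip(tones, label_tones(tones)):
        print(tone, label)
```

Because all four duration-pitch combinations occur among both standards and deviants in this sketch, a differential response to deviants cannot be attributed to the physical features of a single tone and would instead reflect encoding of the contingency.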

5.7 Summary

This chapter has summarized studies showing that, in humans, auditory deviance detection based on regularity encoding occurs at latencies and in neural networks comparable to those revealed in animal studies of single-neuron activity. These studies demonstrate that the encoding of simple acoustic-feature regularities and the detection of the corresponding deviance, such as an infrequent change in frequency or location, occur in thalamocortical networks, where they modulate the MLR in auditory cortical regions separate from those generating the MMN, and occur even at the level of the human auditory brainstem, as indicated by the FFR and fMRI. Taken together, these studies support the emerging view that regularity encoding is a basic principle of the functional organization of the auditory system, which is organized in ascending levels of complexity along the auditory pathway from the brainstem up to higher-order areas of the cerebral cortex.

Moreover, ongoing studies have started to suggest that subcortical structures in the auditory pathway can implement complex computational operations, mimicking the “primitive intelligence” attributed originally to auditory cortex (Näätänen et al. 2001, 2010) and, therefore, challenging corticocentric views of cognition (Parvizi 2009). Remarkable, for example, are the preliminary results suggesting that subcortical structures can support predictive coding, as revealed by enhanced FFRs to individual stimuli that violate a rule established by the dynamic ongoing sequence (such as the duration of a particular tone determining the pitch of the next) (Schaefer et al. 2015). In combination with results showing that the auditory brainstem can undergo plasticity at multiple time scales (Chandrasekaran et al. 2014), it is tempting to suggest that the inferior colliculus is a hub for primitive intelligence in audition. However, before the field can go that far, a series of caveats must be addressed and several research questions answered.

First, the FFR is a very small signal generated in very deep cerebral structures, so that a large number of trials (~2000 or more) need to be recorded to reach a sufficient signal-to-noise ratio (Jeng et al. 2011). This limitation becomes stringent when one plans to address changes or effects in these minute responses as a function of complex relationships among discrete auditory stimuli (e.g., differential or contrasting effects). Some improvements have been suggested based on “optimal” paradigms (Bidelman 2015) or multichannel recordings (Bellier et al. 2015), but there is still a need for further improvement.
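The need for so many trials follows from the standard arithmetic of signal averaging, sketched below under idealized assumptions that are not taken from the studies cited: the evoked response s(t) is identical on every trial, and the background noise n_i(t) is zero-mean, independent across trials, and of constant variance σ². The symbols are introduced here for illustration only.

```latex
\begin{align}
  x_i(t) &= s(t) + n_i(t), \qquad i = 1, \dots, N \\
  \bar{x}_N(t) &= \frac{1}{N} \sum_{i=1}^{N} x_i(t)
               = s(t) + \frac{1}{N} \sum_{i=1}^{N} n_i(t) \\
  \operatorname{Var}\!\left[\bar{x}_N(t)\right] &= \frac{\sigma^{2}}{N} \\
  \mathrm{SNR}_N &= \sqrt{N}\,\mathrm{SNR}_1
\end{align}
```

Under these assumptions, averaging N = 2000 trials improves the amplitude signal-to-noise ratio by a factor of √2000 ≈ 44.7 (about 33 dB) relative to a single trial, and halving the residual noise requires four times as many trials. The first assumption is only an approximation in the present context, because the response itself is expected to change with the ongoing stimulus statistics, which is part of what makes contrasts between conditions costly to measure.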

Second, the specific contribution of discrete subcortical structures to the FFR, and particularly the role of the IC in its generation, is still to be disentangled. In fact, most of the evidence about FFR sources comes from seminal observations in human patients with brainstem lesions (Sohmer et al. 1977), human intracranial recordings (Møller et al. 1988), and analogies from animal studies (Smith et al. 1975), or is based on the electrode montage-dependency of the response (Davis and Britt 1984) or the phase-locking capabilities of subcortical neuronal assemblies compared to cortical ones (Joris et al. 2004). However, direct evidence is lacking. Therefore, approaches that apply inverse solution methods capable of disclosing putative subcortical EEG sources (Trujillo-Barreto et al. 2004), that use magnetoencephalography (Parkkonen et al. 2009; Coffey et al. 2016), or that combine FFR recordings with fMRI (Chandrasekaran et al. 2012) may provide compelling evidence for the specific subcortical generation of the FFR.

A more critical issue is the debated contribution of the corticofugal pathway to the subcortical encoding of sound and, specifically, of ongoing regularity, particularly if one wants to claim that the subcortical auditory pathway contributes to auditory cognition. Animal studies have largely demonstrated that the corticofugal pathway plays a critical role in long-term and even short-term plasticity (Suga et al. 2002; Suga 2008; Bajo et al. 2010). However, recent pharmacological studies (Pérez-González et al. 2012; Ayala and Malmierca 2015) and transient cortical inactivation studies in animals (Antunes and Malmierca 2011; Anderson and Malmierca 2013) suggested that on-line plasticity, or adaptation to the ongoing input statistics, may rely strictly on bottom-up processes. Yet another possibility is that subcortical cognition results from the interplay between bottom-up and top-down processes (Skoe et al. 2013; Chandrasekaran et al. 2014). In humans, a potential approach to disentangle the top-down and bottom-up contributions to subcortical auditory cognition may be to temporarily inactivate the auditory cortex by means of transcranial magnetic stimulation (TMS) (Ahveninen et al. 2013). Another potential approach may be to interfere with cortical processing by means of transcranial direct current stimulation (tDCS) (Riecke et al. 2015) during experiments tailored to address the encoding of acoustic regularities at multiple levels of complexity. Any of these approaches, used on their own or in combination, should contribute substantially to our understanding of the cognitive neuroscience of audition.