Introduction

In extant cognitive neuroscience models, cortical structures are argued to be crucial for complex auditory and speech processing, and subcortical structures are regarded as passive relay stations of sensory input (Hickok and Poeppel 2007; Rauschecker and Scott 2009). Such a cortico-centric bias (Parvizi 2009) has led to a demarcation of cortical structures as ‘higher-level’ and subcortical structures as ‘lower-level’ despite these structures being inter-connected by afferent (feedforward) as well as efferent (feedback) nerve fibers. Over the last two decades significant progress has been made in our understanding of human auditory subcortical plasticity by factoring in the sensory, cognitive, and reward circuitry that underlies auditory plasticity (Chandrasekaran and Kraus 2010; Kraus and Chandrasekaran 2010). Work focused on expert, as well as disordered, populations suggests that a simple demarcation of auditory function into ‘lower-level’ versus ‘higher-level,’ is outdated. There is rich evidence against a strictly input–output role for the auditory brainstem, but a comprehensive theoretical framework is still lacking. We aim to fill this gap by integrating research across animal and human auditory neuroscience to provide an enumeration of the various mechanisms underlying brainstem plasticity to behaviorally-relevant signals. We then provide a common theoretical framework and extend this framework to elucidate the nature of deficits in clinical populations. We define plasticity as the propensity for the brain to change as a function of experience.

Measuring Subcortical Auditory Function in Humans

Human subcortical function can be indexed by several neuroimaging methods including functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and magnetoencephalography (MEG) (Chandrasekaran and Kraus 2010; Chandrasekaran et al. 2012a; Cheung et al. 2012; Rinne et al. 2008a; Steinmann and Gutschalk 2011). While several fMRI studies have examined subcortical processing of auditory signals, methodological issues reduce the effectiveness of this approach in studying brainstem function. First, brainstem nuclei are small, located deep inside the brain, and are relatively more susceptible to physiological noise than cortical structures (Guimaraes et al. 1998); second, the temporal precision of blood oxygenation level-dependent (BOLD) responses is on the order of seconds. At this resolution, it is difficult to study responses to fast temporal events (e.g., formant transitions in speech signals). However, recent methodological advances in imaging methods, specifically, high resolution neuroimaging and network analyses, suggest that fMRI methods will have high utility in the future (De Martino et al. 2013; Ress and Chandrasekaran 2013).

Currently, EEG and MEG methods provide the best temporal window into subcortical auditory function and they will form the basis of this review. Although brainstem responses can be recorded using MEG (Parkkonen et al. 2009), the significant decay of magnetic fields as a function of distance has largely restricted the use of MEG to the study of cortical function. EEG, on the other hand, is more sensitive to activity in the auditory brainstem, and has been extensively used to index brainstem function. EEG components that originate from the auditory brainstem can be recorded from the scalp. The auditory brainstem response (ABR, also referred to as brainstem auditory evoked responses) reflects ensemble activity of neuronal populations in the brainstem that are tuned to transient features of sound, such as the onset of sound (Hecox and Galambos 1974), as well as sustained, phase-locked activity to the auditory stimulus (referred to as the frequency-following response) (Chandrasekaran and Kraus 2010; Krishnan et al. 2010a; Smith et al. 1975, 1978; Sohmer et al. 1977). Both onset and sustained components reflect the output of brainstem and midbrain structures and encode stimulus-related information with high temporal and spectral precision (see Fig. 1, for example). In this review, we will discuss transient and phasic responses to complex auditory stimuli, and we will collectively refer to this response as cABRs (complex-ABRs) (Skoe and Kraus 2010b). cABRs are useful in examining the neural transcription of sound, and provide a valuable aperture into how auditory experiences transform the representation of linguistic and other behaviorally-relevant complex signals. In contrast to cortical responses to complex auditory stimuli, cABRs are more isomorphic to the acoustic signal (Fig. 1). 
Consequently, this biological index has been used to examine how the neural representation of sounds is shaped by language and music experience (Kraus and Chandrasekaran 2010; Krishnan et al. 2010c; Skoe et al. 2013; Marmel et al. 2011; Krizman et al. 2012), how sensory encoding is transformed by developmental and aging processes (Anderson et al. 2012, 2013b, c; Chandrasekaran and Kraus 2010; Skoe and Kraus 2013), and how the representations of behaviorally-relevant signals are altered in various clinical disorders (Bradlow et al. 2003; Hornickel et al. 2011; Hornickel and Kraus 2013; Russo et al. 2005; Tzounopoulos and Kraus 2009).

Fig. 1

The complex auditory brainstem response (cABR) (bottom) captures many of the temporal (left) and spectral (right) features of the stimulus (top). For illustration, the response to a spectrally-dynamic consonant-vowel (CV) speech syllable [da] is plotted. The first 50 ms of the stimulus represent the transition from stop-burst “d” to the sustained, steady-state vowel “a”. As seen in the spectra in the right panels, low frequencies, including those associated with pitch and timbre perception, are preserved in the cABR. In this example, the spectra reflect the envelope of the spectrally-dynamic CV transition.
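The spectral comparison described in the caption can be illustrated with a small sketch. It is not the analysis pipeline used in the studies cited here; the synthetic harmonic complex, the 100 Hz fundamental, the 2000 Hz sampling rate, and the helper names `tone_complex` and `amplitude_at` are all assumptions chosen for illustration. The sketch reads out the amplitude of a waveform at the fundamental frequency and its harmonics, which is one simple way a phase-locked response could be compared with its stimulus.

```python
import math

def tone_complex(f0_hz, n_harmonics, duration_s, fs_hz):
    """Synthetic harmonic complex; harmonic h has amplitude 1/h (illustrative)."""
    n_samples = int(round(duration_s * fs_hz))
    return [
        sum(math.sin(2 * math.pi * f0_hz * h * i / fs_hz) / h
            for h in range(1, n_harmonics + 1))
        for i in range(n_samples)
    ]

def amplitude_at(signal, freq_hz, fs_hz):
    """Amplitude at one frequency, from projections onto cosine and sine."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / fs_hz)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / fs_hz)
             for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

# Hypothetical stand-in for a recorded waveform: 100 ms, F0 = 100 Hz, 3 harmonics.
fs = 2000.0
waveform = tone_complex(100.0, 3, 0.1, fs)
for freq in (100.0, 150.0, 200.0, 300.0):
    print(freq, round(amplitude_at(waveform, freq, fs), 3))
```

In a real recording the same readout would be applied to the averaged response and to the stimulus separately, and noise would make the non-harmonic frequencies nonzero.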

Origins of the cABR

While the origin of the cABR includes multiple subcortical regions, previous studies suggest that the inferior colliculus (IC), the primary auditory midbrain nucleus, is a major neural source (Chandrasekaran and Kraus 2010; Krishnan et al. 2010a; Smith et al. 1975, 1978; Sohmer et al. 1977). As a convergence hub in the auditory system, the IC receives substantial efferent connections directly from the auditory cortex (AC) (Winer 2005, 2006; Winer et al. 1998) and is an obligatory station for bottom-up signals arising from other auditory brainstem nuclei. The IC is composed of three functional subdivisions. These include the tonotopically and periodotopically organized central nucleus, which receives bottom-up projections from various brainstem nuclei (Baumann et al. 2011), the multisensory lateral nucleus, and the dorsal nucleus, which receives a large proportion of the top-down corticofugal connectivity from primary and secondary auditory cortices (Winer 2006). In addition, the IC is connected with somatosensory cortex, other divisions of the midbrain (superior colliculus), cerebellum, as well as networks involved in vocalization and attentional processing (Huffman and Henson 1990). Thus, in terms of connectivity, the IC is a computational hub that is influenced by signals from a number of brain structures. The IC has a wide variety of neural cell types that are capable of representing sound with high fidelity. For example, within the central nucleus of the IC, six different cell types have been distinguished based on their discharge patterns (Peruzzi et al. 2000). A majority of these cell types show sustained discharges and others show transient discharges to sound onset. Sustained and onset responders have different cellular properties, but together, they are capable of representing complex auditory signals with high temporal precision. Not coincidentally, the IC is one of the most metabolically active neural structures in the human brain (Sokoloff 1977).

The focus in this article is the environmental conditions and the biological mechanisms that modulate the representation of the incoming signal at the level of the IC. We functionally define plasticity as a reorganization of the sensory signal as a function of experience. This reorganization can occur over a range of time scales—from a momentary, short-term change involving neural modulation to a more permanent long-term change involving reorganization of neural circuits. Here, experience is broadly defined, and extends from on-line context-dependent modulation to learning dependent changes that reflect long-term engagement with sound.

Neurobiological Mechanisms Underlying Subcortical Plasticity

Previous studies have demonstrated that the brainstem representation of speech sounds is sensitive to language experience (Krishnan et al. 2010b, c, 2005; Krizman et al. 2012). More recent studies show modulatory influences related to musical training (Musacchia et al. 2007; Parbery-Clark et al. 2009; Wong et al. 2007), and short-term auditory training (Carcagno and Plack 2011; Chandrasekaran et al. 2011; Song et al. 2008; Anderson et al. 2013c). Krishnan et al. (2005), for example, showed that native adult speakers of Mandarin Chinese, relative to native English speakers, have enhanced representation of Mandarin tone categories, as measured by the sustained, FFR component of the cABR. Similarly, native English speakers with long-term music training also demonstrated enhanced brainstem representation of Mandarin tone categories, suggesting that plasticity in the representation of speech signals is not restricted to linguistic experience (Wong et al. 2007; Bidelman et al. 2011) (see Chandrasekaran and Kraus 2010 for a review). Moreover, short-term sound-to-meaning training enhances brainstem encoding of speech, as reflected by cABRs (Chandrasekaran et al. 2012a; Russo et al. 2005; Song et al. 2008). In addition, there is emerging evidence that the brainstem is sensitive to statistical properties of the stimulus in real-time (Skoe and Kraus 2010a; Skoe et al. 2013), leading to (under some conditions) changes to the response throughout the recording. For example, Skoe and Kraus (2013) measured brainstem activity to a five-note melody containing a repeated note (E3-E3-G#3-B3-E4). Over the course of the 1.5 h recording, the response progressively increased in magnitude, with the enhancement being greatest for the repeating note (i.e., the note that was presented with the highest probability). Together, this line of research argues against the outdated view of the brainstem being a passive relay station. 
However, to date, there is no single theoretical framework that can account for the range and timescales of subcortical plasticity that have been documented in human and animal models.

Current Models of Auditory Subcortical Function

In current theoretical frameworks, two putative mechanisms are argued to account for modulatory influences on subcortical nuclei (Table 1): (1) local modulation, which reflects modifications initiated by operations within the subcortical circuitry (Gold and Knudsen 2000; Krishnan and Gandour 2009) and (2) top-down modulation, which reflects modulations initiated by the cortex via a feedback network (Suga 2008; Suga et al. 2000, 2002). In the next few paragraphs, we expand on these two mechanisms.

Table 1 Putative mechanisms underlying subcortical auditory plasticity

Local Modulation

Local modulation refers to changes in mechanisms local to subcortical structures as a result of intrinsic (within IC) cellular changes (Dahmen et al. 2010; Escabi et al. 2003), or modifications induced by changes downstream (e.g. cochlear trauma) (Mulders et al. 2010). Indeed, electrophysiological studies in animal models demonstrate that auditory midbrain neurons rapidly adapt to stimulus statistics (Dahmen et al. 2010; Dean et al. 2005; Escabi et al. 2003; Perez-Gonzalez et al. 2005). That is, midbrain neurons dynamically adjust their firing rates to the complex statistics (e.g. mean and variance) of the sounds being presented.

In humans, Krishnan et al. (2005, 2009b, 2010c) have argued that experience-dependent effects in scalp-recorded cABRs are driven by local reorganization of brainstem circuits to selectively enhance key stimulus features (via excitatory and inhibitory synaptic plasticity). As per this account, cross-language differences in subcortical function between Mandarin Chinese and English speakers are due to differential environmental exposure to particular signal features, in this case curvilinear changes to the fundamental frequency (F0), across the two groups (Swaminathan et al. 2008; Krishnan et al. 2005, 2009a). Mandarin Chinese, being a tone language, uses more dynamic changes in fundamental frequency than English, a non-tonal language. In support of Krishnan et al.’s proposal, Jeng and colleagues compared responses from adults and neonates and showed that FFR tracking of the F0 differed between Chinese- and English-speaking adults but not between neonates from China and the United States, suggesting that language-dependent differences depend on exposure not innate differences between English and Mandarin (Jeng et al. 2011). Thus, brainstem neural ensembles likely calibrate over time to preferentially encode frequently occurring signals in one’s auditory environment. This can be likened to local changes in barn owl midbrain following experience with prismatic spectacles that shift the visual field to accommodate the altered sensory input (Feldman and Knudsen 1997). Krishnan and colleagues find that language-dependent plasticity (e.g. Chinese > English) is especially pronounced in the response to frequently occurring aspects of speech (e.g. dynamic portions of pitch contours, that are more frequently occurring in tone languages such as Chinese), as a function of long-term listening experience. In the case of the local modulation model, the environment largely drives experience-dependent modulations of brainstem function. 
In support of the local modulation model, experience-dependent effects are not evident for linear approximations of Mandarin pitch trajectories (Xu et al. 2006). These authors argue that the lack of a cross-language difference in brainstem encoding of linear approximations, which are judged to be ‘within-category’ by native speakers, is related to the fact that these linear approximations are non-existent in the real-world. Since IC neurons have not been exposed to these stimuli, they are not equipped to enhance representation of these artificial approximations. Based on these results, it was concluded that such experience-dependent plasticity was not driven by top-down categorical effects. Rather, enhanced representation appears to be a function of the stimulus that the IC is ‘tuned’ to as a function of long-term listening experience. However, recent studies argue for top-down modulatory influences, and these will be discussed in the next section.

Top-Down Modulation

In contrast to local reorganization, this type of plasticity represents changes in subcortical activity resulting from cortical output. Top-down modulation is argued to be critical for ‘ego-centric’ selection, that is, to enhance signals considered behaviorally-relevant to the organism. Substantial work on animal models demonstrates unequivocally that brainstem processing of auditory signals can be modified by top-down cortical feedback (Suga 2008). One mechanism for top-down control is executed via corticofugal pathways, which are efferent feedback loops that back-project from primary and association auditory cortical regions directly onto the auditory brainstem (Winer 2006). In animal models, inactivation of the AC disrupts brainstem plasticity (Zhou and Jen 2007; Suga et al. 2000, 2002; Suga 2008). A recent study found that when the corticofugal pathways are selectively destroyed, auditory learning is severely impaired (Bajo et al. 2010). However, the selective lesion did not disrupt encoding of sounds learned prior to the lesion, suggesting that plastic effects are maintained by some form of locally-occurring mechanism (Bajo et al. 2010).

A number of studies have examined best frequency (BF) shifts, which reflect experience-dependent changes in the neuronal processing of sound frequency representation. When sounds are paired with a behaviorally-salient event (a shock, for example), experience-dependent reorganization of the IC frequency map is significantly larger than when reorganization is based on acoustic stimulation alone (Gao and Suga 1998). Thus, the behavioral relevance of a sound plays an important role in plasticity, and is established via cholinergic pathways that connect the nucleus basalis (NB) with the AC (Suga et al. 2000; Gao and Suga 1998).

While the exact role of the corticofugal system is unclear, various proposals endorse the role of this system in selective attention (Hairston et al. 2013; Krizman et al. 2012), in extracting the signal in noisy environments (Parbery-Clark et al. 2009), in promoting auditory learning (Bajo et al. 2010; Song et al. 2012; Skoe et al. 2013), or in providing a higher signal-to-noise ratio to the AC (Suga 2008). In addition, top-down modulation may also be important in mediating frequency map plasticity (Suga 2008). Several lines of research demonstrate that short-term training can result in enhanced neural responsivity to training stimuli (Kilgard 2012). This enhancement in neural responsivity has been hypothesized to be closely associated with behavioral changes (e.g. increased discrimination accuracy). However, sensory map expansion can reverse without negatively affecting behavior (Reed et al. 2011). A recent proposal suggests that the expansion-renormalization of sensory maps may result from greater recruitment of various cortical and subcortical circuitry during initial learning, followed by a pruning, where the most efficient circuit is selected (Kilgard 2012). It is, thus, conceivable that one role for the corticofugal pathway may be in selecting particular circuits within the brainstem that most efficiently encode the trained stimuli.

Context-Dependent Modulation

Context-dependent modulation refers to changes in stimulus encoding based on the immediate context in which the stimulus is presented. Such plasticity may be driven by local as well as top-down modulatory effects. One neurobiological mechanism underlying context-dependent effects is hypothesized to be stimulus-specific adaptation (SSA). SSA refers to reduced responsivity to a repeating stimulus, relative to a novel stimulus (Duque et al. 2012; Bauerle et al. 2011; Farley et al. 2010; Malmierca et al. 2009; Perez-Gonzalez et al. 2005; Ulanovsky et al. 2004). Experiments examining the extent of SSA typically use protocols (passive oddball stimulation) similar to those used to elicit preattentive change-detection components in the cortex. In passive oddball paradigms, rare sounds (deviants) are presented in the context of frequently occurring (standard) sounds. The cortical AEP change-detection component, called the mismatch negativity (MMN), is measured as a larger negative signal for the deviant relative to the standard, occurring between 150 and 300 ms post stimulus onset (Naatanen et al. 2007). Some authors have hypothesized that the SSA may be the cellular basis for the MMN since SSA was originally thought to be a cortical phenomenon (Ulanovsky et al. 2003, 2004). However, recent studies suggest that SSA occurs in the auditory thalamus and all three subdivisions of the IC as well (Duque et al. 2012; Bauerle et al. 2011; Malmierca et al. 2009; Perez-Gonzalez et al. 2005).
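The logic of the passive oddball paradigm and of stimulus-specific adaptation can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a model taken from the studies cited above: each stimulus is given its own adaptation state, the `depletion` and `recovery` rates are arbitrary, and the contrast index `(deviant - standard) / (deviant + standard)` is used here only as a convenient summary of how much larger the response to the rare sound is.

```python
import random

def oddball_sequence(n_trials, p_deviant, seed=0):
    """Random sequence in which the deviant occurs with probability p_deviant."""
    rng = random.Random(seed)
    return ["deviant" if rng.random() < p_deviant else "standard"
            for _ in range(n_trials)]

def simulate_ssa(sequence, depletion=0.3, recovery=0.05):
    """Mean response per stimulus type; response = 1 - adaptation state."""
    adaptation = {"standard": 0.0, "deviant": 0.0}
    responses = {"standard": [], "deviant": []}
    for stim in sequence:
        responses[stim].append(1.0 - adaptation[stim])
        for name in adaptation:
            if name == stim:
                # The presented stimulus adapts its own channel.
                adaptation[name] += depletion * (1.0 - adaptation[name])
            else:
                # The channel that was not stimulated recovers slowly.
                adaptation[name] -= recovery * adaptation[name]
    return {name: sum(vals) / len(vals) for name, vals in responses.items()}

def contrast_index(mean_response):
    """Normalized deviant-minus-standard difference (illustrative summary)."""
    d, s = mean_response["deviant"], mean_response["standard"]
    return (d - s) / (d + s)

means = simulate_ssa(oddball_sequence(1000, p_deviant=0.1))
print(means, contrast_index(means))
```

Because the frequently presented standard keeps its channel adapted while the rare deviant's channel has time to recover, the simulated deviant response is larger, which is the qualitative signature of SSA described in the text.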

The fact that SSA is observed in several critical auditory nuclei suggests that it may be fundamental to auditory processing. Being able to differentially encode novel stimuli in the context of repetitively presented stimuli may have a biological implication. Rare sounds, especially in the context of repetitively presented sounds may signify danger, and therefore encoding this information with vigor may be important for survival. Thus, it is possible that SSA reflects local processes that are primed to respond to any significant, immediate change in the sensory environment by adapting to repetitive stimulation. Malmierca and colleagues argue that since a large fraction of the neurons showing SSA are onset responders, it is unlikely that SSA in the IC is driven by top-down cortical modulation. The basis for this argument is that the adaptation occurs rapidly and likely before corticofugal processes can kick in. Interestingly, the majority of neurons demonstrating SSA are located in the dorsal cortex of the IC. Although the exact function of dorsal cortex of the IC is unclear, connectivity patterns suggest a critical role in top-down control of the auditory midbrain. Thus, a more sophisticated, top-down contribution may also play a role in the generation of SSA. Consistent with this proposal, the SSA is argued to be a top-down phenomenon, emerging within the AC, and inherited by subcortical structures (Nelken and Ulanovsky 2007). As a test of this prediction, Bauerle et al. (2011) created a pharmacological lesion of the AC and examined SSA at the thalamus (Bauerle et al. 2011). Their results show a substantial reduction in SSA at the thalamic nuclei, suggesting some level of top-down involvement. An alternative explanation is that the lesion encroached beyond cortex affecting lower structures, with the outcome resembling a top-down mechanism. 
In contrast, a follow-up study deactivated the AC by cooling, and did not find significantly diminished SSA in the thalamus, suggesting a local dependence for SSA (Antunes and Malmierca 2011). Similarly, deactivating the AC had some effects on the IC, but did not abolish SSA (Anderson and Malmierca 2013). At this point, the relative contribution of local and top-down networks to SSA is unresolved.

Another example of contextual modulation comes from studies examining the neural bases of dynamic range adaptation (Dean et al. 2005; Wen et al. 2009). Behaviorally, the auditory system is sensitive to a large range of sound pressure levels. However, auditory neurons are responsive to a much smaller range. Studies examining midbrain processing show that firing rates of midbrain neurons shift to the most probable sound level, thereby improving the precision of sound level encoding. Such range adaptation occurs at the level of the auditory nerve as well, but to a lesser extent than at the level of the midbrain, suggesting an enhancement of neural sensitivity to stimulus statistics at each ascending level of the auditory system (Wen et al. 2009). Such processing schema have implications for speech processing as well since most naturally-occurring sounds show sound level distribution patterns similar to those used in the dynamic range adaptation studies.
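Why shifting a limited dynamic range toward the most probable sound level improves level encoding can be sketched numerically. The following is a hedged illustration rather than the model used in the cited studies: the sigmoidal rate-level function, its slope, the exponential-moving-average adaptation rule, and the level distribution (80% of sounds near 75 dB SPL) are all assumptions made for this example.

```python
import math
import random

def rate(level_db, midpoint_db, slope=0.25, max_rate=100.0):
    """Sigmoidal rate-level function (spikes/s); parameters are illustrative."""
    return max_rate / (1.0 + math.exp(-slope * (level_db - midpoint_db)))

def sensitivity(level_db, midpoint_db, step_db=1.0):
    """Change in firing rate for a step_db change centered on level_db."""
    half = step_db / 2.0
    return rate(level_db + half, midpoint_db) - rate(level_db - half, midpoint_db)

def adapt_midpoint(levels_db, start_db=50.0, alpha=0.05):
    """Move the midpoint toward recent levels (exponential moving average)."""
    midpoint = start_db
    for level in levels_db:
        midpoint += alpha * (level - midpoint)
    return midpoint

def sample_levels(n, probable_db=75.0, seed=0):
    """80% of sounds fall near probable_db; the rest span 20 to 100 dB SPL."""
    rng = random.Random(seed)
    return [rng.gauss(probable_db, 3.0) if rng.random() < 0.8
            else rng.uniform(20.0, 100.0)
            for _ in range(n)]

adapted_midpoint = adapt_midpoint(sample_levels(2000))
print("sensitivity at 75 dB before adaptation:", sensitivity(75.0, 50.0))
print("sensitivity at 75 dB after adaptation:", sensitivity(75.0, adapted_midpoint))
```

Before adaptation the hypothetical neuron is nearly saturated at 75 dB, so a 1 dB change barely alters its firing rate. After the midpoint moves toward the probable level, the same 1 dB change falls on the steep part of the function.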

Modulatory influence of stimulus context on auditory signals appears to occur at every level within the auditory system, and may involve a complex interaction between local and top-down modulatory influences. However, no current theoretical model satisfactorily discusses the relative contribution of local processes versus top-down tuning in brainstem plasticity, a topic that has important clinical implications.

Predictive Tuning Model: An Integrative Account of Subcortical Auditory Plasticity

Here we posit an integrative account of subcortical plasticity that extends the predictive coding hypothesis, which has thus far been applied only to cortical responses. In cortical predictive coding models (Friston 2005), prediction is generated and imposed via top-down feedback signals. Each level in the sensory system receives input (bottom-up) from the level below, as well as the level above (top-down). Processing in a sensory region thus attempts to reach equilibrium between bottom-up sensory information and top-down predictions (priors). Prediction errors, which capture the extent of the mismatch between the prediction and sensory input, are continuously generated. The processing goal within each sensory level is thus to reduce mismatches between levels, i.e., to reduce prediction errors. Predictive coding is continuously operational in an online fashion, leading to a near instantaneous updating of predictions based on incoming information and the history of input.
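The core loop of predictive coding (compare the input with a prediction, then adjust the prediction to shrink the error) can be sketched with a single unit. This is a deliberately simplified illustration, not the hierarchical formulation of Friston (2005): the scalar input, the fixed `learning_rate`, and the function name `run_predictive_unit` are assumptions introduced here.

```python
def run_predictive_unit(inputs, learning_rate=0.2, prior=0.0):
    """Track an input stream with a prediction updated by its own error."""
    prediction = prior
    errors = []
    for x in inputs:
        # Prediction error: bottom-up input minus top-down prediction.
        error = x - prediction
        errors.append(error)
        # Update the prediction so that the next error is smaller.
        prediction += learning_rate * error
    return prediction, errors

# A stable input (1.0) that changes abruptly to a new value (3.0).
stream = [1.0] * 30 + [3.0] * 30
final_prediction, errors = run_predictive_unit(stream)
print("error at start:", errors[0])
print("error after 30 repeats:", errors[29])
print("error at the change:", errors[30])
```

While the input is stable, the error shrinks toward zero as the prediction converges. The abrupt change produces a large error that is again reduced over subsequent inputs, mirroring the continuous, online updating described above.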

Our extension of the predictive coding model integrates several decades of work on top-down modulation in animal models (Zhou and Jen 2007; Wu and Yan 2007; Winer 2006; Villa et al. 1991; Suga et al. 2000; Suga 2008; de Boer and Thornton 2008; Bajo et al. 2010), and more current perspectives gained from Bayesian models of visual and auditory processing (Friston 2012; Kumar et al. 2011; Feldman and Friston 2010; Garrido et al. 2008, 2009a, b; Hohwy et al. 2008). Current neural models of vision and audition postulate that perceptual processes are hierarchical, with each level of hierarchy influencing other levels (Ahissar and Hochstein 2004). Within the cortex, there is theorized to be a near-continuous, ongoing comparison between predictions based on past experience, with those generated by the incoming signal, a process carried out by feedforward and feedback connectivity. Thus, higher cortical regions fit learned abstractions onto sensory information from lower level structures (Rauss et al. 2011). The predictive tuning model has been developed to explain cortical function in general. However, given extensive feedforward and feedback connectivity between the cortex, limbic system, and brainstem, there is a neurobiological basis to extend this to understand subcortical plasticity. Expanding the predictive coding hypothesis, our model posits that there is a continuous, online modulation of brainstem encoding by the AC via corticofugal pathways (Chandrasekaran et al. 2009, 2012b; Chandrasekaran and Kraus 2010). As per our model, in a mature system, short-term changes in sensory processing within the IC are largely dominated by top-down corticofugal tuning, although local processes are still active. Top-down tuning is based on a predictive algorithm that constantly anticipates the incoming stimulus stream. Within a processing level, when the incoming stimulus matches the expectation, signal representation is enhanced. 
Enhancement may be in the form of selective enhancement of behaviorally-relevant signal properties, or in the form of greater inhibition of irrelevant details. In the case of the auditory brainstem, enhancements could reflect more synchronous phase-locking to the stimulus, or enhancement of aspects of the stimulus deemed behaviorally-relevant. When incoming stimulus fails to match expectation, signal representation is poorer at the level of the brainstem, and this results in a prediction error at the cortex. Subcortical representation of a signal is enhanced if prediction is accurate. Integrating the various forms of plasticity discussed before, our model (predictive tuning) makes the following predictions:

Plasticity and Development

Our review of the literature demonstrates the existence of multiple forms of experience-dependent subcortical plasticity. Therefore, the concept of bottom-up versus top-down modulation is, we believe, a false dichotomy. The more useful question is the relative dominance of bottom-up and top-down plasticity as a function of age. The classic studies on corticofugal plasticity (Suga and colleagues) have used mature, adult auditory systems as a model. These studies demonstrate the functional role of top-down processing on behaviorally-relevant signals. On the other hand, studies on auditory deprivation and congenital hearing loss show that higher-level constructs are abnormal without sufficient bottom-up input (Kral and Eggermont 2007). In the mature typically-developed auditory system, there is a general reduction in bottom-up plasticity and a greater emphasis on top-down mechanisms (Kral and Eggermont 2007). Thus, when sensory deprivation occurs early in life, top-down mechanisms do not develop fully and consequently play a less significant compensatory role (Conway et al. 2009; Skoe et al., in press). Consistent with Kral and Eggermont (2007), we posit that bottom-up plasticity is critical in establishing stored representations; once these representations are established and stable (in mature systems), they help guide novel learning via top-down mechanisms. In other words, during development, local modulation (bottom-up plasticity) may be a dominant form of plasticity that is critical in establishing higher-order behaviorally-relevant constructs (e.g. linguistic constructs such as phonological categories, words). Once the auditory system becomes mature, there is an increasing dependence on top-down modulation for novel learning.

Contextual Modulation

Contextual modulations in the form of SSA and dynamic range adaptations continue to operate as an animal matures. These mechanisms remain fundamental to novelty detection and to optimally and dynamically adjusting neural responsivity to stimulus statistics. Both mechanisms are present throughout the ascending pathway. However, the extent to which these mechanisms can be modulated or overridden by top-down signals is an open question. Experience-dependent processes may shift what is considered novel to the individual. As evidence of prediction, in healthy young adults, the cABR adapts to predictable stimulation, resulting in smaller amplitudes in predictable versus unpredictable stimulus conditions (Skoe et al. 2013). However, in individuals with substantial musical experience, or in individuals who are exceptional auditory learners, predictable auditory input is enhanced (Skoe et al. 2013; Parbery-Clark et al. 2011). This finding, in combination with our earlier work in language-impaired populations (Chandrasekaran et al. 2009), suggests that predictive coding manifests in the FFR along a continuum from adaptation to enhancement depending on the balance between bottom-up and top-down processes (Fig. 2) (Skoe et al. 2013).

Fig. 2

Effects of stimulus context and probability on the cABR. The target stimulus ([da], from Fig. 1) was presented under predictable (red) or unpredictable (black) conditions. In the predictable case (red), it was presented isochronously amidst identical stimuli (p = 1.00). In the unpredictable case (black), the target stimulus was one of eight sounds (p = 0.125) that were presented in pseudo-random order such that the sounds preceding or following the target were not predictable. These stimulus conditions affect how low-frequency information is captured in the auditory brainstem response of different human populations, as illustrated here. In lifelong adult musicians (bottom, left) and children who perform above average on standardized tests of reading ability (top, left), there is a boost in low frequency activity when the target stimulus is predictable (yellow circles indicate the region of difference). In contrast, in nonmusicians and “poor reader” children (<1 standard deviation below average on standardized reading tests), a different effect is observed. In nonmusicians, the contextual differences are not registered in the brainstem, whereas in poor readers there is a statistical trend for the unpredictable condition to elicit a larger response than the predictable one. For the good and poor readers, the effect is observed in response to the CV transition and for the adult musicians and non-musicians the effect is observed in response to the steady-state vowel, which is consistent with previous work in these populations. Modified from Chandrasekaran et al. (2009) and Parbery-Clark et al. (2011) (Color figure online)

Predictive Tuning

Predictive tuning may operate at multiple time-scales, based on the present context (i.e., statistical structure of the incoming novel sound stream) as well as past learning of statistically-probable combinations. Our model posits that predictions are facilitated when the current statistics align with past experience and learned representations. Specifically, corticofugal modulation is maximal when the incoming signal is (a) predictable; (b) behaviorally relevant; and (c) aligns with existing learned representations. Consistent with these predictions, the sustained component of the cABR has been shown to change depending on the statistical features of the auditory environment as well as long-term experience with signal properties (Chandrasekaran et al. 2009; Parbery-Clark et al. 2011; Slabu et al. 2012; Skoe et al. 2013; Skoe and Kraus 2010a; Marmel et al. 2011), with frequent sounds and sound combinations being represented differently than infrequent ones. This sensitivity to stimulus statistics reflects both the long-term history of input (Marmel et al. 2011; Skoe et al. 2013) and the statistics of incoming sensory input (Chandrasekaran et al. 2009; Parbery-Clark et al. 2011; Slabu et al. 2012; Skoe et al. 2013; Skoe and Kraus 2010a). Enriched experience with sound would likely increase the accuracy of predictive coding specific to that experience, whereas in the cases of impoverished experiences with sound, as might occur in clinical populations, accuracy is expected to be lower.

The strongest evidence for predictive coding comes from studies that show neural responsivity to an expected auditory signal, even though the bottom-up signal is never received (SanMiguel et al. 2013). We are not yet aware of such an experiment in the subcortical auditory domain, but there is evidence to suggest that higher-order percepts can substantially modify subcortical activity. For example, Galbraith and colleagues have shown that the FFR is modulated by perceptual processes. Rapidly and repetitively presented speech can often result in a mental transformation of the signal, a phenomenon referred to as ‘verbal transformation.’ FFRs recorded in individuals with high verbal transformation (indicated online by the participant) were different (smaller in amplitude) from those recorded from participants who indicated low verbal transformation (Galbraith et al. 1997). Similarly, FFRs recorded to backward speech (which is not intelligible) are different from recordings to forward speech, an effect that may be driven by phonological and prosodic expectations (Galbraith et al. 1995). Taken together, these results are suggestive of predictive modulations of the subcortical signal.

Role of Auditory Selective Attention

Selective attention may enhance the search for information-bearing elements within the signal via corticofugal pathways. Focused attention can modulate IC activity as measured by fMRI (Rinne et al. 2008b) as well as EEG (Galbraith and Arroyo 1993; Galbraith et al. 2003). Strong correlations have been demonstrated between behavioral measures of selective attention and FFR components (Ruggles et al. 2011; Krizman et al. 2012; Hairston et al. 2013). This type of attentional influence, we propose, may be distinct from the ongoing predictive tuning that is automatic and continuous. However, it is conceivable that under certain circumstances, the two types of modulations may be congruent and have a compounding effect on how brainstem activity is modulated.

Subcortical Plasticity in Clinical Populations

Thus far, we have focused on the positive aspect of experience-dependent brainstem plasticity by examining ‘expert’ auditory systems. However, experience-dependent auditory reorganization is not always beneficial. For example, cochlear trauma, in the form of noise exposure or hair cell damage, can result in substantial modification in cellular function (increased spontaneous discharges) within the central nucleus of the IC (Hatano et al. 2012; McAlpine et al. 1997; Kitzes 1984). Such local modifications (IC hyperactivity) have been linked to the percept of tinnitus (Bauer et al. 2008). Deficient subcortical encoding could also result from failures in the predictive coding process, in establishing higher order representations, or a combination of these mechanisms. From a clinical perspective, auditory brainstem responses can be acquired relatively quickly in difficult-to-test populations, and they provide a high degree of individual specificity as well (Chandrasekaran and Kraus 2010). A number of studies have demonstrated deficient brainstem encoding of complex auditory signals in clinical disorders of speech, language, and reading (Anderson et al. 2013c; Hornickel et al. 2011; Anderson and Kraus 2010b; Banai et al. 2009a; Russo et al. 2009b; Wible et al. 2004; King et al. 2002; Hornickel and Kraus 2013). However, the mechanisms underlying deficient brainstem encoding have been difficult to pinpoint. In the next section, we review findings from clinical studies that have used cABR as an index of auditory function. It is important to note that brainstem dysfunction is one of many neurological issues in clinical populations with auditory deficits. In fact, the role of auditory cortical deficits has been extensively studied using electrophysiological indices such as the mismatch negativity (Näätänen 2003; Bishop 2007). Although our review focuses on subcortical auditory function, we do not argue that behavioral dysfunction is necessarily caused by brainstem dysfunction. Rather, the goal is to understand influences on subcortical function from the study of clinical populations.

In individuals with dyslexia, a neurodevelopmental disorder of reading that affects 5–10% of all children, there is higher trial-by-trial variability in brainstem responses (Hornickel and Kraus 2013). This finding is consistent with an animal model showing high trial-to-trial variability in neural responses to speech stimuli elicited in a knockout mouse model that had reduced expression of Kiaa0319, a gene associated with dyslexia (Centanni et al. 2013). Further, these children demonstrate (a) less effective use of stimulus context to modulate ongoing brainstem activity (Chandrasekaran et al. 2009); (b) poorer brainstem encoding of speech signals in noise (Anderson et al. 2010b; Chandrasekaran et al. 2009); and (c) poorer brainstem encoding of formant structure and timing information within the speech signal (Banai et al. 2007, 2009b; Kraus 2001; Hornickel et al. 2012). In children with specific language impairment (SLI), a developmental disorder where language skills are affected without an obvious developmental delay in other cognitive domains or hearing loss, dynamically changing or rapidly presented information in the auditory signal is not well represented relative to children with typical language ability (Basu et al. 2010). Children with autism show deficient encoding of naturally occurring curvilinear vocal pitch trajectories (Russo et al. 2008, 2009a). In addition, the brainstem response to speech signals has been shown to be a significant predictor of speech perception in challenging listening environments (Anderson and Kraus 2010a; Anderson et al. 2010a, b, 2011, 2013b; Chandrasekaran et al. 2009; Parbery-Clark et al. 2011; Song et al. 2011b; Bidelman and Krishnan 2010). Deficient brainstem encoding in noise has been associated with poorer speech perceptual ability in younger and older adults (Anderson and Kraus 2010c; Anderson et al. 2012; Song et al. 2011a), in individuals with dyslexia (Chandrasekaran et al. 2009; Hornickel and Kraus 2013; Hornickel et al. 2009b), as well as individuals with auditory processing deficits (Billiet and Bellis 2011). Taken together, these results paint a complex clinical picture related to brainstem encoding of behaviorally-relevant signals in clinical populations. Why is a measure of brainstem transcription of sounds predictive of (a) reading ability (Banai et al. 2009b); (b) speech perception in noise (Anderson and Kraus 2010a); (c) language learning (Chandrasekaran et al. 2012b); (d) online auditory learning (Skoe et al. 2013); (e) auditory selective attention (Hairston et al. 2013; Krizman et al. 2012); and (f) phonological processing ability (Hornickel et al. 2009a, 2011; Hornickel and Kraus 2013)?

We argue that the complex clinical picture arises because brainstem responses capture multiple mechanisms that actively process the incoming signal. These mechanisms may be differentially affected, resulting in various clinical features. First, the local brainstem circuitry itself may be at fault, resulting in poorer representation of complex auditory signals within the auditory pathway. Second, the local brainstem circuitry may be ‘hyperactive’ as a result of abnormal plastic changes induced by defective bottom-up processes (Bauer et al. 2008; Mulders et al. 2010; Anderson et al. 2013a; Skoe et al., in press). This has been noted in cases of acoustic trauma to the cochlea, resulting in hyperactive IC responses, which may be a neurological basis for the percept of tinnitus. Third, a deficit in predictive coding may result in less robust brainstem responses (Chandrasekaran et al. 2009). It has been theorized that individuals with dyslexia and other language-based disorders may have a core deficit in the mechanisms underlying SSA (Chandrasekaran et al. 2009; Ahissar et al. 2006; Oganian and Ahissar 2012; Wijnen et al. 2012), and in extracting predictable elements from the environment (Evans et al. 2009). Such a deficit could result in ineffective contextual modulation of the incoming signal. Indeed, we showed that children with developmental dyslexia failed to ‘tune’ predictable signals, but did not differ from typical readers in contexts that were unpredictable (Chandrasekaran et al. 2009). Interestingly, we found that the ability to use prior context to modulate cABRs was highly correlated with performance in a speech-in-noise task, which required participants to tune into the signal while ignoring the background noise. Finally, deficient long-term stored representations may also lead to poorer brainstem responses, as a result of a failure to ‘tune’ the brainstem, via corticofugal pathways, to critical information-bearing elements within the signal. In such cases a clinical profile may typically show no deficit in quiet listening conditions, but the addition of background noise may disproportionately disrupt subcortical encoding. Thus, deficient or ‘fuzzy’ learned auditory category structures, as a result of neurological dysfunction or typical aging, could result in deficient brainstem encoding.

Determining the Neurobiology of Auditory Subcortical Deficits

In animal models, the relative contribution of top-down modulation versus local processes is discerned using pharmacological modifications, cortical cooling, or ablation methods. However, none of these invasive methods is appropriate in humans. How do we discern the relative contribution of local versus top-down tuning deficits in clinical populations? By modifying traditional analysis methods used in eliciting auditory brainstem responses and examining responses to stimuli of varying behavioral relevance, the underlying mechanisms may be unveiled. Traditionally, cABR recordings have involved repetitive presentation of thousands of stimuli. However, due to several advances in analysis procedures, recent studies examining cABRs have used more sophisticated designs borrowed from the study of cortical auditory responses (Chandrasekaran et al. 2009; Skoe and Kraus 2010a; Slabu et al. 2012; Skoe et al. 2013). This has opened the possibility of using paradigms that can target online processing. For example, the impact of contextual effects can be discerned by using a passive oddball paradigm and comparing responses to ‘standards’ and ‘deviants’ (Slabu et al. 2012). Finally, parametrically varying the behavioral relevance of the incoming signal and presenting signals in a variety of adverse listening conditions may be useful ways to target the role of higher-order constructs on subcortical signal encoding.
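The logic of the oddball comparison mentioned above can be illustrated with a minimal sketch. It is not the procedure of any cited study: the function names, the deviant probability of 0.1, and the requirement of at least two standards between deviants are all hypothetical choices made for illustration. The sketch builds a pseudo-random sequence of condition labels, averages epochs separately by condition, and forms a deviant-minus-standard difference wave.

```python
import random


def make_oddball_sequence(n_trials, deviant_prob=0.1, min_gap=2, seed=0):
    """Return 'standard'/'deviant' labels in pseudo-random order.

    deviant_prob and min_gap are illustrative assumptions. Because at
    least min_gap standards are forced between deviants, the realized
    deviant proportion is somewhat lower than deviant_prob.
    """
    rng = random.Random(seed)
    labels = []
    since_deviant = min_gap  # permit a deviant from the first trial onward
    for _ in range(n_trials):
        if since_deviant >= min_gap and rng.random() < deviant_prob:
            labels.append("deviant")
            since_deviant = 0
        else:
            labels.append("standard")
            since_deviant += 1
    return labels


def average_by_condition(labels, epochs):
    """Average equal-length epochs point by point, separately per label."""
    sums, counts = {}, {}
    for label, epoch in zip(labels, epochs):
        if label not in sums:
            sums[label] = [0.0] * len(epoch)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], epoch)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def difference_wave(averages):
    """Deviant average minus standard average, sample by sample."""
    return [d - s for d, s in zip(averages["deviant"], averages["standard"])]
```

In practice the two averages contain very different numbers of trials, so any comparison would also need to account for the unequal noise floors of the standard and deviant averages; the sketch leaves that step out.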

Model Limitations and Future Directions

Many of the propositions of the predictive tuning model are derived from invasive studies on animal models. These studies have yielded critical insights into the mechanisms underlying subcortical plasticity. However, in human models, invasive studies are not possible and our current state of knowledge is largely derived from far-field EEG recordings. This leads to several caveats. In animal models, the neural locus (e.g., spatial resolution) can be clearly established. Further, near instantaneous neural responsivity to auditory stimulation can be assessed. In contrast, spatial resolution of EEG is poor, and the noisy nature of the far-field method requires averaging across hundreds of trials. Thus, the EEG methodology may not be able to sufficiently capture intricate dynamics and subtle top-down effects that drive experience-dependent plasticity. Therefore, the extent to which many of the model proposals can be effectively tested in human models is unclear. Yet, we believe that our integrative model can serve as a test-bed to bridge the large gap between model systems. One way of moving forward is to use multimodal approaches in human studies to overcome the limitations of single methods. Recently, short-term experience-dependent plasticity in the human IC was effectively captured using fMRI and EEG responses collected from the same participant (although not simultaneously) (Chandrasekaran et al. 2012a). Since fMRI has reasonable spatial resolution, and EEG has excellent temporal resolution, the combined information provided by the two methods may more effectively inform mechanisms underlying experience-dependent plasticity.
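The need for trial averaging noted above can be made concrete with a small simulation. This is a sketch under simplifying assumptions, not a model of real recordings: the response is taken to be an identical sinusoid on every trial, the background noise is independent Gaussian noise, and all parameter values (amplitude, frequency, sampling rate, noise level) are hypothetical. Under these assumptions the residual noise in the average shrinks roughly in proportion to the square root of the number of trials, which is also why trial-specific dynamics are hard to recover from an average.

```python
import math
import random
import statistics


def clean_response(n_samples, amplitude=0.1, freq_hz=100.0, fs_hz=2000.0):
    """Hypothetical noise-free response: a small sinusoid."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / fs_hz)
        for i in range(n_samples)
    ]


def average_trials(trials):
    """Point-by-point average across equal-length trials."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]


def residual_noise_sd(n_trials, n_samples=200, noise_sd=1.0, seed=0):
    """SD of the noise left in the average after n_trials trials.

    Each simulated trial is the same clean response plus independent
    Gaussian noise; the residual is the average minus the clean response.
    """
    rng = random.Random(seed)
    signal = clean_response(n_samples)
    trials = [
        [s + rng.gauss(0.0, noise_sd) for s in signal]
        for _ in range(n_trials)
    ]
    avg = average_trials(trials)
    return statistics.pstdev([a - s for a, s in zip(avg, signal)])


if __name__ == "__main__":
    for n in (4, 100, 400):
        # Expected value is close to noise_sd / sqrt(n) under these assumptions.
        print(n, round(residual_noise_sd(n), 3), round(1.0 / math.sqrt(n), 3))
```

With a response ten times smaller than the single-trial noise, as assumed here, several hundred trials are needed before the residual noise falls below the response amplitude.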

Conclusions

While the concept of auditory subcortical structures as passive input–output pathways is outdated, current understanding of the mechanisms underlying subcortical plasticity in humans requires more substantial empirical work. Here, we outlined various forms of neural modulation evidenced in animal models. Specifically, local reorganization, contextual modulation, and experience-dependent modulations can all influence subcortical auditory processing. We posited the role of predictive tuning in mediating both local and top-down brainstem plasticity. The extent to which these mechanisms can be evaluated in humans may provide useful insights into the nature of auditory processing deficits in clinical populations.