
5.1 Introduction

Birdsong is an acoustic communication signal used in a wide range of contexts, including courtship interactions and territory advertisement. Song behavior varies substantially among the more than 5000 songbird species, with species-specific variation in vocal learning, sex-specific patterns in song use, the number of songs that one individual sings, and the acoustic features of song (see Sakata and Woolley, Chap. 1). Moreover, songbirds are one of only a few taxa that learn their songs. Consequently, many of the acoustic features of an individual’s song are unique to that individual, making song a signal that conveys individual identity in addition to species, sex, location, and breeding condition. Songbirds of both sexes use the unique songs of familiar individuals to maintain social relationships with mates and territory neighbors (Catchpole and Slater 2008). The immense diversity of song behavior across species and individuals provides an opportunity to identify functional relationships between the neural circuits for auditory processing and vocal communication behavior. For example, the unique songs of individuals can be used as probes to investigate the neural mechanisms of vocal perception, including those that underlie learning, memory, sensorimotor integration, vocal production, and mate choice.

Despite the impressive diversity of birdsong across species and individuals, common principles of auditory processing underlie song learning, perception, and production across species. In many songbird species, males learn to sing as juveniles and use their adult songs to court females and to engage in aggressive exchanges with other males (see Sakata and Yazaki-Sugiyama, Chap. 2). The acoustic features of male song convey honest information about reproductive fitness to listeners (Beecher and Brenowitz 2005; Richner 2016), and these features drive female attraction to males: females use song as a mate choice cue (Riebel 2009). Because females evaluate songs to choose the males that will contribute to the next generation, there is a premium on auditory processing by females and on song learning and performance by males (see Podos and Sung, Chap. 9).

While the importance of hearing for song perception is obvious, determining the importance of hearing during development, and the degree to which it shapes song production and perception, has required experimental studies that manipulate auditory experience and analyze the effects of those manipulations on song and preference behavior. Those studies have shown that auditory exposure to adult song is required for song to develop normally and that auditory feedback is required for both song development and maintenance (Brainard and Doupe 2000; see Murphy, Lawley, Smith, and Prather, Chap. 3). Additionally, song exposure is necessary for the display of some species-typical song preferences in adulthood (Lauay et al. 2004; Chen et al. 2017). The multiple ways in which song behavior depends on hearing illustrate that auditory coding is fundamental to song communication.

This chapter describes the structure and function of song with particular focus on the song-related auditory tasks that birds perform to perceive and process communication signals. In light of those behavioral functions, the chapter then describes the organization, connections, and information-coding properties of the auditory pallium with particular emphasis on its roles in species and individual recognition, tutor song learning, and mate choice processes. Throughout, the chapter highlights the homologies between avian and mammalian auditory systems and the unique advantages that songbirds afford to the study of auditory processing.

5.2 Structure of Song

The complex, multidimensional acoustic structure of birdsong provides the substrate for the diversity that has been documented across species and individuals. A bird’s song is a sequence of complex sound units, hierarchically organized into notes, syllables, motifs, and bouts (Fig. 5.1). Notes are the smallest acoustic units in song and may be produced alone or grouped in time to form syllables; syllables are therefore composed of one or more notes. Motifs (also called phrases or strophes) are stereotyped sequences of syllables. Birds of some species produce multiple different motifs; others repeat the same motif multiple times in a singing bout. Figure 5.1A shows a spectrogram of a zebra finch song bout in which the motif is repeated several times, with the notes, syllables, and motifs labeled.
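The hierarchy of notes, syllables, motifs, and bouts maps naturally onto nested data containers. The minimal Python sketch below is illustrative only: the class names, the helper `make_motif`, and all onset and offset times are hypothetical, chosen to mimic a bout in which a single three-syllable motif is repeated three times.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Note:
    onset: float   # seconds from bout start (hypothetical values)
    offset: float


@dataclass
class Syllable:
    notes: List[Note]  # one or more notes grouped in time

    @property
    def duration(self) -> float:
        return self.notes[-1].offset - self.notes[0].onset


@dataclass
class Motif:
    syllables: List[Syllable]  # a stereotyped sequence of syllables


@dataclass
class Bout:
    motifs: List[Motif] = field(default_factory=list)

    def counts(self) -> Tuple[int, int, int]:
        """Return the number of (notes, syllables, motifs) in the bout."""
        syllables = [s for m in self.motifs for s in m.syllables]
        n_notes = sum(len(s.notes) for s in syllables)
        return n_notes, len(syllables), len(self.motifs)


def make_motif(t0: float) -> Motif:
    """Hypothetical motif starting at t0: a one-note, a two-note, and a one-note syllable."""
    return Motif([
        Syllable([Note(t0 + 0.00, t0 + 0.08)]),
        Syllable([Note(t0 + 0.12, t0 + 0.17), Note(t0 + 0.18, t0 + 0.25)]),
        Syllable([Note(t0 + 0.30, t0 + 0.40)]),
    ])


# A bout in which the same motif is repeated three times
bout = Bout([make_motif(0.6 * i) for i in range(3)])
print(bout.counts())  # (12, 9, 3)
```

Storing timing at the note level lets syllable and motif durations be derived rather than stored separately, which keeps the levels of the hierarchy consistent.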

Fig. 5.1

Spectrograms of songs highlight differences in song structure between individuals and species. Song spectrograms from two different zebra finches (A, B; Taeniopygia guttata) and a closely related species, the long-tailed finch (C; Poephila acuticauda). Color indicates intensity: blue is low and red is high. Lines below the top spectrogram label the different components of song including notes (top), syllables (middle), and a motif (bottom)

As with the mating vocalizations of many animals, song structure is species specific. The spectrograms in Fig. 5.1 show the acoustic features that distinguish the songs of two closely related species: the zebra finch (Taeniopygia guttata) and the long-tailed finch (Poephila acuticauda). Whereas zebra finch song is characterized by harmonic and noisy syllables (Fig. 5.1A, B), long-tailed finch song is dominated by syllables with nearly tonal frequency-modulated sweeps (Fig. 5.1C).
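One common way to quantify the contrast between noisy syllables and nearly tonal sweeps is Wiener entropy (spectral flatness): the log ratio of the geometric mean to the arithmetic mean of the power spectrum, which is near 0 for flat, noise-like spectra and strongly negative for tonal sounds. The Python sketch below assumes synthetic signals (a 2 to 4 kHz linear frequency sweep and a white-noise burst), an assumed 32 kHz sampling rate, and an illustrative 0.5 to 8 kHz analysis band; it is not a reanalysis of the songs in Fig. 5.1, and the function name `wiener_entropy` is ours.

```python
import numpy as np


def wiener_entropy(x: np.ndarray, fs: float, fmin: float = 500.0, fmax: float = 8000.0) -> float:
    """log(geometric mean / arithmetic mean) of the power spectrum in [fmin, fmax] Hz.

    Close to 0 for a flat (noisy) spectrum; strongly negative for a tonal sound.
    The band limits are illustrative assumptions.
    """
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = power[(freqs >= fmin) & (freqs <= fmax)] + 1e-12  # small floor avoids log(0)
    return float(np.mean(np.log(band)) - np.log(np.mean(band)))


fs = 32000.0                          # sampling rate (Hz), assumed
t = np.arange(int(0.05 * fs)) / fs    # 50 ms synthetic "syllables"
f0, f1, dur = 2000.0, 4000.0, t[-1]

# Nearly tonal syllable: linear frequency sweep from f0 to f1
tonal = np.sin(2 * np.pi * (f0 * t + (f1 - f0) / (2 * dur) * t ** 2))
# Noisy syllable: broadband white noise
noisy = np.random.default_rng(0).standard_normal(t.size)

print(f"tonal sweep: {wiener_entropy(tonal, fs):.2f}")
print(f"noise burst: {wiener_entropy(noisy, fs):.2f}")
```

The tonal sweep concentrates its power in a narrow range of frequency bins, so its value falls far below that of the noise burst.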

Unlike the mating vocalizations of most animals, birdsong is learned, and song structure depends on auditory processing at every life stage (Konishi 2004; Woolley 2008). In addition to the extraction of social information from the environment via song perception, song learning requires auditory memory and auditory feedback of self-generated sounds during song practice (see Sakata and Yazaki-Sugiyama, Chap. 2). Songbirds that are deprived of hearing adult song as juveniles or are deafened at some point during development sing highly abnormal songs as adults (Brainard and Doupe 2000; Konishi 2004). Maintenance of normal adult song also requires auditory feedback: birds deafened as adults gradually lose normal song structure (Nordeen and Nordeen 1992; Woolley and Rubel 1997). This lifelong reliance on auditory processing for normal singing indicates that understanding song learning and production requires understanding the structure and function of the auditory pallium.

5.3 Functions

Vocal signals contain rich information about the signaler, including information about its species, individual identity, location, and motivational state. Receivers can use the information present in vocal signals to make decisions about social behaviors, including whether to attack or mate. How the auditory system extracts information from vocal signals and uses this information to guide social decision-making is a fundamental question in animal behavior and neuroscience.

5.3.1 Species Recognition

One way in which auditory processing guides behavior is by directing birds, including juveniles, to the songs of their own species. Changes in heart rate (Dooling and Searcy 1980), begging behavior (Nelson and Marler 1993), and movement (Stripling et al. 2003) serve as measures of arousal and indicate that songbirds discriminate between conspecific and heterospecific vocalizations. Comparisons of these measures during playback of different species’ songs suggest that a bird’s arousal increases most when it hears conspecific song. Thus, auditory preferences for conspecific song likely guide song learning. For example, young male zebra finches actively worked for playback of conspecific song over other songs during song learning (Adret 1993; Braaten and Reynolds 1999).

Early auditory preferences also guide a bird’s selection of song material to copy (Nelson 2000). Birds can learn heterospecific song from interactions with heterospecific adults during development (Immelmann 1969; Woolley and Moore 2011) and from audio presentation of heterospecific song (Baptista and Petrinovich 1984; Petrinovich and Baptista 1987). Moreover, when birds of some species copy heterospecific song, their copies are as accurate as those produced by birds copying their own species’ song. However, given a choice of templates, juveniles preferentially copy their own species’ songs over other songs (Marler 1970; Marler and Peters 1977). This selectivity occurs even when basic hearing sensitivity and song spectra are similar between species (Dooling and Searcy 1980; Okanoya and Dooling 1987). The findings that juveniles can copy heterospecific song but preferentially copy conspecific song indicate that song learning biases are not due to motor constraints. Instead, auditory mechanisms appear to be sensitive to the acoustic features that distinguish conspecific song from other sounds in the environment.

5.3.2 Individual Recognition and Auditory Memory

Complex social relationships, including those that require repeated interactions among the same individuals, benefit from the ability to remember social partners. Individual recognition reduces aggression, promotes cooperation, and stabilizes long-term social relationships (Tibbetts and Dale 2007). Songbirds interact in a number of behavioral contexts in which it is advantageous to identify individuals, and there has been substantial interest in understanding the role of song in individual recognition in those contexts (e.g., Stripling et al. 2003; Dai et al. 2018). The three best-studied contexts are territoriality, mate recognition, and tutor song memorization.

5.3.2.1 Territoriality: Recognizing Territory Neighbors

Song is used in territorial interactions in a number of songbird species. In particular, male songbirds often compete for breeding territories and use song to advertise their presence and defend their occupation of a territory (Catchpole and Slater 2008; Bradbury and Vehrencamp 2011). Males in adjacent territories will interact in bouts of singing and counter-singing to establish territory boundaries and, ultimately, a relatively stable social order (Beecher et al. 2000; Catchpole and Slater 2008). Novel males or challengers singing at the edge of a territory will initially provoke an aggressive response, which can include singing, counter-singing, and physical interactions (Brooks and Falls 1975; Catchpole and Slater 2008). However, as the contested boundary is resolved, male aggression decreases such that a song broadcast from a consistent location no longer provokes an attack (“dear enemy effect”, Fisher 1954; Temeles 1994). If either the song or the location of the song changes, aggression will be reinstated (Ydenberg et al. 1988; Beecher and Brenowitz 2005). These changes in aggressive behavior imply that territorial males are able to remember and integrate information about the location and identity of other males based on their songs.

5.3.2.2 Mate’s Song Recognition

Species that form long-lasting pair bonds, including monogamous species, require perceptual mechanisms for recognizing individuals. Individual recognition of a mate based on acoustic cues has been shown in a range of biparental bird species, including gannets (Sula bassana), laughing gulls (Larus atricilla), least terns (Sterna albifrons), eastern silvereyes (Zosterops lateralis), and zebra finches (Beer 1971; White 1971; Miller 1979a; Moseley 1979; Robertson 1996). In zebra finches, males and females form lifelong, socially monogamous pairs (Zann 1996), and females show strong preferences for the song of their mate over the songs of unfamiliar conspecifics (Clayton 1988; Woolley and Doupe 2008). These data indicate that females, like males, form stable auditory memories of song that can be used to identify individuals (Woolley and Doupe 2008).

5.3.2.3 Tutor Song Memorization

Song experience during development organizes long-term perception: adult males and females remember and show behavioral preferences for songs they encountered as juveniles (Miller 1979a; Riebel 2009). Both males and females memorize the songs of their fathers or tutors during development, and these memories persist into adulthood (Miller 1979b; Clayton 1988). These song memories then serve as acquired templates for sensorimotor learning. While the tutor song memory is important for song development in birds that learn to sing, its significance for females of species in which females do not learn to sing (e.g., zebra finches) is less clear. A female may sexually imprint on her father’s song, and this auditory learning can influence her attraction to particular song features or to regional dialects, thereby shaping mate choice decisions (Riebel 2009). Taken together, these data highlight the importance of auditory learning and the ability of both male and female songbirds to memorize the songs of particular individuals for use in social interactions.

5.3.3 Song Preference and Mate Choice

Changes in the performance of particular song features can provide information about the social context or motivational state of the signaler (Sakata and Vehrencamp 2012; Podos and Sung, Chap. 9). For example, male canaries (Serinus canaria) increase the number of “sexy syllables” (broadband, two-note syllables produced at a high repetition rate) in their songs when singing to females relative to when singing alone (Fig. 5.2A) (Vallet and Kreutzer 1995), and females prefer songs that incorporate more of these syllables (Fig. 5.2B, C) (Vallet et al. 1998). Similarly, male zebra finches produce songs that are longer, faster, and more stereotyped when courting females than when singing in isolation (Fig. 5.2D, E) (Kao and Brainard 2006; Sossinka and Böhner 1980). Female zebra finches generally prefer courtship songs to noncourtship songs, even when the singer is unfamiliar (Fig. 5.2F). Moreover, the strength of the courtship song preference is correlated with the magnitude of the difference between courtship and noncourtship songs in pitch stereotypy or spectral entropy (Woolley and Doupe 2008; Chen et al. 2017). Thus, females attend to and prefer particular vocal characteristics of songs.
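Pitch stereotypy of the kind compared between courtship and noncourtship songs is often summarized as the coefficient of variation (CV) of a syllable’s fundamental frequency across renditions, with a lower CV indicating greater stereotypy. The Python sketch below uses simulated fundamental-frequency values for one hypothetical harmonic syllable; the 650 Hz fundamental, the number of renditions, and the jitter levels are assumptions for illustration, not data from the cited studies.

```python
import numpy as np


def pitch_cv(ff_hz) -> float:
    """Coefficient of variation (SD / mean) of fundamental frequency measured
    for the same syllable across renditions; lower values = more stereotyped."""
    ff = np.asarray(ff_hz, dtype=float)
    return float(np.std(ff, ddof=1) / np.mean(ff))


rng = np.random.default_rng(0)
# Simulated renditions: courtship renditions are given less rendition-to-rendition
# jitter than noncourtship renditions (an assumption for illustration).
courtship = rng.normal(loc=650.0, scale=4.0, size=40)
noncourtship = rng.normal(loc=650.0, scale=12.0, size=40)

cv_courtship = pitch_cv(courtship)
cv_noncourtship = pitch_cv(noncourtship)
print(f"courtship CV = {cv_courtship:.4f}; noncourtship CV = {cv_noncourtship:.4f}")
```

To relate such differences to preference strength across many singers, one could compute the courtship-minus-noncourtship CV difference for each male and correlate it with a preference score (for example, with `np.corrcoef`); the preference assay itself is outside the scope of this sketch.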

Fig. 5.2

Courtship song preferences in female canaries (Serinus canaria) and zebra finches (Taeniopygia guttata). (A) Examples of six different canary song types, including two examples of broadband, rapidly trilled “sexy syllables” (#1 and #6). (B) Stimulus design used by Vallet and Kreutzer: a canary song type (here, #3) was embedded in a greenfinch song (het), and female responses were quantified as the number of copulation solicitation displays (CSDs). (C) CSD responses of female canaries to greenfinch songs embedded with each of the six canary song types shown in (A). Female canaries performed significantly more CSDs to greenfinch songs embedded with sexy syllables (#1 and #6; white bars) than with other song phrases (black bars). (D) In zebra finches, the courtship song contains the same complement of syllables but differs in song performance from the noncourtship song. Courtship songs are longer, faster, and more stereotyped than noncourtship songs (E). (F) Both mated females (Mated F) and unmated females (Unmated F) prefer the courtship song (gray bars) to the noncourtship song (white bars). Moreover, females prefer the courtship song even when it is from an unfamiliar singer (Unf song). (adapted with permission from Vallet and Kreutzer 1995 and Woolley and Doupe 2008)

5.4 Organization of the Avian Auditory Pallium

Vocal communication depends on the ability of receivers to acquire information from acoustic communication signals. The diversity of social tasks for which songbirds use acoustic signals and the evolutionary conservation of auditory circuitry make the songbird an excellent model system for investigating how the auditory system extracts information from vocal sounds to shape social development and communication skills. Although the avian pallium is not laminated like the mammalian neocortex, recent studies of circuit organization, neuron types, gene expression, and physiological response properties have demonstrated parallels in the organization and function of the avian pallium and the mammalian neocortex (Karten 2013; Calabrese and Woolley 2015). This detailed knowledge of neuroanatomy and regional organization provides a critical framework for understanding circuit function as it relates to communication behavior.

5.4.1 Neuroanatomy

The avian auditory pallium, located in the caudal forebrain, contains six major regions organized into contiguous fields of neurons. The regions are heavily interconnected, but they are distinguished by their projections, cell morphology, gene expression, and physiological response properties (Wang et al. 2010; Elliott and Theunissen 2011).

The avian auditory pallium consists of both primary and secondary auditory regions, akin to the primary and secondary regions of the mammalian auditory cortex. The primary auditory regions include Field L (made up of L1, L2a, L2b, and L3) and the caudolateral mesopallium (CLM) (all abbreviations appear in Table 5.1). The two secondary auditory regions are the caudomedial mesopallium (CMM) and the caudomedial nidopallium (NCM; see Fig. 5.3). Field L and the adjacent CLM form a layered structure in which the layers correspond to different regions: L1 and CLM are superficial regions, L2a and L2b are intermediate regions, and L3 is the deepest region. Direct input from the auditory thalamus (nucleus ovoidalis) arrives primarily in the intermediate region L2, and different subregions of the auditory thalamus innervate L2a and L2b (Vates et al. 1996). Neurons in L2 project to the more superficial regions L1 and CLM, to the proximal edge of the deeper region L3, and to a secondary auditory region, the NCM. The superficial CLM connects with multiple auditory regions, including reciprocal connections with each of the subregions of Field L and medial projections to CMM. In addition, the CLM projects to sensorimotor regions important for song production, including HVC (used as a proper name), HVC-shelf, the cup of the robust nucleus of the arcopallium (RA cup), and the nucleus interfacialis of the nidopallium (NIf) (Vates et al. 1996; Bauer et al. 2008). Moreover, within the CLM is a song-selective subregion known as nucleus Avalanche (Av), which has bidirectional connections with both HVC and NIf. In contrast, the CMM is heavily interconnected with both the NCM and the CLM and sends few projections outside the auditory system (Vates et al. 1996). Finally, the NCM connects most extensively with the CMM and the ventral intermediate arcopallium (AIV) and less extensively with other regions of the caudal nidopallium (e.g., caudoventral nidopallium) (Atoji and Wild 2009; Mandelblat-Cerf et al. 2014).
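The projections described above can be treated as a directed graph, which makes it easy to ask, for example, how many projection steps separate each region from the auditory thalamus. In the minimal Python sketch below, the edge list `EDGES` is a simplified paraphrase of this paragraph, not a complete connectome: reciprocal or "interconnected" pairs are entered in both directions, the NCM-to-AIV direction is an assumption, and subregions such as Av are omitted. Hop count is only a crude proxy for processing stage; NCM, for instance, sits two steps from nucleus ovoidalis in this graph even though it is a secondary region.

```python
from collections import deque

# Simplified projections paraphrased from the text (an assumption-laden sketch).
# Ov = nucleus ovoidalis (auditory thalamus).
EDGES = {
    "Ov": ["L2a", "L2b"],
    "L2a": ["L1", "CLM", "L3", "NCM"],
    "L2b": ["L1", "CLM", "L3", "NCM"],
    "L1": ["CLM"],
    "L3": ["CLM"],
    "CLM": ["L1", "L2a", "L2b", "L3", "CMM", "HVC", "HVCshelf", "RAcup", "NIf"],
    "CMM": ["CLM", "NCM"],
    "NCM": ["CMM", "AIV"],  # NCM -> AIV direction assumed
}


def hops_from(source: str) -> dict:
    """Breadth-first search: minimum number of projection steps from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        region = queue.popleft()
        for target in EDGES.get(region, []):
            if target not in dist:
                dist[target] = dist[region] + 1
                queue.append(target)
    return dist


for region, steps in sorted(hops_from("Ov").items(), key=lambda kv: (kv[1], kv[0])):
    print(steps, region)
```

A graph representation like this also makes it straightforward to add or remove connections as the anatomy is revised, without rewriting the analysis.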

Table 5.1 Abbreviations
Fig. 5.3

Circuitry of the auditory pallium. (A) Top: Nissl-stained image of a parasagittal section of the auditory pallium showing cell bodies (purple stain) and laminae (white). Regions of the primary auditory pallium (CLM, Field L including L1, L2a, L2b, and L3), the secondary auditory area NC and the sensorimotor region HVC are labeled. Bottom: Drawing of the same section, with colors corresponding to the laminar regions of auditory pallium to illustrate the laminar organization. Moving from rostrodorsal to ventrocaudal, CLM and L1 are in the superficial region (green); L2a and L2b are in the intermediate region (yellow); L3 is in the deep region (blue); and NC is the secondary auditory pallium (gray) (d, dorsal; c, caudal). (B) Circuit diagram of the auditory pallium. Colors correspond to those in (A) to indicate superficial (green), intermediate (yellow), deep (blue), and secondary (gray) regions. The diagram outlines inputs from the thalamic nucleus ovoidalis (black); local connections within the auditory pallium, including the primary auditory pallium (large blue square) and secondary auditory pallium (gray); and outputs to the arcopallium and sensorimotor and motor regions (black). AIV, ventral portion of the intermediate arcopallium; CLM, caudolateral mesopallium; CMM, caudomedial mesopallium; HVC and HVCshelf, used as proper names; L, subdivisions of Field L (L1, L2a, L2b, L3); NC, caudal nidopallium; NCM, caudomedial nidopallium; RAcup, cup portion of the robust nucleus of the arcopallium (adapted with permission from Calabrese and Woolley 2015)

Because these auditory regions lack the thinly laminated structure of the mammalian auditory cortex (Fig. 5.3A), there have been various hypotheses regarding the homology of avian and mammalian auditory systems (Karten 1969; Striedter 1997). However, studies of region-specific gene expression (Dugas-Ford et al. 2012), neuron types, and microcircuitry (Wang et al. 2010; Calabrese and Woolley 2015) support potential homologies between regions within the avian auditory pallium and specific layers of the mammalian auditory cortex. In particular, the different regions of the avian auditory pallium appear to be organized in a manner similar to mammalian cortical layers (Fig. 5.3B). For example, genetic markers that identify thalamo-recipient layer 4 neurons in the mammalian cortex are expressed in L2a and L2b, the thalamo-recipient regions of the avian auditory pallium (Dugas-Ford et al. 2012). Moreover, like the mammalian auditory cortex, the avian auditory pallium is organized into columns: neurons and their axons are restricted to a column that traverses the multiple regions of the pallium (Wang et al. 2010). Taken together, these data highlight striking similarities between the avian auditory pallium and the mammalian auditory cortex.

5.4.2 Selectivity and Receptive Fields across the Auditory Pallium

The similarities in connectivity between the mammalian sensory cortex and the avian auditory pallium are paralleled by functional similarities. Neurons in the intermediate region have the shortest first-spike latencies, and neurons in the secondary region NCM have the longest (Calabrese and Woolley 2015). These latency differences reflect the information-processing hierarchy in the pallial circuit and mirror differences in first-spike latencies across mammalian cortical layers (Atencio et al. 2009). As in the mammalian cortical circuit, response selectivity and the sparseness of population responses increase at each processing stage of the songbird auditory pallium (Fig. 5.4) (Schneider and Woolley 2013; Calabrese and Woolley 2015). In parallel with response selectivity, the receptive fields of individual neurons progressively increase in complexity along the hierarchy (Moore and Woolley 2019). Connectivity between putative excitatory and putative inhibitory neurons also differs by region: the connectivity patterns in intermediate, superficial, and deep regions of the songbird auditory pallium (Calabrese and Woolley 2015) map onto connectivity patterns between the same cell types in intermediate, superficial, and deep layers of the mammalian cortex (Hansen et al. 2012; Harris and Mrsic-Flogel 2013). Thus, comparable information-coding strategies of single neurons and neuronal populations in avian pallial regions and mammalian cortical layers suggest that birds and mammals have parallel, possibly homologous, auditory processing circuits.
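First-spike latency can be estimated directly from spike times recorded across repeated presentations of a stimulus. The Python sketch below takes the median first-spike latency across trials within an analysis window; the use of the median, the 200 ms window, the function name, and the spike times are illustrative assumptions, not the procedure of the cited studies.

```python
import numpy as np


def first_spike_latency(trials, onset: float = 0.0, window: float = 0.2) -> float:
    """Median latency (s) from stimulus onset to the first spike, across trials.

    Only spikes in [onset, onset + window) are considered; trials without such a
    spike are skipped. Returns nan if no trial contains a spike in the window.
    """
    latencies = []
    for spikes in trials:
        s = np.asarray(spikes, dtype=float)
        in_window = s[(s >= onset) & (s < onset + window)]
        if in_window.size:
            latencies.append(in_window.min() - onset)
    return float(np.median(latencies)) if latencies else float("nan")


# Hypothetical spike times (s) across three trials; stimulus onset at t = 0
fast_neuron = [[0.012, 0.030, 0.055], [0.011, 0.040], [0.013, 0.021]]
slow_neuron = [[0.028, 0.090], [0.031], [-0.050, 0.030, 0.120]]  # -0.050 s is a pre-onset spike

print(first_spike_latency(fast_neuron), first_spike_latency(slow_neuron))
```

Using the median rather than the mean limits the influence of occasional trials in which a spontaneous spike happens to fall just after onset.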

Fig. 5.4

Auditory responses across the auditory pallium. Raster plots of the responses of single neurons to the same song (spectrogram shown, top). Raster plots are organized from thalamo-recipient (L2a) to secondary (NCM) regions. Moving from L2a to NCM, activity becomes sparser and more selective (i.e., neurons respond to a smaller range of sounds in the song). Each tick mark represents a spike, and each row of a raster plot shows the response of the neuron to a single presentation of the sound

5.4.2.1 Primary Auditory Pallium

The regions of the primary auditory pallium, including Field L and CLM, are tonotopically organized. Studies using pure tone stimuli have found that Field L (Zaretsky and Konishi 1976; Heil and Scheich 1991) and CLM display isofrequency contours (Müller and Scheich 1985; Müller and Leppelsack 1985) and have identified multiple subcenters or tonotopic gradients within those areas. In addition, within each region there appears to be a mediolateral gradient of spectral tuning (Kim and Doupe 2011).

Beyond this simple tonotopic organization, studies using more complex sounds and reverse correlation techniques identified a basic set of spectrotemporal receptive fields (STRFs), which depict the acoustic features that drive a neuron to fire. Mapping STRFs revealed region-dependent variation in spectral and temporal tuning (Hose et al. 1987; Nagel and Doupe 2008). In particular, the thalamo-recipient region L2 contains neurons with the simplest receptive fields (Nagel et al. 2011; Kim and Doupe 2011), whereas neurons in the superficial (L1, CLM) and deep (L3) regions have more complex receptive fields.
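Reverse correlation can be illustrated with a simulated neuron. In the Python sketch below, a hypothetical linear-nonlinear-Poisson neuron with a known STRF is driven by a Gaussian white-noise "spectrogram," and its STRF is recovered as the spike-triggered average (STA) of the preceding stimulus. The white-noise assumption matters: for spectrally and temporally correlated stimuli such as song, the STA must be corrected for stimulus correlations (normalized reverse correlation), which this sketch omits. All array sizes, filter values, and the helper `lagged` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_lag, n_t = 16, 10, 20000  # frequency channels, time lags, time bins

# Hypothetical ground-truth STRF: excitation in a narrow band followed by delayed inhibition
true_strf = np.zeros((n_freq, n_lag))
true_strf[6:9, 2:4] = 1.0
true_strf[6:9, 5:7] = -0.6


def lagged(stim: np.ndarray, n_lag: int) -> np.ndarray:
    """Return X with X[f, lag, t] = stim[f, t - lag] (zero-padded at the start)."""
    n_f, n_time = stim.shape
    X = np.zeros((n_f, n_lag, n_time))
    for lag in range(n_lag):
        X[:, lag, lag:] = stim[:, : n_time - lag]
    return X


# Gaussian white-noise spectrogram stimulus (an assumption; see lead-in)
X = lagged(rng.standard_normal((n_freq, n_t)), n_lag)

# Simulated neuron: linear filtering, half-wave rectification, Poisson spiking
drive = np.tensordot(true_strf, X, axes=([0, 1], [0, 1]))
spikes = rng.poisson(0.5 * np.maximum(drive, 0.0))

# Reverse correlation: spike-triggered average of the stimulus history
sta = np.tensordot(X, spikes, axes=([2], [0])) / spikes.sum()

similarity = np.corrcoef(sta.ravel(), true_strf.ravel())[0, 1]
print(f"correlation between estimated and true STRF: {similarity:.3f}")
```

With Gaussian white noise, the STA is proportional to the linear filter even after the rectifying nonlinearity, which is why both the excitatory and the inhibitory subfields are recovered.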

Regional differences in the firing rate and selectivity of neurons correlate with the differences in receptive fields (Nagel and Doupe 2008; Calabrese and Woolley 2015). In general, single neurons in Field L and CLM fire in response to most conspecific songs, and their spikes are reliably time-locked to specific acoustic features in a sound (Fig. 5.4). A single neuron produces distinct spike trains in response to acoustically different songs because the acoustic features that match the neuron’s receptive field occur at different points in each song. Because receptive fields differ, the same sound evokes different responses from each neuron. In addition, receptive field complexity is closely related to a neuron’s response selectivity.

Song selectivity is often measured as the fraction of presented songs that do not evoke a response from a given neuron (Schneider and Woolley 2013; Calabrese and Woolley 2015). On average, neurons in the intermediate region L2 are significantly less selective (i.e., a smaller fraction of sounds do not evoke a response) than those at successive processing stages (Meliza and Margoliash 2012; Calabrese and Woolley 2015). Superficial-region and deep-region neurons (those in L1, L3, and CLM) produce more selective song responses with lower firing rates than do L2 neurons (Calabrese and Woolley 2015; Moore and Woolley 2019). Finally, while neurons throughout the primary auditory pallium may respond strongly to tones and modulated noise, they respond more robustly to song than to other sounds (Theunissen et al. 2004). Together, these region-specific differences indicate that tuning complexity and response selectivity increase along the primary pallial pathway.
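The selectivity measure described above can be written as a short function. The Python sketch below counts a song as evoking a response if its mean driven firing rate exceeds the spontaneous rate by more than two standard deviations; this response criterion, the function name, and all firing rates are hypothetical choices for illustration, and published studies may use different statistical tests.

```python
import numpy as np


def song_selectivity(driven, baseline, z: float = 2.0) -> float:
    """Fraction of songs that do NOT evoke a response (higher = more selective).

    driven:   array (n_songs, n_trials) of firing rates (Hz) during each song
    baseline: array of spontaneous firing rates (Hz)
    A song counts as evoking a response if its mean driven rate exceeds the
    baseline mean + z * baseline SD; this criterion is a hypothetical choice.
    """
    driven = np.asarray(driven, dtype=float)
    base = np.asarray(baseline, dtype=float)
    threshold = base.mean() + z * base.std(ddof=1)
    responsive = driven.mean(axis=1) > threshold
    return float(1.0 - responsive.mean())


baseline = [2.0, 3.0, 2.5, 3.5, 2.0, 3.0]
# Hypothetical mean rates (Hz) to 10 songs, repeated over 5 trials
l2_like = np.tile([[20], [15], [18], [12], [25], [9], [14], [16], [11], [3]], (1, 5))
ncm_like = np.tile([[2], [3], [15], [2.5], [3], [2], [8], [2.5], [3], [2]], (1, 5))
print(song_selectivity(l2_like, baseline), song_selectivity(ncm_like, baseline))
```

In this toy example, the L2-like neuron responds to 9 of 10 songs (selectivity 0.1), whereas the NCM-like neuron responds to only 2 (selectivity 0.8).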

5.4.2.2 Secondary Auditory Pallium

Neurons in the secondary regions, CMM and NCM, have more complex response properties than neurons in primary auditory areas. In particular, unlike Field L neurons, NCM neurons show broader, more multipeaked tuning. They are also driven less strongly by tones or noise-like sounds; instead, NCM neurons respond to more complex auditory features and exhibit pronounced selectivity for particular songs (Schneider and Woolley 2013; Yanagihara and Yazaki-Sugiyama 2016). For example, in European starlings (Sturnus vulgaris), CMM neurons responded selectively to learned rather than unlearned auditory objects (Gentner and Margoliash 2003; Jeanne et al. 2011). Similarly, in zebra finches, NCM neurons responded preferentially to the tutor song and/or the bird’s own song following sensory learning during development (Phan et al. 2006; Yanagihara and Yazaki-Sugiyama 2016).

In secondary sensory areas of both mammals and birds, classical STRFs fail to accurately describe responses to natural stimuli (Theunissen et al. 2000; Machens et al. 2004). For example, in songbird secondary auditory regions, STRF models can explain, at most, 30% of a neuron’s response to a stimulus (Sen et al. 2001). These data highlight the challenges inherent in modeling responses to sensory stimuli in regions beyond the primary auditory pallium. In particular, standard linear models do not capture nonlinear tuning properties and, therefore, do not yield accurate receptive field estimates for neurons further along the processing hierarchy. By contrast, approaches that incorporated information about the probability of sounds, rather than just the spike-evoking acoustic features, improved model predictions of neural responses (Gill et al. 2008; Lu and Vicario 2017). These and other alternatives to strictly linear models may provide new paths forward for measuring the receptive fields of neurons in these regions (see Sect. 5.8.4) (Gill et al. 2008; Lu and Vicario 2017).
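The gap between linear predictions and actual responses can be made concrete with a toy example. The Python sketch below fits the same linear model to two simulated neurons, one that sums two hypothetical acoustic features and one that responds only to their conjunction, and reports the fraction of response variance explained on held-out data. It illustrates why a linear receptive field can capture only a modest share of a nonlinear neuron’s response; it does not reproduce the analyses or the 30% figure cited above, and the features, weights, and noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
feats = rng.standard_normal((n, 2))  # two hypothetical acoustic features per time bin

# Neuron A sums the features (linear); neuron B responds only when both
# features are positive (a multiplicative, nonlinear conjunction).
resp_linear = 1.5 * feats[:, 0] - 0.5 * feats[:, 1] + 0.3 * rng.standard_normal(n)
resp_conj = np.maximum(feats[:, 0], 0) * np.maximum(feats[:, 1], 0) + 0.3 * rng.standard_normal(n)


def heldout_r2(features: np.ndarray, response: np.ndarray) -> float:
    """Fit a linear model (with intercept) on the first half of the data and
    return the fraction of response variance it explains on the second half."""
    design = np.column_stack([features, np.ones(len(features))])
    half = len(response) // 2
    coef, *_ = np.linalg.lstsq(design[:half], response[:half], rcond=None)
    resid = response[half:] - design[half:] @ coef
    return float(1.0 - resid.var() / response[half:].var())


r2_linear = heldout_r2(feats, resp_linear)
r2_conj = heldout_r2(feats, resp_conj)
print(f"linear neuron R^2: {r2_linear:.2f}; conjunctive neuron R^2: {r2_conj:.2f}")
```

Evaluating on held-out data, rather than on the data used for fitting, avoids overstating how well the linear model generalizes.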

Indeed, there are several factors beyond the selective responses to auditory objects or features that strongly modulate the activity of neurons in secondary auditory regions: stimulus history covering multiple syllables (Schneider and Woolley 2013; Lu and Vicario 2017), the acoustic environment (Terleph et al. 2008; Yang and Vicario 2015), and the behavioral salience of songs (Gobes et al. 2010). For example, repeated playback of a single song led to adaptation of both the electrophysiological (Phan et al. 2006) and immediate early gene responses (Mello et al. 1995) in the NCM (reviewed in Dong and Clayton 2009). However, the acoustic context in which the repetitions occur can affect the response (Kruse 2004; Lu and Vicario 2017). For example, if a repeated stimulus is played in a novel or unexpected context (e.g., playback of a familiar zebra finch song is unexpectedly embedded within a series of canary songs), responses to the familiar song can be enhanced (Lu and Vicario 2017). This enhanced response rapidly returns to the adapted rate when the stimulus is again played in a familiar context (Lu and Vicario 2017). Thus, NCM neuron responses not only provide a read-out of the auditory properties of a stimulus but also encode the probability of sounds or sound transitions. NCM neurons may even generalize probabilities across categories, predicting not only the probability of one auditory object based on its repetition but also the expectation for an entire class of sounds (Lu and Vicario 2017).
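Stimulus-specific adaptation of this kind is often summarized by fitting a line to the response across repetitions and normalizing the slope by the initial response, so that sites with different firing rates can be compared. The Python sketch below follows that general idea; the function name, the trial windows (first 5 trials for normalization, trials 6 to 25 for the fit), and the simulated exponential decays are assumptions for illustration rather than the exact procedure of the cited studies.

```python
import numpy as np


def adaptation_rate(responses, n_initial: int = 5, fit_trials=slice(5, 25)) -> float:
    """Slope of a linear fit of response vs. trial number over `fit_trials`,
    normalized by the mean response to the first `n_initial` trials.
    Negative values indicate adaptation (fraction of initial response per trial).
    The trial windows are assumptions, not a prescribed standard."""
    r = np.asarray(responses, dtype=float)
    trials = np.arange(r.size)
    slope = np.polyfit(trials[fit_trials], r[fit_trials], deg=1)[0]
    return float(slope / r[:n_initial].mean())


trial = np.arange(30)
# Hypothetical baseline-subtracted responses (spikes/s) to 30 repetitions of one song
fast_adapting = 20.0 * np.exp(-trial / 8.0) + 5.0
slow_adapting = 20.0 * np.exp(-trial / 40.0) + 5.0

print(adaptation_rate(fast_adapting), adaptation_rate(slow_adapting))
```

Under this measure, a more negative value indicates faster adaptation; the context effects described above would appear as a transient rise in response partway through an otherwise adapting series.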

While the challenges in characterizing the receptive fields of secondary auditory neurons have left the tuning in these regions incompletely described, tuning and response properties do appear to vary across the secondary auditory pallium. For example, there appear to be topographic differences within the NCM in the degree to which neurons habituate to repeated stimuli: dorsal and caudal regions of the nucleus show greater habituation than rostral or ventral regions (Chew et al. 1995; Mello et al. 1995). Taken together, these data hint at a topographic compartmentalization of function that could help explain the ability of secondary auditory neurons to perform somewhat disparate tasks, for example, the invariant coding required for auditory scene processing (Sect. 5.5) versus the rapid, stimulus-specific habituation associated with auditory memory (Sect. 5.6).

These findings highlight that, like the mammalian auditory cortex, the avian auditory pallium is organized into a discrete hierarchy of interconnected areas. As one moves from primary to secondary regions, which then project to motor and sensorimotor regions, firing becomes sparser and more selective, and linear models of receptive fields become poorer at estimating actual responses. The hierarchical transformation of song coding in the songbird auditory pallium is similar to transformations in sensory representations in other systems (Graham and Field 2007; Harris and Mrsic-Flogel 2013). In addition, auditory neurons higher up the hierarchy differentially respond to the acoustic context in which sounds are embedded. These changes in firing, receptive fields, and selectivity are functionally significant. As discussed in the upcoming sections, the emergence of sparse-firing neurons with greater selectivity contributes to a number of important abilities and behaviors, including processing complex auditory scenes (Sect. 5.5), memory formation and individual recognition (Sect. 5.6), and song preference and mate selection (Sect. 5.7). Moreover, these varied functions may themselves be important in elucidating the topographic organization of the secondary auditory pallium (Sect. 5.8.1).

5.5 Invariant Coding Pulls Signals out of Noise

The ability to attend to target sounds, such as a communication signal, in a complex acoustic background is critical for receivers (Bregman 1994; Bee and Micheyl 2008). In songbirds, individuals are able to identify particular songs occurring within complex acoustic scenes such as noise (Dent et al. 2009) and song choruses (Schneider and Woolley 2013). Solving this “cocktail party problem” may depend on the differences in neural firing between neurons in Field L and neurons in secondary auditory regions, in particular, the emergent sparse coding of sounds by NCM neurons (Moore et al. 2013; Schneider and Woolley 2013).

As described in Sect. 5.4, the auditory coding of complex sounds like birdsong dramatically transforms between the thalamo-recipient and higher pallial regions (Nagel and Doupe 2008; Woolley et al. 2009). Early in the cortical processing pathway, neurons respond nonselectively (i.e., each neuron responds to a high proportion of songs) and with many spikes throughout the stimulus because their receptive fields are linear and driven by simple acoustic features found in many complex sounds (Nagel and Doupe 2008; Woolley et al. 2009). This coding scheme results in a dense and redundant neural representation of a song or a chorus of multiple birds’ songs. However, the coding of songs transforms between L2 and subsequent regions, where firing is more selective and sparse because neurons there have nonlinear receptive fields whose responses vary with a variety of factors, including recent history (Sect. 5.4.2). At the highest levels of the auditory pallium, single-neuron responses are selective and characterized by few, highly reliable spikes in response to a song (Fig. 5.4). Because responses are so sparse, each neuron produces a highly distinct response pattern to each song, if it responds to the song at all. Higher cortical regions, therefore, represent songs in a sparse spiking code distributed across multiple neurons.
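The contrast between dense, nonselective coding and sparse, selective coding is often summarized with a single index per neuron. The sketch below uses a lifetime sparseness index, one common choice in sensory neuroscience, which is 0 for a neuron that responds equally to every song and 1 for a neuron that responds to only one song, together with the simpler fraction of songs that evoke a response. The indices, function names, and firing rates are illustrative assumptions, not the measures or data of the studies cited here.

```python
import numpy as np


def lifetime_sparseness(rates):
    """Lifetime sparseness of one neuron across n stimuli (songs).

    Returns 0 when the neuron responds equally to every stimulus (dense,
    nonselective) and 1 when it responds to a single stimulus only
    (sparse, selective). Rates are assumed to be non-negative driven
    rates (e.g., evoked minus baseline, clipped at zero).
    """
    r = np.asarray(rates, dtype=float)
    n = r.size
    if n < 2 or np.any(r < 0) or not np.any(r > 0):
        raise ValueError("need >= 2 non-negative rates, at least one > 0")
    activity_ratio = r.mean() ** 2 / np.mean(r ** 2)
    return (1.0 - activity_ratio) / (1.0 - 1.0 / n)


def fraction_responsive(rates, threshold=0.0):
    """Fraction of stimuli whose driven rate exceeds `threshold`."""
    return float(np.mean(np.asarray(rates, dtype=float) > threshold))


# Hypothetical driven firing rates (spikes/s) to the same 10 songs
dense_neuron = [22, 25, 19, 24, 21, 23, 20, 26, 22, 24]  # responds to every song
sparse_neuron = [0, 0, 31, 0, 0, 0, 0, 0, 0, 0]          # responds to one song

for name, rates in [("dense", dense_neuron), ("sparse", sparse_neuron)]:
    print(name, round(lifetime_sparseness(rates), 3), fraction_responsive(rates))
```

A Field L-like neuron that fires to most songs scores near 0 on both the sparseness index and selectivity, whereas an NCM-like neuron that fires to a single song scores near 1 on sparseness and responds to only a small fraction of songs.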

Selective and sparse neural coding may facilitate the representation of target sounds, for example, individual songs in complex scenes such as song choruses. Representing songs within complex scenes requires neurons to fire consistently over multiple presentations. Importantly, as discussed previously, there appears to be some topography within the NCM in how neurons respond to multiple presentations of the same song. In particular, while neurons in the dorsal and caudal NCM habituate following repeated playback of a song, sparse-coding neurons in the rostral NCM produce highly precise song responses; the temporal patterns of their responses are almost identical over multiple presentations of the same song (Schneider and Woolley 2013). Selective, sparse, and precise coding may facilitate the recognition of individual songs because selectivity is inversely correlated with the strength of responses to background sounds. For this reason, sparse-coding NCM neurons have been studied as a potential neural mechanism for solving the cocktail party problem (Moore et al. 2013; Schneider and Woolley 2013).

Specifically, sparse-coding NCM neurons produce very similar responses to one song presented alone and to that same song presented in combination with background sounds (e.g., chorus, songs, noise). These responses are referred to as background invariant and have the potential to accurately represent a target vocalization in an acoustic scene composed of vocalizations from many other individuals. Schneider and Woolley (2013) tested the relationship between NCM neuron responses and behavioral recognition of target songs that were presented with varying levels of background chorus. The signal-to-noise ratio (song relative to chorus) of acoustic scenes was varied while birds completed song recognition tasks, which revealed the signal-to-noise ratios that permitted correct identification of target songs in those scenes. In the same birds, sparse-coding neurons in the rostral NCM produced the same sparse responses to songs alone and to those songs embedded in acoustic scenes, as if the background choruses were absent. While consistent firing patterns were observed at signal-to-noise ratios that permitted behavioral recognition, these same neurons stopped firing at signal-to-noise ratios that were too low for behavioral identification of a target song. These results demonstrate that the responses of sparse-coding NCM neurons parallel perceptual recognition of target songs in acoustic scenes, providing a potential neural solution to the cocktail party problem.
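To make the signal-to-noise manipulation concrete, the sketch below scales a background so that the power ratio of target to background equals a requested value in decibels and then sums the two. It assumes a mean-power (RMS) definition of signal-to-noise ratio, a sine tone standing in for a recorded song, and Gaussian noise standing in for a chorus; the studies described above may have defined and calibrated their signal-to-noise ratios differently, and the function names are hypothetical.

```python
import numpy as np


def mix_at_snr(target, background, snr_db):
    """Add `background` to `target`, scaled so that the target-to-background
    power ratio equals `snr_db` (decibels). Returns (mixture, scaled_background)."""
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)
    if target.shape != background.shape:
        raise ValueError("target and background must have the same shape")
    p_target = np.mean(target ** 2)
    p_background = np.mean(background ** 2)
    # Background power needed for the requested ratio: p_target / 10**(snr_db / 10)
    gain = np.sqrt(p_target / (p_background * 10 ** (snr_db / 10)))
    scaled = gain * background
    return target + scaled, scaled


def measured_snr_db(target, background):
    """Power ratio of target to background, in decibels."""
    return 10 * np.log10(np.mean(np.square(target)) / np.mean(np.square(background)))


fs = 44_100                                            # assumed sampling rate (Hz)
t = np.arange(fs) / fs                                 # 1 s of samples
song = np.sin(2 * np.pi * 3000 * t)                    # stand-in for a recorded song
chorus = np.random.default_rng(0).standard_normal(fs)  # stand-in for a chorus

for snr in (10, 0, -10):
    mixture, scaled_chorus = mix_at_snr(song, chorus, snr)
    print(snr, round(measured_snr_db(song, scaled_chorus), 2))
```

Lowering `snr_db` leaves the target unchanged and raises the background level, which mirrors the logic of presenting the same song against progressively louder choruses.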

5.6 Secondary Auditory Pallium as a Potential Substrate for Song Memory

Given the importance of memorizing song for vocal learning and individual recognition, songbirds offer a compelling model for understanding the encoding of auditory memory. Indeed, one of the greatest challenges in songbird research has been to identify the neural site(s) in which the tutor song template is stored. The secondary auditory area NCM has been of particular interest. Early studies argued that the habituation of neural activity in NCM to repeated presentations of the same song was indicative of a song memory trace in the NCM (Chew et al. 1995; Phan et al. 2006). The habituation of auditory responses is specific to song: there is no habituation to tones. This implies that the changes in activity are not a consequence of general adaptation of the auditory system to repeated stimuli but could instead be related to the encoding of a song memory. In the following sections, the data supporting the role of the NCM in auditory memory are described, and some remaining questions regarding the contribution of NCM to auditory memory formation are considered.

5.6.1 Song Memory Formation in Adulthood

Unlike the noise-invariant responses of neurons in the rostral portion of the NCM, the largest response of dorsal and caudal NCM neurons to an individual song occurs with the first playback, and responses to the same song decrease over repeated playbacks (Chew et al. 1995; Mello et al. 1995). The degree of habituation of firing or immediate early gene expression over presentations of the same song depends on both the number of consecutive playbacks the bird originally experienced and the duration of time between playbacks (Mello et al. 1995). For example, responses to a song are decreased only slightly following ten consecutive playbacks, but responses are almost completely abolished following 200 consecutive playbacks (Kruse et al. 2000).
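Habituation of this kind is often summarized as a rate: the slope of a neuron’s response across repeated playbacks, normalized by its initial response so that neurons with different firing rates can be compared. The sketch below is one illustrative version of such a metric; the normalization window (`n_initial`), the use of all trials in the linear fit, and the response values are assumptions rather than the exact procedures of the studies cited here.

```python
import numpy as np


def habituation_rate(responses, n_initial=5):
    """Normalized slope of response versus playback number.

    Negative values indicate responses that decline with repetition
    (habituation); values near 0 indicate stable responses. Units are
    fraction of the initial response per playback.
    """
    r = np.asarray(responses, dtype=float)
    if r.size <= n_initial:
        raise ValueError("need more trials than n_initial")
    initial = r[:n_initial].mean()
    if initial <= 0:
        raise ValueError("initial response must be positive")
    slope, _intercept = np.polyfit(np.arange(r.size), r, deg=1)
    return slope / initial


playbacks = np.arange(100)
song_responses = 20.0 - 0.1 * playbacks  # hypothetical: declines with repetition
tone_responses = np.full(100, 20.0)      # hypothetical: no habituation

print(habituation_rate(song_responses), habituation_rate(tone_responses))
```

Normalizing by the initial response is a design choice that lets a weakly and a strongly driven neuron be compared on the same scale; an unnormalized slope would instead favor neurons with high firing rates.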

The habituation of the neural response in both immediate early gene expression and electrophysiology was proposed to represent a memory trace (Chew et al. 1995; Phan et al. 2006). Consistent with this proposal, male zebra finches exhibit song recognition learning after passive song playback (Stripling et al. 2003; Dai et al. 2018). The time course of song recognition learning parallels the time course of changes in neural activity in response to repeated song playback (Mello et al. 1995; Stripling et al. 2003). On both the behavioral and neural levels, memory lasts at least a day and, in some cases, can be long-lasting (Miller 1979a,b). For example, female zebra finches show strong preferences for the song of their mate even after weeks of separation from their mate (Woolley and Doupe 2008), and such lasting preferences for familiar song require an enduring memory trace. Consistent with a role for the NCM in long-term song memory, immediate early gene expression in the NCM of females is lower in response to hearing their mate’s song than in response to hearing the songs of unfamiliar males up to several weeks after separation from mates (Woolley and Doupe 2008).

5.6.2 Memory of Tutor Song

One of the longest-lasting auditory memories in songbirds is that of the tutor song (see also Sakata and Woolley, Chap. 1). Emberizine sparrows, such as swamp sparrows (Melospiza georgiana), provide a particularly striking example of the endurance of tutor song memory. Juvenile swamp sparrows memorize the song of their tutor in the late summer or fall; however, they only begin practicing to produce those songs in the following spring, months after they were exposed to their tutor song (Marler and Peters 1981). This indicates that there must be an enduring trace of the tutor’s song that allows for accurate song imitation in these birds.

Activity in the auditory forebrain has been implicated in both the formation of tutor song memory and the adult recall of tutor song. Both adult male and female zebra finches prefer the tutor song over unfamiliar songs, and lesions of the NCM significantly reduced the strength of the tutor song preference (Gobes and Bolhuis 2007). Moreover, in adult male zebra finches, the fraction of tutor song that is copied is correlated with immediate early gene expression in the NCM in response to the tutor song (Bolhuis and Gahr 2006). Tutor song playback also differentially increased the expression of immediate early genes, such as EGR1, in the CMM of adult female zebra finches (Terpstra et al. 2006). This differential response to tutor song was also observed in juvenile zebra finches: EGR1 responses in both the CMM and NCM were greater for tutor song than for novel song (Bolhuis and Gahr 2006; Gobes et al. 2009). Taken together, these data support a potential role of the NCM and CMM in storing tutor song memory.

Further evidence for a role of the auditory forebrain in tutor song memory comes from experiments manipulating the molecular pathways that regulate the expression of EGR1 (see London, Chap. 8). Specifically, the gene product ERK is part of a molecular pathway critical for memory formation that lies upstream of EGR1 (London and Clayton 2008). In a series of elegant experiments, London and Clayton demonstrated that blocking the ERK pathway during developmental song tutoring leads to poor imitation of the tutor song. The effect does not appear to be a consequence of the disruption of hearing or sensorimotor practice; rather, it specifically results from interfering with song memorization. While the infusion of the ERK inhibitor affected EGR1 induction in both the NCM and CMM and thus precluded attributing the effect specifically to either region, these data provide compelling support for the role of secondary auditory regions in tutor song memory formation.

Taken together, these studies indicate that activity in the NCM and CMM often parallels behavioral measures of learning and memory. However, detailed understanding of the coding properties of these regions remains incomplete, in part, because of variation in the approaches used. For example, analyses of immediate early gene expression have been pivotal in establishing that there is molecular habituation and that the extent of this habituation can vary across auditory regions (NCM versus CMM). However, the limited range of stimuli used in immediate early gene studies and the absence of expression of immediate early genes, like EGR1, in the primary auditory pallium have hindered the use of these methods in providing a more complete understanding of song memory formation (but see Horita et al. 2010; Horita et al. 2012).

Lesion and manipulation studies have been significant in demonstrating the importance of the auditory forebrain for particular memory tasks, but discretely affecting a single neural locus remains a challenge, as does distinguishing manipulations that affect sound processing from those that affect memory. Finally, while electrophysiological approaches enable comparisons within a single neuron across a broader array of stimuli and provide needed insight into how auditory memories are encoded, more studies that couple neurophysiological recordings with behavior are necessary to better understand memory coding.

5.7 Neural Mechanisms of Song Preference and Mate Selection

Across a diversity of songbird species, male song serves to attract females (Andersson 1994; Catchpole and Slater 2008). Both field and laboratory studies have found that song can lead females to approach a male or a speaker (Eriksson and Wallin 1986; Woolley and Doupe 2008). Similar studies have found that females will call back in response to hearing songs (Dunning et al. 2014; Chen et al. 2017) and will perform operant tasks (e.g., perch hopping, string pulling) to hear playback of song (Riebel 2009; Schubloom and Woolley 2016). Females show preferences for particular song categories: preference for conspecific over heterospecific songs (Searcy and Brenowitz 1988; Riebel 2009) and for courtship over noncourtship songs (Vallet and Kreutzer 1995; Woolley and Doupe 2008). Such categorical preferences are generally shared across females and are often correlated with particular song features. For example, female zebra finches prefer songs with less variability in pitch across syllable renditions and less within-syllable spectral entropy (Woolley and Doupe 2008; Chen et al. 2017). Thus, behavioral responses to song have been widely used to assess female song preferences that can ultimately affect female mate choice (Riebel 2009).

One approach to studying the neural basis of song preference has been to use neural tuning to uncover song features that influence female preferences for song. To this end, a number of studies have measured behavioral responses to songs that differ in a particular feature space and then played those songs back to assess whether particular regions of the auditory forebrain showed differential expression of immediate early genes in response to songs that differ in particular features (Leitner et al. 2005; Woolley and Doupe 2008). For example, EGR1 expression in CMM is increased in response to salient or preferred songs, including courtship song in zebra finches (Woolley and Doupe 2008; Chen et al. 2017), and EGR1 expression in CMM increased in response to songs containing “sexy” syllables in canaries (Leitner et al. 2005). Both studies raised the possibility that the CMM is involved in discriminating song quality or salience.

However, one challenge has been deciphering whether differential neural responses reflect differences in preferences or are simply a result of differences in acoustic features between preferred and unpreferred stimuli. In the case of the CMM, additional studies have found instances in which the expression of EGR1 in the CMM was uncoupled from behavioral preference for calls (Gobes et al. 2009) and for some songs (Chen et al. 2017; Van Ruijssevelt et al. 2018). For example, unlike their normally reared counterparts, females reared without developmental song exposure (song naïve) do not consistently prefer courtship song over noncourtship song. However, similar to normally reared females, EGR1 in the CMM also increased in song-naïve females in response to courtship song compared to noncourtship song (Chen et al. 2017).

In another study, there was not only a disconnect between the behavioral preferences and neural responses in CMM, but the nature of the neural response also provided insight into the features that may be attended to by CMM. In particular, Van Ruijssevelt et al. (2018) measured neural (BOLD fMRI) and behavioral responses to courtship and noncourtship song and to stimuli that manipulated temporal and spectral features of song. The manipulated stimuli contained the characteristic temporal features of courtship song, and BOLD responses in the CMM clustered stimuli on the basis of those temporal acoustic features; behaviorally, however, the birds differentiated between the manipulated stimuli and the unmanipulated courtship song (Van Ruijssevelt et al. 2018). The temporal structure of song also affected EGR1 expression in the CMM (Lampen et al. 2014), and temporal cues were more important than spectral cues in single-unit auditory responses in a target of CMM, the sensorimotor nucleus HVC (Theunissen and Doupe 1998). Together, these data raise the possibility that the CMM and its targets are biased toward temporal information (Woolley and Rubel 1999; Woolley et al. 2005).

Activity in the NCM is also correlated with song preferences. For example, EGR1 expression in the NCM, but not the CMM, of female starlings was higher following playback of long songs, which females generally prefer, than following playback of short songs (Gentner et al. 2001). Similarly, EGR1 expression in the NCM of female zebra finches was higher in response to the preferred courtship song than to the less preferred noncourtship song (Chen et al. 2017); however, this difference is modulated by familiarity. Whereas EGR1 differences in the NCM were observed between unfamiliar courtship and noncourtship songs (Chen et al. 2017), EGR1 expression in the NCM did not differ between familiar courtship and noncourtship songs (Woolley and Doupe 2008).

Finally, both behavioral preferences and neural responses are shaped by social and acoustic experience during development. For example, female song sparrows preferred the dialect in which they were reared over the dialect of their genetic parents (Hernandez and MacDougall-Shackleton 2004), and female zebra finches preferred the songs of the subspecies with which they were reared over those of their genetic parents (Clayton 1990). Moreover, females reared without developmental song exposure showed atypical song preferences as adults. Unlike normally reared females, song-naïve female zebra finches preferred the songs of isolate males (who lack multiple acoustic features of learned song) and had significantly fewer dendritic spines per unit length in the NCM compared to normally reared females (Lauay et al. 2004). Similarly, song-naïve females showed aberrant song preferences and no difference in EGR1 expression in the NCM for courtship versus noncourtship songs (Chen et al. 2017). Whereas electrophysiological studies have indicated that the early acoustic environment has subtle but significant effects on the responses of neurons in Field L of females (Hauber et al. 2013), characterizing the degree to which the responses of neurons in the secondary auditory regions are shaped by developmental song exposure will provide needed insight into the mechanisms by which social and auditory experience shape song preference.

Thus, female songbirds show preferences for particular acoustic features and auditory objects. These preferences go beyond just the ability to discriminate sounds and can be shaped by auditory and social experiences both during development and in adulthood. Activity in both the CMM and NCM has been associated with different aspects of song preferences, though further work is necessary to better delineate the circuitry involved in song preference decisions and mate choice.

5.8 Future Directions

Neuroanatomical tracing (Wang et al. 2010), gene expression (Dugas-Ford et al. 2012), and targeted electrophysiological recordings (Kim and Doupe 2011; Calabrese and Woolley 2015) have resulted in a detailed understanding of the connectivity and coding properties of the avian auditory system and have facilitated comparisons to the mammalian auditory cortex. While these approaches have led to greater recognition and appreciation of the similarities in structure and function of auditory systems across species, as well as better depictions of general principles of auditory coding, much remains to be discovered about the organization and function of songbird auditory circuits.

5.8.1 Mapping of Secondary Auditory Areas

Electrophysiological mapping of Field L and CLM has revealed the detailed structure of spectral and temporal response properties both within and between regions; however, adopting similar approaches in the secondary regions, including the NCM and CMM, has been more challenging. Responses of NCM and CMM neurons are sparse, selective, plastic, and highly nonlinear; thus, activity in these regions is poorly characterized by linear models such as STRFs (Meliza and Margoliash 2012; Schneider and Woolley 2013). In starlings, song notes interact facilitatively and suppressively in driving the spiking responses of CMM neurons, making it difficult to predict CMM neuron responses to songs from responses to single notes presented individually (Meliza and Margoliash 2012). Similarly, EGR1 responses in the secondary auditory pallium to whole canary song were not re-created in the responses to the individual components that make up the song (Ribeiro et al. 1998). While neurons in these regions often show parallels between neural and behavioral responses to stimuli, the way in which they encode information remains elusive. For example, many of the brain-behavior correlations rely on comparing responses to pairs or small numbers of behaviorally relevant stimuli (e.g., conspecific versus heterospecific, familiar versus unfamiliar, courtship versus noncourtship). Studies employing broader stimulus sets will be critical to gain insight into more general principles of NCM and CMM neural responses and, ultimately, into how characteristics of vocal signals are processed and used to guide behavior.
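One simple way to express the facilitative and suppressive interactions described above is to compare the response to a sequence of notes with the sum of the responses to each note presented alone: a ratio above 1 indicates facilitation, and a ratio below 1 indicates suppression. The sketch below assumes baseline-subtracted responses and an arbitrary tolerance for calling a ratio approximately linear; the function names and spike counts are hypothetical, and it illustrates the idea rather than the analysis used by Meliza and Margoliash (2012).

```python
def interaction_ratio(r_sequence, r_single_notes):
    """Response to a note sequence divided by the summed responses to its notes alone.

    Responses are assumed to be baseline-subtracted firing rates or spike counts.
    """
    predicted = sum(r_single_notes)  # prediction under linear summation
    if predicted <= 0:
        raise ValueError("summed single-note responses must be positive")
    return r_sequence / predicted


def classify_interaction(ratio, tol=0.1):
    """Label a ratio as facilitative, suppressive, or approximately linear."""
    if ratio > 1 + tol:
        return "facilitative"
    if ratio < 1 - tol:
        return "suppressive"
    return "approximately linear"


# Hypothetical spike counts: notes A and B alone (5 each), then the sequence A-B
for observed in (30.0, 10.0, 4.0):
    ratio = interaction_ratio(observed, [5.0, 5.0])
    print(observed, ratio, classify_interaction(ratio))
```

When such ratios depart strongly from 1, responses to single notes cannot be summed to predict responses to whole songs, which is why isolated-component stimuli underestimate the complexity of tuning in these regions.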

5.8.2 Catecholamines in the Auditory Cortex Shape Behavior

Forebrain auditory areas receive neuromodulatory inputs that can affect their activity and plasticity (see Remage-Healey, Chap. 6). Catecholaminergic inputs, in particular, may provide a mechanism for translating auditory experience into changes in brain and behavior. Dopaminergic and noradrenergic neurons in the midbrain and hindbrain in songbirds respond to salient or preferred stimuli, indicating to the brain which stimuli are important (Fields et al. 2007; Sara and Bouret 2012). For example, cFOS expression in dopaminergic neurons in the caudal ventral tegmental area of female songbirds was higher following playback of the preferred courtship song than following playback of the noncourtship song (Barr and Woolley 2018). Similarly, in juvenile male zebra finches, noradrenergic neurons and dopaminergic neurons expressed more EGR1 in response to tutoring methods that lead to more robust vocal learning (Chen et al. 2016).

Pairing stimulation of midbrain dopaminergic neurons or hindbrain noradrenergic neurons with playback of a tone drove plastic changes to the tonotopic representation of sounds in the mammalian primary auditory cortex (Bao et al. 2001; Martins and Froemke 2015). There is potential for a similar role of catecholamines in shaping neural responses in the avian auditory pallium and behavior. For example, the NCM receives substantial catecholaminergic projections (Van Ruijssevelt et al. 2018), and NCM responses to song can be modulated by norepinephrine in zebra finches (e.g., Velho et al. 2012; Ikeda et al. 2015). Further, decreasing norepinephrine levels can attenuate the rate of auditory learning and discrimination (Velho et al. 2012), reduce copulation solicitation displays to sexually stimulating songs, and reduce responses of forebrain auditory regions to conspecific song (Appeltants et al. 2002; Lynch and Ball 2008). However, while catecholaminergic inputs are well-positioned to modulate plasticity and firing of forebrain auditory regions, little is known about the mechanisms of these effects in songbirds. Studies of how dopamine and norepinephrine affect the response properties of different cell types in the NCM and CMM will be critical for understanding how these neuromodulators contribute to auditory tasks by shaping auditory preferences for or altering cortical representation of an auditory stimulus.

5.8.3 Development

Hearing species-typical song during development critically shapes the auditory system of both male and female songbirds. Birds form long-lasting memories of those song exemplars and use them in learning to produce their own songs or in guiding social decisions. Hearing song during development also appears to shape auditory responses to song. For example, the song-evoked firing rates of Field L neurons were significantly higher in zebra finches reared and tutored by conspecific adults than in zebra finches reared and tutored by Bengalese finches (Woolley et al. 2010), and Field L neurons were more selective for conspecific song over simple sounds (such as tone pips) in normally reared zebra finches compared to zebra finches reared in white noise (Amin et al. 2013). While it has been reasonably well established that not hearing song during development alters some neural response properties and behavior (Woolley 2012), future work will have to uncover how developmental song exposure influences tuning properties or neural selectivity across the auditory cortex.

5.8.4 Quantifying Nonlinear Responses of Neurons in Secondary Auditory Areas

As discussed previously (Sect. 5.4.2.2; Sect. 5.8.1), the responses of many pallial neurons, particularly those in secondary areas, are poorly characterized by strictly linear models of stimulus-response relationships such as the simplest STRF model. The use of linear-nonlinear models of sensory tuning can improve predictions of auditory responses to complex sounds (e.g., songs) (Calabrese et al. 2011). These models combine the linear filter (STRF) with nonlinear functions designed to capture neural response properties, such as spike threshold and dependence on recent spike history, to predict responses to complex sounds more accurately. While the linear-nonlinear model represents an improvement over the linear model alone, quantifying neural tuning that explains the nonlinearities in responses to natural sounds, including song, will require far more sophisticated models. For example, models that include synaptic depression have accounted for nonlinear modulations in tuning in the mammalian auditory cortex during natural sound processing (David et al. 2009). New approaches that factor in behavioral state and auditory learning will be particularly important for progress in understanding how the auditory pallium encodes and decodes song.
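The linear-nonlinear cascade described above can be sketched in a few lines: the STRF is applied to the stimulus spectrogram as a sum of time-lagged, frequency-weighted inputs, and the result is passed through a static nonlinearity. This minimal sketch assumes a threshold-linear (rectifying) output nonlinearity standing in for spike threshold and omits the spike-history term; the function name, array shapes, and parameters are illustrative assumptions, not the specific models of Calabrese et al. (2011).

```python
import numpy as np


def ln_predict(spectrogram, strf, threshold=0.0, gain=1.0):
    """Predict a firing rate with a linear-nonlinear (LN) cascade.

    spectrogram: array of shape (n_freq, n_time), the stimulus.
    strf: array of shape (n_freq, n_lags); column k weights the stimulus
        k time bins in the past.
    Returns (linear_drive, rate).
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    strf = np.asarray(strf, dtype=float)
    n_freq, n_time = spectrogram.shape
    if strf.shape[0] != n_freq:
        raise ValueError("STRF and spectrogram must share the frequency axis")
    linear = np.zeros(n_time)
    # Linear stage: sum over frequencies and lags of STRF-weighted, delayed stimulus
    for k in range(min(strf.shape[1], n_time)):
        linear[k:] += strf[:, k] @ spectrogram[:, : n_time - k]
    # Static nonlinearity: threshold-linear rectification (no negative rates)
    rate = gain * np.maximum(linear - threshold, 0.0)
    return linear, rate


# Usage with a hypothetical random STRF and a single spectrotemporal impulse
rng = np.random.default_rng(1)
strf = rng.standard_normal((4, 3))  # 4 frequency bands x 3 time lags
spec = np.zeros((4, 10))
spec[2, 5] = 1.0                    # energy in band 2 at time bin 5
linear_drive, predicted_rate = ln_predict(spec, strf)
```

For an impulse stimulus, the linear drive simply traces out one row of the STRF after the impulse, and the rectifier then discards negative values. Nonlinear response properties such as adaptation, spike history, or synaptic depression would require additional stages beyond this static nonlinearity.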

5.9 Conclusions

Songbirds use learned songs to convey diverse information about an individual’s species, sex, identity, motivation, or social context. Studies of behavior highlight the abilities of receivers to extract information from song and use this information for song learning, territorial interactions, individual recognition, and mate choice. Investigations of the neural mechanisms underlying these auditory abilities have provided substantial information about the organization, structure, and response properties of neurons in the primary auditory pallium and the potential role of those neurons in shaping behavior. Future research focused on gaining further insight into the roles of developmental experience and neuromodulators and generating improved methods to describe and understand the nonlinear response properties of secondary auditory areas will provide needed insights into the neural basis of auditory learning, coding, memory, and perception.