9.1 Introduction

The peripheral auditory system encodes the acoustic inputs that are used by the brain when listeners interact with the auditory world, monitor their own behaviors, and communicate with each other. Applying concepts from ecological biology, a communicative ecological system has been defined (Borg et al. 2008, p. S132) as “A system of communicating individuals in a social and physical background, who function together to circulate information and mental energy to create knowledge and emotions and a change in the system’s constitution and function over time.” From an ecological perspective, the importance of successful participation in social activities motivates listeners to allocate attentional resources to auditory and cognitive information processing in a range of everyday situations (Pichora-Fuller et al. 2016). The cocktail party situation is one of the most challenging of such situations, but it also offers one of the potentially most rewarding opportunities for social interaction.

At a cocktail party, sound provides information to listeners about their surroundings; for example, a doorbell ring alerts the host to the arrival of a guest and partygoers might hear rain against the window or music playing in the background. Sound provides feedback about an individual’s own actions; for example, the hostess hears her own footsteps while walking down the hall to open the door, crunching as she bites a piece of celery, or the clanking of glasses as she makes a celebratory toast. Interpersonal communication entails an exchange between a sender and a receiver of a message as they co-construct meaning in the social and physical setting of the party. Hearing is critical to spoken communication because it enables individuals to receive the speech signal sent by other communicators, monitor their own speech production, and assess the acoustical characteristics of the social (e.g., people laughing) and physical environments (e.g., reverberation in the concrete atrium of the art gallery) in which communication occurs at the party. For the most part, the goals of the listener determine how many and which sounds he or she intentionally samples from the auditory feast of the party soundscape, but sometimes highly salient sounds (e.g., hearing one’s own name or a phone ringing) may attract a listener’s attention to or distract it from an intended listening goal or task. At the cocktail party, listening will also be influenced by congruent or conflicting multisensory inputs and multitasking demands. Successful communication at a cocktail party will depend on how the listener hears, attends to, comprehends, and remembers relevant information in the auditory scene.

The auditory and cognitive processing abilities that are needed at the cocktail party or in other complex auditory scenes mature over childhood and peak in young adulthood (Werner, Chap. 8). As described in other chapters, however, listening at a cocktail party challenges even young adult listeners with normal hearing because there are heavy demands on complex auditory and cognitive processing, including the formation and selection of auditory objects (Shinn-Cunningham, Best, and Lee, Chap. 2), energetic masking (Culling and Stone, Chap. 3), release from informational masking (Kidd and Colburn, Chap. 4), and stream segregation (Elhilali, Chap. 5; Middlebrooks, Chap. 6; Simon, Chap. 7). Chapter 10 by Litovsky, Goupell, Misurelli, and Kan describes the deleterious effects of hearing loss on listening at the cocktail party and how the use of technologies such as hearing aids or cochlear implants may restore or sometimes further disrupt functioning. The present chapter explores how age-related changes in auditory and cognitive processing may affect listening at the cocktail party by older adults, in particular those whose pure-tone audiometric hearing thresholds are normal or near-normal. From a practical perspective, it is important to consider the aging auditory system at the cocktail party because older adults who find such situations too demanding or stressful may cope by withdrawing from social interaction, with long-term negative effects on their quality of life and mental and physical health. From a theoretical perspective, age-related changes in auditory processing provide a special window into the relative contributions of sensory, cognitive, and social abilities during social interaction. Younger listeners typically function better than older listeners in the acoustical wild, and laboratory research helps to pinpoint the specific aspects of listening that are preserved or decline as adults age.

9.2 Auditory Aging

9.2.1 Periphery

Hearing loss is the third most common chronic health condition in older adults (Yueh et al. 2003). The symptoms of age-related hearing loss (ARHL) can begin in the fourth decade of life. Its prevalence increases with age, affecting roughly half of those older than the age of 65 years and up to 90% of those older than the age of 80 years (Cruickshanks et al. 2010). ARHL (sometimes called presbycusis) is commonly characterized by high-frequency sensorineural hearing loss defined in terms of audiometric thresholds (Kiessling et al. 2003). In standard clinical audiometric testing, pure-tone thresholds are measured in decibels referenced to normal human hearing levels (dB HL) at octave frequencies from 250 to 8000 Hz. Threshold elevations in ARHL begin at the highest frequencies and gradually progress to lower frequencies (ISO 7029 2000). In the earliest stages of auditory aging, before clinically significant abnormal thresholds are observed, elevated thresholds (>25 dB HL) at frequencies above 8000 Hz may reduce the availability of interaural intensity cues to localization, including important pinna cues around 10,000 Hz. As ARHL progresses to lower frequencies (especially in the range from 500 to 4000 Hz), more of the speech signal becomes inaudible and speech perception worsens even in quiet environments. Amplification can restore audibility, in turn improving phoneme and word recognition accuracy, especially in quiet (Humes and Dubno 2010). Nevertheless, the difficulties that older adults have understanding speech in noise persist: even when amplification is provided, speech-in-noise performance is not restored to normal levels, contrary to what would be predicted if the difficulties of older listeners were confined to reduced audibility. Speech-in-noise understanding depends on more than audibility; it also depends on nonaudiometric factors such as suprathreshold auditory temporal processing and cognitive processing (Humes 2007).

High-frequency sensorineural hearing loss, whether in younger or older adults, often involves damage to outer hair cells in the cochlea as a result of exposure to industrial and/or recreational noise. However, in ARHL, one or more structures in the cochlea or central auditory system can be damaged in ways that are not typical in younger adults who have high-frequency hearing loss (Schmiedt 2010). Specifically, high-frequency sensorineural hearing loss in older adults may be attributable to changes in the endocochlear potential associated with changes to the cochlear blood supply in the stria vascularis (Mills et al. 2006; Saremi and Stenfelt 2013). There may also be neural changes that do not necessarily manifest in elevated audiometric thresholds. Mounting physiological evidence (Kujawa and Liberman 2009) and computational modeling (Lopez-Poveda 2014) point to neural degeneration and/or reductions in neural synchrony in the periphery that may underpin age-related differences in suprathreshold auditory and speech processing.

9.2.2 Speech Understanding

Importantly, the hearing abilities of older adults are heterogeneous. Their difficulties in understanding speech in noise vary considerably and are not well predicted from the audiogram (Füllgrabe et al. 2014). Indeed, difficulties understanding speech in noise often precede clinically significant elevation of audiometric pure-tone thresholds in quiet (Bergman 1980). Typically, older adults require higher signal-to-noise ratios (SNRs) to perform equivalently to younger adults on speech-in-noise tests, even if they have normal or near-normal audiograms. The SNR at which listeners reach 50% correct word recognition is the speech recognition threshold (SRT) in noise. A number of studies indicate that, over a broad range of conditions, older adults whose hearing thresholds in quiet are normal for their age have SRTs in noise that are 2–4 decibels (dB) higher than those of younger adults (Schneider et al. 2010).
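To make the SRT measure concrete, the sketch below fits a logistic psychometric function to percent-correct word recognition scores obtained at several SNRs and solves for the 50% point. The data and the simple two-parameter model are hypothetical illustrations, not taken from the chapter or the cited studies; a 2–4 dB age-related SRT elevation amounts to shifting the fitted curve rightward along the SNR axis.

```python
# Minimal sketch: estimate an SRT in noise from hypothetical data by
# fitting a logistic psychometric function and reading off the 50% point.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr_db, srt_db, slope):
    """Proportion of words correct as a logistic function of SNR (dB)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - srt_db)))

snr_db = np.array([-12.0, -9.0, -6.0, -3.0, 0.0, 3.0])      # test SNRs (dB)
p_correct = np.array([0.05, 0.18, 0.42, 0.71, 0.90, 0.97])  # hypothetical scores

(srt_fit, slope_fit), _ = curve_fit(psychometric, snr_db, p_correct, p0=[-5.0, 1.0])
print(f"Estimated SRT: {srt_fit:.1f} dB SNR (slope {slope_fit:.2f} per dB)")
```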

Age-related differences in speech understanding in noise could be due to declines in other auditory abilities that are unrelated to pure-tone threshold elevations and involve central auditory or cognitive processing (CHABA 1988). In addition to difficulties understanding speech in noise, age-related declines in melodic pitch perception (Russo et al. 2012), the identification of vocal emotion (Dupuis and Pichora-Fuller 2015), and the understanding of emotional speech in noise (Dupuis and Pichora-Fuller 2014) could also reduce an older listener’s ability to participate at a cocktail party, where enjoying music and identifying emotions may be as important as, or more important than, recognizing words.

9.2.3 Psychoacoustics of Temporal Processing and Behavioral Measures of Speech Processing

Over the last 30 years, a large body of knowledge has accumulated to characterize human ARHL based on psychoacoustics and behavioral speech perception research (for a comprehensive review see Gordon-Salant et al. 2010). Of particular relevance to listening at the cocktail party are well-documented age-related differences in auditory temporal processing (Fitzgibbons and Gordon-Salant 2010; Walton 2010) and binaural hearing (Eddins and Hall 2010) that could undermine speech understanding in noise (Humes and Dubno 2010). Highlights of this research are provided to show how auditory aging might affect listening at the cocktail party.

It is important to differentiate among levels of auditory temporal processing (Phillips 1995), and to consider how aging might affect abilities at each level because they may have different consequences for listening to speech at the cocktail party. Monaural temporal cues are relevant to three main levels of speech processing in quiet (Greenberg 1996): subsegmental (phonetic), segmental (phonemic), and suprasegmental (syllabic and lexico-syntactic). Subsegmental speech processing relies on fine structure cues, including periodicity cues based on the fundamental frequency and harmonic structure of the voice. Some types of segmental information are provided by local gap and duration cues and properties of the speech envelope that contribute to phoneme identification (e.g., presence of a stop consonant, voice onset time). Suprasegmental processing depends on cues such as the pattern of fluctuations in the amplitude envelope of the time waveform that convey prosodic information related to the rate and rhythm of speech, and these cues also serve lexical and syntactic processing. Each level has been investigated in older adults using psychoacoustic and speech perception measures. The effects of age on some measures suggest losses in gap and duration coding or poorer use of envelope cues, while others implicate reductions in synchrony or periodicity coding.
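Because the distinction between envelope and fine structure recurs throughout this section, a minimal sketch may help. The standard Hilbert decomposition below separates a waveform into a slow amplitude envelope (the suprasegmental cue) and a unit-amplitude temporal fine structure (the carrier on which periodicity cues ride); the synthetic signal and its parameters are arbitrary illustrations rather than stimuli from the cited studies.

```python
# Minimal sketch: Hilbert decomposition of a signal into amplitude
# envelope and temporal fine structure.
import numpy as np
from scipy.signal import hilbert

fs = 16000                                   # sample rate (Hz), assumed
t = np.arange(0, 0.5, 1.0 / fs)
# A 120-Hz carrier (voice-like periodicity) with a slow 4-Hz amplitude
# fluctuation (syllable-rate envelope).
x = np.sin(2 * np.pi * 120 * t) * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t))

analytic = hilbert(x)                        # analytic signal
envelope = np.abs(analytic)                  # slow amplitude envelope
fine_structure = np.cos(np.angle(analytic))  # unit-amplitude fine structure
```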

9.2.3.1 Gap and Duration Detection

At the segmental level, gaps and duration cues provide temporal information about some phonemic contrasts, in particular contrasts based on distinctions in the manner of articulation for consonants (Gordon-Salant et al. 2006; Pichora-Fuller et al. 2006). The most common psychoacoustic measure of temporal processing is the gap detection threshold, the smallest gap that a listener can detect in a stimulus. Older adults with normal or near-normal audiograms do not detect gaps until they are significantly longer than the gaps that can be detected by younger adults, and their gap detection thresholds do not significantly correlate with audiometric thresholds (Schneider et al. 1994; Snell and Frisina 2000). Notably, age-related differences are more pronounced when the sound markers surrounding the gap are shorter than 10 ms (Schneider and Hamstra 1999), and when the location of the gap is near the onset or offset of the signal (He et al. 1999). When spectrally identical sounds precede and follow the gap (within-channel markers), gap detection thresholds are small (a few milliseconds). The perceptual operation required for within-channel gap detection is thought to involve relatively simple processing of activity in the neural channel representing the stimulus. In contrast, when there are spectral differences between the sounds that lead and lag the gap (between-channel markers), gap detection thresholds can be about 10 times larger than those obtained for within-channel markers. This suggests that more complex processing may be involved, such as a more central relative timing operation across different neural regions (Phillips et al. 1997). Importantly, speech processing likely relies on both within- and between-channel processes, and age-related differences have been found for both types of markers.
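As an illustration of the stimuli involved, the sketch below generates a within-channel gap-detection stimulus: two spectrally identical noise markers separated by a silent gap. The marker and gap durations are hypothetical parameters, chosen only to echo the finding that short markers make the task harder; this is not the exact stimulus of any study cited above.

```python
# Minimal sketch: a within-channel gap-detection stimulus made of two
# identical noise-burst markers separated by a silent gap.
import numpy as np

fs = 44100                                   # sample rate (Hz), assumed
rng = np.random.default_rng(0)

def gap_stimulus(marker_ms, gap_ms):
    """Noise marker + silent gap + identical noise marker."""
    marker = rng.standard_normal(int(fs * marker_ms / 1000.0))
    gap = np.zeros(int(fs * gap_ms / 1000.0))
    return np.concatenate([marker, gap, marker])

easy = gap_stimulus(marker_ms=250.0, gap_ms=6.0)  # long markers
hard = gap_stimulus(marker_ms=10.0, gap_ms=6.0)   # short markers: larger age effects
```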

The effect of age on gap detection thresholds is exacerbated when more complex stimuli are used, as illustrated in studies examining gap discrimination thresholds when the frequency of the leading marker was fixed and the frequency of the lagging marker was varied (Lister et al. 2002), or when synthetic speech stimuli with spectrally dynamic markers were compared to those with spectrally stable markers (Lister and Tarver 2004), or when the harmonic structure of the leading and lagging markers was manipulated (Heinrich et al. 2014). In a study investigating age-related differences in gap detection for both nonspeech and speech markers that were either spectrally symmetrical (within-channel condition) or spectrally asymmetrical (between-channel condition), gap detection thresholds were longer for both age groups and age-related differences were more pronounced when the markers were spectrally asymmetrical than when they were symmetrical (Pichora-Fuller et al. 2006). Notably, age-related differences for asymmetrical markers were less pronounced when the markers were speech sounds than when they were nonspeech sounds. Presumably, older listeners were able to compensate because of their familiarity with speech sequences in which gaps cue the presence of an unvoiced stop consonant (e.g., the silent gap for the stop consonant /p/ between /s/ and /u/ in the word spoon). Furthermore, the size of the gap needed to distinguish word pairs that differed in terms of whether or not an unvoiced stop consonant was present (e.g., spoon and soon or catch and cash) varied with the rate of speech (i.e., the duration of the speech markers), but older listeners always needed larger gaps compared to younger listeners (Haubert and Pichora-Fuller 1999). Interestingly, patterns of scalp-recorded neuromagnetic activity during gap detection suggest that age-related differences are related to higher-level object formation rather than to lower-level registration of acoustical cues (Ross et al. 2009, 2010).

There is also abundant research on age-related differences in duration discrimination ability. This evidence converges with the findings on gap detection on three key points. First, age-related differences in duration discrimination do not significantly correlate with audiometric thresholds (Fitzgibbons et al. 2007). Second, age-related differences in ability to discriminate the duration of markers are more pronounced when the reference signal is shorter (20 ms) than when it is longer (200 ms) (Abel et al. 1990; Fitzgibbons et al. 2007). Third, age-related differences in duration discrimination can be exacerbated by increasing the complexity of the stimulus or task (Fitzgibbons and Gordon-Salant 2001). Similar findings using speech markers underscore the relevance of duration discrimination for the perception of phonemic contrasts serving word discrimination (Gordon-Salant et al. 2006). As with gap detection, different mechanisms may contribute to age-related deficits in duration discrimination depending on marker properties. Impaired coding of rapid onsets and offsets seems likely to be involved in deficits seen when brief markers are used, whereas higher-level auditory processing involving a central timing mechanism may be involved in the age-related differences observed for longer duration and more complex stimuli (Fitzgibbons et al. 2007).

9.2.3.2 Temporal Fluctuations in the Amplitude Envelope

The patterns of amplitude modulations in the speech time-waveform can be thought of as a sequence of gaps and durations that provide temporal information pertaining to the suprasegmental or prosodic level of speech processing required for lexical and syntactic analyses in the cortex (Peelle and Davis 2012). Significant effects of age have been found on psychoacoustic measures of modulation detection, and these behavioral results are correlated with electrophysiological envelope-following responses, suggesting the involvement of both brainstem and cortical subsystems in this level of temporal processing (Purcell et al. 2004). Envelope fluctuations in speech vary with a talker’s speaking rate and rhythm. Older listeners have more difficulty understanding sentences when they are spoken at a fast rate or are time-compressed (Versfeld and Dreschler 2002; Wingfield et al. 2006). When speech is speeded, speech understanding may be hampered because acoustical speech cues are reduced and/or because the time available to process the speech information cognitively is reduced. For younger adults, the deleterious effects of speeding speech on word identification and sentence comprehension are explained by reduced availability of time for cognitive processing, whereas for older adults both cognitive and auditory factors seem to play a role (Wingfield et al. 1999; Vaughan et al. 2008). When speech is speeded, older listeners benefit more than younger listeners when prosody is congruent with syntactic structure, but they are more disadvantaged when prosody and syntax are incongruent (Wingfield et al. 1992). Lexical decision reaction times are slower for older than for younger adults when the preceding sentence context is acoustically distorted by time compression, but reaction times are facilitated more for older than for younger listeners when the preceding sentence context is semantically congruent with the target item (Goy et al. 2013). In general, older listeners need to hear more speech information to identify words in a time-gating task, but they are as able as younger listeners to benefit from prosodic envelope information even when fine-structure cues are not available for phoneme identification (Wingfield et al. 2000). Furthermore, experiments using noise-vocoding with a varying number of bands have shown that older adults need a greater amount of temporal envelope information (i.e., more bands) to recognize words or syllables compared to younger adults (Souza and Boike 2006; Sheldon et al. 2008). Overall, it seems that older listeners have more difficulties understanding speeded speech and need more envelope information than younger listeners to understand syllables, words, and sentences in quiet. However, they can compensate by using semantic context and congruent prosody to linguistically parse the speech stream. At a noisy cocktail party, older adults may be well advised to converse with talkers who speak slowly and whose speech rhythm provides rich linguistic prosodic cues.
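The noise-vocoding manipulation mentioned above can be sketched as follows. This is a generic implementation under assumed parameters (number of bands, band edges, filter order), not the exact processing used by Souza and Boike (2006) or Sheldon et al. (2008): the speech is split into frequency bands, each band's Hilbert envelope is extracted, and the envelopes are used to modulate bandlimited noise, preserving envelope cues while discarding fine structure. Reducing n_bands reduces the amount of envelope information, which is the manipulation under which older adults needed more bands than younger adults.

```python
# Minimal sketch: an N-band noise vocoder that keeps per-band envelopes
# and replaces the temporal fine structure with noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=8, lo=100.0, hi=7000.0):
    """Return envelope-modulated noise with the same per-band envelopes."""
    rng = np.random.default_rng(0)
    edges = np.geomspace(lo, hi, n_bands + 1)   # log-spaced band edges (assumed)
    out = np.zeros_like(speech)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))             # band envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(speech)))
        out += env * carrier                    # envelope-modulated noise band
    return out
```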

9.2.3.3 Synchrony or Periodicity Coding

Synchrony or periodicity coding involves phase locking to (quasi-)periodic, low-frequency sound inputs such as the fundamental frequency and lower harmonics of speech. These fine structure components of speech are relatively unimportant for word recognition in quiet, but listeners can use them to identify and follow the voice of a talker in a group. For instance, the continuity of pitch contours can help listeners to segregate the voices of competing talkers. Pitch cues contribute to linguistic prosody that helps listeners to identify word and sentence structures. These cues also contribute to affective prosody that is used to identify a talker’s vocal emotion, and they contribute to the perception of musical melody or tonality.

Because the psychoacoustic frequency difference limen (DL) is thought to depend on phase locking at low frequencies, deficits in periodicity coding could explain why age-related increases in frequency DLs are greater for low frequencies than for high frequencies (e.g., Abel et al. 1990). Deficits in periodicity coding or loss of synchrony could also explain why age-related differences in the detection of frequency modulation (FM) are larger at low frequencies than at high frequencies for older listeners (He et al. 2007), and why older listeners have larger intensity DLs for high-level low-frequency tones in noise compared to younger listeners (MacDonald et al. 2007). Furthermore, loss of synchrony might contribute to age-related declines in detection of a mistuned harmonic (Alain et al. 2001), melodic perception (Russo et al. 2012), or identification of concurrent vowels (Snyder and Alain 2005; Vongpaisal and Pichora-Fuller 2007). In addition, simulating a loss of synchrony in younger adults by introducing temporal jitter in the low frequencies (<1.2 kHz) leads them to perform like older adults when the accuracy of word recognition is tested in babble (Pichora-Fuller et al. 2007; Smith et al. 2012). Note that these age-related differences affect the auditory processing of suprathreshold sounds in the lower frequencies where audiometric thresholds are in the normal range in typical cases of presbycusis.
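A rough sketch of this kind of jitter simulation is given below. It approximates the idea, disrupting the timing of the band below 1.2 kHz while leaving higher frequencies intact, with assumed smoothing and jitter-depth parameters; it is not the published algorithm of Pichora-Fuller et al. (2007).

```python
# Minimal sketch: jitter the low-frequency band (<1.2 kHz) of a signal by
# resampling it along a slowly varying, randomly perturbed time axis,
# then recombine it with the (approximately complementary) high band.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def jitter_low_band(waveform, fs, cutoff=1200.0, max_jitter_ms=0.25):
    rng = np.random.default_rng(0)
    sos = butter(6, cutoff, btype="lowpass", fs=fs, output="sos")
    low = sosfiltfilt(sos, waveform)
    high = waveform - low                      # approximate complement
    n = len(waveform)
    jitter = rng.standard_normal(n)
    kernel = np.hanning(int(fs * 0.002) + 1)   # smooth so the delay varies slowly
    jitter = np.convolve(jitter, kernel / kernel.sum(), mode="same")
    jitter *= (max_jitter_ms / 1000.0) * fs / (np.abs(jitter).max() + 1e-12)
    warped = np.clip(np.arange(n) + jitter, 0, n - 1)
    low_jittered = np.interp(warped, np.arange(n), low)
    return low_jittered + high
```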

9.2.3.4 Binaural Processing

In addition to the contributions of auditory temporal cues to speech processing in quiet listening conditions, auditory temporal processing abilities become even more important at the cocktail party where they can be used by the listener to unmask speech in noise, segregate concurrent speech streams, localize sounds, and direct spatial attention. Beyond age-related changes in monaural auditory temporal processing, age-related declines in binaural processing, even in older adults who have normal or near-normal audiograms, may contribute to the communication difficulties of older listeners at cocktail parties (Eddins and Hall 2010). Interestingly, age-related declines in the ability to detect a change in the interaural correlation of a noise presented to both ears (Wang et al. 2011), and in the ability to use interaural timing differences to unmask signals (Pichora-Fuller and Schneider 1992), have been shown to be consistent with age-related declines in neural synchrony. Such losses in neural synchrony would likely make it considerably more difficult for older adults to parse the auditory scene into its component sound sources, especially in multitalker situations where voice cues help to segregate the speech streams produced by different talkers.

9.3 Electrophysiological Measures of Auditory and Cognitive Aging

For the most part, psychoacoustic and speech understanding experiments measure the offline responses of listeners after auditory or speech processing has been completed. Other methods are needed to investigate the dynamic online changes in processing that occur over time, and to assess the brain operations and areas involved in processing incoming acoustic signals. Scalp recordings of neuroelectric brain activity or electroencephalography (EEG) make it possible to delineate normal and impaired systems at multiple stages of auditory processing (Alain et al. 2013). Notably, such recordings nicely complement behavioral assessments and allow scientists and clinicians to assess the activity in the auditory system with high temporal precision in the absence of overt behavioral responses (Simon, Chap. 7).

9.3.1 Brainstem

The brainstem frequency-following response (FFR) has been used to probe the neural registration and encoding of complex sounds (e.g., harmonic complexes, vowels, or phonemes) at subcortical levels of processing (e.g., Bidelman and Krishnan 2009; Krishnan et al. 2010). Notably, FFRs have provided important insights into the early neural transcription of sound at subcortical levels, including how nascent sensory representations influence and contribute to the early formation of auditory percepts (Bidelman and Krishnan 2010; Bidelman et al. 2011). Compared to younger adults, older adults have speech-evoked brainstem responses that are reduced in amplitude and delayed (Anderson et al. 2012). Such age-related declines in the temporal precision with which speech sounds are encoded at the subcortical level could negatively affect the cortical representation of speech (Bidelman et al. 2014).

9.3.2 Cortex

Auditory event-related potentials (ERPs) can be elicited by clicks, tone onsets, and speech sounds. The P1–N1–P2 complex occurs between 50 and 250 ms after sound onset. This complex represents the processing and encoding of acoustic information and is thought to reflect the activation of early forebrain structures including the thalamus and primary/secondary auditory cortices (Picton et al. 1999). Previous studies revealed that, like brainstem FFRs, these ERPs are sensitive to parametric changes in perceptual features related to the acoustic speech waveform, such as voice pitch, formant transitions, timbre, and harmonicity (Alain 2007; Chang et al. 2010). However, whereas brainstem responses appear to map acoustic details, cortical responses appear to reflect the perceptual organization of auditory objects. For example, in a study of categorical speech perception, activity from the brainstem was found to mirror properties of the speech waveform and changes in speech acoustics, whereas cortical evoked activity reflected distinct perceptual categories associated with abstract phonemic speech boundaries (Bidelman et al. 2013). These findings suggest a critical transformation in neural speech representations between brainstem and auditory cortex analogous to the acoustic-phonetic mapping necessary to generate categorical phoneme perception. In a study evaluating behavioral measures of categorical speech perception and both brainstem and cortical speech-evoked brain responses in the same younger and older listeners, older adults had slower and more variable speech classification performance than younger listeners, which coincided with reduced brainstem amplitude and increased, but delayed, cortical speech-evoked responses (Bidelman et al. 2014). The impoverished representation of speech sounds in older brainstems appears to be compensated by increased cortical responses in the aging brain, altering the acoustic-phonetic mapping necessary for robust speech understanding.

Older adults often generate larger cortical responses to speech stimuli compared to younger adults. Woods and Clayworth (1986) found an age-related increase in the amplitude and latency of early cortical evoked responses (approximately 30 ms after sound onset) that remained even after controlling for age-related differences in audiometric thresholds. The amplitude of the P1 wave is often larger for older than for younger adults (e.g., Ross et al. 2010; Lister et al. 2011). Some studies using pure tones or speech sounds during active or passive listening have also reported a larger N1 wave in older adults than in younger adults (e.g., Anderer et al. 1996; Chao and Knight 1997), while other studies have reported longer latencies (e.g., Iragui et al. 1993; Tremblay et al. 2003). For the P2 wave, studies using pure-tone or speech sounds have observed comparable amplitudes across age groups, but often the latencies of older adults are longer than those of younger adults (Alain and Snyder 2008; Lister et al. 2011). These age-related increases in latency could result from general slowing in perceptual and cognitive processing (Salthouse 1996), whereas age-related increases in auditory ERP amplitude may reflect impaired inhibitory functions at various levels within the afferent and efferent auditory pathways (Chao and Knight 1997; Alain and Woods 1999). Older adults may also have more difficulty filtering out task-irrelevant information such that they need to allocate more attentional resources to the processing of auditory stimuli compared to younger adults (Alain et al. 2004). Importantly, the difference between the amplitude of responses in attentive and nonattentive conditions is larger in older than in younger listeners, suggesting that attentional mechanisms are more often deployed by older than by younger listeners during listening. Such enhanced cortical evoked responses may also reflect a loss of stimulus specificity such that the older brain over-responds to incoming sounds (Leung et al. 2013). Larger N1 and P2 amplitudes may indicate that incoming sounds are processed at a deeper level of encoding, which could account for intrusions in subsequent memory tasks (Greenhut-Wertz and Manning 1995). That is, older adults may preserve representations in sensory memory, even when they are no longer relevant.

9.3.3 Reconciling Behavioral and Electrophysiological Findings Regarding Age-Related Changes

Behavioral studies have revealed numerous age-related declines in suprathreshold auditory processing, including declines in temporal processing at a number of different levels. However, notwithstanding the effects of age on neural activity in general, ERP studies that have incorporated a psychoacoustic design have shown that the rate of changes in neural activity as a function of signal duration (Ostroff et al. 2003), harmonicity (Alain et al. 2012), fundamental frequency (Snyder and Alain 2005), or first formant transition (Bidelman et al. 2014), is often comparable between younger and older adults. For example, in a study in which neuromagnetic auditory evoked responses were measured in young, middle-aged, and older healthy participants who listened to sounds of various durations, age-related differences in absolute response magnitudes were found, but increases in sound duration resulted in comparable changes in cortical responses in all three age groups (Ross et al. 2009).

The results from these electrophysiological studies seem to be at odds with behavioral research suggesting that there are age-related declines in auditory processing. The results from studies measuring cortical evoked responses also appear to be inconsistent with those showing age-related differences in the amplitude and timing of brainstem responses to complex sounds in quiet. The apparent contradiction between the behavioral and electrophysiological data could be reconciled by assuming that there are age-related reductions in the ability of listeners to access or use sensory representations in short-term memory rather than a failure to initially encode temporal information. Another possibility is that there are age-related differences in attentional control during listening. For example, in a study comparing ERPs to gaps measured in controlled versus automatic listening conditions (either respond to the gap or watch a silent movie), with gap sizes chosen to equate younger and older listeners in terms of their behavioral performance, younger listeners detected gaps in both the automatic and controlled listening conditions, but older adults detected them only in the controlled condition (Alain et al. 2004). It is also possible that the apparent discrepancies between these neurophysiological findings and previously published behavioral data might be explained by differences between the experimental methods used in behavioral and EEG studies. Specifically, electrophysiological tests, especially brainstem tests, may be more immune than typical behavioral tests to the effects of cognitive factors such as attention and memory. Furthermore, EEG studies may not have used stimuli such as speeded speech or speech masking noise that reveal the most pronounced age-related differences in behavioral studies of auditory aging.

There is increasing evidence that difficulties understanding speech in noise may be related to problems in parsing the incoming acoustic signal into distinct representations of sound objects, especially when listening requires segregating concurrently or sequentially occurring streams of auditory objects. For instance, older adults have more difficulty than younger adults in using binaural cues, and this coincides with changes in neuromagnetic activity originating from the auditory cortices (Ross et al. 2007). Older adults also show deficits in parsing and identifying two vowels presented simultaneously (Snyder and Alain 2005) and have more difficulty than younger adults in using first formant transitions to group speech sounds that are presented sequentially (Hutka et al. 2013). Together, these results suggest that the speech-in-noise problems commonly observed in older adults could be related to deficits in perceptually organizing incoming acoustic signals into coherent concurrent and sequential sound objects (Alain et al. 2006). When there are multiple sound sources, the more similar the sound objects are acoustically, the more difficulty listeners, especially older listeners, will have segregating them and distinguishing foreground from background streams.

9.4 Age-Related Differences in Speech Understanding Depending on Masker Type

In addition to the many sounds that older adults may want to listen to at the cocktail party, there may also be many unwanted sounds that they would rather ignore. Listeners would experience a confusing jumble of sounds if they could not distinguish between different sounds and selectively attend to the one(s) of most importance to them. In general, older adults have more difficulty understanding speech in noise regardless of the type of masker. Importantly, depending on the type of masker, there may be shifts in the relative contributions of various auditory and cognitive factors to speech understanding, and the magnitude of age-related differences may also vary.

9.4.1 Steady-State Maskers

At the cocktail party, it is relatively easy for listeners to segregate speech from meaningless steady-state sounds (e.g., ventilation noise). Speech easily becomes the attended foreground sound and ventilation noise an ignored background sound. Understanding speech when there is primarily energetic masking depends heavily on peripheral and bottom-up auditory processing of the signals (Culling and Stone, Chap. 3). In this sort of noise background, age-related differences are minimal for older adults who have normal audiometric thresholds.

9.4.2 Complex and Fluctuating Nonspeech Maskers

More complex nonspeech sounds may be annoying (e.g., the sound of chairs scraping the floor, guests playing ping pong, the host demonstrating a new model train in the party room) or pleasant (e.g., music), but they are usually sufficiently dissimilar to speech that it is relatively easy to segregate them from a target speech stream and relegate them to the background. Informational masking will increase as the similarity between speech and the background sounds increases. As informational masking increases, the contribution of central auditory and cognitive abilities will also increase such that age-related differences may be observed to varying degrees depending on the specific nature of the masker. On the one hand, cognitive demands may increase as maskers become more complex. On the other hand, knowledge of the structures of complex nonspeech sounds or familiarity with them may help listeners to use expectations to efficiently allocate attention during listening. For example, accuracy in recognizing sentence-final words varies with knowledge of and familiarity with the background sound for younger adults, but not for older adults (Russo and Pichora-Fuller 2008). Specifically, the performance of younger listeners was best when the background was familiar music, next best when the background was unfamiliar music, and worst when the background was multitalker babble. Interestingly, in a surprise memory test, the younger adults recalled the background music that they had been instructed to ignore whereas the older adults remembered that music had been in the background but they were unable to recall which specific pieces of music had been played. These findings suggest that the younger listeners processed the incoming speech and music streams efficiently and had ample cognitive capacity to listen to and remember both the target and background music. In contrast, more cognitive resources seem to be consumed by the older listeners who focused all of their attention on listening to the foreground speech, with little attention to or memory of even the familiar music in the background (Russo and Pichora-Fuller 2008).

9.4.3 Speech Maskers

Compared to nonspeech signals, the speech of another talker is not so easily dismissed because it is highly similar to the speech of the target talker in terms of its spectrum, temporal fluctuations, and linguistic structure and meaningfulness. Informational masking will be greatest when the masker is meaningful speech. Listening when there is competing speech will involve peripheral and central auditory processing and also draw heavily on cognitive processing. For older adults with normal audiograms, declines in temporal or central auditory processing may undermine performance when the masker is speech. However, if the incoming speech signal matches familiar and expected linguistic structures and has semantic meaning that is appropriate to the situation, then it should be easier for a listener to parse the auditory stream. Conversely, speech understanding will be more difficult if the acoustical properties of speech are somewhat unfamiliar, for example, if the talker has an accent (Van Engen and Peelle 2014). Notably, older adults are more susceptible to background noise and accented speech (Gordon-Salant et al. 2015), but they are often more skilled than younger adults in using knowledge to compensate in challenging listening conditions.

9.5 Behavioral Measures of Age-Related Differences in the Perceptual Organization of Foreground Versus Background Sounds

A number of behavioral experimental paradigms have been used to compare how younger and older adults understand speech in situations similar to cocktail parties. Typically, after experiments have been conducted to establish the abilities of younger adults, similar experiments are conducted to measure the abilities of older adults and to determine if there are age-related differences in performance. Age-related differences have been studied using experiments to evaluate spatial release from masking, stream segregation, the allocation of auditory spatial attention, the comprehension of discourse, and memory.

9.5.1 Spatial Separation and Release from Masking

In one common experimental paradigm used to evaluate release from masking, word recognition is measured using short, syntactically correct, but semantically anomalous sentences such as “A rose can paint a fish” (keywords: rose, paint, fish) that are presented in a variety of masking conditions (Freyman et al. 1999, 2004). The listener’s task is to repeat the sentence verbatim. The number of keywords that are repeated correctly is scored. The SRT in noise can be calculated if testing is done over a range of SNRs. Release from informational masking is measured as the difference in performance between conditions in which the masker is primarily energetic in nature (e.g., steady-state noise) and conditions in which the masker has a high informational content (e.g., competing talkers) (Kidd and Colburn, Chap. 4). Similarly, spatial release from masking is measured as the difference in performance between conditions with and without spatial separation of the target speech and masker (Culling and Stone, Chap. 3). The effect of spatial separation on release from masking can be determined using either real or simulated spatial separation of the target and maskers. Importantly, different auditory cues enable listeners to achieve release from masking depending on the nature of the maskers and on whether or not there is real or simulated spatial separation between the target and masker(s). It is possible to assess age-related differences in how these cues are used by evaluating release from masking across conditions.

9.5.1.1 Real Spatial Separation

Real separation of the speech of a target talker from a competing masker is achieved in experiments by presenting the target from one loudspeaker and the masker from another loudspeaker at a different location. In anechoic environments, only the direct wave from each loudspeaker arrives at the two ears of a listener. When the target is presented from a loudspeaker in front of a listener and the masker is presented from a loudspeaker to the right, interaural intensity differences occur at high frequencies because the head casts a shadow on the masking sound coming from the right loudspeaker before it reaches the left ear of the listener. Thus, for higher frequencies, the SNR at the person’s left ear is markedly higher than the SNR at the person’s right ear. In addition, useful low-frequency interaural time difference cues occur because there is an interaural delay for the masker but not the target. Using a combination of these interaural difference cues, the listener perceives the target talker to be in front and the masker at the right. Thus, benefit from spatial separation between the target and masker depends on high-frequency interaural intensity differences and low-frequency interaural time differences. In general, interaural intensity differences alone contribute more spatial release from masking (around 8 dB), interaural time differences alone contribute less (around 5 dB), and in combination they provide more spatial release from masking (about 10 dB), although the effects are not additive (Bronkhorst and Plomp 1988).
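As a back-of-envelope illustration of the low-frequency timing cue (not a calculation from the chapter), Woodworth's spherical-head approximation gives the interaural time difference for a far-field source at a given azimuth; the head radius below is an assumed average value.

```python
# Minimal sketch: interaural time difference (ITD) from Woodworth's
# spherical-head approximation, ITD = (a / c) * (theta + sin(theta)).
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c_m_per_s=343.0):
    """ITD in seconds for a far-field source at the given azimuth."""
    theta = np.radians(azimuth_deg)
    return (head_radius_m / c_m_per_s) * (theta + np.sin(theta))

# A masker directly to the right (90 degrees) yields roughly 650-660
# microseconds of interaural delay for an average adult head.
print(f"{woodworth_itd(90.0) * 1e6:.0f} microseconds")
```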

For older adults who do not have significantly elevated pure-tone thresholds, the interaural intensity cues resulting from head shadow remain available. For those who have high-frequency threshold elevations, however, the interaural cues conferred by head shadow at higher frequencies may be reduced or eliminated. Nevertheless, even if these cues are available, they may be more advantageous to younger adults than to older adults. Recall that, in general, older adults need a 2–4 dB better SNR to match the speech understanding performance of younger listeners (see Sect. 9.2.2), likely owing to age-related declines in temporal processing, especially periodicity coding. Age-related declines in temporal and binaural processing could also reduce the ability of older adults to segregate competing talkers based on interaural differences in the temporal fine structure of competing voices.

The relative contributions of high-frequency and low-frequency cues to spatial release from masking were assessed in a study of younger adults and older adults with normal or impaired hearing as defined by the audiogram (Dubno et al. 2002). For sentences in speech-shaped noise (primarily an energetic masker), spatial release from masking was 6.1 dB for younger listeners, 4.9 dB for older listeners with normal pure-tone thresholds, and 2.7 dB for older listeners with pure-tone hearing loss. Not surprisingly, older adults with high-frequency hearing loss benefitted little from high-frequency cues resulting from head shadow. Compared to younger listeners, older adults with normal audiometric thresholds achieved less benefit from spatial separation, possibly because of less effective use of both high- and low-frequency cues.

In a more recent study (Besser et al. 2015), younger and older adults with normal hearing for their age were tested on the Listening in Spatialized Noise–Sentences (LiSN-S) test (Cameron and Dillon 2007, 2009). In the LiSN-S test, SRTs are determined for target sentences in four informational masking conditions: The target speech and masking speech are spoken by the same female or by different females and they are co-located or spatially separated. Scores are also calculated for the advantage (release from masking) due to talker differences, spatial separation, and both factors combined. Younger adults outperformed older adults on all SRT and advantage measures. Notably, spatial release from masking was 14.1 dB for the younger group and 9.6 dB for the older group. For both age groups, spatial release from masking was predicted by high-frequency (6–10 kHz) pure-tone thresholds. In addition, linguistic factors contributed to individual differences in the performance of the younger listeners and cognitive factors contributed to individual differences in the performance of the older listeners.

9.5.1.2 Simulated Spatial Separation

In contrast to experiments in which conditions of real spatial separation are tested, most everyday listening environments are reverberant. If the cocktail party is held indoors, then a direct speech wave will be reflected from all of the surfaces of the room and multiple reflections may continue to occur over time. The sound-absorbing properties of the surfaces affect reverberation time in terms of how long it takes the series of reflections to dampen. Long reverberation times can have deleterious effects on speech understanding in noise, especially for older listeners when the intensity level of speech is relatively low or the rate of speech is fast (Helfer and Wilber 1990; Gordon-Salant and Fitzgibbons 1995).

The delays between the direct wave and the first reflections depend on the distance between the listener and the room surfaces. In typical rooms, the delays between the direct and the first reflected waves are relatively short (2–8 ms). In such rooms, the listener perceives a single sound source at the location that is the origin of the direct wave and no echoes are perceived. In other words, the direct wave takes precedence (precedence effect; Zurek 1987). A second source, or echo, would not be heard unless the delay between the direct and reflected waves became very long, as would be the case in a very large space. Interestingly, when the precedence effect was simulated under headphones using time-delayed 2-kHz tone-pips, no age-related differences were found in the time delay at which listeners transitioned from perceiving a single source to perceiving two sound sources (Schneider et al. 1994).

The presence of a reflective surface can be simulated in an anechoic room by introducing a time delay in the presentation of a stimulus from two loudspeakers. For example, a listener perceives the location of a stimulus to come from the right when it is presented over a loudspeaker to the right beginning 4 ms before the same stimulus starts to be presented over a loudspeaker at the front. Similar to echoes in everyday reverberant environments, the delayed copy of the stimulus from the front loudspeaker is not perceived as a sound from a second source. Notably, when spatial separation is simulated in this way, the high-frequency interaural intensity difference cues arising from head shadow are largely eliminated and the SNRs at the two ears are equalized. As in the real spatial separation condition, the low-frequency interaural time difference cues remain available for the direct waves of the target and masker, but there are additional interaural difference cues for the simulated reflections.
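A minimal sketch of this two-loudspeaker simulation is given below: the same waveform is fed to both loudspeakers, with the loudspeaker at the intended perceived location leading by 4 ms, as in the example above. The stimulus itself (a half second of noise) is an arbitrary placeholder.

```python
# Minimal sketch: loudspeaker feeds for a precedence-effect simulation in
# which the right loudspeaker leads the front loudspeaker by 4 ms.
import numpy as np

fs = 44100                                     # sample rate (Hz), assumed
lead = int(fs * 4.0 / 1000.0)                  # 4-ms lead, in samples

stimulus = np.random.default_rng(0).standard_normal(fs // 2)  # 0.5 s of noise
right_feed = np.concatenate([stimulus, np.zeros(lead)])  # leading copy
front_feed = np.concatenate([np.zeros(lead), stimulus])  # lagging copy
# At delays this short, listeners hear a single source at the right
# loudspeaker; the lagging front copy is not perceived as an echo.
```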

For all listeners, speech understanding is better and spatial release from masking is greater when there is real, rather than simulated, spatial separation between target and masker. In a seminal study of younger adults, 12 dB of spatial release from masking was achieved when a real spatial separation was introduced between the target and competing speech, but only 3–9 dB was achieved when spatial separation was introduced in a simulation based on the precedence effect (Freyman et al. 1999). The most likely explanation for spatial release from masking being at least 3 dB poorer is that it is not possible to benefit from high-frequency interaural intensity and SNR differences when spatial separation is simulated. The superior ability of younger adults to use interaural intensity and SNR difference cues could account for the age-related differences observed in conditions of real spatial separation. If so, then when spatial separation is simulated and the SNRs at the two ears are equalized, age-related differences in spatial release from masking should be less pronounced than they are in conditions of real spatial separation.

Younger and older adults were tested in a study of release from masking conducted using the same basic method as had been used in the seminal study of younger adults (Freyman et al. 1999). For both age groups, release from informational masking and spatial release from masking was evaluated by comparing the results obtained in four conditions with the positions of the target and maskers simulated using the precedence effect: (1) sentence target and noise masker co-located; (2) sentence target and noise masker spatially separated; (3) sentence target and speech masker co-located; and (4) sentence target and speech masker spatially separated (Li et al. 2004). There were three noteworthy findings. First, SRTs were approximately 3 dB SNR higher in older than in younger adults in all four conditions. This result is consistent with the more general finding that older adults need a higher SNR to achieve an SRT equivalent to that of younger adults. Second, the release from masking achieved by spatially separating the target and masker was the same for both age groups when the masker was two-talker speech (about 5 dB) and when the masker was steady-state noise (about 1.8 dB). Third, both age groups demonstrated a similar degree of release from informational masking when the target and maskers were co-located. Neither group demonstrated much, if any, release from informational masking when the target and masker were spatially separated, presumably because masking release had already been optimized based on the advantage conferred by spatially separating the target and masker.

Importantly, although the SRTs of the older listeners were 3 dB higher in all conditions, no significant age-related differences were found when spatial locations were simulated using the precedence effect and interaural intensity and SNR differences were minimized. Taken together, these results suggest that age-related differences in understanding speech in multitalker scenes are attributable primarily to difficulties in auditory processing of interaural intensity and SNR cues rather than to declines in cognitive processing (Li et al. 2004).

9.5.2 Speed of Buildup of Stream Segregation

Stream segregation refers to the ability to disentangle sequences of sounds from competing sources, such as the task of forming distinct streams of speech from two or more talkers. The perception of segregated streams tends to build up over time (Bregman 1978). Some experimental evidence suggests that the buildup of stream segregation may proceed more slowly in older than in younger adults. In younger adults, word recognition improves as the delay between masker onset and word onset increases, whether the masker is steady-state noise or multitalker babble. When the masker is steady-state noise, younger and older listeners show similar improvements, but when the masker is multitalker babble, there is no observable improvement by older adults for word-onset delays up to 1 s (Ben-David et al. 2012). Such slowing is not surprising in light of the evidence that there are age-related differences in auditory temporal processing, and also age-related generalized perceptual and cognitive slowing in adults (Salthouse 1996).

To investigate if age-related slowing in the buildup of stream segregation could influence word recognition during sentence processing, performance on the release from masking paradigm described in Sect. 9.5.1 (Freyman et al. 1999; Li et al. 2004) was examined to take the position of the keyword into account (Ezzatian et al. 2012). For younger adults, when syntactically correct but semantically anomalous sentences are masked by co-located two-talker speech, word recognition improves from the first to the last keyword in a sentence. In contrast, when there is simulated spatial separation between the target and masker, there is substantial improvement in overall performance, but there is no improvement as a sentence unfolds. Whether or not listeners perceive the target and masker to be spatially separated, when the masker is a steady-state noise, word recognition is relatively easy, and there is no evidence that performance improves over time. This pattern of results for word recognition in anomalous sentences suggests that speech stream segregation is relatively rapid (less than a second) in easier listening conditions (spatial separation or energetic masking). Speech stream segregation may take longer (a couple of seconds) and continue to develop over the course of a sentence being spoken when listening conditions are more challenging (no spatial separation or informational masking).

Like younger adults, older adults do not improve from the first to the last keyword position when the masker is a steady-state energetic noise masker (Ezzatian et al. 2015). For younger adults, stream segregation is slowed only when both the target and masker are intact, highly similar, and co-located speech stimuli, but older adults are slowed in a wider range of informational masking conditions. For older adults, stream segregation builds up over the course of the sentence when there is a two-talker masker, including when it is made more dissimilar to the target by either vocoding the masker to diminish the availability of fine-structure cues or by spatially separating the target and masker. Of course, the degree to which target and masking sounds are perceived to be dissimilar, and therefore, the degree to which they can be segregated from one another, could be affected by ARHL (see Sect. 9.2). For instance, age-related declines in temporal processing may account for the finding that speech stream segregation is rapid for younger listeners but slowed for older adults when the two-talker masker is vocoded. Furthermore, stream segregation is slowed in older listeners even though they can achieve spatial release from masking (Sect. 9.5.1). Age-related losses in neural synchrony are likely to degrade the interaural timing cues that contribute to locating an auditory object in space, thereby slowing stream segregation in situations where there is a spatial separation (either virtual or real) between a target voice and masking voices.

9.5.3 Auditory Spatial Attention

Humes et al. (2006) explored the effects of the acoustic similarity between a speech target and a speech masker in a study in which listeners attended to and reported the content of one of two sentences presented monaurally. The sentences were taken from the corpus of the coordinated response measure (CRM; Bolia et al. 2000) and have the form “ready (call sign), go to (color, number) now.” Call signs were the names of individuals and the colors and numbers were from a closed set (e.g., “Ready Baron go to green 2 now.”). Before or after the sentences were presented, participants were informed that the target sentence would begin with a particular call sign. The percentage of correctly identified color–number pairs was higher when the listener was informed of the call sign before rather than after the trial, presumably because prior knowledge of the call sign helped listeners to focus attention and reduce memory load. Performance was also higher when there was a gender difference than when there was no gender difference between the target talker and the masking talker, with the benefit from the voice pitch contrast being larger for younger than for older adults. It is possible that age-related declines in auditory temporal processing at the level of periodicity coding hamper the ability of older listeners to take advantage of gender-related differences in the fundamental frequency and harmonic structure of the voices of the target and masking talker, thereby slowing the buildup of stream segregation and impeding the efficient allocation of attention to the target speech stream.

CRM sentences have also been used to study how spatial attention affects word recognition in a three-talker display with real or simulated spatial separation between the target and two competing talkers (Singh et al. 2008). For a block of trials, the probability that the target would appear in each of the three possible locations varied from certainty (100%) to chance (33%), with two intermediate probabilities (80% and 60%). In general, older adults performed worse than younger adults in all conditions. Importantly, however, age did not interact with (1) the probability that the target would appear at a specific location, (2) whether or not the listener had prior knowledge of the call sign, or (3) whether or not the separation of the three sentences was real (coming from three different loudspeakers) or simulated (using the precedence effect). A follow-up analysis investigated the cost incurred when the target sentences were presented at an unlikely location instead of the most likely position (Singh et al. 2008). As expected, the cost of reallocating attention from the likely to the unlikely position was substantial in all conditions, with the extent of the reduction in performance being the same for both younger and older adults.

At the cocktail party, the need to redirect auditory spatial attention could happen if Fred unexpectedly begins talking (the listener’s attention is directed to Fred) and announces that everyone should listen to Mary because she has some important news to tell (Fred cues the listener to switch attention to Mary). To introduce such realistic attentional demands into the CRM experiment (Singh et al. 2008), new task instructions were used (Singh et al. 2013). As before, when the call sign appeared at the expected center location, participants were asked to report the color and number associated with it. However, when the call sign appeared in an unexpected location (to the left or right of center), participants were asked to report the color and number from the sentence presented at the opposite side (i.e., they had to redirect their attention). As expected, there was a significant interaction between age and the complexity of the instructions (the original simple instructions versus the new, more complex instructions), with the older adults performing significantly worse than younger adults when the instructions increased the complexity of the task. These results suggest that older adults are not as agile as younger adults in redirecting their attention when the listening task is more demanding, as it might be in everyday situations.

9.5.4 Discourse—Beyond Words and Sentences

9.5.4.1 Adjusting SNR to Study Comprehension of Monologues

The difficulties that older adults have understanding speech in noise are not well explained by their audiometric thresholds; their difficulties might be better explained by their SRTs in noise. Experiments using more complex linguistic materials were conducted to investigate how SRTs in noise might affect tasks requiring comprehension rather than only word recognition. In one such experiment (Schneider et al. 2000), younger and older participants answered a series of questions about a lecture that they had just heard while it was masked by multitalker babble presented from the same spatial location. When the SNR (the level of the lecture relative to the level of the babble, in dB) was the same for both age groups, older adults answered fewer questions correctly than younger adults. However, when the SNR was individually adjusted to take into account the higher SRTs in noise of older individuals, both age groups performed equivalently. These findings suggest that apparent age-related differences in comprehension could be attributed to the higher SRTs in noise of older adults.
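
The logic of the individual adjustment can be made explicit with a short worked example: each listener hears the lecture at a fixed offset above his or her own SRT in babble, so the physical SNR differs across listeners while the level relative to threshold performance is equated. The presentation levels and the 3-dB offset below are illustrative assumptions, not the values used by Schneider et al. (2000).

```python
# Worked example of individually adjusted SNR: present the lecture at a
# fixed offset above each listener's own SRT in babble.  All dB values
# here are illustrative.

def babble_level_db(speech_level_db, srt_db, offset_db=3.0):
    """Return the babble level that puts the listener offset_db above
    their SRT.  SNR (dB) = speech level - babble level, so we solve
    babble = speech - (SRT + offset)."""
    target_snr_db = srt_db + offset_db
    return speech_level_db - target_snr_db

# A younger listener with SRT = -4 dB and an older listener with
# SRT = -1 dB both end up 3 dB above their own SRT:
print(babble_level_db(65.0, srt_db=-4.0))  # babble at 66 dB -> SNR = -1 dB
print(babble_level_db(65.0, srt_db=-1.0))  # babble at 63 dB -> SNR = +2 dB
```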

9.5.4.2 Adjusting Spatial Separation in Dialogues and Trialogues

In the experiment described in Sect. 9.5.4.1, both the lecture and the babble masker were mixed and presented monaurally over the same earphone (co-located condition). In more realistic everyday listening situations, including cocktail parties, talkers would be spatially separated and there would likely be more than two talkers in a conversation. In a follow-up experiment (Murphy et al. 2006), younger and older participants were asked to answer questions concerning two-person conversations. The dialogues and masking babble were either played over a single central loudspeaker or presented with a real spatial separation between the three sound sources. After adjusting the SNR for individual differences in SRTs, both age groups answered the same number of questions correctly in the co-located condition, but younger adults outperformed older adults in the condition with spatial separation. As described previously (Sect. 9.5.1.1), older listeners do not seem to benefit as much as younger listeners from the availability of binaural cues when there is real separation between the sources. Reduced benefit from binaural cues would make it more difficult for the older listeners to segregate the three streams and allocate attention to them effectively. The influence of possible age-related differences in the ability to use binaural cues was supported by the finding of no age-related differences when the experiment was repeated using the precedence effect to control the perceived locations of the stimuli (Avivi-Reich et al. 2014). Furthermore, the inadequacy of either pure-tone thresholds or SRTs in noise to account fully for the everyday listening problems of older adults is consistent with their self-reports on a questionnaire (Banh et al. 2012). Given that listeners must function in conditions in which there is real separation between the locations of talkers at the cocktail party, even if the level of the background noise were reduced to improve the SNR, older partygoers would still struggle more than younger partygoers when conversing in group situations.
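
For readers unfamiliar with the precedence-effect simulation, the sketch below shows the basic lead-lag construction: the same signal is presented from two sources, and a lead of a few milliseconds makes the sound appear to come from the leading source without providing the full set of binaural cues that a truly separated source would. The 3-ms delay and the two-channel setup are illustrative assumptions, not the configuration of the cited experiments.

```python
# Sketch of simulating perceived location with the precedence effect:
# play the same signal from two sources with a short lead-lag delay;
# the leading copy dominates the perceived location.
import numpy as np

def precedence_pair(signal, fs, lead="left", delay_ms=3.0):
    """Return (left, right) channels with the lagging copy delayed."""
    delay_samples = int(round(delay_ms * fs / 1000.0))
    lagged = np.concatenate([np.zeros(delay_samples), signal])
    leading = np.concatenate([signal, np.zeros(delay_samples)])
    if lead == "left":
        return leading, lagged   # heard toward the left source
    return lagged, leading       # heard toward the right source
```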

9.5.5 Memory

The preserved ability of older adults to comprehend discourse in most conditions (see Sect. 9.5.4) seems to be at odds with research on cognitive aging suggesting that memory for heard material is poorer in older than in younger adults. In most memory experiments, however, no corrections are made for age-related differences in the ability to hear the words. In addition, often the words to be recalled are presented in random lists rather than in meaningful sentences or discourse. Older adults benefit more than younger adults from contextual support for both recognizing and remembering words in sentences that are presented in babble (Pichora-Fuller et al. 1995). The discourse materials used in the comprehension experiments provided rich and socially relevant context. Older adults’ knowledge of language and culture is preserved and is often superior to that of younger adults. It is possible that they used their expert knowledge and were able to take advantage of the richness of contextual support provided in discourse to compensate for poorer basic memory abilities. Alternatively, their poorer memory for heard words presented in lists may have arisen because they were not able to perceptually encode the words as precisely as younger adults owing to age-related changes in auditory processing.

To investigate the extent to which auditory aging is responsible for age-related differences in memory, the ability of younger and older adults to recall words in a paired-associates memory task was measured when the words were masked by babble, but with the SNRs adjusted for individuals’ SRTs in noise (Murphy et al. 2000; Heinrich and Schneider 2011a, b). Even after adjusting SNRs to equate for individuals’ SRTs in noise, older adults were less able to recall the words than younger adults in a wide variety of masking conditions. Interestingly, age-related differences were greatest when the masker was gated on and off with the to-be-remembered words, but they were less pronounced when words were heard in continuous masking. Slower buildup of stream segregation in older adults may contribute to their memory problems when background noise and the words have simultaneous onsets. Overall, age-related declines in auditory processing do seem to exacerbate the memory problems of older adults. In everyday discourse, however, contextual support is abundant and it can help older adults to bind words into meaningful sequences that are easier to remember (for a more detailed discussion, see Schneider et al. 2016a, b).
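
The contrast between gated and continuous masking can be sketched as follows; the mixing scheme and the 1-s masker lead time are illustrative assumptions, not the parameters of the cited studies.

```python
# Sketch of the two masking conditions: babble gated on and off with each
# word (simultaneous onsets) versus babble running continuously so the
# word arrives after the masker is already on.  snr_gain is a scalar
# applied to the masker to set the desired SNR; babble must be at least
# as long as the mixed output.
import numpy as np

def mix_gated(word, babble, snr_gain):
    """Masker starts and stops with the word (simultaneous onsets)."""
    masker = babble[: len(word)]
    return word + snr_gain * masker

def mix_continuous(word, babble, snr_gain, fs, lead_s=1.0):
    """Masker is on well before the word begins."""
    lead = int(lead_s * fs)
    out = snr_gain * babble[: lead + len(word)]
    out[lead:] += word                  # word arrives after the masker onset
    return out
```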

9.6 Cognitive Aging and Sensory-Cognitive Interactions

9.6.1 Cognitive Aging

Some aspects of cognition decline with age, but others continue to improve. In general, there are declines in dynamic or fluid processing of information, whereas static or crystallized linguistic and world knowledge are well preserved in healthy aging. Importantly, the ability of older adults to use knowledge and contextual support is a strength that they can use to compensate for weaknesses in rapid information processing (Craik and Bialystok 2006). Age-related declines in cognitive processing that could affect communication include slower speed of information processing, reduced working memory, and difficulty dividing attention or selectively attending to relevant information while inhibiting distractions (Pichora-Fuller and Singh 2006).

9.6.2 Sensory-Cognitive Interactions

9.6.2.1 Cognitively Healthy Older Adults

There is growing evidence that audition and cognition interact, even in healthy older communicators who have clinically normal or near-normal audiograms and no clinically significant cognitive impairment (Schneider et al. 2010; Humes et al. 2013). Furthermore, for older adults with clinically significant audiometric threshold elevations, even when amplification has been provided to restore audibility, individual differences in understanding speech in noise remain and are associated with auditory temporal processing and cognitive processing abilities (Humes 2007).

On the one hand, declines in auditory processing may impose increased demands on cognitive processing capacity. On the other hand, increased allocation of cognitive resources and use of knowledge can be compensatory when tasks involving listening are challenging (Grady 2012). Furthermore, age-related changes in brain activity and in how complex tasks are performed involve more than the effects of ARHL. For older adults, the cognitive demands of multitasking can affect posture and gait (Woollacott and Shumway-Cook 2002). Multisensory integration may reduce cognitive demands when information across modalities is congruent, but increase demands when it is incongruent (Mozolic et al. 2012), including during speech reading (Tye-Murray et al. 2010). In addition, social factors such as self-efficacy (Wingfield and Tun 2007), stigma, and ageist stereotypes may affect and be affected by age-related declines in auditory and cognitive performance (Chasteen et al. 2015; Pichora-Fuller 2016).

The interactions of auditory and cognitive aging are seen in how listeners contend with the challenging listening conditions of the cocktail party. In addition to the demands of listening, older adults may have difficulty multitasking or processing conflicting multisensory inputs as they mingle among the guests. Despite these demands on their cognitive resources, they may be motivated to interact socially. They may even benefit from what seem to be distractions so long as information is sufficiently congruent to support the allocation of attention (Weeks and Hasher 2014). When cognitive compensation is insufficient, however, and demands outweigh the possible benefits of social interaction, older adults may cope by withdrawing from noisy social situations.

9.6.2.2 Older Adults with Cognitive Loss

Provocative epidemiological findings indicate that cognitive loss is more prevalent and may progress more quickly in people with hearing loss than in peers with good hearing, although the mechanisms underpinning these correlations are not yet known (Gates et al. 2011; Lin et al. 2013). Nevertheless, the increasingly common co-occurrence of sensory loss and cognitive loss as people get older suggests that these conditions do not simply become more prevalent with age but that they are interrelated (Albers et al. 2015). When sensory inputs are diminished, there can be short-term consequences for brain functioning, and long-term deprivation or alterations in processing can affect brain neuroplasticity. One possibility is that, as ARHL progresses over decades, the effects of information degradation on memory may become permanent (Dupuis et al. 2015). It remains to be determined whether these cognitive declines could be slowed or prevented by auditory exercise such as playing music (Parbery-Clark et al. 2011), or whether cognitive training would help older adults compensate for sensory aging (Reuter-Lorenz and Park 2014).

9.6.3 Brain Plasticity and Compensation

There is emerging evidence that the neural networks engaged when people are processing speech differ between younger and older adults (Harris et al. 2009). There is also behavioral evidence that the extent to which younger and older adults engage top-down processes in listening to speech is modulated by the listening situation. As listening becomes more challenging, compensatory use of knowledge increases. Such knowledge includes lexical-level information, sentence-level information, discourse-level information, and world knowledge.

9.6.3.1 Vocabulary

In an analysis of the results of two studies (Schneider et al. 2000; Murphy et al. 2006), no significant correlation was found between how well listeners comprehended a lecture in quiet and the size of their vocabulary (Schneider et al. 2016a, b). However, when the same participants were tested in noisy backgrounds, listening comprehension was strongly correlated with vocabulary scores. These results indicate that when auditory input is degraded, top-down processes involving the use of linguistic knowledge facilitate lexical access for both younger and older adults. However, older adults are more vulnerable than younger adults when contexts are misleading (Rogers et al. 2012).

9.6.3.2 Sentences and Discourse

Once lexical access is achieved, additional processing is needed to integrate words into meaningful sentences, match this information to stored knowledge, construct inferences, and store the information for later recall. It is reasonable to assume that the post-lexical processes subserving these tasks would be similar and modality independent (e.g., listening versus reading). Indeed, when listening is easy (quiet) and reading is easy (large font), the number of questions correctly answered about a story is highly correlated across modalities for both younger and older adults. In contrast, when listening is difficult (co-located babble masker) and reading is easy, listening comprehension is no longer significantly correlated with reading comprehension for older adults, although the correlation remains high for younger adults (Avivi-Reich et al. 2015). Importantly, age-related differences in listening comprehension were eliminated in these experiments when the SNR was adjusted according to individual participants’ SRTs in noise. Hence, even though there was no age-related difference in the comprehension outcome measure, the results suggest that there are differences in the ways in which younger and older adults engage cognitive processes to achieve speech understanding.

9.7 Summary

Overall, the performance of older adults in situations like a cocktail party is often, but not always, poorer than that of younger adults. In general, older adults need a 2–4 dB better SNR to perform as well as younger adults on tasks involving speech understanding in noise. When the SNR is adjusted according to individual participants’ SRTs in noise, many but not all age-related differences are eliminated. Younger adults are better able to take advantage of the rich interaural cues provided when there is real spatial separation between targets and informational maskers. However, both age groups achieve a similar release from masking when spatial separation is simulated using the precedence effect. Older adults underperform compared to younger adults when speech is speeded and they demonstrate a slower buildup of stream segregation in a wider range of informational masking conditions. However, both age groups demonstrate similar benefit from allocating spatial attention when targets are presented at expected locations and instructions are simple. Older adults have poorer recall, especially when context is minimal. However, when context is available, older adults are better at using it to compensate for difficulties in hearing during comprehension and recall tasks.

Over the last three decades, much has been learned about auditory aging. Behavioral research demonstrates age-related declines in speech processing related to declines in auditory temporal processing at various levels. Electrophysiological research has advanced knowledge of the similarities and differences in how the brains of younger and older adults are engaged in processing complex auditory and speech information. Future research will explore further how auditory aging interacts with age-related changes in cognition and with nonauditory aspects of sensorimotor function. The interactions of these multiple sensory, motor, cognitive, and social factors and how they change over the course of adult aging will need to be studied to understand fully how older adults listen at cocktail parties and in the other complex auditory scenes of everyday life.