9.1 Introduction

It has been more than 30 years since the classic report “Speech Understanding and Aging” was published by the Committee on Hearing, Bioacoustics, and Biomechanics (CHABA 1988). The report underscored the potential multifaceted influences of senescent changes in peripheral and central auditory function and cognition, as well as linguistic factors, on speech understanding performance by older adults. However, the evidence to confirm the role of many of these factors was sparse. The report inspired auditory and cognitive researchers to study these issues in greater depth by (1) using better strategies for selecting participants and stimuli, (2) collecting behavioral and electrophysiologic/imaging measures, and (3) evaluating peripheral, central, and cognitive abilities in the same participant cohort. This chapter provides a review of some of the key findings on peripheral, central, and cognitive influences on speech understanding performance by older adults, with a focus on studies published since the first volume on this subject (Gordon-Salant et al. 2010). (Readers are referred to another excellent review on this topic by Anderson et al. 2018.)

The term speech understanding is used in this chapter to refer to a listener’s reception and processing of a spoken message (word, sentence, passage) and recognition of that speech signal as demonstrated by an identification response (repetition, written response, or button press). The phrase speech understanding is differentiated from discourse comprehension, which implies interpretation of the meaning of the spoken message as evidenced by, for example, responses to inferential questions about the message content.

A theoretical framework for considering the interactive roles of peripheral, central, cognitive, and linguistic factors in successful speech understanding is illustrated in Fig. 9.1. The figure shows the bottom-up processes involved in receiving the speech signal at the periphery, which entails the initial analysis of the spectral and temporal cues in the acoustic waveform. For the peripheral analysis to result in an accurate representation of the acoustic parameters of the input stimulus, the signal must be audible across the frequency spectrum and all structures comprising the peripheral auditory system must be intact. Further processing occurs in the central auditory pathway (the perceptual system), which is thought to be responsible for additional encoding of the spectral and temporal features of speech, as well as for binaural correlation of the signals presented to the two ears. In particular, the neural pathways at this level encode rapid signal onsets and signal duration, which are critical for precise representation of the acoustic cues that distinguish one speech unit (phoneme) from another and the stress patterns of speech. Auditory object formation, defined as the ability to focus attention on a separate sound source in a complex environment, is also thought to occur at this level (Shinn-Cunningham 2008).

Fig. 9.1

Operations required for understanding speech at the word and sentence level, as well as discourse comprehension, as constrained by a limited-capacity processing resource system. (Adapted from Wingfield and Tun (2007), with permission from the American Academy of Audiology)

Most often, object formation occurs in realistic listening environments when the listener discriminates a target message from a background composed of one or more talkers. The perceptual system is also responsible for initial phonological analysis of the spectrotemporal acoustic information leading to word retrieval. The figure demonstrates that the output of the perceptual system has a feed-forward/backward loop to various linguistic operations, such as lexical access and knowledge of syntax and semantics. This implies that contextual information has a direct impact on speech understanding, to the extent that the listener is proficient in the language. Various cognitive abilities, ranging from working memory and processing speed to inhibition and attention, are relevant to nearly all stages of this processing model. The processes depicted in this model must accommodate not only clean speech (i.e., speech produced by a clear talker in a quiet environment), but also degraded speech that is typical of realistic communication scenarios, such as fast or accented speech in noisy and reverberant environments.

The working premise of this chapter is that the sensory, perceptual, cognitive, and linguistic functions leading to speech understanding are constrained by a limited processing resource model. As these resources become restricted by normal age-related changes to the peripheral and central auditory systems, there is a shift in the reliance on cognitive and linguistic factors to lead to accurate speech understanding. Moreover, as age-related limitations in cognitive processes become evident, older adults must work harder, or expend more effort, in order to maintain speech understanding performance (Kahneman 1973; Pichora-Fuller et al. 2016). The demands of challenging listening situations further exacerbate these problems.

In this chapter, the foregoing concepts are used as a framework for elucidating the connections between audibility and encoding of the speech signal, cognitive and linguistic abilities, speech understanding, and aging. The chapter addresses the following questions:

  1. How do age-related changes in structures and functions of the auditory system contribute to speech understanding difficulties experienced by older listeners? (Sect. 9.2)

  2. How do age-related changes in cognitive capacity affect speech understanding, and how do linguistic abilities modulate these effects? (Sect. 9.3)

  3. Do age-related differences in reliance on cognitive skills depend on the type of stimulus, availability of contextual information, stimulus ambiguity, and memory demands? (Sect. 9.4)

  4. How do motivation and effort affect speech understanding performance by older listeners in challenging speech tasks? (Sect. 9.5)

  5. Do the connections between speech understanding and increased reliance on cognitive and linguistic abilities differ between acoustic listeners (who have some residual hearing) and listeners who use cochlear implants, who have no useful residual hearing but rely on an electrical representation of the speech signal that is inherently distorted? (Sect. 9.6.1)

  6. What is the nature of speech understanding performance of older adults who are nonnative speakers of English, and what mechanisms account for their performance patterns? (Sect. 9.6.2)

9.2 Peripheral and Central Issues

9.2.1 Peripheral Hearing Loss

Hearing sensitivity of older adults varies widely both in degree and configuration (pattern of pure-tone thresholds across frequency) of hearing loss. This variability derives, in part, from the different etiologies that produce sensorineural hearing loss of cochlear origin among older adults, which may include age-related processes, noise exposure, ototoxicity, genetic factors, and disease. Dubno et al. (2013) developed a classification scheme of human audiometric phenotypes associated with age-related hearing loss (ARHL) that were derived from findings with animal models of ARHL of cochlear origin. The principal audiometric phenotypes are shown in Fig. 9.2. Classification of the audiograms of a large cohort of older adults (ages 50–97.5 years, N = 1728 audiograms) indicated that 7.5% were classified as older-normal, 22.5% were classified as metabolic (pattern associated with atrophy and degeneration of the stria vascularis, which is involved in regulating and maintaining the endocochlear potential), 18.8% were classified as sensory (pattern associated with deterioration of sensory hair cells and supporting cells in the cochlea), and 51.2% were classified as combined metabolic + sensory. Thus, the majority of older adults in this sample exhibited a relatively flat hearing loss (10–40 decibels hearing level [dB HL]) in the low frequencies and a steeply sloping hearing loss in the higher frequencies.

Fig. 9.2

Schematic boundaries of audiograms corresponding to five phenotypes of age-related hearing loss based on five hypothesized conditions of cochlear pathology. Red hatch marks indicate the range of audiometric thresholds that fall within each phenotype classification. Few participants had the premetabolic audiogram and this phenotype was subsequently removed. dB HL, decibel hearing level. (Reproduced from Dubno et al. (2013), https://springerlink.bibliotecabuap.elogim.com/journal/10162, with permission from the Association for Research in Otolaryngology)

An individual’s speech understanding performance is determined by his or her hearing thresholds, as well as by the speech signal and environmental listening condition (quiet or noise). The Articulation Index (AI; ANSI 1969) and its successor, the Speech Intelligibility Index (ANSI 1997), provide a framework for predicting speech understanding performance given the long-term average speech spectrum (LTASS), the range of speech peaks and minima, the level of background noise, and the audibility of the speech area (the portion of the speech signal between 100 and 8000 Hz that is heard). Audibility can be reduced by hearing loss and/or noise. Calculations of the AI, ranging from 0 to 1.0, are based on the effective and audible signal-to-noise ratio (SNR) of the target speech signal and background noise (if present) in each of a number of frequency bands that encompass the speech spectrum.
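For readers who want to see the mechanics, a minimal sketch of an AI-style band calculation follows. The clipped (SNR + 12)/30 audibility mapping reflects the classic assumption of a roughly 30 dB speech dynamic range with peaks about 12 dB above the long-term average; the five bands and importance weights below are illustrative placeholders, not the ANSI values.

```python
import numpy as np

def articulation_index(band_snr_db, band_importance):
    """AI-style calculation: each band's audibility is its effective
    SNR mapped onto speech's ~30 dB dynamic range (peaks ~12 dB above
    the long-term average), clipped to [0, 1], then weighted by that
    band's assumed importance for intelligibility."""
    audibility = np.clip((np.asarray(band_snr_db) + 12.0) / 30.0, 0.0, 1.0)
    return float(np.sum(np.asarray(band_importance) * audibility))

# Illustrative 5-band example; weights are placeholders that sum to 1.0
snr = [15, 10, 0, -6, -15]            # effective SNR per band, dB
w = [0.15, 0.25, 0.30, 0.20, 0.10]
print(articulation_index(snr, w))      # AI between 0 and 1 (~0.48 here)
```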

Humes and Dubno (2010) provide an excellent review of the principles of the AI and its application to several examples of audiograms associated with ARHL. In one example, they demonstrate that for an individual with a typical metabolic + sensory audiometric phenotype described in Sect. 9.2.1 and an input speech level of 62.5 dB SPL (the level of average conversational speech), the AI calculation is 0.53 in a quiet environment, indicating that nearly half of the speech area is inaudible. In the presence of noise (SNR = 0 dB and a spectrum comparable to that of the speech signal), the AI decreases to 0.28. These AI values can be applied to transfer functions for specific speech materials to predict the speech recognition score for that material. In this example, the listener would achieve percent correct scores in quiet of 74% and 99% for low- and high-context sentences, respectively, and in noise of 39% and 75% for low- and high-context sentences, respectively. Thus, the AI is a useful construct for examining the impact of audibility on speech understanding in quiet and in noise, and indeed, numerous studies have confirmed that the principal limitation in speech understanding by older listeners is reduced audibility of the speech signal associated with ARHL, particularly for speech signals that provide limited contextual information (e.g., Humes and Roberts 1990; Humes et al. 1994).
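The transfer-function step can be sketched the same way. The logistic form and parameters below are hypothetical stand-ins chosen only to show how a single AI value maps to different predicted scores for low- and high-context materials; actual transfer functions are derived empirically for each speech corpus (Humes and Dubno 2010).

```python
import numpy as np

def predicted_score(ai, slope, midpoint):
    """Hypothetical logistic transfer function mapping an AI value to
    percent correct for a given speech material; slope and midpoint
    are illustrative, not published corpus parameters."""
    return 100.0 / (1.0 + np.exp(-slope * (ai - midpoint)))

# High-context materials reach a given score at a lower AI than
# low-context materials (illustrative parameter values only)
for label, slope, mid in [("low-context", 12.0, 0.45),
                          ("high-context", 12.0, 0.25)]:
    print(label, round(predicted_score(0.53, slope, mid), 1), "% at AI = 0.53")
```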

The audiometric phenotypes described above include an older-normal category, defined as exhibiting hearing thresholds that are ≤10 dB HL from 0.25 to 1.0 kHz and ≤20 dB HL at audiometric frequencies up to 8 kHz. Despite good audibility of the speech signal, older adults with normal hearing often report difficulty understanding speech in noise, which has been verified in the laboratory setting (e.g., Dubno et al. 1984). There are a number of possible reasons for this observation, as discussed later in this chapter. One intriguing theory to explain this phenomenon is that aging is accompanied by a slow deterioration of the ribbon synapses between inner hair cells and their afferent nerve fibers, leading to a loss of cochlear neurons (Kujawa and Liberman 2015). This type of neural deterioration, called "cochlear synaptopathy," appears to affect low-spontaneous-rate, high-threshold neural fibers in mice (Kujawa and Liberman 2015). Because high-spontaneous-rate, low-threshold fibers are unaffected, hearing thresholds appear normal. However, the effect of the loss of low-spontaneous-rate, high-threshold fibers becomes evident for suprathreshold signals, such as speech in noise. It has been hypothesized that the cumulative effects of cochlear synaptopathy throughout the adult lifespan result in poor temporal and spectral encoding of suprathreshold speech signals at the level of the eighth nerve (Sergeyenko et al. 2013). In particular, loss of these neural fibers is thought to reduce precise encoding of the rapid signal onsets and signal durations that convey specific phonetic contrasts, leading to accurate word recognition.

9.2.2 Decline in Central/Temporal Processes

Aging is associated with deterioration throughout the central auditory pathway, from cochlear nucleus to auditory cortex (reviewed in Syka, Chap. 4 and Recanzone, Chap. 5). The seminal work in this area comes from animal studies showing small losses in neuron numbers in each region as well as a reduction in inhibition, both pre- and postsynaptically, at multiple levels of the central auditory nervous system (see Jayakody et al. 2018 for an extensive review). More recently, imaging studies with healthy adults have demonstrated loss of volume in regions of interest in the brain that are involved in neural networks contributing to auditory and cognitive functions, including the temporal lobe (Scahill et al. 2003), hippocampus (Braak et al. 2011), and prefrontal cortex (Raz et al. 2001; Pfefferbaum et al. 2013). For adults with ARHL, diffusion imaging measures indicate changes in fiber density, axonal parameters, and myelination of white matter in the superior olivary complex, lateral lemniscus, and inferior colliculus (Chang et al. 2004). Additionally, structural MRI studies of older adults with ARHL show decline in gray matter volume in the auditory cortex (Peelle et al. 2011; Eckert et al. 2012). Finally, decreases in the neurotransmitters GABA and glutamate have been observed among adults with ARHL in MR spectroscopy studies (Profant et al. 2013; Gao et al. 2015). These imaging studies with humans confirm and extend many of the earlier findings observed in animal studies regarding loss of inhibitory neural transmitters and loss of neural tissue at each nucleus along the central auditory pathway. The findings also suggest that the neuroplastic changes in the brain associated with ARHL are not confined to the central auditory pathways but affect association areas as well (Peelle and Wingfield 2016).

Converging evidence indicates that these age-related structural and neurochemical alterations in the central auditory pathways affect encoding of the temporal characteristics of speech. One technique used to measure neural encoding of speech is to present a speech stimulus to listeners and to record subcortical or cortical responses to the signal. Anderson et al. (2012; see also Presacco et al. 2015) used electroencephalography (EEG) to record brainstem responses to the speech syllable /da/ and reported that older normal-hearing adults, ages 60–67 years, showed later peak latencies to the syllable onset compared to younger adults, ages 18–30 years. An example of age-related differences in the brainstem response to a speech syllable is illustrated in Fig. 9.3. The older adults also showed less consistent responses and lower amplitudes across the entire syllable compared to younger adults. Finally, a phase-locking factor, indexing trial-to-trial coherence, revealed better neural phase locking for younger compared to older listeners. These findings provide strong confirmation that normal aging is accompanied by both delayed neural timing and less neural precision, relative to younger adults, in the processing of speech stimuli.
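The phase-locking factor quantifies how consistently the response phase repeats from trial to trial. In practice it is computed across time-frequency bins (often with wavelets), but a single-frequency sketch conveys the idea; the function name and array layout here are my own, not the cited studies’ implementation.

```python
import numpy as np

def phase_locking_factor(trials, fs, freq):
    """Trial-to-trial phase coherence at one frequency. Each trial in
    the (n_trials, n_samples) array is projected onto a complex
    exponential; only the unit-magnitude phase is kept, and the
    phasors are averaged. PLF near 1: the response phase repeats
    across trials (strong phase locking); near 0: random phase."""
    t = np.arange(trials.shape[1]) / fs
    proj = trials @ np.exp(-2j * np.pi * freq * t)   # one phasor per trial
    return float(np.abs(np.mean(proj / np.abs(proj))))
```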

Fig. 9.3

Neural delays (mean ± 1 SE) in the aging population for the syllable /da/. The x-axis represents the peak analyzed and the y-axis represents the normalized peak latency for each subject. To facilitate visualization, peak latencies were normalized by subtracting the expected peak latency for /da/ (8, 32, 42, 52, 62 ms, etc., through 112 ms, spanning the transition and the steady state) from the actual response latency. Negative values indicate that peaks occurred earlier than the expected latency; positive values indicate that peaks occurred later. Older adults show a shift in neural response timing relative to younger adults for both the onset and transition peaks (32–52), but not for the steady state, with the exception of peak 102. *p < 0.05, **p < 0.01, ***p < 0.001. [Adapted from Presacco et al. (2015), https://journals.lww.com/ear-hearing/pages/default.aspx, with permission from the American Auditory Society]

A subsequent study evaluated brainstem and cortical responses to speech signals presented in noise to younger and older listeners with normal hearing (Presacco et al. 2016). The frequency-following response (FFR) was measured for the speech syllable /da/ presented in quiet and in the presence of a single competing female talker at four SNRs. In addition, neural magnetic responses were recorded in a magnetoencephalography (MEG) system while younger and older listeners with normal hearing attended to a target story in quiet or in the presence of a single competing talker at the same four SNRs as used in the EEG experiment. Midbrain FFR responses of younger adults were more robust, and responses in noise were better correlated to responses in quiet, compared to those of older adults. That is, neural encoding of periodicity in the speech envelope was less accurate among the older adults than the younger adults in the presence of competing speech, reflecting reduced temporal processing that may be associated with decreased speech understanding in noise. The MEG data of the older adults showed an overrepresentation of the cortical response, as well as a substantial decrease in the accuracy in decoding the target speech signal in the presence of the competing talker. Overall, these findings suggest that temporal processing deficits are evident at the midbrain, and compensation through neural enhancement at cortical levels may not improve accuracy in processing of speech in noise.

9.2.3 Effects of Decline in Central Auditory Temporal Processing on Speech Understanding in Quiet

The impact of age-related decline in the auditory system’s ability to process rapid events and periodicity in the speech signal (i.e., auditory temporal processing) is manifested on multiple types of speech tasks. At the segmental level, older listeners require longer durations of the acoustic cues that distinguish one speech sound from another, compared to younger listeners, as exhibited in identification functions for continua of two speech syllables differing in a single temporal cue. One example is the identification function for a dish-to-ditch continuum, in which older listeners require a longer silent-interval duration to shift their percept from the sibilant /ʃ/ to the affricate /tʃ/ (Gordon-Salant et al. 2008).

Similar observations have been made for other speech continua, including those that vary in voice-onset time as a cue to initial stop voicing, vowel duration as a cue to post-vocalic voicing, and transition duration as a cue to the stop-glide distinction (Gordon-Salant et al. 2008). These age-related differences in the ability to use brief temporal cues to distinguish speech segments are even greater for vocoded speech, which provides limited spectral cues, suggesting that older adults who use cochlear implants may experience additional deficits in perceiving temporal attributes of speech, beyond those observed for older acoustic listeners (Goupell et al. 2017).

Alterations in the typical rhythm, timing, and stress patterns of spoken English sentences also have a substantial effect on speech understanding performance of older listeners. This is thought to be another manifestation of age-related changes in the precision of neural encoding of stimulus onsets. Older adults often have difficulty understanding speech that is presented at a fast rate, usually implemented by time-compression algorithms that increase the presentation rate without creating spectral distortion (e.g., Schneider et al. 2005). Listener experience with rapid speech, contextual information, and slowing at phrasal boundaries can minimize older listeners’ difficulty understanding time-compressed speech in laboratory settings (Wingfield et al. 1985, 1999; Gordon-Salant and Friedman 2011). Nonetheless, most older adults, even those with normal hearing, experience disproportionate difficulty understanding naturally fast speech in everyday listening situations (Gordon-Salant et al. 2014).
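Pitch-preserving time compression of the kind used in these studies is typically implemented with WSOLA or phase-vocoder algorithms. The bare-bones overlap-add sketch below, a simplified stand-in rather than the cited studies’ implementation, shows the core idea of reading analysis frames faster than they are written.

```python
import numpy as np

def time_compress(x, rate, frame=1024):
    """Bare-bones overlap-add time-scale modification: analysis frames
    are read at `rate` times the hop at which they are written, so the
    output is shorter by that factor while the spectrum -- unlike
    simple fast playback -- is not shifted upward. Real systems use
    WSOLA or a phase vocoder to suppress frame-boundary artifacts."""
    assert len(x) >= frame, "signal shorter than one frame"
    hop_syn = frame // 4                     # output hop (75% overlap)
    hop_ana = int(round(hop_syn * rate))     # rate > 1 => compression
    win = np.hanning(frame)
    n = (len(x) - frame) // hop_ana + 1      # frames that fit fully
    y = np.zeros((n - 1) * hop_syn + frame)
    norm = np.zeros_like(y)
    for m in range(n):
        seg = x[m * hop_ana : m * hop_ana + frame] * win
        y[m * hop_syn : m * hop_syn + frame] += seg
        norm[m * hop_syn : m * hop_syn + frame] += win
    return y / np.maximum(norm, 1e-8)        # undo window gain
```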

Another type of temporally altered speech signal encountered in everyday life is foreign-accented speech. Nonnative speakers of English often retain the rhythm and timing pattern of their native language when learning English, and these patterns are often different from the stress-timing pattern of American English. In addition, nonnative speakers of English exhibit changes in overall stimulus duration and may insert pauses at inappropriate junctures in a spoken message. Numerous reports now indicate that older adults exhibit poorer understanding of foreign-accented speech than younger adults with comparable hearing sensitivity (Hargus Ferguson et al. 2010; Gordon-Salant et al. 2013), which may be attributed, at least in part, to difficulty following unexpected changes in stress and timing because of senescent changes in auditory temporal processing.

9.2.4 Speech Stream Segregation and Decline in Central-Temporal Processing

Central auditory temporal processing deficits have also been implicated in speech stream segregation. This refers to the ability to separate a target speech message from a competing speech message and underlies speech understanding performance in the presence of competing talkers. Older listeners have considerable difficulty understanding speech in a background of competing talkers (Tun and Wingfield 1999; Helfer and Freyman 2008; see also Gallun and Best, Chap. 8).

Two types of auditory temporal processing abilities have been associated with speech stream segregation. The first is amplitude modulation detection, or the ability to detect a brief decrement in the amplitude of the temporal envelope. Temporal envelope modulation detection enables a listener to detect changes in the competing signal’s temporal waveform, thus enabling the listener to take advantage of momentary increments in the SNR corresponding to dips in the waveform of the competing message. It is associated with changes in neural firing rate to the stimulus (Hopkins and Moore 2011). The second auditory temporal processing ability that may contribute to speech stream segregation is sensitivity to temporal fine structure (TFS), which refers to the relatively rapid oscillations within each frequency band of speech. TFS provides information about voice pitch (Moore 2016) and is conveyed in the neural phase-locking response to an acoustic stimulus.
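The two cues can be made concrete with the standard analytic-signal decomposition, which splits a band-limited signal into its envelope and its TFS; a minimal sketch (the function name is my own) follows.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_and_tfs(band_signal):
    """Analytic-signal decomposition of a band-limited signal into its
    temporal envelope (the slow amplitude fluctuations tracked in
    modulation-detection tasks) and its temporal fine structure (the
    rapid, roughly unit-amplitude oscillation within the band that
    carries voice-pitch information). envelope * tfs approximately
    reconstructs the original band signal."""
    analytic = hilbert(band_signal)
    envelope = np.abs(analytic)
    tfs = np.cos(np.unwrap(np.angle(analytic)))
    return envelope, tfs
```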

Numerous investigations have examined associations between temporal-envelope amplitude modulation detection and speech understanding, as well as between TFS sensitivity and speech understanding, by younger and older adult listeners. Results have been somewhat mixed. An investigation by Füllgrabe et al. (2015) comprehensively evaluated the performance of younger and older listeners with normal hearing on a number of speech understanding, psychoacoustic, and cognitive measures. They reported a significant age effect on sentence identification performance in the presence of a two-talker competing masker, as well as on measures of temporal envelope detection and TFS sensitivity. Moreover, TFS sensitivity was highly correlated with sentence identification in noise (r = 0.805, p < 0.001), which remained after the effects of age and cognition were partialled out. These findings reinforce and extend conclusions from electrophysiology studies indicating that age-related changes in phase-locking to a speech stimulus likely contribute to difficulties in speech stream segregation and understanding speech in the presence of competing talkers (see also Sect. 9.3.2).
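The "partialled out" analysis reported there is, in essence, a partial correlation. A compact sketch of the standard residual method follows; the variable names in the usage comment are mine, not the study’s.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Correlation between x and y after linearly regressing out the
    covariates (e.g., age, a cognitive composite) from both variables
    and correlating the residuals."""
    Z = np.column_stack([np.ones(len(x))] + list(covariates))
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# hypothetical usage:
# r = partial_corr(tfs_sensitivity, sentence_score_in_noise, [age, cognition])
```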

9.2.5 Phonological Analysis and Lexical Processing

According to the Wingfield and Tun (2007) model shown in Fig. 9.1, the subsequent stage following peripheral and central auditory system analysis of the speech signal and attention to the target speech stream is phonological analysis and lexical processing. Phonological analysis refers to the processing of sounds (i.e., phonemes) that comprise a word in an individual’s language. There is a considerable body of research investigating phonological awareness and retrieval in young children as a predictor of acquisition of spoken language and literacy success, but relatively little work has investigated the ability of older adults to conduct online phonological analysis to support word recognition. One technique used to assess online phonological analysis and word recognition is the Visual World Paradigm (Allopenna et al. 1998). The paradigm uses eye-tracking to monitor the time-course of a listener’s identification of a spoken word, via eye gaze, from a limited set of competitor words represented by printed words or object pictures presented visually. The competitor stimuli are selected such that their individual phonemes differ from those of the target at varying positions in the word as the presentation of the word unfolds. For example, the target and competitor can differ in the initial position, as in the rhyming words pin versus bin, or in the final position with overlapping word onset as in pin versus pit. By monitoring eye movements to the target and competitors, the investigator can measure the listener’s speed of online phonological analysis. This technique has an advantage over measuring the speed of an overt response as there are only minimal age differences in the velocity of saccadic eye movements (Ayasse et al. 2017).

Ben-David et al. (2011) surmised that this online phonological analysis would be altered in older adults compared to that in younger adults, either because of slowed processing that would limit the older listener’s ability to keep up with the presentation of the target word, or because of reduced inhibition that would diminish the ability to suppress the strength of the competitor words. They compared word recognition accuracy and timing of eye movements relative to word onset for younger and older adults in conditions that varied the number of syllables, the type of competitor (rhyming or word-onset overlap), and the presence or absence of noise, while controlling for word recognition accuracy. Age-related differences were observed in discrimination of targets from rhyming words in the presence of noise, lasting up to 900 ms after stimulus onset. This finding suggests that older adults require more of the unfolding word than younger adults to achieve accurate word recognition in noise, consistent with the age-related deficits in auditory temporal processing described earlier.

Accurate speech understanding also depends on the ability to identify a match between the phonemes analyzed and the listener’s mental lexicon. This lexical processing is influenced by the frequency of occurrence of the spoken word in the language and the number of words in the lexicon that have overlapping phonemes with the target word, referred to as the neighborhood density. The Neighborhood Activation Model (Luce and Pisoni 1998) theorizes that word recognition accuracy is higher if the word comes from a high-frequency, low-density neighborhood than if it comes from a low-frequency, high-density neighborhood. Essentially, sparse lexical neighborhoods comprise fewer competitors to a target word that must be inhibited by the listener for accurate word retrieval.
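To make neighborhood density concrete, here is a toy sketch that counts neighbors under the usual one-phoneme-change definition; the mini-lexicon and string-based transcriptions are illustrative only, not the model’s actual phonemic inventory.

```python
def phonological_neighbors(word, lexicon):
    """Neighbors under the common one-edit definition: words formed by
    a single phoneme substitution, deletion, or addition. Words are
    sequences of phoneme symbols (plain strings here for simplicity)."""
    def one_edit(a, b):
        if a == b or abs(len(a) - len(b)) > 1:
            return False
        if len(a) == len(b):                       # one substitution
            return sum(p != q for p, q in zip(a, b)) == 1
        if len(a) > len(b):
            a, b = b, a                            # make a the shorter
        # one deletion from the longer string must yield the shorter
        return any(b[:i] + b[i + 1:] == a for i in range(len(b)))
    return [w for w in lexicon if one_edit(word, w)]

lexicon = ["pin", "bin", "pit", "pen", "spin", "in", "pint", "dog"]
print(phonological_neighbors("pin", lexicon))
# -> ['bin', 'pit', 'pen', 'spin', 'in', 'pint']; density = 6
```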

Because older adults may have age-related changes in the ability to inhibit irrelevant information, it could be predicted that the effect of neighborhood density on word recognition accuracy would differ between younger and older listeners. Taler et al. (2010) assessed the performance of younger and older adults with normal hearing on a sentence understanding task in which keywords varied by word frequency and neighborhood density. Compared to younger adults, older adults showed poorer accuracy overall and stronger neighborhood density effects, especially for low-frequency stimuli presented at a relatively low SNR of −3 dB. Correlation analysis between selected cognitive measures and the difference score of performance in more difficult vs. easier neighborhood density conditions revealed a significant correlation between the measure of inhibition (Stroop test, color-word naming condition) and the neighborhood density effect at the more difficult SNR, indicating that lower inhibitory function is associated with a larger neighborhood density effect.

Helfer and Jesse (2015) extended these findings of age-related differences in lexical effects on recall of target words in sentences presented with a single competing talker by examining lexical effects not only in target stimulus recall but also in the pattern of intrusive errors from the competing speech masker. Neighborhood density exerted a strong influence on target word recall by older listeners, and, for older listeners, high-frequency words in the masker were more likely than low-frequency words to appear as incorrect target responses. Overall, the findings support those of Taler et al. (2010) and suggest that at least one factor contributing to difficulties experienced by older adults in noise is poorer access and retrieval of words from high-density lexical neighborhoods, which appears to be associated with a limited ability to inhibit the multiple irrelevant competitors characteristic of these neighborhoods.

9.3 Cognitive Processes

9.3.1 Cognitive Change in Adult Aging

Two cognitive factors, working memory and inhibition, have been mentioned to this point in the context of adult aging, along with their importance for a full picture of speech understanding in the older adult. Working memory is defined in the cognitive literature as a limited capacity system that enables the individual to temporarily hold (store) and manipulate (process) information in immediate memory (Baddeley 2012). Inhibition refers to the ability to prevent other mental or external sources from interfering with these working memory operations (Hasher et al. 2007). Although the nature of working memory and inhibition remains an active research area in cognitive psychology, a representative characterization can be found in McCabe et al. (2010). Based on relationships and overlaps between multiple test batteries, these authors define working memory as the ability to store and manipulate information in immediate memory, and inhibition as part of a broader executive system that includes monitoring and updating performance and shifting attentional set.

A third factor associated with adult aging is a general slowing in a range of perceptual and cognitive operations (Salthouse 1996). One of several mechanisms proposed to underlie the limited capacity of working memory is a time-based model in which switching attention among processing, storage, and updating and refreshing of the memory trace is constrained by the time parameters of these processes (Barrouillet et al. 2004). A discussion of attention-based models of working memory (e.g., Cowan 1999; Engle 2002) can be found in Wilhelm et al. (2013). Readers interested in the development of current concepts of working memory resources can find a review in Wingfield (2016).

Two final points should be made. The first is that, like peripheral hearing acuity and effectiveness in central auditory processing, these cognitive fundamentals (working memory, inhibition, speed of processing) tend to decline in adult aging, but with wide differences from individual to individual. The second point is that when considering age-related changes in speech understanding, hearing and cognitive factors do more than exert independent effects on communicative success. Rather, the quality of speech understanding results from their interaction (Arlinger et al. 2009; Jerger, cited in Fabry, 2011, p. 20).

9.3.2 Cognition and Speech Understanding in Degraded and Complex Listening Environments

Over the last 10–15 years, there has been an increasing awareness by audiologists and hearing scientists of the role of cognition in measures of speech understanding (Humes et al. 2012), and a corresponding awareness by cognitive psychologists of the importance of audibility and central auditory processing to measures of cognitive processing that involve auditory presentation of stimuli. The dynamic contributions of auditory and cognitive interactions are most apparent when attempting to unravel the principal sources of speech understanding problems of older listeners in degraded and complex listening environments.

A considerable body of research has now accumulated that examines the relative importance of peripheral, central, and cognitive abilities in predicting age-related decline in speech understanding performance. An exhaustive review of this literature is beyond the scope of this chapter. Nonetheless, certain trends have emerged. One trend is that measures in many cognitive domains have been associated with age-related differences in speech understanding performance in noise, even when differences in hearing sensitivity are controlled. (Note that understanding of undistorted speech in quiet is highly predictable based on signal audibility for younger and older adults, as discussed in Sect. 9.2.1). These include measures of attention/inhibition (Janse 2012), processing speed (Füllgrabe et al. 2015), executive control (Ward et al. 2017), and working memory (Füllgrabe et al. 2015). Some investigations employ a factor analysis approach in which a composite measure of cognitive performance is derived; these composite measures typically are correlated highly with speech understanding performance in noise (e.g., Füllgrabe et al. 2015).

A domain in which inhibitory ability is critical to effective functioning is that of complex listening tasks, such as the selection of segregated signals for attention. This is indicated in Fig. 9.1 as an attentional filter that limits the ability to analyze more than one speech stream at a time. The term "cocktail party problem" was coined by Cherry (1957) to refer to one’s ability to attend to a single speaker while being unaware of the content of other talkers speaking simultaneously. The fact that listeners can detect their name being spoken by a previously non-attended speaker about 30% of the time (Moray 1959) implies periodic switching of the filter to a fading echoic trace of the other voice (Broadbent 1971) or shifting of the relative allocation of processing resources from one source to the other (Treisman 1969).

Consistent with arguments for an age-related inhibition deficit is the finding that older adults are more influenced than younger adults by the semantic content of a to-be-ignored voice in a multiple-talker situation. For example, Tun et al. (2002) showed that, relative to young adults, older adults experience more interference from a second, to-be-ignored talker speaking English than from one speaking an unknown language with a similar phonological inventory (Dutch). A regression analysis conducted on these data revealed that executive control (inhibition), as measured by the Trail Making Test, contributed significant variance to the ability to prevent interference from a background speaker even after accounting for hearing acuity. It is thus the case that older adults’ difficulty in following a single speaker in a noisy background results from deficits at both central auditory processing and cognitive levels.

It is also the case that inhibition and working memory tend to be correlated in adult aging, and it has been suggested that working memory capacity is predictive of the effectiveness of inhibitory processes (Sörqvist et al. 2012; Lash and Wingfield 2014). Indeed, of the various cognitive domains assessed across a wide range of studies, working memory emerges consistently as a key factor contributing to older adults’ speech understanding performance in noise, including competing speech.

9.4 Working Memory, Linguistic Context, and Speech Understanding

9.4.1 The Ease of Language Understanding Model

As indicated, working memory capacity is viewed as a limited capacity system in which, in the case of speech, the listener carries out a complex signal processing task and holds that information in a memory store for later retrieval. Rönnberg et al. (2008, 2013) have offered the Ease of Language Understanding (ELU) model as a theoretical construct for clarifying the role of the working memory system in speech understanding. The theory postulates that working memory is explicitly engaged when phonological analysis does not yield a clear signal as a result of distortions associated with signal processing devices or the presence of a noise background. Once engaged, working memory enables the listener to access stable information held in long-term episodic memory or semantic memory to aid speech understanding. (A further discussion of the ELU model and a critical analysis can be found in Wingfield et al. 2015.)

Generally, it has been observed that individuals with high working memory capacity perform better on speech understanding tasks in noise and with fast-acting compression in hearing aids (Rudner et al. 2011; Souza and Sirow 2014), indicating that they are better able to access the information held in long-term memory or semantic memory to improve performance compared to those with low working memory capacity. Many of these prior studies evaluated adults who varied widely in age and hearing sensitivity.

Because decline in hearing sensitivity contributes to speech understanding deficits, an ideal strategy for examining possible age-related differences in the impact of working memory capacity on speech understanding is to evaluate listeners with normal hearing. Schurman et al. (2014) measured the SNR corresponding to 80% correct performance (SNR80) for high-context and anomalous-context sentences presented in different noise maskers using an immediate recall task in which the listener immediately repeated the sentence presented. After adjusting the SNR to the level corresponding to 80% correct performance, the investigators presented the same stimuli in a delayed recall task, in which the listener recalled the sentence presented prior to the most recent sentence. Older listeners with normal hearing showed poorer (i.e., higher) SNR80 scores than younger adults in both sentence contexts and all masker types on the immediate recall task, although both listener groups took advantage of contextual information (i.e., better SNR80 scores in the high-context condition compared to the anomalous-context condition). However, even after equating performance for the two age groups on the immediate recall task, substantial age effects were observed in the delayed recall task. In other words, when younger and older listeners are equated in speech recognition performance in noise, older listeners perform more poorly than younger listeners when a memory component is added to the task. These age differences were consistent across both sentence types and all masker types, as shown in Fig. 9.4. Scores on the listening (L-) SPAN test, an auditory version of the reading (R-) SPAN test (Daneman and Carpenter 1980), were highly correlated with delayed recall performance (see Fig. 9.4). The results strongly indicate that working memory is highly related to performance on everyday speech understanding tasks that involve listening in noise and waiting to respond to a target message, simulated in this study as a delayed recall task.

Fig. 9.4

Relationship between scores on a test of working memory (Listening Span) and percent correct sentence recognition on a delayed recall task, shown separately for four masker types [1-Talker (1 T), 2-Talker (2 T), 2-talker spatially separated (2 T-spatial) and speech spectrum noise (Noise)]. Listening Span scores are collapsed across four listening span categories (2 = scores of 2 and 2.5; 3 = scores of 3 and 3.5; 4 = scores of 4 and 4.5; 5 = scores of 5 and 5.5). Symbols represent the average percent correct scores in the delayed speech recognition task, with open circles shown for young normal-hearing listeners (YNH) and filled squares shown for older normal-hearing listeners (ONH). Individual listener data points are also plotted. Note that performance on the delayed sentence recall task is poorer for older than for younger listeners, and that scores on the L-SPAN test are highly related to performance on the delayed recall task. (Reproduced from Schurman et al. (2014), with the permission of the Acoustical Society of America)

Working memory declines as a function of the normal aging process in the general population (Lipnicki et al. 2017), which makes it difficult to determine whether or not decline in working memory contributes to speech understanding performance independently of age. Gordon-Salant and Cole (2016) measured word and sentence understanding in noise by four groups of listeners with normal hearing: younger and older listeners with high working memory capacity and younger and older listeners with low working memory capacity. For words, younger listeners achieved lower (better) SNR scores than older listeners in both working memory groups, and listeners with high working memory capacity (both young and older) achieved lower SNR scores than those with low working memory capacity. For sentences, older listeners with low working memory capacity showed higher SNRs than younger listeners with low working memory capacity, but this age effect was not shown for individuals with high working memory capacity. Essentially, the older listeners with high working memory capacity were able to take considerable advantage of contextual information in sentences, which served to minimize age differences. These findings generally suggest that working memory capacity has a significant effect on speech understanding in noise, independent of listener age and hearing sensitivity. However, as some older listeners acquire both age-related hearing loss and age-related decline in working memory capacity, these individuals may be expected to experience considerable difficulties on speech understanding tasks in noise.

Although studies of context effects often contrast SNRs necessary for recognizing words heard within a constraining sentence context versus words heard in the absence of a constraining context, there is a systematic relationship between ease of word recognition and the degree of constraint as a continuous variable. That is, it can be shown that the SNR needed to recognize a word heard within a sentence context is inversely proportional to the logarithm of its probability in that context. Such probabilities are available in published norms developed using a “cloze” procedure (Taylor 1953), in which the transitional probability of a word within a sentence context is estimated by the percentage of individuals who give that word when asked to complete a sentence with what they believe is a likely final word (Lahar et al. 2004).
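This relationship can be written compactly. A hedged formalization, with α and β as listener- and condition-specific fitted constants (the y-intercepts and slopes referred to below) and p(w | context) the cloze probability of word w in its sentence frame:

    SNR(w) ≈ α − β · log p(w | context)

so that higher-probability words can be recognized at progressively less favorable SNRs.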

Benichov et al. (2012) have shown that this relationship between ease of word recognition and the contextual probability of the word holds for both young adults and older adults with normal or impaired hearing acuity, differing only in the y-intercepts and slopes of the functions. In addition, post hoc regression analyses showed that while the relative contribution of hearing acuity to identification of words in noise decreased with increasing degrees of contextual support, a cognitive test battery that included working memory and processing speed remained a significant predictor of the SNR needed for identification of words heard in isolation as well as with a constraining linguistic context.

Most studies of context effects have focused on facilitative effects of a sentential context that leads up to a target word. There are, however, occasions when a poorly articulated or noise-masked word goes unrecognized until one hears the context that follows the word. Although older adults are as effective as, or more effective than, young adults in using prior context to facilitate recognition of such words relative to their baselines for words in isolation, older adults are less effective than young adults at using a following context for retrospective recognition of an acoustically indistinct word (Wingfield et al. 1994). Because such retrospective recognition relies on an effective memory trace of the acoustically ambiguous region, this finding highlights an additional area in which an age-limited working memory can place the older adult at a disadvantage in speech understanding.

9.4.2 False Hearing

Older adults’ facility in using a linguistic context to aid word recognition can have negative consequences if context is over-used. Such a case can occur when an acoustically indistinct word is misperceived as a word that fits the context better than the word actually uttered. Rogers et al. (2012) have found that such context-based misrecognitions are more likely to occur in older than younger adults, and that older adults are more likely than younger adults to have inappropriately high confidence in the correctness of such misrecognitions. Rogers and colleagues refer to high-confidence misrecognition as "false hearing." Importantly, the higher incidence of false hearing in older adults has been shown to be largely independent of the acoustic clarity of the target word (Rogers and Wingfield 2015). This raises the likelihood that the effect is a consequence of older adults’ reduced ability to inhibit high-probability responses as part of the general inhibitory deficit discussed in Sect. 9.3.

9.5 The Cost of Listening Effort

Although historically audiologists and hearing scientists have concentrated on hearing-impaired listeners’ failures in speech perception, there has been increasing attention to the cost of successful perception when faced with a degraded acoustic signal. Sometimes called an "effortfulness effect," it has been shown that the extra processing needed to successfully recognize a degraded speech signal can draw resources that would otherwise be available for encoding what has been heard in memory (Rabbitt 1991; McCoy et al. 2005) or for successful comprehension of a sentence that expresses its meaning with complex syntax (Wingfield et al. 2006).

This limited-resource notion and the central role of listening effort in older (and younger) adults with impaired hearing have been encapsulated in the Framework for Understanding Effortful Listening (FUEL). This framework, derived from Kahneman’s (1973) limited-resource model, conceptualizes successful speech understanding as dependent on a balance among the clarity of an acoustic stimulus, the task demands, and one’s motivation to expend the necessary effort to meet the processing challenge (Pichora-Fuller et al. 2016). At the sentence level, effortful listening consequent to hearing loss or listening in noise will interact in a multiplicative fashion with the linguistic complexity of the speech. To the extent that older adults have limited working memory resources, the detrimental consequences of listening effort on speech understanding will be disproportionately greater for older adults relative to young adults. Integral to the FUEL model, detrimental effects of listening effort will appear even when it can be shown that the speech itself has been correctly perceived, albeit with some effort.

Although the focus of this chapter is on recognition of a speech stimulus, a study by DeCaro et al. (2016) illustrates that when this process involves resource-demanding perceptual effort, detrimental consequences appear at the level of linguistic processing. Following a limited-resource postulate of the FUEL model, this detrimental effect will be especially marked when listeners hear sentences with syntactic structures that place a heavy demand on working memory for successful comprehension. DeCaro and colleagues tested comprehension accuracy for syntactically simple and syntactically complex sentences with three groups of listeners: young adults with normal hearing, older adults with good hearing (viz., a pure-tone average across 0.5, 1, 2, and 4 kHz < 25 dB HL), and an age-matched group of older adults with a mild-to-moderate hearing loss.

Although comprehension accuracy for syntactically simple sentences was excellent for all three participant groups, there were significantly more comprehension errors for the syntactically complex sentences, with the good-hearing older adults having more comprehension errors for these complex sentences than the young adults, and more errors still for the older adults with a mild-to-moderate hearing loss. Critically, these data were obtained even though the simpler and more complex sentences were recorded by the same speaker, had the same word-length, and were presented at the same suprathreshold, audible level. They differed only in the working memory demands they placed on the listener as the listener attempted to process the meaning of the sentence at the linguistic/cognitive level.

These data fit well within the FUEL limited-resource model (Pichora-Fuller et al. 2016). Even though the hearing-impaired older adults may have required more resources for perceptual encoding of the acoustic stimuli, the minimal working memory resources required for processing the syntactically simpler sentences left sufficient spare capacity for the excellent comprehension performance observed for these sentences. The corollary to this principle is that this same degree of perceptual effort, but now combined with the heavier resource demands required for comprehension of the complex sentences, left little spare capacity, with the resultant appearance of comprehension errors. That is, the quality of comprehension performance will reflect a balance of the resource demands imposed by the clarity of the acoustic signal, the resources required for successful processing at the linguistic level, and the level of working memory or other cognitive resources available to the participant.

The heavy resource drain of a word-by-word syntactic analysis when complex or syntactically underspecified sentences are encountered can in some cases lead listeners to a resource-conserving strategy of sampling just a few key words and inferring the meaning based on plausibility. For example, even when presented at a suprathreshold level, older adults, and especially older adults with hearing impairment, are more likely than normal-hearing young adults to respond to the sentence, “The eagle that the rabbit attacked was large” by saying that the eagle attacked the rabbit (Amichetti et al. 2016). Because we live in a plausible world, this form of experience-based shallow analysis can yield correct comprehension; it fails, however, when a sentence contains an unexpected meaning or a counterintuitive observation as can sometimes occur.

Many older adults with ARHL report an almost palpable sense of cognitive fatigue after a day of effortful listening, and several published papers have addressed the relationship between effort and fatigue, and have attempted to develop operational definitions of the two (e.g., McGarrigle et al. 2014; Wang et al. 2018). Equally important is the need to develop objective measures of processing effort that can be assessed independently from task performance (see Kuchinsky, Chap. 10).

9.6 Emerging Issues/New Directions

9.6.1 Aging, Cochlear Implants, and Speech Understanding

For individuals with more severe degrees of hearing loss for whom conventional amplification using hearing aids does not improve speech understanding, cochlear implants (CIs) may be considered as a treatment option. CIs are auditory prosthetic devices that are surgically implanted into the cochlea in order to bypass damaged inner ear structures and directly stimulate the auditory nerve via electrical pulses. The current candidacy criteria for cochlear implantation in adults do not specify an age limit; in fact, there are cases of individuals over 100 years of age receiving a CI. Given the incidence of hearing loss among the growing population of older adults, in addition to the introduction of more inclusive CI candidacy criteria, it is safe to assume that the number of older adults receiving CIs will continue to increase (e.g., Dillon et al. 2013). However, this trend raises an emerging issue: whether CIs are as beneficial to older recipients as they are to younger adult recipients.

Cochlear implantation in individuals over 65 years old is associated with significant improvements in speech understanding scores and quality-of-life measures (Shin et al. 2000; Vermeire et al. 2005). Although CIs undoubtedly improve speech understanding ability in almost all adult CI recipients regardless of their age, post-implantation performance in older CI users may be worse when compared to younger users (Blamey et al. 2013; Sladen and Zappler 2015). However, there is conflicting evidence on the effect of advancing age on CI performance. When the amount of benefit one receives from a CI is defined by the improvement in post-implantation speech understanding scores compared to pre-implantation scores, there is no impact of age on implant benefit (Pasanisi et al. 2003; UK Cochlear Implant Study Group 2004). On the other hand, because older CI candidates may have poorer pre-implantation scores than younger candidates, this could ultimately result in a substantial performance gap between younger and older CI users. Sladen and Zappler (2015) evaluated post-implantation speech understanding by measuring word and sentence recognition in quiet and in noise for an older group (mean = 70.7 years) and a younger group (mean = 39.7 years). Results showed that the older group performed significantly worse than the younger group on all speech understanding measures, with the largest group differences observed in the speech-in-noise conditions with the worst SNRs.

Given the decline in central auditory processing and cognition with age, the question remains as to whether special considerations are required for older CI recipients. If older CI users perform more poorly than younger CI users on everyday speech communication tasks, then there is a need to examine the factors that underlie this problem and identify solutions to improve performance specifically for older adults. For example, individualized device programming using a lower electrical stimulation rate for older CI users has been suggested by many clinical audiologists, as well as by Wolfe and Schafer (2014) and by Shader et al. (2020). Stimulation rates below approximately 1000 pulses per second may benefit older CI users because age-related central auditory deficits could prevent older CI users’ auditory systems from processing the higher information rate of the electrical signal delivered at faster stimulation rates.

As a general rule, performance with a CI varies widely across individuals. While some individuals are only able to improve their sound awareness, many others can achieve excellent open-set speech understanding scores in quiet (Gifford et al. 2008; Holden et al. 2013). Much of the variability in speech understanding scores among CI users can be explained by factors that impact the bottom-up integrity of the signal. These factors include age at onset of severe to profound hearing loss, duration of hearing loss prior to implantation, and the etiology of the hearing loss (Blamey et al. 2013). Earlier onsets of hearing loss with prolonged periods of auditory deprivation prior to implantation cause neural degeneration of the spiral ganglion cells (Leake et al. 1999), which can limit the ability of the auditory nerve to accurately encode electrical signals. This would result in further degradation of the signal received by the CI user. However, even when these well-established factors are taken into account, a large amount of unexplained variability in performance remains. Cognitive factors that impact an individual’s top-down processing ability could also affect speech understanding performance and contribute to this individual variability.

Age-related cognitive decline, coupled with the speech signal distortion produced by the digital signal processing algorithms incorporated in CIs, presents a potential problem for older CI users. CIs deliver electrical pulse trains that are amplitude modulated by envelopes extracted from the acoustic input. The result is an auditory percept that is highly degraded in the spectral domain but retains a relatively intact temporal envelope (a simplified simulation is sketched below). CI-processed speech thus presents a unique form of signal degradation that substantially disrupts the bottom-up sensory input, placing greater demands on top-down processes for successful speech understanding. Older CI users may therefore be at a greater disadvantage than younger users because of age-related cognitive decline.
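Laboratory studies with normal-hearing listeners commonly approximate this envelope-based processing with a noise vocoder, the standard form of CI simulation. The following is a minimal sketch of such a simulation, assuming NumPy and SciPy are available; the channel count, band edges, and filter orders are illustrative and do not correspond to any particular device or study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, env_cut=160.0):
    """Crude noise-vocoder CI simulation: split the input into frequency bands,
    extract each band's temporal envelope, and re-impose that envelope on
    band-limited noise, discarding the spectral fine structure."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)        # log-spaced band edges
    env_lp = butter(2, env_cut, btype="lowpass", fs=fs, output="sos")
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(bp, x)
        env = sosfiltfilt(env_lp, np.abs(band))             # rectify + low-pass
        carrier = sosfiltfilt(bp, np.random.randn(len(x)))  # band-limited noise
        out += env * carrier            # envelope preserved, spectrum degraded
    return out

# Toy input: a 150-Hz tone with a 4-Hz amplitude modulation, standing in for speech
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
toy = np.sin(2 * np.pi * 150 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))
degraded = noise_vocode(toy, fs)
```

Reducing n_channels in this sketch yields progressively coarser spectral resolution, which is how simulation studies manipulate the severity of the degradation.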

An age-related decline in cognitive processing has been observed in older CI users (Holden et al. 2013; Moberly et al. 2017a), and cognitive ability has been shown to correlate with speech understanding scores in CI users. Holden et al. (2013) evaluated speech understanding in 114 adult CI users and found that a composite measure of cognition was positively correlated with word recognition scores. However, after controlling for the negative effect of age on cognitive scores, the relationship between speech understanding and cognition was no longer significant, suggesting that age-related cognitive decline may have negatively impacted word recognition scores. Schvartz et al. (2008) measured CI-simulated phoneme recognition in younger, middle-aged, and older normal-hearing listeners. When the acoustic stimuli were more severely degraded, younger listeners recognized phonemes better than middle-aged and older listeners, and listener age and working memory ability were the primary predictors of vowel recognition performance. Working memory ability, specifically, has also been shown to correlate with speech understanding scores in CI users (Tao et al. 2014; Moberly et al. 2017b). The combination of age-related cognitive decline and the delivery of highly degraded speech signals thus presents a special challenge to older CI users.

CI users, like other listeners, can make excellent use of linguistic context to aid word recognition (Winn 2016). As previously noted (Sect. 9.2.5), however, linguistic context can activate a large number of potential words that might reasonably fit the sentence context. Amichetti et al. (2018) evaluated the positive effects of sentence context and the potential negative effects of response competition on word recognition in younger adult (mean age 22.5 years) and older adult (mean age 67.5 years) CI users. The left panel of Fig. 9.5 shows the positive effects of linguistic context on word recognition using word-onset gating: participants heard the first 50 ms of a recorded word, then the first 100 ms, then the first 150 ms, and so on, until the word could be correctly identified (Wingfield et al. 1991; Grosjean 1996; see the sketch below). The target words were presented as the final words of sentences, and the sentence contexts varied in the probability with which they suggested the target word, based on the previously described "cloze" norms. These probabilities are shown in parentheses on the x-axis in the left panel of Fig. 9.5. For both the younger and older adult CI users, the amount of word-onset information needed to correctly identify a target word decreased with increasing contextual probability of the target word, and the age difference that appeared for words in low-context sentence frames was reduced under medium and high contextual constraints.
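The gating procedure itself is simple to specify. The sketch below is a schematic of the paradigm, not the authors' experimental software; the recognize() callback is a hypothetical stand-in for the participant's identification response.

```python
import numpy as np

def onset_gating(word_audio, fs, recognize, gate_ms=50):
    """Present successively longer word onsets (50 ms, 100 ms, 150 ms, ...)
    until the listener identifies the word; return the gate size (ms) needed."""
    gate_len = int(fs * gate_ms / 1000)
    for n in range(1, len(word_audio) // gate_len + 2):
        gated = word_audio[: n * gate_len]   # first 50*n ms of the recording
        if recognize(gated):                 # hypothetical listener response
            return n * gate_ms
    return None                              # word never identified

# Toy demonstration: a "listener" that needs at least 150 ms of word onset
fs = 16000
word = np.random.randn(fs // 2)              # 500-ms stand-in for a recorded word
listener = lambda seg: len(seg) >= int(0.15 * fs)
print(onset_gating(word, fs, listener))      # -> 150
```

The dependent measure plotted in Fig. 9.5 is this returned gate size: smaller values indicate that less acoustic-phonetic information was needed for recognition.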

Fig. 9.5

Left panel shows mean onset gate size required for correct recognition of words heard with a low, medium, or high degree of linguistic context by younger and older adult cochlear implant users. Numbers on the abscissa are mean cloze probability values of the target words. Right panel shows mean gate size required for correct word recognition with low, medium, or high degrees of response entropy for the same participants. Numbers on the abscissa are mean calculated entropy values of the target words. (Adapted from Amichetti et al. (2018), https://journals.lww.com/ear-hearing/pages/default.aspx, with the permission of the American Auditory Society)

As previously noted (Sect. 9.3.1), a major factor in cognitive aging is reduced efficiency in inhibiting interference from competing responses (Hasher et al. 2007). This is demonstrated in the right panel of Fig. 9.5, which shows the mean gate size needed for correct word recognition as a function of response entropy. As distinct from stimulus probability, response entropy is a measure of response uncertainty, calculated from the number and probability distribution of the alternative words that also fit the sentence context; this information is likewise available from published cloze norms (e.g., Lahar et al. 2004). Entropy is highest when all possible responses are equally likely and lower when some possibilities are more predictable than others (Shannon and Weaver 1949; van Rooij and Plomp 1991; see the formula below). Consistent with findings for normal-hearing younger and older adults (Lash et al. 2013), with high response entropy (many alternatives that could fit the sentence frame) the older CI users required a larger onset gate size for word recognition than the younger adult CI users. This is the result that would be predicted from an age-related inhibition deficit.
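For reference, the entropy measure is Shannon's: for a target position with n alternative completions whose cloze probabilities are p_i, the response entropy (in bits) is

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i
```

For example, four equally likely completions (each p_i = 0.25) give H = 2 bits, whereas one dominant completion at p = 0.85 with three alternatives at p = 0.05 gives H ≈ 0.85 bits; balanced competition among many candidate words therefore yields high entropy.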

These results show that CI users' word recognition is highly sensitive to linguistic context, with older CI users gaining a larger advantage from sentence context than younger CI users. However, this same sensitivity to linguistic context increased interference from other potential words fitting the semantic context, which had a negative effect on older adults' word recognition. Older adult CI users may therefore still be at a disadvantage relative to younger CI users, even in the presence of a robust linguistic context.

Despite the limited speech cues delivered by a CI and age-related cognitive decline, implantation has been shown to improve cognitive function in older recipients (Cosetti et al. 2016; Völter et al. 2018). Taken together, recent findings suggest that CIs benefit older candidates by improving speech understanding in quiet and in noise and by reducing age-related cognitive decline. These devices are thus a highly viable treatment option for older adults, although performance with CIs could likely improve further with refinement of device settings as well as with training programs that strengthen cognitive skills.

9.6.2 Language Background, Speech Understanding, and Aging

A critical and understudied population is older adults who are nonnative speakers of English. Little is known about the speech understanding abilities of this group or about the factors that may contribute to their success or limitations in understanding spoken English. Demographic data indicate that more than 13.5% of US residents are foreign born (Zong et al. 2018); roughly 12% of this immigrant population is over the age of 65 years (Batalova 2012), and, conversely, about 12% of older adults residing in the United States are foreign born (Batalova 2012). The majority of these foreign-born individuals speak a language other than English in the home (Camarota and Ziegler 2014), and they are likely to have varying degrees of English listening and speaking proficiency, depending on their age of arrival in the United States, years of residence, age of first exposure to the second language, and other factors (Flege 2002).

The model of speech understanding and language processing presented at the beginning of this chapter (Fig. 9.1) suggests that age-related difficulties in perceiving a degraded speech signal might be compensated for by increased reliance on an individual's knowledge of the English language. But what happens when an older person's knowledge of the language is insufficient because it was acquired as a second language later in life? Studies have reported lower recognition accuracy for spoken English by nonnative English listeners compared to native English listeners, especially in the presence of competing speech (Tamati and Pisoni 2014). Few studies, however, have examined the speech understanding abilities of older nonnative speakers of English. It might be predicted that older nonnative speakers of English would exhibit much poorer recognition of English words and sentences than native speakers because, in addition to age-related changes in hearing sensitivity, central auditory temporal processing, and cognition, these individuals may have limited knowledge of the lexical, syntactic, and semantic attributes of the English language. Moreover, the phonology of the native language may differ from that of English, rendering the tasks of phonological analysis and lexical identification of English words even more challenging.

Gordon-Salant et al. (2019) compared the word recognition performance of younger and older normal-hearing native Spanish speakers to that of younger and older normal-hearing native English speakers. All native Spanish listeners had arrived in the United States after the age of 12 years and had resided there for at least one year. Younger listeners were aged 19–33 years, and older listeners were aged 60–81 years. Stimuli were English monosyllabic words, recorded by both a native English speaker and a native Spanish speaker, and presented in quiet and in noise. The results, shown in Fig. 9.6, demonstrate that the older native Spanish speakers exhibited very poor word recognition scores in all conditions, showing both substantial age effects (relative to younger native Spanish speakers) and substantial native language effects (relative to older monolingual native English listeners). Unlike the other listener groups, the older native Spanish listeners did not show large variation in performance across the talker conditions (native English, native Spanish) or the environmental conditions (quiet, noise).

Fig. 9.6

Recognition performance for English words produced by a native English talker and a native Spanish talker in quiet and noise by four listener groups: younger native English (Yng, NE) listeners, older native English (Older, NE) listeners, younger native Spanish (Yng, NS) listeners, and older native Spanish (Older, NS) listeners. Error bars represent 1 standard error. (Adapted from Gordon-Salant et al. (2019), https://pubs.asha.org/journal/jslhr, with the permission of the American Speech-Language-Hearing Association)

Statistical modeling demonstrated that adding the English vocabulary score to the analyses significantly improved the model fit (relative to the model without this score), but that adding cognitive variables (i.e., working memory, processing speed, attention/inhibition) in a stepwise manner did not (see the sketch below). Overall, the findings suggest that when knowledge of English vocabulary is diminished, an older listener is unable to take advantage of the available sources of support for lexical access and word recognition, including a quiet environment (relative to noise) and an unaccented talker (relative to an accented talker). Research is still needed to develop a comprehensive model of speech understanding by older nonnative listeners, including the relative importance of contextual information, word frequency and neighborhood density, education level, cognition, and numerous other factors. Such a model should also consider arguments that, even when successful, comprehension of accented speech may come at the cost of significant processing effort that can interfere with concurrent cognitive operations (Adank and Janse 2010; Van Engen and Peelle 2014).
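This model-comparison logic can be illustrated with a likelihood-ratio test between nested regression models, one fit with and one without the vocabulary score. The sketch below uses synthetic data and hypothetical variable names (score, age, vocab, wm); it is not the authors' analysis code, and the true analysis may have used a different model family.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 80
df = pd.DataFrame({
    "age":   rng.uniform(19, 81, n),      # listener age in years
    "vocab": rng.normal(50, 10, n),       # English vocabulary score (hypothetical)
    "wm":    rng.normal(0, 1, n),         # working-memory composite (hypothetical)
})
# Synthetic word-recognition scores driven by age and vocabulary
df["score"] = 80 - 0.3 * df["age"] + 0.5 * df["vocab"] + rng.normal(0, 5, n)

base = smf.ols("score ~ age", data=df).fit()          # model without vocabulary
full = smf.ols("score ~ age + vocab", data=df).fit()  # model with vocabulary added
lr, p, df_diff = full.compare_lr_test(base)           # does vocabulary improve fit?
print(f"LR = {lr:.1f}, p = {p:.4g}")

# A cognitive variable added stepwise is tested the same way
cog = smf.ols("score ~ age + vocab + wm", data=df).fit()
print(cog.compare_lr_test(full))                      # expect no significant gain
```

A significant likelihood-ratio statistic for the vocabulary step, with nonsignificant statistics for the cognitive steps, would mirror the pattern of results reported above.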

9.7 Final Comments

The work reviewed in this chapter leads to the inevitable conclusion that older adults have difficulty understanding speech, especially in the challenging conditions encountered in everyday life that include talkers who are difficult to understand and listening environments that are distracting or serve to mask the speech signal. Because age-related hearing loss reduces audibility of key acoustic information in speech, and age-related central auditory deficits produce delayed and imprecise neural timing, the speech signal to be identified may be highly distorted.

Difficulties in speech stream segregation, as required when listening to a speech signal in a background of other talkers, compound the older adult's speech understanding task. As a result, older adults often shift their listening strategy to rely on cognitive abilities and linguistic knowledge to understand the spoken message. They must also work harder to understand degraded speech, expending more cognitive resources at a time when the pool of those resources may be somewhat limited. Among these resources, working memory ability is tightly linked to speech understanding, whereas attention and processing speed are related only in certain circumstances. Recent findings also suggest that an older person's knowledge of the lexical, syntactic, and semantic properties of the language is a powerful mediator of the speech understanding difficulties experienced by older listeners.

Older adults who are nonnative speakers of English are an important subgroup of seniors, one that demonstrates how limited knowledge of the English language exacts a heavy toll on the ability to understand spoken English by limiting access to cues that may aid speech understanding. Provision of cochlear implants to older adults with more severe hearing loss has a clear beneficial effect on speech understanding performance as well as on cognitive function. Future research aimed at identifying the types of listening experiences, cognitive training paradigms, and signal enhancement devices that may preserve speech recognition and bolster cognitive reserve in seniors, regardless of native language experience and degree of hearing loss, is critical to maintaining communicative competence among older adults.