Introduction

Age-related hearing loss (presbycusis) is one of the most prevalent conditions of old age, with its prevalence doubling in each decade from age 60 (Cruickshanks et al. 1998, 2003; Lin 2011; Lin et al. 2011a, b). It has been linked to a high risk of dementia, possibly due to neuropathological and genetic factors (Albers et al. 2015). Cross-sectional studies published over the past 25 years have consistently reported a significant age-related decline in speech perception when speech is rapid or accompanied by background noise (Committee on Hearing, Bioacoustics and Biomechanics (CHABA) 1988; Cruickshanks et al. 2003; Fostick et al. 2013; Lee et al. 2005; Schneider et al. 2010; Sommers et al. 2011). Several explanations have been offered for this age-related decline in speech perception, including (1) age-related decline in hearing sensitivity (e.g., Anderson et al. 2011; Humes et al. 2012; Schneider et al. 2002); (2) age-related decline in cognitive ability (e.g., Frtusova et al. 2013; Grossman et al. 2003); and (3) age-related decline in auditory temporal processing (Ben-Artzi et al. 2011; Füllgrabe et al. 2015; Humes et al. 2013; Ostroff et al. 2003; Schneider and Pichora-Fuller 2001; Schneider et al. 2002, 2005).

Auditory temporal processing refers to the individual’s ability to perceive brief sounds presented rapidly. This ability has been shown to correlate with linguistic abilities, such as reading, phonological awareness, and speech perception, among different groups, such as dyslexic readers (Fostick et al. 2012, 2014c; Tallal 1980), aphasic patients (Fink et al. 2006; von Steinbuchel et al. 1999), and older adults (Fostick et al. 2013, 2014b). The proposed hypothesis for the age-related decline in auditory temporal processing (as a significant factor in age-related decline in speech perception) is based on findings that the appropriate use of speech cues relies on several types of auditory temporal processing (Anderson et al. 2013a, b; Ezzatian et al. 2015; Gordon-Salant 2005; Fitzgibbons and Gordon-Salant 2010; Fostick and Babkoff 2013a; Kraus and Anderson 2014; Schneider and Pichora-Fuller 2001; Schneider et al. 1998, 2002). Indeed, older participants needed a slower rate of sound presentation than young participants did in order to detect a gap within a sound, report the order of tones, or discriminate between different durations.

Most of the studies of speech perception among older adults, to date, have used a cross-sectional design in which cohorts of participants of different ages were compared on their performance of a number of cognitive, speech, and perceptual auditory tasks. This design emphasizes the examination of differences in the mean perceptual and cognitive performance levels across age ranges, but does not track changes in the individual participants over time. An alternative experimental approach for studying the effects of aging on cognitive and perceptual abilities is a longitudinal design that emphasizes the observation of changes in each individual due to his/her aging. The latter method tests and retests the same individuals usually after 5- to 10-year intervals and reports on changes in cognitive and perceptual performance. Such an approach was also used when testing longitudinal changes in speech perception (Bergman et al. 1976; Divenyi et al. 2005; Dubno et al. 2008; Hietanen et al. 2004; Møller 1981; Pedersen et al. 1991; Pronk et al. 2013).

There is an ongoing debate in the literature whether longitudinal or cross-sectional studies are preferable for exploring age-related changes in perceptual and cognitive functions. On the one hand, a longitudinal model provides a more “direct” examination of age-related changes in perception and cognition by following the same participants over time. This model can minimize problems that arise from a cohort effect to which cross-sectional studies are more vulnerable. Benefits of longitudinal design result from its ability to examine (1) intra-individual and inter-individual variability directly (Humes et al. 2013; Salthouse 2014); (2) baseline performance level; and (3) rates of change in performance (Busey et al. 2010). However, longitudinal studies have several disadvantages, such as data loss due to the death of some participants (usually at the upper end of the age range), as well as missing repeated observations for some participants who will not, or cannot, be retested. Both of these difficulties can result in biased data (Schaie et al. 1973; Schaie 2013). Additionally, Salthouse (2009, 2014) has argued that another major problem with longitudinal studies is the possibility of a familiarity effect and/or learning effect due to prior testing experience, which can bias the data; these effects may explain why longitudinal studies generally report smaller age-related changes than cross-sectional studies (Salthouse 2009, 2014), although this finding could also be due to a decrease in participants’ stress due to familiarity with the testing procedures used in subsequent testing (Sindi et al. 2013). In cross-sectional studies, on the other hand, the participants are tested only once; therefore, no attrition-related, learning-related, or stress-related biases occur. However, as noted above, cross-sectional studies are subject to cohort differences, leading to possible data bias (Schaie et al. 1973; Schaie 2013). 
Given the advantages and disadvantages of both cross-sectional and longitudinal research designs, it seems reasonable to consider a “combined” analysis which would maximize the advantages and minimize the disadvantages presented by each study design on its own.

Longitudinal studies that tested age-related changes in speech perception usually studied the association between declines in speech perception and age-related decline in hearing sensitivity (Bergman et al. 1976; Divenyi et al. 2005; Hietanen et al. 2004; Møller 1981; Pedersen et al. 1991). However, the association of age-related decline in speech perception with age-related changes in auditory temporal processing has not yet been examined in a longitudinal design, and it therefore became the focus of the current study. By complementing a cross-sectional study with a longitudinal study of this topic, we intended to strengthen the resulting analysis and to yield data that can help us understand the relationship between age-related decline in speech perception and auditory temporal processing. Furthermore, the additional longitudinal data can help shed light on the rate and type of change (linear or nonlinear) in speech perception and temporal processing over the period between phases 1 and 2 of the study.

Therefore, the present study was undertaken to test the association between changes in auditory processing and speech perception over a seven-year period. This was done taking into account changes in audiometric pure-tone thresholds and cognitive ability that have been shown to be related to age-related decline in speech perception (e.g., Anderson et al. 2011; Frtusova et al. 2013; Grossman et al. 2003; Humes et al. 2012; Schneider et al. 2002). The variables in the present study were chosen based on previous studies that showed them to be sensitive to aging. The auditory temporal processing variables of temporal order judgment (TOJ) and gap detection were previously reported to be related to age and to word recognition accuracy (e.g., Fink et al. 2005; Fostick and Babkoff 2013a; Fostick et al. 2014b; Schneider et al. 1994). Recognition of words presented against a speech-noise background or as time-compressed speech has been reported to be sensitive to aging (e.g., Fostick et al. 2013, 2014b; Füllgrabe et al. 2015). Word recognition against a broadband, white-noise background was reported as subject to age-related decline in some studies (e.g., Pichora-Fuller and Souza 2003) but not in others (Fostick et al. 2013, 2014b). Digit Span, a measure of short-term and working memory, and Matrix Reasoning, a measure of nonverbal cognitive ability, are both designed for adults aged 16 to 90 years (Wechsler Adult Intelligence Scale (WAIS), Wechsler 1997). Auditory intensity discrimination (an auditory non-temporal-dependent variable) has been used previously as a control variable (Fostick and Babkoff 2013a).

Based on aforementioned findings in the literature, we hypothesized that in both cross-sectional and longitudinal designs, increased age would be associated with (1) increased audiometric pure-tone thresholds and auditory temporal processing thresholds; (2) decreased word recognition accuracy; and (3) decreased cognitive performance. Furthermore, we hypothesized that auditory temporal processing thresholds would be associated with changes in word recognition accuracy over the seven-year period separating the first testing and second testing, after controlling for age and changes in sensory thresholds and cognitive abilities.

Method

Participants

Fifty-eight participants (53% females) were tested twice in two test phases, separated by seven years. Participants were native Hebrew speakers and had 12 to 18 years of education. No correlation was found between age and years of education (r = .126, p = .185). Participants reported being healthy and functionally independent, with no history of disease related to the central nervous system. All had good or corrected visual ability. The age range during the first phase was 22 to 82, and for the second phase, 29 to 89. An additional 31 participants were tested in phase 1, but were not available for testing in phase 2; since they were not available for the longitudinal analysis, their data were also excluded from the cross-sectional analysis. These excluded participants had an age range of 21 to 79 years (mean = 46.2, s.d. = 17.7, mostly participants under age 40) in phase 1 and did not differ significantly from the participants who were available for testing in phase 2 on any of the variables measured in phase 1. One of these participants, age 79, was not available for follow-up because of Alzheimer disease, while the others were unreachable due to outdated contact information. All participants were recruited from the general population using advertisements.

The participants in the study reported no significant history of noise exposure (either occupational or military) and were screened for age-normal hearing according to the American National Standards Institute (ANSI-1969) criteria, during phase 1, using complete standard audiometric testing. Participants using hearing aids or with any clinically significant hearing loss were not included in the study. In both testing phases, the auditory test stimuli were presented at 40 dB SL, measured for each participant in order to control for differences in hearing sensitivity. Participants over 60 years old in phase 1 were also screened for basic cognitive abilities using the Mini-Mental State Examination (MMSE, Folstein et al. 1975). All participants achieved scores of 29–30, reflecting a high level of mental ability.

Tasks and stimuli

Both testing phases included the same tasks, as described below.

Hearing level

Before participants performed the auditory (dichotic temporal order judgment (TOJ), gap detection, and intensity discrimination) and speech perception tasks, an absolute threshold task was performed. The hearing level was measured for 1 kHz and 1.8 kHz, 15 ms tones (the sounds used in the auditory experiments) using a two-alternative forced-choice 2-down-1-up adaptive staircase procedure. Hearing level was calculated as the average of the last eight out of 10 reversals.
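As an illustration of the adaptive procedure, the following Python sketch implements a 2-down-1-up track and averages the last eight of 10 reversals. The starting level, the step size, the trial cap, and the `respond` callback that stands in for the listener are assumptions made for the example; they are not reported in the study.

```python
def two_down_one_up(respond, start_level=40.0, step=2.0,
                    n_reversals=10, n_average=8, max_trials=1000):
    """Run a 2-down-1-up staircase and return the mean level at the last reversals.

    respond(level) must return True when the listener answers correctly.
    start_level, step, and max_trials are illustrative assumptions.
    """
    level = start_level
    correct_in_a_row = 0
    last_direction = 0   # -1 after a downward step, +1 after an upward step
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue          # wait for a second consecutive correct response
            direction = -1        # two correct in a row: make the task harder
        else:
            direction = +1        # one error: make the task easier
        correct_in_a_row = 0
        if last_direction and direction != last_direction:
            reversals.append(level)   # the track changed direction at this level
        last_direction = direction
        level += direction * step
    tail = reversals[-n_average:]
    if not tail:
        raise ValueError("no reversals were recorded")
    return sum(tail) / len(tail)


# Hypothetical deterministic listener who is correct whenever level >= 10.
estimate = two_down_one_up(lambda level: level >= 10.0)
```

With a real listener the responses are probabilistic, and a 2-down-1-up rule of this kind converges on the level yielding about 70.7% correct responses.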

Dichotic TOJ

In the dichotic TOJ task (see Fostick and Babkoff 2013a, b; Fostick et al. 2014a), the participants were required to reproduce the order of two identical tones presented asynchronously, one to each ear (participants’ responses were either right–left or left–right). Both stimuli were 15-ms, 1-kHz pure tones with 2-ms cosine-squared rise/fall envelopes. The tone pairs were presented with an inter-stimulus interval (ISI) of 5, 10, 15, 30, 60, 90, 120, or 240 ms. In half of the trials, the tone to the left ear was presented first, followed by the tone to the right ear; in the other half, the order was reversed. Participants pressed the relevant keyboard keys in the order corresponding to the order they heard. Each ISI value was repeated 16 times, resulting in a total of 256 trials. The mean accuracy was plotted for each ISI, creating a psychophysical function. TOJ threshold was calculated as the ISI necessary for 75% correct responses.
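One way to read a 75% threshold off a psychophysical function is sketched below in Python. The study does not state how the threshold was derived from the plotted accuracies, so the linear interpolation between neighbouring ISIs is an assumption, and the accuracy values are hypothetical.

```python
def threshold_at(levels, accuracy, criterion=0.75):
    """Return the stimulus level at which accuracy first reaches `criterion`.

    Uses linear interpolation between neighbouring points (an assumption;
    a fitted psychometric curve would be another option).
    """
    points = sorted(zip(levels, accuracy))
    if points[0][1] >= criterion:
        return points[0][0]          # criterion already met at the smallest level
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None                      # criterion never reached in the tested range


# ISIs from the task (ms) with hypothetical proportions correct.
isis = [5, 10, 15, 30, 60, 90, 120, 240]
prop_correct = [0.50, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00]
toj_threshold = threshold_at(isis, prop_correct)   # falls between 30 and 60 ms
```

The same helper can be called with `criterion=0.5` for thresholds defined at 50% correct.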

A training session with four parts preceded the experiment itself. In the first part, participants were familiarized with the stimuli used in the study by listening to five tones presented to the right ear followed by five tones to the left ear. In the second part, the participants were trained to associate each tone with the proper response key in 32 trials in which they were required to press the correct key for each tone they heard. Feedback was given following each response (“right” or “wrong”). In the third part, the participants were tested for the ear–key association and were required to press the correct key for each tone, as in the second part, but without any feedback. This third part was in fact a test for the association between the ear the tone was presented to and the correct keyboard response key, and participants needed at least 20 correct answers out of 24 in order to complete the training and perform the experiment. All participants in the present study passed this test successfully. In the fourth part, participants were trained to reproduce the order of two tones. This part was similar to the conditions of the experiment, but with an ISI = 240 ms and with only 32 trials. Performance in this part was accompanied by appropriate feedback after each response (“right” or “wrong”).

Gap detection

On each trial, participants were presented binaurally with two pairs of two 50-ms, 1-kHz pure tones. One of the pairs had tones separated by a gap of silence with a duration varying between .5 and 36 ms. The other pair was the no-gap reference tone that had a “gap” between tones of 0 ms. Participants judged which of the two pairs in a trial contained a gap. Trials were separated by an ISI of 100 ms. This procedure of using a reference tone with a “gap” of 0 ms was adopted in order to prevent judgments based on possibly perceived changes in the overall envelope of a tone with a gap, versus a tone with no gap; this might lead to a false identification of the gap based on differences in the envelope between the tones and not based on gap detection (Schneider et al. 1994). Using a 2-interval, 2-alternative forced-choice (2I2AFC) procedure, participants judged which of the two tones contained the longer gap, the first or the second tone. Gap durations were .5, 1, 2, 4, 8, 12, 18, 24, and 36 ms and were presented randomly 16 times each, for a total of 144 trials. After every 32 trials, participants received a short recess. Percent correct was recorded for each participant and each gap value. The mean accuracy was plotted for each gap value, thus creating a psychophysical function for each participant. Gap detection threshold was calculated as the gap necessary for 50% correct responses. The experimental session was preceded by a practice session including 36 and 18 ms gaps that were repeated 16 times each. In this practice session, participants received feedback for each response. No feedback was provided during the experimental session.
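The trial structure described above (nine gap durations, 16 repetitions each, random order, a recess after every 32 trials) can be sketched in Python as follows. The random placement of the gap pair in the first or second interval and the fixed seed are assumptions for illustration; the study does not describe how the interval was chosen.

```python
import random

GAPS_MS = [0.5, 1, 2, 4, 8, 12, 18, 24, 36]


def build_schedule(gaps=GAPS_MS, repeats=16, block_size=32, seed=0):
    """Return a list of blocks; each trial is (gap_ms, interval_with_gap)."""
    rng = random.Random(seed)            # seeded only to make the sketch reproducible
    trials = [(gap, rng.choice((1, 2)))  # assumed: gap pair placed at random
              for gap in gaps for _ in range(repeats)]
    rng.shuffle(trials)
    # Split into blocks so that a recess can follow every `block_size` trials.
    return [trials[i:i + block_size] for i in range(0, len(trials), block_size)]


blocks = build_schedule()
```

With these settings the schedule holds 144 trials in four blocks of 32 and a final block of 16.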

Intensity discrimination

For this task, participants were presented with a pair of 500-ms, 1-kHz pure tones, separated by 100 ms. In each pair, one tone was presented at 40 dB above the hearing level (40 dB SL) that was measured in the “Hearing level” test prior to performing the auditory and speech perception tests. The other tone was presented .25 to 12 dB below 40 dB SL. Using a 2I2AFC procedure, participants were asked to indicate whether the two tones in each pair were the same or differed in intensity. The levels of the variable-intensity tone presented during the experimental session all differed from the constant 40 dB SL tone. However, participants expected some of the variable tones to have the same intensity, since they had received tone pairs of 40 dB during the practice session. In fact, all of the participants responded “same” when the intensity difference between the 40 dB SL tone and the variable tone was small. The difference between the intensity of the tones was either .25, .75, 1.5, 2, 4, 6, 8, 10, or 12 dB. Each of these levels was presented randomly 16 times, for a total of 144 trials. After every 32 trials, participants received a short recess. Percent correct was recorded for each participant for each intensity difference value. The mean accuracy was plotted for each intensity difference value, creating a psychophysical function. Intensity discrimination threshold was calculated as the difference between the intensity of two tones necessary for 50% correct responses. The experimental session was preceded by a practice session in which practice stimulus pairs were presented with intensity differences between tones of 12, 6, and 0 dB. The practice stimulus pairs were repeated 16 times each. For the 0-dB difference (both tones with an intensity of 40 dB SL), participants were expected to respond “same” and receive feedback of “correct.” For the 6- and 12-dB differences, they were expected to respond “different.” Almost all responses during the practice session were correct. No feedback was provided during the experimental session.

Speech perception

Speech perception for words was tested using the Hebrew version of the AB words test (Boothroyd 1984). The test is composed of lists of ten one-syllable Consonant–Vowel–Consonant meaningful words, which are phonemically balanced (i.e., in each list, every consonant appears once and every vowel appears twice). The words were presented binaurally, and the participants were asked to repeat each word immediately after hearing it. Two lists of 10 words each were used in each of the four study conditions, and a silent interval of three seconds separated consecutive words. The words were edited and digitized (16 bits, 44 kHz sampling rate) using the SoundForge program. Word intensity was normalized using the overall root mean square of each list, and the words were presented to each participant at 40 dB SL. The words were recorded as spoken by a middle-aged (age 50) male speaker under four conditions: (1) quiet; (2) narrowband noise; (3) broadband noise (white noise); and (4) 60% time-compressed speech. The two lists in each condition were presented in immediate succession (20 words for each condition), and the conditions were presented in random order. In the quiet condition, words were presented with no background noise. In the narrowband noise condition, words were accompanied by background noise composed of steady-state noise within the range of .5–2 kHz (band-passed noise). In the broadband noise condition, words were accompanied by broadband white noise which was evenly distributed over frequencies ranging between .25 and 8 kHz. Narrowband and broadband noises were generated by the Diagnostic Audiometer DA64. They were added to the words at a signal-to-noise ratio of 0 dB. The noise started three seconds before the first word in the list and continued nonstop throughout the list, ending with the last word. In the quiet and noise conditions, words were presented at a rate of 120 words per minute (WPM). In the 60% time-compressed speech condition, words were compressed to 60% of their original length and were presented with no background noise at a rate of about 200 WPM. The compression was carried out using an implementation of the WSOLA (Waveform Similarity Overlap and Add) algorithm (Verhelst 2000), which achieves very high-quality timescale modification of speech signals while leaving other qualities, such as the pitch and the timbre, unchanged.
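A signal-to-noise ratio of 0 dB, as used in the two noise conditions, means that speech and noise have the same root-mean-square (RMS) level. The Python sketch below shows one way such a mixture could be produced digitally. In the study the noise came from an audiometer, so the digital scaling, the function names, and the sample values here are assumptions for illustration only.

```python
import math


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db=0.0):
    """Scale `noise` so that 20*log10(rms(speech)/rms(noise)) equals snr_db, then add it.

    Returns (mixture, scaled_noise); both inputs must have the same length.
    """
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled = [n * gain for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Hypothetical short signals, standing in for a word list and a noise segment.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, 0.1, -0.1, -0.1]
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=0.0)
```

At 0 dB the scaled noise has the same RMS as the speech; a positive `snr_db` would attenuate the noise relative to the speech.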

Matrix reasoning and digit span

Cognitive ability was measured using the visual Matrix Reasoning and the Digit Span tasks from the Wechsler Adult Intelligence Scale (WAIS-III, Wechsler 1997). The Matrix Reasoning task includes 26 trials of increasing difficulty. In each trial, the participants were presented visually with a matrix of forms with one piece missing, and they were required to choose the missing piece from among five given options. Each incorrect response was marked, and the task was terminated after four consecutive errors. Each participant received an individual score reflecting the number of correct responses before terminating or finishing the task.

In the Digit Span task, participants were required to repeat series of 2 to 9 digits that were read aloud by the experimenter at a rate of one digit per second. The task started with 2-digit series and continued with series of increasing length, two trials for each length. The task terminated after two errors at the same length. The task is divided into two parts. In the first part, forward Digit Span, the participant repeated the digits in the same order in which they were read by the experimenter. In the second part, backward Digit Span, the participant repeated the series in the reverse order. The number of correct responses from the forward and backward parts was combined into one final score.
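The Digit Span scoring rule described above (two trials per length, termination after two errors at the same length, forward and backward scores summed) can be sketched in Python as follows. The data layout, with one pair of booleans per series length, and the function names are assumptions made for the example.

```python
def digit_span_score(results):
    """Score one part (forward or backward) of the task.

    `results` lists, in order of increasing series length, a pair
    (first_trial_correct, second_trial_correct) for each length.
    """
    score = 0
    for first, second in results:
        score += int(first) + int(second)
        if not first and not second:
            break            # two errors at the same length end the task
    return score


def digit_span_total(forward, backward):
    """Combine the forward and backward parts into one final score."""
    return digit_span_score(forward) + digit_span_score(backward)


# Hypothetical participant: the last forward pair is never administered.
forward = [(True, True), (True, True), (True, False), (False, False), (True, True)]
backward = [(True, True), (False, False)]
total = digit_span_total(forward, backward)
```

In this example the forward part contributes 5 correct responses and the backward part 2.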

Apparatus

The auditory and speech tasks were performed using MATLAB software which delivered the sounds and recorded the responses. Sounds were delivered using TDH-49 headphones.

Procedure

Phase 1 of the study was approved by the Bar-Ilan University ethics committee and phase 2 by the Ariel University ethics committee. The participants received a full explanation of the study, agreed to participate, and signed a separate informed consent document for each phase before performing the screening and the study tasks. In phase 1, the participants were tested at Bar-Ilan University in a quiet room; in phase 2, about half of the participants were tested at the university and half in a quiet room in their home. Some participants were tested at home because of technical issues and according to the participants’ preference. There was no difference in mean age between those who were tested at the university and those who were tested at home, and no differences were found in any of the dependent variables. The environmental noise in both the university and the participants’ homes was measured using a TA 1350A Sound Level Meter and was lower than 30 dB(A).

Data analysis

Pearson correlations between age and the dependent variables were performed for each phase separately. In order to assess the longitudinal change between phases 1 and 2 of the study, the data were analyzed by generalized estimating equations (GEEs, Zeger et al. 1988). These analyses were used to model the relationship over time between participants’ thresholds in auditory perception tasks (temporal processing: dichotic TOJ and gap detection; and non-temporal processing: intensity discrimination) and word recognition accuracy [(a) speech in quiet; (b) speech with a background narrowband noise; (c) speech with a background broadband noise; and (d) compressed speech], adjusted for hearing level and cognitive ability (Matrix Reasoning and Digit Span) that were entered into the analysis as covariates.

Results

Cross-sectional analysis (1): Association between age and dependent variables in phases 1 and 2

Correlations between age and each of the dependent variables in both phases of the study are presented in Table 1a, and were conducted using Bonferroni-adjusted alpha levels of .005 per test (.05/10). In phase 1, age was positively correlated with (1) hearing threshold and (2) dichotic TOJ threshold, while negatively correlated with (1) word recognition accuracy in narrowband noise and (2) 60% time-compressed speech recognition accuracy. In phase 1, age was neither significantly correlated with word recognition accuracy in quiet nor with broadband background noise. Age was also not significantly correlated with gap detection. In phase 2, age was positively correlated with (1) mean hearing threshold; (2) dichotic TOJ threshold; and (3) gap detection threshold, while age was negatively correlated with word recognition accuracy in all four conditions. Age was not correlated with intensity discrimination threshold either in phase 1 or phase 2 of the study. Age was also not correlated with performance on the Matrix Reasoning or Digit Span tasks either in phase 1 or phase 2 of the study. Figure 1a–d presents correlations for tasks that were associated with age at both phases 1 and 2.
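The Bonferroni adjustment used for these correlations divides the family-wise alpha by the number of tests. A minimal Python sketch is given below; the p values in the example are hypothetical and the function names are chosen for illustration.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha level after Bonferroni adjustment."""
    return family_alpha / n_tests


def significant(p_values, n_tests, family_alpha=0.05):
    """Flag each named p value as significant or not at the adjusted alpha."""
    alpha = bonferroni_alpha(family_alpha, n_tests)
    return {name: p < alpha for name, p in p_values.items()}


alpha_ten_tests = bonferroni_alpha(0.05, 10)   # ten tests per phase: .005
alpha_six_tests = bonferroni_alpha(0.05, 6)    # six tests: .0083, reported as .008

# Hypothetical p values, not taken from Table 1.
flags = significant({"variable_a": 0.001, "variable_b": 0.02}, n_tests=10)
```

A p value of .02 would pass an unadjusted .05 criterion but not the adjusted .005 criterion.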

Table 1 Correlation and slope data for first testing and second testing

Fig. 1 Correlations for tasks associated with age at both phase 1 (full circles, solid lines) and phase 2 (open squares, dotted lines). a Hearing threshold, b TOJ thresholds, c word recognition in NB noise, d word recognition of time-compressed speech

Cross-sectional analysis (2): Association between auditory processing and word recognition accuracy in phases 1 and 2

Correlation analyses between dichotic TOJ thresholds, word recognition accuracy, and cognitive tasks are presented in Table 1b and 1c. Correlations were conducted using Bonferroni-adjusted alpha levels of .008 per test (.05/6). Dichotic TOJ thresholds correlated negatively with word recognition accuracy in narrowband noise in both phase 1 and phase 2. In the second phase, there was an additional significant negative correlation between TOJ thresholds and compressed word recognition accuracy. Gap detection thresholds were correlated with accuracy on all word recognition accuracy tests in phase 2 only, but not in phase 1. No significant correlations were found between intensity discrimination thresholds and word recognition accuracy either in phase 1 or phase 2 (Table 1d).

Comparisons between phases 1 and 2

Table 2 presents means and standard deviations for all study variables in the first and second phases of the study. Significant differences in variance were found for all auditory processing and word recognition accuracy tasks, except for word recognition accuracy in quiet (Table 2). Therefore, the data were transformed using log transformation for the dichotic TOJ and gap detection thresholds, and arcsine transformation for perception accuracy on all of the speech tasks. Multivariate repeated measures analyses were carried out separately for hearing thresholds, the transformed auditory processing and word recognition accuracy, and cognitive data. The results showed significant differences between the first and second phases for (1) mean hearing thresholds; (2) dichotic TOJ thresholds; and (3) for word recognition accuracy in all conditions (Table 2).
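The two variance-stabilizing transformations mentioned above can be sketched in Python as follows. The exact variants are not reported in the study, so the natural logarithm and the arcsine of the square root of the proportion correct are assumptions, as are the example values.

```python
import math


def log_transform(threshold_ms):
    """Natural log of a threshold in ms (the log base is an assumption)."""
    return math.log(threshold_ms)


def arcsine_transform(proportion_correct):
    """Arcsine square-root transform of a proportion in the range 0..1.

    The asin(sqrt(p)) form is assumed; some authors use 2 * asin(sqrt(p)).
    """
    return math.asin(math.sqrt(proportion_correct))


# Hypothetical values: a 60-ms TOJ threshold and 85% word recognition accuracy.
transformed_threshold = log_transform(60.0)
transformed_accuracy = arcsine_transform(0.85)
```

The arcsine transform maps proportions from 0 to 1 onto 0 to pi/2 radians and stretches the scale near the floor and ceiling, where raw accuracy scores tend to have compressed variance.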

Table 2 Mean (SD), Levene test, and repeated measures MANOVA results for all study measures in first testing and second testing

For the tasks whose performance level correlated with age at both the first and second phases of testing [these include (1) hearing thresholds; (2) TOJ thresholds; (3) word recognition accuracy in narrowband noise; and (4) perception of time-compressed speech], a comparison of the correlation slopes revealed significantly steeper slopes at phase 2 than at phase 1 for: (1) age and hearing level (t = −3.074, p = .001); (2) age and word recognition accuracy in narrowband noise (t = −2.828, p = .003); and (3) age and recognition accuracy for time-compressed speech (t = 2.000, p = .024), but not for age and TOJ thresholds (t = −1.190, p = .118). There were no significant differences in the slopes of the linear correlations relating TOJ thresholds and word recognition accuracy in narrowband noise between the first and second testing phases (t = .091, p = .464). In order to show the extent of change in the dependent variables as a function of age, the percent of change in each variable from phase 1 to phase 2 (7 years) was plotted against the age of the participant at the second phase of the study for hearing thresholds, TOJ thresholds, gap detection thresholds, word recognition accuracy in narrowband and broadband noise, and compressed speech (Fig. 2a–f).
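The percent of change between phases can be computed as sketched below in Python. The formula, which expresses the change relative to the phase 1 value, is an assumption, since the study does not state how the percentage was calculated; the example values are hypothetical.

```python
def percent_change(phase1_value, phase2_value):
    """Change from phase 1 to phase 2 as a percentage of the phase 1 value.

    Positive results mean the measure increased (for example, a threshold
    became higher); negative results mean it decreased.
    """
    return 100.0 * (phase2_value - phase1_value) / phase1_value


# Hypothetical participant: TOJ threshold rises, word recognition accuracy falls.
toj_change = percent_change(50.0, 60.0)
accuracy_change = percent_change(80.0, 60.0)
```

Under this convention a rising threshold and a falling accuracy score both indicate poorer performance, although their signs differ.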

Fig. 2 Percent of change from phase 1 to phase 2 (7 years), by age (at phase 1). a Hearing threshold, b TOJ threshold, c gap detection threshold, d word recognition in narrowband noise, e word recognition in broadband noise, f word recognition of time-compressed speech

Longitudinal analysis: changes from phase 1 to phase 2

Analysis of within-subject changes across time (phase 1 to phase 2) as a function of initial age was tested, controlling for hearing level and cognitive ability. Significant negative regression coefficients were obtained for age on word recognition accuracy in narrowband noise (B = −.032, p = .001), broadband noise (B = −.085, p = .03), and compressed speech (B = −.018, p = .01). These results point to an overall within-subject decrease in word recognition accuracy over time, with a larger decrease associated with an older initial age.

Within-subject changes in word recognition accuracy across time were also tested as a function of the initial auditory processing thresholds (dichotic TOJ, gap detection, and intensity discrimination) at phase 1, controlling for age, hearing level, and cognitive ability. Significant negative regression coefficients were found for dichotic TOJ and gap detection thresholds on word recognition accuracy in narrowband noise (dichotic TOJ: B = −.021, p = .001; gap detection: B = −.083, p = .03), broadband noise (dichotic TOJ: B = −.074, p = .01; gap detection: B = −.056, p = .04), and compressed speech (dichotic TOJ: B = −.034, p = .002; gap detection: B = −.076, p = .01). These results indicate an overall within-subject decrease in word recognition accuracy across time, with larger initial auditory processing thresholds associated with a larger decrease in word recognition accuracy. No significant association was found for intensity discrimination thresholds at phase 1 with any of the word recognition tests.

Discussion

In the current study, we tested age-related changes in auditory temporal processing thresholds and word recognition accuracy, using both cross-sectional and longitudinal analyses of the same data. Both cross-sectional and longitudinal analyses revealed similar results regarding age-related changes in (1) hearing level; (2) dichotic TOJ threshold; (3) word recognition accuracy when speech was accompanied by narrowband noise; and (4) word recognition accuracy when speech was time-compressed. The rate of age-related changes (slope) was steeper when participants were older (phase 2) for all dependent variables except for the dichotic TOJ threshold, which showed the same age-related increase in phase 2 as in phase 1. Indeed, the within-subject decrease in word recognition accuracy over 7 years was greater for the participants whose initial age at phase 1 was older. Of major importance for the current study is the finding that auditory temporal processing thresholds were associated with age-related changes in word recognition accuracy, even when adjusted for age, hearing level, and cognitive ability. These findings provide support for the hypothesis that decline in temporal processing may underlie the difficulties in speech perception. This suggests that improving auditory temporal processing, for example through training, may in turn improve speech perception under difficult conditions.

Age-related changes

Earlier studies have shown that neural changes in the auditory system that occur with aging can have consequences beyond the immediate loss of hearing, and may even have profound effects on the general functioning of the individual (e.g., Howarth and Shone 2006). The results of the present study lend support to those findings. In both phases 1 and 2, age-related changes were found in (1) hearing level; (2) dichotic TOJ thresholds; (3) word recognition accuracy in narrowband noise; and (4) compressed speech. These findings are in accord with earlier studies that reported age-related changes in hearing level (Fostick et al. 2013; Gordon-Salant 2005), temporal processing (Ben-Artzi et al. 2011; Fink et al. 2005; Fostick and Babkoff 2013a; Fostick et al. 2014b; Gallun et al. 2014; Humes et al. 2010; Szymaszek et al. 2006, 2009), and word recognition accuracy, especially when the frequency range of the background noise overlaps the speech signal background, and when speech is rapid (Ben-Artzi et al. 2011; Calais et al. 2008; Committee on Hearing, Bioacoustics, and Biomechanics [CHABA] 1988; Fostick et al. 2013; Martin and Jerger 2005; Schneider et al. 2010). For example, Fostick et al. (2013) reported a correlation range of r = .28–.65 between age and hearing threshold, similar to the correlation range in the present study (r = .309–.508, see Table 1). They also reported correlations of r = −.23 to −.51 between age and word recognition accuracy, when speech was accompanied by narrowband noise and when speech was compressed. Those correlations were also similar to the correlations found in the current study (r = −.279 to −.613, see Table 1). Fostick and Babkoff (2013a) reported a correlation of r = .21 between age and temporal processing that was only slightly lower than that found in the current study (r = .380–.388). 
In the current study, we replicated previous findings and showed that performance on the age-related dependent variables declined (thresholds rose and accuracy decreased) over the seven years from phase 1 to phase 2. Moreover, the slopes for age-related changes in hearing sensitivity and word recognition accuracy were significantly steeper in phase 2 than in phase 1, indicating a steeper age-related decline in these variables once participants had aged seven years.

Hearing level, temporal processing, and word recognition accuracy were all significantly correlated with age, both in the cross-sectional and in the longitudinal analyses. The advantage of the longitudinal design of this study is that it allows us to analyze changes in the performance of each individual from phase 1 to phase 2 (Humes 2013). Moreover, it allowed us to test the within-subjects interaction of age and time, that is, whether the change from phase 1 to phase 2 differed with the participants' age. The steeper slopes in phase 2 imply that the seven years that passed between phase 1 and phase 2 had a greater effect on the older than on the younger participants. Acceleration of the decrease in performance for older adults was previously reported in studies of hearing threshold (Echt et al. 2010; Kiely et al. 2012), word recognition accuracy (Bergman et al. 1976; Dubno et al. 2008; Pronk et al. 2013), and various cognitive abilities (Caselli et al. 2012; Gale et al. 2012; Mitchell et al. 2012). In contrast, the slope relating dichotic TOJ thresholds to age did not change between the two testing phases. Each phase, being in itself a cross-sectional study, is of course subject to cohort effects. Nevertheless, the comparison of the two slopes suggests a linear age-related decline in dichotic TOJ thresholds that did not accelerate for the older participants over the seven years separating phase 1 and phase 2 of the study.
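The age-by-time interaction discussed in this paragraph can be illustrated with a minimal sketch. It assumes a simple change-score approach, which is not necessarily the analysis used in this study: each participant's change from phase 1 to phase 2 is regressed on age at phase 1, and a positive slope means that older participants changed more over the interval. The function name `ols_slope_intercept` and all numbers are hypothetical and purely illustrative; they are not data from the study.

```python
from statistics import mean

def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical, made-up values for six participants (NOT study data).
baseline_age = [60, 64, 68, 72, 76, 80]        # age at phase 1, in years
threshold_phase1 = [20, 22, 25, 27, 30, 33]    # hearing threshold, dB HL
threshold_phase2 = [22, 25, 29, 33, 38, 43]    # same participants, later phase

# Within-subject change over the interval between the two phases.
change = [p2 - p1 for p1, p2 in zip(threshold_phase1, threshold_phase2)]

# A positive slope indicates that change grows with baseline age,
# i.e. the decline accelerates for older participants.
slope, intercept = ols_slope_intercept(baseline_age, change)
print(f"threshold change per year of baseline age: {slope:.3f} dB")
```

A slope near zero in such a sketch would correspond to the pattern described for dichotic TOJ thresholds, where the decline did not accelerate with age.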

Gap detection thresholds, speech in quiet, and speech in broadband noise were found to correlate with age only in phase 2. Indeed, more perceptual and speech variables were correlated with age at phase 2 than at phase 1. Furthermore, the average performance of the entire sample declined from the first to the second phase for most of the dependent variables. The main explanation for this finding is that, as expected, the increase in age resulted in a decrease in performance. However, it should also be noted that, in general, the inter-individual variance was larger at phase 2 than at phase 1. This difference in variability may explain why more correlations reached significance in phase 2. Larger variance among older participants in various tasks has been reported previously (e.g., Fogerty et al. 2010; Rabbitt et al. 2004; Reynolds et al. 2005). In a design similar to the current one, inter-individual variance was also reported to increase in the second phase of testing (Divenyi et al. 2005).

In contrast to the other auditory processing tasks, intensity discrimination thresholds were not correlated with age at either testing phase. This finding was also reported by others (Fostick and Babkoff 2013a; Fitzgibbons and Gordon-Salant 2010). While the difference in intensity between two auditory stimuli is the main cue for performing an intensity discrimination task, the inter-stimulus (temporal) interval (or gap) separating two tones is the main cue for performing gap detection and dichotic TOJ. We tentatively conclude that, within the limits of the current study, tasks based on auditory supra-threshold intensity discrimination are less affected by age than tasks based on temporal processing. This is despite the elevation of audiometric thresholds that accompanies aging.

Temporal processing and speech perception

Researchers have pointed to a number of possible age-related changes that may underlie the difficulties in speech perception reported by older adults, as a result of physiological changes in the aging auditory system (Fitzgibbons and Gordon-Salant 2010; Howarth and Shone 2006; Humes et al. 2012, 2013). While some researchers suggest that the underlying cause is the decline in hearing sensitivity (e.g., Humes et al. 2013), or the decline in temporal processing (e.g., Fostick and Babkoff 2013a), other researchers (Frtusova et al. 2013; Lindenberger and Ghisletta 2009) have suggested that the decline in cognitive ability is the significant factor associated with age-related difficulties in speech perception. More recently, some researchers have argued that the age-related decline in cognitive abilities may be linked with age-related changes in hearing level (Humes et al. 2013; Lin 2011; Lin et al. 2011a, b).

The main findings of the present study show that auditory temporal processing thresholds are significantly associated with longitudinal changes in word recognition accuracy when speech is accompanied by background noise or when speech is compressed. These findings are consistent with earlier studies and suggest that age-related decline in temporal processing may underlie the complaints of older adults regarding difficulties in speech perception, especially when speech is presented against a noise background or when speech is rapid (Anderson et al. 2011, 2013a, b; Ben-Artzi et al. 2011; Fostick et al. 2013; Grossman et al. 2003; Humes et al. 2012, 2013; Lin 2011; Lin et al. 2011a, b; Lindenberger and Ghisletta 2009). This association is independent of age-related changes in audiometric thresholds and cognitive ability, and is specific to auditory temporal processing rather than to a general decline in auditory processing (such as processes related to intensity discrimination). This latter conclusion arises from the finding that intensity discrimination, a measure of auditory non-temporal processing, was not related to word recognition accuracy in any of the conditions. An implication of this finding is that training and remediation methods that improve auditory temporal processing (Fostick et al. 2014c) might help improve word recognition accuracy among the elderly.
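What it means for an association to hold "when adjusted for" other variables can be sketched with a partial correlation. The example below is an assumption-laden illustration, not the statistical model used in this study: it applies the standard recursive partial-correlation formula to remove the linear contribution of each covariate in turn. The function names `pearson` and `partial_corr` and all values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, covariates):
    """Correlation of x and y with every covariate linearly partialled out.

    Uses the recursive formula
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied to one covariate at a time.
    """
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[-1], covariates[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical, made-up values for eight participants (NOT study data).
age = [61, 65, 68, 70, 74, 77, 80, 83]
hearing_level = [18, 25, 21, 30, 27, 35, 33, 41]        # dB HL
cognition = [29, 27, 30, 26, 28, 25, 27, 24]            # screening score
toj_threshold = [55, 70, 62, 90, 75, 110, 95, 130]      # ms
word_accuracy = [92, 85, 90, 76, 84, 66, 74, 58]        # percent correct

raw = pearson(toj_threshold, word_accuracy)
adjusted = partial_corr(toj_threshold, word_accuracy,
                        [age, hearing_level, cognition])
print(f"raw r = {raw:.3f}, adjusted r = {adjusted:.3f}")
```

If the adjusted coefficient stays clearly different from zero, the association cannot be attributed to the linear effects of the covariates alone, which is the logic behind the claim of independence made in this paragraph.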

Conclusions

The literature suggests that auditory temporal processing is associated with speech perception (e.g., Humes et al. 2013; Schneider and Pichora-Fuller 2001; Schneider et al. 2002, 2005). Most of the evidence from previous studies supporting this hypothesis has been based on cross-sectional group comparisons that provide data on age-related changes, but are limited in their conclusions, due to cohort effects and the limited ability to test age-related differences in the rate of change. The longitudinal data in the current study provide information regarding the relationship between auditory processing, word recognition accuracy, and aging in the individual participants. Similar to previous studies, we found that age was related to changes in hearing level, temporal processing, and speech perception, but not to non-temporal processing of supra-threshold auditory stimuli (intensity discrimination) or to cognitive ability. Age-related changes in hearing level and in word recognition accuracy were greater for the older adults than for the younger participants. However, for dichotic TOJ, there was no greater increase for the older adults than for the younger participants over the seven-year period. Most importantly, the results of this study also suggest that increases in auditory temporal processing thresholds are significantly associated with a decline in word recognition accuracy under difficult conditions (accompanying noise or speech compression), even when adjusted for hearing level and cognitive ability.